From gmann at ghanshyammann.com Wed Mar 1 02:02:36 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Tue, 28 Feb 2023 18:02:36 -0800
Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API
In-Reply-To: 
References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com>
Message-ID: <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com>

 ---- On Thu, 23 Feb 2023 01:33:12 -0800 Rajat Dhasmana wrote ---
 > Hi,
 >
 > It looks like there is confusion between 3 things:
 > 1) Multiattach volume type
 > 2) multiattach flag on the volume
 > 3) The policy volume:multiattach
 >
 > I will try to briefly describe all 3 so there is clarity on the issue.
 > 1) Multiattach volume type: This is a volume type created with an extra spec multiattach="<is> True". This allows multiattach volumes to be created by using this type. Previously we used to allow a parameter --allow-multiattach while creating the volume. This was deprecated in Queens and removed in Train in favor of the volume type way of creating the multiattach volume[1].
 > 2) Multiattach flag of a volume: This is a parameter of the volume that specifies whether a volume is multiattach or not.
 > 3) volume:multiattach policy: The policy verifies that the user creating a multiattach volume is a member or admin (and not a reader).
 >
 > Coming to the issue, I verified that what you're observing is correct. We removed the support for providing the "multiattach" flag from cinderclient and openstackclient, but there still exists code on the API side that allows you to provide "multiattach": "True" in the JSON body of a curl command to create a multiattach volume. I will work on fixing the issue on the API side.

I think removing it from the clients is a good way to stop exposing this old, not-recommended way to users, but the API is a separate thing, and removing the API request parameter 'multiattach' from it can break existing users who rely on it. Tempest tests are one good example of such a use case. To maintain backward compatibility/interoperability it should be removed by bumping the microversion, so that it continues to work for older microversions. This way we will not break existing users and will still provide the new way for users to adopt. Similarly, in Nova we have a lot of deprecated APIs and we need to keep them for older microversions.

-gmann

 > In the meantime, can you report an issue on launchpad for the same?
> https://bugs.launchpad.net/cinder/+filebug
>
> Snippet of curl command:
> $ curl -g -i -X POST http://127.0.0.1/volume/v3/a5df9e29f521464f9158ff7a30b7e51f/volumes -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-cinderclient" -H "X-Auth-Token: gAAAAABj9zDtZO1mTld-BC-Yd8FRHDunc4-Xyg1jsgLembA-Ke7cr8aA4kCHHYYB4EPvhq1xL02FBYuXahhYBl_nKWjVbOTpd7R3kS4Libf-Kd9ackaYpWq4Mq4g7-2ORi7FcVg2IOdj3wUkDWegu9lI5PI-brNsAGUh8R1fW_y5bpDYWtfEFdw" -d '{"volume": {"size": 1, "consistencygroup_id": null, "snapshot_id": null, "name": null, "description": null, "volume_type": null, "availability_zone": null, "metadata": {}, "imageRef": null, "source_volid": null, "backup_id": null, "multiattach": "True"}}'
> HTTP/1.1 202 Accepted
> Date: Thu, 23 Feb 2023 09:25:23 GMT
> Server: Apache/2.4.41 (Ubuntu)
> Content-Type: application/json
> x-compute-request-id: req-131b4a2d-f9d4-4d9d-b99c-c52012056dec
> Content-Length: 798
> OpenStack-API-Version: volume 3.0
> Vary: OpenStack-API-Version
> x-openstack-request-id: req-131b4a2d-f9d4-4d9d-b99c-c52012056dec
> Connection: close
>
> [1] https://github.com/openstack/python-cinderclient/commit/3c1b417959689c85a2f54505057ca995fedca075
>
> Thanks,
> Rajat Dhasmana
>
> On Thu, Feb 23, 2023 at 3:08 AM Albert Braden ozzzo at yahoo.com> wrote:
> We didn't create a multi-attach volume type, and when we try to create a multi-attach volume via CLI we aren't able to. It appears that our customer was able to circumvent the restriction by using the API via TF. Is this a bug?
>
> On Wednesday, February 22, 2023, 02:32:57 PM EST, Danny Webb danny.webb at thehutgroup.com> wrote:
>
> Creating a volume is not the same as creating a volume type. A tenant can consume a volume type that allows multi-attach with no issue, as you see in that policy.
>
> From: Albert Braden ozzzo at yahoo.com>
> Sent: 22 February 2023 17:12
> To: Openstack-discuss openstack-discuss at lists.openstack.org>
> Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API
> CAUTION: This email originates from outside THG
>
> According to this document [1] multiattach volumes can only be set up if explicitly allowed by creating a "multiattach" volume type.
>
> "Starting from the Queens release the ability to attach a volume to multiple hosts/servers requires that the volume is of a special type that includes an extra-spec capability setting of multiattach='<is> True'." Creating a new volume type is an admin-only operation by default.
>
> One of our customers appears to have used Terraform to create a volume with the multiattach flag set and it worked, and that volume has multiple attachments. When I look here [2] it appears that the default is:
>
> #"volume:multiattach": "rule:xena_system_admin_or_project_member"
>
> So it looks like, by default, any project member can create a multiattach volume. What am I missing?
>
> [1]: https://docs.openstack.org/cinder/latest/admin/volume-multiattach.html
> [2]: https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/policy.yaml.html#policy-file
>
> Danny Webb
> Principal OpenStack Engineer
> Danny.Webb at thehutgroup.com
> www.thg.com
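For reference, the supported volume-type-based flow that Rajat and the cinder documentation describe looks roughly like this (a sketch with placeholder names, assuming the default policies where volume type creation is admin-only):

  # admin: create a volume type carrying the multiattach capability
  $ openstack volume type create multiattach
  $ openstack volume type set --property multiattach="<is> True" multiattach

  # project member: create a volume of that type and attach it to more than one server
  $ openstack volume create --type multiattach --size 1 shared-vol
  $ openstack server add volume server-a shared-vol
  $ openstack server add volume server-b shared-vol

Whether the member steps are allowed is what the volume:multiattach policy discussed above controls; creating or changing the volume type itself stays an admin-only operation by default.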
From knikolla at bu.edu Wed Mar 1 02:53:49 2023
From: knikolla at bu.edu (Nikolla, Kristi)
Date: Wed, 1 Mar 2023 02:53:49 +0000
Subject: [all][tc] Technical Committee next weekly meeting on 2023 Mar 1 at 1600 UTC
In-Reply-To: <186949eba68.bb0de83e1432598.5051006967090034367@ghanshyammann.com>
References: <186949eba68.bb0de83e1432598.5051006967090034367@ghanshyammann.com>
Message-ID: <9F242D12-58E5-497F-AEF7-AA380AD0A921@bu.edu>

Hi all,

Please find below the agenda for tomorrow's TC meeting that will be held over Zoom on 2023 Mar 1, at 1600 UTC. Link to connect can be found on https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting

Agenda
- Roll call
- Follow up on past action items
- Welcome new and returning TC members
- Gate health check
- Discussion on uwsgi alternative and if we should define wsgi standard server in PTI
  - https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032345.html
  - https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032369.html
- Discussion of "Add guidelines about naming versions of the OpenStack projects"
  - https://review.opendev.org/c/openstack/governance/+/874484
- TC 2023.1 tracker status checks
  - https://etherpad.opendev.org/p/tc-2023.1-tracker
- Deprecation process for TripleO
  - https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032083.html
- Cleanup of PyPI maintainer list for OpenStack Projects
  - Etherpad for audit and cleanup of additional PyPI maintainers
    - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup
  - ML discussion
    - https://lists.openstack.org/pipermail/openstack-discuss/2023-January/031848.html
- Recurring tasks check
  - Bare 'recheck' state
    - https://etherpad.opendev.org/p/recheck-weekly-summary
- Open Reviews
  - https://review.opendev.org/q/projects:openstack/governance+is:open

No noted absences.

> On Feb 27, 2023, at 3:44 PM, Ghanshyam Mann wrote:
>
> Hello Everyone,
>
> The technical Committee's next weekly meeting is scheduled for 2023 Mar 1, at 1600 UTC.
>
> If you would like to add topics for discussion, please add them to the below wiki page by
> Tuesday, Feb 28 at 2100 UTC.
>
> https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
>
> -gmann
>

From adivya1.singh at gmail.com Wed Mar 1 05:41:19 2023
From: adivya1.singh at gmail.com (Adivya Singh)
Date: Wed, 1 Mar 2023 11:11:19 +0530
Subject: (OpenStack-Upgrade)
Message-ID: 

Hi Team,

I am planning to upgrade my current environment. The upgrade procedure is available on the OpenStack site and forums. But I am looking for a rollback plan, other than keeping a local backup copy of the Galera database.

Regards,
Adivya Singh

From alsotoes at gmail.com Wed Mar 1 07:16:46 2023
From: alsotoes at gmail.com (Alvaro Soto)
Date: Wed, 1 Mar 2023 01:16:46 -0600
Subject: (OpenStack-Upgrade)
In-Reply-To: 
References: 
Message-ID: 

That will depend on how you installed your environment: OSA, TripleO, etc. Can you provide more information?

---
Alvaro Soto.

Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.
----------------------------------------------------------
Great people talk about ideas, ordinary people talk about things, small people talk... about other people.

On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote:

> Hi Team,
>
> I am planning to upgrade my Current Environment, The Upgrade procedure is
> available in OpenStack Site and Forums.
> > But i am looking fwd to roll back Plan , Other then have a Local backup > copy of galera Database > > Regards > Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Arne.Wiebalck at cern.ch Wed Mar 1 07:53:36 2023 From: Arne.Wiebalck at cern.ch (Arne Wiebalck) Date: Wed, 1 Mar 2023 07:53:36 +0000 Subject: [tc][all] OpenStack Technical Committee new Chair In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> Message-ID: Ghanshyam, Thanks for all your work as the TC chair during the past two years! I think you did an amazing job driving all the background activities and required decisions to maintain and improve the OpenStack ecosystem ... and the weekly updates helped big time to keep in the community in the loop! Cheers, Arne ________________________________________ From: Ghanshyam Mann Sent: Tuesday, 28 February 2023 22:58 To: openstack-discuss Subject: [tc][all] OpenStack Technical Committee new Chair Hello Everyone, I would like to inform the community and congratulate/welcome Kristi as the new Chair of Technical Committee. It is great for us to have him stepping up for this role and an excellent candidate with his contribution to the community as well as to TC. Thanks for having me as a Chair for the past 2 years. I will continue as TC and my other activities/role in the community. Also thanks for reading my weekly updates which were lengthy sometimes or maybe many times :) -gmann From eblock at nde.ag Wed Mar 1 08:11:42 2023 From: eblock at nde.ag (Eugen Block) Date: Wed, 01 Mar 2023 08:11:42 +0000 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: Message-ID: <20230301081142.Horde.hEzM_pv6c33ED_YOh17hbIc@webmail.nde.ag> I'm not familiar with TripleO so I'm not sure how much of help I can be here, maybe someone else with can chime in. I would look for network and rabbit issues. Are the control nodes heavily loaded? Do you see the compute services from the edge site up all the time? If you run a 'watch -n 20 openstack compute service list', do they "flap" all the time or only if you launch instances? Maybe rabbitmq needs some tweaking? Can you show your policies? rabbitmqctl list_policies -p What network connection do they have, is the network saturated? Is it different on the edge site compared to the central site? Zitat von Swogat Pradhan : > Hi Eugen, > For some reason i am not getting your email to me directly, i am checking > the email digest and there i am able to find your reply. > Here is the log for download: https://we.tl/t-L8FEkGZFSq > Yes, these logs are from the time when the issue occurred. > > *Note: i am able to create vm's and perform other activities in the central > site, only facing this issue in the edge site.* > > With regards, > Swogat Pradhan > > On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan > wrote: > >> Hi Eugen, >> Thanks for your response. 
>> I have actually a 4 controller setup so here are the details: >> >> *PCS Status:* >> * Container bundle set: rabbitmq-bundle [ >> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-no-ceph-3 >> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-2 >> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-1 >> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-0 >> >> I have tried restarting the bundle multiple times but the issue is still >> present. >> >> *Cluster status:* >> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> Cluster status of node >> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >> Basics >> >> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >> Disk Nodes >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >> Running Nodes >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >> Versions >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 on >> Erlang 22.3.4.1 >> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 on >> Erlang 22.3.4.1 >> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 on >> Erlang 22.3.4.1 >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 on Erlang 22.3.4.1 >> >> Alarms >> >> (none) >> >> Network Partitions >> >> (none) >> >> Listeners >> >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI tool 
communication >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> and AMQP 1.0 >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >> Feature flags >> >> Flag: drop_unroutable_metric, state: enabled >> Flag: empty_basic_get_metric, state: enabled >> Flag: implicit_default_bindings, state: enabled >> Flag: quorum_queue, state: enabled >> Flag: virtual_host_metadata, state: enabled >> >> *Logs:* >> *(Attached)* >> >> With regards, >> Swogat Pradhan >> >> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan >> wrote: >> >>> Hi, >>> Please find the nova conductor as well as nova api log. >>> >>> nova-conuctor: >>> >>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 16152921c1eb45c2b1f562087140168b >>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> 83dbe5f567a940b698acfe986f6194fa >>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> f3bfd7f65bd542b18d84cea3033abb43: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a >>> missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> d4b9180f91a94f9a82c3c9c4b7595566: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 897911a234a445d8a0d8af02ece40f6f: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>> backend dogpile.cache.null. 
>>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 8f723ceb10c3472db9a9f324861df2bb: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan >>> wrote: >>> >>>> Hi, >>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> launch vm's. >>>> When the VM is in spawning state the node goes down (openstack compute >>>> service list), the node comes backup when i restart the nova compute >>>> service but then the launch of the vm fails. >>>> >>>> nova-compute.log >>>> >>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> instance usage >>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to >>>> 2023-02-26 08:00:00. 0 instances. >>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> dcn01-hci-0.bdxworld.com >>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: >>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>>> backend dogpile.cache.null. 
>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> privsep helper: >>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', >>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep >>>> daemon via rootwrap >>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep daemon >>>> starting >>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> process running with uid/gid: 0/0 >>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> process running with capabilities (eff/prm/inh): >>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep daemon >>>> running as pid 2647 >>>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> execution error >>>> in _get_host_uuid: Unexpected error while running command. >>>> Command: blkid overlay -s UUID -o value >>>> Exit code: 2 >>>> Stdout: '' >>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> Unexpected error while running command. >>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>> Is there a way to solve this issue? >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>> From smooney at redhat.com Wed Mar 1 08:15:16 2023 From: smooney at redhat.com (Sean Mooney) Date: Wed, 01 Mar 2023 08:15:16 +0000 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > BTW, this link ( > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) said > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? no its not wrong but for dpu smart nics you have to make a choice when you deploy either they can be used in dpu mode in which case remote_managed shoudl be set to true and you can only use them via neutron ports with vnic-type=remote_managed as descried in that doc https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port or if you disable dpu mode in the nic frimware then you shoudl remvoe remote_managed form the pci device list and then it can be used liek a normal vf either for neutron sriov ports vnic-type=direct or via flavor based pci passthough. 
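To make the difference concrete, a minimal sketch of the two configurations (illustrative values only: the option is [pci] passthrough_whitelist on older releases and was later renamed to device_spec, and the vendor/product IDs and physnet name below are placeholders to replace with your own):

  # /etc/nova/nova.conf on the compute node, DPU mode enabled:
  [pci]
  passthrough_whitelist = { "vendor_id": "15b3", "product_id": "101e", "remote_managed": "true" }
  # such VFs are only consumable through ports like:
  # $ openstack port create --network net1 --vnic-type remote-managed dpu-port

  # /etc/nova/nova.conf with DPU mode disabled in the NIC firmware:
  [pci]
  passthrough_whitelist = { "vendor_id": "15b3", "product_id": "101e", "physical_network": "physnet2" }
  # usable via the sriov nic agent path:
  # $ openstack port create --network net1 --vnic-type direct sriov-port
  # or via flavor-based passthrough with a matching [pci] alias.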
the issue you were having is you configured the pci device list to contain "remote_managed: true", which means the vf can only be consumed by a neutron port with vnic-type=remote_managed. when you have "remote_managed: false" or unset, you can use it via vnic-type=direct. i forgot that slight detail that vnic-type=remote_managed is required for "remote_managed: true".

in either case you found the correct doc
https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html

neutron sriov port configuration is documented here
https://docs.openstack.org/neutron/latest/admin/config-sriov.html

and nova flavor based pci passthrough is documented here
https://docs.openstack.org/nova/latest/admin/pci-passthrough.html

all three serve slightly different uses. both neutron procedures are exclusively for network interfaces.
https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html requires the use of ovn deployed on the dpu to configure the VF control plane.
https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses the sriov nic agent to manage the VF with ip tools.
https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is intended for pci passthrough of stateless accelerators like qat devices. while the nova flavor approach can be used with nics, it is not how it is generally meant to be used, and when used to pass through a nic the expectation is that it is not related to a neutron network.

From skaplons at redhat.com Wed Mar 1 08:18:17 2023
From: skaplons at redhat.com (Slawek Kaplonski)
Date: Wed, 01 Mar 2023 08:18:17 +0100
Subject: [tc][all] OpenStack Technical Committee new Chair
In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com>
References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com>
Message-ID: <4789451.GXAFRqVoOG@p1>

Hi,

On Tuesday, 28 February 2023 22:58:34 CET Ghanshyam Mann wrote:
> Hello Everyone,
>
> I would like to inform the community and congratulate/welcome Kristi as the new
> Chair of Technical Committee. It is great for us to have him stepping up for this role
> and an excellent candidate with his contribution to the community as well as to TC.
>
> Thanks for having me as a Chair for the past 2 years. I will continue as TC and my
> other activities/role in the community. Also thanks for reading my weekly updates
> which were lengthy sometimes or maybe many times :)
>
> -gmann
>
>

Thanks gmann for all your work in those past 2 years as TC Chair - you did a great job there.
Congrats Kristi for being our new Chair. Good luck in your new role :)

--
Slawek Kaplonski
Principal Software Engineer
Red Hat

From thierry at openstack.org Wed Mar 1 08:44:10 2023
From: thierry at openstack.org (Thierry Carrez)
Date: Wed, 1 Mar 2023 09:44:10 +0100
Subject: [tc][all] OpenStack Technical Committee new Chair
In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com>
References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com>
Message-ID: <3f9a86c8-15d0-ea84-0751-13749fc97d7b@openstack.org>

Congrats to Kristi! And welcome to the TC chair emeritus group, Ghanshyam!

Ghanshyam Mann wrote:
> Hello Everyone,
>
> I would like to inform the community and congratulate/welcome Kristi as the new
> Chair of Technical Committee.
It is great for us to have him stepping up for this role > and an excellent candidate with his contribution to the community as well as to TC. > > Thanks for having me as a Chair for the past 2 years. I will continue as TC and my > other activities/role in the community. Also thanks for reading my weekly updates > which were lengthy sometimes or maybe many times :) > > -gmann -- Thierry Carrez (ttx) From yasufum.o at gmail.com Wed Mar 1 09:41:04 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Wed, 1 Mar 2023 18:41:04 +0900 Subject: [tc][heat][tacker] Moving governance of tosca-parser(and heat-translator ?) to Tacker In-Reply-To: <1869435593c.10a5026ca1424633.8160143839607463616@ghanshyammann.com> References: <1867ac70656.c5de609e1065667.3634775558652795921@ghanshyammann.com> <1869435593c.10a5026ca1424633.8160143839607463616@ghanshyammann.com> Message-ID: On 2023/02/28 3:49, Ghanshyam Mann wrote: > ---- On Sun, 26 Feb 2023 19:54:45 -0800 Takashi Kajinami wrote --- > > > > > > On Mon, Feb 27, 2023 at 11:38?AM Yasufumi Ogawa yasufum.o at gmail.com> wrote: > > Hi, > > > > On 2023/02/27 10:51, Takashi Kajinami wrote: > > > On Thu, Feb 23, 2023 at 5:18?AM Ghanshyam Mann gmann at ghanshyammann.com> > > > wrote: > > > > > >>? ?---- On Sun, 19 Feb 2023 18:44:14 -0800? Takashi Kajinami? wrote --- > > >>? ?> Hello, > > >>? ?> > > >>? ?> Currently tosca-parser is part of heat's governance, but the core > > >> reviewers of this repositorydoes not contain any active heat cores while we > > >> see multiple Tacker cores in this group.Considering the fact the project is > > >> mainly maintained by Tacker cores, I'm wondering if we canmigrate this > > >> repository to Tacker's governance. Most of the current heat cores are not > > >> quitefamiliar with the codes in this repository, and if Tacker team is not > > >> interested in maintainingthis repository then I'd propose retiring this. > > As you mentioned, tacker still using tosca-parser and heat-translator. > > > > >> > > >> I think it makes sense and I remember its usage/maintenance by the Tacker > > >> team since starting. > > >> But let's wait for the Tacker team opinion and accordingly you can propose > > >> the governance patch. > > Although I've not joined to tacker team since starting, it might not be > > true because there was no cores of tosca-parser and heat-translator in > > tacker team. We've started to help maintenance the projects because no > > other active contributer. > > > > >> > > >>? ?> > > >>? ?> Similarly, we have heat-translator project which has both heat cores > > >> and tacker cores as itscore reviewers. IIUC this is tightly related to the > > >> work in tosca-parser, I'm wondering it makesmore sense to move this project > > >> to Tacker, because the requirement is mostly made fromTacker side rather > > >> than Heat side. > > >> > > >> I am not sure about this and from the name, it seems like more of a heat > > >> thing but it is not got beyond the Tosca template > > >> conversion. Are there no users of it outside of the Tacker service? or any > > >> request to support more template conversions than > > >> Tosca? > > >> > > > > > > Current hea-translator supports only the TOSCA template[1]. > > > The heat-translator project can be a generic template converter by its > > > nature but we haven't seen any interest > > > in implementing support for different template formats. 
> > > > > > [1] > > > https://github.com/openstack/heat-translator/blob/master/translator/osc/v1/translate.py#L49 > > > > > > > > > > > >> If no other user or use case then I think one option can be to merge it > > >> into Tosca-parser itself and retire heat-translator. > > >> > > >> Opinion? > > Hmm, as a core of tosca-parser, I'm not sure it's a good idea because it > > is just a parser TOSCA and independent from heat-translator. In > > addition, there is no experts of Heat or HOT in current tacker team > > actually, so it might be difficult to maintain heat-translator without > > any help from heat team. > > > > The hea-translator project was initially created to implement a translator from TOSCA parser to HOT[1].Later tosca-parser was split out[2] but we have never increased scope of tosca-parser. So it has beenno more than the TOSCA template translator. > > > > [1] https://blueprints.launchpad.net/heat/+spec/heat-translator-tosca[2] https://review.opendev.org/c/openstack/project-config/+/211204 > > We (Heat team) can provide help with any problems with heat, but we own no actual use case of template translation.Maintaining the heat-translator repository with tacker, which currently provides actual use cases would make more sense.This also gives the benefit that Tacker team can decide when stable branches of heat-translator should be retiredalong with the other Tacker repos. > > > > By the way, may I ask what will be happened if the governance is move on > > to tacker? Is there any extra tasks for maintenance? > > > > TC would have better (and more precise) explanation but my understanding is that?- creating a release > > ?- maintaining stable branches > > ?- maintaining gate healthwould be the required tasks along with moderating dev discussion in mailing list/PTG/etc. > > I think you covered all and the Core team (Tacker members) might be already doing a few of the tasks. From the > governance perspective, tacker PTL will be the point of contact for this repo in the case repo becomes inactive or so > but it will be the project team's decision to merge/split things whatever way makes maintenance easy. I understand. I've shared the proposal again in the previous meeting and no objection raised. So, we'd agree to move the governance as Tacker team. Thanks, Yasufumi > > -gmann > > > > ?Thanks, > > Yasufumi > > > > >> > > > > > > That also sounds good to me. > > > > > > > > >> Also, correcting the email subject tag as [tc]. > > >> > > >> -gmann > > >> > > >>? ?> > > >>? ?> [1] > > >> https://review.opendev.org/admin/groups/1f7855baf3cf14fedf72e443eef18d844bcd43fa,members[2] > > >> https://review.opendev.org/admin/groups/66028971dcbb58add6f0e7c17ac72643c4826956,members > > >>? ?> Thank you,Takashi > > >>? ?> > > >> > > >> > > > > > > > From swogatpradhan22 at gmail.com Wed Mar 1 09:54:16 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 1 Mar 2023 15:24:16 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: Message-ID: Hi Eugen, Request you to please add my email either on 'to' or 'cc' as i am not getting email's from you. Coming to the issue: [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p / Listing policies for vhost "/" ... 
vhost name pattern apply-to definition priority / ha-all ^(?!amq\.).* queues {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 I have the edge site compute nodes up, it only goes down when i am trying to launch an instance and the instance comes to a spawning state and then gets stuck. I have a tunnel setup between the central and the edge sites. With regards, Swogat Pradhan On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan wrote: > Hi Eugen, > For some reason i am not getting your email to me directly, i am checking > the email digest and there i am able to find your reply. > Here is the log for download: https://we.tl/t-L8FEkGZFSq > Yes, these logs are from the time when the issue occurred. > > *Note: i am able to create vm's and perform other activities in the > central site, only facing this issue in the edge site.* > > With regards, > Swogat Pradhan > > On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan > wrote: > >> Hi Eugen, >> Thanks for your response. >> I have actually a 4 controller setup so here are the details: >> >> *PCS Status:* >> * Container bundle set: rabbitmq-bundle [ >> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-no-ceph-3 >> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-2 >> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-1 >> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-0 >> >> I have tried restarting the bundle multiple times but the issue is still >> present. >> >> *Cluster status:* >> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> Cluster status of node >> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>> Basics >> >> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >> Disk Nodes >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >> Running Nodes >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >> Versions >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 >> on Erlang 22.3.4.1 >> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 >> on Erlang 22.3.4.1 >> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 >> on Erlang 22.3.4.1 >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 on Erlang 22.3.4.1 >> >> Alarms >> >> (none) >> >> Network Partitions >> >> (none) >> >> Listeners >> >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI tool communication >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> and AMQP 1.0 >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >> Feature flags >> >> Flag: drop_unroutable_metric, state: enabled >> Flag: empty_basic_get_metric, state: enabled >> Flag: implicit_default_bindings, state: enabled >> Flag: quorum_queue, state: enabled >> Flag: virtual_host_metadata, state: enabled >> >> *Logs:* >> *(Attached)* >> >> With regards, >> Swogat Pradhan >> >> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan >> wrote: >> >>> Hi, >>> Please find the nova conductor as well as nova api log. 
>>> >>> nova-conuctor: >>> >>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 16152921c1eb45c2b1f562087140168b >>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> 83dbe5f567a940b698acfe986f6194fa >>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> f3bfd7f65bd542b18d84cea3033abb43: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a >>> missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> d4b9180f91a94f9a82c3c9c4b7595566: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 897911a234a445d8a0d8af02ece40f6f: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>> backend dogpile.cache.null. >>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 8f723ceb10c3472db9a9f324861df2bb: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Hi, >>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> launch vm's. 
>>>> When the VM is in spawning state the node goes down (openstack compute >>>> service list), the node comes backup when i restart the nova compute >>>> service but then the launch of the vm fails. >>>> >>>> nova-compute.log >>>> >>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running instance usage >>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to >>>> 2023-02-26 08:00:00. 0 instances. >>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> dcn01-hci-0.bdxworld.com >>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: >>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>>> backend dogpile.cache.null. >>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running privsep helper: >>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', >>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep >>>> daemon via rootwrap >>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>> daemon starting >>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> process running with uid/gid: 0/0 >>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> process running with capabilities (eff/prm/inh): >>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> daemon running as pid 2647 >>>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>> in _get_host_uuid: Unexpected error while running command. >>>> Command: blkid overlay -s UUID -o value >>>> Exit code: 2 >>>> Stdout: '' >>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> Unexpected error while running command. 
>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>> Is there a way to solve this issue? >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Wed Mar 1 09:59:53 2023 From: eblock at nde.ag (Eugen Block) Date: Wed, 01 Mar 2023 09:59:53 +0000 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: Message-ID: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> One more thing coming to mind is MTU size. Are they identical between central and edge site? Do you see packet loss through the tunnel? Zitat von Swogat Pradhan : > Hi Eugen, > Request you to please add my email either on 'to' or 'cc' as i am not > getting email's from you. > Coming to the issue: > > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p / > Listing policies for vhost "/" ... > vhost name pattern apply-to definition priority > / ha-all ^(?!amq\.).* queues > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > > I have the edge site compute nodes up, it only goes down when i am trying > to launch an instance and the instance comes to a spawning state and then > gets stuck. > > I have a tunnel setup between the central and the edge sites. > > With regards, > Swogat Pradhan > > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan > wrote: > >> Hi Eugen, >> For some reason i am not getting your email to me directly, i am checking >> the email digest and there i am able to find your reply. >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> Yes, these logs are from the time when the issue occurred. >> >> *Note: i am able to create vm's and perform other activities in the >> central site, only facing this issue in the edge site.* >> >> With regards, >> Swogat Pradhan >> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan >> wrote: >> >>> Hi Eugen, >>> Thanks for your response. >>> I have actually a 4 controller setup so here are the details: >>> >>> *PCS Status:* >>> * Container bundle set: rabbitmq-bundle [ >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started >>> overcloud-controller-no-ceph-3 >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started >>> overcloud-controller-2 >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started >>> overcloud-controller-1 >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Started >>> overcloud-controller-0 >>> >>> I have tried restarting the bundle multiple times but the issue is still >>> present. >>> >>> *Cluster status:* >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>> Cluster status of node >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>> Basics >>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>> Disk Nodes >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> Running Nodes >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> Versions >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 >>> on Erlang 22.3.4.1 >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 >>> on Erlang 22.3.4.1 >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 >>> on Erlang 22.3.4.1 >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 on Erlang 22.3.4.1 >>> >>> Alarms >>> >>> (none) >>> >>> Network Partitions >>> >>> (none) >>> >>> Listeners >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >>> communication >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> and AMQP 1.0 >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >>> communication >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> and AMQP 1.0 >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >>> communication >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> and AMQP 1.0 >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >>> interface: [::], port: 25672, protocol: clustering, purpose: inter-node and >>> CLI tool communication >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> and AMQP 1.0 >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Feature flags >>> >>> Flag: drop_unroutable_metric, state: enabled >>> Flag: empty_basic_get_metric, state: enabled >>> Flag: implicit_default_bindings, state: enabled >>> Flag: quorum_queue, state: enabled >>> Flag: virtual_host_metadata, state: enabled >>> >>> *Logs:* >>> *(Attached)* >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan >>> wrote: >>> >>>> Hi, >>>> Please find the nova 
conductor as well as nova api log. >>>> >>>> nova-conuctor: >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> 16152921c1eb45c2b1f562087140168b >>>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> 83dbe5f567a940b698acfe986f6194fa >>>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>>> backend dogpile.cache.null. >>>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
Abandoning...: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Hi, >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>>> launch vm's. >>>>> When the VM is in spawning state the node goes down (openstack compute >>>>> service list), the node comes backup when i restart the nova compute >>>>> service but then the launch of the vm fails. >>>>> >>>>> nova-compute.log >>>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>>> instance usage >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to >>>>> 2023-02-26 08:00:00. 0 instances. >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>>> dcn01-hci-0.bdxworld.com >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>>>> backend dogpile.cache.null. 
>>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>>> privsep helper: >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep >>>>> daemon via rootwrap >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>>> daemon starting >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>>> process running with uid/gid: 0/0 >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> process running with capabilities (eff/prm/inh): >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> daemon running as pid 2647 >>>>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>>> execution error >>>>> in _get_host_uuid: Unexpected error while running command. >>>>> Command: blkid overlay -s UUID -o value >>>>> Exit code: 2 >>>>> Stdout: '' >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>>> Unexpected error while running command. >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>> >>>>> Is there a way to solve this issue? >>>>> >>>>> >>>>> With regards, >>>>> >>>>> Swogat Pradhan >>>>> >>>> From swogatpradhan22 at gmail.com Wed Mar 1 10:04:50 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 1 Mar 2023 15:34:50 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> Message-ID: Hi, Yes the MTU is the same as the default '1500'. Generally I haven't seen any packet loss, but never checked when launching the instance. I will check that and come back. But everytime i launch an instance the instance gets stuck at spawning state and there the hypervisor becomes down, so not sure if packet loss causes this. With regards, Swogat pradhan On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > One more thing coming to mind is MTU size. Are they identical between > central and edge site? Do you see packet loss through the tunnel? > > Zitat von Swogat Pradhan : > > > Hi Eugen, > > Request you to please add my email either on 'to' or 'cc' as i am not > > getting email's from you. > > Coming to the issue: > > > > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p / > > Listing policies for vhost "/" ... 
> > vhost name pattern apply-to definition priority > > / ha-all ^(?!amq\.).* queues > > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} > 0 > > > > I have the edge site compute nodes up, it only goes down when i am trying > > to launch an instance and the instance comes to a spawning state and then > > gets stuck. > > > > I have a tunnel setup between the central and the edge sites. > > > > With regards, > > Swogat Pradhan > > > > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > > wrote: > > > >> Hi Eugen, > >> For some reason i am not getting your email to me directly, i am > checking > >> the email digest and there i am able to find your reply. > >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >> Yes, these logs are from the time when the issue occurred. > >> > >> *Note: i am able to create vm's and perform other activities in the > >> central site, only facing this issue in the edge site.* > >> > >> With regards, > >> Swogat Pradhan > >> > >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >> wrote: > >> > >>> Hi Eugen, > >>> Thanks for your response. > >>> I have actually a 4 controller setup so here are the details: > >>> > >>> *PCS Status:* > >>> * Container bundle set: rabbitmq-bundle [ > >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-no-ceph-3 > >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-2 > >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-1 > >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-0 > >>> > >>> I have tried restarting the bundle multiple times but the issue is > still > >>> present. > >>> > >>> *Cluster status:* > >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>> Cluster status of node > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
> >>> Basics > >>> > >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>> > >>> Disk Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Running Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Versions > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > RabbitMQ > >>> 3.8.3 on Erlang 22.3.4.1 > >>> > >>> Alarms > >>> > >>> (none) > >>> > >>> Network Partitions > >>> > >>> (none) > >>> > >>> Listeners > >>> > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, > >>> interface: [::], port: 25672, protocol: clustering, purpose: > inter-node and > >>> CLI tool communication > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, > >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, > >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API > >>> > >>> Feature flags > >>> > >>> Flag: drop_unroutable_metric, state: enabled > >>> Flag: empty_basic_get_metric, state: enabled > >>> Flag: implicit_default_bindings, state: enabled > >>> Flag: quorum_queue, state: enabled > >>> Flag: virtual_host_metadata, state: 
enabled > >>> > >>> *Logs:* > >>> *(Attached)* > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>> wrote: > >>> > >>>> Hi, > >>>> Please find the nova conductor as well as nova api log. > >>>> > >>>> nova-conuctor: > >>>> > >>>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 16152921c1eb45c2b1f562087140168b > >>>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply > >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due > to a > >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due > to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due > to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > b240e3e89d99489284cd731e75f2a5db > >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with > >>>> backend dogpile.cache.null. 
> >>>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due > to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> wrote: > >>>> > >>>>> Hi, > >>>>> I currently have 3 compute nodes on edge site1 where i am trying to > >>>>> launch vm's. > >>>>> When the VM is in spawning state the node goes down (openstack > compute > >>>>> service list), the node comes backup when i restart the nova compute > >>>>> service but then the launch of the vm fails. > >>>>> > >>>>> nova-compute.log > >>>>> > >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running > >>>>> instance usage > >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to > >>>>> 2023-02-26 08:00:00. 0 instances. > >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node > >>>>> dcn01-hci-0.bdxworld.com > >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: > >>>>> /dev/vda. Libvirt can't honour user-supplied dev names > >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>>> backend dogpile.cache.null. 
> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running > >>>>> privsep helper: > >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > 'privsep-helper', > >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new > privsep > >>>>> daemon via rootwrap > >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon starting > >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with uid/gid: 0/0 > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with capabilities (eff/prm/inh): > >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon running as pid 2647 > >>>>> 2023-02-26 08:49:55.956 7 WARNING > os_brick.initiator.connectors.nvmeof > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process > >>>>> execution error > >>>>> in _get_host_uuid: Unexpected error while running command. > >>>>> Command: blkid overlay -s UUID -o value > >>>>> Exit code: 2 > >>>>> Stdout: '' > >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > >>>>> Unexpected error while running command. > >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>>> > >>>>> Is there a way to solve this issue? > >>>>> > >>>>> > >>>>> With regards, > >>>>> > >>>>> Swogat Pradhan > >>>>> > >>>> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Mar 1 10:34:55 2023 From: smooney at redhat.com (Sean Mooney) Date: Wed, 01 Mar 2023 10:34:55 +0000 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: > Thanks a lot !!! > > As you say, I follow > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > And I want to use DPU mode. Not "disable DPU mode". > So I think I should follow the link above exactlly, so I use > vnic-type=remote_anaged. > In my opnion, after I run first three command (which is "openstack network > create ...", "openstack subnet create", "openstack port create ..."), the > VF rep port and OVN and OVS rules are all ready. not at that point nothign will have been done on ovn/ovs that will only happen after the port is bound to a vm and host. > What I should do in "openstack server create ..." is to JUST add PCI device > into VM, do NOT call neutron-server in nova-compute of compute node ( like > call port_binding or something). this is incorrect. 
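For illustration, one way to see the point about binding is to watch the port's binding fields: they stay empty until the port is attached to a server, and only once nova binds the port to the scheduled host does OVN get the chassis/DPU information it needs to program the flows. The sketch below is illustrative only and reuses the names (pf0vf0, the flavor, the image, the network) from Simon's commands elsewhere in this thread; only standard openstackclient calls are used.

```
# Before the port is attached to a server it has no binding:
openstack port show pf0vf0 -c status -c binding_host_id
# -> status DOWN, binding_host_id empty: nothing is programmed on the DPU yet.

# Attach the pre-created remote-managed port at boot; the binding (and the
# OVN/OVS programming on the DPU side) happens as part of this step, after
# the scheduler has picked a host.
PORT_ID=$(openstack port show pf0vf0 -f value -c id)
openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
    --nic port-id="$PORT_ID" --security-group default provider-instance

# After the instance is scheduled and the port is bound:
openstack port show pf0vf0 -c status -c binding_host_id -c binding_profile
# -> binding_host_id is now set, and the port should go ACTIVE once OVN has
#    plugged the VF representor on the DPU.
```

The same sequence applies whether the port is attached at boot (as above) or later with `openstack server add port`.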
> > But as the log and steps said in the emails above, nova-compute call > port_binding to neutron-server while running the command "openstack server > create ...". > > So I still have questions is: > 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT call > neutron-server in nova-compute of compute node ( like call port_binding or > something)" . no this is not how its designed. until you attach the logical port to a vm (either at runtime or as part of vm create) the logical port is not assocated with any host or phsical dpu/vf. so its not possibel to instanciate the openflow rules in ovs form the logical switch model in the ovn north db as no chassie info has been populated and we do not have the dpu serial info in the port binding details. > 2) If it's right, how to deal with this? Which is how to JUST add PCI > device into VM, do NOT call neutron-server? By command or by configure? Is > there come document ? no this happens automaticaly when nova does the port binding which cannot happen until after teh vm is schduled to a host. > > ---- > Simon Jones > > > Sean Mooney ?2023?3?1??? 16:15??? > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > > BTW, this link ( > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > > said > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? > > > > no its not wrong but for dpu smart nics you have to make a choice when you > > deploy > > either they can be used in dpu mode in which case remote_managed shoudl be > > set to true > > and you can only use them via neutron ports with vnic-type=remote_managed > > as descried in that doc > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > > > > or if you disable dpu mode in the nic frimware then you shoudl remvoe > > remote_managed form the pci device list and > > then it can be used liek a normal vf either for neutron sriov ports > > vnic-type=direct or via flavor based pci passthough. > > > > the issue you were havign is you configured the pci device list to contain > > "remote_managed: ture" which means > > the vf can only be consumed by a neutron port with > > vnic-type=remote_managed, when you have "remote_managed: false" or unset > > you can use it via vnic-type=direct i forgot that slight detail that > > vnic-type=remote_managed is required for "remote_managed: ture". > > > > > > in either case you foudn the correct doc > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > neutorn sriov port configuration is documented here > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > > and nova flavor based pci passthough is documeted here > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > all three server slightly differnt uses. both neutron proceedures are > > exclusivly fo network interfaces. > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > requires the use of ovn deployed on the dpu > > to configure the VF contolplane. > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses > > the sriov nic agent > > to manage the VF with ip tools. > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is > > intended for pci passthough > > of stateless acclerorators like qat devices. 
while the nova flavor approch > > cna be used with nics it not how its generally > > ment to be used and when used to passthough a nic expectation is that its > > not related to a neuton network. > > > > From senrique at redhat.com Wed Mar 1 11:00:49 2023 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 1 Mar 2023 11:00:49 +0000 Subject: [cinder] Bug Report | 01-03-2023 Message-ID: Hello Argonauts, Medium - Volume multiattach exposed to non-admin users via API . - *Status*: Fix proposed to master . - [yadro] tatlin_client is_port_assigned method broken . - *Status*: Unassigned. Low - image_utils: code hardening around decompression . - *Status*: Unassigned and tagged as low-hanging-fruit . Cheers, -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From adivya1.singh at gmail.com Wed Mar 1 11:02:26 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Wed, 1 Mar 2023 16:32:26 +0530 Subject: (OpenStack-Upgrade) In-Reply-To: References: Message-ID: hi Alvaro, i have installed using Openstack-ansible, The upgrade procedure is consistent but what is the roll back procedure , i m looking for Regards Adivya Singh On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > That will depend on how did you installed your environment: OSA, TripleO, > etc. > > Can you provide more information? > > --- > Alvaro Soto. > > Note: My work hours may not be your work hours. Please do not feel the > need to respond during a time that is not convenient for you. > ---------------------------------------------------------- > Great people talk about ideas, > ordinary people talk about things, > small people talk... about other people. > > On Tue, Feb 28, 2023, 11:46 PM Adivya Singh > wrote: > >> Hi Team, >> >> I am planning to upgrade my Current Environment, The Upgrade procedure is >> available in OpenStack Site and Forums. >> >> But i am looking fwd to roll back Plan , Other then have a Local backup >> copy of galera Database >> >> Regards >> Adivya Singh >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lokendrarathour at gmail.com Wed Mar 1 12:00:57 2023 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Wed, 1 Mar 2023 17:30:57 +0530 Subject: Openstack Baremetal Instance creation using Alma Message-ID: Hi Team, Was trying to check whether we can launch a bare-metal Instance using Alma Image. Problem statement: I have some applications that would need Alma 8 OS as the base, we are planning to launch that application on OpenStack baremetal Instance, but we are not able to create OpenStack Baremetal Instance using Alma Image. OpenStack Version: TripleO Wallaby Alam image Qcow2: https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-9.1-20221118.x86_64.qcow2 This image works fine when launched as a VM instance, but not able to launch the same as Baremetal Instance. Best Regards, Lokendra -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Arne.Wiebalck at cern.ch Wed Mar 1 12:25:57 2023 From: Arne.Wiebalck at cern.ch (Arne Wiebalck) Date: Wed, 1 Mar 2023 12:25:57 +0000 Subject: Openstack Baremetal Instance creation using Alma In-Reply-To: References: Message-ID: Lokendra, We ran into issues recently where a missing module in the initrd prevented nodes with specific h/w configurations from booting into 8 and 9 (after successful instantiation). Can you share some more details on what exactly fails, i.e. does the deployment itself fail, or does the instance not boot after a successful deployment, or ... Providing corresponding log snippets (paste.openstack.org) may also help to pinpoint issue. Cheers, Arne ________________________________________ From: Lokendra Rathour Sent: Wednesday, 1 March 2023 13:00 To: openstack-discuss Subject: Openstack Baremetal Instance creation using Alma Hi Team, Was trying to check whether we can launch a bare-metal Instance using Alma Image. Problem statement: I have some applications that would need Alma 8 OS as the base, we are planning to launch that application on OpenStack baremetal Instance, but we are not able to create OpenStack Baremetal Instance using Alma Image. OpenStack Version: TripleO Wallaby Alam image Qcow2: https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-9.1-20221118.x86_64.qcow2 This image works fine when launched as a VM instance, but not able to launch the same as Baremetal Instance. Best Regards, Lokendra -- ~ Lokendra skype: lokendrarathour [https://ci3.googleusercontent.com/mail-sig/AIorK4zyd6LpJOGqagxmzUlY59eMQx0-FN0t8HtjdtGE7VLZSKIxBUz3bI7z-MBqbgDVg1-XbtvHgN_ATJ10N6bonyO-JSGTtl5s_mNSbDoXBg] From noonedeadpunk at gmail.com Wed Mar 1 12:47:01 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Wed, 1 Mar 2023 13:47:01 +0100 Subject: (OpenStack-Upgrade) In-Reply-To: References: Message-ID: Hey, Regarding rollaback of upgrade in OSA we indeed don't have any good established/documented process for that. At the same time it should be completely possible with some "BUT". It also depends on what exactly you want to rollback - roles, openstack services or both. As OSA roles can actually install any openstack service version. We keep all virtualenvs from the previous version, so during upgrade we build just new virtualenvs and reconfigure systemd units to point there. So fastest way likely would be to just edit systemd unit files and point them to old venv version and reload systemd daemon and service and restore DB from backup of course. You can also define _venv_tag (ie `glance_venv_tag`) to the old OSA version you was running and execute openstack-ansible os--install.yml --tags systemd-service,uwsgi - that in most cases will be enough to just edit systemd units for the service and start old version of it. BUT running without tags will result in having new packages in old venv which is smth you totally want to avoid. To prevent that you can also define _git_install_branch and requirements_git_install_branch in /etc/openstack_deploy/group_vars (it's important to use group vars if you want to rollback only one service) and take value from https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml (ofc pick your old version!) 
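For illustration, a minimal sketch of the per-service pinning described above, taking glance as the example service and 26.0.1 as the previous OSA tag (substitute whatever you were actually running; the `os--install.yml` above is presumably the per-service playbook with the service name dropped by the archive, i.e. `os-glance-install.yml`):

```
# Sketch only -- variable names come from the OSA roles discussed above,
# the tag/branch values are placeholders to be taken from the old release.
mkdir -p /etc/openstack_deploy/group_vars
cat > /etc/openstack_deploy/group_vars/glance_all.yml <<'EOF'
# Re-use the venv that was built for the previous OSA release.
glance_venv_tag: "26.0.1"
# Pin the source checkouts to what that release shipped; the values come from
# playbooks/defaults/repo_packages/openstack_services.yml of the old OSA tag.
glance_git_install_branch: "<SHA or branch from the old openstack_services.yml>"
requirements_git_install_branch: "<SHA or branch from the old openstack_services.yml>"
EOF

cd /opt/openstack-ansible/playbooks
# Re-point the systemd units/uwsgi config at the old venv without rebuilding it.
openstack-ansible os-glance-install.yml --tags systemd-service,uwsgi
```

Running the playbook without the tag filter would pull new packages into the old venv, which is exactly the situation warned about above.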
For a full rollback and not in-place workarounds, I think it should be like that * checkout to previous osa version * re-execute scripts/bootstrap-ansible.sh * you should still take current versions of mariadb and rabbitmq and define them in user_variables (galera_major_version, galera_minor_version, rabbitmq_package_version, rabbitmq_erlang_version_spec) - it's close to never ends well downgrading these. * Restore DB backup * Re-run setup-openstack.yml It's quite a rough summary of how I do see this process, but to be frank I never had to execute full downgrade - I was limited mostly by downgrading 1 service tops after the upgrade. Hope that helps! ??, 1 ???. 2023??. ? 12:06, Adivya Singh : > > hi Alvaro, > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > but what is the roll back procedure , i m looking for > > Regards > Adivya Singh > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: >> >> That will depend on how did you installed your environment: OSA, TripleO, etc. >> >> Can you provide more information? >> >> --- >> Alvaro Soto. >> >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. >> ---------------------------------------------------------- >> Great people talk about ideas, >> ordinary people talk about things, >> small people talk... about other people. >> >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: >>> >>> Hi Team, >>> >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. >>> >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database >>> >>> Regards >>> Adivya Singh From jungleboyj at gmail.com Wed Mar 1 13:37:00 2023 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 1 Mar 2023 07:37:00 -0600 Subject: [tc][all] OpenStack Technical Committee new Chair In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> Message-ID: <4eae1d95-f0ff-1d6c-efc8-7a1b9156401c@gmail.com> Gmann, You did a great job as the TC Chair!? Thank you for all of the great leadership you provided! Congratualtions Kristi! Jay On 2/28/2023 3:58 PM, Ghanshyam Mann wrote: > Hello Everyone, > > I would like to inform the community and congratulate/welcome Kristi as the new > Chair of Technical Committee. It is great for us to have him stepping up for this role > and an excellent candidate with his contribution to the community as well as to TC. > > Thanks for having me as a Chair for the past 2 years. I will continue as TC and my > other activities/role in the community. Also thanks for reading my weekly updates > which were lengthy sometimes or maybe many times :) > > -gmann > > From knikolla at bu.edu Wed Mar 1 14:53:12 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Wed, 1 Mar 2023 14:53:12 +0000 Subject: [tc][all] OpenStack Technical Committee new Chair In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> Message-ID: Thank you, and thank you for your amazing work as chair for the past two years! From: Ghanshyam Mann Date: Tuesday, February 28, 2023 at 4:59 PM To: openstack-discuss Subject: [tc][all] OpenStack Technical Committee new Chair Hello Everyone, I would like to inform the community and congratulate/welcome Kristi as the new Chair of Technical Committee. 
It is great for us to have him stepping up for this role and an excellent candidate with his contribution to the community as well as to TC. Thanks for having me as a Chair for the past 2 years. I will continue as TC and my other activities/role in the community. Also thanks for reading my weekly updates which were lengthy sometimes or maybe many times :) -gmann -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Wed Mar 1 15:04:17 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Wed, 1 Mar 2023 09:04:17 -0600 Subject: June 2023 PTG Team Signup Kickoff Message-ID: Hello Everyone, As you may have seen, we are hosting an abbreviated PTG in conjunction with the Vancouver OpenInfra Summit[0]! To sign your team up, you must complete the survey[1] by April 2nd at 7:00 UTC. We NEED accurate contact information for the moderator of your team?s sessions. This is because the survey information will be used to organize the schedule signups which will be done via the PTGBot. If you are not on IRC, please get setup[2] on the OFTC network and join #openinfra-events. You are also encouraged to familiarize yourself with the PTGBot documentation[3] as well. If you have any questions, please reach out! Information about signing up for timeslots will be sent to moderators shortly after the team signup deadline. Registration is open[4] and prices will increase May 5th! Continue to visit openinfra.dev/ptg for updates. -Kendall (diablo_rojo) [0] OpenInfra Summit Site: [1] Team Survey: https://openinfra.dev/summit/vancouver-2023 https://openinfrafoundation.formstack.com/forms/june2023_ptg_survey [2] Setup IRC: https://docs.openstack.org/contributors/common/irc.html [3] PTGBot README: https://opendev.org/openstack/ptgbot/src/branch/master/README.rst [4] OpenInfra Summit Registration: https://vancouver2023.openinfra.dev/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 01:34:55 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 09:34:55 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: You got the point what I want to say ! Let me explain more: 1. The hole story is I want to deploy openstack Yoga, and the compute node use DPU (BF2, BlueFiled2). So I follow this link: https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html 2. After deploy exactly as the link said, I use these command to create a VM (which also called instance). ``` openstack network create selfservice openstack subnet create --network selfservice --subnet-range 192.168.1.0/24 selfservice-v4 openstack port create --network selfservice --vnic-type remote-managed \ --binding-profile '{"pci_vendor_info":"", "pci_slot":"", "physical_network":"", "card_serial_number": "AB0123XX0042", "pf_mac_address": "08:c0:eb:8e:bd:f4", "vf_num":1, "vnic_type": "remote-managed"}' \ pf0vf0 openstack flavor create --id 0 --vcpus 1 --ram 64 --disk 1 cirros-os-dpu-test-1 --property "pci_passthrough:alias"="a1:2" All command above pass. openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ --security-group default provider-instance This command got ERROR. The ERROR is shown in the first email. ``` 3. So I have few questions: ``` question 1: Why got ERROR? Why "No valid host was found"? 
question 2: When I run command "openstack port create ...", I could specify which VF-rep port (virtual function's representor port) plug into br-int in DPU's OVS. As normal operate, I should start VM and plug THE VF into VM. But in "openstack port create ...", how to specify THE VF ? ``` 4. For question 1, I debug it as the first mail said. And I will check the second email to solve it. 5. For question 2, I have no idea, as these is no document to refer this question. What should I do ? ---- Simon Jones Sean Mooney ?2023?3?1??? 01:18??? > On Tue, 2023-02-28 at 19:43 +0800, Simon Jones wrote: > > Hi all, > > > > I'm working on openstack Yoga's PCI passthrough feature, follow this > link: > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > I configure exactly as the link said, but when I create server use this > > command, I found ERROR: > > ``` > > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ > > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ > > --security-group default --key-name mykey provider-instance > > > > > > > fault | {'code': 500, 'created': > > '2023-02-23T06:13:43Z', 'message': 'No valid host was found. There are > not > > enough hosts available.', 'details': 'Traceback (most recent call > last):\n > > File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line > > 1548, in schedule_and_build_instances\n host_lists = > > self._schedule_instances(context, request_specs[0],\n File > > "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 908, in > > _schedule_instances\n host_lists = > > self.query_client.select_destinations(\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/client/query.py", line 41, > > in select_destinations\n return > > self.scheduler_rpcapi.select_destinations(context, spec_obj,\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/rpcapi.py", line 160, in > > select_destinations\n return cctxt.call(ctxt, \'select_destinations\', > > **msg_args)\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 189, > in > > call\n result = self.transport._send(\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, > in > > _send\n return self._driver.send(target, ctxt, message,\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 689, in send\n return self._send(target, ctxt, message, > > wait_for_reply, timeout,\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 681, in _send\n raise > > result\nnova.exception_Remote.NoValidHost_Remote: No valid host was > found. > > There are not enough hosts available.\nTraceback (most recent call > > last):\n\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 241, > in > > inner\n return func(*args, **kwargs)\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 223, in > > select_destinations\n selections = self._select_destinations(\n\n > File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 250, in > > _select_destinations\n selections = self._schedule(\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 416, in > > _schedule\n self._ensure_sufficient_hosts(\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 455, in > > _ensure_sufficient_hosts\n raise > > exception.NoValidHost(reason=reason)\n\nnova.exception.NoValidHost: No > > valid host was found. 
There are not enough hosts available.\n\n'} | > > > > // this is what I configured:NovaInstance > > > > gyw at c1:~$ openstack flavor show cirros-os-dpu-test-1 > > +----------------------------+------------------------------+ > > > Field | Value | > > +----------------------------+------------------------------+ > > > OS-FLV-DISABLED:disabled | False | > > > OS-FLV-EXT-DATA:ephemeral | 0 | > > > access_project_ids | None | > > > description | None | > > > disk | 1 | > > > id | 0 | > > > name | cirros-os-dpu-test-1 | > > > os-flavor-access:is_public | True | > > > properties | pci_passthrough:alias='a1:1' | > > > ram | 64 | > > > rxtx_factor | 1.0 | > > > swap | | > > > vcpus | 1 | > > +----------------------------+------------------------------+ > > > > // in controller node /etc/nova/nova.conf > > > > [filter_scheduler] > > enabled_filters = PciPassthroughFilter > > available_filters = nova.scheduler.filters.all_filters > > > > [pci] > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > > "physical_network": null, "remote_managed": "true"} > > alias = { "vendor_id":"15b3", "product_id":"101e", > "device_type":"type-VF", > > "name":"a1" } > > > > // in compute node /etc/nova/nova.conf > > > > [pci] > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > > "physical_network": null, "remote_managed": "true"} > > alias = { "vendor_id":"15b3", "product_id":"101e", > "device_type":"type-VF", > > "name":"a1" } > > "remote_managed": "true" is only valid for neutron sriov port > not flavor based pci passhtough. > > so you need to use vnci_type=driect asusming you are trying to use > > https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html > > which is not the same as generic pci passthough. > > if you just want to use geneic pci passthive via a flavor remove > "remote_managed": "true" > > > > > ``` > > > > The detail ERROR I found is: > > - The reason why "There are not enough hosts available" is, > > nova-scheduler's log shows "There are 0 hosts available but 1 instances > > requested to build", which means no hosts support PCI passthough feature. 
> > > > This is nova-schduler's log > > ``` > > 2023-02-28 06:11:58.329 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting to schedule > > for instances: ['8ddfbe2c-f929-4b62-8b73-67902df8fb60'] > select_destinations > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:141 > > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] compute_status_filter > > request filter added forbidden trait COMPUTE_STATUS_DISABLED > > compute_status_filter > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:254 > > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'compute_status_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.331 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'accelerators_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.332 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'remote_managed_ports_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.485 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock > > "567eb2f1-7173-4eee-b9e7-66932ed70fea" acquired by > > > "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" > > :: waited 0.000s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > > 2023-02-28 06:11:58.488 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock > > "567eb2f1-7173-4eee-b9e7-66932ed70fea" "released" by > > > "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" > > :: held 0.003s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > > 2023-02-28 06:11:58.494 1942637 DEBUG oslo_db.sqlalchemy.engines > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] MySQL server mode set > > to > > > STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION > > _check_effective_sql_mode > > /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/engines.py:314 > > 2023-02-28 06:11:58.520 1942637 INFO nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Host mapping not > found > > for host c1c2. Not tracking instance info for this host. 
> > 2023-02-28 06:11:58.520 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', > 'c1c2')" > > acquired by > > "nova.scheduler.host_manager.HostState.update.._locked_update" :: > > waited 0.000s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > > 2023-02-28 06:11:58.521 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > from > > compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"arch": > > "x86_64", "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": > > {"cells": 1, "sockets": 1, "cores": 6, "threads": 2}, "features": > > ["sse4.2", "mds-no", "stibp", "pdpe1gb", "xsaveopt", "ht", "intel-pt", > > "mtrr", "abm", "tm", "lm", "umip", "mca", "pku", "ds_cpl", "rdrand", > "adx", > > "rdseed", "lahf_lm", "xgetbv1", "nx", "invpcid", "rdtscp", "tsc", > "xsavec", > > "pcid", "arch-capabilities", "pclmuldq", "spec-ctrl", "fsgsbase", "avx2", > > "md-clear", "vmx", "syscall", "mmx", "ds", "ssse3", "avx", "dtes64", > > "fxsr", "msr", "acpi", "vpclmulqdq", "smap", "erms", "pge", "cmov", > > "sha-ni", "fsrm", "x2apic", "xsaves", "cx8", "pse", "pse36", > "clflushopt", > > "vaes", "pni", "ssbd", "movdiri", "movbe", "clwb", "xtpr", "de", > "invtsc", > > "fpu", "tsc-deadline", "pae", "clflush", "ibrs-all", "waitpkg", "sse", > > "sse2", "bmi1", "3dnowprefetch", "cx16", "popcnt", "rdctl-no", "fma", > > "tsc_adjust", "xsave", "ss", "skip-l1dfl-vmentry", "sse4.1", "rdpid", > > "monitor", "vme", "tm2", "pat", "pschange-mc-no", "movdir64b", "gfni", > > "mce", "smep", "sep", "apic", "arat", "f16c", "bmi2", "aes", "pbe", > "est", > > > "pdcm"]}',created_at=2023-02-14T03:19:40Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=415,free_disk_gb=456,free_ram_mb=31378,host='c1c2',host_ip=192.168.28.21,hypervisor_hostname='c1c2',hypervisor_type='QEMU',hypervisor_version=4002001,id=8,local_gb=456,local_gb_used=0,mapped=0,memory_mb=31890,memory_mb_used=512,metrics='[]',numa_topology='{" > > nova_object.name": "NUMATopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.2", "nova_object.data": {"cells": [{" > > nova_object.name": "NUMACell", "nova_object.namespace": "nova", > > "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": [0, > > 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, > 8, > > 9, 10, 11], "memory": 31890, "cpu_usage": 0, "memory_usage": 0, > > "pinned_cpus": [], "siblings": [[0, 1], [10, 11], [2, 3], [6, 7], [4, 5], > > [8, 9]], "mempages": [{"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 4, "total": 8163962, "used": 0, > "reserved": > > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "reserved", "total"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, > "reserved": > > 0}, 
"nova_object.changes": ["size_kb", "used", "reserved", "total"]}], > > "network_metadata": {"nova_object.name": "NetworkMetadata", > > "nova_object.namespace": "nova", "nova_object.version": "1.0", > > "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["cpuset", "memory_usage", "cpu_usage", "id", > > "pinned_cpus", "pcpuset", "socket", "network_metadata", "siblings", > > "mempages", "memory"]}]}, "nova_object.changes": > > > ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.5,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-02-28T06:01:33Z,uuid=c360cc82-f0fd-4662-bccd-e1f02b27af51,vcpus=12,vcpus_used=0) > > _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:167 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > aggregates: [] _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:170 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > service dict: {'id': 17, 'uuid': '6d0921a6-427d-4a82-a7d2-41dfa003125a', > > 'host': 'c1c2', 'binary': 'nova-compute', 'topic': 'compute', > > 'report_count': 121959, 'disabled': False, 'disabled_reason': None, > > 'last_seen_up': datetime.datetime(2023, 2, 28, 6, 11, 49, > > tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 61, > > 'created_at': datetime.datetime(2023, 2, 14, 3, 19, 40, > > tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 2, > 28, > > 6, 11, 49, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': > > False} _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:173 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > instances: [] _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:176 > > 2023-02-28 06:11:58.525 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', > 'c1c2')" > > "released" by > > "nova.scheduler.host_manager.HostState.update.._locked_update" :: > > held 0.004s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > > 2023-02-28 06:11:58.525 1942637 DEBUG nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting with 1 > host(s) > > get_filtered_objects /usr/lib/python3/dist-packages/nova/filters.py:70 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- before ---- > > _filter_pools 
/usr/lib/python3/dist-packages/nova/pci/stats.py:542 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:543 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- after ---- > > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:545 > > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:546 > > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Not enough PCI > devices > > left to satisfy request _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:556 > > 2023-02-28 06:11:58.527 1942637 DEBUG > > nova.scheduler.filters.pci_passthrough_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] (c1c2, c1c2) ram: > > 31378MB disk: 424960MB io_ops: 0 instances: 0 doesn't have the required > PCI > > devices > > (InstancePCIRequests(instance_uuid=,requests=[InstancePCIRequest])) > > host_passes > > > /usr/lib/python3/dist-packages/nova/scheduler/filters/pci_passthrough_filter.py:52 > > 2023-02-28 06:11:58.528 1942637 INFO nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filter > > PciPassthroughFilter returned 0 hosts > > 2023-02-28 06:11:58.528 1942637 DEBUG nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed all > > hosts for the request with instance ID > > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: > > [('PciPassthroughFilter', None)] get_filtered_objects > > /usr/lib/python3/dist-packages/nova/filters.py:114 > > 2023-02-28 06:11:58.528 1942637 INFO nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed all > > hosts for the request with instance ID > > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: > > ['PciPassthroughFilter: (start: 1, end: 0)'] > > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtered [] > > _get_sorted_hosts > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:610 > > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] There are 0 hosts > > available but 1 instances requested to build. 
_ensure_sufficient_hosts > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:450 > > ``` > > > > Then I searched the database and found that the PCI configuration of the compute node was not uploaded: > > ``` > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE > > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 > > (HTTP 404) > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE > > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 > > (HTTP 404) > > gyw at c1:~$ openstack resource class show PCI_DEVICE > > +-------+------------+ > > > Field | Value | > > +-------+------------+ > > > name | PCI_DEVICE | > > +-------+------------+ > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 MEMORY_MB > > +------------------+-------+ > > > Field | Value | > > +------------------+-------+ > > > allocation_ratio | 1.5 | > > > min_unit | 1 | > > > max_unit | 31890 | > > > reserved | 512 | > > > step_size | 1 | > > > total | 31890 | > > > used | 0 | > > +------------------+-------+ > > Is the 31890 here the value reported by the compute node resource tracker? > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU > > ^Cgyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU > > +------------------+-------+ > > > Field | Value | > > +------------------+-------+ > > > allocation_ratio | 16.0 | > > > min_unit | 1 | > > > max_unit | 12 | > > > reserved | 0 | > > > step_size | 1 | > > > total | 12 | > > > used | 0 | > > +------------------+-------+ > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 SRIOV_NET_VF > > No inventory of class SRIOV_NET_VF for > c360cc82-f0fd-4662-bccd-e1f02b27af51 > > (HTTP 404) > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 DISK_GB > > +------------------+-------+ > > > Field | Value | > > +------------------+-------+ > > > allocation_ratio | 1.0 | > > > min_unit | 1 | > > > max_unit | 456 | > > > reserved | 0 | > > > step_size | 1 | > > > total | 456 | > > > used | 0 | > > +------------------+-------+ > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 IPV4_ADDRESS > > No inventory of class IPV4_ADDRESS for > c360cc82-f0fd-4662-bccd-e1f02b27af51 > > (HTTP 404) > > > > MariaDB [nova]> select * from compute_nodes; > > >
+---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ > > > created_at | updated_at | deleted_at | id | > > service_id | vcpus | memory_mb | local_gb | vcpus_used | memory_mb_used | > > local_gb_used | hypervisor_type | hypervisor_version | cpu_info > > > > > > > > > > > > > > > > > > > > > > > > > > > > | disk_available_least | free_ram_mb | free_disk_gb | > > current_workload | running_vms | hypervisor_hostname | deleted | host_ip > > | supported_instances > > > > > > > > > > > > > > > > > > | pci_stats > > > > > > > metrics | extra_resources | stats | numa_topology > > > > > > > > > > > > > > > > > > > > > > > > > > > > | host | ram_allocation_ratio | > cpu_allocation_ratio > > > uuid | disk_allocation_ratio | mapped | > > > +---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ > > > 2023-01-04 01:55:44 | 2023-01-04 03:02:28 | 2023-02-13 08:34:08 | 1 | > > NULL | 4 | 3931 | 60 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pat", "cmov", > > "ibrs-all", "pge", "sse4.2", "sse", "mmx", "ibrs", "avx2", "syscall", > > "fpu", "mtrr", "xsaves", "mce", "invpcid", "tsc_adjust", "ssbd", "pku", > > "ibpb", "xsave", "xsaveopt", "pae", "lm", "pdcm", "bmi1", "avx512vnni", > > "stibp", "x2apic", "avx512dq", "pcid", "nx", "bmi2", "erms", > > "3dnowprefetch", "de", "avx512bw", "arch-capabilities", "pni", "fma", > > "rdctl-no", "sse4.1", "rdseed", "arat", "avx512vl", "avx512f", > "pclmuldq", > > "msr", "fxsr", "sse2", "amd-stibp", "hypervisor", "tsx-ctrl", > "clflushopt", > > "cx16", "clwb", "xgetbv1", "xsavec", "adx", "rdtscp", "mds-no", "cx8", > > "aes", "tsc-deadline", "pse36", "fsgsbase", "umip", "spec-ctrl", > "lahf_lm", > > "md-clear", "avx512cd", "amd-ssbd", "vmx", "apic", "f16c", "pse", "tsc", > > "movbe", "smep", "ss", "pschange-mc-no", "ssse3", "popcnt", "avx", "vme", > > "smap", "pdpe1gb", "mca", "skip-l1dfl-vmentry", "abm", "sep", "clflush", > > "rdrand"]} | 49 | 3419 | 60 | > > 0 | 0 | gyw | 1 | 192.168.2.99 | > > [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", > > "hvm"], ["x86_64", "kvm", "hvm"]] > > > > > > > > > > > > > > > > | {"nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > 
"nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, > 2, > > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], > > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 1006396, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "reserved", "size_kb", "total"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "reserved", "size_kb", "total"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "reserved", "size_kb", "total"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": null}, > > "nova_object.changes": ["cpuset", "pinned_cpus", "mempages", > > "network_metadata", "cpu_usage", "pcpuset", "memory", "id", "socket", > > "siblings", "memory_usage"]}]}, "nova_object.changes": ["cells"]} | gyw > > | 1.5 | 16 | > > b1bf35bd-a9ad-4f0c-9033-776a5c6d1c9b | 1 | 1 | > > > 2023-01-04 03:12:17 | 2023-01-31 06:36:36 | 2023-02-23 08:50:29 | 2 | > > NULL | 4 | 3931 | 60 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pclmuldq", > > "fsgsbase", "f16c", "fxsr", "ibpb", "adx", "movbe", "aes", "x2apic", > "abm", > > "mtrr", "arat", "sse4.2", "bmi1", "stibp", "sse4.1", "pae", "vme", "msr", > > "skip-l1dfl-vmentry", "fma", "pcid", "avx2", "de", "ibrs-all", "ssse3", > > "apic", "umip", "xsavec", "3dnowprefetch", "amd-ssbd", "sse", "nx", > "fpu", > > "pse", "smap", "smep", "lahf_lm", "pni", "spec-ctrl", "xsave", "xsaves", > > "rdtscp", "vmx", "avx512f", "cmov", "invpcid", "hypervisor", "erms", > > "rdctl-no", "cx16", "cx8", "tsc", "pge", "pdcm", "rdrand", "avx", > > "amd-stibp", "avx512vl", "xsaveopt", "mds-no", "popcnt", "clflushopt", > > "sse2", "xgetbv1", "rdseed", "pdpe1gb", "pschange-mc-no", "clwb", > > "avx512vnni", "mca", "tsx-ctrl", "tsc_adjust", "syscall", "pse36", "mmx", > > "avx512cd", "avx512bw", "pku", "tsc-deadline", "arch-capabilities", > > "avx512dq", "ssbd", "clflush", "mce", "ss", "pat", "bmi2", "lm", "ibrs", > > "sep", "md-clear"]} | 49 | 3419 | 60 | > > 0 | 0 | c1c1 | 2 | > 192.168.2.99 > > | [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", > > "hvm"], ["x86_64", "kvm", "hvm"]] > > > > > > > > > > > > > > > > | {"nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > 
"nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, > 2, > > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], > > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 1006393, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "total", "size_kb", "reserved"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "total", "size_kb", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "total", "size_kb", "reserved"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, > > "nova_object.changes": ["memory_usage", "socket", "cpuset", "siblings", > > "id", "mempages", "pinned_cpus", "memory", "pcpuset", "network_metadata", > > "cpu_usage"]}]}, "nova_object.changes": ["cells"]} | c1c1 | > > 1.5 | 16 | 1eac1c8d-d96a-4eeb-9868-5a341a80c6df > | > > 1 | 0 | > > > 2023-02-07 08:25:27 | 2023-02-07 08:25:27 | 2023-02-13 08:34:22 | 3 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["sha-ni", > > "intel-pt", "pat", "monitor", "movbe", "nx", "msr", "avx2", "md-clear", > > "popcnt", "rdseed", "pse36", "mds-no", "ds", "sse", "fsrm", "rdctl-no", > > "pse", "dtes64", "ds_cpl", "xgetbv1", "lahf_lm", "smep", "waitpkg", > "smap", > > "fsgsbase", "sep", "tsc_adjust", "cmov", "ibrs-all", "mtrr", "cx16", > > "f16c", "arch-capabilities", "pclmuldq", "clflush", "erms", "umip", > > "xsaves", "xsavec", "ssse3", "acpi", "tsc", "movdir64b", "vpclmulqdq", > > "skip-l1dfl-vmentry", "xsave", "arat", "mmx", "rdpid", "sse2", "ssbd", > > "pdpe1gb", "spec-ctrl", "adx", "pcid", "de", "pku", "est", "pae", > > "tsc-deadline", "pdcm", "clwb", "vme", "rdtscp", "fxsr", "3dnowprefetch", > > "invpcid", "x2apic", "tm", "lm", "fma", "bmi1", "sse4.1", "abm", > > "xsaveopt", "pschange-mc-no", "syscall", "clflushopt", "pbe", "avx", > "cx8", > > "vmx", "gfni", "fpu", "mce", "tm2", "movdiri", "invtsc", "apic", "bmi2", > > "mca", "pge", "rdrand", "xtpr", "sse4.2", "stibp", "ht", "ss", "pni", > > "vaes", "aes"]} | 416 | 31378 | 456 | > > 0 | 0 | c-MS-7D42 | 3 | 192.168.2.99 > | > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", > > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", > > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", > > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", 
"hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["total", > > "reserved", "used", "size_kb"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["total", "reserved", "used", "size_kb"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["total", > > "reserved", "used", "size_kb"]}], "network_metadata": {"nova_object.name > ": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["network_metadata", "cpuset", "mempages", "id", > > "socket", "cpu_usage", "memory", "pinned_cpus", "pcpuset", "siblings", > > "memory_usage"]}]}, "nova_object.changes": ["cells"]} | c-MS-7D42 | > > 1.5 | 16 | > f115a1c2-fda3-42c6-945a-8b54fef40daf > > > 1 | 0 | > > > 2023-02-07 09:53:12 | 2023-02-13 08:38:04 | 2023-02-13 08:39:33 | 4 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdctl-no", > > "acpi", "umip", "invpcid", "bmi1", "clflushopt", "pclmuldq", "movdir64b", > > "ssbd", "apic", "rdpid", "ht", "fsrm", "pni", "pse", "xsaves", "cx16", > > "nx", "f16c", "arat", "popcnt", "mtrr", "vpclmulqdq", "intel-pt", > > "spec-ctrl", "syscall", "3dnowprefetch", "ds", "mce", "bmi2", "tm2", > > "md-clear", "fpu", "monitor", "pae", "erms", "dtes64", "tsc", "fsgsbase", > > "xgetbv1", "est", "mds-no", "tm", "x2apic", "xsavec", "cx8", "stibp", > > "clflush", "ssse3", "pge", "movdiri", "pdpe1gb", "vaes", "gfni", "mmx", > > "clwb", "waitpkg", "xsaveopt", "pse36", "aes", "pschange-mc-no", "sse2", > > "abm", "ss", "pcid", "sep", "rdseed", 
"mca", "skip-l1dfl-vmentry", "pat", > > "smap", "sse", "lahf_lm", "avx", "cmov", "sse4.1", "sse4.2", "ibrs-all", > > "smep", "vme", "tsc_adjust", "arch-capabilities", "fma", "movbe", "adx", > > "avx2", "xtpr", "pku", "pbe", "rdrand", "tsc-deadline", "pdcm", "ds_cpl", > > "de", "invtsc", "xsave", "msr", "fxsr", "lm", "vmx", "sha-ni", > "rdtscp"]} | > > 416 | 31378 | 456 | 0 | > > 0 | c-MS-7D42 | 4 | 192.168.28.21 | [["alpha", > > "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], > > ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], > > ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", > > "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "total", "used", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["size_kb", "total", "used", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "total", "used", "reserved"]}], "network_metadata": {"nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["siblings", "cpuset", "mempages", "socket", > > "pcpuset", "memory", "memory_usage", "id", "network_metadata", > "cpu_usage", > > "pinned_cpus"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | > 10ea8254-ad84-4db9-9acd-5c783cb8600e > > > 1 | 0 | > > > 2023-02-13 08:41:21 | 2023-02-13 08:41:22 | 2023-02-13 09:56:50 | 5 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, 
"sockets": 1, "cores": 6, "threads": 2}, "features": ["bmi2", "ht", > > "pae", "pku", "monitor", "avx2", "sha-ni", "acpi", "ssbd", "syscall", > > "mca", "mmx", "mds-no", "erms", "fsrm", "arat", "xsaves", "movbe", > > "movdir64b", "fpu", "clflush", "nx", "mce", "pse", "cx8", "aes", "avx", > > "xsavec", "invpcid", "est", "xgetbv1", "fxsr", "rdrand", "vaes", "cmov", > > "intel-pt", "smep", "dtes64", "f16c", "adx", "sse2", "stibp", "rdseed", > > "xsave", "skip-l1dfl-vmentry", "sse4.1", "rdpid", "ds", "umip", "pni", > > "rdctl-no", "clwb", "md-clear", "pschange-mc-no", "msr", "popcnt", > > "sse4.2", "pge", "tm2", "pat", "xtpr", "fma", "gfni", "sep", "ibrs-all", > > "tsc", "ds_cpl", "tm", "clflushopt", "pcid", "de", "rdtscp", "vme", > "cx16", > > "lahf_lm", "ss", "pdcm", "x2apic", "pbe", "movdiri", "tsc-deadline", > > "invtsc", "apic", "fsgsbase", "mtrr", "vpclmulqdq", "ssse3", > > "3dnowprefetch", "abm", "xsaveopt", "tsc_adjust", "pse36", "pclmuldq", > > "bmi1", "smap", "arch-capabilities", "lm", "vmx", "sse", "pdpe1gb", > > "spec-ctrl", "waitpkg"]} | 416 | 31378 | > > 456 | 0 | 0 | c-MS-7D42 | 5 | > > 192.168.28.21 | [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], > > ["aarch64", "qemu", "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", > > "hvm"], ["i686", "kvm", "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", > > "hvm"], ["microblaze", "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], > > ["mips", "qemu", "hvm"], ["mipsel", "qemu", "hvm"], ["mips64", "qemu", > > "hvm"], ["mips64el", "qemu", "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", > > "qemu", "hvm"], ["ppc64le", "qemu", "hvm"], ["s390x", "qemu", "hvm"], > > ["sh4", "qemu", "hvm"], ["sh4eb", "qemu", "hvm"], ["sparc", "qemu", > "hvm"], > > ["sparc64", "qemu", "hvm"], ["unicore32", "qemu", "hvm"], ["x86_64", > > "qemu", "hvm"], ["x86_64", "kvm", "hvm"], ["xtensa", "qemu", "hvm"], > > ["xtensaeb", "qemu", "hvm"]] | {"nova_object.name": "PciDevicePoolList", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"objects": []}, "nova_object.changes": ["objects"]} > | > > [] | NULL | {"failed_builds": "0"} | {"nova_object.name > ": > > "NUMATopology", "nova_object.namespace": "nova", "nova_object.version": > > "1.2", "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "size_kb", "total", "reserved"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "size_kb", "total", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "size_kb", "total", "reserved"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", 
"nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["tunneled", "physnets"]}, "socket": 0}, > > "nova_object.changes": ["pinned_cpus", "cpuset", "memory_usage", "id", > > "cpu_usage", "network_metadata", "siblings", "mempages", "socket", > > "memory", "pcpuset"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | > > 8efa100f-ab14-45fd-8c39-644b49772883 | 1 | 0 | > > > 2023-02-13 09:57:30 | 2023-02-13 09:57:31 | 2023-02-13 13:52:57 | 6 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdpid", > > "intel-pt", "fxsr", "pclmuldq", "xsaveopt", "pae", "xsave", "movdiri", > > "syscall", "ibrs-all", "mmx", "tsc_adjust", "abm", "ssbd", "sse", "mce", > > "clwb", "vmx", "dtes64", "ssse3", "fsrm", "est", "bmi1", "mtrr", "avx2", > > "pse36", "pat", "gfni", "mds-no", "clflushopt", "cmov", "fma", "sep", > > "mca", "ss", "umip", "popcnt", "skip-l1dfl-vmentry", "ht", "sha-ni", > > "pdcm", "pdpe1gb", "rdrand", "pge", "lahf_lm", "aes", "xsavec", "pni", > > "smep", "md-clear", "waitpkg", "tm", "xgetbv1", "stibp", "apic", "vaes", > > "fpu", "ds_cpl", "ds", "sse4.2", "3dnowprefetch", "smap", "x2apic", > > "vpclmulqdq", "acpi", "avx", "de", "pbe", "sse2", "xsaves", "monitor", > > "clflush", "tm2", "pschange-mc-no", "bmi2", "movbe", "pku", "pcid", > "xtpr", > > "erms", "movdir64b", "cx8", "nx", "rdctl-no", "invpcid", "spec-ctrl", > > "tsc", "adx", "invtsc", "f16c", "rdtscp", "vme", "pse", "lm", "cx16", > > "fsgsbase", "rdseed", "msr", "sse4.1", "arch-capabilities", "arat", > > "tsc-deadline"]} | 416 | 31378 | 456 | > > 0 | 0 | c-MS-7D42 | 6 | > 192.168.28.21 > > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", > "qemu", > > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", > > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", > > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, 
"total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "total", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["size_kb", "used", "total", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "total", "reserved"]}], "network_metadata": {"nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["memory_usage", "id", "mempages", "pinned_cpus", > > "network_metadata", "pcpuset", "cpuset", "siblings", "socket", > "cpu_usage", > > "memory"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | 8f5b58c5-d5d7-452c-9ec7-cff24baf6c94 | > > 1 | 0 | > > > 2023-02-14 01:35:43 | 2023-02-14 01:35:43 | 2023-02-14 03:16:51 | 7 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["pcid", "pse36", > > "movdir64b", "apic", "nx", "vpclmulqdq", "mtrr", "popcnt", "pdcm", > > "fsgsbase", "lahf_lm", "sse2", "pae", "aes", "movdiri", "xsaves", "erms", > > "invtsc", "waitpkg", "pbe", "ht", "pni", "avx2", "rdpid", "fxsr", "tm2", > > "pku", "x2apic", "fma", "pge", "rdseed", "pdpe1gb", "mmx", "sse4.1", > > "sha-ni", "xtpr", "tsc_adjust", "cx16", "xsave", "cx8", "mce", > "md-clear", > > "gfni", "clwb", "msr", "abm", "f16c", "ss", "xsaveopt", "ds_cpl", "pse", > > "syscall", "cmov", "3dnowprefetch", "ssse3", "pclmuldq", > > "arch-capabilities", "ibrs-all", "arat", "ds", "pat", "invpcid", "vaes", > > "xsavec", "mds-no", "tm", "smep", "acpi", "fsrm", "movbe", "fpu", > "sse4.2", > > "umip", "rdtscp", "tsc-deadline", "skip-l1dfl-vmentry", "est", > "rdctl-no", > > "clflush", "spec-ctrl", "tsc", "lm", "avx", "vmx", "clflushopt", > "rdrand", > > "dtes64", "smap", "ssbd", "sse", "xgetbv1", "stibp", "mca", "adx", "vme", > > "bmi1", "pschange-mc-no", "intel-pt", "de", "monitor", "bmi2", "sep"]} | > > 416 | 31378 | 456 | 0 | > > 0 | c-MS-7D42 | 7 | 192.168.28.21 | [["alpha", > "qemu", > > "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], ["cris", > > "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["lm32", > > "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", "hvm"], > > ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], ["mipsel", > > "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", "qemu", "hvm"], > > ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", "qemu", > > "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], ["sh4eb", > "qemu", > > "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > ["unicore32", > > "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", "kvm", "hvm"], > > ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > 
{"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["reserved", > > "total", "used", "size_kb"]}, {"nova_object.name": "NUMAPag -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 01:41:42 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 09:41:42 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: Sorry, it should be But in "openstack server create ...", how to specify THE VF ? ---- Simon Jones Simon Jones ?2023?3?1??? 09:34??? > You got the point what I want to say ! Let me explain more: > > 1. The hole story is I want to deploy openstack Yoga, and the compute node > use DPU (BF2, BlueFiled2). So I follow this link: > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > 2. After deploy exactly as the link said, I use these command to create a > VM (which also called instance). > ``` > openstack network create selfservice > openstack subnet create --network selfservice --subnet-range > 192.168.1.0/24 selfservice-v4 > openstack port create --network selfservice --vnic-type remote-managed \ > --binding-profile '{"pci_vendor_info":"", "pci_slot":"", > "physical_network":"", "card_serial_number": "AB0123XX0042", > "pf_mac_address": "08:c0:eb:8e:bd:f4", "vf_num":1, "vnic_type": > "remote-managed"}' \ > pf0vf0 > openstack flavor create --id 0 --vcpus 1 --ram 64 --disk 1 > cirros-os-dpu-test-1 --property "pci_passthrough:alias"="a1:2" > > All command above pass. > > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ > --security-group default provider-instance > > This command got ERROR. The ERROR is shown in the first email. > ``` > 3. So I have few questions: > ``` > question 1: Why got ERROR? Why "No valid host was found"? > question 2: When I run command "openstack port create ...", I could > specify which VF-rep port (virtual function's representor port) plug into > br-int in DPU's OVS. As normal operate, I should start VM and plug THE VF > into VM. But in "openstack port create ...", how to specify THE VF ? > ``` > 4. For question 1, I debug it as the first mail said. And I will check the > second email to solve it. > 5. For question 2, I have no idea, as these is no document to refer this > question. What should I do ? > > > ---- > Simon Jones > > > Sean Mooney ?2023?3?1??? 01:18??? 
> >> On Tue, 2023-02-28 at 19:43 +0800, Simon Jones wrote: >> > Hi all, >> > >> > I'm working on openstack Yoga's PCI passthrough feature, follow this >> link: >> > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >> > >> > I configure exactly as the link said, but when I create server use this >> > command, I found ERROR: >> > ``` >> > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ >> > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ >> > --security-group default --key-name mykey provider-instance >> > >> > >> > > fault | {'code': 500, 'created': >> > '2023-02-23T06:13:43Z', 'message': 'No valid host was found. There are >> not >> > enough hosts available.', 'details': 'Traceback (most recent call >> last):\n >> > File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line >> > 1548, in schedule_and_build_instances\n host_lists = >> > self._schedule_instances(context, request_specs[0],\n File >> > "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 908, in >> > _schedule_instances\n host_lists = >> > self.query_client.select_destinations(\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/client/query.py", line >> 41, >> > in select_destinations\n return >> > self.scheduler_rpcapi.select_destinations(context, spec_obj,\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/rpcapi.py", line 160, in >> > select_destinations\n return cctxt.call(ctxt, >> \'select_destinations\', >> > **msg_args)\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line >> 189, in >> > call\n result = self.transport._send(\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, >> in >> > _send\n return self._driver.send(target, ctxt, message,\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 689, in send\n return self._send(target, ctxt, message, >> > wait_for_reply, timeout,\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 681, in _send\n raise >> > result\nnova.exception_Remote.NoValidHost_Remote: No valid host was >> found. >> > There are not enough hosts available.\nTraceback (most recent call >> > last):\n\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line >> 241, in >> > inner\n return func(*args, **kwargs)\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 223, in >> > select_destinations\n selections = self._select_destinations(\n\n >> File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 250, in >> > _select_destinations\n selections = self._schedule(\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 416, in >> > _schedule\n self._ensure_sufficient_hosts(\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 455, in >> > _ensure_sufficient_hosts\n raise >> > exception.NoValidHost(reason=reason)\n\nnova.exception.NoValidHost: No >> > valid host was found. 
There are not enough hosts available.\n\n'} | >> > >> > // this is what I configured: NovaInstance >> > >> > gyw at c1:~$ openstack flavor show cirros-os-dpu-test-1 >> > +----------------------------+------------------------------+ >> > > Field | Value | >> > +----------------------------+------------------------------+ >> > > OS-FLV-DISABLED:disabled | False | >> > > OS-FLV-EXT-DATA:ephemeral | 0 | >> > > access_project_ids | None | >> > > description | None | >> > > disk | 1 | >> > > id | 0 | >> > > name | cirros-os-dpu-test-1 | >> > > os-flavor-access:is_public | True | >> > > properties | pci_passthrough:alias='a1:1' | >> > > ram | 64 | >> > > rxtx_factor | 1.0 | >> > > swap | | >> > > vcpus | 1 | >> > +----------------------------+------------------------------+ >> > >> > // in controller node /etc/nova/nova.conf >> > >> > [filter_scheduler] >> > enabled_filters = PciPassthroughFilter >> > available_filters = nova.scheduler.filters.all_filters >> > >> > [pci] >> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", >> > "physical_network": null, "remote_managed": "true"} >> > alias = { "vendor_id":"15b3", "product_id":"101e", >> "device_type":"type-VF", >> > "name":"a1" } >> > >> > // in compute node /etc/nova/nova.conf >> > >> > [pci] >> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", >> > "physical_network": null, "remote_managed": "true"} >> > alias = { "vendor_id":"15b3", "product_id":"101e", >> "device_type":"type-VF", >> > "name":"a1" } >> >> "remote_managed": "true" is only valid for neutron sriov ports, >> not flavor-based pci passthrough. >> >> so you need to use vnic_type=direct assuming you are trying to use >> >> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html >> >> which is not the same as generic pci passthrough. >> >> if you just want to use generic pci passthrough via a flavor, remove >> "remote_managed": "true" >> >> > >> > ``` >> > >> > The detailed ERROR I found is: >> > - The reason why "There are not enough hosts available" is that >> > nova-scheduler's log shows "There are 0 hosts available but 1 instances >> > requested to build", which means no hosts support the PCI passthrough feature.
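To make the suggestion above concrete, here is a minimal sketch (not taken from the thread itself) of the [pci] section that plain flavor-based passthrough would need on both the controller and the compute node: it is simply the whitelist/alias pair already quoted above with the "remote_managed" tag removed, reusing the same vendor/product IDs.

```
# illustrative sketch only -- the whitelist/alias from this thread with just the
# "remote_managed" tag dropped, so the VFs can satisfy the pci_passthrough:alias
# request carried by the flavor
[pci]
passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null}
alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
```

With that in place on both nodes (and nova-compute restarted so it re-reports its devices), the PciPassthroughFilter should have a non-empty pool to match the "a1" alias against.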
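On the other path (the smartnic/DPU workflow the remote-managed port above was created for), the specific VF is, as far as I can tell, never named on the command line: nova picks a free remote_managed VF from the whitelist when the port is bound, so the instance is booted against the pre-created port rather than a flavor alias. A rough sketch, assuming the pf0vf0 port from earlier in the thread:

```
# hypothetical sketch: boot against the pre-created remote-managed port by name
# instead of --nic net-id=...; the pci_passthrough:alias property carried by
# cirros-os-dpu-test-1 is not needed for this path
openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
  --port pf0vf0 \
  --security-group default provider-instance
```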
>> > >> > This is nova-schduler's log >> > ``` >> > 2023-02-28 06:11:58.329 1942637 DEBUG nova.scheduler.manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting to schedule >> > for instances: ['8ddfbe2c-f929-4b62-8b73-67902df8fb60'] >> select_destinations >> > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:141 >> > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] >> compute_status_filter >> > request filter added forbidden trait COMPUTE_STATUS_DISABLED >> > compute_status_filter >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:254 >> > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'compute_status_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.331 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'accelerators_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.332 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'remote_managed_ports_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.485 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock >> > "567eb2f1-7173-4eee-b9e7-66932ed70fea" acquired by >> > >> "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" >> > :: waited 0.000s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> > 2023-02-28 06:11:58.488 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock >> > "567eb2f1-7173-4eee-b9e7-66932ed70fea" "released" by >> > >> "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" >> > :: held 0.003s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> > 2023-02-28 06:11:58.494 1942637 DEBUG oslo_db.sqlalchemy.engines >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] MySQL server mode >> set >> > to >> > >> STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION >> > _check_effective_sql_mode >> > /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/engines.py:314 >> > 2023-02-28 06:11:58.520 1942637 INFO nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Host mapping not >> found >> > for host c1c2. 
Not tracking instance info for this host. >> > 2023-02-28 06:11:58.520 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', >> 'c1c2')" >> > acquired by >> > "nova.scheduler.host_manager.HostState.update.._locked_update" >> :: >> > waited 0.000s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> > 2023-02-28 06:11:58.521 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> from >> > compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"arch": >> > "x86_64", "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", >> "topology": >> > {"cells": 1, "sockets": 1, "cores": 6, "threads": 2}, "features": >> > ["sse4.2", "mds-no", "stibp", "pdpe1gb", "xsaveopt", "ht", "intel-pt", >> > "mtrr", "abm", "tm", "lm", "umip", "mca", "pku", "ds_cpl", "rdrand", >> "adx", >> > "rdseed", "lahf_lm", "xgetbv1", "nx", "invpcid", "rdtscp", "tsc", >> "xsavec", >> > "pcid", "arch-capabilities", "pclmuldq", "spec-ctrl", "fsgsbase", >> "avx2", >> > "md-clear", "vmx", "syscall", "mmx", "ds", "ssse3", "avx", "dtes64", >> > "fxsr", "msr", "acpi", "vpclmulqdq", "smap", "erms", "pge", "cmov", >> > "sha-ni", "fsrm", "x2apic", "xsaves", "cx8", "pse", "pse36", >> "clflushopt", >> > "vaes", "pni", "ssbd", "movdiri", "movbe", "clwb", "xtpr", "de", >> "invtsc", >> > "fpu", "tsc-deadline", "pae", "clflush", "ibrs-all", "waitpkg", "sse", >> > "sse2", "bmi1", "3dnowprefetch", "cx16", "popcnt", "rdctl-no", "fma", >> > "tsc_adjust", "xsave", "ss", "skip-l1dfl-vmentry", "sse4.1", "rdpid", >> > "monitor", "vme", "tm2", "pat", "pschange-mc-no", "movdir64b", "gfni", >> > "mce", "smep", "sep", "apic", "arat", "f16c", "bmi2", "aes", "pbe", >> "est", >> > >> "pdcm"]}',created_at=2023-02-14T03:19:40Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=415,free_disk_gb=456,free_ram_mb=31378,host='c1c2',host_ip=192.168.28.21,hypervisor_hostname='c1c2',hypervisor_type='QEMU',hypervisor_version=4002001,id=8,local_gb=456,local_gb_used=0,mapped=0,memory_mb=31890,memory_mb_used=512,metrics='[]',numa_topology='{" >> > nova_object.name": "NUMATopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.2", "nova_object.data": {"cells": [{" >> > nova_object.name": "NUMACell", "nova_object.namespace": "nova", >> > "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": >> [0, >> > 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, >> 8, >> > 9, 10, 11], "memory": 31890, "cpu_usage": 0, "memory_usage": 0, >> > "pinned_cpus": [], "siblings": [[0, 1], [10, 11], [2, 3], [6, 7], [4, >> 5], >> > [8, 9]], "mempages": [{"nova_object.name": "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 4, "total": 8163962, "used": 0, >> "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "used", "reserved", "total"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", 
"nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, >> "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}], >> > "network_metadata": {"nova_object.name": "NetworkMetadata", >> > "nova_object.namespace": "nova", "nova_object.version": "1.0", >> > "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["cpuset", "memory_usage", "cpu_usage", "id", >> > "pinned_cpus", "pcpuset", "socket", "network_metadata", "siblings", >> > "mempages", "memory"]}]}, "nova_object.changes": >> > >> ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.5,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-02-28T06:01:33Z,uuid=c360cc82-f0fd-4662-bccd-e1f02b27af51,vcpus=12,vcpus_used=0) >> > _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:167 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > aggregates: [] _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:170 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > service dict: {'id': 17, 'uuid': '6d0921a6-427d-4a82-a7d2-41dfa003125a', >> > 'host': 'c1c2', 'binary': 'nova-compute', 'topic': 'compute', >> > 'report_count': 121959, 'disabled': False, 'disabled_reason': None, >> > 'last_seen_up': datetime.datetime(2023, 2, 28, 6, 11, 49, >> > tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 61, >> > 'created_at': datetime.datetime(2023, 2, 14, 3, 19, 40, >> > tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 2, >> 28, >> > 6, 11, 49, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': >> > False} _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:173 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > instances: [] _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:176 >> > 2023-02-28 06:11:58.525 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', >> 'c1c2')" >> > "released" by >> > "nova.scheduler.host_manager.HostState.update.._locked_update" >> :: >> > held 0.004s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> > 2023-02-28 06:11:58.525 1942637 DEBUG nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting with 1 >> host(s) >> > get_filtered_objects /usr/lib/python3/dist-packages/nova/filters.py:70 >> > 2023-02-28 06:11:58.526 1942637 DEBUG 
nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- before ---- >> > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:542 >> > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:543 >> > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- after ---- >> > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:545 >> > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:546 >> > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Not enough PCI >> devices >> > left to satisfy request _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:556 >> > 2023-02-28 06:11:58.527 1942637 DEBUG >> > nova.scheduler.filters.pci_passthrough_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] (c1c2, c1c2) ram: >> > 31378MB disk: 424960MB io_ops: 0 instances: 0 doesn't have the required >> PCI >> > devices >> > (InstancePCIRequests(instance_uuid=,requests=[InstancePCIRequest])) >> > host_passes >> > >> /usr/lib/python3/dist-packages/nova/scheduler/filters/pci_passthrough_filter.py:52 >> > 2023-02-28 06:11:58.528 1942637 INFO nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filter >> > PciPassthroughFilter returned 0 hosts >> > 2023-02-28 06:11:58.528 1942637 DEBUG nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed >> all >> > hosts for the request with instance ID >> > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: >> > [('PciPassthroughFilter', None)] get_filtered_objects >> > /usr/lib/python3/dist-packages/nova/filters.py:114 >> > 2023-02-28 06:11:58.528 1942637 INFO nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed >> all >> > hosts for the request with instance ID >> > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. 
Filter results: ['PciPassthroughFilter: (start: 1, end: 0)']
>> > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager [req-13b1baee-e02d-40fc-926d-d497e70ca0dc ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtered [] _get_sorted_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:610
>> > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager [req-13b1baee-e02d-40fc-926d-d497e70ca0dc ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] There are 0 hosts available but 1 instances requested to build. _ensure_sufficient_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:450
>> > ```
>> >
>> > Then I searched the database and found that the compute node's PCI configuration was never reported:
>> > ```
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
>> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
>> > gyw at c1:~$ openstack resource class show PCI_DEVICE
>> > +-------+------------+
>> > | Field | Value      |
>> > +-------+------------+
>> > | name  | PCI_DEVICE |
>> > +-------+------------+
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 MEMORY_MB
>> > +------------------+-------+
>> > | Field            | Value |
>> > +------------------+-------+
>> > | allocation_ratio | 1.5   |
>> > | min_unit         | 1     |
>> > | max_unit         | 31890 |
>> > | reserved         | 512   |
>> > | step_size        | 1     |
>> > | total            | 31890 |
>> > | used             | 0     |
>> > +------------------+-------+
>> > (Is this 31890 the value reported by the compute node resource tracker?)
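# Illustrative checks (not from the original session above; values other than
# the compute_node id are placeholders). In Yoga, Nova does not report PCI
# devices to Placement at all, so the PCI_DEVICE 404 above is expected even on
# a working setup: the PciPassthroughFilter works from Nova's own PCI tracking
# (the pci_devices table and the pci_stats column of compute_nodes). A more
# telling check is whether the PCI tracker recorded any devices for this host
# (id=8 in the scheduler log above):
mysql -e "SELECT address, vendor_id, product_id, dev_type, status \
          FROM nova.pci_devices WHERE compute_node_id = 8 AND deleted = 0;"
# If that returns no rows, the [pci] whitelist on the compute node is not
# matching the DPU VFs. Example only (vendor/product IDs are placeholders;
# remote_managed is the tag discussed in this thread):
# [pci]
# passthrough_whitelist = { "vendor_id": "15b3", "product_id": "101e", "remote_managed": "true" }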
>> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU >> > ?^Cgyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU >> > +------------------+-------+ >> > > Field | Value | >> > +------------------+-------+ >> > > allocation_ratio | 16.0 | >> > > min_unit | 1 | >> > > max_unit | 12 | >> > > reserved | 0 | >> > > step_size | 1 | >> > > total | 12 | >> > > used | 0 | >> > +------------------+-------+ >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 SRIOV_NET_VF >> > No inventory of class SRIOV_NET_VF for >> c360cc82-f0fd-4662-bccd-e1f02b27af51 >> > (HTTP 404) >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 DISK_GB >> > +------------------+-------+ >> > > Field | Value | >> > +------------------+-------+ >> > > allocation_ratio | 1.0 | >> > > min_unit | 1 | >> > > max_unit | 456 | >> > > reserved | 0 | >> > > step_size | 1 | >> > > total | 456 | >> > > used | 0 | >> > +------------------+-------+ >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 IPV4_ADDRESS >> > No inventory of class IPV4_ADDRESS for >> c360cc82-f0fd-4662-bccd-e1f02b27af51 >> > (HTTP 404) >> > >> > MariaDB [nova]> select * from compute_nodes; >> > >> +---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ >> > > created_at | updated_at | deleted_at | id | >> > service_id | vcpus | memory_mb | local_gb | vcpus_used | memory_mb_used >> | >> > local_gb_used | hypervisor_type | hypervisor_version | cpu_info >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > | disk_available_least | free_ram_mb | free_disk_gb | >> > current_workload | running_vms | hypervisor_hostname | deleted | host_ip >> > | supported_instances >> > >> > >> > >> > >> > >> > >> > >> > >> > | pci_stats >> > >> > >> > > metrics | extra_resources | stats | numa_topology >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > | host | ram_allocation_ratio | >> cpu_allocation_ratio >> > > uuid | disk_allocation_ratio | mapped >> | >> > >> 
+---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ >> > > 2023-01-04 01:55:44 | 2023-01-04 03:02:28 | 2023-02-13 08:34:08 | 1 | >> > NULL | 4 | 3931 | 60 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pat", "cmov", >> > "ibrs-all", "pge", "sse4.2", "sse", "mmx", "ibrs", "avx2", "syscall", >> > "fpu", "mtrr", "xsaves", "mce", "invpcid", "tsc_adjust", "ssbd", "pku", >> > "ibpb", "xsave", "xsaveopt", "pae", "lm", "pdcm", "bmi1", "avx512vnni", >> > "stibp", "x2apic", "avx512dq", "pcid", "nx", "bmi2", "erms", >> > "3dnowprefetch", "de", "avx512bw", "arch-capabilities", "pni", "fma", >> > "rdctl-no", "sse4.1", "rdseed", "arat", "avx512vl", "avx512f", >> "pclmuldq", >> > "msr", "fxsr", "sse2", "amd-stibp", "hypervisor", "tsx-ctrl", >> "clflushopt", >> > "cx16", "clwb", "xgetbv1", "xsavec", "adx", "rdtscp", "mds-no", "cx8", >> > "aes", "tsc-deadline", "pse36", "fsgsbase", "umip", "spec-ctrl", >> "lahf_lm", >> > "md-clear", "avx512cd", "amd-ssbd", "vmx", "apic", "f16c", "pse", "tsc", >> > "movbe", "smep", "ss", "pschange-mc-no", "ssse3", "popcnt", "avx", >> "vme", >> > "smap", "pdpe1gb", "mca", "skip-l1dfl-vmentry", "abm", "sep", "clflush", >> > "rdrand"]} | 49 | 3419 | 60 | >> > 0 | 0 | gyw | 1 | 192.168.2.99 | >> > [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", >> > "hvm"], ["x86_64", "kvm", "hvm"]] >> > >> > >> > >> > >> > >> > >> > >> > | {"nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, >> 2, >> > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": >> [], >> > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 1006396, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "reserved", "size_kb", "total"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "reserved", "size_kb", "total"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > 
"total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "reserved", "size_kb", "total"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": null}, >> > "nova_object.changes": ["cpuset", "pinned_cpus", "mempages", >> > "network_metadata", "cpu_usage", "pcpuset", "memory", "id", "socket", >> > "siblings", "memory_usage"]}]}, "nova_object.changes": ["cells"]} | gyw >> > | 1.5 | 16 | >> > b1bf35bd-a9ad-4f0c-9033-776a5c6d1c9b | 1 | 1 | >> > > 2023-01-04 03:12:17 | 2023-01-31 06:36:36 | 2023-02-23 08:50:29 | 2 | >> > NULL | 4 | 3931 | 60 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pclmuldq", >> > "fsgsbase", "f16c", "fxsr", "ibpb", "adx", "movbe", "aes", "x2apic", >> "abm", >> > "mtrr", "arat", "sse4.2", "bmi1", "stibp", "sse4.1", "pae", "vme", >> "msr", >> > "skip-l1dfl-vmentry", "fma", "pcid", "avx2", "de", "ibrs-all", "ssse3", >> > "apic", "umip", "xsavec", "3dnowprefetch", "amd-ssbd", "sse", "nx", >> "fpu", >> > "pse", "smap", "smep", "lahf_lm", "pni", "spec-ctrl", "xsave", "xsaves", >> > "rdtscp", "vmx", "avx512f", "cmov", "invpcid", "hypervisor", "erms", >> > "rdctl-no", "cx16", "cx8", "tsc", "pge", "pdcm", "rdrand", "avx", >> > "amd-stibp", "avx512vl", "xsaveopt", "mds-no", "popcnt", "clflushopt", >> > "sse2", "xgetbv1", "rdseed", "pdpe1gb", "pschange-mc-no", "clwb", >> > "avx512vnni", "mca", "tsx-ctrl", "tsc_adjust", "syscall", "pse36", >> "mmx", >> > "avx512cd", "avx512bw", "pku", "tsc-deadline", "arch-capabilities", >> > "avx512dq", "ssbd", "clflush", "mce", "ss", "pat", "bmi2", "lm", "ibrs", >> > "sep", "md-clear"]} | 49 | 3419 | 60 >> | >> > 0 | 0 | c1c1 | 2 | >> 192.168.2.99 >> > | [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", >> > "hvm"], ["x86_64", "kvm", "hvm"]] >> > >> > >> > >> > >> > >> > >> > >> > | {"nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, >> 2, >> > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": >> [], >> > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 1006393, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "total", "size_kb", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "total", "size_kb", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, 
>> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "total", "size_kb", "reserved"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, >> > "nova_object.changes": ["memory_usage", "socket", "cpuset", "siblings", >> > "id", "mempages", "pinned_cpus", "memory", "pcpuset", >> "network_metadata", >> > "cpu_usage"]}]}, "nova_object.changes": ["cells"]} | c1c1 | >> > 1.5 | 16 | >> 1eac1c8d-d96a-4eeb-9868-5a341a80c6df | >> > 1 | 0 | >> > > 2023-02-07 08:25:27 | 2023-02-07 08:25:27 | 2023-02-13 08:34:22 | 3 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["sha-ni", >> > "intel-pt", "pat", "monitor", "movbe", "nx", "msr", "avx2", "md-clear", >> > "popcnt", "rdseed", "pse36", "mds-no", "ds", "sse", "fsrm", "rdctl-no", >> > "pse", "dtes64", "ds_cpl", "xgetbv1", "lahf_lm", "smep", "waitpkg", >> "smap", >> > "fsgsbase", "sep", "tsc_adjust", "cmov", "ibrs-all", "mtrr", "cx16", >> > "f16c", "arch-capabilities", "pclmuldq", "clflush", "erms", "umip", >> > "xsaves", "xsavec", "ssse3", "acpi", "tsc", "movdir64b", "vpclmulqdq", >> > "skip-l1dfl-vmentry", "xsave", "arat", "mmx", "rdpid", "sse2", "ssbd", >> > "pdpe1gb", "spec-ctrl", "adx", "pcid", "de", "pku", "est", "pae", >> > "tsc-deadline", "pdcm", "clwb", "vme", "rdtscp", "fxsr", >> "3dnowprefetch", >> > "invpcid", "x2apic", "tm", "lm", "fma", "bmi1", "sse4.1", "abm", >> > "xsaveopt", "pschange-mc-no", "syscall", "clflushopt", "pbe", "avx", >> "cx8", >> > "vmx", "gfni", "fpu", "mce", "tm2", "movdiri", "invtsc", "apic", "bmi2", >> > "mca", "pge", "rdrand", "xtpr", "sse4.2", "stibp", "ht", "ss", "pni", >> > "vaes", "aes"]} | 416 | 31378 | 456 | >> > 0 | 0 | c-MS-7D42 | 3 | >> 192.168.2.99 | >> > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", >> "qemu", >> > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", >> "kvm", >> > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", >> > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", >> "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, 
"memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["total", >> > "reserved", "used", "size_kb"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["total", "reserved", "used", "size_kb"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["total", >> > "reserved", "used", "size_kb"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["network_metadata", "cpuset", "mempages", "id", >> > "socket", "cpu_usage", "memory", "pinned_cpus", "pcpuset", "siblings", >> > "memory_usage"]}]}, "nova_object.changes": ["cells"]} | c-MS-7D42 | >> > 1.5 | 16 | >> f115a1c2-fda3-42c6-945a-8b54fef40daf >> > > 1 | 0 | >> > > 2023-02-07 09:53:12 | 2023-02-13 08:38:04 | 2023-02-13 08:39:33 | 4 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdctl-no", >> > "acpi", "umip", "invpcid", "bmi1", "clflushopt", "pclmuldq", >> "movdir64b", >> > "ssbd", "apic", "rdpid", "ht", "fsrm", "pni", "pse", "xsaves", "cx16", >> > "nx", "f16c", "arat", "popcnt", "mtrr", "vpclmulqdq", "intel-pt", >> > "spec-ctrl", "syscall", "3dnowprefetch", "ds", "mce", "bmi2", "tm2", >> > "md-clear", "fpu", "monitor", "pae", "erms", "dtes64", "tsc", >> "fsgsbase", >> > "xgetbv1", "est", "mds-no", "tm", "x2apic", "xsavec", "cx8", "stibp", >> > "clflush", "ssse3", "pge", "movdiri", "pdpe1gb", "vaes", "gfni", "mmx", >> > "clwb", "waitpkg", "xsaveopt", "pse36", "aes", "pschange-mc-no", "sse2", >> > "abm", "ss", "pcid", "sep", "rdseed", "mca", "skip-l1dfl-vmentry", >> "pat", >> > "smap", "sse", "lahf_lm", "avx", "cmov", "sse4.1", "sse4.2", "ibrs-all", >> > "smep", "vme", "tsc_adjust", "arch-capabilities", "fma", "movbe", "adx", >> > "avx2", "xtpr", "pku", "pbe", "rdrand", "tsc-deadline", "pdcm", >> "ds_cpl", >> > "de", "invtsc", "xsave", "msr", "fxsr", "lm", "vmx", "sha-ni", >> "rdtscp"]} | >> > 416 | 31378 | 456 | 0 | >> > 0 | c-MS-7D42 | 4 | 192.168.28.21 | [["alpha", >> > "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], >> > ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", >> "hvm"], >> > ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", >> > "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > 
["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", >> > "total", "used", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["size_kb", "total", "used", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "total", "used", "reserved"]}], "network_metadata": {"nova_object.name >> ": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["siblings", "cpuset", "mempages", "socket", >> > "pcpuset", "memory", "memory_usage", "id", "network_metadata", >> "cpu_usage", >> > "pinned_cpus"]}]}, "nova_object.changes": ["cells"]} | c1c2 | >> > 1.5 | 16 | >> 10ea8254-ad84-4db9-9acd-5c783cb8600e >> > > 1 | 0 | >> > > 2023-02-13 08:41:21 | 2023-02-13 08:41:22 | 2023-02-13 09:56:50 | 5 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["bmi2", "ht", >> > "pae", "pku", "monitor", "avx2", "sha-ni", "acpi", "ssbd", "syscall", >> > "mca", "mmx", "mds-no", "erms", "fsrm", "arat", "xsaves", "movbe", >> > "movdir64b", "fpu", "clflush", "nx", "mce", "pse", "cx8", "aes", "avx", >> > "xsavec", "invpcid", "est", "xgetbv1", "fxsr", "rdrand", "vaes", "cmov", >> > "intel-pt", "smep", "dtes64", "f16c", "adx", "sse2", "stibp", "rdseed", >> > "xsave", "skip-l1dfl-vmentry", "sse4.1", "rdpid", "ds", "umip", "pni", >> > "rdctl-no", "clwb", "md-clear", "pschange-mc-no", "msr", "popcnt", >> > "sse4.2", "pge", "tm2", "pat", "xtpr", "fma", "gfni", "sep", "ibrs-all", >> > "tsc", "ds_cpl", "tm", "clflushopt", "pcid", "de", "rdtscp", "vme", >> "cx16", >> > "lahf_lm", "ss", "pdcm", "x2apic", "pbe", "movdiri", "tsc-deadline", >> > "invtsc", "apic", "fsgsbase", "mtrr", "vpclmulqdq", "ssse3", >> > "3dnowprefetch", "abm", "xsaveopt", "tsc_adjust", "pse36", "pclmuldq", >> > 
"bmi1", "smap", "arch-capabilities", "lm", "vmx", "sse", "pdpe1gb", >> > "spec-ctrl", "waitpkg"]} | 416 | 31378 | >> > 456 | 0 | 0 | c-MS-7D42 | 5 | >> > 192.168.28.21 | [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], >> > ["aarch64", "qemu", "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", >> > "hvm"], ["i686", "kvm", "hvm"], ["lm32", "qemu", "hvm"], ["m68k", >> "qemu", >> > "hvm"], ["microblaze", "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], >> > ["mips", "qemu", "hvm"], ["mipsel", "qemu", "hvm"], ["mips64", "qemu", >> > "hvm"], ["mips64el", "qemu", "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", >> > "qemu", "hvm"], ["ppc64le", "qemu", "hvm"], ["s390x", "qemu", "hvm"], >> > ["sh4", "qemu", "hvm"], ["sh4eb", "qemu", "hvm"], ["sparc", "qemu", >> "hvm"], >> > ["sparc64", "qemu", "hvm"], ["unicore32", "qemu", "hvm"], ["x86_64", >> > "qemu", "hvm"], ["x86_64", "kvm", "hvm"], ["xtensa", "qemu", "hvm"], >> > ["xtensaeb", "qemu", "hvm"]] | {"nova_object.name": >> "PciDevicePoolList", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"objects": []}, "nova_object.changes": >> ["objects"]} | >> > [] | NULL | {"failed_builds": "0"} | {"nova_object.name >> ": >> > "NUMATopology", "nova_object.namespace": "nova", "nova_object.version": >> > "1.2", "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "size_kb", "total", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "size_kb", "total", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "size_kb", "total", "reserved"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["tunneled", "physnets"]}, "socket": 0}, >> > "nova_object.changes": ["pinned_cpus", "cpuset", "memory_usage", "id", >> > "cpu_usage", "network_metadata", "siblings", "mempages", "socket", >> > "memory", "pcpuset"]}]}, "nova_object.changes": ["cells"]} | c1c2 | >> > 1.5 | 16 | >> > 8efa100f-ab14-45fd-8c39-644b49772883 | 1 | 0 | >> > > 2023-02-13 09:57:30 | 2023-02-13 09:57:31 | 2023-02-13 13:52:57 | 6 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdpid", >> > "intel-pt", "fxsr", "pclmuldq", "xsaveopt", "pae", "xsave", "movdiri", >> > "syscall", "ibrs-all", "mmx", 
"tsc_adjust", "abm", "ssbd", "sse", "mce", >> > "clwb", "vmx", "dtes64", "ssse3", "fsrm", "est", "bmi1", "mtrr", "avx2", >> > "pse36", "pat", "gfni", "mds-no", "clflushopt", "cmov", "fma", "sep", >> > "mca", "ss", "umip", "popcnt", "skip-l1dfl-vmentry", "ht", "sha-ni", >> > "pdcm", "pdpe1gb", "rdrand", "pge", "lahf_lm", "aes", "xsavec", "pni", >> > "smep", "md-clear", "waitpkg", "tm", "xgetbv1", "stibp", "apic", "vaes", >> > "fpu", "ds_cpl", "ds", "sse4.2", "3dnowprefetch", "smap", "x2apic", >> > "vpclmulqdq", "acpi", "avx", "de", "pbe", "sse2", "xsaves", "monitor", >> > "clflush", "tm2", "pschange-mc-no", "bmi2", "movbe", "pku", "pcid", >> "xtpr", >> > "erms", "movdir64b", "cx8", "nx", "rdctl-no", "invpcid", "spec-ctrl", >> > "tsc", "adx", "invtsc", "f16c", "rdtscp", "vme", "pse", "lm", "cx16", >> > "fsgsbase", "rdseed", "msr", "sse4.1", "arch-capabilities", "arat", >> > "tsc-deadline"]} | 416 | 31378 | 456 | >> > 0 | 0 | c-MS-7D42 | 6 | >> 192.168.28.21 >> > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", >> "qemu", >> > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", >> "kvm", >> > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", >> > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", >> "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", >> > "used", "total", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "total", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "used", "total", "reserved"]}], "network_metadata": {"nova_object.name >> ": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], 
"tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["memory_usage", "id", "mempages", "pinned_cpus", >> > "network_metadata", "pcpuset", "cpuset", "siblings", "socket", >> "cpu_usage", >> > "memory"]}]}, "nova_object.changes": ["cells"]} | c1c2 | >> > 1.5 | 16 | 8f5b58c5-d5d7-452c-9ec7-cff24baf6c94 | >> > 1 | 0 | >> > > 2023-02-14 01:35:43 | 2023-02-14 01:35:43 | 2023-02-14 03:16:51 | 7 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["pcid", >> "pse36", >> > "movdir64b", "apic", "nx", "vpclmulqdq", "mtrr", "popcnt", "pdcm", >> > "fsgsbase", "lahf_lm", "sse2", "pae", "aes", "movdiri", "xsaves", >> "erms", >> > "invtsc", "waitpkg", "pbe", "ht", "pni", "avx2", "rdpid", "fxsr", "tm2", >> > "pku", "x2apic", "fma", "pge", "rdseed", "pdpe1gb", "mmx", "sse4.1", >> > "sha-ni", "xtpr", "tsc_adjust", "cx16", "xsave", "cx8", "mce", >> "md-clear", >> > "gfni", "clwb", "msr", "abm", "f16c", "ss", "xsaveopt", "ds_cpl", "pse", >> > "syscall", "cmov", "3dnowprefetch", "ssse3", "pclmuldq", >> > "arch-capabilities", "ibrs-all", "arat", "ds", "pat", "invpcid", "vaes", >> > "xsavec", "mds-no", "tm", "smep", "acpi", "fsrm", "movbe", "fpu", >> "sse4.2", >> > "umip", "rdtscp", "tsc-deadline", "skip-l1dfl-vmentry", "est", >> "rdctl-no", >> > "clflush", "spec-ctrl", "tsc", "lm", "avx", "vmx", "clflushopt", >> "rdrand", >> > "dtes64", "smap", "ssbd", "sse", "xgetbv1", "stibp", "mca", "adx", >> "vme", >> > "bmi1", "pschange-mc-no", "intel-pt", "de", "monitor", "bmi2", "sep"]} | >> > 416 | 31378 | 456 | 0 | >> > 0 | c-MS-7D42 | 7 | 192.168.28.21 | [["alpha", >> "qemu", >> > "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], ["cris", >> > "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], >> ["lm32", >> > "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", "hvm"], >> > ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], ["mipsel", >> > "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", "qemu", "hvm"], >> > ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", "qemu", >> > "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], ["sh4eb", >> "qemu", >> > "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> ["unicore32", >> > "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", "kvm", "hvm"], >> > ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, 
"used": 0, "reserved": 0}, "nova_object.changes": ["reserved", >> > "total", "used", "size_kb"]}, {"nova_object.name": "NUMAPag > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From uday.dikshit at myrealdata.in Wed Mar 1 05:24:16 2023 From: uday.dikshit at myrealdata.in (Uday Dikshit) Date: Wed, 1 Mar 2023 05:24:16 +0000 Subject: Autoscaling in Kolla Ansible Wallaby Openstack release In-Reply-To: References: Message-ID: I am currently working on a similar approach with Senlin, gnocchi and aodh, but I find gnocchi metrics inconsistent with data points. Hence autoscaling is working fine sometime but then not at all responding in some cases. So I was looking for another approach where we could achieve the same goal with quality in data. Sent from Outlook for Android ________________________________ From: Satish Patel Sent: Wednesday, March 1, 2023 3:05:12 AM To: Dmitriy Rabotyagov Cc: Uday Dikshit ; openstack-discuss at lists.openstack.org Subject: Re: Autoscaling in Kolla Ansible Wallaby Openstack release I did some lab work with senlin and its awesome project. I did deploy with OSA (openstack-ansible) - https://satishdotpatel.github.io/openstack-senlin-autoscaling/ On Tue, Feb 28, 2023 at 2:53?PM Dmitriy Rabotyagov > wrote: Hey, There's an OpenStack project called Senlin [1] that provides auto-scaling of customer environments by leveraging heat templates. I have no idea if kolla does support it's deployment or not though. [1] https://docs.openstack.org/senlin ??, 28 ????. 2023??. ? 18:54, Uday Dikshit >: Hello Team As a public cloud service providers our aim is to provide our customers with autoscaling for instances feature. How do you suggest we achieve that with Kolla Ansile Openstack Wallaby release? Thanks & Regards, [https://acefone.com/email-signature/logo-new.png] [https://acefone.com/email-signature/facebook.png] [https://acefone.com/email-signature/linkedin.png] [https://acefone.com/email-signature/twitter.png] [https://acefone.com/email-signature/youtube.png] [https://acefone.com/email-signature/glassdoor.png] Uday Dikshit Cloud DevOps Engineer, Product Development uday.dikshit at myrealdata.in www.myrealdata.in 809-A Udyog Vihar, Phase 5, Gurugram - 122015, Haryana ________________________________ This email has been scanned for spam and viruses by Proofpoint Essentials. Click here to report this email as spam. -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 06:51:36 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 14:51:36 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: Hi, 1. I try the 2nd method, which remove "remote-managed" tag in /etc/nova/nova.conf, but got ERROR in creating VM in compute node's nova-compute service. Detail log refer to LOG-1 section bellow, I think it's because hypervisor has no neutron-agent as I use DPU, neutron anget?which is ovn-controller? is on DPU. Is right ? 2. So I want to try the 1st method in the email, which is use vnic-type=direct. BUT, HOW TO USE ? IS THERE ANY DOCUMENT ? THANKS. 
LOG-1, which is compute node's nova-compute.log > ``` > 2023-03-01 14:24:02.631 504488 DEBUG oslo_concurrency.processutils > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Running cmd > (subprocess): /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 > --cpu=30 -- env LC_ALL=C LANG=C qemu-img info > /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk > --force-share --output=json execute > /usr/lib/python3/dist-packages/oslo_concurrency/processutils.py:384 > 2023-03-01 14:24:02.654 504488 DEBUG oslo_concurrency.processutils > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] CMD "/usr/bin/python3 > -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C > qemu-img info > /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk > --force-share --output=json" returned: 0 in 0.023s execute > /usr/lib/python3/dist-packages/oslo_concurrency/processutils.py:422 > 2023-03-01 14:24:02.655 504488 DEBUG nova.virt.disk.api > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Cannot resize image > /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk to a > smaller size. can_resize_image > /usr/lib/python3/dist-packages/nova/virt/disk/api.py:172 > 2023-03-01 14:24:02.655 504488 DEBUG nova.objects.instance > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lazy-loading > 'migration_context' on Instance uuid a2603eeb-8db0-489b-ba40-dff1d74be21f > obj_load_attr /usr/lib/python3/dist-packages/nova/objects/instance.py:1099 > 2023-03-01 14:24:02.673 504488 DEBUG nova.virt.libvirt.driver > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] Created local disks _create_image > /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:4768 > 2023-03-01 14:24:02.674 504488 DEBUG nova.virt.libvirt.driver > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] Ensure instance console log exists: > /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/console.log > _ensure_console_log_for_instance > /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:4531 > 2023-03-01 14:24:02.674 504488 DEBUG oslo_concurrency.lockutils > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "vgpu_resources" > acquired by "nova.virt.libvirt.driver.LibvirtDriver._allocate_mdevs" :: > waited 0.000s inner > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > 2023-03-01 14:24:02.675 504488 DEBUG oslo_concurrency.lockutils > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "vgpu_resources" > "released" by "nova.virt.libvirt.driver.LibvirtDriver._allocate_mdevs" :: > held 0.000s inner > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 
- default default] Instance failed network > setup after 1 attempt(s): nova.exception.PortBindingFailed: Binding failed > for port 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs > for more information. > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager Traceback (most > recent call last): > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in > _allocate_network_async > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager nwinfo = > self.network_api.allocate_for_instance( > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1215, in > allocate_for_instance > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > created_port_ids = self._update_ports_for_instance( > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1357, in > _update_ports_for_instance > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager vif.destroy() > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in > __exit__ > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > self.force_reraise() > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in > force_reraise > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager raise > self.value > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1326, in > _update_ports_for_instance > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager updated_port > = self._update_port( > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 584, in > _update_port > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > _ensure_no_port_binding_failure(port) > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 293, in > _ensure_no_port_binding_failure > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager raise > exception.PortBindingFailed(port_id=port['id']) > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > nova.exception.PortBindingFailed: Binding failed for port > 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more > information. > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > nova.exception.PortBindingFailed: Binding failed for port > 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more > information. > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance failed to spawn: > nova.exception.PortBindingFailed: Binding failed for port > 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more > information. 
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] Traceback (most recent call last): > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2743, in > _build_resources > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] yield resources > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2503, in > _build_and_run_instance > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] self.driver.spawn(context, > instance, image_meta, > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4329, in > spawn > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] xml = > self._get_guest_xml(context, instance, network_info, > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7288, in > _get_guest_xml > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] network_info_str = > str(network_info) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/model.py", line 620, in __str__ > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] return self._sync_wrapper(fn, > *args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/model.py", line 603, in > _sync_wrapper > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] self.wait() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/model.py", line 635, in wait > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] self[:] = self._gt.wait() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/eventlet/greenthread.py", line 181, in wait > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] return self._exit_event.wait() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/eventlet/event.py", line 125, in wait > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] result = hub.switch() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > 
a2603eeb-8db0-489b-ba40-dff1d74be21f] return self.greenlet.switch() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/eventlet/greenthread.py", line 221, in main > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] result = function(*args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/utils.py", line 656, in context_wrapper > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] return func(*args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1890, in > _allocate_network_async > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] raise e > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in > _allocate_network_async > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] result = function(*args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/utils.py", line 656, in context_wrapper > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] return func(*args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1890, in > _allocate_network_async > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] raise e > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in > _allocate_network_async > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] nwinfo = > self.network_api.allocate_for_instance( > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1215, in > allocate_for_instance > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] created_port_ids = > self._update_ports_for_instance( > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1357, in > _update_ports_for_instance > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] vif.destroy() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in > __exit__ > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] self.force_reraise() > 2023-03-01 
14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] File
> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in
> force_reraise
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise self.value
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] File
> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1326, in
> _update_ports_for_instance
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] updated_port = self._update_port(
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] File
> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 584, in
> _update_port
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f]
> _ensure_no_port_binding_failure(port)
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] File
> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 293, in
> _ensure_no_port_binding_failure
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise
> exception.PortBindingFailed(port_id=port['id'])
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] nova.exception.PortBindingFailed:
> Binding failed for port 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check
> neutron logs for more information.
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f]
> 2023-03-01 14:24:03.349 504488 INFO nova.compute.manager
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Terminating instance
> 2023-03-01 14:24:03.349 504488 DEBUG oslo_concurrency.lockutils
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] Acquired lock
> "refresh_cache-a2603eeb-8db0-489b-ba40-dff1d74be21f" lock
> /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:294
> 2023-03-01 14:24:03.350 504488 DEBUG nova.network.neutron
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Building network info cache for
> instance _get_instance_nw_info
> /usr/lib/python3/dist-packages/nova/network/neutron.py:2014
> 2023-03-01 14:24:03.431 504488 DEBUG nova.network.neutron
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance cache missing network info.
> _get_preexisting_port_ids
> /usr/lib/python3/dist-packages/nova/network/neutron.py:3327
> 2023-03-01 14:24:03.624 504488 DEBUG nova.network.neutron
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Updating instance_info_cache with
> network_info: [] update_instance_cache_with_nw_info
> /usr/lib/python3/dist-packages/nova/network/neutron.py:117
> 2023-03-01 14:24:03.638 504488 DEBUG oslo_concurrency.lockutils
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] Releasing lock
> "refresh_cache-a2603eeb-8db0-489b-ba40-dff1d74be21f" lock
> /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:312
> 2023-03-01 14:24:03.639 504488 DEBUG nova.compute.manager
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Start destroying the instance on the
> hypervisor. _shutdown_instance
> /usr/lib/python3/dist-packages/nova/compute/manager.py:2999
> 2023-03-01 14:24:03.648 504488 DEBUG nova.virt.libvirt.driver [-]
> [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] During wait destroy,
> instance disappeared. _wait_for_destroy
> /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:1483
> 2023-03-01 14:24:03.648 504488 INFO nova.virt.libvirt.driver [-]
> [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance destroyed
> successfully.
> ```
> ---- Simon Jones
> Sean Mooney wrote on Wed, 1 Mar 2023 at 01:18:
> On Tue, 2023-02-28 at 19:43 +0800, Simon Jones wrote:
> > Hi all,
> >
> > I'm working on openstack Yoga's PCI passthrough feature, following this link:
> > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html
> >
> > I configured exactly as the link says, but when I create a server with this
> > command, I get this ERROR:
> > ```
> > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
> > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \
> > --security-group default --key-name mykey provider-instance
> >
> > | fault | {'code': 500, 'created':
> > '2023-02-23T06:13:43Z', 'message': 'No valid host was found.
There are > not > > enough hosts available.', 'details': 'Traceback (most recent call > last):\n > > File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line > > 1548, in schedule_and_build_instances\n host_lists = > > self._schedule_instances(context, request_specs[0],\n File > > "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 908, in > > _schedule_instances\n host_lists = > > self.query_client.select_destinations(\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/client/query.py", line 41, > > in select_destinations\n return > > self.scheduler_rpcapi.select_destinations(context, spec_obj,\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/rpcapi.py", line 160, in > > select_destinations\n return cctxt.call(ctxt, \'select_destinations\', > > **msg_args)\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 189, > in > > call\n result = self.transport._send(\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, > in > > _send\n return self._driver.send(target, ctxt, message,\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 689, in send\n return self._send(target, ctxt, message, > > wait_for_reply, timeout,\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 681, in _send\n raise > > result\nnova.exception_Remote.NoValidHost_Remote: No valid host was > found. > > There are not enough hosts available.\nTraceback (most recent call > > last):\n\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 241, > in > > inner\n return func(*args, **kwargs)\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 223, in > > select_destinations\n selections = self._select_destinations(\n\n > File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 250, in > > _select_destinations\n selections = self._schedule(\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 416, in > > _schedule\n self._ensure_sufficient_hosts(\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 455, in > > _ensure_sufficient_hosts\n raise > > exception.NoValidHost(reason=reason)\n\nnova.exception.NoValidHost: No > > valid host was found. 
There are not enough hosts available.\n\n'} |
> >
> > // this is what I configured: NovaInstance
> >
> > gyw at c1:~$ openstack flavor show cirros-os-dpu-test-1
> > +----------------------------+------------------------------+
> > | Field                      | Value                        |
> > +----------------------------+------------------------------+
> > | OS-FLV-DISABLED:disabled   | False                        |
> > | OS-FLV-EXT-DATA:ephemeral  | 0                            |
> > | access_project_ids         | None                         |
> > | description                | None                         |
> > | disk                       | 1                            |
> > | id                         | 0                            |
> > | name                       | cirros-os-dpu-test-1         |
> > | os-flavor-access:is_public | True                         |
> > | properties                 | pci_passthrough:alias='a1:1' |
> > | ram                        | 64                           |
> > | rxtx_factor                | 1.0                          |
> > | swap                       |                              |
> > | vcpus                      | 1                            |
> > +----------------------------+------------------------------+
> >
> > // in controller node /etc/nova/nova.conf
> >
> > [filter_scheduler]
> > enabled_filters = PciPassthroughFilter
> > available_filters = nova.scheduler.filters.all_filters
> >
> > [pci]
> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
> > alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
> >
> > // in compute node /etc/nova/nova.conf
> >
> > [pci]
> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
> > alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
> "remote_managed": "true" is only valid for neutron SR-IOV ports,
> not flavor-based PCI passthrough.
> so you need to use vnic_type=direct, assuming you are trying to use
> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html
> which is not the same as generic PCI passthrough.
> if you just want to use generic PCI passthrough via a flavor, remove
> "remote_managed": "true"
> >
> > ```
> >
> > The detailed ERROR I found is:
> > - The reason for "There are not enough hosts available" is that
> > nova-scheduler's log shows "There are 0 hosts available but 1 instances
> > requested to build", which means no host supports the PCI passthrough feature.
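To make the two options in Sean's reply concrete, here is a minimal sketch assembled only from values already shown in this thread (vendor_id 15b3, product_id 101e, alias a1, flavor cirros-os-dpu-test-1); the angle-bracket placeholders and the port name dpu-vf-port are hypothetical, and the nova-compute restart is the usual consequence of editing nova.conf rather than something stated in the thread:

```
# Option A: generic, flavor-based PCI passthrough (Sean's "remove remote_managed").
# On the compute node, whitelist the VFs without the "remote_managed" key, keep
# the alias, and restart nova-compute so the devices are reported to the PCI tracker:
[pci]
passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null}
alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }

# The existing flavor (properties pci_passthrough:alias='a1:1') is then used as before:
#   openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
#     --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 provider-instance

# Option B: keep "remote_managed": "true" (the off-path/DPU model in the spec Sean
# links) and request the VF through a neutron port instead of a flavor alias,
# using the vnic_type Sean suggests:
#   openstack port create --network <tenant-net> --vnic-type direct dpu-vf-port
#   openstack server create --flavor <flavor-without-pci-alias> --image cirros \
#     --nic port-id=<uuid-of-dpu-vf-port> provider-instance
```

Either way, the scheduler result quoted above ("0 hosts available but 1 instances requested to build") is consistent with Sean's explanation: with "remote_managed": "true" in the whitelist the VFs are set aside for neutron-managed ports, so the flavor's alias request finds no matching PCI device pool (and the compute_nodes.pci_stats column further down indeed shows an empty PciDevicePoolList).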
> > > > This is nova-schduler's log > > ``` > > 2023-02-28 06:11:58.329 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting to schedule > > for instances: ['8ddfbe2c-f929-4b62-8b73-67902df8fb60'] > select_destinations > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:141 > > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] compute_status_filter > > request filter added forbidden trait COMPUTE_STATUS_DISABLED > > compute_status_filter > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:254 > > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'compute_status_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.331 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'accelerators_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.332 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'remote_managed_ports_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.485 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock > > "567eb2f1-7173-4eee-b9e7-66932ed70fea" acquired by > > > "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" > > :: waited 0.000s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > > 2023-02-28 06:11:58.488 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock > > "567eb2f1-7173-4eee-b9e7-66932ed70fea" "released" by > > > "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" > > :: held 0.003s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > > 2023-02-28 06:11:58.494 1942637 DEBUG oslo_db.sqlalchemy.engines > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] MySQL server mode set > > to > > > STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION > > _check_effective_sql_mode > > /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/engines.py:314 > > 2023-02-28 06:11:58.520 1942637 INFO nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Host mapping not > found > > for host c1c2. Not tracking instance info for this host. 
> > 2023-02-28 06:11:58.520 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', > 'c1c2')" > > acquired by > > "nova.scheduler.host_manager.HostState.update.._locked_update" :: > > waited 0.000s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > > 2023-02-28 06:11:58.521 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > from > > compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"arch": > > "x86_64", "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": > > {"cells": 1, "sockets": 1, "cores": 6, "threads": 2}, "features": > > ["sse4.2", "mds-no", "stibp", "pdpe1gb", "xsaveopt", "ht", "intel-pt", > > "mtrr", "abm", "tm", "lm", "umip", "mca", "pku", "ds_cpl", "rdrand", > "adx", > > "rdseed", "lahf_lm", "xgetbv1", "nx", "invpcid", "rdtscp", "tsc", > "xsavec", > > "pcid", "arch-capabilities", "pclmuldq", "spec-ctrl", "fsgsbase", "avx2", > > "md-clear", "vmx", "syscall", "mmx", "ds", "ssse3", "avx", "dtes64", > > "fxsr", "msr", "acpi", "vpclmulqdq", "smap", "erms", "pge", "cmov", > > "sha-ni", "fsrm", "x2apic", "xsaves", "cx8", "pse", "pse36", > "clflushopt", > > "vaes", "pni", "ssbd", "movdiri", "movbe", "clwb", "xtpr", "de", > "invtsc", > > "fpu", "tsc-deadline", "pae", "clflush", "ibrs-all", "waitpkg", "sse", > > "sse2", "bmi1", "3dnowprefetch", "cx16", "popcnt", "rdctl-no", "fma", > > "tsc_adjust", "xsave", "ss", "skip-l1dfl-vmentry", "sse4.1", "rdpid", > > "monitor", "vme", "tm2", "pat", "pschange-mc-no", "movdir64b", "gfni", > > "mce", "smep", "sep", "apic", "arat", "f16c", "bmi2", "aes", "pbe", > "est", > > > "pdcm"]}',created_at=2023-02-14T03:19:40Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=415,free_disk_gb=456,free_ram_mb=31378,host='c1c2',host_ip=192.168.28.21,hypervisor_hostname='c1c2',hypervisor_type='QEMU',hypervisor_version=4002001,id=8,local_gb=456,local_gb_used=0,mapped=0,memory_mb=31890,memory_mb_used=512,metrics='[]',numa_topology='{" > > nova_object.name": "NUMATopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.2", "nova_object.data": {"cells": [{" > > nova_object.name": "NUMACell", "nova_object.namespace": "nova", > > "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": [0, > > 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, > 8, > > 9, 10, 11], "memory": 31890, "cpu_usage": 0, "memory_usage": 0, > > "pinned_cpus": [], "siblings": [[0, 1], [10, 11], [2, 3], [6, 7], [4, 5], > > [8, 9]], "mempages": [{"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 4, "total": 8163962, "used": 0, > "reserved": > > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "reserved", "total"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, > "reserved": > > 0}, 
"nova_object.changes": ["size_kb", "used", "reserved", "total"]}], > > "network_metadata": {"nova_object.name": "NetworkMetadata", > > "nova_object.namespace": "nova", "nova_object.version": "1.0", > > "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["cpuset", "memory_usage", "cpu_usage", "id", > > "pinned_cpus", "pcpuset", "socket", "network_metadata", "siblings", > > "mempages", "memory"]}]}, "nova_object.changes": > > > ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.5,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-02-28T06:01:33Z,uuid=c360cc82-f0fd-4662-bccd-e1f02b27af51,vcpus=12,vcpus_used=0) > > _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:167 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > aggregates: [] _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:170 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > service dict: {'id': 17, 'uuid': '6d0921a6-427d-4a82-a7d2-41dfa003125a', > > 'host': 'c1c2', 'binary': 'nova-compute', 'topic': 'compute', > > 'report_count': 121959, 'disabled': False, 'disabled_reason': None, > > 'last_seen_up': datetime.datetime(2023, 2, 28, 6, 11, 49, > > tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 61, > > 'created_at': datetime.datetime(2023, 2, 14, 3, 19, 40, > > tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 2, > 28, > > 6, 11, 49, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': > > False} _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:173 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > instances: [] _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:176 > > 2023-02-28 06:11:58.525 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', > 'c1c2')" > > "released" by > > "nova.scheduler.host_manager.HostState.update.._locked_update" :: > > held 0.004s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > > 2023-02-28 06:11:58.525 1942637 DEBUG nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting with 1 > host(s) > > get_filtered_objects /usr/lib/python3/dist-packages/nova/filters.py:70 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- before ---- > > _filter_pools 
/usr/lib/python3/dist-packages/nova/pci/stats.py:542 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:543 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- after ---- > > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:545 > > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:546 > > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Not enough PCI > devices > > left to satisfy request _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:556 > > 2023-02-28 06:11:58.527 1942637 DEBUG > > nova.scheduler.filters.pci_passthrough_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] (c1c2, c1c2) ram: > > 31378MB disk: 424960MB io_ops: 0 instances: 0 doesn't have the required > PCI > > devices > > (InstancePCIRequests(instance_uuid=,requests=[InstancePCIRequest])) > > host_passes > > > /usr/lib/python3/dist-packages/nova/scheduler/filters/pci_passthrough_filter.py:52 > > 2023-02-28 06:11:58.528 1942637 INFO nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filter > > PciPassthroughFilter returned 0 hosts > > 2023-02-28 06:11:58.528 1942637 DEBUG nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed all > > hosts for the request with instance ID > > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: > > [('PciPassthroughFilter', None)] get_filtered_objects > > /usr/lib/python3/dist-packages/nova/filters.py:114 > > 2023-02-28 06:11:58.528 1942637 INFO nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed all > > hosts for the request with instance ID > > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: > > ['PciPassthroughFilter: (start: 1, end: 0)'] > > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtered [] > > _get_sorted_hosts > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:610 > > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] There are 0 hosts > > available but 1 instances requested to build. 
_ensure_sufficient_hosts
> > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:450
> > ```
> >
> > Then I searched the database, and found that the compute node's PCI configuration was not uploaded:
> > ```
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
> > gyw at c1:~$ openstack resource class show PCI_DEVICE
> > +-------+------------+
> > | Field | Value      |
> > +-------+------------+
> > | name  | PCI_DEVICE |
> > +-------+------------+
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 MEMORY_MB
> > +------------------+-------+
> > | Field            | Value |
> > +------------------+-------+
> > | allocation_ratio | 1.5   |
> > | min_unit         | 1     |
> > | max_unit         | 31890 |
> > | reserved         | 512   |
> > | step_size        | 1     |
> > | total            | 31890 |
> > | used             | 0     |
> > +------------------+-------+
> > (is the 31890 above the value reported by the compute node resource tracker?)
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU
> > ^C
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU
> > +------------------+-------+
> > | Field            | Value |
> > +------------------+-------+
> > | allocation_ratio | 16.0  |
> > | min_unit         | 1     |
> > | max_unit         | 12    |
> > | reserved         | 0     |
> > | step_size        | 1     |
> > | total            | 12    |
> > | used             | 0     |
> > +------------------+-------+
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 SRIOV_NET_VF
> > No inventory of class SRIOV_NET_VF for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 DISK_GB
> > +------------------+-------+
> > | Field            | Value |
> > +------------------+-------+
> > | allocation_ratio | 1.0   |
> > | min_unit         | 1     |
> > | max_unit         | 456   |
> > | reserved         | 0     |
> > | step_size        | 1     |
> > | total            | 456   |
> > | used             | 0     |
> > +------------------+-------+
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 IPV4_ADDRESS
> > No inventory of class IPV4_ADDRESS for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
> >
> > MariaDB [nova]> select * from compute_nodes;
> >
>
+---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ > > > created_at | updated_at | deleted_at | id | > > service_id | vcpus | memory_mb | local_gb | vcpus_used | memory_mb_used | > > local_gb_used | hypervisor_type | hypervisor_version | cpu_info > > > > > > > > > > > > > > > > > > > > > > > > > > > > | disk_available_least | free_ram_mb | free_disk_gb | > > current_workload | running_vms | hypervisor_hostname | deleted | host_ip > > | supported_instances > > > > > > > > > > > > > > > > > > | pci_stats > > > > > > > metrics | extra_resources | stats | numa_topology > > > > > > > > > > > > > > > > > > > > > > > > > > > > | host | ram_allocation_ratio | > cpu_allocation_ratio > > > uuid | disk_allocation_ratio | mapped | > > > +---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ > > > 2023-01-04 01:55:44 | 2023-01-04 03:02:28 | 2023-02-13 08:34:08 | 1 | > > NULL | 4 | 3931 | 60 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pat", "cmov", > > "ibrs-all", "pge", "sse4.2", "sse", "mmx", "ibrs", "avx2", "syscall", > > "fpu", "mtrr", "xsaves", "mce", "invpcid", "tsc_adjust", "ssbd", "pku", > > "ibpb", "xsave", "xsaveopt", "pae", "lm", "pdcm", "bmi1", "avx512vnni", > > "stibp", "x2apic", "avx512dq", "pcid", "nx", "bmi2", "erms", > > "3dnowprefetch", "de", "avx512bw", "arch-capabilities", "pni", "fma", > > "rdctl-no", "sse4.1", "rdseed", "arat", "avx512vl", "avx512f", > "pclmuldq", > > "msr", "fxsr", "sse2", "amd-stibp", "hypervisor", "tsx-ctrl", > "clflushopt", > > "cx16", "clwb", "xgetbv1", "xsavec", "adx", "rdtscp", "mds-no", "cx8", > > "aes", "tsc-deadline", "pse36", "fsgsbase", "umip", "spec-ctrl", > "lahf_lm", > > "md-clear", "avx512cd", "amd-ssbd", "vmx", "apic", "f16c", "pse", "tsc", > > "movbe", "smep", "ss", "pschange-mc-no", "ssse3", "popcnt", "avx", "vme", > > "smap", "pdpe1gb", "mca", "skip-l1dfl-vmentry", "abm", "sep", "clflush", > > "rdrand"]} | 49 | 3419 | 60 | > > 0 | 0 | gyw | 1 | 192.168.2.99 | > > [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", > > "hvm"], ["x86_64", "kvm", "hvm"]] > > > > > > > > > > > > > > > > | {"nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > 
"nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, > 2, > > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], > > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 1006396, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "reserved", "size_kb", "total"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "reserved", "size_kb", "total"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "reserved", "size_kb", "total"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": null}, > > "nova_object.changes": ["cpuset", "pinned_cpus", "mempages", > > "network_metadata", "cpu_usage", "pcpuset", "memory", "id", "socket", > > "siblings", "memory_usage"]}]}, "nova_object.changes": ["cells"]} | gyw > > | 1.5 | 16 | > > b1bf35bd-a9ad-4f0c-9033-776a5c6d1c9b | 1 | 1 | > > > 2023-01-04 03:12:17 | 2023-01-31 06:36:36 | 2023-02-23 08:50:29 | 2 | > > NULL | 4 | 3931 | 60 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pclmuldq", > > "fsgsbase", "f16c", "fxsr", "ibpb", "adx", "movbe", "aes", "x2apic", > "abm", > > "mtrr", "arat", "sse4.2", "bmi1", "stibp", "sse4.1", "pae", "vme", "msr", > > "skip-l1dfl-vmentry", "fma", "pcid", "avx2", "de", "ibrs-all", "ssse3", > > "apic", "umip", "xsavec", "3dnowprefetch", "amd-ssbd", "sse", "nx", > "fpu", > > "pse", "smap", "smep", "lahf_lm", "pni", "spec-ctrl", "xsave", "xsaves", > > "rdtscp", "vmx", "avx512f", "cmov", "invpcid", "hypervisor", "erms", > > "rdctl-no", "cx16", "cx8", "tsc", "pge", "pdcm", "rdrand", "avx", > > "amd-stibp", "avx512vl", "xsaveopt", "mds-no", "popcnt", "clflushopt", > > "sse2", "xgetbv1", "rdseed", "pdpe1gb", "pschange-mc-no", "clwb", > > "avx512vnni", "mca", "tsx-ctrl", "tsc_adjust", "syscall", "pse36", "mmx", > > "avx512cd", "avx512bw", "pku", "tsc-deadline", "arch-capabilities", > > "avx512dq", "ssbd", "clflush", "mce", "ss", "pat", "bmi2", "lm", "ibrs", > > "sep", "md-clear"]} | 49 | 3419 | 60 | > > 0 | 0 | c1c1 | 2 | > 192.168.2.99 > > | [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", > > "hvm"], ["x86_64", "kvm", "hvm"]] > > > > > > > > > > > > > > > > | {"nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > 
"nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, > 2, > > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], > > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 1006393, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "total", "size_kb", "reserved"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "total", "size_kb", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "total", "size_kb", "reserved"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, > > "nova_object.changes": ["memory_usage", "socket", "cpuset", "siblings", > > "id", "mempages", "pinned_cpus", "memory", "pcpuset", "network_metadata", > > "cpu_usage"]}]}, "nova_object.changes": ["cells"]} | c1c1 | > > 1.5 | 16 | 1eac1c8d-d96a-4eeb-9868-5a341a80c6df > | > > 1 | 0 | > > > 2023-02-07 08:25:27 | 2023-02-07 08:25:27 | 2023-02-13 08:34:22 | 3 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["sha-ni", > > "intel-pt", "pat", "monitor", "movbe", "nx", "msr", "avx2", "md-clear", > > "popcnt", "rdseed", "pse36", "mds-no", "ds", "sse", "fsrm", "rdctl-no", > > "pse", "dtes64", "ds_cpl", "xgetbv1", "lahf_lm", "smep", "waitpkg", > "smap", > > "fsgsbase", "sep", "tsc_adjust", "cmov", "ibrs-all", "mtrr", "cx16", > > "f16c", "arch-capabilities", "pclmuldq", "clflush", "erms", "umip", > > "xsaves", "xsavec", "ssse3", "acpi", "tsc", "movdir64b", "vpclmulqdq", > > "skip-l1dfl-vmentry", "xsave", "arat", "mmx", "rdpid", "sse2", "ssbd", > > "pdpe1gb", "spec-ctrl", "adx", "pcid", "de", "pku", "est", "pae", > > "tsc-deadline", "pdcm", "clwb", "vme", "rdtscp", "fxsr", "3dnowprefetch", > > "invpcid", "x2apic", "tm", "lm", "fma", "bmi1", "sse4.1", "abm", > > "xsaveopt", "pschange-mc-no", "syscall", "clflushopt", "pbe", "avx", > "cx8", > > "vmx", "gfni", "fpu", "mce", "tm2", "movdiri", "invtsc", "apic", "bmi2", > > "mca", "pge", "rdrand", "xtpr", "sse4.2", "stibp", "ht", "ss", "pni", > > "vaes", "aes"]} | 416 | 31378 | 456 | > > 0 | 0 | c-MS-7D42 | 3 | 192.168.2.99 > | > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", > > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", > > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", > > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", 
"hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["total", > > "reserved", "used", "size_kb"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["total", "reserved", "used", "size_kb"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["total", > > "reserved", "used", "size_kb"]}], "network_metadata": {"nova_object.name > ": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["network_metadata", "cpuset", "mempages", "id", > > "socket", "cpu_usage", "memory", "pinned_cpus", "pcpuset", "siblings", > > "memory_usage"]}]}, "nova_object.changes": ["cells"]} | c-MS-7D42 | > > 1.5 | 16 | > f115a1c2-fda3-42c6-945a-8b54fef40daf > > > 1 | 0 | > > > 2023-02-07 09:53:12 | 2023-02-13 08:38:04 | 2023-02-13 08:39:33 | 4 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdctl-no", > > "acpi", "umip", "invpcid", "bmi1", "clflushopt", "pclmuldq", "movdir64b", > > "ssbd", "apic", "rdpid", "ht", "fsrm", "pni", "pse", "xsaves", "cx16", > > "nx", "f16c", "arat", "popcnt", "mtrr", "vpclmulqdq", "intel-pt", > > "spec-ctrl", "syscall", "3dnowprefetch", "ds", "mce", "bmi2", "tm2", > > "md-clear", "fpu", "monitor", "pae", "erms", "dtes64", "tsc", "fsgsbase", > > "xgetbv1", "est", "mds-no", "tm", "x2apic", "xsavec", "cx8", "stibp", > > "clflush", "ssse3", "pge", "movdiri", "pdpe1gb", "vaes", "gfni", "mmx", > > "clwb", "waitpkg", "xsaveopt", "pse36", "aes", "pschange-mc-no", "sse2", > > "abm", "ss", "pcid", "sep", "rdseed", 
"mca", "skip-l1dfl-vmentry", "pat", > > "smap", "sse", "lahf_lm", "avx", "cmov", "sse4.1", "sse4.2", "ibrs-all", > > "smep", "vme", "tsc_adjust", "arch-capabilities", "fma", "movbe", "adx", > > "avx2", "xtpr", "pku", "pbe", "rdrand", "tsc-deadline", "pdcm", "ds_cpl", > > "de", "invtsc", "xsave", "msr", "fxsr", "lm", "vmx", "sha-ni", > "rdtscp"]} | > > 416 | 31378 | 456 | 0 | > > 0 | c-MS-7D42 | 4 | 192.168.28.21 | [["alpha", > > "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], > > ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], > > ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", > > "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "total", "used", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["size_kb", "total", "used", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "total", "used", "reserved"]}], "network_metadata": {"nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["siblings", "cpuset", "mempages", "socket", > > "pcpuset", "memory", "memory_usage", "id", "network_metadata", > "cpu_usage", > > "pinned_cpus"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | > 10ea8254-ad84-4db9-9acd-5c783cb8600e > > > 1 | 0 | > > > 2023-02-13 08:41:21 | 2023-02-13 08:41:22 | 2023-02-13 09:56:50 | 5 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, 
"sockets": 1, "cores": 6, "threads": 2}, "features": ["bmi2", "ht", > > "pae", "pku", "monitor", "avx2", "sha-ni", "acpi", "ssbd", "syscall", > > "mca", "mmx", "mds-no", "erms", "fsrm", "arat", "xsaves", "movbe", > > "movdir64b", "fpu", "clflush", "nx", "mce", "pse", "cx8", "aes", "avx", > > "xsavec", "invpcid", "est", "xgetbv1", "fxsr", "rdrand", "vaes", "cmov", > > "intel-pt", "smep", "dtes64", "f16c", "adx", "sse2", "stibp", "rdseed", > > "xsave", "skip-l1dfl-vmentry", "sse4.1", "rdpid", "ds", "umip", "pni", > > "rdctl-no", "clwb", "md-clear", "pschange-mc-no", "msr", "popcnt", > > "sse4.2", "pge", "tm2", "pat", "xtpr", "fma", "gfni", "sep", "ibrs-all", > > "tsc", "ds_cpl", "tm", "clflushopt", "pcid", "de", "rdtscp", "vme", > "cx16", > > "lahf_lm", "ss", "pdcm", "x2apic", "pbe", "movdiri", "tsc-deadline", > > "invtsc", "apic", "fsgsbase", "mtrr", "vpclmulqdq", "ssse3", > > "3dnowprefetch", "abm", "xsaveopt", "tsc_adjust", "pse36", "pclmuldq", > > "bmi1", "smap", "arch-capabilities", "lm", "vmx", "sse", "pdpe1gb", > > "spec-ctrl", "waitpkg"]} | 416 | 31378 | > > 456 | 0 | 0 | c-MS-7D42 | 5 | > > 192.168.28.21 | [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], > > ["aarch64", "qemu", "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", > > "hvm"], ["i686", "kvm", "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", > > "hvm"], ["microblaze", "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], > > ["mips", "qemu", "hvm"], ["mipsel", "qemu", "hvm"], ["mips64", "qemu", > > "hvm"], ["mips64el", "qemu", "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", > > "qemu", "hvm"], ["ppc64le", "qemu", "hvm"], ["s390x", "qemu", "hvm"], > > ["sh4", "qemu", "hvm"], ["sh4eb", "qemu", "hvm"], ["sparc", "qemu", > "hvm"], > > ["sparc64", "qemu", "hvm"], ["unicore32", "qemu", "hvm"], ["x86_64", > > "qemu", "hvm"], ["x86_64", "kvm", "hvm"], ["xtensa", "qemu", "hvm"], > > ["xtensaeb", "qemu", "hvm"]] | {"nova_object.name": "PciDevicePoolList", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"objects": []}, "nova_object.changes": ["objects"]} > | > > [] | NULL | {"failed_builds": "0"} | {"nova_object.name > ": > > "NUMATopology", "nova_object.namespace": "nova", "nova_object.version": > > "1.2", "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "size_kb", "total", "reserved"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "size_kb", "total", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "size_kb", "total", "reserved"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", 
"nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["tunneled", "physnets"]}, "socket": 0}, > > "nova_object.changes": ["pinned_cpus", "cpuset", "memory_usage", "id", > > "cpu_usage", "network_metadata", "siblings", "mempages", "socket", > > "memory", "pcpuset"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | > > 8efa100f-ab14-45fd-8c39-644b49772883 | 1 | 0 | > > > 2023-02-13 09:57:30 | 2023-02-13 09:57:31 | 2023-02-13 13:52:57 | 6 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdpid", > > "intel-pt", "fxsr", "pclmuldq", "xsaveopt", "pae", "xsave", "movdiri", > > "syscall", "ibrs-all", "mmx", "tsc_adjust", "abm", "ssbd", "sse", "mce", > > "clwb", "vmx", "dtes64", "ssse3", "fsrm", "est", "bmi1", "mtrr", "avx2", > > "pse36", "pat", "gfni", "mds-no", "clflushopt", "cmov", "fma", "sep", > > "mca", "ss", "umip", "popcnt", "skip-l1dfl-vmentry", "ht", "sha-ni", > > "pdcm", "pdpe1gb", "rdrand", "pge", "lahf_lm", "aes", "xsavec", "pni", > > "smep", "md-clear", "waitpkg", "tm", "xgetbv1", "stibp", "apic", "vaes", > > "fpu", "ds_cpl", "ds", "sse4.2", "3dnowprefetch", "smap", "x2apic", > > "vpclmulqdq", "acpi", "avx", "de", "pbe", "sse2", "xsaves", "monitor", > > "clflush", "tm2", "pschange-mc-no", "bmi2", "movbe", "pku", "pcid", > "xtpr", > > "erms", "movdir64b", "cx8", "nx", "rdctl-no", "invpcid", "spec-ctrl", > > "tsc", "adx", "invtsc", "f16c", "rdtscp", "vme", "pse", "lm", "cx16", > > "fsgsbase", "rdseed", "msr", "sse4.1", "arch-capabilities", "arat", > > "tsc-deadline"]} | 416 | 31378 | 456 | > > 0 | 0 | c-MS-7D42 | 6 | > 192.168.28.21 > > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", > "qemu", > > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", > > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", > > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, 
"total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "total", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["size_kb", "used", "total", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "total", "reserved"]}], "network_metadata": {"nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["memory_usage", "id", "mempages", "pinned_cpus", > > "network_metadata", "pcpuset", "cpuset", "siblings", "socket", > "cpu_usage", > > "memory"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | 8f5b58c5-d5d7-452c-9ec7-cff24baf6c94 | > > 1 | 0 | > > > 2023-02-14 01:35:43 | 2023-02-14 01:35:43 | 2023-02-14 03:16:51 | 7 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["pcid", "pse36", > > "movdir64b", "apic", "nx", "vpclmulqdq", "mtrr", "popcnt", "pdcm", > > "fsgsbase", "lahf_lm", "sse2", "pae", "aes", "movdiri", "xsaves", "erms", > > "invtsc", "waitpkg", "pbe", "ht", "pni", "avx2", "rdpid", "fxsr", "tm2", > > "pku", "x2apic", "fma", "pge", "rdseed", "pdpe1gb", "mmx", "sse4.1", > > "sha-ni", "xtpr", "tsc_adjust", "cx16", "xsave", "cx8", "mce", > "md-clear", > > "gfni", "clwb", "msr", "abm", "f16c", "ss", "xsaveopt", "ds_cpl", "pse", > > "syscall", "cmov", "3dnowprefetch", "ssse3", "pclmuldq", > > "arch-capabilities", "ibrs-all", "arat", "ds", "pat", "invpcid", "vaes", > > "xsavec", "mds-no", "tm", "smep", "acpi", "fsrm", "movbe", "fpu", > "sse4.2", > > "umip", "rdtscp", "tsc-deadline", "skip-l1dfl-vmentry", "est", > "rdctl-no", > > "clflush", "spec-ctrl", "tsc", "lm", "avx", "vmx", "clflushopt", > "rdrand", > > "dtes64", "smap", "ssbd", "sse", "xgetbv1", "stibp", "mca", "adx", "vme", > > "bmi1", "pschange-mc-no", "intel-pt", "de", "monitor", "bmi2", "sep"]} | > > 416 | 31378 | 456 | 0 | > > 0 | c-MS-7D42 | 7 | 192.168.28.21 | [["alpha", > "qemu", > > "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], ["cris", > > "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["lm32", > > "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", "hvm"], > > ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], ["mipsel", > > "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", "qemu", "hvm"], > > ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", "qemu", > > "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], ["sh4eb", > "qemu", > > "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > ["unicore32", > > "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", "kvm", "hvm"], > > ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > 
{"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["reserved", > > "total", "used", "size_kb"]}, {"nova_object.name": "NUMAPag -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 07:20:51 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 15:20:51 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: BTW, this link ( https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) said I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? ---- Simon Jones Simon Jones ?2023?3?1??? 14:51??? > Hi, > > 1. I try the 2nd method, which remove "remote-managed" tag in > /etc/nova/nova.conf, but got ERROR in creating VM in compute node's > nova-compute service. Detail log refer to LOG-1 section bellow, I think > it's because hypervisor has no neutron-agent as I use DPU, neutron > anget?which is ovn-controller? is on DPU. Is right ? > > 2. So I want to try the 1st method in the email, which is use > vnic-type=direct. BUT, HOW TO USE ? IS THERE ANY DOCUMENT ? > > THANKS. > > LOG-1, which is compute node's nova-compute.log > >> ``` >> 2023-03-01 14:24:02.631 504488 DEBUG oslo_concurrency.processutils >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Running cmd >> (subprocess): /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 >> --cpu=30 -- env LC_ALL=C LANG=C qemu-img info >> /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk >> --force-share --output=json execute >> /usr/lib/python3/dist-packages/oslo_concurrency/processutils.py:384 >> 2023-03-01 14:24:02.654 504488 DEBUG oslo_concurrency.processutils >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] CMD "/usr/bin/python3 >> -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C >> qemu-img info >> /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk >> --force-share --output=json" returned: 0 in 0.023s execute >> /usr/lib/python3/dist-packages/oslo_concurrency/processutils.py:422 >> 2023-03-01 14:24:02.655 504488 DEBUG nova.virt.disk.api >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Cannot resize image >> /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk to a >> smaller size. 
can_resize_image >> /usr/lib/python3/dist-packages/nova/virt/disk/api.py:172 >> 2023-03-01 14:24:02.655 504488 DEBUG nova.objects.instance >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Lazy-loading >> 'migration_context' on Instance uuid a2603eeb-8db0-489b-ba40-dff1d74be21f >> obj_load_attr /usr/lib/python3/dist-packages/nova/objects/instance.py:1099 >> 2023-03-01 14:24:02.673 504488 DEBUG nova.virt.libvirt.driver >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Created local disks _create_image >> /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:4768 >> 2023-03-01 14:24:02.674 504488 DEBUG nova.virt.libvirt.driver >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Ensure instance console log exists: >> /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/console.log >> _ensure_console_log_for_instance >> /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:4531 >> 2023-03-01 14:24:02.674 504488 DEBUG oslo_concurrency.lockutils >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "vgpu_resources" >> acquired by "nova.virt.libvirt.driver.LibvirtDriver._allocate_mdevs" :: >> waited 0.000s inner >> /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> 2023-03-01 14:24:02.675 504488 DEBUG oslo_concurrency.lockutils >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "vgpu_resources" >> "released" by "nova.virt.libvirt.driver.LibvirtDriver._allocate_mdevs" :: >> held 0.000s inner >> /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Instance failed network >> setup after 1 attempt(s): nova.exception.PortBindingFailed: Binding failed >> for port 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs >> for more information. 
>> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager Traceback (most >> recent call last): >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in >> _allocate_network_async >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager nwinfo = >> self.network_api.allocate_for_instance( >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1215, in >> allocate_for_instance >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> created_port_ids = self._update_ports_for_instance( >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1357, in >> _update_ports_for_instance >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> vif.destroy() >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in >> __exit__ >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> self.force_reraise() >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in >> force_reraise >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager raise >> self.value >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1326, in >> _update_ports_for_instance >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> updated_port = self._update_port( >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 584, in >> _update_port >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> _ensure_no_port_binding_failure(port) >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 293, in >> _ensure_no_port_binding_failure >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager raise >> exception.PortBindingFailed(port_id=port['id']) >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> nova.exception.PortBindingFailed: Binding failed for port >> 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more >> information. >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> nova.exception.PortBindingFailed: Binding failed for port >> 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more >> information. >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance failed to spawn: >> nova.exception.PortBindingFailed: Binding failed for port >> 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more >> information. 
>> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Traceback (most recent call last): >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2743, in >> _build_resources >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] yield resources >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2503, in >> _build_and_run_instance >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] self.driver.spawn(context, >> instance, image_meta, >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4329, in >> spawn >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] xml = >> self._get_guest_xml(context, instance, network_info, >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7288, in >> _get_guest_xml >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] network_info_str = >> str(network_info) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/model.py", line 620, in __str__ >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return self._sync_wrapper(fn, >> *args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/model.py", line 603, in >> _sync_wrapper >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] self.wait() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/model.py", line 635, in wait >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] self[:] = self._gt.wait() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/eventlet/greenthread.py", line 181, in wait >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return self._exit_event.wait() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/eventlet/event.py", line 125, in wait >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] result = hub.switch() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch >> 2023-03-01 14:24:03.341 504488 ERROR 
nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return self.greenlet.switch() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/eventlet/greenthread.py", line 221, in main >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] result = function(*args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/utils.py", line 656, in context_wrapper >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return func(*args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1890, in >> _allocate_network_async >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise e >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in >> _allocate_network_async >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] result = function(*args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/utils.py", line 656, in context_wrapper >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return func(*args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1890, in >> _allocate_network_async >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise e >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in >> _allocate_network_async >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] nwinfo = >> self.network_api.allocate_for_instance( >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1215, in >> allocate_for_instance >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] created_port_ids = >> self._update_ports_for_instance( >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1357, in >> _update_ports_for_instance >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] vif.destroy() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in >> __exit__ >> 2023-03-01 14:24:03.341 504488 ERROR 
nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] self.force_reraise() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in >> force_reraise >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise self.value >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1326, in >> _update_ports_for_instance >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] updated_port = self._update_port( >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 584, in >> _update_port >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] >> _ensure_no_port_binding_failure(port) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 293, in >> _ensure_no_port_binding_failure >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise >> exception.PortBindingFailed(port_id=port['id']) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] nova.exception.PortBindingFailed: >> Binding failed for port 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check >> neutron logs for more information. >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] >> 2023-03-01 14:24:03.349 504488 INFO nova.compute.manager >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Terminating instance >> a073-e0f1c61fe178, please check neutron logs for more information. 
>> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f]
>> 2023-03-01 14:24:03.349 504488 INFO nova.compute.manager [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Terminating instance
>> 2023-03-01 14:24:03.349 504488 DEBUG oslo_concurrency.lockutils [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] Acquired lock "refresh_cache-a2603eeb-8db0-489b-ba40-dff1d74be21f" lock /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:294
>> 2023-03-01 14:24:03.350 504488 DEBUG nova.network.neutron [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Building network info cache for instance _get_instance_nw_info /usr/lib/python3/dist-packages/nova/network/neutron.py:2014
>> 2023-03-01 14:24:03.431 504488 DEBUG nova.network.neutron [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance cache missing network info. _get_preexisting_port_ids /usr/lib/python3/dist-packages/nova/network/neutron.py:3327
>> 2023-03-01 14:24:03.624 504488 DEBUG nova.network.neutron [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Updating instance_info_cache with network_info: [] update_instance_cache_with_nw_info /usr/lib/python3/dist-packages/nova/network/neutron.py:117
>> 2023-03-01 14:24:03.638 504488 DEBUG oslo_concurrency.lockutils [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] Releasing lock "refresh_cache-a2603eeb-8db0-489b-ba40-dff1d74be21f" lock /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:312
>> 2023-03-01 14:24:03.639 504488 DEBUG nova.compute.manager [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Start destroying the instance on the hypervisor. _shutdown_instance /usr/lib/python3/dist-packages/nova/compute/manager.py:2999
>> 2023-03-01 14:24:03.648 504488 DEBUG nova.virt.libvirt.driver [-] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] During wait destroy, instance disappeared. _wait_for_destroy /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:1483
>> 2023-03-01 14:24:03.648 504488 INFO nova.virt.libvirt.driver [-] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance destroyed successfully.
>> ```
>>
>
> ----
> Simon Jones
>
>
> Sean Mooney wrote on Wed, 1 Mar 2023 at 01:18:
> >> On Tue, 2023-02-28 at 19:43 +0800, Simon Jones wrote: >> > Hi all, >> > >> > I'm working on openstack Yoga's PCI passthrough feature, follow this >> link: >> > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >> > >> > I configure exactly as the link said, but when I create server use this >> > command, I found ERROR: >> > ``` >> > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ >> > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ >> > --security-group default --key-name mykey provider-instance >> > >> > >> > > fault | {'code': 500, 'created': >> > '2023-02-23T06:13:43Z', 'message': 'No valid host was found. There are >> not >> > enough hosts available.', 'details': 'Traceback (most recent call >> last):\n >> > File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line >> > 1548, in schedule_and_build_instances\n host_lists = >> > self._schedule_instances(context, request_specs[0],\n File >> > "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 908, in >> > _schedule_instances\n host_lists = >> > self.query_client.select_destinations(\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/client/query.py", line >> 41, >> > in select_destinations\n return >> > self.scheduler_rpcapi.select_destinations(context, spec_obj,\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/rpcapi.py", line 160, in >> > select_destinations\n return cctxt.call(ctxt, >> \'select_destinations\', >> > **msg_args)\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line >> 189, in >> > call\n result = self.transport._send(\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, >> in >> > _send\n return self._driver.send(target, ctxt, message,\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 689, in send\n return self._send(target, ctxt, message, >> > wait_for_reply, timeout,\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 681, in _send\n raise >> > result\nnova.exception_Remote.NoValidHost_Remote: No valid host was >> found. >> > There are not enough hosts available.\nTraceback (most recent call >> > last):\n\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line >> 241, in >> > inner\n return func(*args, **kwargs)\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 223, in >> > select_destinations\n selections = self._select_destinations(\n\n >> File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 250, in >> > _select_destinations\n selections = self._schedule(\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 416, in >> > _schedule\n self._ensure_sufficient_hosts(\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 455, in >> > _ensure_sufficient_hosts\n raise >> > exception.NoValidHost(reason=reason)\n\nnova.exception.NoValidHost: No >> > valid host was found. 
There are not enough hosts available.\n\n'} |
>> >
>> > // this is what I configured: NovaInstance
>> >
>> > gyw at c1:~$ openstack flavor show cirros-os-dpu-test-1
>> > +----------------------------+------------------------------+
>> > | Field                      | Value                        |
>> > +----------------------------+------------------------------+
>> > | OS-FLV-DISABLED:disabled   | False                        |
>> > | OS-FLV-EXT-DATA:ephemeral  | 0                            |
>> > | access_project_ids         | None                         |
>> > | description                | None                         |
>> > | disk                       | 1                            |
>> > | id                         | 0                            |
>> > | name                       | cirros-os-dpu-test-1         |
>> > | os-flavor-access:is_public | True                         |
>> > | properties                 | pci_passthrough:alias='a1:1' |
>> > | ram                        | 64                           |
>> > | rxtx_factor                | 1.0                          |
>> > | swap                       |                              |
>> > | vcpus                      | 1                            |
>> > +----------------------------+------------------------------+
>> >
>> > // in controller node /etc/nova/nova.conf
>> >
>> > [filter_scheduler]
>> > enabled_filters = PciPassthroughFilter
>> > available_filters = nova.scheduler.filters.all_filters
>> >
>> > [pci]
>> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
>> > alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
>> >
>> > // in compute node /etc/nova/nova.conf
>> >
>> > [pci]
>> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
>> > alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
>>
>> "remote_managed": "true" is only valid for neutron sriov ports, not flavor-based pci passthrough.
>>
>> so you need to use vnic_type=direct assuming you are trying to use
>> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html
>> which is not the same as generic pci passthrough.
>>
>> if you just want to use generic pci passthrough via a flavor, remove "remote_managed": "true"
>>
>> >
>> > ```
>> >
>> > The detailed ERROR I found is:
>> > - The reason for "There are not enough hosts available" is that nova-scheduler's log shows "There are 0 hosts available but 1 instances requested to build", which means no host supports the PCI passthrough feature.
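For reference, the two options Sean describes above look roughly as follows. This is a minimal, untested sketch that reuses the IDs already shown in this thread (vendor_id 15b3 / product_id 101e, flavor cirros-os-dpu-test-1, net 066c8dc2-c98b-4fb8-a541-8b367e8f6e69); the port name sriov-port-1 and the second flavor are illustrative placeholders, and the port-based path also needs the matching Neutron SR-IOV/OVN-on-DPU setup described in the documents linked in this thread.

```
# (a) Generic flavor-based PCI passthrough: drop "remote_managed" from the
#     [pci] whitelist on the controller and compute nodes, keep the alias, e.g.:
#
#     [pci]
#     passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null}
#     alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
#
# restart the nova services, then request the device through the flavor alias:
openstack flavor set cirros-os-dpu-test-1 --property "pci_passthrough:alias"="a1:1"
openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
    --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 --key-name mykey provider-instance

# (b) Port-based path (vnic_type=direct): keep the whitelist, use a flavor
#     without the pci_passthrough:alias property, and hand the VF to Neutron
#     by creating the port explicitly and booting the server with it:
openstack port create --network 066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \
    --vnic-type direct sriov-port-1
openstack server create --flavor <flavor-without-pci-alias> --image cirros \
    --port sriov-port-1 --key-name mykey provider-instance
```

In both cases the PciPassthroughFilter shown in the [filter_scheduler] section above has to stay enabled on the scheduler.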
>> > >> > This is nova-schduler's log >> > ``` >> > 2023-02-28 06:11:58.329 1942637 DEBUG nova.scheduler.manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting to schedule >> > for instances: ['8ddfbe2c-f929-4b62-8b73-67902df8fb60'] >> select_destinations >> > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:141 >> > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] >> compute_status_filter >> > request filter added forbidden trait COMPUTE_STATUS_DISABLED >> > compute_status_filter >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:254 >> > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'compute_status_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.331 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'accelerators_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.332 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'remote_managed_ports_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.485 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock >> > "567eb2f1-7173-4eee-b9e7-66932ed70fea" acquired by >> > >> "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" >> > :: waited 0.000s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> > 2023-02-28 06:11:58.488 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock >> > "567eb2f1-7173-4eee-b9e7-66932ed70fea" "released" by >> > >> "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" >> > :: held 0.003s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> > 2023-02-28 06:11:58.494 1942637 DEBUG oslo_db.sqlalchemy.engines >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] MySQL server mode >> set >> > to >> > >> STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION >> > _check_effective_sql_mode >> > /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/engines.py:314 >> > 2023-02-28 06:11:58.520 1942637 INFO nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Host mapping not >> found >> > for host c1c2. 
Not tracking instance info for this host. >> > 2023-02-28 06:11:58.520 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', >> 'c1c2')" >> > acquired by >> > "nova.scheduler.host_manager.HostState.update.._locked_update" >> :: >> > waited 0.000s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> > 2023-02-28 06:11:58.521 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> from >> > compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"arch": >> > "x86_64", "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", >> "topology": >> > {"cells": 1, "sockets": 1, "cores": 6, "threads": 2}, "features": >> > ["sse4.2", "mds-no", "stibp", "pdpe1gb", "xsaveopt", "ht", "intel-pt", >> > "mtrr", "abm", "tm", "lm", "umip", "mca", "pku", "ds_cpl", "rdrand", >> "adx", >> > "rdseed", "lahf_lm", "xgetbv1", "nx", "invpcid", "rdtscp", "tsc", >> "xsavec", >> > "pcid", "arch-capabilities", "pclmuldq", "spec-ctrl", "fsgsbase", >> "avx2", >> > "md-clear", "vmx", "syscall", "mmx", "ds", "ssse3", "avx", "dtes64", >> > "fxsr", "msr", "acpi", "vpclmulqdq", "smap", "erms", "pge", "cmov", >> > "sha-ni", "fsrm", "x2apic", "xsaves", "cx8", "pse", "pse36", >> "clflushopt", >> > "vaes", "pni", "ssbd", "movdiri", "movbe", "clwb", "xtpr", "de", >> "invtsc", >> > "fpu", "tsc-deadline", "pae", "clflush", "ibrs-all", "waitpkg", "sse", >> > "sse2", "bmi1", "3dnowprefetch", "cx16", "popcnt", "rdctl-no", "fma", >> > "tsc_adjust", "xsave", "ss", "skip-l1dfl-vmentry", "sse4.1", "rdpid", >> > "monitor", "vme", "tm2", "pat", "pschange-mc-no", "movdir64b", "gfni", >> > "mce", "smep", "sep", "apic", "arat", "f16c", "bmi2", "aes", "pbe", >> "est", >> > >> "pdcm"]}',created_at=2023-02-14T03:19:40Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=415,free_disk_gb=456,free_ram_mb=31378,host='c1c2',host_ip=192.168.28.21,hypervisor_hostname='c1c2',hypervisor_type='QEMU',hypervisor_version=4002001,id=8,local_gb=456,local_gb_used=0,mapped=0,memory_mb=31890,memory_mb_used=512,metrics='[]',numa_topology='{" >> > nova_object.name": "NUMATopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.2", "nova_object.data": {"cells": [{" >> > nova_object.name": "NUMACell", "nova_object.namespace": "nova", >> > "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": >> [0, >> > 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, >> 8, >> > 9, 10, 11], "memory": 31890, "cpu_usage": 0, "memory_usage": 0, >> > "pinned_cpus": [], "siblings": [[0, 1], [10, 11], [2, 3], [6, 7], [4, >> 5], >> > [8, 9]], "mempages": [{"nova_object.name": "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 4, "total": 8163962, "used": 0, >> "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "used", "reserved", "total"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", 
"nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, >> "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}], >> > "network_metadata": {"nova_object.name": "NetworkMetadata", >> > "nova_object.namespace": "nova", "nova_object.version": "1.0", >> > "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["cpuset", "memory_usage", "cpu_usage", "id", >> > "pinned_cpus", "pcpuset", "socket", "network_metadata", "siblings", >> > "mempages", "memory"]}]}, "nova_object.changes": >> > >> ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.5,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-02-28T06:01:33Z,uuid=c360cc82-f0fd-4662-bccd-e1f02b27af51,vcpus=12,vcpus_used=0) >> > _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:167 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > aggregates: [] _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:170 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > service dict: {'id': 17, 'uuid': '6d0921a6-427d-4a82-a7d2-41dfa003125a', >> > 'host': 'c1c2', 'binary': 'nova-compute', 'topic': 'compute', >> > 'report_count': 121959, 'disabled': False, 'disabled_reason': None, >> > 'last_seen_up': datetime.datetime(2023, 2, 28, 6, 11, 49, >> > tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 61, >> > 'created_at': datetime.datetime(2023, 2, 14, 3, 19, 40, >> > tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 2, >> 28, >> > 6, 11, 49, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': >> > False} _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:173 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > instances: [] _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:176 >> > 2023-02-28 06:11:58.525 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', >> 'c1c2')" >> > "released" by >> > "nova.scheduler.host_manager.HostState.update.._locked_update" >> :: >> > held 0.004s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> > 2023-02-28 06:11:58.525 1942637 DEBUG nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting with 1 >> host(s) >> > get_filtered_objects /usr/lib/python3/dist-packages/nova/filters.py:70 >> > 2023-02-28 06:11:58.526 1942637 DEBUG 
nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- before ---- >> > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:542 >> > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:543 >> > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- after ---- >> > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:545 >> > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:546 >> > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Not enough PCI >> devices >> > left to satisfy request _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:556 >> > 2023-02-28 06:11:58.527 1942637 DEBUG >> > nova.scheduler.filters.pci_passthrough_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] (c1c2, c1c2) ram: >> > 31378MB disk: 424960MB io_ops: 0 instances: 0 doesn't have the required >> PCI >> > devices >> > (InstancePCIRequests(instance_uuid=,requests=[InstancePCIRequest])) >> > host_passes >> > >> /usr/lib/python3/dist-packages/nova/scheduler/filters/pci_passthrough_filter.py:52 >> > 2023-02-28 06:11:58.528 1942637 INFO nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filter >> > PciPassthroughFilter returned 0 hosts >> > 2023-02-28 06:11:58.528 1942637 DEBUG nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed >> all >> > hosts for the request with instance ID >> > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: >> > [('PciPassthroughFilter', None)] get_filtered_objects >> > /usr/lib/python3/dist-packages/nova/filters.py:114 >> > 2023-02-28 06:11:58.528 1942637 INFO nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed >> all >> > hosts for the request with instance ID >> > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. 
Filter results: ['PciPassthroughFilter: (start: 1, end: 0)']
>> > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager [req-13b1baee-e02d-40fc-926d-d497e70ca0dc ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtered [] _get_sorted_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:610
>> > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager [req-13b1baee-e02d-40fc-926d-d497e70ca0dc ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] There are 0 hosts available but 1 instances requested to build. _ensure_sufficient_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:450
>> > ```
>> >
>> > Then I searched the database and found that the compute node's PCI configuration was not uploaded:
>> > ```
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
>> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
>> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
>> > gyw at c1:~$ openstack resource class show PCI_DEVICE
>> > +-------+------------+
>> > | Field | Value      |
>> > +-------+------------+
>> > | name  | PCI_DEVICE |
>> > +-------+------------+
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 MEMORY_MB
>> > +------------------+-------+
>> > | Field            | Value |
>> > +------------------+-------+
>> > | allocation_ratio | 1.5   |
>> > | min_unit         | 1     |
>> > | max_unit         | 31890 |
>> > | reserved         | 512   |
>> > | step_size        | 1     |
>> > | total            | 31890 |
>> > | used             | 0     |
>> > +------------------+-------+
>> > (this 31890 is the value reported by the compute node's resource tracker)
>> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU >> > ?^Cgyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU >> > +------------------+-------+ >> > > Field | Value | >> > +------------------+-------+ >> > > allocation_ratio | 16.0 | >> > > min_unit | 1 | >> > > max_unit | 12 | >> > > reserved | 0 | >> > > step_size | 1 | >> > > total | 12 | >> > > used | 0 | >> > +------------------+-------+ >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 SRIOV_NET_VF >> > No inventory of class SRIOV_NET_VF for >> c360cc82-f0fd-4662-bccd-e1f02b27af51 >> > (HTTP 404) >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 DISK_GB >> > +------------------+-------+ >> > > Field | Value | >> > +------------------+-------+ >> > > allocation_ratio | 1.0 | >> > > min_unit | 1 | >> > > max_unit | 456 | >> > > reserved | 0 | >> > > step_size | 1 | >> > > total | 456 | >> > > used | 0 | >> > +------------------+-------+ >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 IPV4_ADDRESS >> > No inventory of class IPV4_ADDRESS for >> c360cc82-f0fd-4662-bccd-e1f02b27af51 >> > (HTTP 404) >> > >> > MariaDB [nova]> select * from compute_nodes; >> > >> +---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ >> > > created_at | updated_at | deleted_at | id | >> > service_id | vcpus | memory_mb | local_gb | vcpus_used | memory_mb_used >> | >> > local_gb_used | hypervisor_type | hypervisor_version | cpu_info >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > | disk_available_least | free_ram_mb | free_disk_gb | >> > current_workload | running_vms | hypervisor_hostname | deleted | host_ip >> > | supported_instances >> > >> > >> > >> > >> > >> > >> > >> > >> > | pci_stats >> > >> > >> > > metrics | extra_resources | stats | numa_topology >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > | host | ram_allocation_ratio | >> cpu_allocation_ratio >> > > uuid | disk_allocation_ratio | mapped >> | >> > >> 
+---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ >> > > 2023-01-04 01:55:44 | 2023-01-04 03:02:28 | 2023-02-13 08:34:08 | 1 | >> > NULL | 4 | 3931 | 60 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pat", "cmov", >> > "ibrs-all", "pge", "sse4.2", "sse", "mmx", "ibrs", "avx2", "syscall", >> > "fpu", "mtrr", "xsaves", "mce", "invpcid", "tsc_adjust", "ssbd", "pku", >> > "ibpb", "xsave", "xsaveopt", "pae", "lm", "pdcm", "bmi1", "avx512vnni", >> > "stibp", "x2apic", "avx512dq", "pcid", "nx", "bmi2", "erms", >> > "3dnowprefetch", "de", "avx512bw", "arch-capabilities", "pni", "fma", >> > "rdctl-no", "sse4.1", "rdseed", "arat", "avx512vl", "avx512f", >> "pclmuldq", >> > "msr", "fxsr", "sse2", "amd-stibp", "hypervisor", "tsx-ctrl", >> "clflushopt", >> > "cx16", "clwb", "xgetbv1", "xsavec", "adx", "rdtscp", "mds-no", "cx8", >> > "aes", "tsc-deadline", "pse36", "fsgsbase", "umip", "spec-ctrl", >> "lahf_lm", >> > "md-clear", "avx512cd", "amd-ssbd", "vmx", "apic", "f16c", "pse", "tsc", >> > "movbe", "smep", "ss", "pschange-mc-no", "ssse3", "popcnt", "avx", >> "vme", >> > "smap", "pdpe1gb", "mca", "skip-l1dfl-vmentry", "abm", "sep", "clflush", >> > "rdrand"]} | 49 | 3419 | 60 | >> > 0 | 0 | gyw | 1 | 192.168.2.99 | >> > [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", >> > "hvm"], ["x86_64", "kvm", "hvm"]] >> > >> > >> > >> > >> > >> > >> > >> > | {"nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, >> 2, >> > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": >> [], >> > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 1006396, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "reserved", "size_kb", "total"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "reserved", "size_kb", "total"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > 
"total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "reserved", "size_kb", "total"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": null}, >> > "nova_object.changes": ["cpuset", "pinned_cpus", "mempages", >> > "network_metadata", "cpu_usage", "pcpuset", "memory", "id", "socket", >> > "siblings", "memory_usage"]}]}, "nova_object.changes": ["cells"]} | gyw >> > | 1.5 | 16 | >> > b1bf35bd-a9ad-4f0c-9033-776a5c6d1c9b | 1 | 1 | >> > > 2023-01-04 03:12:17 | 2023-01-31 06:36:36 | 2023-02-23 08:50:29 | 2 | >> > NULL | 4 | 3931 | 60 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pclmuldq", >> > "fsgsbase", "f16c", "fxsr", "ibpb", "adx", "movbe", "aes", "x2apic", >> "abm", >> > "mtrr", "arat", "sse4.2", "bmi1", "stibp", "sse4.1", "pae", "vme", >> "msr", >> > "skip-l1dfl-vmentry", "fma", "pcid", "avx2", "de", "ibrs-all", "ssse3", >> > "apic", "umip", "xsavec", "3dnowprefetch", "amd-ssbd", "sse", "nx", >> "fpu", >> > "pse", "smap", "smep", "lahf_lm", "pni", "spec-ctrl", "xsave", "xsaves", >> > "rdtscp", "vmx", "avx512f", "cmov", "invpcid", "hypervisor", "erms", >> > "rdctl-no", "cx16", "cx8", "tsc", "pge", "pdcm", "rdrand", "avx", >> > "amd-stibp", "avx512vl", "xsaveopt", "mds-no", "popcnt", "clflushopt", >> > "sse2", "xgetbv1", "rdseed", "pdpe1gb", "pschange-mc-no", "clwb", >> > "avx512vnni", "mca", "tsx-ctrl", "tsc_adjust", "syscall", "pse36", >> "mmx", >> > "avx512cd", "avx512bw", "pku", "tsc-deadline", "arch-capabilities", >> > "avx512dq", "ssbd", "clflush", "mce", "ss", "pat", "bmi2", "lm", "ibrs", >> > "sep", "md-clear"]} | 49 | 3419 | 60 >> | >> > 0 | 0 | c1c1 | 2 | >> 192.168.2.99 >> > | [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", >> > "hvm"], ["x86_64", "kvm", "hvm"]] >> > >> > >> > >> > >> > >> > >> > >> > | {"nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, >> 2, >> > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": >> [], >> > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 1006393, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "total", "size_kb", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "total", "size_kb", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, 
>> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "total", "size_kb", "reserved"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, >> > "nova_object.changes": ["memory_usage", "socket", "cpuset", "siblings", >> > "id", "mempages", "pinned_cpus", "memory", "pcpuset", >> "network_metadata", >> > "cpu_usage"]}]}, "nova_object.changes": ["cells"]} | c1c1 | >> > 1.5 | 16 | >> 1eac1c8d-d96a-4eeb-9868-5a341a80c6df | >> > 1 | 0 | >> > > 2023-02-07 08:25:27 | 2023-02-07 08:25:27 | 2023-02-13 08:34:22 | 3 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["sha-ni", >> > "intel-pt", "pat", "monitor", "movbe", "nx", "msr", "avx2", "md-clear", >> > "popcnt", "rdseed", "pse36", "mds-no", "ds", "sse", "fsrm", "rdctl-no", >> > "pse", "dtes64", "ds_cpl", "xgetbv1", "lahf_lm", "smep", "waitpkg", >> "smap", >> > "fsgsbase", "sep", "tsc_adjust", "cmov", "ibrs-all", "mtrr", "cx16", >> > "f16c", "arch-capabilities", "pclmuldq", "clflush", "erms", "umip", >> > "xsaves", "xsavec", "ssse3", "acpi", "tsc", "movdir64b", "vpclmulqdq", >> > "skip-l1dfl-vmentry", "xsave", "arat", "mmx", "rdpid", "sse2", "ssbd", >> > "pdpe1gb", "spec-ctrl", "adx", "pcid", "de", "pku", "est", "pae", >> > "tsc-deadline", "pdcm", "clwb", "vme", "rdtscp", "fxsr", >> "3dnowprefetch", >> > "invpcid", "x2apic", "tm", "lm", "fma", "bmi1", "sse4.1", "abm", >> > "xsaveopt", "pschange-mc-no", "syscall", "clflushopt", "pbe", "avx", >> "cx8", >> > "vmx", "gfni", "fpu", "mce", "tm2", "movdiri", "invtsc", "apic", "bmi2", >> > "mca", "pge", "rdrand", "xtpr", "sse4.2", "stibp", "ht", "ss", "pni", >> > "vaes", "aes"]} | 416 | 31378 | 456 | >> > 0 | 0 | c-MS-7D42 | 3 | >> 192.168.2.99 | >> > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", >> "qemu", >> > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", >> "kvm", >> > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", >> > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", >> "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, 
"memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["total", >> > "reserved", "used", "size_kb"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["total", "reserved", "used", "size_kb"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["total", >> > "reserved", "used", "size_kb"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["network_metadata", "cpuset", "mempages", "id", >> > "socket", "cpu_usage", "memory", "pinned_cpus", "pcpuset", "siblings", >> > "memory_usage"]}]}, "nova_object.changes": ["cells"]} | c-MS-7D42 | >> > 1.5 | 16 | >> f115a1c2-fda3-42c6-945a-8b54fef40daf >> > > 1 | 0 | >> > > 2023-02-07 09:53:12 | 2023-02-13 08:38:04 | 2023-02-13 08:39:33 | 4 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdctl-no", >> > "acpi", "umip", "invpcid", "bmi1", "clflushopt", "pclmuldq", >> "movdir64b", >> > "ssbd", "apic", "rdpid", "ht", "fsrm", "pni", "pse", "xsaves", "cx16", >> > "nx", "f16c", "arat", "popcnt", "mtrr", "vpclmulqdq", "intel-pt", >> > "spec-ctrl", "syscall", "3dnowprefetch", "ds", "mce", "bmi2", "tm2", >> > "md-clear", "fpu", "monitor", "pae", "erms", "dtes64", "tsc", >> "fsgsbase", >> > "xgetbv1", "est", "mds-no", "tm", "x2apic", "xsavec", "cx8", "stibp", >> > "clflush", "ssse3", "pge", "movdiri", "pdpe1gb", "vaes", "gfni", "mmx", >> > "clwb", "waitpkg", "xsaveopt", "pse36", "aes", "pschange-mc-no", "sse2", >> > "abm", "ss", "pcid", "sep", "rdseed", "mca", "skip-l1dfl-vmentry", >> "pat", >> > "smap", "sse", "lahf_lm", "avx", "cmov", "sse4.1", "sse4.2", "ibrs-all", >> > "smep", "vme", "tsc_adjust", "arch-capabilities", "fma", "movbe", "adx", >> > "avx2", "xtpr", "pku", "pbe", "rdrand", "tsc-deadline", "pdcm", >> "ds_cpl", >> > "de", "invtsc", "xsave", "msr", "fxsr", "lm", "vmx", "sha-ni", >> "rdtscp"]} | >> > 416 | 31378 | 456 | 0 | >> > 0 | c-MS-7D42 | 4 | 192.168.28.21 | [["alpha", >> > "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], >> > ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", >> "hvm"], >> > ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", >> > "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > 
["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", >> > "total", "used", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["size_kb", "total", "used", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "total", "used", "reserved"]}], "network_metadata": {"nova_object.name >> ": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["siblings", "cpuset", "mempages", "socket", >> > "pcpuset", "memory", "memory_usage", "id", "network_metadata", >> "cpu_usage", >> > "pinned_cpus"]}]}, "nova_object.changes": ["cells"]} | c1c2 | >> > 1.5 | 16 | >> 10ea8254-ad84-4db9-9acd-5c783cb8600e >> > > 1 | 0 | >> > > 2023-02-13 08:41:21 | 2023-02-13 08:41:22 | 2023-02-13 09:56:50 | 5 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["bmi2", "ht", >> > "pae", "pku", "monitor", "avx2", "sha-ni", "acpi", "ssbd", "syscall", >> > "mca", "mmx", "mds-no", "erms", "fsrm", "arat", "xsaves", "movbe", >> > "movdir64b", "fpu", "clflush", "nx", "mce", "pse", "cx8", "aes", "avx", >> > "xsavec", "invpcid", "est", "xgetbv1", "fxsr", "rdrand", "vaes", "cmov", >> > "intel-pt", "smep", "dtes64", "f16c", "adx", "sse2", "stibp", "rdseed", >> > "xsave", "skip-l1dfl-vmentry", "sse4.1", "rdpid", "ds", "umip", "pni", >> > "rdctl-no", "clwb", "md-clear", "pschange-mc-no", "msr", "popcnt", >> > "sse4.2", "pge", "tm2", "pat", "xtpr", "fma", "gfni", "sep", "ibrs-all", >> > "tsc", "ds_cpl", "tm", "clflushopt", "pcid", "de", "rdtscp", "vme", >> "cx16", >> > "lahf_lm", "ss", "pdcm", "x2apic", "pbe", "movdiri", "tsc-deadline", >> > "invtsc", "apic", "fsgsbase", "mtrr", "vpclmulqdq", "ssse3", >> > "3dnowprefetch", "abm", "xsaveopt", "tsc_adjust", "pse36", "pclmuldq", >> > 
"bmi1", "smap", "arch-capabilities", "lm", "vmx", "sse", "pdpe1gb", >> > "spec-ctrl", "waitpkg"]} | 416 | 31378 | >> > 456 | 0 | 0 | c-MS-7D42 | 5 | >> > 192.168.28.21 | [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], >> > ["aarch64", "qemu", "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", >> > "hvm"], ["i686", "kvm", "hvm"], ["lm32", "qemu", "hvm"], ["m68k", >> "qemu", >> > "hvm"], ["microblaze", "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], >> > ["mips", "qemu", "hvm"], ["mipsel", "qemu", "h > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 10:12:11 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 18:12:11 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: Thanks a lot !!! As you say, I follow https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. And I want to use DPU mode. Not "disable DPU mode". So I think I should follow the link above exactlly, so I use vnic-type=remote_anaged. In my opnion, after I run first three command (which is "openstack network create ...", "openstack subnet create", "openstack port create ..."), the VF rep port and OVN and OVS rules are all ready. What I should do in "openstack server create ..." is to JUST add PCI device into VM, do NOT call neutron-server in nova-compute of compute node ( like call port_binding or something). But as the log and steps said in the emails above, nova-compute call port_binding to neutron-server while running the command "openstack server create ...". So I still have questions is: 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT call neutron-server in nova-compute of compute node ( like call port_binding or something)" . 2) If it's right, how to deal with this? Which is how to JUST add PCI device into VM, do NOT call neutron-server? By command or by configure? Is there come document ? ---- Simon Jones Sean Mooney ?2023?3?1??? 16:15??? > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > BTW, this link ( > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > said > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? > > no its not wrong but for dpu smart nics you have to make a choice when you > deploy > either they can be used in dpu mode in which case remote_managed shoudl be > set to true > and you can only use them via neutron ports with vnic-type=remote_managed > as descried in that doc > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > or if you disable dpu mode in the nic frimware then you shoudl remvoe > remote_managed form the pci device list and > then it can be used liek a normal vf either for neutron sriov ports > vnic-type=direct or via flavor based pci passthough. > > the issue you were havign is you configured the pci device list to contain > "remote_managed: ture" which means > the vf can only be consumed by a neutron port with > vnic-type=remote_managed, when you have "remote_managed: false" or unset > you can use it via vnic-type=direct i forgot that slight detail that > vnic-type=remote_managed is required for "remote_managed: ture". 
> > > in either case you foudn the correct doc > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > neutorn sriov port configuration is documented here > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > and nova flavor based pci passthough is documeted here > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > all three server slightly differnt uses. both neutron proceedures are > exclusivly fo network interfaces. > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > requires the use of ovn deployed on the dpu > to configure the VF contolplane. > https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses > the sriov nic agent > to manage the VF with ip tools. > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is > intended for pci passthough > of stateless acclerorators like qat devices. while the nova flavor approch > cna be used with nics it not how its generally > ment to be used and when used to passthough a nic expectation is that its > not related to a neuton network. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Wed Mar 1 17:02:59 2023 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 1 Mar 2023 12:02:59 -0500 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> Message-ID: <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> On 2/28/23 9:02 PM, Ghanshyam Mann wrote: [snip] > I think removing from client is good way to stop exposing this old/not-recommended way to users > but API is separate things and removing the API request parameter 'multiattach' from it can break > the existing users using it this way. Tempest test is one good example of such users use case. To maintain > the backward compatibility/interoperability it should be removed by bumping the microversion so that > it continue working for older microversions. This way we will not break the existing users and will > provide the new way for users to start using. It's not just that this is not recommended, it can lead to data loss. We should only allow multiattach for volume types that actually support it. So I see this as a case of "I broke your script now, but you'll thank me later". We could microversion this, but then an end user has to go out of the way and add the correct mv to their request to get the correct behavior. Someone using the default mv + multiattach=true will unknowingly put themselves into a data loss situation. I think it's better to break that person's API request. 
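For reference, a minimal sketch of the supported volume-type route described above (the type and volume names
and the size are placeholders, and creating or modifying volume types is an admin-only operation by default):

```
$ openstack volume type create multiattach
$ openstack volume type set --property multiattach="<is> True" multiattach
$ openstack volume create --type multiattach --size 10 vol1
$ openstack volume show vol1 -c multiattach
```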
cheers, brian From gmann at ghanshyammann.com Wed Mar 1 17:19:22 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 01 Mar 2023 09:19:22 -0800 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> Message-ID: <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> ---- On Wed, 01 Mar 2023 09:02:59 -0800 Brian Rosmaita wrote --- > On 2/28/23 9:02 PM, Ghanshyam Mann wrote: > [snip] > > > I think removing from client is good way to stop exposing this old/not-recommended way to users > > but API is separate things and removing the API request parameter 'multiattach' from it can break > > the existing users using it this way. Tempest test is one good example of such users use case. To maintain > > the backward compatibility/interoperability it should be removed by bumping the microversion so that > > it continue working for older microversions. This way we will not break the existing users and will > > provide the new way for users to start using. > > It's not just that this is not recommended, it can lead to data loss. > We should only allow multiattach for volume types that actually support > it. So I see this as a case of "I broke your script now, but you'll > thank me later". > > We could microversion this, but then an end user has to go out of the > way and add the correct mv to their request to get the correct behavior. > Someone using the default mv + multiattach=true will unknowingly put > themselves into a data loss situation. I think it's better to break > that person's API request. Ok, if multiattach=True in the request is always an unsuccessful case (or unknown successful sometimes) then I think changing it without microversion bump makes sense. But if we know there is any success case for xyz configuration/backend then I feel we should not break such success use case. I was just thinking from the Tempest test perspective which was passing but as you corrected me in IRC, the test does not check the data things so we do not completely test it in Tempest. -gmann > > > cheers, > brian > > > From fungi at yuggoth.org Wed Mar 1 20:01:59 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 1 Mar 2023 20:01:59 +0000 Subject: [security-sig] Polls in preparation to revive our meetings In-Reply-To: <20230126183907.tiamhukqq6ixpp43@yuggoth.org> References: <20230126183907.tiamhukqq6ixpp43@yuggoth.org> Message-ID: <20230301200158.qjnoobr25sjmrpp2@yuggoth.org> On 2023-01-26 18:40:08 +0000 (+0000), Jeremy Stanley wrote: > As discussed at the last PTG, the present meeting time (15:00 UTC on > the first Thursday of each month) is inconvenient for some > attendees, and that combined with year-end holidays and general busy > weeks recently have led to skipping them entirely. In order to start > narrowing down the potential meeting schedule, I have two initial > polls. > > The first is to determine what frequency we should meet. [...] > The second poll is to hopefully determine what day of the week is > optimal for potential attendees. [...] The results are in: we had three respondents weigh in on meeting frequency, with a tie between switching to every two weeks or weekly instead of monthly. 
I'll cast the tie-breaking vote for this and say we'll try for weekly meetings initially. As for day of the week, that poll had only one respondent, expressing a preference for Wednesdays. For the next step, we need to decide what time on Wednesdays to meet. In order to determine that, I've started another poll here: https://framadate.org/soF558fXvxrrXyt7 Please indicate your availability/preference by Wednesday, March 15 if you're interested in meeting, and that will give me a week to analyze the results and publish the new time. I've set the initial revised meeting date to Wednesday, March 22, which is the week prior to the virtual PTG, so a good opportunity for us to plan a little bit for how we can best utilize our upcoming PTG slot. In the meantime, let's plan to skip the normal March meeting tomorrow (Thursday) at the old time. Thanks to everyone who's participated in these polls so far! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Mar 1 20:53:45 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 1 Mar 2023 20:53:45 +0000 Subject: [infra][ironic][tact-sig] Intent to grant control of x/virtualpdu to OpenStack community Message-ID: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> The OpenStack Ironic project relies on VirtualPDU, which is no longer actively developed. Ironic contributors reached out to the VirtualPDU maintainers with an offer to officially assume maintenance responsibilities in order to avoid forking it, and received a response from one (Mathieu Mitchell) who indicated support for that plan. Unfortunately, communication died off shortly thereafter, and the Ironic team has been unable to raise any of the current maintainers to actually add their access to the x/virtualpdu project in OpenDev's Gerrit. We don't have a documented process since this is the first time it's really come up, but I'm officially announcing that I intend to use my administrative permissions as an OpenDev sysadmin to add membership of an OpenStack Ironic team representative to the following Gerrit groups if no objections are raised before Wednesday, March 8: * virtualpdu-core * virtualpdu-release This will effectively grant full control of the x/virtualpdu repository to OpenStack Ironic contributors. Please follow up to the service-discuss at lists.opendev.org or openstack-discuss at lists.openstack.org mailing list with any concerns as soon as possible, or you can feel free to reach out to me directly by email or in IRC (fungi in the #opendev channel on the OFTC network). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Mar 1 22:59:09 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 1 Mar 2023 22:59:09 +0000 Subject: [infra][ironic][tact-sig] Intent to grant control of x/virtualpdu to OpenStack community In-Reply-To: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> References: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> Message-ID: <20230301225909.enp5xetacmzpjg7o@yuggoth.org> On 2023-03-01 20:53:45 +0000 (+0000), Jeremy Stanley wrote: [...] 
> We don't have a documented process since this is the first time it's > really come up, but I'm officially announcing that I intend to use > my administrative permissions as an OpenDev sysadmin to add > membership of an OpenStack Ironic team representative to the > following Gerrit groups if no objections are raised before > Wednesday, March 8 [...] It seems the additional round of outreach worked, so no longer requires my direct intervention: https://lists.opendev.org/archives/list/service-discuss at lists.opendev.org/thread/BOF56L5PPIP6CQZJ75LCPHVN7532ZKNJ/ -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From alsotoes at gmail.com Thu Mar 2 00:52:13 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Wed, 1 Mar 2023 18:52:13 -0600 Subject: (OpenStack-Upgrade) In-Reply-To: References: Message-ID: Maybe you can get your way around this procedure: https://github.com/openstack/ansible-role-openstack-operations/blob/master/README-backup-ops.md Also, you can tar.gz the containers or take snapshots so you can rollback. Cheers! On Wed, Mar 1, 2023 at 6:47?AM Dmitriy Rabotyagov wrote: > Hey, > > Regarding rollaback of upgrade in OSA we indeed don't have any good > established/documented process for that. At the same time it should be > completely possible with some "BUT". It also depends on what exactly > you want to rollback - roles, openstack services or both. As OSA roles > can actually install any openstack service version. > > We keep all virtualenvs from the previous version, so during upgrade > we build just new virtualenvs and reconfigure systemd units to point > there. So fastest way likely would be to just edit systemd unit files > and point them to old venv version and reload systemd daemon and > service and restore DB from backup of course. > You can also define _venv_tag (ie `glance_venv_tag`) to the > old OSA version you was running and execute openstack-ansible > os--install.yml --tags systemd-service,uwsgi - that in most > cases will be enough to just edit systemd units for the service and > start old version of it. BUT running without tags will result in > having new packages in old venv which is smth you totally want to > avoid. > To prevent that you can also define _git_install_branch and > requirements_git_install_branch in /etc/openstack_deploy/group_vars > (it's important to use group vars if you want to rollback only one > service) and take value from > > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > (ofc pick your old version!) > > For a full rollback and not in-place workarounds, I think it should be > like that > * checkout to previous osa version > * re-execute scripts/bootstrap-ansible.sh > * you should still take current versions of mariadb and rabbitmq and > define them in user_variables (galera_major_version, > galera_minor_version, rabbitmq_package_version, > rabbitmq_erlang_version_spec) - it's close to never ends well > downgrading these. > * Restore DB backup > * Re-run setup-openstack.yml > > It's quite a rough summary of how I do see this process, but to be > frank I never had to execute full downgrade - I was limited mostly by > downgrading 1 service tops after the upgrade. > > Hope that helps! > > ??, 1 ???. 2023??. ? 
12:06, Adivya Singh : > > > > > hi Alvaro, > > > > i have installed using Openstack-ansible, The upgrade procedure is > consistent > > > > but what is the roll back procedure , i m looking for > > > > Regards > > Adivya Singh > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > >> > >> That will depend on how did you installed your environment: OSA, > TripleO, etc. > >> > >> Can you provide more information? > >> > >> --- > >> Alvaro Soto. > >> > >> Note: My work hours may not be your work hours. Please do not feel the > need to respond during a time that is not convenient for you. > >> ---------------------------------------------------------- > >> Great people talk about ideas, > >> ordinary people talk about things, > >> small people talk... about other people. > >> > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh > wrote: > >>> > >>> Hi Team, > >>> > >>> I am planning to upgrade my Current Environment, The Upgrade procedure > is available in OpenStack Site and Forums. > >>> > >>> But i am looking fwd to roll back Plan , Other then have a Local > backup copy of galera Database > >>> > >>> Regards > >>> Adivya Singh > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 2 02:55:47 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 2 Mar 2023 11:55:47 +0900 Subject: [tc][heat][tacker] Moving governance of tosca-parser(and heat-translator ?) to Tacker In-Reply-To: References: <1867ac70656.c5de609e1065667.3634775558652795921@ghanshyammann.com> <1869435593c.10a5026ca1424633.8160143839607463616@ghanshyammann.com> Message-ID: Thanks. So based on the agreement in this thread I've pushed the change to the governance repository to migrate tosca-parser and heat-translator to Tacker's governance. https://review.opendev.org/c/openstack/governance/+/876012 I'll keep heat-core group in heat-translator-core group for now, but we can revisit this in the future. On Wed, Mar 1, 2023 at 6:41?PM Yasufumi Ogawa wrote: > On 2023/02/28 3:49, Ghanshyam Mann wrote: > > ---- On Sun, 26 Feb 2023 19:54:45 -0800 Takashi Kajinami wrote --- > > > > > > > > > On Mon, Feb 27, 2023 at 11:38?AM Yasufumi Ogawa yasufum.o at gmail.com> > wrote: > > > Hi, > > > > > > On 2023/02/27 10:51, Takashi Kajinami wrote: > > > > On Thu, Feb 23, 2023 at 5:18?AM Ghanshyam Mann > gmann at ghanshyammann.com> > > > > wrote: > > > > > > > >> ---- On Sun, 19 Feb 2023 18:44:14 -0800 Takashi Kajinami > wrote --- > > > >> > Hello, > > > >> > > > > >> > Currently tosca-parser is part of heat's governance, but the > core > > > >> reviewers of this repositorydoes not contain any active heat > cores while we > > > >> see multiple Tacker cores in this group.Considering the fact the > project is > > > >> mainly maintained by Tacker cores, I'm wondering if we canmigrate > this > > > >> repository to Tacker's governance. Most of the current heat cores > are not > > > >> quitefamiliar with the codes in this repository, and if Tacker > team is not > > > >> interested in maintainingthis repository then I'd propose > retiring this. > > > As you mentioned, tacker still using tosca-parser and > heat-translator. 
> > > > > > >> > > > >> I think it makes sense and I remember its usage/maintenance by > the Tacker > > > >> team since starting. > > > >> But let's wait for the Tacker team opinion and accordingly you > can propose > > > >> the governance patch. > > > Although I've not joined to tacker team since starting, it might not > be > > > true because there was no cores of tosca-parser and heat-translator > in > > > tacker team. We've started to help maintenance the projects because > no > > > other active contributer. > > > > > > >> > > > >> > > > > >> > Similarly, we have heat-translator project which has both > heat cores > > > >> and tacker cores as itscore reviewers. IIUC this is tightly > related to the > > > >> work in tosca-parser, I'm wondering it makesmore sense to move > this project > > > >> to Tacker, because the requirement is mostly made fromTacker side > rather > > > >> than Heat side. > > > >> > > > >> I am not sure about this and from the name, it seems like more of > a heat > > > >> thing but it is not got beyond the Tosca template > > > >> conversion. Are there no users of it outside of the Tacker > service? or any > > > >> request to support more template conversions than > > > >> Tosca? > > > >> > > > > > > > > Current hea-translator supports only the TOSCA template[1]. > > > > The heat-translator project can be a generic template converter by > its > > > > nature but we haven't seen any interest > > > > in implementing support for different template formats. > > > > > > > > [1] > > > > > https://github.com/openstack/heat-translator/blob/master/translator/osc/v1/translate.py#L49 > > > > > > > > > > > > > > > >> If no other user or use case then I think one option can be to > merge it > > > >> into Tosca-parser itself and retire heat-translator. > > > >> > > > >> Opinion? > > > Hmm, as a core of tosca-parser, I'm not sure it's a good idea > because it > > > is just a parser TOSCA and independent from heat-translator. In > > > addition, there is no experts of Heat or HOT in current tacker team > > > actually, so it might be difficult to maintain heat-translator > without > > > any help from heat team. > > > > > > The hea-translator project was initially created to implement a > translator from TOSCA parser to HOT[1].Later tosca-parser was split out[2] > but we have never increased scope of tosca-parser. So it has beenno more > than the TOSCA template translator. > > > > > > [1] > https://blueprints.launchpad.net/heat/+spec/heat-translator-tosca[2] > > https://review.opendev.org/c/openstack/project-config/+/211204 > > > We (Heat team) can provide help with any problems with heat, but we > own no actual use case of template translation.Maintaining the > heat-translator repository with tacker, which currently provides actual use > cases would make more sense.This also gives the benefit that Tacker team > can decide when stable branches of heat-translator should be retiredalong > with the other Tacker repos. > > > > > > By the way, may I ask what will be happened if the governance is > move on > > > to tacker? Is there any extra tasks for maintenance? > > > > > > TC would have better (and more precise) explanation but my > understanding is that - creating a release > > > - maintaining stable branches > > > - maintaining gate healthwould be the required tasks along with > moderating dev discussion in mailing list/PTG/etc. > > > > I think you covered all and the Core team (Tacker members) might be > already doing a few of the tasks. 
From the > > governance perspective, tacker PTL will be the point of contact for this > repo in the case repo becomes inactive or so > > but it will be the project team's decision to merge/split things > whatever way makes maintenance easy. > I understand. I've shared the proposal again in the previous meeting and > no objection raised. So, we'd agree to move the governance as Tacker team. > > Thanks, > Yasufumi > > > > -gmann > > > > > > > Thanks, > > > Yasufumi > > > > > > >> > > > > > > > > That also sounds good to me. > > > > > > > > > > > >> Also, correcting the email subject tag as [tc]. > > > >> > > > >> -gmann > > > >> > > > >> > > > > >> > [1] > > > >> > https://review.opendev.org/admin/groups/1f7855baf3cf14fedf72e443eef18d844bcd43fa,members[2] > > > >> > https://review.opendev.org/admin/groups/66028971dcbb58add6f0e7c17ac72643c4826956,members > > > >> > Thank you,Takashi > > > >> > > > > >> > > > >> > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 2 07:54:14 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 2 Mar 2023 13:24:14 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> Message-ID: Hi I don't see any major packet loss. It seems the problem is somewhere in rabbitmq maybe but not due to packet loss. with regards, Swogat Pradhan On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan wrote: > Hi, > Yes the MTU is the same as the default '1500'. > Generally I haven't seen any packet loss, but never checked when launching > the instance. > I will check that and come back. > But everytime i launch an instance the instance gets stuck at spawning > state and there the hypervisor becomes down, so not sure if packet loss > causes this. > > With regards, > Swogat pradhan > > On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > >> One more thing coming to mind is MTU size. Are they identical between >> central and edge site? Do you see packet loss through the tunnel? >> >> Zitat von Swogat Pradhan : >> >> > Hi Eugen, >> > Request you to please add my email either on 'to' or 'cc' as i am not >> > getting email's from you. >> > Coming to the issue: >> > >> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p / >> > Listing policies for vhost "/" ... >> > vhost name pattern apply-to definition priority >> > / ha-all ^(?!amq\.).* queues >> > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} >> 0 >> > >> > I have the edge site compute nodes up, it only goes down when i am >> trying >> > to launch an instance and the instance comes to a spawning state and >> then >> > gets stuck. >> > >> > I have a tunnel setup between the central and the edge sites. >> > >> > With regards, >> > Swogat Pradhan >> > >> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> > wrote: >> > >> >> Hi Eugen, >> >> For some reason i am not getting your email to me directly, i am >> checking >> >> the email digest and there i am able to find your reply. >> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> >> Yes, these logs are from the time when the issue occurred. 
>> >> >> >> *Note: i am able to create vm's and perform other activities in the >> >> central site, only facing this issue in the edge site.* >> >> >> >> With regards, >> >> Swogat Pradhan >> >> >> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >> wrote: >> >> >> >>> Hi Eugen, >> >>> Thanks for your response. >> >>> I have actually a 4 controller setup so here are the details: >> >>> >> >>> *PCS Status:* >> >>> * Container bundle set: rabbitmq-bundle [ >> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-no-ceph-3 >> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-2 >> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-1 >> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-0 >> >>> >> >>> I have tried restarting the bundle multiple times but the issue is >> still >> >>> present. >> >>> >> >>> *Cluster status:* >> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> >>> Cluster status of node >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >> >>> Basics >> >>> >> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >>> >> >>> Disk Nodes >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>> >> >>> Running Nodes >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>> >> >>> Versions >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >> RabbitMQ >> >>> 3.8.3 on Erlang 22.3.4.1 >> >>> >> >>> Alarms >> >>> >> >>> (none) >> >>> >> >>> Network Partitions >> >>> >> >>> (none) >> >>> >> >>> Listeners >> >>> >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, 
>> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> >>> interface: [::], port: 25672, protocol: clustering, purpose: >> inter-node and >> >>> CLI tool communication >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >> 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >>> >> >>> Feature flags >> >>> >> >>> Flag: drop_unroutable_metric, state: enabled >> >>> Flag: empty_basic_get_metric, state: enabled >> >>> Flag: implicit_default_bindings, state: enabled >> >>> Flag: quorum_queue, state: enabled >> >>> Flag: virtual_host_metadata, state: enabled >> >>> >> >>> *Logs:* >> >>> *(Attached)* >> >>> >> >>> With regards, >> >>> Swogat Pradhan >> >>> >> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>> wrote: >> >>> >> >>>> Hi, >> >>>> Please find the nova conductor as well as nova api log. >> >>>> >> >>>> nova-conuctor: >> >>>> >> >>>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 16152921c1eb45c2b1f562087140168b >> >>>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >> >>>> 83dbe5f567a940b698acfe986f6194fa >> >>>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >> >>>> f3bfd7f65bd542b18d84cea3033abb43: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due >> to a >> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due >> to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 897911a234a445d8a0d8af02ece40f6f: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due >> to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> b240e3e89d99489284cd731e75f2a5db >> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >> with >> >>>> backend dogpile.cache.null. >> >>>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 8f723ceb10c3472db9a9f324861df2bb: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due >> to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >> >>>> With regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> wrote: >> >>>> >> >>>>> Hi, >> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >> >>>>> launch vm's. >> >>>>> When the VM is in spawning state the node goes down (openstack >> compute >> >>>>> service list), the node comes backup when i restart the nova compute >> >>>>> service but then the launch of the vm fails. >> >>>>> >> >>>>> nova-compute.log >> >>>>> >> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >> >>>>> instance usage >> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to >> >>>>> 2023-02-26 08:00:00. 0 instances. >> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >> >>>>> dcn01-hci-0.bdxworld.com >> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: >> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names >> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >> with >> >>>>> backend dogpile.cache.null. >> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >> >>>>> privsep helper: >> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >> 'privsep-helper', >> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >> privsep >> >>>>> daemon via rootwrap >> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> daemon starting >> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> process running with uid/gid: 0/0 >> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> process running with capabilities (eff/prm/inh): >> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> daemon running as pid 2647 >> >>>>> 2023-02-26 08:49:55.956 7 WARNING >> os_brick.initiator.connectors.nvmeof >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >> >>>>> execution error >> >>>>> in _get_host_uuid: Unexpected error while running command. >> >>>>> Command: blkid overlay -s UUID -o value >> >>>>> Exit code: 2 >> >>>>> Stdout: '' >> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >> >>>>> Unexpected error while running command. >> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >> >>>>> >> >>>>> Is there a way to solve this issue? >> >>>>> >> >>>>> >> >>>>> With regards, >> >>>>> >> >>>>> Swogat Pradhan >> >>>>> >> >>>> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 2 10:54:38 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 02 Mar 2023 10:54:38 +0000 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? 
In-Reply-To: References: Message-ID: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> adding Dmitrii who was the primary developer of the openstack integration so they can provide more insight. Dmitrii did you ever give a presentationon the DPU support and how its configured/integrated that might help fill in the gaps for simon? more inline. On Thu, 2023-03-02 at 11:05 +0800, Simon Jones wrote: > E... > > But there are these things: > > 1) Show some real happened in my test: > > - Let me clear that, I use DPU in compute node: > The graph in > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html . > > - I configure exactly follow > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, > which is said bellow in "3) Let me post all what I do follow this link". > > - In my test, I found after first three command (which is "openstack > network create ...", "openstack subnet create", "openstack port create ..."), > there are network topology exist in DPU side, and there are rules exist in > OVN north DB, south DB of controller, like this: > > > ``` > > root at c1:~# ovn-nbctl show > > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 > > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) > > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 > > addresses: ["unknown"] > > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) > > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] > > > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding > > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a > > chassis : [] > > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 > > encap : [] > > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24", > > "neutron:device_id"="", "neutron:device_owner"="", > > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, > > "neutron:port_name"=pf0vf1, > > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", > > "neutron:revision_number"="1", > > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} > > > > root at c1c2dpu:~# sudo ovs-vsctl show > > 62cf78e5-2c02-471e-927e-1d69c2c22195 > > Bridge br-int > > fail_mode: secure > > datapath_type: system > > Port br-int > > Interface br-int > > type: internal > > Port ovn--1 > > Interface ovn--1 > > type: geneve > > options: {csum="true", key=flow, remote_ip="172.168.2.98"} > > Port pf0vf1 > > Interface pf0vf1 > > ovs_version: "2.17.2-24a81c8" > > ``` > > > That's why I guess "first three command" has already create network > topology, and "openstack server create" command only need to plug VF into > VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. no that jsut looks like the standard bridge toplogy that gets created when you provision the dpu to be used with openstac vai ovn. that looks unrelated to the neuton comamnd you ran. > > - In my test, then I run "openstack server create" command, I got ERROR > which said "No valid host...", which is what the email said above. > The reason has already said, it's nova-scheduler's PCI filter module report > no valid host. The reason "nova-scheduler's PCI filter module report no > valid host" is nova-scheduler could NOT see PCI information of compute > node. 
The reason "nova-scheduler could NOT see PCI information of compute > node" is compute node's /etc/nova/nova.conf configure remote_managed tag > like this: > > > ``` > > [pci] > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > > "physical_network": null, "remote_managed": "true"} > > alias = { "vendor_id":"15b3", "product_id":"101e", > > "device_type":"type-VF", "name":"a1" } > > ``` > > > > 2) Discuss some detail design of "remote_managed" tag, I don't know if this > is right in the design of openstack with DPU: > > - In neutron-server side, use remote_managed tag in "openstack port create > ..." command. > This command will make neutron-server / OVN / ovn-controller / ovs to make > the network topology done, like above said. > I this this is right, because test shows that. that is not correct your test do not show what you think it does, they show the baisic bridge toplogy and flow configuraiton that ovn installs by defualt when it manages as ovs. please read the design docs for this feature for both nova and neutron to understand how the interacction works. https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html https://specs.openstack.org/openstack/neutron-specs/specs/yoga/off-path-smartnic-dpu-port-binding-with-ovn.html > > - In nova side, there are 2 things should process, first is PCI passthrough > filter, second is nova-compute to plug VF into VM. > > If the link above is right, which remote_managed tag exists in > /etc/nova/nova.conf of controller node and exists in /etc/nova/nova.conf of > compute node. > As above ("- In my test, then I run "openstack server create" command") > said, got ERROR in this step. > So what should do in "PCI passthrough filter" ? How to configure ? > > Then, if "PCI passthrough filter" stage pass, what will do of nova-compute > in compute node? > > 3) Post all what I do follow this link: > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > - build openstack physical env, link plug DPU into compute mode, use VM as > controller ... etc. > - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. > - configure DPU side /etc/neutron/neutron.conf > - configure host side /etc/nova/nova.conf > - configure host side /etc/nova/nova-compute.conf > - run first 3 command > - last, run this command, got ERROR > > ---- > Simon Jones > > > Sean Mooney ?2023?3?1??? 18:35??? > > > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: > > > Thanks a lot !!! > > > > > > As you say, I follow > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > > > And I want to use DPU mode. Not "disable DPU mode". > > > So I think I should follow the link above exactlly, so I use > > > vnic-type=remote_anaged. > > > In my opnion, after I run first three command (which is "openstack > > network > > > create ...", "openstack subnet create", "openstack port create ..."), the > > > VF rep port and OVN and OVS rules are all ready. > > not at that point nothign will have been done on ovn/ovs > > > > that will only happen after the port is bound to a vm and host. > > > > > What I should do in "openstack server create ..." is to JUST add PCI > > device > > > into VM, do NOT call neutron-server in nova-compute of compute node ( > > like > > > call port_binding or something). > > this is incorrect. 
> > > > > > But as the log and steps said in the emails above, nova-compute call > > > port_binding to neutron-server while running the command "openstack > > server > > > create ...". > > > > > > So I still have questions is: > > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT > > call > > > neutron-server in nova-compute of compute node ( like call port_binding > > or > > > something)" . > > no this is not how its designed. > > until you attach the logical port to a vm (either at runtime or as part of > > vm create) > > the logical port is not assocated with any host or phsical dpu/vf. > > > > so its not possibel to instanciate the openflow rules in ovs form the > > logical switch model > > in the ovn north db as no chassie info has been populated and we do not > > have the dpu serial > > info in the port binding details. > > > 2) If it's right, how to deal with this? Which is how to JUST add PCI > > > device into VM, do NOT call neutron-server? By command or by configure? > > Is > > > there come document ? > > no this happens automaticaly when nova does the port binding which cannot > > happen until after > > teh vm is schduled to a host. > > > > > > ---- > > > Simon Jones > > > > > > > > > Sean Mooney ?2023?3?1??? 16:15??? > > > > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > > > > BTW, this link ( > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > > > > said > > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? > > > > > > > > no its not wrong but for dpu smart nics you have to make a choice when > > you > > > > deploy > > > > either they can be used in dpu mode in which case remote_managed > > shoudl be > > > > set to true > > > > and you can only use them via neutron ports with > > vnic-type=remote_managed > > > > as descried in that doc > > > > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > > > > > > > > > > or if you disable dpu mode in the nic frimware then you shoudl remvoe > > > > remote_managed form the pci device list and > > > > then it can be used liek a normal vf either for neutron sriov ports > > > > vnic-type=direct or via flavor based pci passthough. > > > > > > > > the issue you were havign is you configured the pci device list to > > contain > > > > "remote_managed: ture" which means > > > > the vf can only be consumed by a neutron port with > > > > vnic-type=remote_managed, when you have "remote_managed: false" or > > unset > > > > you can use it via vnic-type=direct i forgot that slight detail that > > > > vnic-type=remote_managed is required for "remote_managed: ture". > > > > > > > > > > > > in either case you foudn the correct doc > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > > neutorn sriov port configuration is documented here > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > > > > and nova flavor based pci passthough is documeted here > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > > > > > all three server slightly differnt uses. both neutron proceedures are > > > > exclusivly fo network interfaces. > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > > requires the use of ovn deployed on the dpu > > > > to configure the VF contolplane. 
> > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses > > > > the sriov nic agent > > > > to manage the VF with ip tools. > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is > > > > intended for pci passthough > > > > of stateless acclerorators like qat devices. while the nova flavor > > approch > > > > cna be used with nics it not how its generally > > > > ment to be used and when used to passthough a nic expectation is that > > its > > > > not related to a neuton network. > > > > > > > > > > > > From elfosardo at gmail.com Thu Mar 2 11:26:41 2023 From: elfosardo at gmail.com (Riccardo Pittau) Date: Thu, 2 Mar 2023 12:26:41 +0100 Subject: [infra][ironic][tact-sig] Intent to grant control of x/virtualpdu to OpenStack community In-Reply-To: <20230301225909.enp5xetacmzpjg7o@yuggoth.org> References: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> <20230301225909.enp5xetacmzpjg7o@yuggoth.org> Message-ID: Thanks for this anyway Jeremy! Luckily one of the old maintainers added me to the virtualpdu-core and virtualpdu-release groups, so now we can move forward with the repository update and move easily. Ciao! Riccardo On Thu, Mar 2, 2023 at 12:05?AM Jeremy Stanley wrote: > On 2023-03-01 20:53:45 +0000 (+0000), Jeremy Stanley wrote: > [...] > > We don't have a documented process since this is the first time it's > > really come up, but I'm officially announcing that I intend to use > > my administrative permissions as an OpenDev sysadmin to add > > membership of an OpenStack Ironic team representative to the > > following Gerrit groups if no objections are raised before > > Wednesday, March 8 > [...] > > It seems the additional round of outreach worked, so no longer > requires my direct intervention: > > > https://lists.opendev.org/archives/list/service-discuss at lists.opendev.org/thread/BOF56L5PPIP6CQZJ75LCPHVN7532ZKNJ/ > > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmitrii.shcherbakov at canonical.com Thu Mar 2 12:29:25 2023 From: dmitrii.shcherbakov at canonical.com (Dmitrii Shcherbakov) Date: Thu, 2 Mar 2023 15:29:25 +0300 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> References: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> Message-ID: Hi {Sean, Simon}, > did you ever give a presentation on the DPU support Yes, there were a couple at different stages. The following is the one of the older ones that references the SMARTNIC VNIC type but we later switched to REMOTE_MANAGED in the final code: https://www.openvswitch.org/support/ovscon2021/slides/smartnic_port_binding.pdf, however, it has a useful diagram on page 15 which shows the interactions of different components. A lot of other content from it is present in the OpenStack docs now which we added during the feature development. There is also a presentation with a demo that we did at the Open Infra summit https://youtu.be/Amxp-9yEnsU (I could not attend but we prepared the material after the features got merged). Generally, as Sean described, the aim of this feature is to make the interaction between components present at the hypervisor and the DPU side automatic but, in order to make this workflow explicitly different from SR-IOV or offload at the hypervisor side, one has to use the "remote_managed" flag. 
This flag allows Nova to differentiate between "regular" VFs and the ones that have to be programmed by a remote host (DPU) - hence the name. A port needs to be pre-created with the remote-managed type - that way when Nova tries to schedule a VM with that port attached, it will find hosts which actually have PCI devices tagged with the "remote_managed": "true" in the PCI whitelist. The important thing to note here is that you must not use PCI passthrough directly for this - Nova will create a PCI device request automatically with the remote_managed flag included. There is currently no way to instruct Nova to choose one vendor/device ID vs the other for this (any remote_managed=true device from a pool will match) but maybe the work that was recently done to store PCI device information in the Placement service will pave the way for such granularity in the future. Best Regards, Dmitrii Shcherbakov LP/MM/oftc: dmitriis On Thu, Mar 2, 2023 at 1:54?PM Sean Mooney wrote: > adding Dmitrii who was the primary developer of the openstack integration > so > they can provide more insight. > > Dmitrii did you ever give a presentationon the DPU support and how its > configured/integrated > that might help fill in the gaps for simon? > > more inline. > > On Thu, 2023-03-02 at 11:05 +0800, Simon Jones wrote: > > E... > > > > But there are these things: > > > > 1) Show some real happened in my test: > > > > - Let me clear that, I use DPU in compute node: > > The graph in > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html . > > > > - I configure exactly follow > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, > > which is said bellow in "3) Let me post all what I do follow this link". > > > > - In my test, I found after first three command (which is "openstack > > network create ...", "openstack subnet create", "openstack port create > ..."), > > there are network topology exist in DPU side, and there are rules exist > in > > OVN north DB, south DB of controller, like this: > > > > > ``` > > > root at c1:~# ovn-nbctl show > > > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 > > > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) > > > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 > > > addresses: ["unknown"] > > > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) > > > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] > > > > > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding > > > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a > > > chassis : [] > > > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 > > > encap : [] > > > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24", > > > "neutron:device_id"="", "neutron:device_owner"="", > > > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, > > > "neutron:port_name"=pf0vf1, > > > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", > > > "neutron:revision_number"="1", > > > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} > > > > > > root at c1c2dpu:~# sudo ovs-vsctl show > > > 62cf78e5-2c02-471e-927e-1d69c2c22195 > > > Bridge br-int > > > fail_mode: secure > > > datapath_type: system > > > Port br-int > > > Interface br-int > > > type: internal > > > Port ovn--1 > > > Interface ovn--1 > > > type: geneve > > > options: {csum="true", key=flow, > remote_ip="172.168.2.98"} > > > Port pf0vf1 > > > Interface pf0vf1 > > > ovs_version: "2.17.2-24a81c8" > > > ``` > > > > > That's why I guess "first three command" has already create network > > topology, and 
"openstack server create" command only need to plug VF into > > VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. > no that jsut looks like the standard bridge toplogy that gets created when > you provision > the dpu to be used with openstac vai ovn. > > that looks unrelated to the neuton comamnd you ran. > > > > - In my test, then I run "openstack server create" command, I got ERROR > > which said "No valid host...", which is what the email said above. > > The reason has already said, it's nova-scheduler's PCI filter module > report > > no valid host. The reason "nova-scheduler's PCI filter module report no > > valid host" is nova-scheduler could NOT see PCI information of compute > > node. The reason "nova-scheduler could NOT see PCI information of compute > > node" is compute node's /etc/nova/nova.conf configure remote_managed tag > > like this: > > > > > ``` > > > [pci] > > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > > > "physical_network": null, "remote_managed": "true"} > > > alias = { "vendor_id":"15b3", "product_id":"101e", > > > "device_type":"type-VF", "name":"a1" } > > > ``` > > > > > > > 2) Discuss some detail design of "remote_managed" tag, I don't know if > this > > is right in the design of openstack with DPU: > > > > - In neutron-server side, use remote_managed tag in "openstack port > create > > ..." command. > > This command will make neutron-server / OVN / ovn-controller / ovs to > make > > the network topology done, like above said. > > I this this is right, because test shows that. > that is not correct > your test do not show what you think it does, they show the baisic bridge > toplogy and flow configuraiton that ovn installs by defualt when it manages > as ovs. > > please read the design docs for this feature for both nova and neutron to > understand how the interacction works. > > https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html > > https://specs.openstack.org/openstack/neutron-specs/specs/yoga/off-path-smartnic-dpu-port-binding-with-ovn.html > > > > - In nova side, there are 2 things should process, first is PCI > passthrough > > filter, second is nova-compute to plug VF into VM. > > > > If the link above is right, which remote_managed tag exists in > > /etc/nova/nova.conf of controller node and exists in /etc/nova/nova.conf > of > > compute node. > > As above ("- In my test, then I run "openstack server create" command") > > said, got ERROR in this step. > > So what should do in "PCI passthrough filter" ? How to configure ? > > > > Then, if "PCI passthrough filter" stage pass, what will do of > nova-compute > > in compute node? > > > > 3) Post all what I do follow this link: > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > > - build openstack physical env, link plug DPU into compute mode, use VM > as > > controller ... etc. > > - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. > > - configure DPU side /etc/neutron/neutron.conf > > - configure host side /etc/nova/nova.conf > > - configure host side /etc/nova/nova-compute.conf > > - run first 3 command > > - last, run this command, got ERROR > > > > ---- > > Simon Jones > > > > > > Sean Mooney ?2023?3?1??? 18:35??? > > > > > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: > > > > Thanks a lot !!! > > > > > > > > As you say, I follow > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > > > > And I want to use DPU mode. 
Not "disable DPU mode". > > > > So I think I should follow the link above exactlly, so I use > > > > vnic-type=remote_anaged. > > > > In my opnion, after I run first three command (which is "openstack > > > network > > > > create ...", "openstack subnet create", "openstack port create > ..."), the > > > > VF rep port and OVN and OVS rules are all ready. > > > not at that point nothign will have been done on ovn/ovs > > > > > > that will only happen after the port is bound to a vm and host. > > > > > > > What I should do in "openstack server create ..." is to JUST add PCI > > > device > > > > into VM, do NOT call neutron-server in nova-compute of compute node ( > > > like > > > > call port_binding or something). > > > this is incorrect. > > > > > > > > But as the log and steps said in the emails above, nova-compute call > > > > port_binding to neutron-server while running the command "openstack > > > server > > > > create ...". > > > > > > > > So I still have questions is: > > > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT > > > call > > > > neutron-server in nova-compute of compute node ( like call > port_binding > > > or > > > > something)" . > > > no this is not how its designed. > > > until you attach the logical port to a vm (either at runtime or as > part of > > > vm create) > > > the logical port is not assocated with any host or phsical dpu/vf. > > > > > > so its not possibel to instanciate the openflow rules in ovs form the > > > logical switch model > > > in the ovn north db as no chassie info has been populated and we do not > > > have the dpu serial > > > info in the port binding details. > > > > 2) If it's right, how to deal with this? Which is how to JUST add PCI > > > > device into VM, do NOT call neutron-server? By command or by > configure? > > > Is > > > > there come document ? > > > no this happens automaticaly when nova does the port binding which > cannot > > > happen until after > > > teh vm is schduled to a host. > > > > > > > > ---- > > > > Simon Jones > > > > > > > > > > > > Sean Mooney ?2023?3?1??? 16:15??? > > > > > > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > > > > > BTW, this link ( > > > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > > > > > said > > > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that > WRONG ? > > > > > > > > > > no its not wrong but for dpu smart nics you have to make a choice > when > > > you > > > > > deploy > > > > > either they can be used in dpu mode in which case remote_managed > > > shoudl be > > > > > set to true > > > > > and you can only use them via neutron ports with > > > vnic-type=remote_managed > > > > > as descried in that doc > > > > > > > > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > > > > > > > > > > > > > or if you disable dpu mode in the nic frimware then you shoudl > remvoe > > > > > remote_managed form the pci device list and > > > > > then it can be used liek a normal vf either for neutron sriov ports > > > > > vnic-type=direct or via flavor based pci passthough. 
> > > > > > > > > > the issue you were havign is you configured the pci device list to > > > contain > > > > > "remote_managed: ture" which means > > > > > the vf can only be consumed by a neutron port with > > > > > vnic-type=remote_managed, when you have "remote_managed: false" or > > > unset > > > > > you can use it via vnic-type=direct i forgot that slight detail > that > > > > > vnic-type=remote_managed is required for "remote_managed: ture". > > > > > > > > > > > > > > > in either case you foudn the correct doc > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > > > neutorn sriov port configuration is documented here > > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > > > > > and nova flavor based pci passthough is documeted here > > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > > > > > > > all three server slightly differnt uses. both neutron proceedures > are > > > > > exclusivly fo network interfaces. > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > > > requires the use of ovn deployed on the dpu > > > > > to configure the VF contolplane. > > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > uses > > > > > the sriov nic agent > > > > > to manage the VF with ip tools. > > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > is > > > > > intended for pci passthough > > > > > of stateless acclerorators like qat devices. while the nova flavor > > > approch > > > > > cna be used with nics it not how its generally > > > > > ment to be used and when used to passthough a nic expectation is > that > > > its > > > > > not related to a neuton network. > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.therond at bitswalk.com Thu Mar 2 14:00:03 2023 From: gael.therond at bitswalk.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Thu, 2 Mar 2023 15:00:03 +0100 Subject: [OPENSTACKSDK] - Missing feature or bad reader? Message-ID: Hi everyone, I'm currently adding a new module on ansible-collections-openstack, however, I'm having a hard time finding the appropriate function for my module. Within ansible-collections-openstack, we have a compute_services_info module that list services using the conn.compute.services() function that, to my knowledge, come from the openstacksdk.compute.v2.service.py module. Our ansible module replicate what you get with: openstack --os-cloud compute service list command. or openstack --os-cloud volume service list command. (If I'm not wrong, it seems, openstack client is leveraging osc-lib for that and not the SDK). My problem is, I want to add another similar module on our collection, (volume_services_info) that would do the same but for volumes services: Unfortunately, either I'm not looking at the right place, or any volume endpoint (v2/v3) within the openstack sdk is implementing the appropriate service module. Did I miss something or is that class simply missing? Thanks everyone! -------------- next part -------------- An HTML attachment was scrubbed... URL: From ozzzo at yahoo.com Thu Mar 2 14:15:03 2023 From: ozzzo at yahoo.com (Albert Braden) Date: Thu, 2 Mar 2023 14:15:03 +0000 (UTC) Subject: (OpenStack-Upgrade) In-Reply-To: References: Message-ID: <821443260.1309518.1677766503515@mail.yahoo.com> Having done a few upgrades, I can give you some general advice: 1. If you can avoid upgrading, do it! 
If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. 2. If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: Hey, Regarding rollaback of upgrade in OSA we indeed don't have any good established/documented process for that. At the same time it should be completely possible with some "BUT". It also depends on what exactly you want to rollback - roles, openstack services or both. As OSA roles can actually install any openstack service version. We keep all virtualenvs from the previous version, so during upgrade we build just new virtualenvs and reconfigure systemd units to point there. So fastest way likely would be to just edit systemd unit files and point them to old venv version and reload systemd daemon and service and restore DB from backup of course. You can also define? _venv_tag (ie `glance_venv_tag`) to the old OSA version you was running and execute openstack-ansible os--install.yml --tags? systemd-service,uwsgi - that in most cases will be enough to just edit systemd units for the service and start old version of it. BUT running without tags will result in having new packages in old venv which is smth you totally want to avoid. To prevent that you can also define _git_install_branch and requirements_git_install_branch in /etc/openstack_deploy/group_vars (it's important to use group vars if you want to rollback only one service) and take value from https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml (ofc pick your old version!) For a full rollback and not in-place workarounds, I think it should be like that * checkout to previous osa version * re-execute scripts/bootstrap-ansible.sh * you should still take current versions of mariadb and rabbitmq and define them in user_variables (galera_major_version, galera_minor_version, rabbitmq_package_version, rabbitmq_erlang_version_spec) - it's close to never ends well downgrading these. * Restore DB backup * Re-run setup-openstack.yml It's quite a rough summary of how I do see this process, but to be frank I never had to execute full downgrade - I was limited mostly by downgrading 1 service tops after the upgrade. Hope that helps! ??, 1 ???. 2023??. ? 
12:06, Adivya Singh : > > hi Alvaro, > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > but what is the roll back procedure , i m looking for > > Regards > Adivya Singh > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: >> >> That will depend on how did you installed your environment: OSA, TripleO, etc. >> >> Can you provide more information? >> >> --- >> Alvaro Soto. >> >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. >> ---------------------------------------------------------- >> Great people talk about ideas, >> ordinary people talk about things, >> small people talk... about other people. >> >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: >>> >>> Hi Team, >>> >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. >>> >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database >>> >>> Regards >>> Adivya Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From artem.goncharov at gmail.com Thu Mar 2 14:35:25 2023 From: artem.goncharov at gmail.com (artem.goncharov at gmail.com) Date: Thu, 02 Mar 2023 15:35:25 +0100 Subject: [OPENSTACKSDK] - Missing feature or bad reader? In-Reply-To: References: Message-ID: <2674425.mvXUDI8C0e@nuc> Hi On Thursday, 2 March 2023 15:00:03 CET Ga?l THEROND wrote: > Hi everyone, > > I'm currently adding a new module on ansible-collections-openstack, > however, I'm having a hard time finding the appropriate function for my > module. > > Within ansible-collections-openstack, we have a compute_services_info > module that list services using the conn.compute.services() function that, > to my knowledge, come from the openstacksdk.compute.v2.service.py module. > > Our ansible module replicate what you get with: > openstack --os-cloud compute service list command. > or > openstack --os-cloud volume service list command. > (If I'm not wrong, it seems, openstack client is leveraging osc-lib for > that and not the SDK). > > My problem is, I want to add another similar module on our collection, > (volume_services_info) that would do the same but for volumes services: > > Unfortunately, either I'm not looking at the right place, or any volume > endpoint (v2/v3) within the openstack sdk is implementing the appropriate > service module. From what I see block_storage in SDK is currently missing implementation for service management (it is an admin-only and such APIs tend to be of lower prio in SDK). > > Did I miss something or is that class simply missing? > > Thanks everyone! Artem From batmanustc at gmail.com Thu Mar 2 03:05:53 2023 From: batmanustc at gmail.com (Simon Jones) Date: Thu, 2 Mar 2023 11:05:53 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: E... But there are these things: 1) Show some real happened in my test: - Let me clear that, I use DPU in compute node: The graph in https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html . - I configure exactly follow https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, which is said bellow in "3) Let me post all what I do follow this link". 
- In my test, I found after first three command (which is "openstack network create ...", "openstack subnet create", "openstack port create ..."), there are network topology exist in DPU side, and there are rules exist in OVN north DB, south DB of controller, like this: > ``` > root at c1:~# ovn-nbctl show > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 > addresses: ["unknown"] > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a > chassis : [] > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 > encap : [] > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24", > "neutron:device_id"="", "neutron:device_owner"="", > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, > "neutron:port_name"=pf0vf1, > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", > "neutron:revision_number"="1", > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} > > root at c1c2dpu:~# sudo ovs-vsctl show > 62cf78e5-2c02-471e-927e-1d69c2c22195 > Bridge br-int > fail_mode: secure > datapath_type: system > Port br-int > Interface br-int > type: internal > Port ovn--1 > Interface ovn--1 > type: geneve > options: {csum="true", key=flow, remote_ip="172.168.2.98"} > Port pf0vf1 > Interface pf0vf1 > ovs_version: "2.17.2-24a81c8" > ``` > That's why I guess "first three command" has already create network topology, and "openstack server create" command only need to plug VF into VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. - In my test, then I run "openstack server create" command, I got ERROR which said "No valid host...", which is what the email said above. The reason has already said, it's nova-scheduler's PCI filter module report no valid host. The reason "nova-scheduler's PCI filter module report no valid host" is nova-scheduler could NOT see PCI information of compute node. The reason "nova-scheduler could NOT see PCI information of compute node" is compute node's /etc/nova/nova.conf configure remote_managed tag like this: > ``` > [pci] > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > "physical_network": null, "remote_managed": "true"} > alias = { "vendor_id":"15b3", "product_id":"101e", > "device_type":"type-VF", "name":"a1" } > ``` > 2) Discuss some detail design of "remote_managed" tag, I don't know if this is right in the design of openstack with DPU: - In neutron-server side, use remote_managed tag in "openstack port create ..." command. This command will make neutron-server / OVN / ovn-controller / ovs to make the network topology done, like above said. I this this is right, because test shows that. - In nova side, there are 2 things should process, first is PCI passthrough filter, second is nova-compute to plug VF into VM. If the link above is right, which remote_managed tag exists in /etc/nova/nova.conf of controller node and exists in /etc/nova/nova.conf of compute node. As above ("- In my test, then I run "openstack server create" command") said, got ERROR in this step. So what should do in "PCI passthrough filter" ? How to configure ? Then, if "PCI passthrough filter" stage pass, what will do of nova-compute in compute node? 3) Post all what I do follow this link: https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. 
- build openstack physical env, link plug DPU into compute mode, use VM as controller ... etc. - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. - configure DPU side /etc/neutron/neutron.conf - configure host side /etc/nova/nova.conf - configure host side /etc/nova/nova-compute.conf - run first 3 command - last, run this command, got ERROR ---- Simon Jones Sean Mooney ?2023?3?1??? 18:35??? > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: > > Thanks a lot !!! > > > > As you say, I follow > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > > And I want to use DPU mode. Not "disable DPU mode". > > So I think I should follow the link above exactlly, so I use > > vnic-type=remote_anaged. > > In my opnion, after I run first three command (which is "openstack > network > > create ...", "openstack subnet create", "openstack port create ..."), the > > VF rep port and OVN and OVS rules are all ready. > not at that point nothign will have been done on ovn/ovs > > that will only happen after the port is bound to a vm and host. > > > What I should do in "openstack server create ..." is to JUST add PCI > device > > into VM, do NOT call neutron-server in nova-compute of compute node ( > like > > call port_binding or something). > this is incorrect. > > > > But as the log and steps said in the emails above, nova-compute call > > port_binding to neutron-server while running the command "openstack > server > > create ...". > > > > So I still have questions is: > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT > call > > neutron-server in nova-compute of compute node ( like call port_binding > or > > something)" . > no this is not how its designed. > until you attach the logical port to a vm (either at runtime or as part of > vm create) > the logical port is not assocated with any host or phsical dpu/vf. > > so its not possibel to instanciate the openflow rules in ovs form the > logical switch model > in the ovn north db as no chassie info has been populated and we do not > have the dpu serial > info in the port binding details. > > 2) If it's right, how to deal with this? Which is how to JUST add PCI > > device into VM, do NOT call neutron-server? By command or by configure? > Is > > there come document ? > no this happens automaticaly when nova does the port binding which cannot > happen until after > teh vm is schduled to a host. > > > > ---- > > Simon Jones > > > > > > Sean Mooney ?2023?3?1??? 16:15??? > > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > > > BTW, this link ( > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > > > said > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? > > > > > > no its not wrong but for dpu smart nics you have to make a choice when > you > > > deploy > > > either they can be used in dpu mode in which case remote_managed > shoudl be > > > set to true > > > and you can only use them via neutron ports with > vnic-type=remote_managed > > > as descried in that doc > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > > > > > > > or if you disable dpu mode in the nic frimware then you shoudl remvoe > > > remote_managed form the pci device list and > > > then it can be used liek a normal vf either for neutron sriov ports > > > vnic-type=direct or via flavor based pci passthough. 
> > > > > > the issue you were havign is you configured the pci device list to > contain > > > "remote_managed: ture" which means > > > the vf can only be consumed by a neutron port with > > > vnic-type=remote_managed, when you have "remote_managed: false" or > unset > > > you can use it via vnic-type=direct i forgot that slight detail that > > > vnic-type=remote_managed is required for "remote_managed: ture". > > > > > > > > > in either case you foudn the correct doc > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > neutorn sriov port configuration is documented here > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > > > and nova flavor based pci passthough is documeted here > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > > > all three server slightly differnt uses. both neutron proceedures are > > > exclusivly fo network interfaces. > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > requires the use of ovn deployed on the dpu > > > to configure the VF contolplane. > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses > > > the sriov nic agent > > > to manage the VF with ip tools. > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is > > > intended for pci passthough > > > of stateless acclerorators like qat devices. while the nova flavor > approch > > > cna be used with nics it not how its generally > > > ment to be used and when used to passthough a nic expectation is that > its > > > not related to a neuton network. > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 2 15:47:36 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 2 Mar 2023 16:47:36 +0100 Subject: (OpenStack-Upgrade) In-Reply-To: <821443260.1309518.1677766503515@mail.yahoo.com> References: <821443260.1309518.1677766503515@mail.yahoo.com> Message-ID: These are very weird statements and I can not agree with most of them. 1. You should upgrade in time. All problems come if you try to avoid upgrades at any costs - then you're indeed in a situation when upgrades are painful as you're running obsolete stuff that is not supported anymore and not provided by your distro (or distro also is not supported as well). With SLURP releases you will be able to do upgrades yearly starting with Antelope. Before that upgrades should be done each 6 month basically. Jumping through 1 release was not supported before but is doable given some preparation and small hacks. Jumping through more than 1 release will almost certainly guarantee you pain. Upgrades to next releases are well tested both by individual projects and by OpenStack-Ansible, so given you've looked through release notes and adjusted configuration - it should be just fine. 2. It's quite an easy and relatively smooth process as of today. Yes, you will have small API interruptions during the upgrade and when services do restart they drop connections. But we control HAproxy backends to minimize the effect of this. In many cases upgrade can be performed just running scripts/run_upgrade.sh - it will work given it's ran against healthy cluster (meaning that you don't have dead galera or rabbit node in your cluster). At the moment we spend around a working day for upgrading a region, but planning to automate this process soonish to perform upgrades of production environments using Zuul. 
We also never had to rollback, as rollback is indeed painful process that you can hard process. So I won't sugggest rolling back production environment unless it's absolutely needed. 3. This is smth I will agree with. You can take a look at our MNAIO [1] that can help you to spawn a virtual sandbox with multiple nodes in it, where you can play with upgrades. Also I'd suggest running tempest or rally tests regularly. They are helpful indeed. 4. I'm not sure what's meant here at all. I can hardly imagine how you can fail an OpenStack upgrade in a way that you will lose customer data. I can recall such failures with Ceph though, but it was somewhere around Hammer release (0.84 or smth) which is not the case for quite a while as well. [1] https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/multi-node-aio ??, 2 ???. 2023??. ? 15:15, Albert Braden : > > Having done a few upgrades, I can give you some general advice: > > 1. If you can avoid upgrading, do it! If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. > > 2. If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. > > 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. > > 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. > > 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. > On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: > > > Hey, > > Regarding rollaback of upgrade in OSA we indeed don't have any good > established/documented process for that. At the same time it should be > completely possible with some "BUT". It also depends on what exactly > you want to rollback - roles, openstack services or both. As OSA roles > can actually install any openstack service version. > > We keep all virtualenvs from the previous version, so during upgrade > we build just new virtualenvs and reconfigure systemd units to point > there. So fastest way likely would be to just edit systemd unit files > and point them to old venv version and reload systemd daemon and > service and restore DB from backup of course. > You can also define _venv_tag (ie `glance_venv_tag`) to the > old OSA version you was running and execute openstack-ansible > os--install.yml --tags systemd-service,uwsgi - that in most > cases will be enough to just edit systemd units for the service and > start old version of it. BUT running without tags will result in > having new packages in old venv which is smth you totally want to > avoid. 
> To prevent that you can also define _git_install_branch and > requirements_git_install_branch in /etc/openstack_deploy/group_vars > (it's important to use group vars if you want to rollback only one > service) and take value from > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > (ofc pick your old version!) > > For a full rollback and not in-place workarounds, I think it should be like that > * checkout to previous osa version > * re-execute scripts/bootstrap-ansible.sh > * you should still take current versions of mariadb and rabbitmq and > define them in user_variables (galera_major_version, > galera_minor_version, rabbitmq_package_version, > rabbitmq_erlang_version_spec) - it's close to never ends well > downgrading these. > * Restore DB backup > * Re-run setup-openstack.yml > > It's quite a rough summary of how I do see this process, but to be > frank I never had to execute full downgrade - I was limited mostly by > downgrading 1 service tops after the upgrade. > > Hope that helps! > > ??, 1 ???. 2023??. ? 12:06, Adivya Singh : > > > > > hi Alvaro, > > > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > > > but what is the roll back procedure , i m looking for > > > > Regards > > Adivya Singh > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > >> > >> That will depend on how did you installed your environment: OSA, TripleO, etc. > >> > >> Can you provide more information? > >> > >> --- > >> Alvaro Soto. > >> > >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > >> ---------------------------------------------------------- > >> Great people talk about ideas, > >> ordinary people talk about things, > >> small people talk... about other people. > >> > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: > >>> > >>> Hi Team, > >>> > >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. > >>> > >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database > >>> > >>> Regards > >>> Adivya Singh > From gael.therond at bitswalk.com Thu Mar 2 16:07:07 2023 From: gael.therond at bitswalk.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Thu, 2 Mar 2023 17:07:07 +0100 Subject: [OPENSTACKSDK] - Missing feature or bad reader? Message-ID: > Hi everyone, > > I'm currently adding a new module on ansible-collections-openstack, > however, I'm having a hard time finding the appropriate function for my > module. > > Within ansible-collections-openstack, we have a compute_services_info > module that list services using the conn.compute.services() function that, > to my knowledge, come from the openstacksdk.compute.v2.service.py module. > > Our ansible module replicate what you get with: > openstack --os-cloud compute service list command. > or > openstack --os-cloud volume service list command. > (If I'm not wrong, it seems, openstack client is leveraging osc-lib for > that and not the SDK). > > My problem is, I want to add another similar module on our collection, > (volume_services_info) that would do the same but for volumes services: > > Unfortunately, either I'm not looking at the right place, or any volume > endpoint (v2/v3) within the openstack sdk is implementing the appropriate > service module. 
>From what I see block_storage in SDK is currently missing implementation for >service management (it is an admin-only and such APIs tend to be of lower prio >in SDK). >Artem All right, thanks a lot for the confirmation Artem, gonna see it we could add that pretty quickly on SDK as cinder already provide the appropriate /os-service API endpoint. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ozzzo at yahoo.com Thu Mar 2 17:03:21 2023 From: ozzzo at yahoo.com (Albert Braden) Date: Thu, 2 Mar 2023 17:03:21 +0000 (UTC) Subject: (OpenStack-Upgrade) In-Reply-To: References: <821443260.1309518.1677766503515@mail.yahoo.com> Message-ID: <1288757490.3453002.1677776601143@mail.yahoo.com> 1. Of course you should upgrade every 6 months. I've never seen or heard of anyone doing that, but if you have the resources, I agree, that would be great. And yes, if you're upgrading a few versions, you may need to do one or more operating system upgrades along the way. 2. I've never seen an easy, smooth process. That being said, I've never done a single-version upgrade. If you upgrade every 6 months, then maybe it would be smooth and easy. The standard situation I saw during my contracting years is that a company has got themselves into a bind because they have a small team (or maybe 1 guy) running Openstack, and they haven't upgraded for a long time, so they hire me to clean up the mess. 4 (I think you meant 5 here?). I've never lost resources during an upgrade, but I would never promise customers that there is 0 percentage chance of loss. I always recommend that the customer be resilient against loss, for example by duplicating their application in multiple clusters and by maintaining backups of important data, and I strengthen that recommendation during upgrades. On Thursday, March 2, 2023, 10:56:47 AM EST, Dmitriy Rabotyagov wrote: These are very weird statements and I can not agree with most of them. 1. You should upgrade in time. All problems come if you try to avoid upgrades at any costs - then you're indeed in a situation when upgrades are painful as you're running obsolete stuff that is not supported anymore and not provided by your distro (or distro also is not supported as well). With SLURP releases you will be able to do upgrades yearly starting with Antelope. Before that upgrades should be done each 6 month basically. Jumping through 1 release was not supported before but is doable given some preparation and small hacks. Jumping through more than 1 release will almost certainly guarantee you pain. Upgrades to next releases are well tested both by individual projects and by OpenStack-Ansible, so given you've looked through release notes and adjusted configuration - it should be just fine. 2. It's quite an easy and relatively smooth process as of today. Yes, you will have small API interruptions during the upgrade and when services do restart they drop connections. But we control HAproxy backends to minimize the effect of this. In many cases upgrade can be performed just running scripts/run_upgrade.sh - it will work given it's ran against healthy cluster (meaning that you don't have dead galera or rabbit node in your cluster). At the moment we spend around a working day for upgrading a region, but planning to automate this process soonish to perform upgrades of production environments using Zuul. We also never had to rollback, as rollback is indeed painful process that you can hard process. 
So I won't sugggest rolling back production environment unless it's absolutely needed. 3. This is smth I will agree with. You can take a look at our MNAIO [1] that can help you to spawn a virtual sandbox with multiple nodes in it, where you can play with upgrades. Also I'd suggest running tempest or rally tests regularly. They are helpful indeed. 4. I'm not sure what's meant here at all. I can hardly imagine how you can fail an OpenStack upgrade in a way that you will lose customer data. I can recall such failures with Ceph though, but it was somewhere around Hammer release (0.84 or smth) which is not the case for quite a while as well. [1] https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/multi-node-aio ??, 2 ???. 2023??. ? 15:15, Albert Braden : > > Having done a few upgrades, I can give you some general advice: > > 1. If you can avoid upgrading, do it! If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. > > 2. If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. > > 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. > > 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. > > 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. > On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: > > > Hey, > > Regarding rollaback of upgrade in OSA we indeed don't have any good > established/documented process for that. At the same time it should be > completely possible with some "BUT". It also depends on what exactly > you want to rollback - roles, openstack services or both. As OSA roles > can actually install any openstack service version. > > We keep all virtualenvs from the previous version, so during upgrade > we build just new virtualenvs and reconfigure systemd units to point > there. So fastest way likely would be to just edit systemd unit files > and point them to old venv version and reload systemd daemon and > service and restore DB from backup of course. > You can also define? _venv_tag (ie `glance_venv_tag`) to the > old OSA version you was running and execute openstack-ansible > os--install.yml --tags? systemd-service,uwsgi - that in most > cases will be enough to just edit systemd units for the service and > start old version of it. BUT running without tags will result in > having new packages in old venv which is smth you totally want to > avoid. 
> To prevent that you can also define _git_install_branch and > requirements_git_install_branch in /etc/openstack_deploy/group_vars > (it's important to use group vars if you want to rollback only one > service) and take value from > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > (ofc pick your old version!) > > For a full rollback and not in-place workarounds, I think it should be like that > * checkout to previous osa version > * re-execute scripts/bootstrap-ansible.sh > * you should still take current versions of mariadb and rabbitmq and > define them in user_variables (galera_major_version, > galera_minor_version, rabbitmq_package_version, > rabbitmq_erlang_version_spec) - it's close to never ends well > downgrading these. > * Restore DB backup > * Re-run setup-openstack.yml > > It's quite a rough summary of how I do see this process, but to be > frank I never had to execute full downgrade - I was limited mostly by > downgrading 1 service tops after the upgrade. > > Hope that helps! > > ??, 1 ???. 2023??. ? 12:06, Adivya Singh : > > > > > hi Alvaro, > > > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > > > but what is the roll back procedure , i m looking for > > > > Regards > > Adivya Singh > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > >> > >> That will depend on how did you installed your environment: OSA, TripleO, etc. > >> > >> Can you provide more information? > >> > >> --- > >> Alvaro Soto. > >> > >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > >> ---------------------------------------------------------- > >> Great people talk about ideas, > >> ordinary people talk about things, > >> small people talk... about other people. > >> > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: > >>> > >>> Hi Team, > >>> > >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. > >>> > >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database > >>> > >>> Regards > >>> Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimmy at openinfra.dev Thu Mar 2 19:21:04 2023 From: jimmy at openinfra.dev (Jimmy McArthur) Date: Thu, 2 Mar 2023 14:21:04 -0500 Subject: (OpenStack-Upgrade) In-Reply-To: <821443260.1309518.1677766503515@mail.yahoo.com> References: <821443260.1309518.1677766503515@mail.yahoo.com> Message-ID: Hi Albert, I would highly recommend checking out a few episodes of OpenInfra Live specifically around large scale upgrades. For example [1]. There are a number of organizations running OpenStack that stay on the most current release. From large to small, it?s well worth your time to stay up to date as much as possible. Cheers, Jimmy [1] https://superuser.openstack.org/articles/upgrades-in-large-scale-openstack-infrastructure-openinfra-live-episode-6/ > On Mar 2, 2023, at 9:15 AM, Albert Braden wrote: > > Having done a few upgrades, I can give you some general advice: > > 1. If you can avoid upgrading, do it! If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. > > 2. 
If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. > > 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. > > 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. > > 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. > On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: > > > Hey, > > Regarding rollaback of upgrade in OSA we indeed don't have any good > established/documented process for that. At the same time it should be > completely possible with some "BUT". It also depends on what exactly > you want to rollback - roles, openstack services or both. As OSA roles > can actually install any openstack service version. > > We keep all virtualenvs from the previous version, so during upgrade > we build just new virtualenvs and reconfigure systemd units to point > there. So fastest way likely would be to just edit systemd unit files > and point them to old venv version and reload systemd daemon and > service and restore DB from backup of course. > You can also define _venv_tag (ie `glance_venv_tag`) to the > old OSA version you was running and execute openstack-ansible > os--install.yml --tags systemd-service,uwsgi - that in most > cases will be enough to just edit systemd units for the service and > start old version of it. BUT running without tags will result in > having new packages in old venv which is smth you totally want to > avoid. > To prevent that you can also define _git_install_branch and > requirements_git_install_branch in /etc/openstack_deploy/group_vars > (it's important to use group vars if you want to rollback only one > service) and take value from > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > (ofc pick your old version!) > > For a full rollback and not in-place workarounds, I think it should be like that > * checkout to previous osa version > * re-execute scripts/bootstrap-ansible.sh > * you should still take current versions of mariadb and rabbitmq and > define them in user_variables (galera_major_version, > galera_minor_version, rabbitmq_package_version, > rabbitmq_erlang_version_spec) - it's close to never ends well > downgrading these. > * Restore DB backup > * Re-run setup-openstack.yml > > It's quite a rough summary of how I do see this process, but to be > frank I never had to execute full downgrade - I was limited mostly by > downgrading 1 service tops after the upgrade. > > Hope that helps! > > ??, 1 ???. 2023??. ? 
12:06, Adivya Singh >: > > > > > hi Alvaro, > > > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > > > but what is the roll back procedure , i m looking for > > > > Regards > > Adivya Singh > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto > wrote: > >> > >> That will depend on how did you installed your environment: OSA, TripleO, etc. > >> > >> Can you provide more information? > >> > >> --- > >> Alvaro Soto. > >> > >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > >> ---------------------------------------------------------- > >> Great people talk about ideas, > >> ordinary people talk about things, > >> small people talk... about other people. > >> > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh > wrote: > >>> > >>> Hi Team, > >>> > >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. > >>> > >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database > >>> > >>> Regards > >>> Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 2 19:24:26 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 2 Mar 2023 20:24:26 +0100 Subject: (OpenStack-Upgrade) In-Reply-To: <1288757490.3453002.1677776601143@mail.yahoo.com> References: <821443260.1309518.1677766503515@mail.yahoo.com> <1288757490.3453002.1677776601143@mail.yahoo.com> Message-ID: Oh, well, that explains your attitude to upgrades then. But basically it's all about collecting and sorting out a technical debt, that IS collected by avoiding upgrades for as long as possible. Out of my experience, a team of 3-4 engineers is capable of maintaining and regularly upgrading OpenStack. Yes, maybe not once in 6 month, but once a year for sure. And performing 2 sequential upgrades is not that big of a deal - it's kind of 20 hours per year per region if you don't have time or knowledge to deal with small hackeries for jumping through 1 release (which is usually not a big deal). Based on that my advice would be to prevent having and collecting technical debt, as while it might feel cheaper to not invest time in maintenance, dealing with debt is always more expensive. So do not be afraid of upgrades if they're done in a timely manner, using maintained and supported versions of software is always better then legacy and EOLed ones. We were also discussing the upgrade process with OpenStack-Ansible, which is being used as a deployment tool, which does simplify the upgrade process. I bet kolla-ansible also do a damn good job with their upgrades. But I do understand how much a PITA heterogeneous deployments can be. And yeah, I meant 5. Regarding 4 - I kind of agree - each deployment is individual especially with some time. And it's really true that on production you will see issues you never saw in CI or DEV environments, but such issues will be mostly related to the load or not exact same configuration of dev envs. I'd say a good example of that might be OVS or l3 agents, that will take way more time to startup on production compared to sandbox where you won't spot any downtime or issues. ??, 2 ???. 2023??. ? 18:03, Albert Braden : > > 1. Of course you should upgrade every 6 months. I've never seen or heard of anyone doing that, but if you have the resources, I agree, that would be great. 
And yes, if you're upgrading a few versions, you may need to do one or more operating system upgrades along the way. > > 2. I've never seen an easy, smooth process. That being said, I've never done a single-version upgrade. If you upgrade every 6 months, then maybe it would be smooth and easy. The standard situation I saw during my contracting years is that a company has got themselves into a bind because they have a small team (or maybe 1 guy) running Openstack, and they haven't upgraded for a long time, so they hire me to clean up the mess. > > 4 (I think you meant 5 here?). I've never lost resources during an upgrade, but I would never promise customers that there is 0 percentage chance of loss. I always recommend that the customer be resilient against loss, for example by duplicating their application in multiple clusters and by maintaining backups of important data, and I strengthen that recommendation during upgrades. > On Thursday, March 2, 2023, 10:56:47 AM EST, Dmitriy Rabotyagov wrote: > > > These are very weird statements and I can not agree with most of them. > > 1. You should upgrade in time. All problems come if you try to avoid > upgrades at any costs - then you're indeed in a situation when > upgrades are painful as you're running obsolete stuff that is not > supported anymore and not provided by your distro (or distro also is > not supported as well). > With SLURP releases you will be able to do upgrades yearly starting > with Antelope. Before that upgrades should be done each 6 month > basically. Jumping through 1 release was not supported before but is > doable given some preparation and small hacks. Jumping through more > than 1 release will almost certainly guarantee you pain. Upgrades to > next releases are well tested both by individual projects and by > OpenStack-Ansible, so given you've looked through release notes and > adjusted configuration - it should be just fine. > > 2. It's quite an easy and relatively smooth process as of today. Yes, > you will have small API interruptions during the upgrade and when > services do restart they drop connections. But we control HAproxy > backends to minimize the effect of this. In many cases upgrade can be > performed just running scripts/run_upgrade.sh - it will work given > it's ran against healthy cluster (meaning that you don't have dead > galera or rabbit node in your cluster). At the moment we spend around > a working day for upgrading a region, but planning to automate this > process soonish to perform upgrades of production environments using > Zuul. We also never had to rollback, as rollback is indeed painful > process that you can hard process. So I won't sugggest rolling back > production environment unless it's absolutely needed. > > 3. This is smth I will agree with. You can take a look at our MNAIO > [1] that can help you to spawn a virtual sandbox with multiple nodes > in it, where you can play with upgrades. Also I'd suggest running > tempest or rally tests regularly. They are helpful indeed. > > 4. I'm not sure what's meant here at all. I can hardly imagine how you > can fail an OpenStack upgrade in a way that you will lose customer > data. I can recall such failures with Ceph though, but it was > somewhere around Hammer release (0.84 or smth) which is not the case > for quite a while as well. > > > [1] https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/multi-node-aio > > ??, 2 ???. 2023??. ? 
15:15, Albert Braden : > > > > Having done a few upgrades, I can give you some general advice: > > > > 1. If you can avoid upgrading, do it! If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. > > > > 2. If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. > > > > 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. > > > > 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. > > > > 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. > > On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: > > > > > > Hey, > > > > Regarding rollaback of upgrade in OSA we indeed don't have any good > > established/documented process for that. At the same time it should be > > completely possible with some "BUT". It also depends on what exactly > > you want to rollback - roles, openstack services or both. As OSA roles > > can actually install any openstack service version. > > > > We keep all virtualenvs from the previous version, so during upgrade > > we build just new virtualenvs and reconfigure systemd units to point > > there. So fastest way likely would be to just edit systemd unit files > > and point them to old venv version and reload systemd daemon and > > service and restore DB from backup of course. > > You can also define _venv_tag (ie `glance_venv_tag`) to the > > old OSA version you was running and execute openstack-ansible > > os--install.yml --tags systemd-service,uwsgi - that in most > > cases will be enough to just edit systemd units for the service and > > start old version of it. BUT running without tags will result in > > having new packages in old venv which is smth you totally want to > > avoid. > > To prevent that you can also define _git_install_branch and > > requirements_git_install_branch in /etc/openstack_deploy/group_vars > > (it's important to use group vars if you want to rollback only one > > service) and take value from > > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > > (ofc pick your old version!) > > > > For a full rollback and not in-place workarounds, I think it should be like that > > * checkout to previous osa version > > * re-execute scripts/bootstrap-ansible.sh > > * you should still take current versions of mariadb and rabbitmq and > > define them in user_variables (galera_major_version, > > galera_minor_version, rabbitmq_package_version, > > rabbitmq_erlang_version_spec) - it's close to never ends well > > downgrading these. 
> > * Restore DB backup > > * Re-run setup-openstack.yml > > > > It's quite a rough summary of how I do see this process, but to be > > frank I never had to execute full downgrade - I was limited mostly by > > downgrading 1 service tops after the upgrade. > > > > Hope that helps! > > > > ??, 1 ???. 2023??. ? 12:06, Adivya Singh : > > > > > > > > hi Alvaro, > > > > > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > > > > > but what is the roll back procedure , i m looking for > > > > > > Regards > > > Adivya Singh > > > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > > >> > > >> That will depend on how did you installed your environment: OSA, TripleO, etc. > > >> > > >> Can you provide more information? > > >> > > >> --- > > >> Alvaro Soto. > > >> > > >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > > >> ---------------------------------------------------------- > > >> Great people talk about ideas, > > >> ordinary people talk about things, > > >> small people talk... about other people. > > >> > > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: > > >>> > > >>> Hi Team, > > >>> > > >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. > > >>> > > >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database > > >>> > > >>> Regards > > >>> Adivya Singh > > > From kozhukalov at gmail.com Thu Mar 2 21:07:06 2023 From: kozhukalov at gmail.com (Vladimir Kozhukalov) Date: Fri, 3 Mar 2023 00:07:06 +0300 Subject: [openstack-helm] Get rid of cephfs and rbd provisioners Message-ID: Hi everyone, I would like to suggest getting rid of cephfs and rbd provisioners. They have been retired and have not been maintained for about 2.5 years now [1]. I believe the CSI approach is what all users rely on nowadays and we can safely remove them. The trigger for this suggestion is that we are currently experiencing issues while trying to switch cephfs provisioner to Ubuntu Focal and fixing this is just wasting time. [2] Stephen spent some time debugging the issues and can give more details if needed. What do you think? [1] https://github.com/kubernetes-retired/external-storage/tree/master/ceph [2] https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976 -- Best regards, Kozhukalov Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Thu Mar 2 22:13:47 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 2 Mar 2023 23:13:47 +0100 Subject: [openstack-helm] Get rid of cephfs and rbd provisioners In-Reply-To: References: Message-ID: Hi Vladimir, I agree. I also think we should stop maintaining the CSI provisioner chart and simply deploy the one provided by the Ceph CSI team Less code we maintain, the better. Thanks Mohammed On Thu, Mar 2, 2023 at 10:13?PM Vladimir Kozhukalov wrote: > Hi everyone, > > I would like to suggest getting rid of cephfs and rbd provisioners. They > have been retired and have not been maintained for about 2.5 years now [1]. > I believe the CSI approach is what all users rely on nowadays and we can > safely remove them. > > The trigger for this suggestion is that we are currently experiencing > issues while trying to switch cephfs provisioner to Ubuntu Focal and fixing > this is just wasting time. 
[2] Stephen spent some time debugging the issues > and can give more details if needed. > > What do you think? > > [1] > https://github.com/kubernetes-retired/external-storage/tree/master/ceph > [2] https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976 > -- > Best regards, > Kozhukalov Vladimir > -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 2 23:04:57 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 2 Mar 2023 15:04:57 -0800 Subject: [infra][ironic][tact-sig] Intent to grant control of x/virtualpdu to OpenStack community In-Reply-To: References: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> <20230301225909.enp5xetacmzpjg7o@yuggoth.org> Message-ID: Thanks for the work, VirtualPDU-maintainers-emeritus, Jeremy, and Riccardo. The change just merged into governance for Ironic to officially take over this repo ( https://opendev.org/openstack/governance/commit/db8597ad9245ce2c115b4d5140a6d81d63b2a9af ). Over the next handful of weeks we'll work with our OpenDev sysadmin partners to get the repo moved into the openstack/ namespace. - Jay Faulkner Ironic PTL TC Vice-Chair On Thu, Mar 2, 2023 at 3:38?AM Riccardo Pittau wrote: > Thanks for this anyway Jeremy! > Luckily one of the old maintainers added me to the virtualpdu-core and > virtualpdu-release groups, so now we can move forward with the repository > update and move easily. > > Ciao! > Riccardo > > On Thu, Mar 2, 2023 at 12:05?AM Jeremy Stanley wrote: > >> On 2023-03-01 20:53:45 +0000 (+0000), Jeremy Stanley wrote: >> [...] >> > We don't have a documented process since this is the first time it's >> > really come up, but I'm officially announcing that I intend to use >> > my administrative permissions as an OpenDev sysadmin to add >> > membership of an OpenStack Ironic team representative to the >> > following Gerrit groups if no objections are raised before >> > Wednesday, March 8 >> [...] >> >> It seems the additional round of outreach worked, so no longer >> requires my direct intervention: >> >> >> https://lists.opendev.org/archives/list/service-discuss at lists.opendev.org/thread/BOF56L5PPIP6CQZJ75LCPHVN7532ZKNJ/ >> >> -- >> Jeremy Stanley >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yasufum.o at gmail.com Fri Mar 3 00:45:11 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Fri, 3 Mar 2023 09:45:11 +0900 Subject: [tc][heat][tacker] Moving governance of tosca-parser(and heat-translator ?) to Tacker In-Reply-To: References: <1867ac70656.c5de609e1065667.3634775558652795921@ghanshyammann.com> <1869435593c.10a5026ca1424633.8160143839607463616@ghanshyammann.com> Message-ID: On 2023/03/02 11:55, Takashi Kajinami wrote: > Thanks. So based on the agreement in this thread I've pushed the change to > the governance repository > to migrate tosca-parser and heat-translator to Tacker's governance. > > https://review.opendev.org/c/openstack/governance/+/876012 > > I'll keep heat-core group in heat-translator-core group for now, but we can > revisit this in the future. Thanks for the update. 
> > > On Wed, Mar 1, 2023 at 6:41?PM Yasufumi Ogawa wrote: > >> On 2023/02/28 3:49, Ghanshyam Mann wrote: >>> ---- On Sun, 26 Feb 2023 19:54:45 -0800 Takashi Kajinami wrote --- >>> > >>> > >>> > On Mon, Feb 27, 2023 at 11:38?AM Yasufumi Ogawa yasufum.o at gmail.com> >> wrote: >>> > Hi, >>> > >>> > On 2023/02/27 10:51, Takashi Kajinami wrote: >>> > > On Thu, Feb 23, 2023 at 5:18?AM Ghanshyam Mann >> gmann at ghanshyammann.com> >>> > > wrote: >>> > > >>> > >> ---- On Sun, 19 Feb 2023 18:44:14 -0800 Takashi Kajinami >> wrote --- >>> > >> > Hello, >>> > >> > >>> > >> > Currently tosca-parser is part of heat's governance, but the >> core >>> > >> reviewers of this repositorydoes not contain any active heat >> cores while we >>> > >> see multiple Tacker cores in this group.Considering the fact the >> project is >>> > >> mainly maintained by Tacker cores, I'm wondering if we canmigrate >> this >>> > >> repository to Tacker's governance. Most of the current heat cores >> are not >>> > >> quitefamiliar with the codes in this repository, and if Tacker >> team is not >>> > >> interested in maintainingthis repository then I'd propose >> retiring this. >>> > As you mentioned, tacker still using tosca-parser and >> heat-translator. >>> > >>> > >> >>> > >> I think it makes sense and I remember its usage/maintenance by >> the Tacker >>> > >> team since starting. >>> > >> But let's wait for the Tacker team opinion and accordingly you >> can propose >>> > >> the governance patch. >>> > Although I've not joined to tacker team since starting, it might not >> be >>> > true because there was no cores of tosca-parser and heat-translator >> in >>> > tacker team. We've started to help maintenance the projects because >> no >>> > other active contributer. >>> > >>> > >> >>> > >> > >>> > >> > Similarly, we have heat-translator project which has both >> heat cores >>> > >> and tacker cores as itscore reviewers. IIUC this is tightly >> related to the >>> > >> work in tosca-parser, I'm wondering it makesmore sense to move >> this project >>> > >> to Tacker, because the requirement is mostly made fromTacker side >> rather >>> > >> than Heat side. >>> > >> >>> > >> I am not sure about this and from the name, it seems like more of >> a heat >>> > >> thing but it is not got beyond the Tosca template >>> > >> conversion. Are there no users of it outside of the Tacker >> service? or any >>> > >> request to support more template conversions than >>> > >> Tosca? >>> > >> >>> > > >>> > > Current hea-translator supports only the TOSCA template[1]. >>> > > The heat-translator project can be a generic template converter by >> its >>> > > nature but we haven't seen any interest >>> > > in implementing support for different template formats. >>> > > >>> > > [1] >>> > > >> https://github.com/openstack/heat-translator/blob/master/translator/osc/v1/translate.py#L49 >>> > > >>> > > >>> > > >>> > >> If no other user or use case then I think one option can be to >> merge it >>> > >> into Tosca-parser itself and retire heat-translator. >>> > >> >>> > >> Opinion? >>> > Hmm, as a core of tosca-parser, I'm not sure it's a good idea >> because it >>> > is just a parser TOSCA and independent from heat-translator. In >>> > addition, there is no experts of Heat or HOT in current tacker team >>> > actually, so it might be difficult to maintain heat-translator >> without >>> > any help from heat team. 
>>> > >>> > The hea-translator project was initially created to implement a >> translator from TOSCA parser to HOT[1].Later tosca-parser was split out[2] >> but we have never increased scope of tosca-parser. So it has beenno more >> than the TOSCA template translator. >>> > >>> > [1] >> https://blueprints.launchpad.net/heat/+spec/heat-translator-tosca[2] >> >> https://review.opendev.org/c/openstack/project-config/+/211204 >>> > We (Heat team) can provide help with any problems with heat, but we >> own no actual use case of template translation.Maintaining the >> heat-translator repository with tacker, which currently provides actual use >> cases would make more sense.This also gives the benefit that Tacker team >> can decide when stable branches of heat-translator should be retiredalong >> with the other Tacker repos. >>> > >>> > By the way, may I ask what will be happened if the governance is >> move on >>> > to tacker? Is there any extra tasks for maintenance? >>> > >>> > TC would have better (and more precise) explanation but my >> understanding is that - creating a release >>> > - maintaining stable branches >>> > - maintaining gate healthwould be the required tasks along with >> moderating dev discussion in mailing list/PTG/etc. >>> >>> I think you covered all and the Core team (Tacker members) might be >> already doing a few of the tasks. From the >>> governance perspective, tacker PTL will be the point of contact for this >> repo in the case repo becomes inactive or so >>> but it will be the project team's decision to merge/split things >> whatever way makes maintenance easy. >> I understand. I've shared the proposal again in the previous meeting and >> no objection raised. So, we'd agree to move the governance as Tacker team. >> >> Thanks, >> Yasufumi >>> >>> -gmann >>> >>> >>> > Thanks, >>> > Yasufumi >>> > >>> > >> >>> > > >>> > > That also sounds good to me. >>> > > >>> > > >>> > >> Also, correcting the email subject tag as [tc]. >>> > >> >>> > >> -gmann >>> > >> >>> > >> > >>> > >> > [1] >>> > >> >> https://review.opendev.org/admin/groups/1f7855baf3cf14fedf72e443eef18d844bcd43fa,members[2] >>> > >> >> https://review.opendev.org/admin/groups/66028971dcbb58add6f0e7c17ac72643c4826956,members >>> > >> > Thank you,Takashi >>> > >> > >>> > >> >>> > >> >>> > > >>> > >>> > >> >> > From rosmaita.fossdev at gmail.com Fri Mar 3 03:17:32 2023 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 2 Mar 2023 22:17:32 -0500 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> Message-ID: On 3/1/23 12:19 PM, Ghanshyam Mann wrote: > ---- On Wed, 01 Mar 2023 09:02:59 -0800 Brian Rosmaita wrote --- > > On 2/28/23 9:02 PM, Ghanshyam Mann wrote: > > [snip] > > > > > I think removing from client is good way to stop exposing this old/not-recommended way to users > > > but API is separate things and removing the API request parameter 'multiattach' from it can break > > > the existing users using it this way. Tempest test is one good example of such users use case. 
To maintain > > > the backward compatibility/interoperability it should be removed by bumping the microversion so that > > > it continue working for older microversions. This way we will not break the existing users and will > > > provide the new way for users to start using. > > > > It's not just that this is not recommended, it can lead to data loss. > > We should only allow multiattach for volume types that actually support > > it. So I see this as a case of "I broke your script now, but you'll > > thank me later". > > > > We could microversion this, but then an end user has to go out of the > > way and add the correct mv to their request to get the correct behavior. > > Someone using the default mv + multiattach=true will unknowingly put > > themselves into a data loss situation. I think it's better to break > > that person's API request. > > Ok, if multiattach=True in the request is always an unsuccessful case (or unknown successful sometimes) > then I think changing it without microversion bump makes sense. But if we know there is any success case > for xyz configuration/backend then I feel we should not break such success use case. Thanks, Ghanshyam. An end user is setting themselves up for data loss if they rely on the request parameter rather than on using a volume type that explicitly supports multiattach. They could get lucky and not lose any data, but that's not really a success, so I think the best thing to do here is make this breaking change without a microversion. > I was just thinking from the Tempest test perspective which was passing but as you corrected me in IRC, > the test does not check the data things so we do not completely test it in Tempest. It's good that Tempest is there to keep us honest! I think what we can do to help out people whose scripts break is to return a specific error message explaining that the 'multiattach' element is not allowed in a volume-create request and instead the user should select a multiattach-capable volume type. > > -gmann > > > > > > > cheers, > > brian > > > > > > From gmann at ghanshyammann.com Fri Mar 3 03:24:22 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 02 Mar 2023 19:24:22 -0800 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> Message-ID: <186a57fcf87.e774c5b5154313.6206570147952069935@ghanshyammann.com> ---- On Thu, 02 Mar 2023 19:17:32 -0800 Brian Rosmaita wrote --- > On 3/1/23 12:19 PM, Ghanshyam Mann wrote: > > ---- On Wed, 01 Mar 2023 09:02:59 -0800 Brian Rosmaita wrote --- > > > On 2/28/23 9:02 PM, Ghanshyam Mann wrote: > > > [snip] > > > > > > > I think removing from client is good way to stop exposing this old/not-recommended way to users > > > > but API is separate things and removing the API request parameter 'multiattach' from it can break > > > > the existing users using it this way. Tempest test is one good example of such users use case. To maintain > > > > the backward compatibility/interoperability it should be removed by bumping the microversion so that > > > > it continue working for older microversions. 
This way we will not break the existing users and will > > > > provide the new way for users to start using. > > > > > > It's not just that this is not recommended, it can lead to data loss. > > > We should only allow multiattach for volume types that actually support > > > it. So I see this as a case of "I broke your script now, but you'll > > > thank me later". > > > > > > We could microversion this, but then an end user has to go out of the > > > way and add the correct mv to their request to get the correct behavior. > > > Someone using the default mv + multiattach=true will unknowingly put > > > themselves into a data loss situation. I think it's better to break > > > that person's API request. > > > > Ok, if multiattach=True in the request is always an unsuccessful case (or unknown successful sometimes) > > then I think changing it without microversion bump makes sense. But if we know there is any success case > > for xyz configuration/backend then I feel we should not break such success use case. > > Thanks, Ghanshyam. An end user is setting themselves up for data loss > if they rely on the request parameter rather than on using a volume type > that explicitly supports multiattach. They could get lucky and not lose > any data, but that's not really a success, so I think the best thing to > do here is make this breaking change without a microversion. > > > I was just thinking from the Tempest test perspective which was passing but as you corrected me in IRC, > > the test does not check the data things so we do not completely test it in Tempest. > > It's good that Tempest is there to keep us honest! I think what we can > do to help out people whose scripts break is to return a specific error > message explaining that the 'multiattach' element is not allowed in a > volume-create request and instead the user should select a > multiattach-capable volume type. Thanks, Brian for explaining. This sounds good to me. Explaining the situation in release notes and error message will be really helpful for users. I am +2 on the tempest change now - https://review.opendev.org/c/openstack/tempest/+/875372 -gmann > > > > > -gmann > > > > > > > > > > > cheers, > > > brian > > > > > > > > > > > > From rdhasman at redhat.com Fri Mar 3 09:49:09 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Fri, 3 Mar 2023 15:19:09 +0530 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: <186a57fcf87.e774c5b5154313.6206570147952069935@ghanshyammann.com> References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> <186a57fcf87.e774c5b5154313.6206570147952069935@ghanshyammann.com> Message-ID: Thanks Brian and Ghanshyam for the discussion. I've updated the tempest patch[1] to update one test that I missed earlier and also the cinder patch[2] which now returns a BadRequest stating the reason for the error and how to fix it. 
$ curl -g -i -X POST http://127.0.0.1/volume/v3/d6634f35c00f409883ecb10361b556c3/volumes -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-cinderclient" -H "X-Auth-Token: gAAAAABkAbZkWgdbpXNgObizvGy8jS6LoMGuxzMnMaMOw6wm2j5i5KrG2xIzWCDxrSAiaMJWqneNpKrwn8P852mPOyJB_WmxrhrmKiuafcP0KSljyW44mFwDtGN74VL50NLoVC-srL63L3xduyeF5EIlPEyDsWRqPSRZZwau7wQrngAZ8XBP3M8" -d '{"volume": {"size": 1, "consistencygroup_id": null, "snapshot_id": null, "name": null, "description": null, "volume_type": null, "availability_zone": null, "metadata": {}, "imageRef": null, "source_volid": null, "backup_id": null, "multiattach": "true"}}' HTTP/1.1 400 Bad Request Date: Fri, 03 Mar 2023 09:04:38 GMT Server: Apache/2.4.41 (Ubuntu) OpenStack-API-Version: volume 3.0 Vary: OpenStack-API-Version Content-Length: 261 Content-Type: application/json x-compute-request-id: req-a9f9999e-01e3-4970-9c32-35de193c04c1 x-openstack-request-id: req-a9f9999e-01e3-4970-9c32-35de193c04c1 Connection: close {"badRequest": {"code": 400, "message": "multiattach parameter has been removed. The default behavior is to use multiattach enabled volume types. Contact your administrator to create a multiattach enabled volume type and use it to create multiattach volumes."}} [1] https://review.opendev.org/c/openstack/tempest/+/875372 [2] https://review.opendev.org/c/openstack/cinder/+/874865 On Fri, Mar 3, 2023 at 9:00?AM Ghanshyam Mann wrote: > ---- On Thu, 02 Mar 2023 19:17:32 -0800 Brian Rosmaita wrote --- > > On 3/1/23 12:19 PM, Ghanshyam Mann wrote: > > > ---- On Wed, 01 Mar 2023 09:02:59 -0800 Brian Rosmaita wrote --- > > > > On 2/28/23 9:02 PM, Ghanshyam Mann wrote: > > > > [snip] > > > > > > > > > I think removing from client is good way to stop exposing this > old/not-recommended way to users > > > > > but API is separate things and removing the API request > parameter 'multiattach' from it can break > > > > > the existing users using it this way. Tempest test is one good > example of such users use case. To maintain > > > > > the backward compatibility/interoperability it should be > removed by bumping the microversion so that > > > > > it continue working for older microversions. This way we will > not break the existing users and will > > > > > provide the new way for users to start using. > > > > > > > > It's not just that this is not recommended, it can lead to data > loss. > > > > We should only allow multiattach for volume types that actually > support > > > > it. So I see this as a case of "I broke your script now, but > you'll > > > > thank me later". > > > > > > > > We could microversion this, but then an end user has to go out of > the > > > > way and add the correct mv to their request to get the correct > behavior. > > > > Someone using the default mv + multiattach=true will > unknowingly put > > > > themselves into a data loss situation. I think it's better to > break > > > > that person's API request. > > > > > > Ok, if multiattach=True in the request is always an unsuccessful case > (or unknown successful sometimes) > > > then I think changing it without microversion bump makes sense. But > if we know there is any success case > > > for xyz configuration/backend then I feel we should not break such > success use case. > > > > Thanks, Ghanshyam. An end user is setting themselves up for data loss > > if they rely on the request parameter rather than on using a volume > type > > that explicitly supports multiattach. 
They could get lucky and not > lose > > any data, but that's not really a success, so I think the best thing to > > do here is make this breaking change without a microversion. > > > > > I was just thinking from the Tempest test perspective which was > passing but as you corrected me in IRC, > > > the test does not check the data things so we do not completely test > it in Tempest. > > > > It's good that Tempest is there to keep us honest! I think what we can > > do to help out people whose scripts break is to return a specific error > > message explaining that the 'multiattach' element is not allowed in a > > volume-create request and instead the user should select a > > multiattach-capable volume type. > > Thanks, Brian for explaining. This sounds good to me. Explaining the > situation in release notes and error message > will be really helpful for users. > > I am +2 on the tempest change now - > https://review.opendev.org/c/openstack/tempest/+/875372 > > -gmann > > > > > > > > > -gmann > > > > > > > > > > > > > > > cheers, > > > > brian > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Fri Mar 3 11:00:20 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 3 Mar 2023 12:00:20 +0100 Subject: [neutron] Drivers meeting cancelled Message-ID: Hello Neutrinos: Due to the lack of agenda [1], today's meeting is cancelled. Have a nice weekend. *PS: do not forget to add your topics to the PTG agenda [2]**. PTG is coming!* [1]https://wiki.openstack.org/wiki/Meetings/NeutronDrivers [2]https://etherpad.opendev.org/p/neutron-bobcat-ptg -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Fri Mar 3 11:04:52 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Fri, 3 Mar 2023 16:34:52 +0530 Subject: [cinder] proposing Jon Bernard for cinder core Message-ID: Hello everyone, I would like to propose Jon Bernard as cinder core. Looking at the review stats for the past 60[1], 90[2], 120[3] days, he has been consistently in the top 5 reviewers with a good +/- ratio and leaving helpful comments indicating good quality of reviews. He has been managing the stable branch releases for the past 2 cycles (Zed and 2023.1) and has helped in releasing security issues as well. Jon has been part of the cinder and OpenStack community for a long time and has shown very active interest in upstream activities, be it release liaison, review contribution, attending cinder meetings and also involving in outreachy activities. He will be a very good addition to our team helping out with the review bandwidth and adding valuable input in our discussions. I will leave this thread open for a week and if there are no objections, I will add Jon Bernard to the cinder core team. [1] https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 [2] https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 [3] https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 Thanks Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosmaita.fossdev at gmail.com Fri Mar 3 14:15:31 2023 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 3 Mar 2023 09:15:31 -0500 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: References: Message-ID: <6f0e5056-ecae-83cb-2389-7117bb253ab6@gmail.com> On 3/3/23 6:04 AM, Rajat Dhasmana wrote: > Hello everyone, > > I would like to propose Jon Bernard as cinder core. [snip]> I will leave this thread open for a week and if there are no objections, > I will add Jon Bernard to the cinder core team. No objections from me! Jon is a careful and knowledgeable reviewer and he will be a great addition to the cinder core team. cheers, brian From thierry at openstack.org Fri Mar 3 14:51:13 2023 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 3 Mar 2023 15:51:13 +0100 Subject: [release] Release countdown for week R-2, March 6-10 Message-ID: <2a831c77-bc39-539b-c6c0-e9e198c84e10@openstack.org> Development Focus ----------------- At this point we should have release candidates (RC1 or recent intermediary release) for almost all the deliverables. Teams should be working on any release-critical bugs that would require another RC or intermediary release before the final release. Actions ------- Early in the week, the release team will be proposing stable/2023.1 branch creation for all deliverables that have not branched yet, using the latest available 2023.1 Antelope release as the branch point. If your team is ready to go for creating that branch, please let us know by leaving a +1 on these patches. If you would like to wait for another release before branching, you can -1 the patch and update it later in the week with the new release you would like to use. By the end of the week the release team will merge those patches though, unless an exception is granted. Once stable/2023.1 branches are created, if a release-critical bug is detected, you will need to fix the issue in the master branch first, then backport the fix to the stable/2023.1 branch before releasing out of the stable/2023.1 branch. After all of the cycle-with-rc projects have branched we will branch devstack, grenade, and the requirements repos. This will effectively open them up for Bobcat development, though the focus should still be on finishing up Antelope until the final 2023.1 release. For projects with translations, watch for any translation patches coming through and merge them quickly. A new release should be produced so that translations are included in the final 2023.1 Antelope release. Finally, now is a good time to finalize release notes. In particular, consider adding any relevant "prelude" content. Release notes are targeted for the downstream consumers of your project, so it would be great to include any useful information for those that are going to pick up and use or deploy the 2023.1 Antelope version of your project. Upcoming Deadlines & Dates -------------------------- Final RC deadline: March 17 (end of R-1 week) Final 2023.1 Antelope release: March 22 Virtual PTG: March 27-31 -- Thierry Carrez (ttx) From dtantsur at redhat.com Fri Mar 3 15:31:22 2023 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Fri, 3 Mar 2023 16:31:22 +0100 Subject: [ironic] The future of x/ironic-staging-drivers Message-ID: Hi folks! I have been maintaining $subj together with Riccardo and a few occasional volunteers for many years. Now that our priorities have changed, it is not maintained any more. We haven't even created stable/zed, and I'm afraid to check the CI status. 
We're looking for volunteers to maintain the repository long-term. If none are found, the project will be deprecated and frozen in its current state. Please speak up if you care. Dmitry -- Red Hat GmbH , Registered seat: Werner von Siemens Ring 12, D-85630 Grasbrunn, Germany Commercial register: Amtsgericht Muenchen/Munich, HRB 153243,Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross -------------- next part -------------- An HTML attachment was scrubbed... URL: From ozzzo at yahoo.com Fri Mar 3 16:34:44 2023 From: ozzzo at yahoo.com (Albert Braden) Date: Fri, 3 Mar 2023 16:34:44 +0000 (UTC) Subject: Paying for Openstack support References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> Message-ID: <1739978420.3955026.1677861284070@mail.yahoo.com> I have a question for the operators here. Is anyone paying for Openstack support and getting good value for your money? Can you contact someone for help with an issue, and get a useful response in a reasonable time? If you have an emergency, can you get help quickly? If so, I would like to hear about your experience. Who are you getting good support from? Do they support your operating system too? If not, where do you get your OS support, and how good is it? If you work for a company that provides openstack and/or Linux support, you are welcome to send me a sales pitch, but my goal is to hear from operators. From vrook at wikimedia.org Fri Mar 3 18:09:24 2023 From: vrook at wikimedia.org (Vivian Rook) Date: Fri, 3 Mar 2023 13:09:24 -0500 Subject: [magnum] security groups for magnum nodes Message-ID: Is there an option for adding security groups to a given magnum template, and thus the nodes that such a template would create? I have an NFS server, and it is setup to only allow connections from nodes with the "nfs" security group. A few pods in my cluster mount the NFS server, and are blocked as a result. Is it possible to setup magnum so that it adds the "nfs" security group to the worker nodes (it would be alright if it has to be worker and control nodes)? Thank you! -- *Vivian Rook (They/Them)* Site Reliability Engineer Wikimedia Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Mar 3 19:57:41 2023 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 3 Mar 2023 20:57:41 +0100 Subject: Paying for Openstack support In-Reply-To: <1739978420.3955026.1677861284070@mail.yahoo.com> References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: Hello, I think you must look at https://www.openstack.org/marketplace/distros/ for adopting a supported distro. Ignazio Il Ven 3 Mar 2023, 17:37 Albert Braden ha scritto: > I have a question for the operators here. Is anyone paying for Openstack > support and getting good value for your money? Can you contact someone for > help with an issue, and get a useful response in a reasonable time? If you > have an emergency, can you get help quickly? If so, I would like to hear > about your experience. > > Who are you getting good support from? Do they support your operating > system too? If not, where do you get your OS support, and how good is it? > > If you work for a company that provides openstack and/or Linux support, > you are welcome to send me a sales pitch, but my goal is to hear from > operators. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jimmy at openinfra.dev Fri Mar 3 20:02:47 2023 From: jimmy at openinfra.dev (Jimmy McArthur) Date: Fri, 3 Mar 2023 14:02:47 -0600 Subject: Paying for Openstack support In-Reply-To: References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: <2BE356FB-1EFA-43BE-BFBB-DFD05F82636D@openinfra.dev> You can also look at https://www.openstack.org/marketplace/consultants for organizations working with and without distros. > On Mar 3, 2023, at 1:57 PM, Ignazio Cassano wrote: > > Hello, I think you must look at > https://www.openstack.org/marketplace/distros/ for adopting a supported distro. > Ignazio > > Il Ven 3 Mar 2023, 17:37 Albert Braden > ha scritto: > I have a question for the operators here. Is anyone paying for Openstack support and getting good value for your money? Can you contact someone for help with an issue, and get a useful response in a reasonable time? If you have an emergency, can you get help quickly? If so, I would like to hear about your experience. > > Who are you getting good support from? Do they support your operating system too? If not, where do you get your OS support, and how good is it? > > If you work for a company that provides openstack and/or Linux support, you are welcome to send me a sales pitch, but my goal is to hear from operators. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.vanommen at gmail.com Fri Mar 3 20:03:49 2023 From: john.vanommen at gmail.com (John van Ommen) Date: Fri, 3 Mar 2023 12:03:49 -0800 Subject: Paying for Openstack support In-Reply-To: <1739978420.3955026.1677861284070@mail.yahoo.com> References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: I've worked on a number of RHOSP deployments that relied on Red Hat support, and the experience was positive. The first OpenStack project that I did, it depended on SwiftStack for support and they were great. But they were acquired by Nvidia. AFAIK, there aren't many companies that still provide OpenStack support in the United States. From what I understand, RackSpace has been pivoting towards doing AWS support. On Fri, Mar 3, 2023 at 8:36?AM Albert Braden wrote: > I have a question for the operators here. Is anyone paying for Openstack > support and getting good value for your money? Can you contact someone for > help with an issue, and get a useful response in a reasonable time? If you > have an emergency, can you get help quickly? If so, I would like to hear > about your experience. > > Who are you getting good support from? Do they support your operating > system too? If not, where do you get your OS support, and how good is it? > > If you work for a company that provides openstack and/or Linux support, > you are welcome to send me a sales pitch, but my goal is to hear from > operators. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jimmy at openinfra.dev Fri Mar 3 20:08:45 2023 From: jimmy at openinfra.dev (Jimmy McArthur) Date: Fri, 3 Mar 2023 14:08:45 -0600 Subject: Paying for Openstack support In-Reply-To: References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: <4EDD9DD7-A2A5-48B4-8152-41B58D0CDEB7@openinfra.dev> Just to note, there are plenty of organizations that provide support in the US: Virtuozzo, Vexxhost, Red Hat, Canonical, SharkTech, Mirantis, OpenMetal (for hosted private cloud), to name a few. > On Mar 3, 2023, at 2:03 PM, John van Ommen wrote: > > I've worked on a number of RHOSP deployments that relied on Red Hat support, and the experience was positive. > > The first OpenStack project that I did, it depended on SwiftStack for support and they were great. But they were acquired by Nvidia. > > AFAIK, there aren't many companies that still provide OpenStack support in the United States. From what I understand, RackSpace has been pivoting towards doing AWS support. > > On Fri, Mar 3, 2023 at 8:36?AM Albert Braden > wrote: > I have a question for the operators here. Is anyone paying for Openstack support and getting good value for your money? Can you contact someone for help with an issue, and get a useful response in a reasonable time? If you have an emergency, can you get help quickly? If so, I would like to hear about your experience. > > Who are you getting good support from? Do they support your operating system too? If not, where do you get your OS support, and how good is it? > > If you work for a company that provides openstack and/or Linux support, you are welcome to send me a sales pitch, but my goal is to hear from operators. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Sat Mar 4 18:19:36 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Sat, 4 Mar 2023 23:49:36 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> Message-ID: Hi, Can someone please help me out on this issue? With regards, Swogat Pradhan On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan wrote: > Hi > I don't see any major packet loss. > It seems the problem is somewhere in rabbitmq maybe but not due to packet > loss. > > with regards, > Swogat Pradhan > > On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan > wrote: > >> Hi, >> Yes the MTU is the same as the default '1500'. >> Generally I haven't seen any packet loss, but never checked when >> launching the instance. >> I will check that and come back. >> But everytime i launch an instance the instance gets stuck at spawning >> state and there the hypervisor becomes down, so not sure if packet loss >> causes this. >> >> With regards, >> Swogat pradhan >> >> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >> >>> One more thing coming to mind is MTU size. Are they identical between >>> central and edge site? Do you see packet loss through the tunnel? >>> >>> Zitat von Swogat Pradhan : >>> >>> > Hi Eugen, >>> > Request you to please add my email either on 'to' or 'cc' as i am not >>> > getting email's from you. >>> > Coming to the issue: >>> > >>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>> / >>> > Listing policies for vhost "/" ... 
>>> > vhost name pattern apply-to definition priority >>> > / ha-all ^(?!amq\.).* queues >>> > >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> > >>> > I have the edge site compute nodes up, it only goes down when i am >>> trying >>> > to launch an instance and the instance comes to a spawning state and >>> then >>> > gets stuck. >>> > >>> > I have a tunnel setup between the central and the edge sites. >>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> > wrote: >>> > >>> >> Hi Eugen, >>> >> For some reason i am not getting your email to me directly, i am >>> checking >>> >> the email digest and there i am able to find your reply. >>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >> Yes, these logs are from the time when the issue occurred. >>> >> >>> >> *Note: i am able to create vm's and perform other activities in the >>> >> central site, only facing this issue in the edge site.* >>> >> >>> >> With regards, >>> >> Swogat Pradhan >>> >> >>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >> wrote: >>> >> >>> >>> Hi Eugen, >>> >>> Thanks for your response. >>> >>> I have actually a 4 controller setup so here are the details: >>> >>> >>> >>> *PCS Status:* >>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-no-ceph-3 >>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-2 >>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-1 >>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-0 >>> >>> >>> >>> I have tried restarting the bundle multiple times but the issue is >>> still >>> >>> present. >>> >>> >>> >>> *Cluster status:* >>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>> >>> Cluster status of node >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>> >>> Basics >>> >>> >>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>> >>> >>> Disk Nodes >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> >>> >>> Running Nodes >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> >>> >>> Versions >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> RabbitMQ >>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>> >>> >>> Alarms >>> >>> >>> >>> (none) >>> >>> >>> >>> Network Partitions >>> >>> >>> >>> (none) >>> >>> >>> >>> Listeners >>> >>> >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>> inter-node and >>> >>> CLI tool communication >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> >>> >>> Feature flags >>> >>> >>> >>> Flag: 
drop_unroutable_metric, state: enabled >>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>> Flag: implicit_default_bindings, state: enabled >>> >>> Flag: quorum_queue, state: enabled >>> >>> Flag: virtual_host_metadata, state: enabled >>> >>> >>> >>> *Logs:* >>> >>> *(Attached)* >>> >>> >>> >>> With regards, >>> >>> Swogat Pradhan >>> >>> >>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>> wrote: >>> >>> >>> >>>> Hi, >>> >>>> Please find the nova conductor as well as nova api log. >>> >>>> >>> >>>> nova-conuctor: >>> >>>> >>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> b240e3e89d99489284cd731e75f2a5db >>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>> with >>> >>>> backend dogpile.cache.null. 
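The conductor warnings above all point at reply_* queues that no longer exist. A quick, read-only check from one of the controllers is to list those queues and their consumers, and to see which clients are actually connected (a sketch; the vhost "/" is taken from the policy listing shown earlier in the thread):

    # are the reply queues from the warnings present, and does anything consume them?
    rabbitmqctl list_queues -p / name consumers messages | grep '^reply_'
    # which hosts hold AMQP connections right now (the edge compute nodes should show up here)
    rabbitmqctl list_connections user peer_host state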
>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>> >>>>> Hi, >>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>> >>>>> launch vm's. >>> >>>>> When the VM is in spawning state the node goes down (openstack >>> compute >>> >>>>> service list), the node comes backup when i restart the nova >>> compute >>> >>>>> service but then the launch of the vm fails. >>> >>>>> >>> >>>>> nova-compute.log >>> >>>>> >>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>> >>>>> instance usage >>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>> to >>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>> name: >>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>> with >>> >>>>> backend dogpile.cache.null. 
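Since the symptom described above is the hypervisor dropping out of "openstack compute service list" only while an instance is spawning, it can help to watch both the service state and the instance task state from the central site during the boot (a sketch; the instance UUID is a placeholder):

    watch -n 5 'openstack compute service list --service nova-compute'
    watch -n 5 'openstack server show <instance-uuid> -c status -c OS-EXT-STS:task_state'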
>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>> >>>>> privsep helper: >>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> 'privsep-helper', >>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>> privsep >>> >>>>> daemon via rootwrap >>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> daemon starting >>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> process running with uid/gid: 0/0 >>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> daemon running as pid 2647 >>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> os_brick.initiator.connectors.nvmeof >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>> >>>>> execution error >>> >>>>> in _get_host_uuid: Unexpected error while running command. >>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>>> Exit code: 2 >>> >>>>> Stdout: '' >>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>> >>>>> Unexpected error while running command. >>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>>> >>> >>>>> Is there a way to solve this issue? >>> >>>>> >>> >>>>> >>> >>>>> With regards, >>> >>>>> >>> >>>>> Swogat Pradhan >>> >>>>> >>> >>>> >>> >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Sat Mar 4 20:47:45 2023 From: eblock at nde.ag (Eugen Block) Date: Sat, 04 Mar 2023 20:47:45 +0000 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> Message-ID: <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> Hi, I tried to help someone with a similar issue some time ago in this thread: https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. 
I didn't have to do that in a production system yet so be careful. I can think of two routes: - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. Regards, Eugen Zitat von Swogat Pradhan : > Hi, > Can someone please help me out on this issue? > > With regards, > Swogat Pradhan > > On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan > wrote: > >> Hi >> I don't see any major packet loss. >> It seems the problem is somewhere in rabbitmq maybe but not due to packet >> loss. >> >> with regards, >> Swogat Pradhan >> >> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >> wrote: >> >>> Hi, >>> Yes the MTU is the same as the default '1500'. >>> Generally I haven't seen any packet loss, but never checked when >>> launching the instance. >>> I will check that and come back. >>> But everytime i launch an instance the instance gets stuck at spawning >>> state and there the hypervisor becomes down, so not sure if packet loss >>> causes this. >>> >>> With regards, >>> Swogat pradhan >>> >>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>> >>>> One more thing coming to mind is MTU size. Are they identical between >>>> central and edge site? Do you see packet loss through the tunnel? >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> > Hi Eugen, >>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>> > getting email's from you. >>>> > Coming to the issue: >>>> > >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>> / >>>> > Listing policies for vhost "/" ... >>>> > vhost name pattern apply-to definition priority >>>> > / ha-all ^(?!amq\.).* queues >>>> > >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> > >>>> > I have the edge site compute nodes up, it only goes down when i am >>>> trying >>>> > to launch an instance and the instance comes to a spawning state and >>>> then >>>> > gets stuck. >>>> > >>>> > I have a tunnel setup between the central and the edge sites. >>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> > wrote: >>>> > >>>> >> Hi Eugen, >>>> >> For some reason i am not getting your email to me directly, i am >>>> checking >>>> >> the email digest and there i am able to find your reply. >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >> Yes, these logs are from the time when the issue occurred. >>>> >> >>>> >> *Note: i am able to create vm's and perform other activities in the >>>> >> central site, only facing this issue in the edge site.* >>>> >> >>>> >> With regards, >>>> >> Swogat Pradhan >>>> >> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >> wrote: >>>> >> >>>> >>> Hi Eugen, >>>> >>> Thanks for your response. 
>>>> >>> I have actually a 4 controller setup so here are the details: >>>> >>> >>>> >>> *PCS Status:* >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-no-ceph-3 >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-2 >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-1 >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-0 >>>> >>> >>>> >>> I have tried restarting the bundle multiple times but the issue is >>>> still >>>> >>> present. >>>> >>> >>>> >>> *Cluster status:* >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>> >>> Cluster status of node >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>> >>> Basics >>>> >>> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>> >>>> >>> Disk Nodes >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>> >>>> >>> Running Nodes >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>> >>>> >>> Versions >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>> >>>> >>> Alarms >>>> >>> >>>> >>> (none) >>>> >>> >>>> >>> Network Partitions >>>> >>> >>>> >>> (none) >>>> >>> >>>> >>> Listeners >>>> >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: 
>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and >>>> >>> CLI tool communication >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>> 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> >>>> >>> Feature flags >>>> >>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>> Flag: quorum_queue, state: enabled >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>> >>>> >>> *Logs:* >>>> >>> *(Attached)* >>>> >>> >>>> >>> With regards, >>>> >>> Swogat Pradhan >>>> >>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>> wrote: >>>> >>> >>>> >>>> Hi, >>>> >>>> Please find the nova conductor as well as nova api log. >>>> >>>> >>>> >>>> nova-conuctor: >>>> >>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> with >>>> >>>> backend dogpile.cache.null. >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>> >>>> >>>>> Hi, >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> >>>>> launch vm's. >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>> compute >>>> >>>>> service list), the node comes backup when i restart the nova >>>> compute >>>> >>>>> service but then the launch of the vm fails. >>>> >>>>> >>>> >>>>> nova-compute.log >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> >>>>> instance usage >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>> to >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
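Eugen asks above for debug logs of nova-conductor and nova-compute. On a TripleO/director deployment the usual way to get them is to flip debug in the generated config and restart the service container; the path and container name below are the common defaults and should be checked against the actual deployment (the file can also be edited by hand if crudini is not available):

    # on the DCN compute node
    sudo crudini --set /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf DEFAULT debug true
    sudo podman restart nova_compute
    # on the controllers, do the same for nova.conf under .../puppet-generated/nova/ and restart nova_conductor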
>>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>> name: >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> with >>>> >>>>> backend dogpile.cache.null. >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> >>>>> privsep helper: >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> 'privsep-helper', >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>> privsep >>>> >>>>> daemon via rootwrap >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> daemon starting >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> daemon running as pid 2647 >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> os_brick.initiator.connectors.nvmeof >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> >>>>> execution error >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>>> Exit code: 2 >>>> >>>>> Stdout: '' >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>>> Unexpected error while running command. 
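For the first of the two routes Eugen describes above (clearing out stale queues while RabbitMQ keeps running), a rough sketch would be to find the orphaned reply queues and delete them one by one. This is disruptive to in-flight RPC calls, so it is better rehearsed outside production first; delete_queue should be available as a rabbitmqctl subcommand on recent 3.8 releases, and if it is not, the management API already listening on port 15672 can do the same:

    # reply queues that currently have no consumer
    rabbitmqctl list_queues -p / name consumers | awk '$1 ~ /^reply_/ && $2 == 0 {print $1}'
    # delete a specific one (repeat per queue)
    rabbitmqctl delete_queue -p / <queue-name>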
>>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>> >>>>> >>>> >>>>> >>>> >>>>> With regards, >>>> >>>>> >>>> >>>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> From jvisser at redhat.com Fri Mar 3 13:57:39 2023 From: jvisser at redhat.com (John Visser) Date: Fri, 3 Mar 2023 08:57:39 -0500 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: References: Message-ID: That's great, I fully support Jon being a core member, and he's certainly hitting all the important aspects. jv On Fri, Mar 3, 2023 at 6:10?AM Rajat Dhasmana wrote: > Hello everyone, > > I would like to propose Jon Bernard as cinder core. Looking at the review > stats > for the past 60[1], 90[2], 120[3] days, he has been consistently in the > top 5 > reviewers with a good +/- ratio and leaving helpful comments indicating > good > quality of reviews. He has been managing the stable branch releases for the > past 2 cycles (Zed and 2023.1) and has helped in releasing security issues > as well. > > Jon has been part of the cinder and OpenStack community for a long time and > has shown very active interest in upstream activities, be it release > liaison, review > contribution, attending cinder meetings and also involving in outreachy > activities. > He will be a very good addition to our team helping out with the review > bandwidth > and adding valuable input in our discussions. > > I will leave this thread open for a week and if there are no objections, I > will add > Jon Bernard to the cinder core team. > > [1] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 > [2] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 > [3] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 > > Thanks > Rajat Dhasmana > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bshephar at redhat.com Sun Mar 5 10:32:11 2023 From: bshephar at redhat.com (Brendan Shephard) Date: Sun, 5 Mar 2023 20:32:11 +1000 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> Message-ID: <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Does your environment use different network interfaces for each of the networks? Or does it have a bond with everything on it? One issue I have seen before is that when launching instances, there is a lot of network traffic between nodes as the hypervisor needs to download the image from Glance. Along with various other services sending normal network traffic, it can be enough to cause issues if everything is running over a single 1Gbe interface. I have seen the same situation in fact when using a single active/backup bond on 1Gbe nics. It?s worth checking the network traffic while you try to spawn the instance to see if you?re dropping packets. 
In the situation I described, there were dropped packets which resulted in a loss of communication between nova_compute and RMQ, so the node appeared offline. You should also confirm that nova_compute is being disconnected in the nova_compute logs if you tail them on the Hypervisor while spawning the instance. In my case, changing from active/backup to LACP helped. So, based on that experience, from my perspective, it certainly sounds like some kind of network issue. Regards, Brendan Shephard Senior Software Engineer Red Hat Australia > On 5 Mar 2023, at 6:47 am, Eugen Block wrote: > > Hi, > > I tried to help someone with a similar issue some time ago in this thread: > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > > But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. > If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. I didn't have to do that in a production system yet so be careful. I can think of two routes: > > - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. > - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. > > I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. > > Regards, > Eugen > > Zitat von Swogat Pradhan : > >> Hi, >> Can someone please help me out on this issue? >> >> With regards, >> Swogat Pradhan >> >> On Thu, Mar 2, 2023 at 1:24 PM Swogat Pradhan >> wrote: >> >>> Hi >>> I don't see any major packet loss. >>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>> loss. >>> >>> with regards, >>> Swogat Pradhan >>> >>> On Wed, Mar 1, 2023 at 3:34 PM Swogat Pradhan >>> wrote: >>> >>>> Hi, >>>> Yes the MTU is the same as the default '1500'. >>>> Generally I haven't seen any packet loss, but never checked when >>>> launching the instance. >>>> I will check that and come back. >>>> But everytime i launch an instance the instance gets stuck at spawning >>>> state and there the hypervisor becomes down, so not sure if packet loss >>>> causes this. >>>> >>>> With regards, >>>> Swogat pradhan >>>> >>>> On Wed, Mar 1, 2023 at 3:30 PM Eugen Block wrote: >>>> >>>>> One more thing coming to mind is MTU size. Are they identical between >>>>> central and edge site? Do you see packet loss through the tunnel? >>>>> >>>>> Zitat von Swogat Pradhan : >>>>> >>>>> > Hi Eugen, >>>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>>> > getting email's from you. >>>>> > Coming to the issue: >>>>> > >>>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>>> / >>>>> > Listing policies for vhost "/" ...
>>>>> > vhost name pattern apply-to definition priority >>>>> > / ha-all ^(?!amq\.).* queues >>>>> > >>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>> > >>>>> > I have the edge site compute nodes up, it only goes down when i am >>>>> trying >>>>> > to launch an instance and the instance comes to a spawning state and >>>>> then >>>>> > gets stuck. >>>>> > >>>>> > I have a tunnel setup between the central and the edge sites. >>>>> > >>>>> > With regards, >>>>> > Swogat Pradhan >>>>> > >>>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> > wrote: >>>>> > >>>>> >> Hi Eugen, >>>>> >> For some reason i am not getting your email to me directly, i am >>>>> checking >>>>> >> the email digest and there i am able to find your reply. >>>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>>> >> Yes, these logs are from the time when the issue occurred. >>>>> >> >>>>> >> *Note: i am able to create vm's and perform other activities in the >>>>> >> central site, only facing this issue in the edge site.* >>>>> >> >>>>> >> With regards, >>>>> >> Swogat Pradhan >>>>> >> >>>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> >> wrote: >>>>> >> >>>>> >>> Hi Eugen, >>>>> >>> Thanks for your response. >>>>> >>> I have actually a 4 controller setup so here are the details: >>>>> >>> >>>>> >>> *PCS Status:* >>>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>>> Started >>>>> >>> overcloud-controller-no-ceph-3 >>>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>>> Started >>>>> >>> overcloud-controller-2 >>>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>>> Started >>>>> >>> overcloud-controller-1 >>>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>>> Started >>>>> >>> overcloud-controller-0 >>>>> >>> >>>>> >>> I have tried restarting the bundle multiple times but the issue is >>>>> still >>>>> >>> present. >>>>> >>> >>>>> >>> *Cluster status:* >>>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>>> >>> Cluster status of node >>>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>> >>> Basics >>>>> >>> >>>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>> >>> >>>>> >>> Disk Nodes >>>>> >>> >>>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>> >>>>> >>> Running Nodes >>>>> >>> >>>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>> >>>>> >>> Versions >>>>> >>> >>>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>> 3.8.3 >>>>> >>> on Erlang 22.3.4.1 >>>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>> 3.8.3 >>>>> >>> on Erlang 22.3.4.1 >>>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>> 3.8.3 >>>>> >>> on Erlang 22.3.4.1 >>>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>> RabbitMQ >>>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>> >>> >>>>> >>> Alarms >>>>> >>> >>>>> >>> (none) >>>>> >>> >>>>> >>> Network Partitions >>>>> >>> >>>>> >>> (none) >>>>> >>> >>>>> >>> Listeners >>>>> >>> >>>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> tool >>>>> >>> communication >>>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>> and AMQP 1.0 >>>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> tool >>>>> >>> communication >>>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>> and AMQP 1.0 >>>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> tool >>>>> >>> communication >>>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>> and AMQP 1.0 >>>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> , >>>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>>> inter-node and >>>>> >>> CLI tool communication >>>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> , >>>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>>> 0-9-1 >>>>> >>> and AMQP 1.0 >>>>> >>> Node: rabbit at 
overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> , >>>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>> >>>>> >>> Feature flags >>>>> >>> >>>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>> >>> Flag: implicit_default_bindings, state: enabled >>>>> >>> Flag: quorum_queue, state: enabled >>>>> >>> Flag: virtual_host_metadata, state: enabled >>>>> >>> >>>>> >>> *Logs:* >>>>> >>> *(Attached)* >>>>> >>> >>>>> >>> With regards, >>>>> >>> Swogat Pradhan >>>>> >>> >>>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> >>> wrote: >>>>> >>> >>>>> >>>> Hi, >>>>> >>>> Please find the nova conductor as well as nova api log. >>>>> >>>> >>>>> >>>> nova-conuctor: >>>>> >>>> >>>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>>> due to a >>>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>>>> Abandoning...: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>>> due to a >>>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> Abandoning...: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>>> due to a >>>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>>>> Abandoning...: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>>> with >>>>> >>>> backend dogpile.cache.null. >>>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>>> due to a >>>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> Abandoning...: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>>> >>>> With regards, >>>>> >>>> Swogat Pradhan >>>>> >>>> >>>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>> >>>>> >>>>> Hi, >>>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>>> >>>>> launch vm's. >>>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>>> compute >>>>> >>>>> service list), the node comes backup when i restart the nova >>>>> compute >>>>> >>>>> service but then the launch of the vm fails. >>>>> >>>>> >>>>> >>>>> nova-compute.log >>>>> >>>>> >>>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>>> >>>>> instance usage >>>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>>> to >>>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>>> >>>>> dcn01-hci-0.bdxworld.com >>>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>>> name: >>>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>>> with >>>>> >>>>> backend dogpile.cache.null. 
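To confirm the disconnects Brendan mentions above, the compute log can be followed on the edge node while the instance is booting; AMQP connection problems show up as oslo.messaging errors (the log path below is the standard TripleO location, adjust if it differs):

    sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'amqp|unreachable|heartbeat|timed out'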
>>>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>>> >>>>> privsep helper: >>>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>> 'privsep-helper', >>>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>>> privsep >>>>> >>>>> daemon via rootwrap >>>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>>> daemon starting >>>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>>> process running with uid/gid: 0/0 >>>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>>> process running with capabilities (eff/prm/inh): >>>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>>> daemon running as pid 2647 >>>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>> os_brick.initiator.connectors.nvmeof >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>>> >>>>> execution error >>>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>>> >>>>> Command: blkid overlay -s UUID -o value >>>>> >>>>> Exit code: 2 >>>>> >>>>> Stdout: '' >>>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>>> >>>>> Unexpected error while running command. >>>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>> >>>>> >>>>> >>>>> Is there a way to solve this issue? >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> With regards, >>>>> >>>>> >>>>> >>>>> Swogat Pradhan >>>>> >>>>> >>>>> >>>> >>>>> >>>>> >>>>> >>>>> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Sun Mar 5 11:00:26 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Sun, 5 Mar 2023 16:30:26 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Brendan, Thank you for your response. The edge1 site was just for testing so i used active-backup on 1gbe bonded interface. We are in the process of adding another edge site where we are using 2 linux bond vlan templates. I will test and try launching vm in the 2nd edge site and confirm if I am facing the same issue or no issue at all. 
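A few quick checks follow from the bonding and MTU discussion above: with an active-backup bond on 1GbE it is worth recording the bond state, the drop counters and the effective path MTU through the tunnel before the next spawn test (interface name and target address are placeholders):

    cat /proc/net/bonding/bond0          # bond mode and currently active slave
    ip -s link show bond0                # RX/TX errors and drops, compare before and after a spawn attempt
    ping -M do -s 1472 -c 5 <central-internal-api-ip>   # 1472 bytes + 28 bytes of headers = 1500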
With regards, Swogat Pradhan On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard wrote: > Does your environment use different network interfaces for each of the > networks? Or does it have a bond with everything on it? > > One issue I have seen before is that when launching instances, there is a > lot of network traffic between nodes as the hypervisor needs to download > the image from Glance. Along with various other services sending normal > network traffic, it can be enough to cause issues if everything is running > over a single 1Gbe interface. > > I have seen the same situation in fact when using a single active/backup > bond on 1Gbe nics. It?s worth checking the network traffic while you try to > spawn the instance to see if you?re dropping packets. In the situation I > described, there were dropped packets which resulted in a loss of > communication between nova_compute and RMQ, so the node appeared offline. > You should also confirm that nova_compute is being disconnected in the > nova_compute logs if you tail them on the Hypervisor while spawning the > instance. > > In my case, changing from active/backup to LACP helped. So, based on that > experience, from my perspective, is certainly sounds like some kind of > network issue. > > Regards, > > Brendan Shephard > Senior Software Engineer > Red Hat Australia > > > > On 5 Mar 2023, at 6:47 am, Eugen Block wrote: > > Hi, > > I tried to help someone with a similar issue some time ago in this thread: > > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > > But apparently a neutron reinstallation fixed it for that user, not sure > if that could apply here. But is it possible that your nova and neutron > versions are different between central and edge site? Have you restarted > nova and neutron services on the compute nodes after installation? Have you > debug logs of nova-conductor and maybe nova-compute? Maybe they can help > narrow down the issue. > If there isn't any additional information in the debug logs I probably > would start "tearing down" rabbitmq. I didn't have to do that in a > production system yet so be careful. I can think of two routes: > > - Either remove queues, exchanges etc. while rabbit is running, this will > most likely impact client IO depending on your load. Check out the > rabbitmqctl commands. > - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes > and restart rabbitmq so the exchanges, queues etc. rebuild. > > I can imagine that the failed reply "survives" while being replicated > across the rabbit nodes. But I don't really know the rabbit internals too > well, so maybe someone else can chime in here and give a better advice. > > Regards, > Eugen > > Zitat von Swogat Pradhan : > > Hi, > Can someone please help me out on this issue? > > With regards, > Swogat Pradhan > > On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan > wrote: > > Hi > I don't see any major packet loss. > It seems the problem is somewhere in rabbitmq maybe but not due to packet > loss. > > with regards, > Swogat Pradhan > > On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan > wrote: > > Hi, > Yes the MTU is the same as the default '1500'. > Generally I haven't seen any packet loss, but never checked when > launching the instance. > I will check that and come back. > But everytime i launch an instance the instance gets stuck at spawning > state and there the hypervisor becomes down, so not sure if packet loss > causes this. 
> > With regards, > Swogat pradhan > > On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > > One more thing coming to mind is MTU size. Are they identical between > central and edge site? Do you see packet loss through the tunnel? > > Zitat von Swogat Pradhan : > > > Hi Eugen, > > Request you to please add my email either on 'to' or 'cc' as i am not > > getting email's from you. > > Coming to the issue: > > > > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p > / > > Listing policies for vhost "/" ... > > vhost name pattern apply-to definition priority > > / ha-all ^(?!amq\.).* queues > > > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > > > > I have the edge site compute nodes up, it only goes down when i am > trying > > to launch an instance and the instance comes to a spawning state and > then > > gets stuck. > > > > I have a tunnel setup between the central and the edge sites. > > > > With regards, > > Swogat Pradhan > > > > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > > wrote: > > > >> Hi Eugen, > >> For some reason i am not getting your email to me directly, i am > checking > >> the email digest and there i am able to find your reply. > >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >> Yes, these logs are from the time when the issue occurred. > >> > >> *Note: i am able to create vm's and perform other activities in the > >> central site, only facing this issue in the edge site.* > >> > >> With regards, > >> Swogat Pradhan > >> > >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >> wrote: > >> > >>> Hi Eugen, > >>> Thanks for your response. > >>> I have actually a 4 controller setup so here are the details: > >>> > >>> *PCS Status:* > >>> * Container bundle set: rabbitmq-bundle [ > >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-no-ceph-3 > >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-2 > >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-1 > >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-0 > >>> > >>> I have tried restarting the bundle multiple times but the issue is > still > >>> present. > >>> > >>> *Cluster status:* > >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>> Cluster status of node > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
> >>> Basics > >>> > >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>> > >>> Disk Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Running Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Versions > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > RabbitMQ > >>> 3.8.3 on Erlang 22.3.4.1 > >>> > >>> Alarms > >>> > >>> (none) > >>> > >>> Network Partitions > >>> > >>> (none) > >>> > >>> Listeners > >>> > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: [::], port: 25672, protocol: clustering, purpose: > inter-node and > >>> CLI tool communication > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API > >>> > >>> Feature flags > >>> > >>> Flag: drop_unroutable_metric, state: enabled > >>> Flag: empty_basic_get_metric, state: enabled > >>> Flag: implicit_default_bindings, state: enabled > >>> Flag: quorum_queue, state: enabled > >>> Flag: 
virtual_host_metadata, state: enabled > >>> > >>> *Logs:* > >>> *(Attached)* > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>> wrote: > >>> > >>>> Hi, > >>>> Please find the nova conductor as well as nova api log. > >>>> > >>>> nova-conuctor: > >>>> > >>>> 2023-02-26 08:45:01.108 31 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 16152921c1eb45c2b1f562087140168b > >>>> 2023-02-26 08:45:02.144 26 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> 2023-02-26 08:45:02.314 32 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply > >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds > due to a > >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.282 35 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.303 33 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > b240e3e89d99489284cd731e75f2a5db > >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>> backend dogpile.cache.null. 
> >>>> 2023-02-26 08:50:01.264 27 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> wrote: > >>>> > >>>>> Hi, > >>>>> I currently have 3 compute nodes on edge site1 where i am trying to > >>>>> launch vm's. > >>>>> When the VM is in spawning state the node goes down (openstack > compute > >>>>> service list), the node comes backup when i restart the nova > compute > >>>>> service but then the launch of the vm fails. > >>>>> > >>>>> nova-compute.log > >>>>> > >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running > >>>>> instance usage > >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 > to > >>>>> 2023-02-26 08:00:00. 0 instances. > >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node > >>>>> dcn01-hci-0.bdxworld.com > >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device > name: > >>>>> /dev/vda. Libvirt can't honour user-supplied dev names > >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>>> backend dogpile.cache.null. 
> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running > >>>>> privsep helper: > >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > 'privsep-helper', > >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new > privsep > >>>>> daemon via rootwrap > >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon starting > >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with uid/gid: 0/0 > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with capabilities (eff/prm/inh): > >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon running as pid 2647 > >>>>> 2023-02-26 08:49:55.956 7 WARNING > os_brick.initiator.connectors.nvmeof > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process > >>>>> execution error > >>>>> in _get_host_uuid: Unexpected error while running command. > >>>>> Command: blkid overlay -s UUID -o value > >>>>> Exit code: 2 > >>>>> Stdout: '' > >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > >>>>> Unexpected error while running command. > >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>>> > >>>>> Is there a way to solve this issue? > >>>>> > >>>>> > >>>>> With regards, > >>>>> > >>>>> Swogat Pradhan > >>>>> > >>>> > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ddorra at t-online.de Sun Mar 5 13:02:51 2023 From: ddorra at t-online.de (ddorra at t-online.de) Date: Sun, 5 Mar 2023 14:02:51 +0100 (CET) Subject: [Openstack Trove] Fails to create DB container - "context deadline exceeded", even though images can be pulled manually Message-ID: <1678021371655.583885.3e98060dd2330376c64446c87304aa08d5d3462b@spica.telekom.de> Hi, my Trove service installed into a Openstack Victoria fails in starting DB instances: 2023-03-05 12:06:17.172 1015 INFO trove.guestagent.datastore.mysql_common.service [-] Starting docker container, image: mysql:5.7.29 2023-03-05 12:06:17.174 1015 WARNING trove.guestagent.utils.docker [-] Failed to get container database: docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service [-] Failed to start mysql: docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") It sounds that it's unable to connect to Docker in order to download the image. 
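The "context deadline exceeded" text is the daemon-side timeout while dockerd talks to the registry, so what matters is whether the default route and DNS were already in place at the moment the guest agent asked dockerd to pull mysql:5.7.29; a pull that works later in the instance's life doesn't prove they were there on the first attempt. A quick probe from inside the guest, assuming iproute2 and curl are present and dockerd runs under systemd as the usual docker unit, would be something like:

ip route show default                      # is the corrected gateway installed yet?
getent hosts registry-1.docker.io          # does name resolution work?
curl -m 15 -sS https://registry-1.docker.io/v2/ ; echo "curl exit=$?"   # a 401 response still means the registry is reachable
docker pull mysql:5.7.29                   # retry the exact image the agent needs
journalctl -u docker --since "15 min ago"  # dockerd's own view of the failed pull

If those only start succeeding some time after boot, the interesting question becomes what installs the route and DNS, and whether that happens before or after the guest agent's first pull attempt.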
However, after logging in to the DB instance I am able to pull docker images manually, or even a cloudinit action can successfully pull images during instance creation... root at t1:~# docker images REPOSITORY TAG IMAGE ID CREATED SIZE mysql 5.7.29 5d9483f9a7b2 2 years ago 455MB root at t1:~# The cloudinit I need to adjust the default GW in oder to reach internet. I hope that is early enough in the boot process and doesn't hamper the guestagent. Any hint what I can do? ============= Trove cloudinit root at voscontrol:~# cat /etc/trove/cloudinit/mysql.cloudinit #cloud-config write_files: - content: | ... path: /etc/ssh/sshd_config - content: | ... path: /root/.bash_aliases runcmd: - [ ip, route, delete, default, via, 10.0.0.79 ] - [ ip, route, add, default, via, 10.0.0.62 ] - [ service, ssh, restart ] - [ docker, pull, "mysql:5.7.29" ] root at voscontrol:~# ================ guest config root at t1:/etc/trove# cat ./conf.d/guest_info.conf [DEFAULT] guest_id=291070bc-671d-437a-a68c-9cee840e614c datastore_manager=mysql datastore_version=5.7.29 tenant_id=606b291caeab4dc2bb1072e2e43b082a instance_rpc_encr_key=EEKenGHSe4exOOU8a0ivV2aR8FwOr2yw root at t1:/etc/trove# cat ./conf.d/trove-guestagent.conf [DEFAULT] log_file = trove-guestagent.log log_dir = /var/log/trove/ ignore_users = os_admin control_exchange = trove #transport_url = rabbit://openstack:pass123 at 192.168.100.79:5672/ transport_url = rabbit://openstack:pass123 at 10.0.0.79:5672/ rpc_backend = rabbit command_process_timeout = 60 use_syslog = False debug = True [service_credentials] #auth_url = http://192.168.100.79:5000/v3 auth_url = http://10.0.0.79:5000/v3 region_name = RegionOne project_name = service password = pass123 project_domain_name = Default user_domain_name = Default username = trove root at t1:/etc/trove# ================ /var/log/trove/trove-guestagent.log ... 
2023-03-05 12:06:17.162 1015 DEBUG oslo_concurrency.processutils [-] CMD "sudo mkdir -p /var/run/mysqld" returned: 0 in 0.009s execute /opt/guest-agent-venv/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 2023-03-05 12:06:17.162 1015 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo chown -R 1001:1001 /var/run/mysqld execute /opt/guest-agent-venv/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 2023-03-05 12:06:17.171 1015 DEBUG oslo_concurrency.processutils [-] CMD "sudo chown -R 1001:1001 /var/run/mysqld" returned: 0 in 0.009s execute /opt/guest-agent-venv/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 2023-03-05 12:06:17.172 1015 INFO trove.guestagent.datastore.mysql_common.service [-] Starting docker container, image: mysql:5.7.29 2023-03-05 12:06:17.174 1015 WARNING trove.guestagent.utils.docker [-] Failed to get container database: docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service [-] Failed to start mysql: docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service response.raise_for_status() 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/database/json 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 58, in start_container 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service container = client.containers.get(name) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 887, in get 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service resp = self.client.api.inspect_container(container_id) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 
return f(self, resource_id, *args, **kwargs) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 771, in inspect_container 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self._get(self._url("/containers/{0}/json", container)), True 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self._raise_for_status(response) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service response.raise_for_status() 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/create?name=database 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 810, in run 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service detach=detach, **kwargs) 2023-03-05 12:06:32.540 1015 
ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 868, in create 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service resp = self.client.api.create_container(**create_kwargs) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 430, in create_container 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service return self.create_container_from_config(config, name) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 441, in create_container_from_config 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service return self._result(res, True) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self._raise_for_status(response) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service docker.errors.ImageNotFound: 404 Client Error: Not Found ("No such image: mysql:5.7.29") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service response.raise_for_status() 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.41/images/create?tag=5.7.29&fromImage=mysql 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 
During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 612, in start_db 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service command=command 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 73, in start_container 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service command=command 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 812, in run 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self.client.images.pull(image, platform=platform) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/images.py", line 445, in pull 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service repository, tag=tag, stream=True, **kwargs 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/image.py", line 415, in pull 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self._raise_for_status(response) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager [-] Failed to prepare datastore: Failed to start mysql: trove.common.exception.TroveError: Failed to start mysql 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager response.raise_for_status() 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.546 1015 ERROR 
trove.guestagent.datastore.manager raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/database/json 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 58, in start_container 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager container = client.containers.get(name) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 887, in get 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager resp = self.client.api.inspect_container(container_id) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager return f(self, resource_id, *args, **kwargs) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 771, in inspect_container 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self._get(self._url("/containers/{0}/json", container)), True 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self._raise_for_status(response) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 
response.raise_for_status() 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/create?name=database 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 810, in run 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager detach=detach, **kwargs) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 868, in create 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager resp = self.client.api.create_container(**create_kwargs) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 430, in create_container 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager return self.create_container_from_config(config, name) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 441, in create_container_from_config 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager return self._result(res, True) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self._raise_for_status(response) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager docker.errors.ImageNotFound: 404 Client Error: Not Found ("No such image: mysql:5.7.29") 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR 
trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager response.raise_for_status() 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.41/images/create?tag=5.7.29&fromImage=mysql 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 612, in start_db 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager command=command 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 73, in start_container 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager command=command 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 812, in run 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self.client.images.pull(image, platform=platform) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/images.py", line 445, in pull 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager repository, tag=tag, stream=True, **kwargs 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/image.py", line 415, in pull 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self._raise_for_status(response) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, 
another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/manager.py", line 223, in _prepare 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager cluster_config, snapshot, ds_version=ds_version) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager result = f(*args, **kwargs) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/manager.py", line 96, in do_prepare 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self.app.start_db(ds_version=ds_version, command=command) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 620, in start_db 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise exception.TroveError(_("Failed to start mysql")) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager trove.common.exception.TroveError: Failed to start mysql 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.549 1015 INFO trove.guestagent.datastore.manager [-] Ending datastore prepare for 'mysql'. 2023-03-05 12:06:32.550 1015 INFO trove.guestagent.datastore.service [-] Set final status to failed to spawn. 2023-03-05 12:06:32.550 1015 DEBUG trove.guestagent.datastore.service [-] Casting set_status message to conductor (status is 'failed to spawn'). set_status /opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/service.py:144 2023-03-05 12:06:32.550 1015 DEBUG trove.conductor.api [-] Making async call to cast heartbeat for instance: 291070bc-671d-437a-a68c-9cee840e614c heartbeat /opt/guest-agent-venv/lib/python3.6/site-packages/trove/conductor/api.py:73 2023-03-05 12:06:32.553 1015 DEBUG trove.guestagent.datastore.service [-] Successfully cast set_status. 
set_status /opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/service.py:151 2023-03-05 12:06:32.553 1015 DEBUG trove.conductor.api [-] Making async call to cast error notification notify_exc_info /opt/guest-agent-venv/lib/python3.6/site-packages/trove/conductor/api.py:115 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server [-] Exception during message handling: trove.common.exception.TroveError: Failed to start mysql 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server response.raise_for_status() 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/database/json 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 58, in start_container 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server container = client.containers.get(name) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 887, in get 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server resp = self.client.api.inspect_container(container_id) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server return f(self, resource_id, *args, **kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 771, in inspect_container 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self._get(self._url("/containers/{0}/json", container)), True 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self._raise_for_status(response) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise cls(e, 
response=response, explanation=explanation) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server response.raise_for_status() 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/create?name=database 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 810, in run 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server detach=detach, **kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 868, in create 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server resp = self.client.api.create_container(**create_kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 430, in create_container 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server return self.create_container_from_config(config, name) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 441, in create_container_from_config 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server return self._result(res, True) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self._raise_for_status(response) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise cls(e, response=response, explanation=explanation) 
2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server docker.errors.ImageNotFound: 404 Client Error: Not Found ("No such image: mysql:5.7.29") 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server response.raise_for_status() 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.41/images/create?tag=5.7.29&fromImage=mysql 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 612, in start_db 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server command=command 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 73, in start_container 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server command=command 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 812, in run 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self.client.images.pull(image, platform=platform) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/images.py", line 445, in pull 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server repository, tag=tag, stream=True, **kwargs 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/image.py", line 415, in pull 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self._raise_for_status(response) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 
docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/manager.py", line 207, in prepare 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server ds_version=ds_version) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/manager.py", line 223, in _prepare 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server cluster_config, snapshot, ds_version=ds_version) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/manager.py", line 96, in do_prepare 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self.app.start_db(ds_version=ds_version, command=command) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 620, in start_db 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise exception.TroveError(_("Failed to start mysql")) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server trove.common.exception.TroveError: Failed to start mysql 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:58.303 1015 DEBUG trove.guestagent.datastore.manager [-] Getting file system stats for '/var/lib/mysql' get_filesystem_stats /opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/manager.py:368 2023-03-05 12:07:16.911 1015 DEBUG oslo_service.periodic_task [-] Running periodic task Manager.update_status run_periodic_tasks 
/opt/guest-agent-venv/lib/python3.6/site-packages/oslo_service/periodic_task.py:211 2023-03-05 12:07:16.911 1015 INFO trove.guestagent.datastore.manager [-] Database service is not installed, skip status check 2023-03-05 12:08:16.918 1015 DEBUG oslo_service.periodic_task [-] Running periodic task Manager.update_status run_periodic_tasks /opt/guest-agent-venv/lib/python3.6/site-packages/oslo_service/periodic_task.py:211 2023-03-05 12:08:16.919 1015 INFO trove.guestagent.datastore.manager [-] Database service is not installed, skip status check root at t1:/var/log/trove# ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Mon Mar 6 09:05:28 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Mon, 6 Mar 2023 10:05:28 +0100 Subject: [kolla] Weekly meeting on 8th March cancelled Message-ID: <4FB2F7F9-9368-4352-ADFA-C34718807B3C@gmail.com> Hola Koalas, Weekly meeting on Wed this week is cancelled - I?m off on vacation. See you next week! Best regards, Michal From thierry at openstack.org Mon Mar 6 09:08:05 2023 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 6 Mar 2023 10:08:05 +0100 Subject: [largescale-sig] Next meeting: March 8, 9utc Message-ID: <17105066-2c6a-d4ff-bbde-15fcd33edbad@openstack.org> Hi everyone, The Large Scale SIG will be meeting this Wednesday in #openstack-operators on OFTC IRC, at 9UTC, our APAC+EU-friendly time. You can doublecheck how that UTC time translates locally at: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20230308T09 Feel free to add topics to the agenda: https://etherpad.opendev.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez From lokendrarathour at gmail.com Mon Mar 6 09:34:23 2023 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Mon, 6 Mar 2023 15:04:23 +0530 Subject: [TripleO][Wallaby][openStack] - Creating Alma based Baremetal Instance Message-ID: Hi Team, we have a PoC where we wish to try Creating OpenStack Baremetal Instance using Alma Linux. Please help in case that is possible, any reference where we can use the Alma Images to instantiate the Baremetal Instance. -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Mon Mar 6 10:35:47 2023 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 6 Mar 2023 11:35:47 +0100 Subject: [neutron] Bug deputy report (week 09, starting on Feb-27-2022) Message-ID: Hi neutrinos, Here are the bugs reported between February 27 and March 05. 
*First the unassigned bugs:* * https://bugs.launchpad.net/neutron/+bug/2009043 neutron-l3-agent restart some random ha routers get wrong state *High prio bugs* * https://bugs.launchpad.net/neutron/+bug/2008712 Security group rule deleted by cascade (because its remote group had been deleted) is not deleted in the backend * https://bugs.launchpad.net/neutron/+bug/2008947 Investigate test_restart_rpc_on_sighup_multiple_workers failure * https://bugs.launchpad.net/neutron/+bug/2009055 Performance issue when creating lots of ports * https://bugs.launchpad.net/neutron/+bug/2009215 [OVS] Error during OVS agent start *Medium or lower:* * https://bugs.launchpad.net/neutron/+bug/2008858 Call the api and do not return for a long time * https://bugs.launchpad.net/neutron/+bug/2008912 "_validate_create_network_callback" failing with 'NoneType' object has no attribute 'qos_policy_id' * https://bugs.launchpad.net/neutron/+bug/2008943 OVN DB Sync utility cannot find NB DB Port Group * https://bugs.launchpad.net/neutron/+bug/2009053 OVN: default stateless SG blocks metadata traffic * https://bugs.launchpad.net/neutron/+bug/2009221 [OVS] Custom ethertype traffic is not coming into the VM *One incomplete:* * https://bugs.launchpad.net/neutron/+bug/2008808 Duplicate packet when ping to external with out floating ip *Already merged / released bugs:* * https://bugs.launchpad.net/neutron/+bug/2008695 Remove any LB HM references from the external_ids upon deleting an HM * https://bugs.launchpad.net/neutron/+bug/2008767 [sqlalchemy-20][vnpaas] SQL execution without transaction in progress Have a bugless week :-) Lajos (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Mon Mar 6 11:35:43 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 6 Mar 2023 12:35:43 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: References: Message-ID: <20230306113543.a57aywefbn4cgsu3@localhost> On 16/02, Rishat Azizov wrote: > Hello! > > We have an error with creating backups from iscsi volume. Usually, this > happens with large backups over 100GB. > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > [req-f6619913-6f96-4226-8d75-2da3fca722f1 23de1b92e7674cf59486f07ac75b886b > a7585b47d1f143e9839c49b4e3bbe1b4 - - -] Exception during message handling: > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while > running command. 
> Command: multipath -f 3624a93705842cfae35d7483200015ec6 > Exit code: 1 > Stdout: '' > Stderr: 'Feb 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a > multipath device\n' > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Traceback > (most recent call last): > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, > in _process_incoming > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server res = > self.dispatcher.dispatch(message) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line > 309, in dispatch > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > self._do_dispatch(endpoint, method, ctxt, args) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line > 229, in _do_dispatch > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = > func(ctxt, **new_args) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/utils.py", line 890, in wrapper > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > func(self, *args, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 410, in > create_backup > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > volume_utils.update_backup_error(backup, str(err)) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in > __exit__ > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > self.force_reraise() > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in > force_reraise > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise > self.value > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 399, in > create_backup > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server updates = > self._run_backup(context, backup, volume) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 493, in > _run_backup > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > ignore_errors=True) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1066, in > _detach_device > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > force=force, ignore_errors=ignore_errors) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 141, in > trace_logging_wrapper > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > f(*args, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 360, > in inner > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > f(*args, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > line 880, in 
disconnect_volume > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > is_disconnect_call=True) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > line 942, in _cleanup_connection > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > self._linuxscsi.flush_multipath_device(multipath_name) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line > 382, in flush_multipath_device > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > root_helper=self._root_helper) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in > _execute > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = > self.__execute(*args, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line > 172, in execute > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > execute_root(*cmd, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, > in _wrap > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > self.channel.remote_call(name, args, kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in > remote_call > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise > exc_type(*result[2]) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while > running command. > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Command: > multipath -f 3624a93705842cfae35d7483200015ec6 > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Exit code: 1 > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stdout: '' > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stderr: 'Feb > 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a multipath device\n' > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > Could you please help with this error? Hi, Does it work for smaller volumes or does it also fail? What are your defaults in your /etc/multipath.conf file? What Cinder release are you using? Cheers, Gorka. From jungleboyj at gmail.com Mon Mar 6 13:56:03 2023 From: jungleboyj at gmail.com (Jay Bryant) Date: Mon, 6 Mar 2023 07:56:03 -0600 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: References: Message-ID: <7cbe477b-b4a6-8d63-17fa-43bce14179aa@gmail.com> No objections from me!? I think Jon would be a great addition! Thanks, Jay On 3/3/2023 5:04 AM, Rajat Dhasmana wrote: > Hello everyone, > > I would like to propose Jon Bernard as cinder core. Looking at the > review stats > for the past 60[1], 90[2], 120[3] days, he has been consistently in > the top 5 > reviewers with a good?+/- ratio and leaving helpful comments > indicating good > quality of reviews. He has been managing the stable?branch releases > for the > past 2 cycles (Zed and 2023.1) and has helped in releasing security > issues as well. 
> > Jon has been part of the cinder and OpenStack community for a long > time and > has shown very active interest in upstream activities, be it release > liaison, review > contribution, attending cinder meetings and also involving in > outreachy activities. > He will be a very good addition to our team helping out with the > review bandwidth > and adding valuable input in our discussions. > > I will leave this thread open for a week and if there are no > objections, I will add > Jon Bernard to the cinder core team. > > [1] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 > > [2] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 > > [3] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 > > > Thanks > Rajat Dhasmana From steveftaylor at gmail.com Mon Mar 6 15:40:36 2023 From: steveftaylor at gmail.com (Steve Taylor) Date: Mon, 06 Mar 2023 08:40:36 -0700 Subject: [openstack-helm] Get rid of cephfs and rbd provisioners In-Reply-To: References: Message-ID: I agree as well. I have been working to update the Ceph client images to Quincy and Focal to match the Ceph daemons that have already been updated, and the CephFS provisioner is proving difficult. It is outdated and incompatible with Python 3, but newer librados packages are only built for Python 3. If we want to keep the old provisioners around, the path that makes the most sense is to update them to be compatible with more modern frameworks and libraries, but personally, I don't see a need. I think pretty much everyone has moved to CSI, and anyone that hasn't probably should. I am in favor of removing the outdated provisioners. Steve On 3/2/2023 3:22:20 PM, Mohammed Naser wrote: Hi Vladimir, I agree.? I also think we should stop maintaining the CSI provisioner chart and simply deploy the one provided by the Ceph CSI team Less code we maintain, the better. Thanks Mohammed On Thu, Mar 2, 2023 at 10:13 PM Vladimir Kozhukalov wrote: Hi everyone, I would like to suggest getting rid of cephfs and rbd provisioners. They have been retired and have not been maintained for about 2.5 years now [1]. I believe the CSI approach is what all users rely on nowadays and we can safely remove them.? The trigger for this suggestion is that we are currently experiencing issues while trying to switch cephfs provisioner to Ubuntu Focal and fixing this is just wasting time. [2] Stephen spent some time debugging the issues and can give more details if needed.? What?do you think? ? [1]?https://github.com/kubernetes-retired/external-storage/tree/master/ceph [https://github.com/kubernetes-retired/external-storage/tree/master/ceph] [2]?https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976 [https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976] -- Best regards, Kozhukalov Vladimir -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcetto at gmail.com Mon Mar 6 16:10:44 2023 From: garcetto at gmail.com (garcetto) Date: Mon, 6 Mar 2023 17:10:44 +0100 Subject: [manila] ha for share server Message-ID: good afternoon, is it possible to have HA for share server vm in openstack? i mean, the vm that is created on every tenant and used as nfs server. thank you -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From garcetto at gmail.com Mon Mar 6 17:05:55 2023 From: garcetto at gmail.com (garcetto) Date: Mon, 6 Mar 2023 18:05:55 +0100 Subject: [manila] share server with multiple tenant networks Message-ID: good afternoon, i am trying to add a second share server or share to existing share server on different network inside same tenant, actually i have: tenant-net-01 (with the share server), ok working. tenant-net-02 (tried to add a share network, but the error says " create: Could not find an existing share server or allocate one on the share network provided. You may use a different share network, or verify the network details in the share network and retry your request. If this doesn't work, contact your administrator to troubleshoot issues with your network. " any clue or doc i can read? thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Mar 6 17:19:26 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 06 Mar 2023 18:19:26 +0100 Subject: [neutron] CI meeting Tuesday 7.03.2023 cancelled Message-ID: <5222101.MrzyGnTNMV@p1> Hi, Due to some internal event which I have tomorrow in the same time as our CI meeting I will not be able to run the CI meeting. I don't see any really serious issues in our CI this week and after discussing that with Rodolfo we decided to cancel this week's CI meeting. See You on the meeting next week. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From james.denton at rackspace.com Mon Mar 6 17:25:15 2023 From: james.denton at rackspace.com (James Denton) Date: Mon, 6 Mar 2023 17:25:15 +0000 Subject: Paying for Openstack support In-Reply-To: References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: +1 for Red Hat support. Rackspace is still very much in the OpenStack game, especially for private clouds, with deployments mainly based on RHOSP and OpenStack-Ansible. Happy to put you in touch with someone if you?d like more info on various support services (short or long term). -- James Denton Principal Architect Rackspace Private Cloud - OpenStack james.denton at rackspace.com From: John van Ommen Date: Friday, March 3, 2023 at 2:11 PM To: ozzzo at yahoo.com Cc: OpenStack Discuss Subject: Re: Paying for Openstack support CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! I've worked on a number of RHOSP deployments that relied on Red Hat support, and the experience was positive. The first OpenStack project that I did, it depended on SwiftStack for support and they were great. But they were acquired by Nvidia. AFAIK, there aren't many companies that still provide OpenStack support in the United States. From what I understand, RackSpace has been pivoting towards doing AWS support. On Fri, Mar 3, 2023 at 8:36?AM Albert Braden > wrote: I have a question for the operators here. Is anyone paying for Openstack support and getting good value for your money? Can you contact someone for help with an issue, and get a useful response in a reasonable time? If you have an emergency, can you get help quickly? If so, I would like to hear about your experience. Who are you getting good support from? Do they support your operating system too? 
If not, where do you get your OS support, and how good is it? If you work for a company that provides openstack and/or Linux support, you are welcome to send me a sales pitch, but my goal is to hear from operators. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pshchelokovskyy at mirantis.com Mon Mar 6 17:32:40 2023 From: pshchelokovskyy at mirantis.com (Pavlo Shchelokovskyy) Date: Mon, 6 Mar 2023 19:32:40 +0200 Subject: [barbican] database is growing and can not be purged Message-ID: Hi all, we are observing the following behavior in Barbican: - OpenStack environment is using both encrypted Cinder volumes and encrypted local storage (lvm) for Nova instances - over the time, the secrets and orders tables are growing - many soft-deleted entries in secrets DB can not be purged by the db cleanup script As I understand what is happening - both Cinder and Nova create secrets in Barbican on behalf of the user when creating an encrypted volume or booting an instance with encrypted local storage. They both do it via castellan library, that under the hood creates orders in Barbican, waits for them to become active and returns to the caller only the ID of the generated secret. When time comes to delete the thing (volume or instance) Cinder/Nova again use castellan, but only delete the secret, not the order (they are not aware that there was any 'order' created anyway). As a result, the orders are left in place, and DB cleanup procedure does not delete soft-deleted secrets when there's an ACTIVE order referencing such secret. This is troublesomes on many levels - users who use Cinder or Nova may not even be aware that they are creating something in Barbican. Orders accumulating like that may eventually result in cryptic errors when e.g. when you run out of quota for orders. And what's more, default Barbican policies do allow 'normal' (creator) users to create an order, but not delete it (only project admin can do it), so even if the users are aware of Barbican involvement, they can not delete those orders manually anyway. Plus there's no good way in API to determine outright which orders are referencing deleted secrets. I see several ways of dealing with that and would like to ask for your opinion on what would be the best one: 1. Amend Barbican API to allow filtering orders by the secrets, when castellan deletes a secret - search for corresponding order and delete it as well, change default policy to actually allow order deletion by the same users who can create them. 2. Cascade-delete orders when deleting secrets - this is easy but probably violates that very policy that disallowed normal users to delete orders. 3. improve the database cleanup so it first marks any order that references a deleted secret also as deleted, so later when time comes both could be purged (or something like that). This also has a similar downside to the previous option by not being explicit enough. I've filed a bug for that https://storyboard.openstack.org/#!/story/2010625 and proposed a patch for option 2 (cascade delete), but would like to ask what would you see as the most appropriate way or may be there's something else that I've missed. Btw, the problem is probably even more pronounced with keypairs - when castellan is used to create those, under the hood both order and container are created besides the actual secrets, and again only the secret ids are returned to the caller. 
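(Both kinds of leftovers are easy to spot from the client side, for example with "openstack secret order list" and "openstack secret container list" -- the orders stay ACTIVE even though the secrets they reference are already gone.)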
When time comes to delete things, the caller only knows about secret IDs, and can only delete them, leaving both container and order behind. Luckily, I did not find any place across OpenStack that actually creates keypairs using castellan... but the problem is definitely there. Best regards, -- Dr. Pavlo Shchelokovskyy Principal Software Engineer Mirantis Inc www.mirantis.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Mon Mar 6 18:36:30 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 06 Mar 2023 10:36:30 -0800 Subject: [qa][stable][gate] Tempest master failing on stable/yoga|xena, fix is in gate Message-ID: <186b835f813.12bb9c445376631.7478464121939418993@ghanshyammann.com> Hi All, In case any of you are seeing failure on stable/yoga and stable/xena with the below error, please hold the recheck until the fix (revert) in Tempest is merged "AttributeError: type object 'Draft4Validator' has no attribute 'FORMAT_CHECKER'" - https://review.opendev.org/c/openstack/tempest/+/876218 -gmann From kozhukalov at gmail.com Mon Mar 6 18:47:29 2023 From: kozhukalov at gmail.com (Vladimir Kozhukalov) Date: Mon, 6 Mar 2023 21:47:29 +0300 Subject: [openstack-helm] Get rid of cephfs and rbd provisioners In-Reply-To: References: Message-ID: Guys, Thank you for your thoughts. I appreciate it. Looks like we all agree about removal of old style provisioners. I'll prepare a PS for this. Also thanks for the good idea to switch to the upstream charts for CSI Ceph provisioners [1]. Let's do this as a separate PS. [1] https://github.com/ceph/ceph-csi/tree/devel/charts On Mon, Mar 6, 2023 at 6:40?PM Steve Taylor wrote: > I agree as well. I have been working to update the Ceph client images to > Quincy and Focal to match the Ceph daemons that have already been updated, > and the CephFS provisioner is proving difficult. It is outdated and > incompatible with Python 3, but newer librados packages are only built for > Python 3. > > If we want to keep the old provisioners around, the path that makes the > most sense is to update them to be compatible with more modern frameworks > and libraries, but personally, I don't see a need. I think pretty much > everyone has moved to CSI, and anyone that hasn't probably should. I am in > favor of removing the outdated provisioners. > > Steve > > On 3/2/2023 3:22:20 PM, Mohammed Naser wrote: > Hi Vladimir, > > I agree. I also think we should stop maintaining the CSI provisioner > chart and simply deploy the one provided by the Ceph CSI team > > Less code we maintain, the better. > > Thanks > Mohammed > > On Thu, Mar 2, 2023 at 10:13?PM Vladimir Kozhukalov > wrote: > >> Hi everyone, >> >> I would like to suggest getting rid of cephfs and rbd provisioners. They >> have been retired and have not been maintained for about 2.5 years now [1]. >> I believe the CSI approach is what all users rely on nowadays and we can >> safely remove them. >> >> The trigger for this suggestion is that we are currently experiencing >> issues while trying to switch cephfs provisioner to Ubuntu Focal and fixing >> this is just wasting time. [2] Stephen spent some time debugging the issues >> and can give more details if needed. >> >> What do you think? >> >> [1] >> https://github.com/kubernetes-retired/external-storage/tree/master/ceph >> [2] https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976 >> -- >> Best regards, >> Kozhukalov Vladimir >> > > > -- > Mohammed Naser > VEXXHOST, Inc. 
> > -- Best regards, Kozhukalov Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Mon Mar 6 18:47:52 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Mon, 6 Mar 2023 10:47:52 -0800 Subject: [ironic] PTL Availability Message-ID: Hi all, Just a heads up -- I'll be out of town and mostly out of IRC from Wednesday, 3/8 to Monday 3/13. If there are emergent issues that need to be addressed urgently, please send an email directly to me and I can have a look. Alternatively, I have a large amount of trust in our delegated release managers and former Ironic PTLs and support them if they need to take action in my stead. - Jay Faulkenr -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Mon Mar 6 18:56:35 2023 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 6 Mar 2023 10:56:35 -0800 Subject: [manila] share server with multiple tenant networks In-Reply-To: References: Message-ID: Hi Garcetto, On Mon, Mar 6, 2023 at 9:07?AM garcetto wrote: > good afternoon, > i am trying to add a second share server or share to existing share > server on different network inside same tenant, actually i have: > > tenant-net-01 (with the share server), ok working. > tenant-net-02 (tried to add a share network, but the error says > " > create: Could not find an existing share server or allocate one on the > share network provided. You may use a different share network, or verify > the network details in the share network and retry your request. If this > doesn't work, contact your administrator to troubleshoot issues with your > network. > " > As the message suggests, manila was unable to obtain network allocations to create a share server for you on the second network. Are you a user, or an administrator of this cloud? If you are an administrator, you should look at the logs from manila's share-manager service to check what failed. You cannot expect to attach multiple tenant networks to the same share server unfortunately - that isn't supported today. > any clue or doc i can read? > thank you > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Mon Mar 6 19:06:22 2023 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 6 Mar 2023 11:06:22 -0800 Subject: [manila] ha for share server In-Reply-To: References: Message-ID: Hi Garcetto, On Mon, Mar 6, 2023 at 8:11?AM garcetto wrote: > good afternoon, > is it possible to have HA for share server vm in openstack? > i mean, the vm that is created on every tenant and used as nfs server. > VMs created by Manila's generic driver aren't set up with any sort of HA. We've had ideas in the past [1] to configure HA. For now, the investment in the generic driver in the upstream community is to just provide a reference architecture for a hard multi-tenancy driver. We'd love to have help to revive those efforts. [1] https://review.opendev.org/c/openstack/manila-specs/+/504987 > thank you > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Mon Mar 6 21:02:28 2023 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 6 Mar 2023 13:02:28 -0800 Subject: [manila] share server with multiple tenant networks In-Reply-To: References: Message-ID: On Mon, Mar 6, 2023 at 12:12?PM garcetto wrote: > thank you, so how can i have multiple shares on different tenant networks, > creating more share servers right? 
> Yes; map these tenant networks into share networks in manila, and create your shares with the share networks. Each share network will trigger the creation of a share server if one already doesn't exist. > > > On Mon, Mar 6, 2023 at 7:56?PM Goutham Pacha Ravi > wrote: > >> Hi Garcetto, >> >> On Mon, Mar 6, 2023 at 9:07?AM garcetto wrote: >> >>> good afternoon, >>> i am trying to add a second share server or share to existing share >>> server on different network inside same tenant, actually i have: >>> >> >>> tenant-net-01 (with the share server), ok working. >>> tenant-net-02 (tried to add a share network, but the error says >>> " >>> create: Could not find an existing share server or allocate one on the >>> share network provided. You may use a different share network, or verify >>> the network details in the share network and retry your request. If this >>> doesn't work, contact your administrator to troubleshoot issues with your >>> network. >>> " >>> >> >> As the message suggests, manila was unable to obtain network allocations >> to create a share server for you on the second network. Are you a user, or >> an administrator of this cloud? If you are an administrator, you >> should look at the logs from manila's share-manager service to check what >> failed. >> >> You cannot expect to attach multiple tenant networks to the same share >> server unfortunately - that isn't supported today. >> >> >> >>> any clue or doc i can read? >>> thank you >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Mon Mar 6 21:34:42 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Tue, 7 Mar 2023 04:34:42 +0700 Subject: [openstack][backup] Experience for instance backup Message-ID: Hello guys. I am looking for instance backup solution. I am using Cinder backup with nfs backup but it looks not too fast. I am using a 10Gbps network. I would like to know experience for best practice for instance backup solutions on Openstack. Thank you. Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From haiwu.us at gmail.com Mon Mar 6 22:27:23 2023 From: haiwu.us at gmail.com (hai wu) Date: Mon, 6 Mar 2023 16:27:23 -0600 Subject: [osprofiler][rally][openstack] Using osprofiler/rally directly on production Openstack system Message-ID: Is there any concern with using osprofiler/rally directly on the production Openstack system? From andr.kurilin at gmail.com Mon Mar 6 23:09:51 2023 From: andr.kurilin at gmail.com (Andriy Kurilin) Date: Tue, 7 Mar 2023 00:09:51 +0100 Subject: [osprofiler][rally][openstack] Using osprofiler/rally directly on production Openstack system In-Reply-To: References: Message-ID: Hi! OSProfiler enables tracing only requests with a special header. The header is not embedded in each request (even if you configure osprofiler in your system), you need to use a special CLI argument to set it. So even if the tracing of one particular request slows done the flow (which should not happen), it should give zero impact on the performance of the whole system. As for Rally (in particular, the task component), it creates resources with a special naming format that allows to filter out only these resources during the cleanup process. We are using Rally in production as a part of the monitoring for ~5 years or so. ??, 6 ???. 2023??. ? 23:34, hai wu : > Is there any concern with using osprofiler/rally directly on the > production Openstack system? 
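(For anyone who wants a concrete starting point with the task component: a minimal read-only task file looks roughly like the snippet below -- scenario and option names are from the rally-openstack plugins as I remember them, so treat it as a sketch and compare against the samples shipped in that repo:

  NovaServers.list_servers:
    - args:
        detailed: true
      runner:
        type: constant
        times: 1
        concurrency: 1

Save it as e.g. list-servers.yaml and run "rally task start list-servers.yaml" against an environment registered with "rally env create" or the older "rally deployment create".)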
> > -- Best regards, Andrey Kurilin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tomas.bredar at gmail.com Mon Mar 6 23:32:11 2023 From: tomas.bredar at gmail.com (=?UTF-8?B?VG9tw6HFoSBCcmVkw6Fy?=) Date: Tue, 7 Mar 2023 00:32:11 +0100 Subject: [ovn] safely change bridge_mappings Message-ID: Hi, I have a running production OpenStack deployment - version Wallaby installed using TripleO. I'm using the default OVN/OVS networking. For provider networks I have two bridges on the compute nodes br-ex and br-ex2. Instances mainly use br-ex for provider networks, but there are some instances which started using a provider network which should be mapped to br-ex2, however I didn't specify "bridge_mappings" on ml2_conf.ini, so the traffic wants to flow through the default datacentre:br-ex. My questions is, what services should I restart on the controller and compute nodes after defining bridge_mappings in [ovs] in ml2_conf.ini. And if this operation is safe and if the instances already using br-ex will lose connectivity? Thanks for your help Tomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From haiwu.us at gmail.com Tue Mar 7 01:36:37 2023 From: haiwu.us at gmail.com (hai wu) Date: Mon, 6 Mar 2023 19:36:37 -0600 Subject: [osprofiler][rally][openstack] Using osprofiler/rally directly on production Openstack system In-Reply-To: References: Message-ID: Thanks Andriy! That sounds very promising. I just tried to install rally, and hit one known bug: https://bugs.launchpad.net/rally/+bug/2004022. I downgraded SQLAlchemy and was able to get rally installed. But I could not find any sample task to run per this document url: https://rally.readthedocs.io/en/latest/quick_start/tutorial/step_1_setting_up_env_and_running_benchmark_from_samples.html#tutorial-step-1-setting-up-env-and-running-benchmark-from-samples. It seems most of the sample tasks have been deleted already per its github history. Which sample task I could use to try this out (for example, list all openstack instances..)? It seems its documentation is out of date.. Thanks, Hai On Mon, Mar 6, 2023 at 5:10?PM Andriy Kurilin wrote: > > Hi! > > OSProfiler enables tracing only requests with a special header. The header is not embedded in each request (even if you configure osprofiler in your system), you need to use a special CLI argument to set it. So even if the tracing of one particular request slows done the flow (which should not happen), it should give zero impact on the performance of the whole system. > > As for Rally (in particular, the task component), it creates resources with a special naming format that allows to filter out only these resources during the cleanup process. We are using Rally in production as a part of the monitoring for ~5 years or so. > > ??, 6 ???. 2023??. ? 23:34, hai wu : >> >> Is there any concern with using osprofiler/rally directly on the >> production Openstack system? >> > > > -- > Best regards, > Andrey Kurilin. From haiwu.us at gmail.com Tue Mar 7 01:41:34 2023 From: haiwu.us at gmail.com (hai wu) Date: Mon, 6 Mar 2023 19:41:34 -0600 Subject: [osprofiler][rally][openstack] Using osprofiler/rally directly on production Openstack system In-Reply-To: References: Message-ID: It seems they might have been moved here? https://github.com/openstack/rally-openstack/tree/master/samples/tasks/scenarios/nova. If so, the rally documentation needs to be updated .. On Mon, Mar 6, 2023 at 7:36?PM hai wu wrote: > > Thanks Andriy! 
That sounds very promising. I just tried to install > rally, and hit one known bug: > https://bugs.launchpad.net/rally/+bug/2004022. I downgraded SQLAlchemy > and was able to get rally installed. But I could not find any sample > task to run per this document url: > https://rally.readthedocs.io/en/latest/quick_start/tutorial/step_1_setting_up_env_and_running_benchmark_from_samples.html#tutorial-step-1-setting-up-env-and-running-benchmark-from-samples. > It seems most of the sample tasks have been deleted already per its > github history. > > Which sample task I could use to try this out (for example, list all > openstack instances..)? It seems its documentation is out of date.. > > Thanks, > Hai > > On Mon, Mar 6, 2023 at 5:10?PM Andriy Kurilin wrote: > > > > Hi! > > > > OSProfiler enables tracing only requests with a special header. The header is not embedded in each request (even if you configure osprofiler in your system), you need to use a special CLI argument to set it. So even if the tracing of one particular request slows done the flow (which should not happen), it should give zero impact on the performance of the whole system. > > > > As for Rally (in particular, the task component), it creates resources with a special naming format that allows to filter out only these resources during the cleanup process. We are using Rally in production as a part of the monitoring for ~5 years or so. > > > > ??, 6 ???. 2023??. ? 23:34, hai wu : > >> > >> Is there any concern with using osprofiler/rally directly on the > >> production Openstack system? > >> > > > > > > -- > > Best regards, > > Andrey Kurilin. From gmann at ghanshyammann.com Tue Mar 7 02:28:19 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 06 Mar 2023 18:28:19 -0800 Subject: [qa][stable][gate] Tempest master failing on stable/yoga|xena, fix is in gate In-Reply-To: <186b835f813.12bb9c445376631.7478464121939418993@ghanshyammann.com> References: <186b835f813.12bb9c445376631.7478464121939418993@ghanshyammann.com> Message-ID: <186b9e5ee5f.1290e52e2388064.1234243886718861499@ghanshyammann.com> ---- On Mon, 06 Mar 2023 10:36:30 -0800 Ghanshyam Mann wrote --- > Hi All, > > In case any of you are seeing failure on stable/yoga and stable/xena with the below error, > please hold the recheck until the fix (revert) in Tempest is merged > > "AttributeError: type object 'Draft4Validator' has no attribute 'FORMAT_CHECKER'" > > - https://review.opendev.org/c/openstack/tempest/+/876218 It is merged, feel free to recheck. -gmann > > -gmann > > From ralonsoh at redhat.com Tue Mar 7 09:12:54 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Tue, 7 Mar 2023 10:12:54 +0100 Subject: [ovn] safely change bridge_mappings In-Reply-To: References: Message-ID: Hello Tom??: You need to follow the steps in [1]: * You need to create the new physical bridge "br-ex2". * Then you need to add to the bridge the physical interface. * In the compute node you need to add the bridge mappings to the OVN database Open vSwitch register * In the controller, you need to add the reference for this second provider network in "flat_networks" and "network_vlan_ranges" (in the ml2.ini file). Then you need to restart the Neutron server to read these new parameters (this step is not mentioned in this link). $ cat ./etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_flat] flat_networks = public,public2 [ml2_type_vlan] network_vlan_ranges = public:11:200,public2:11:200 Regards. 
[1] https://docs.openstack.org/networking-ovn/pike/admin/refarch/provider-networks.html On Tue, Mar 7, 2023 at 12:33?AM Tom?? Bred?r wrote: > Hi, > > I have a running production OpenStack deployment - version Wallaby > installed using TripleO. I'm using the default OVN/OVS networking. > For provider networks I have two bridges on the compute nodes br-ex and > br-ex2. Instances mainly use br-ex for provider networks, but there are > some instances which started using a provider network which should be > mapped to br-ex2, however I didn't specify "bridge_mappings" on > ml2_conf.ini, so the traffic wants to flow through the default > datacentre:br-ex. > My questions is, what services should I restart on the controller and > compute nodes after defining bridge_mappings in [ovs] in ml2_conf.ini. And > if this operation is safe and if the instances already using br-ex will > lose connectivity? > > Thanks for your help > > Tomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hjensas at redhat.com Tue Mar 7 09:33:40 2023 From: hjensas at redhat.com (Harald Jensas) Date: Tue, 7 Mar 2023 10:33:40 +0100 Subject: [TripleO][Wallaby][openStack] - Creating Alma based Baremetal Instance In-Reply-To: References: Message-ID: On 3/6/23 10:34, Lokendra Rathour wrote: > Hi Team, > we have a PoC where we wish to try Creating OpenStack Baremetal Instance > using Alma Linux. > Please help in case that is possible, any reference where we can use the > Alma Images to instantiate?the Baremetal?Instance. > Since Alma is aiming to be binary compatible with RHEL I think doing this would certainly be possible. You may have to keep local, or propose patches to diskimage-builder to add Alma support. Also RDO packages are built against CentOS so re-building the RDO RPM's from source on Alma is probably required. (Since CentOS-Stream is not binary compatible with RHEL, I assume some RDO packages won't work without a re-compile against the ALMA libraries) -- Harald From rdhasman at redhat.com Tue Mar 7 11:00:17 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 7 Mar 2023 16:30:17 +0530 Subject: [cinder][PTG] Cinder 2023.2 (Bobcat) PTG Planning Message-ID: Hello All, The 2023.2 (Bobcat) virtual PTG is approaching and will be held between 27-31 March, 2023. I've created a planning etherpad[1] and a PTG etherpad[2] to gather topics for the PTG. Note that you only need to add topics in the planning etherpad and those will be arranged in the PTG etherpad later. Dates: Tuesday (28th March) to Friday (31st March) 2023 Time: 1300 to 1700 UTC Etherpad: https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning Please add the topics as early as possible as finalizing and arranging topics would require some buffer time. [1] https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning [2] https://etherpad.opendev.org/p/bobcat-ptg-cinder Thanks Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Mar 7 12:58:58 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 7 Mar 2023 12:58:58 +0000 Subject: [TripleO][Wallaby][openStack] - Creating Alma based Baremetal Instance In-Reply-To: References: Message-ID: <20230307125857.gqokp4mcpz7i22vj@yuggoth.org> On 2023-03-07 10:33:40 +0100 (+0100), Harald Jensas wrote: [...] > Since Alma is aiming to be binary compatible with RHEL I think doing this > would certainly be possible. You may have to keep local, or propose patches > to diskimage-builder to add Alma support. 
Also RDO packages are built > against CentOS so re-building the RDO RPM's from source on Alma is probably > required. (Since CentOS-Stream is not binary compatible with RHEL, I assume > some RDO packages won't work without a re-compile against the ALMA > libraries) There's already support in diskimage-builder for Rocky Linux and OpenEuler, both of which are RHEL clones, so Alma is likely very close to those (closer than to CentOS Stream anyway). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From knikolla at bu.edu Tue Mar 7 14:46:19 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Tue, 7 Mar 2023 14:46:19 +0000 Subject: [all][tc] Technical Committee next weekly meeting on 2023 Mar 8 at 1600 UTC Message-ID: <1D8445C6-68E3-475F-9A14-051EAE33BE09@bu.edu> Hi all, This is a reminder that the next weekly Technical Committee meeting is to be held tomorrow (March 8) at 1600 UTC on #openstack-tc on OFTC IRC A copy of the preliminary agenda can be found below. Items can be proposed by editing the wiki page at https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Roll call * Follow up on past action items * Deciding on meeting time * Gate health check * TC 2023.1 tracker status checks ** https://etherpad.opendev.org/p/tc-2023.1-tracker * Deprecation process for TripleO ** https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032083.html * Cleanup of PyPI maintainer list for OpenStack Projects ** Etherpad for audit and cleanup of additional PyPi maintainers *** https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup ** ML discussion *** https://lists.openstack.org/pipermail/openstack-discuss/2023-January/031848.html * Recurring tasks check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary * Virtual PTG Planning ** March 27-31, 2023, there's the Virtual PTG. * Check in on the voting for version names ** https://review.opendev.org/c/openstack/governance/+/874484 ** https://review.opendev.org/c/openstack/governance/+/875942 * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open Thank you, Kristi Nikolla From corey.bryant at canonical.com Tue Mar 7 16:19:26 2023 From: corey.bryant at canonical.com (Corey Bryant) Date: Tue, 7 Mar 2023 11:19:26 -0500 Subject: cryptography min version (non-rust) through 2024.1 Message-ID: Hi All, As you probably know, recent versions of cryptography have hard dependencies on rust. Are there any community plans to continue supporting a minimum (non-rust) version of cryptography until a specific release? The concern I have downstream in Ubuntu is that we need to continue being compatible with cryptography 3.4.8 through openstack 2024.1. This is because all releases through 2024.1 will be backported to the ubuntu 22.04 cloud archives which will use cryptography 3.4.8. Once we get to 2024.2, we will be backporting to 24.04 cloud archives, which will have the new rust-based versions of cryptography. The current upper-constraint for cryptography is 38.0.2, but the various requirements.txt min versions are much lower (e.g. keystone has cryptography>=2.7). This is likely to lead to patches landing with features that are only in 38.0.2, so it will likely be difficult to enforce min version support. But perhaps a stance toward maintaining compatibility could be established. Thoughts? 
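To make that concrete, the knob in question is simply the floor each project declares in requirements.txt, e.g. something along the lines of

  cryptography>=3.4.8  # oldest release we actually intend to keep working

rather than today's cryptography>=2.7, plus reviewers watching for calls that only exist in the 38.x-era releases. Treat the number as an illustration, not a proposal.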
Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Mar 7 17:13:17 2023 From: smooney at redhat.com (Sean Mooney) Date: Tue, 07 Mar 2023 17:13:17 +0000 Subject: cryptography min version (non-rust) through 2024.1 In-Reply-To: References: Message-ID: <32af22810a975f0eddaa4361638b12a2d25f8959.camel@redhat.com> On Tue, 2023-03-07 at 11:19 -0500, Corey Bryant wrote: > Hi All, > > As you probably know, recent versions of cryptography have hard > dependencies on rust. Are there any community plans to continue supporting > a minimum (non-rust) version of cryptography until a specific release? i tought we had already raised the min above the version that required rust so not that i am aware of. cryptography>=2.7 is our curret stated minium but we have been testing with a much much newwer version for alont time since we do not test miniums anymore https://github.com/openstack/nova/commit/6caedfd97675940eb3cf07e2f019926dae45d02c > > The concern I have downstream in Ubuntu is that we need to continue being > compatible with cryptography 3.4.8 through openstack 2024.1. This is > because all releases through 2024.1 will be backported to the ubuntu 22.04 > cloud archives which will use cryptography 3.4.8. Once we get to 2024.2, we > will be backporting to 24.04 cloud archives, which will have the new > rust-based versions of cryptography. > > The current upper-constraint for cryptography is 38.0.2, but the various > requirements.txt min versions are much lower (e.g. keystone has > cryptography>=2.7). This is likely to lead to patches landing with features > that are only in 38.0.2, so it will likely be difficult to enforce min > version support. But perhaps a stance toward maintaining compatibility > could be established. https://github.com/openstack/governance/blob/584e06b0c186d4355d1d51f2d6df96e822253bef/resolutions/20220414-drop-lower-constraints.rst we decided to "Drop Lower Constraints Maintenance" relitivly recently while we have pti guidance for some lanagues rust is not one of them https://github.com/openstack/governance/tree/584e06b0c186d4355d1d51f2d6df96e822253bef/reference/pti and its also not part of the tested runtims https://github.com/openstack/governance/blob/master/reference/runtimes/2023.2.rst so i would proably try to avoid makign any commitment to continuting to supprot non rust based pycryptography release > > Thoughts? > > Corey From fungi at yuggoth.org Tue Mar 7 17:23:18 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 7 Mar 2023 17:23:18 +0000 Subject: [dev][requirements][security-sig][tc]cryptography min version (non-rust) through 2024.1 In-Reply-To: References: Message-ID: <20230307172318.apcjjebwcf4atgyx@yuggoth.org> On 2023-03-07 11:19:26 -0500 (-0500), Corey Bryant wrote: [...] > The current upper-constraint for cryptography is 38.0.2, but the > various requirements.txt min versions are much lower (e.g. > keystone has cryptography>=2.7). This is likely to lead to patches > landing with features that are only in 38.0.2, so it will likely > be difficult to enforce min version support. But perhaps a stance > toward maintaining compatibility could be established. [...] While introducing specific tests for this would not be trivial, maybe it's one of those situations where we try to avoid breaking compatibility with older versions and don't reject patches when people find that something has inadvertently started depending on a feature only available in the Rust-based builds? 
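In practice catching that would not need much machinery either: a periodic or experimental job that installs the old floor before running the unit tests, roughly

  pip install 'cryptography==3.4.8'
  stestr run

(the version is just the Ubuntu 22.04 example from earlier in the thread, and it would be per-project), should surface such regressions without anyone having to police individual patches.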
-- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sbauza at redhat.com Tue Mar 7 18:14:55 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 7 Mar 2023 19:14:55 +0100 Subject: [nova][ptg] Strawman proposal for vPTG timeslots Message-ID: Hi team, We recently discussed this in the Nova meeting [1] but I'd like to reemphasize here and now that I proposed to book 4 timeslots of one hour during 4 days for the next vPTG. https://ptg.opendev.org/ptg.html As you see, the proposed timeline will be : Tuesday, Wednesday, Thursday, Friday between 13:00UTC and 17:00UTC. May you have concerns with this proposal, please express them by replying to this thread. As a reminder, please add the topics you'd like to cover during the vPTG in the PTG etherpad : https://etherpad.opendev.org/p/nova-bobcat-ptg Thanks, -Sylvain [1] https://meetings.opendev.org/meetings/nova/2023/nova.2023-03-07-16.00.log.html#l-229 -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.bryant at canonical.com Tue Mar 7 19:28:23 2023 From: corey.bryant at canonical.com (Corey Bryant) Date: Tue, 7 Mar 2023 14:28:23 -0500 Subject: [dev][requirements][security-sig][tc]cryptography min version (non-rust) through 2024.1 In-Reply-To: <20230307172318.apcjjebwcf4atgyx@yuggoth.org> References: <20230307172318.apcjjebwcf4atgyx@yuggoth.org> Message-ID: On Tue, Mar 7, 2023 at 12:30?PM Jeremy Stanley wrote: > On 2023-03-07 11:19:26 -0500 (-0500), Corey Bryant wrote: > [...] > > The current upper-constraint for cryptography is 38.0.2, but the > > various requirements.txt min versions are much lower (e.g. > > keystone has cryptography>=2.7). This is likely to lead to patches > > landing with features that are only in 38.0.2, so it will likely > > be difficult to enforce min version support. But perhaps a stance > > toward maintaining compatibility could be established. > [...] > > While introducing specific tests for this would not be trivial, > maybe it's one of those situations where we try to avoid breaking > compatibility with older versions and don't reject patches when > people find that something has inadvertently started depending on a > feature only available in the Rust-based builds? > -- > Jeremy Stanley > I'd be okay with an approach like this. Would this need to be formally adopted by the TC? Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Tue Mar 7 19:54:43 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 7 Mar 2023 11:54:43 -0800 Subject: cryptography min version (non-rust) through 2024.1 In-Reply-To: References: Message-ID: > The concern I have downstream in Ubuntu is that we need to continue being > compatible with cryptography 3.4.8 through openstack 2024.1. This is > because all releases through 2024.1 will be backported to the ubuntu 22.04 > cloud archives which will use cryptography 3.4.8. Once we get to 2024.2, we > will be backporting to 24.04 cloud archives, which will have the new > rust-based versions of cryptography. > > The current upper-constraint for cryptography is 38.0.2, but the various > requirements.txt min versions are much lower (e.g. keystone has > cryptography>=2.7). This is likely to lead to patches landing with features > that are only in 38.0.2, so it will likely be difficult to enforce min > version support. 
But perhaps a stance toward maintaining compatibility > could be established. > > What is the impetus needed for us to raise the lower-constraint? When do we decide we should do that, generally -- is it just ad-hoc, someone requests it, or is there a more involved process? We certainly don't dictate all versions of required compilers and such in our TC testing docs (although that is implied in distribution-platform) -- so I think there's another piece to it when talking about python dependencies. I do not think it's wise to commit to supporting older versions of cryptography through 2024.1. In fact, you *must* have a cryptography release that is rust-enabled in order to get OpenSSL 3 support. Not to mention the memory safety benefits from using a rust version. I'm not saying we should force newer cryptography immediately; but it is reason enough to give me significant pause about answering a question about supporting it through two additional releases. Thanks, Jay Faulkner Ironic PTL TC Member -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Mar 7 20:43:46 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 7 Mar 2023 20:43:46 +0000 Subject: cryptography min version (non-rust) through 2024.1 In-Reply-To: References: Message-ID: <20230307204345.b5hvqarqyp25gqj3@yuggoth.org> On 2023-03-07 11:54:43 -0800 (-0800), Jay Faulkner wrote: [...] > I do not think it's wise to commit to supporting older versions of > cryptography through 2024.1. In fact, you *must* have a cryptography > release that is rust-enabled in order to get OpenSSL 3 support. Not to > mention the memory safety benefits from using a rust version. I'm not > saying we should force newer cryptography immediately; but it is reason > enough to give me significant pause about answering a question about > supporting it through two additional releases. Well, the context here is not "supporting old versions of the PYCA/Cryptography library," it's rather "supporting downstream distributors who backport patches to their stable forks of PYCA/Cryptography." While it may sound like the same thing, there's a subtle difference. Obviously nobody should be running old versions of the library because they'll be missing critical security fixes, but there are stable distributions who take care of backporting security patches for their LTS versions and want newer OpenStack releases to still be usable there. The PYCA/Cryptography library has been particularly challenging here, since it decided to go all-in on Rust which, while a very exciting and compelling language from a security perspective, is not exactly the most stabilized ecosystem yet and has seen a lot of churn over the past few years leading to Rust-based projects often being entirely unfit for inclusion in stable server distros due to continually requiring newer toolchain versions and replacing build systems. This isn't just Ubuntu. The latest Debian stable release carries a python3-cryptography based on 3.3.2 from two years ago, older than what Corey's trying to support but still quite new from the perspective of an LTS server distribution. Rocky 9.1 (which I assume is the same as RHEL but I can never find where to look up RHEL package versions) is carrying a python3-cryptography based on 36.0.1 from 2021, so newer than what Corey is trying to support on Ubuntu but not by much (approximately 3 months). 
The point is, we can't reasonably test with all these different versions of the library, and that's just one library out of hundreds we're depending on for that matter... but what we can do is say that if people find regressions due to us testing exclusively with newer features of these libraries than are available on platforms we expect our users to deploy on, we'll gladly accept patches to fix that situation. I expect the TC is going to choose Ubuntu 22.04 LTS as a target platform for at least the OpenStack 2023.2 and 2024.1 coordinated releases, but almost certainly the 2024.2 coordinated release as well since Ubuntu 24.04 LTS won't be officially available before we start that development cycle. That means the first coordinated OpenStack release which would be able to effectively depend on features from a newer python3-cryptography package on Ubuntu is going to be 2025.1. Food for thought. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From vrook at wikimedia.org Tue Mar 7 21:22:45 2023 From: vrook at wikimedia.org (Vivian Rook) Date: Tue, 7 Mar 2023 16:22:45 -0500 Subject: [magnum] certificate authority key location Message-ID: If I want to create a credential for a user to access a magnum cluster I can do so as described in https://docs.openstack.org/magnum/latest/user/#id4 Namely by running: openstack coe ca sign secure-k8s-cluster client.csr > cert.pem I would like to do this without calling the openstack cli. Where does magnum store its ca key file? I could not find it on the control node under /etc/kubernetes/certs (the ca.crt is there, though no ca.key) Thank you! -- *Vivian Rook (They/Them)* Site Reliability Engineer Wikimedia Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.yip at ardc.edu.au Wed Mar 8 01:16:38 2023 From: jake.yip at ardc.edu.au (Jake Yip) Date: Wed, 8 Mar 2023 12:16:38 +1100 Subject: [magnum] security groups for magnum nodes In-Reply-To: References: Message-ID: <09c975e9-4c4d-ffce-4e63-709846aaa725@ardc.edu.au> Hi Vivian, I'm not aware of that, sorry. As an alternative, have you tried adding the security group of the workers to the NFS server instead? Regards, Jake On 4/3/2023 5:09 am, Vivian Rook wrote: > Is there an option for adding security groups to a given magnum > template, and thus the nodes that such a template would create? > > I have an NFS server, and it is setup to only allow connections from > nodes with the "nfs" security group. A few pods in my cluster mount the > NFS server, and are blocked as a result. Is it possible to setup magnum > so that it adds the "nfs" security group to the worker nodes (it would > be alright if it has to be worker and control nodes)? > > Thank you! > > -- > *Vivian Rook (They/Them) > * > Site Reliability Engineer > Wikimedia Foundation > From hanguangyu2 at gmail.com Wed Mar 8 06:50:13 2023 From: hanguangyu2 at gmail.com (=?UTF-8?B?6Z+p5YWJ5a6H?=) Date: Wed, 8 Mar 2023 06:50:13 +0000 Subject: [cinder] Could I use system lvm volume group for cinder-volume Message-ID: Hello, I have a physical node and the system uses lvm partitions. With the service already running above, is it possible for me to deploy cinder-volume that uses the lvm backend without affecting the existing service? The physical disk space is basically allocated to the /dev/sda4 physical volume, and the pv wad added to the `uniontechos` volume group. 
All the space of `uniontechos` vg is allocated to the system logical volume (/dev/uniontechos/root), there is 5T space (a lot of free space). I want to try to reduce the system logical volume??/dev/uniontechos/root, and let cinder-volume use the uniontechos volume group at the same time , I don't know if it is feasible. Could I get some advices? Thank for any help! ```shell # pvdisplay ... --- Physical volume --- PV Name /dev/sda4 VG Name uniontechos PV Size <5.43 TiB / not usable 2.00 MiB Allocatable yes (but full) PE Size 4.00 MiB Total PE 1422313 Free PE 0 Allocated PE 1422313 PV UUID 8LF270-LYD1-kuP1-iWZb-BZEQ-LdA8-UwEa58 # lvdisplay --- Logical volume --- LV Path /dev/uniontechos/root LV Name root VG Name uniontechos LV UUID cEypcY-xcbC-JFoO-d3MS-uMPU-ntQa-WEeE1c LV Write Access read/write LV Creation host, time compute2, 2022-04-28 15:21:27 +0800 LV Status available # open 1 LV Size 5.42 TiB Current LE 1421289 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 8192 Block device 253:0 ``` Best wishes, Han From manchandavishal143 at gmail.com Wed Mar 8 07:38:05 2023 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Wed, 8 Mar 2023 13:08:05 +0530 Subject: [horizon] Cancelling Today's Weekly meeting Message-ID: Hello Team, As it is a holiday for me, I will not be able to host today's horizon weekly meeting. So let's cancel today's weekly meeting. If anything urgent, please reach out to the horizon core team. Thanks & regards, Vishal Manchanda -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Mar 8 09:13:11 2023 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 8 Mar 2023 10:13:11 +0100 Subject: [largescale-sig] Next meeting: March 8, 9utc In-Reply-To: <17105066-2c6a-d4ff-bbde-15fcd33edbad@openstack.org> References: <17105066-2c6a-d4ff-bbde-15fcd33edbad@openstack.org> Message-ID: <33527716-9684-8606-553f-42e757981264@openstack.org> Due to lack of participants, we had a rather short SIG meeting today. I just gave an update on the preparation for our next OpenInfra Live episode and participation to Vancouver summit. You can read the detailed meeting logs at: https://meetings.opendev.org/meetings/large_scale_sig/2023/large_scale_sig.2023-03-08-09.00.html Our next IRC meeting will be March 22, at 1500utc on #openstack-operators on OFTC. Regards, -- Thierry Carrez (ttx) From bkslash at poczta.onet.pl Wed Mar 8 09:32:35 2023 From: bkslash at poczta.onet.pl (A Tom) Date: Wed, 8 Mar 2023 10:32:35 +0100 Subject: [Magnum] Magnum service status in Openstack Antelope Message-ID: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> Hi, Will magnum be supported in Antelope and further releases of Openstack? I was looking for some informations about this and I?m quite confused, because here I don?t see Magnum service: https://releases.openstack.org/antelope/index.html and here Magnum is mentioned: https://docs.openstack.org/2023.1.antelope/projects.html Which version of kubernetes is supported in Zed, and which (if) will be in Antelope? 
Because there?s no such information in compatibility matrix: https://wiki.openstack.org/wiki/Magnum Best regards, Adam Tomas From ralonsoh at redhat.com Wed Mar 8 10:43:52 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 8 Mar 2023 11:43:52 +0100 Subject: [neutron] New Neutron releases for Xena, Yoga and Zed Message-ID: Hello Neutrinos: Please check the patches for the new stable releases for the Neutron projects: * Xena: https://review.opendev.org/c/openstack/releases/+/876827 * Yoga: https://review.opendev.org/c/openstack/releases/+/876828 * Zed: https://review.opendev.org/c/openstack/releases/+/876835 Feel free to comment on the patch if you found something wrong or a missing pending patch. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vrook at wikimedia.org Wed Mar 8 11:48:24 2023 From: vrook at wikimedia.org (Vivian Rook) Date: Wed, 8 Mar 2023 06:48:24 -0500 Subject: [magnum] security groups for magnum nodes In-Reply-To: <09c975e9-4c4d-ffce-4e63-709846aaa725@ardc.edu.au> References: <09c975e9-4c4d-ffce-4e63-709846aaa725@ardc.edu.au> Message-ID: Hi Jake, Yeah I gave that a try, and it does work. Though when I've tried similar it causes problems with removing a cluster, failing on not being able to remove the cluster security group because something other than the cluster is using it. Mostly that is the answer that I was looking for, that this feature doesn't exist. So I can add and remove the security group manually, and can probably do something better in terraform, but we're not quite there yet :) Thank you! On Tue, Mar 7, 2023 at 8:16?PM Jake Yip wrote: > Hi Vivian, > > I'm not aware of that, sorry. > > As an alternative, have you tried adding the security group of the > workers to the NFS server instead? > > Regards, > Jake > > On 4/3/2023 5:09 am, Vivian Rook wrote: > > Is there an option for adding security groups to a given magnum > > template, and thus the nodes that such a template would create? > > > > I have an NFS server, and it is setup to only allow connections from > > nodes with the "nfs" security group. A few pods in my cluster mount the > > NFS server, and are blocked as a result. Is it possible to setup magnum > > so that it adds the "nfs" security group to the worker nodes (it would > > be alright if it has to be worker and control nodes)? > > > > Thank you! > > > > -- > > *Vivian Rook (They/Them) > > * > > Site Reliability Engineer > > Wikimedia Foundation > > > -- *Vivian Rook (They/Them)* Site Reliability Engineer Wikimedia Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnaud.morin at gmail.com Wed Mar 8 13:29:37 2023 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Wed, 8 Mar 2023 13:29:37 +0000 Subject: [neutron][largescale-sig] agent_down_time and report_interval Message-ID: Hello neutron and large-scalers, Is there any recommendation on tuning the neutron report_interval (agent) and agent_down_time (server) to "optimize" the communication between agents and servers without putting to much heavy duty on both rabbit and database? We are currently facing some scaling issue regarding this, and we found out that, at least, CERN did some tweak about this ([1] and [2]) Is there anyone else with specific configuration on that part? I have the feeling that this could be increased (so report will happen less often). 
One obvious side effect of this is the fact that the server will take more time to see a down agent, is there any other side effect that could happen? Is it something we can eventually add in our large-scale documentation ([3])? Cheers, [1] https://youtu.be/5WL47L1P5kE?t=1173 [2] https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/presentation-media/Evolution-of-OpenStack-Networking-at-CERN3.pdf [3] https://docs.openstack.org/large-scale/journey/configure/index.html From felix.huettner at mail.schwarz Wed Mar 8 13:58:56 2023 From: felix.huettner at mail.schwarz (=?iso-8859-1?Q?Felix_H=FCttner?=) Date: Wed, 8 Mar 2023 13:58:56 +0000 Subject: [neutron][largescale-sig] agent_down_time and report_interval In-Reply-To: References: Message-ID: Hi everyone, i can share that we use a agent_down_time of 3600 seconds and the default report_interval for clusters with around 600 compute nodes. -- Felix Huettner > -----Original Message----- > From: Arnaud Morin > Sent: Wednesday, March 8, 2023 2:30 PM > To: discuss openstack > Subject: [neutron][largescale-sig] agent_down_time and report_interval > > Hello neutron and large-scalers, > > Is there any recommendation on tuning the neutron report_interval > (agent) and agent_down_time (server) to "optimize" the communication > between agents and servers without putting to much heavy duty on both > rabbit and database? > > We are currently facing some scaling issue regarding this, and we found > out that, at least, CERN did some tweak about this ([1] and [2]) > > Is there anyone else with specific configuration on that part? > > I have the feeling that this could be increased (so report will happen > less often). One obvious side effect of this is the fact that the server > will take more time to see a down agent, is there any other side effect > that could happen? > > Is it something we can eventually add in our large-scale > documentation ([3])? > > Cheers, > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From michal.arbet at ultimum.io Wed Mar 8 14:37:34 2023 From: michal.arbet at ultimum.io (Michal Arbet) Date: Wed, 8 Mar 2023 15:37:34 +0100 Subject: [Magnum] Magnum service status in Openstack Antelope In-Reply-To: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> References: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> Message-ID: Hi, I am also curious about that. What I registered in email communication there were the plans to implement https://github.com/vexxhost/magnum-cluster-api, so I hope that it will be supported because I wanted to try it :). Any magnum core or developers to give us this information ? Thanks Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook st 8. 3. 2023 v 10:38 odes?latel A Tom napsal: > Hi, > Will magnum be supported in Antelope and further releases of Openstack? 
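For the report_interval / agent_down_time question above, a short sketch of where the two knobs live. The values are examples only, in the direction Felix describes (the defaults are report_interval = 30 and agent_down_time = 75, and agent_down_time should stay well above twice the report_interval):

```shell
# On the agent nodes (OVS/L3/DHCP agents): how often a state report is sent
# over RPC. Example value; the default is 30 seconds.
cat >> /etc/neutron/neutron.conf <<'EOF'
[agent]
report_interval = 120
EOF

# On the neutron-server nodes: how long without a report before an agent is
# considered down. Example value; the default is 75 seconds.
cat >> /etc/neutron/neutron.conf <<'EOF'
[DEFAULT]
agent_down_time = 600
EOF
```

The trade-off is the one already raised in the thread: fewer report messages on RabbitMQ and fewer heartbeat writes to the database, at the cost of the server noticing a dead agent (and rescheduling its DHCP/L3 resources) more slowly.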
I > was looking for some informations about this and I?m quite confused, > because here I don?t see Magnum service: > https://releases.openstack.org/antelope/index.html > > and here Magnum is mentioned: > https://docs.openstack.org/2023.1.antelope/projects.html > > Which version of kubernetes is supported in Zed, and which (if) will be in > Antelope? Because there?s no such information in compatibility matrix: > https://wiki.openstack.org/wiki/Magnum > > > Best regards, > > Adam Tomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Mar 8 14:46:05 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 Mar 2023 14:46:05 +0000 Subject: [Magnum] Magnum service status in Openstack Antelope In-Reply-To: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> References: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> Message-ID: <20230308144605.qp2lpcyf3pojhmhu@yuggoth.org> On 2023-03-08 10:32:35 +0100 (+0100), A Tom wrote: > Will magnum be supported in Antelope and further releases of > Openstack? I was looking for some informations about this and I?m > quite confused, because here I don?t see Magnum service: > https://releases.openstack.org/antelope/index.html [...] There's an initial release candidate linked for it there now, a handful of OpenStack services needed a little more time to tag their RCs and got an requested an extension on last week's deadline. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From kristin at openinfra.dev Wed Mar 8 18:26:40 2023 From: kristin at openinfra.dev (Kristin Barrientos) Date: Wed, 8 Mar 2023 12:26:40 -0600 Subject: OpenInfra Live March 9, 2023, at 9 a.m. CT Message-ID: Hi everyone, This week?s OpenInfra Live episode is brought to you by OpenMetal Episode: Examination of Cost Differences b/w Private Cloud and Hyperscalers Todd Robinson, president of OpenMetal, will be discussing the examination of cost differences between private cloud and hyperscalers. Date and time: March 9, 2023, at 9 a.m. CT (15:00 UTC) You can watch us live on: YouTube: https://www.youtube.com/watch?v=6GCGhuRpPqM LinkedIn: https://www.linkedin.com/events/7038909988492242944/comments/ WeChat: recording will be posted on OpenStack WeChat after the live stream Speakers: Todd Robinson Have an idea for a future episode? Share it now at ideas.openinfra.live. Thanks, Kristin Barrientos Marketing Coordinator OpenInfra Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From roberto.acosta at luizalabs.com Wed Mar 8 20:49:12 2023 From: roberto.acosta at luizalabs.com (Roberto Bartzen Acosta) Date: Wed, 8 Mar 2023 17:49:12 -0300 Subject: [neutron] Openstack Network Interconnection Message-ID: Hey folks. Does anyone have ideas on how to interconnect different Openstack deployments? Consider that we have multiple Datacenters and need to interconnect tenant networks. How could this be done in the context of OpenStack (without using VPN) ? We have some ideas about the usage of OVN-IC (OVN Interconnect). It looks like a great solution to create a network layer between DCs/AZs with the help of the OVN driver. However, Neutron does not support the Transit Switches (OVN-IC design) that are required for this application. We've seen references to abandoned projects like [1] [2] [3]. 
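To make the transit-switch gap concrete, this is roughly what a manual OVN-IC hookup looks like today, done directly against the OVN databases rather than through Neutron. Names, the MAC and the 169.254.100.0/24 transit subnet are illustrative, and nothing below is reflected in the Neutron DB, which is exactly the missing integration:

```shell
# On the interconnection NB database: create a transit switch shared by the AZs.
ovn-ic-nbctl ts-add ts-az1-az2

# In each AZ (az1 shown), ovn-ic mirrors the transit switch into the local
# northbound DB; attach the AZ's logical router with a router port plus a
# peer port of type "router" on the transit switch.
ovn-nbctl lrp-add lr-az1 lrp-lr-az1-ts aa:aa:aa:aa:aa:01 169.254.100.1/24
ovn-nbctl lsp-add ts-az1-az2 lsp-ts-lr-az1 \
    -- lsp-set-addresses lsp-ts-lr-az1 router \
    -- lsp-set-type lsp-ts-lr-az1 router \
    -- lsp-set-options lsp-ts-lr-az1 router-port=lrp-lr-az1-ts

# Pin the router port to a gateway chassis so cross-AZ traffic has an exit point.
ovn-nbctl lrp-set-gateway-chassis lrp-lr-az1-ts <gateway-chassis-name> 10
```

Routes between the AZ routers then still have to be added statically or exchanged through ovn-ic's route learning and advertising options, and because Neutron knows nothing about any of it, a service plugin along the lines of the abandoned neutron-interconnection work would be needed to manage it cleanly.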
Does anyone use something similar in production or have an idea about how to do it? Imagine that we need to put workloads on two different AZs that run different Openstack installations, and we want to communicate with the local networks without using a FIP. I believe that the most coherent way to maintain databases consistent in each Openstack would be an integration with Neutron, but I haven't seen any movement on that. Regards, Roberto [1] https://www.youtube.com/watch?v=GizLmSiH1Q0 [2] https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html [3] https://opendev.org/x/neutron-interconnection -- _?Esta mensagem ? direcionada apenas para os endere?os constantes no cabe?alho inicial. Se voc? n?o est? listado nos endere?os constantes no cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o imediatamente anuladas e proibidas?._ *?**?Apesar do Magazine Luiza tomar todas as precau??es razo?veis para assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o poder? aceitar a responsabilidade por quaisquer perdas ou danos causados por esse e-mail ou por seus anexos?.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From desirebarine16 at gmail.com Thu Mar 9 06:45:21 2023 From: desirebarine16 at gmail.com (Desire Barine) Date: Thu, 9 Mar 2023 07:45:21 +0100 Subject: [outreachy][cinder] Message-ID: Hello Sofia Enriquez, I'm Desire Barine, an Outreachy applicant. I would love to work on Extend automated validation of API reference request/response samples project. I would like to get started with the contribution. I am currently going over the instructions on contributions given. This is my first time contributing on an open source project but I'm really excited to get started. I'm proficient in python, bash and have worked on Rest api creation before. I would love to hear from you. Desire. -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix.huettner at mail.schwarz Thu Mar 9 08:24:21 2023 From: felix.huettner at mail.schwarz (=?iso-8859-1?Q?Felix_H=FCttner?=) Date: Thu, 9 Mar 2023 08:24:21 +0000 Subject: [neutron] Openstack Network Interconnection In-Reply-To: References: Message-ID: Hi Roberto, We will face a similar issue in the future and have also looked at ovn-interconnect (but not yet tested it). There is also ovn-bgp-agent [1] which has an evpn mode that might be relevant. Whatever you find I would definitely be interested in your results [1] https://opendev.org/x/ovn-bgp-agent -- Felix Huettner From: Roberto Bartzen Acosta Sent: Wednesday, March 8, 2023 9:49 PM To: openstack-discuss at lists.openstack.org Cc: Tiago Pires Subject: [neutron] Openstack Network Interconnection Hey folks. Does anyone have ideas on how to interconnect different Openstack deployments? Consider that we have multiple Datacenters and need to interconnect tenant networks. How could this be done in the context of OpenStack (without using VPN) ? We have some ideas about the usage of OVN-IC (OVN Interconnect). It looks like a great solution to create a network layer between DCs/AZs with the help of the OVN driver. However, Neutron does not support the Transit Switches (OVN-IC design) that are required for this application. We've seen references to abandoned projects like [1] [2] [3]. Does anyone use something similar in production or have an idea about how to do it? 
Imagine that we need to put workloads on two different AZs that run different Openstack installations, and we want to communicate with the local networks without using a FIP. I believe that the most coherent way to maintain databases consistent in each Openstack would be an integration with Neutron, but I haven't seen any movement on that. Regards, Roberto [1] https://www.youtube.com/watch?v=GizLmSiH1Q0 [2] https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html [3] https://opendev.org/x/neutron-interconnection 'Esta mensagem ? direcionada apenas para os endere?os constantes no cabe?alho inicial. Se voc? n?o est? listado nos endere?os constantes no cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o imediatamente anuladas e proibidas'. 'Apesar do Magazine Luiza tomar todas as precau??es razo?veis para assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o poder? aceitar a responsabilidade por quaisquer perdas ou danos causados por esse e-mail ou por seus anexos'. Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Thu Mar 9 09:55:14 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 9 Mar 2023 10:55:14 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: References: <20230306113543.a57aywefbn4cgsu3@localhost> Message-ID: <20230309095514.l3i67tys2ujaq6dp@localhost> On 06/03, Rishat Azizov wrote: > Hi, > > It works with smaller volumes. > > multipath.conf attached to thist email. > > Cinder version - 18.2.0 Wallaby Hi, After giving it some thought I think I may know what is going on. If you have DEBUG logs enabled in cinder-backup when it fails, how many calls do you see in the cinder-backup to "multipath -f" from os-brick, only one or do you see more? Cheers, Gorka. > > ??, 6 ???. 2023??. ? 17:35, Gorka Eguileor : > > > On 16/02, Rishat Azizov wrote: > > > Hello! > > > > > > We have an error with creating backups from iscsi volume. Usually, this > > > happens with large backups over 100GB. > > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > [req-f6619913-6f96-4226-8d75-2da3fca722f1 > > 23de1b92e7674cf59486f07ac75b886b > > > a7585b47d1f143e9839c49b4e3bbe1b4 - - -] Exception during message > > handling: > > > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error > > while > > > running command. 
> > > Command: multipath -f 3624a93705842cfae35d7483200015ec6 > > > Exit code: 1 > > > Stdout: '' > > > Stderr: 'Feb 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a > > > multipath device\n' > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Traceback > > > (most recent call last): > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line > > 165, > > > in _process_incoming > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server res = > > > self.dispatcher.dispatch(message) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line > > > 309, in dispatch > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > self._do_dispatch(endpoint, method, ctxt, args) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line > > > 229, in _do_dispatch > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = > > > func(ctxt, **new_args) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/utils.py", line 890, in wrapper > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > func(self, *args, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 410, in > > > create_backup > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > volume_utils.update_backup_error(backup, str(err)) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in > > > __exit__ > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > self.force_reraise() > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in > > > force_reraise > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise > > > self.value > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 399, in > > > create_backup > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server updates > > = > > > self._run_backup(context, backup, volume) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 493, in > > > _run_backup > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > ignore_errors=True) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1066, > > in > > > _detach_device > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > force=force, ignore_errors=ignore_errors) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 141, in > > > trace_logging_wrapper > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > f(*args, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line > > 360, > 
> > in inner > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > f(*args, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > > > line 880, in disconnect_volume > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > is_disconnect_call=True) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > > > line 942, in _cleanup_connection > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > self._linuxscsi.flush_multipath_device(multipath_name) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line > > > 382, in flush_multipath_device > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > root_helper=self._root_helper) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in > > > _execute > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = > > > self.__execute(*args, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line > > > 172, in execute > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > execute_root(*cmd, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line > > 247, > > > in _wrap > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > self.channel.remote_call(name, args, kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in > > > remote_call > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise > > > exc_type(*result[2]) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error > > while > > > running command. > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Command: > > > multipath -f 3624a93705842cfae35d7483200015ec6 > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Exit code: 1 > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stdout: '' > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stderr: 'Feb > > > 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a multipath > > device\n' > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > > Could you please help with this error? > > > > Hi, > > > > Does it work for smaller volumes or does it also fail? > > > > What are your defaults in your /etc/multipath.conf file? > > > > What Cinder release are you using? > > > > Cheers, > > Gorka. 
> > > > > defaults { > user_friendly_names no > find_multipaths yes > enable_foreign "^$" > } > > blacklist_exceptions { > property "(SCSI_IDENT_|ID_WWN)" > } > > blacklist { > } > > devices { > device { > vendor "PURE" > product "FlashArray" > fast_io_fail_tmo 10 > path_grouping_policy "group_by_prio" > failback "immediate" > prio "alua" > hardware_handler "1 alua" > max_sectors_kb 4096 > } > } From arnaud.morin at gmail.com Thu Mar 9 10:35:13 2023 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Thu, 9 Mar 2023 10:35:13 +0000 Subject: [neutron][largescale-sig] agent_down_time and report_interval In-Reply-To: References: Message-ID: Thanks! On 08.03.23 - 13:58, Felix H?ttner wrote: > Hi everyone, > > i can share that we use a agent_down_time of 3600 seconds and the default report_interval for clusters with around 600 compute nodes. > > -- > Felix Huettner > > > -----Original Message----- > > From: Arnaud Morin > > Sent: Wednesday, March 8, 2023 2:30 PM > > To: discuss openstack > > Subject: [neutron][largescale-sig] agent_down_time and report_interval > > > > Hello neutron and large-scalers, > > > > Is there any recommendation on tuning the neutron report_interval > > (agent) and agent_down_time (server) to "optimize" the communication > > between agents and servers without putting to much heavy duty on both > > rabbit and database? > > > > We are currently facing some scaling issue regarding this, and we found > > out that, at least, CERN did some tweak about this ([1] and [2]) > > > > Is there anyone else with specific configuration on that part? > > > > I have the feeling that this could be increased (so report will happen > > less often). One obvious side effect of this is the fact that the server > > will take more time to see a down agent, is there any other side effect > > that could happen? > > > > Is it something we can eventually add in our large-scale > > documentation ([3])? > > > > Cheers, > > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From nicolas.melot at lunarc.lu.se Thu Mar 9 10:38:57 2023 From: nicolas.melot at lunarc.lu.se (Nicolas Melot) Date: Thu, 9 Mar 2023 11:38:57 +0100 Subject: [cinder] Openstack-ansible and cinder GPFS backend Message-ID: <7e8cda5e51f7b02878ed92dc58920be4fff25f3e.camel@lunarc.lu.se> Hi, I can find doc on using various backends for cinder (https://docs.openstack.org/openstack-ansible-os_cinder/zed/configure-cinder.html#configuring-cinder-to-use-lvm) and some documentation to configure a GPFS backend for cinder (https://docs.openstack.org/cinder/zed/configuration/block-storage/drivers/ibm-gpfs-volume-driver.html) but I cannot find any documentation to deploy cinder with GPFS backend using openstack-ansible. Does this exist at all? Is there any documentation? /Nicolas From noonedeadpunk at gmail.com Thu Mar 9 11:32:36 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 9 Mar 2023 12:32:36 +0100 Subject: [cinder] Openstack-ansible and cinder GPFS backend In-Reply-To: <7e8cda5e51f7b02878ed92dc58920be4fff25f3e.camel@lunarc.lu.se> References: <7e8cda5e51f7b02878ed92dc58920be4fff25f3e.camel@lunarc.lu.se> Message-ID: Hi, Nicolas, No, we don't really maintain documentation for each cinder driver that's available. 
So we assume using an override variable for adjustment of cinder configuration to match the desired state. So basically, you can use smth like that in your user_variables.yml: cinder_backends: GPFSNFS: volume_backend_name: GPFSNFS volume_driver: cinder.volume.drivers.ibm.gpfs.GPFSNFSDriver cinder_cinder_conf_overrides: DEFAULT: gpfs_hosts: ip.add.re.ss gpfs_storage_pool: cinder gpfs_images_share_mode: copy_on_write .... I have no idea though if gpfs_* variables can be defined or not inside the backend section, as they're referenced in DEFAULT in docs. But overrides will work regardless. ??, 9 ???. 2023??. ? 11:41, Nicolas Melot : > > Hi, > > I can find doc on using various backends for cinder > (https://docs.openstack.org/openstack-ansible-os_cinder/zed/configure-cinder.html#configuring-cinder-to-use-lvm) > and some documentation to configure a GPFS backend for cinder > (https://docs.openstack.org/cinder/zed/configuration/block-storage/drivers/ibm-gpfs-volume-driver.html) > but I cannot find any documentation to deploy cinder with GPFS backend > using openstack-ansible. Does this exist at all? Is there any > documentation? > > /Nicolas > From finarffin at gmail.com Thu Mar 9 12:51:45 2023 From: finarffin at gmail.com (Jan Wasilewski) Date: Thu, 9 Mar 2023 13:51:45 +0100 Subject: [manila] Share configuration with cinder as a backend Message-ID: Hi, I am looking for instructions on how to configure a Manila service with Cinder as a backend. I have gone through the https://github.com/openstack/manila/blob/master/doc/source/configuration/shared-file-systems/drivers/generic-driver.rst page, and I am wondering if someone has a link to a preconfigured "golden image" that can be used as a service_image. Additionally, I am wondering if this configuration can be used with driver_handles_share_servers=True (generally, I see that both options are supported here), but are there any specific limitations? I have already configured other backends (standalone ZFS, NFS, and Huawei driver), but I would like to test share snapshot retrieval using Cinder as a backend. Unfortunately, as I see, Cinder is not as straightforward as other backends(or no one is using it), which is why I am asking for some hints. Thanks in advance /Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Thu Mar 9 14:06:23 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Thu, 9 Mar 2023 11:06:23 -0300 Subject: [manila] Share configuration with cinder as a backend In-Reply-To: References: Message-ID: Hello, Jan! Em qui., 9 de mar. de 2023 ?s 09:56, Jan Wasilewski escreveu: > Hi, > > I am looking for instructions on how to configure a Manila service with > Cinder as a backend. I have gone through the > https://github.com/openstack/manila/blob/master/doc/source/configuration/shared-file-systems/drivers/generic-driver.rst > page, and I am wondering if someone has a link to a preconfigured "golden > image" that can be used as a service_image. > As for the image to be used as a service image, we usually use [1] on CI. It is a Ubuntu 22 image with minimal things installed. You would need to create a Glance image with this image and then add the service image name to your Manila.conf in case you are deploying with driver_handles_share_servers=True (DHSS=True). Or in case you want to use DHSS=False, you will need to create the VM and add the instance name or ID to the manila.conf. The admin guide for Manila has instructions for both approaches [2]. 
> Additionally, I am wondering if this configuration can be used with > driver_handles_share_servers=True (generally, I see that both options are > supported here), but are there any specific limitations? > Yes, you can use this configuration with DHSS=True. There are known restrictions documented here [3]. > I have already configured other backends (standalone ZFS, NFS, and Huawei > driver), but I would like to test share snapshot retrieval using Cinder as > a backend. Unfortunately, as I see, Cinder is not as straightforward as > other backends(or no one is using it), which is why I am asking for some > hints. > The Generic driver is mostly used for testing on CI. We are aware of people using it in production environments but it's not something we recommend. You should be able to create and delete snapshots using the Generic driver, as well as create shares from snapshots. Reverting to snapshots is not something available, as per the feature support mapping [4]. > Thanks in advance > /Jan > [1] http://tarballs.openstack.org/manila-image-elements/images/manila-service-image-master.qcow2 [2] https://docs.openstack.org/manila/latest/admin/generic_driver.html [3] https://docs.openstack.org/manila/latest/admin/generic_driver.html#known-restrictions [4] https://docs.openstack.org/manila/latest/admin/share_back_ends_feature_support_mapping.html Please let me know if you have more questions carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Thu Mar 9 01:18:56 2023 From: batmanustc at gmail.com (Simon Jones) Date: Thu, 9 Mar 2023 09:18:56 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> Message-ID: Hi, all At last, I got the root cause of this 2 problem. And I suggest add these words to https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html: ``` Prerequisites: libvirt >= 7.9.0 . Like ubuntu-22.04, which use libvirt-8.0.0 by default. ``` Root cause of problem 1, which is "no valid host": - Because libvirt version is too low. Root cause of problem 2, which is "why there are topology in DPU in openstack create port command": - Because add --binding-profile params in openstack create port command, which is NOT right. ---- Simon Jones Dmitrii Shcherbakov ?2023?3?2??? 20:30??? > Hi {Sean, Simon}, > > > did you ever give a presentation on the DPU support > > Yes, there were a couple at different stages. > > The following is the one of the older ones that references the SMARTNIC > VNIC type but we later switched to REMOTE_MANAGED in the final code: > https://www.openvswitch.org/support/ovscon2021/slides/smartnic_port_binding.pdf, > however, it has a useful diagram on page 15 which shows the interactions of > different components. A lot of other content from it is present in the > OpenStack docs now which we added during the feature development. > > There is also a presentation with a demo that we did at the Open Infra > summit https://youtu.be/Amxp-9yEnsU (I could not attend but we prepared > the material after the features got merged). 
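Picking up the Manila generic-driver exchange above, a rough sketch of the DHSS=True pieces. The image name, flavor ID, credentials and backend name are placeholders, and the full option set (service network, Nova/Neutron/Cinder credentials) should be taken from the generic driver admin guide linked in [2]:

```shell
# Upload the service image referenced above (any recent manila-service-image
# build should do); the Glance image name is a placeholder.
openstack image create manila-service-image \
    --disk-format qcow2 --container-format bare \
    --file manila-service-image-master.qcow2

# Sketch of a generic-driver backend in /etc/manila/manila.conf (DHSS=True);
# also add "generic" to enabled_share_backends in [DEFAULT].
cat >> /etc/manila/manila.conf <<'EOF'
[generic]
share_backend_name = GENERIC
share_driver = manila.share.drivers.generic.GenericShareDriver
driver_handles_share_servers = True
service_image_name = manila-service-image
service_instance_flavor_id = 100
service_instance_user = manila
service_instance_password = manila
EOF
```

With DHSS=False the service image line is replaced by pointing the backend at an existing service VM by name or ID, as Carlos describes.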
> > Generally, as Sean described, the aim of this feature is to make the > interaction between components present at the hypervisor and the DPU side > automatic but, in order to make this workflow explicitly different from > SR-IOV or offload at the hypervisor side, one has to use the > "remote_managed" flag. This flag allows Nova to differentiate between > "regular" VFs and the ones that have to be programmed by a remote host > (DPU) - hence the name. > > A port needs to be pre-created with the remote-managed type - that way > when Nova tries to schedule a VM with that port attached, it will find > hosts which actually have PCI devices tagged with the "remote_managed": > "true" in the PCI whitelist. > > The important thing to note here is that you must not use PCI passthrough > directly for this - Nova will create a PCI device request automatically > with the remote_managed flag included. There is currently no way to > instruct Nova to choose one vendor/device ID vs the other for this (any > remote_managed=true device from a pool will match) but maybe the work that > was recently done to store PCI device information in the Placement service > will pave the way for such granularity in the future. > > Best Regards, > Dmitrii Shcherbakov > LP/MM/oftc: dmitriis > > > On Thu, Mar 2, 2023 at 1:54?PM Sean Mooney wrote: > >> adding Dmitrii who was the primary developer of the openstack integration >> so >> they can provide more insight. >> >> Dmitrii did you ever give a presentationon the DPU support and how its >> configured/integrated >> that might help fill in the gaps for simon? >> >> more inline. >> >> On Thu, 2023-03-02 at 11:05 +0800, Simon Jones wrote: >> > E... >> > >> > But there are these things: >> > >> > 1) Show some real happened in my test: >> > >> > - Let me clear that, I use DPU in compute node: >> > The graph in >> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html . >> > >> > - I configure exactly follow >> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, >> > which is said bellow in "3) Let me post all what I do follow this link". 
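A compressed sketch of the flow described here, matching the smartnic_dpu guide linked earlier. Network, flavor and image names are placeholders; note that there is no --binding-profile on the port and no PCI alias in the flavor, since Nova generates the PCI request itself from the remote_managed device spec:

```shell
# Pre-create the port with the remote-managed VNIC type.
openstack network create selfservice
openstack subnet create --network selfservice --subnet-range 172.1.1.0/24 selfservice-v4
openstack port create --network selfservice --vnic-type remote-managed pf0vf1

# Boot against that port; scheduling only succeeds on hosts whose [pci]
# whitelist tags matching VFs with "remote_managed": "true".
openstack server create --flavor m1.small --image ubuntu-22.04 \
    --nic port-id=pf0vf1 vm-with-dpu-port
```

Simon's note at the top of the thread applies on top of this: with a libvirt older than 7.9.0 on the compute host, the same "No valid host" error appears even when the whitelist is correct.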
>> > >> > - In my test, I found after first three command (which is "openstack >> > network create ...", "openstack subnet create", "openstack port create >> ..."), >> > there are network topology exist in DPU side, and there are rules exist >> in >> > OVN north DB, south DB of controller, like this: >> > >> > > ``` >> > > root at c1:~# ovn-nbctl show >> > > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 >> > > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) >> > > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 >> > > addresses: ["unknown"] >> > > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) >> > > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] >> > > >> > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding >> > > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a >> > > chassis : [] >> > > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 >> > > encap : [] >> > > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24", >> > > "neutron:device_id"="", "neutron:device_owner"="", >> > > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, >> > > "neutron:port_name"=pf0vf1, >> > > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", >> > > "neutron:revision_number"="1", >> > > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} >> > > >> > > root at c1c2dpu:~# sudo ovs-vsctl show >> > > 62cf78e5-2c02-471e-927e-1d69c2c22195 >> > > Bridge br-int >> > > fail_mode: secure >> > > datapath_type: system >> > > Port br-int >> > > Interface br-int >> > > type: internal >> > > Port ovn--1 >> > > Interface ovn--1 >> > > type: geneve >> > > options: {csum="true", key=flow, >> remote_ip="172.168.2.98"} >> > > Port pf0vf1 >> > > Interface pf0vf1 >> > > ovs_version: "2.17.2-24a81c8" >> > > ``` >> > > >> > That's why I guess "first three command" has already create network >> > topology, and "openstack server create" command only need to plug VF >> into >> > VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. >> no that jsut looks like the standard bridge toplogy that gets created >> when you provision >> the dpu to be used with openstac vai ovn. >> >> that looks unrelated to the neuton comamnd you ran. >> > >> > - In my test, then I run "openstack server create" command, I got ERROR >> > which said "No valid host...", which is what the email said above. >> > The reason has already said, it's nova-scheduler's PCI filter module >> report >> > no valid host. The reason "nova-scheduler's PCI filter module report no >> > valid host" is nova-scheduler could NOT see PCI information of compute >> > node. The reason "nova-scheduler could NOT see PCI information of >> compute >> > node" is compute node's /etc/nova/nova.conf configure remote_managed tag >> > like this: >> > >> > > ``` >> > > [pci] >> > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", >> > > "physical_network": null, "remote_managed": "true"} >> > > alias = { "vendor_id":"15b3", "product_id":"101e", >> > > "device_type":"type-VF", "name":"a1" } >> > > ``` >> > > >> > >> > 2) Discuss some detail design of "remote_managed" tag, I don't know if >> this >> > is right in the design of openstack with DPU: >> > >> > - In neutron-server side, use remote_managed tag in "openstack port >> create >> > ..." command. >> > This command will make neutron-server / OVN / ovn-controller / ovs to >> make >> > the network topology done, like above said. >> > I this this is right, because test shows that. 
>> that is not correct >> your test do not show what you think it does, they show the baisic bridge >> toplogy and flow configuraiton that ovn installs by defualt when it >> manages >> as ovs. >> >> please read the design docs for this feature for both nova and neutron to >> understand how the interacction works. >> >> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html >> >> https://specs.openstack.org/openstack/neutron-specs/specs/yoga/off-path-smartnic-dpu-port-binding-with-ovn.html >> > >> > - In nova side, there are 2 things should process, first is PCI >> passthrough >> > filter, second is nova-compute to plug VF into VM. >> > >> > If the link above is right, which remote_managed tag exists in >> > /etc/nova/nova.conf of controller node and exists in >> /etc/nova/nova.conf of >> > compute node. >> > As above ("- In my test, then I run "openstack server create" command") >> > said, got ERROR in this step. >> > So what should do in "PCI passthrough filter" ? How to configure ? >> > >> > Then, if "PCI passthrough filter" stage pass, what will do of >> nova-compute >> > in compute node? >> > >> > 3) Post all what I do follow this link: >> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. >> > - build openstack physical env, link plug DPU into compute mode, use VM >> as >> > controller ... etc. >> > - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. >> > - configure DPU side /etc/neutron/neutron.conf >> > - configure host side /etc/nova/nova.conf >> > - configure host side /etc/nova/nova-compute.conf >> > - run first 3 command >> > - last, run this command, got ERROR >> > >> > ---- >> > Simon Jones >> > >> > >> > Sean Mooney ?2023?3?1??? 18:35??? >> > >> > > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: >> > > > Thanks a lot !!! >> > > > >> > > > As you say, I follow >> > > > >> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. >> > > > And I want to use DPU mode. Not "disable DPU mode". >> > > > So I think I should follow the link above exactlly, so I use >> > > > vnic-type=remote_anaged. >> > > > In my opnion, after I run first three command (which is "openstack >> > > network >> > > > create ...", "openstack subnet create", "openstack port create >> ..."), the >> > > > VF rep port and OVN and OVS rules are all ready. >> > > not at that point nothign will have been done on ovn/ovs >> > > >> > > that will only happen after the port is bound to a vm and host. >> > > >> > > > What I should do in "openstack server create ..." is to JUST add PCI >> > > device >> > > > into VM, do NOT call neutron-server in nova-compute of compute node >> ( >> > > like >> > > > call port_binding or something). >> > > this is incorrect. >> > > > >> > > > But as the log and steps said in the emails above, nova-compute call >> > > > port_binding to neutron-server while running the command "openstack >> > > server >> > > > create ...". >> > > > >> > > > So I still have questions is: >> > > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do >> NOT >> > > call >> > > > neutron-server in nova-compute of compute node ( like call >> port_binding >> > > or >> > > > something)" . >> > > no this is not how its designed. >> > > until you attach the logical port to a vm (either at runtime or as >> part of >> > > vm create) >> > > the logical port is not assocated with any host or phsical dpu/vf. 
>> > > >> > > so its not possibel to instanciate the openflow rules in ovs form the >> > > logical switch model >> > > in the ovn north db as no chassie info has been populated and we do >> not >> > > have the dpu serial >> > > info in the port binding details. >> > > > 2) If it's right, how to deal with this? Which is how to JUST add >> PCI >> > > > device into VM, do NOT call neutron-server? By command or by >> configure? >> > > Is >> > > > there come document ? >> > > no this happens automaticaly when nova does the port binding which >> cannot >> > > happen until after >> > > teh vm is schduled to a host. >> > > > >> > > > ---- >> > > > Simon Jones >> > > > >> > > > >> > > > Sean Mooney ?2023?3?1??? 16:15??? >> > > > >> > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: >> > > > > > BTW, this link ( >> > > > > > >> > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >> ) >> > > > > said >> > > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that >> WRONG ? >> > > > > >> > > > > no its not wrong but for dpu smart nics you have to make a choice >> when >> > > you >> > > > > deploy >> > > > > either they can be used in dpu mode in which case remote_managed >> > > shoudl be >> > > > > set to true >> > > > > and you can only use them via neutron ports with >> > > vnic-type=remote_managed >> > > > > as descried in that doc >> > > > > >> > > > > >> > > >> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port >> > > > > >> > > > > >> > > > > or if you disable dpu mode in the nic frimware then you shoudl >> remvoe >> > > > > remote_managed form the pci device list and >> > > > > then it can be used liek a normal vf either for neutron sriov >> ports >> > > > > vnic-type=direct or via flavor based pci passthough. >> > > > > >> > > > > the issue you were havign is you configured the pci device list to >> > > contain >> > > > > "remote_managed: ture" which means >> > > > > the vf can only be consumed by a neutron port with >> > > > > vnic-type=remote_managed, when you have "remote_managed: false" or >> > > unset >> > > > > you can use it via vnic-type=direct i forgot that slight detail >> that >> > > > > vnic-type=remote_managed is required for "remote_managed: ture". >> > > > > >> > > > > >> > > > > in either case you foudn the correct doc >> > > > > >> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >> > > > > neutorn sriov port configuration is documented here >> > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html >> > > > > and nova flavor based pci passthough is documeted here >> > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >> > > > > >> > > > > all three server slightly differnt uses. both neutron proceedures >> are >> > > > > exclusivly fo network interfaces. >> > > > > >> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >> > > > > requires the use of ovn deployed on the dpu >> > > > > to configure the VF contolplane. >> > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html >> uses >> > > > > the sriov nic agent >> > > > > to manage the VF with ip tools. >> > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >> is >> > > > > intended for pci passthough >> > > > > of stateless acclerorators like qat devices. 
while the nova flavor >> > > approch >> > > > > cna be used with nics it not how its generally >> > > > > ment to be used and when used to passthough a nic expectation is >> that >> > > its >> > > > > not related to a neuton network. >> > > > > >> > > > > >> > > >> > > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From grasza at redhat.com Thu Mar 9 13:01:28 2023 From: grasza at redhat.com (Grzegorz Grasza) Date: Thu, 9 Mar 2023 14:01:28 +0100 Subject: [barbican] Canceling weekly meeting (March 14th) + PTG discussion topics Message-ID: Hi all, I'm on PTO next week, so I'm canceling the weekly meeting. We have just one last meeting before the Virtual PTG, so I booked a time slot for us on Tuesday, March 28th at 13:00 UTC. Please add topics you would like to discuss to the etherpad: https://etherpad.opendev.org/p/march2023-ptg-barbican If there are more things to discuss, I'll book another time slot later in the week. Thanks, / Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From x3om6ak at gmail.com Thu Mar 9 13:24:06 2023 From: x3om6ak at gmail.com (=?UTF-8?B?0JzQuNGF0LDQuNC7?=) Date: Thu, 9 Mar 2023 16:24:06 +0300 Subject: [neutron][ovn] domain-search dhcp option per network/subnet level Message-ID: Greetings to all! In my openstack setup (vanilla ubuntu 22.04 Zed release + OVN networking ) I try to find a way to setup a scheme where my instances should get dhcp search domain option on network/subnet level. For example - when I create network/subnet, I want to tell neutron, that all created ports in that network/subnet must receive my dhcp search-domain option Any help advices will appreciated. Thank you! -- Best regards, Mikhail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Mar 9 15:13:03 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 9 Mar 2023 16:13:03 +0100 Subject: [neutron][ovn] domain-search dhcp option per network/subnet level In-Reply-To: References: Message-ID: Hello Mikhail: Short answer is no, we don't support this. Please check [1] and [2] for more context. Regards. [1]https://bugs.launchpad.net/neutron/+bug/1960850 [2] https://review.opendev.org/c/openstack/neutron-specs/+/832658/12/specs/zed/support-dns-subdomains-at-a-network-level.rst On Thu, Mar 9, 2023 at 3:19?PM ?????? wrote: > Greetings to all! > In my openstack setup (vanilla ubuntu 22.04 Zed release + OVN networking ) > I try to find a way to setup a scheme where my instances should get dhcp > search domain option on network/subnet level. > > For example - when I create network/subnet, I want to tell neutron, that > all created ports in that network/subnet must receive my dhcp search-domain > option > > Any help advices will appreciated. Thank you! > > -- > Best regards, Mikhail. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From x3om6ak at gmail.com Thu Mar 9 15:23:38 2023 From: x3om6ak at gmail.com (=?UTF-8?B?0JzQuNGF0LDQuNC7?=) Date: Thu, 9 Mar 2023 18:23:38 +0300 Subject: [neutron][ovn] domain-search dhcp option per network/subnet level In-Reply-To: References: Message-ID: Thank you for your response. ??, 9 ???. 2023 ?., 18:13 Rodolfo Alonso Hernandez : > Hello Mikhail: > > Short answer is no, we don't support this. Please check [1] and [2] for > more context. > > Regards. 
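The closest workaround available today is per port rather than per network/subnet, via Neutron's extra DHCP options. Whether the OVN backend actually delivers option 119 this way depends on the Neutron/OVN versions in use, so treat this as something to verify on Zed rather than a guaranteed path; the port name and domain are placeholders:

```shell
# Hypothetical per-port workaround: set the domain-search extra DHCP option
# (option 119) on each port that should receive it.
openstack port set \
    --extra-dhcp-option name=domain-search,value=example.internal \
    my-instance-port
```

A true network or subnet level knob is what the spec Rodolfo links below was proposing, and that remains the open gap.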
> > [1]https://bugs.launchpad.net/neutron/+bug/1960850 > [2] > https://review.opendev.org/c/openstack/neutron-specs/+/832658/12/specs/zed/support-dns-subdomains-at-a-network-level.rst > > On Thu, Mar 9, 2023 at 3:19?PM ?????? wrote: > >> Greetings to all! >> In my openstack setup (vanilla ubuntu 22.04 Zed release + OVN networking >> ) I try to find a way to setup a scheme where my instances should get dhcp >> search domain option on network/subnet level. >> >> For example - when I create network/subnet, I want to tell neutron, that >> all created ports in that network/subnet must receive my dhcp search-domain >> option >> >> Any help advices will appreciated. Thank you! >> >> -- >> Best regards, Mikhail. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kristin at openinfra.dev Thu Mar 9 18:06:33 2023 From: kristin at openinfra.dev (Kristin Barrientos) Date: Thu, 9 Mar 2023 12:06:33 -0600 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope Message-ID: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Hi everyone, As we get closer to the OpenStack release, I wanted to reach out to see if any PTL?s were interested in providing their Antelope cycle highlights in an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we would get 4-6 projects represented. Previous examples of OpenStack release episodes can be found here[2]? and here[3] . Please let me know if you?re interested and I can provide next steps. If you would like to provide a project update but that time doesn?t work for you, please share a recording with me and I can get it added to the project navigator. Thanks, Kristin Barrientos Marketing Coordinator OpenInfra Foundation [1] https://openinfra.dev/live/ [2] https://www.youtube.com/watch?v=hwPfjvshxOM [3] https://www.youtube.com/watch?v=MSbB3L9_MeY -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Thu Mar 9 21:43:02 2023 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 9 Mar 2023 16:43:02 -0500 Subject: [neutron] bonding sriov nic inside VMs Message-ID: Folks, As you know, SR-IOV doesn't support bonding so the only solution is to implement LACP bonding inside the VM. I did some tests in the lab to create two physnet and map them with two physical nic and create VF and attach them to VM. So far all good but one problem I am seeing is each neutron port I create has an IP address associated and I can use only one IP on bond but that is just a waste of IP in the Public IP pool. Are there any way to create sriov port but without IP address? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 9 22:52:20 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 9 Mar 2023 14:52:20 -0800 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope In-Reply-To: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> References: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Message-ID: I'll gladly represent Ironic again for the Antelope cycle highlights session. Thanks for running it! - Jay Faulkner Ironic PTL TC Member On Thu, Mar 9, 2023 at 10:19?AM Kristin Barrientos wrote: > Hi everyone, > > As we get closer to the OpenStack release, I wanted to reach out to see if > any PTL?s were interested in providing their Antelope cycle highlights in > an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we > would get 4-6 projects represented. 
Previous examples of OpenStack release > episodes can be found here[2] > and here[3] > . > > Please let me know if you?re interested and I can provide next steps. If > you would like to provide a project update but that time doesn?t work for > you, please share a recording with me and I can get it added to the project > navigator. > > Thanks, > > Kristin Barrientos > Marketing Coordinator > OpenInfra Foundation > > [1] https://openinfra.dev/live/ > > [2] https://www.youtube.com/watch?v=hwPfjvshxOM > > [3] https://www.youtube.com/watch?v=MSbB3L9_MeY > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 9 23:15:43 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 9 Mar 2023 15:15:43 -0800 Subject: [ironic][ptg] vPTG scheduling Message-ID: Hey all, The vPTG will be upon us soon, the week of March 27. I booked the following times on behalf of Ironic + BM SIG Operator hour, in accordance with what times worked in Antelope. It's my hope that since we've had little contributor turnover, these times continue to work. I'm completely open to having things moved around if it's more convenient to participants. I've booked the following times, all in Folsom: - Tuesday 1400 UTC - 1700 UTC - Wednesday 1300 UTC Operator hour: baremetal SIG - Wednesday 1400 UTC - 1600 UTC - Wednesday 2200 - 2300 UTC I propose that after the Ironic meeting on March 20, we shortly sync up in the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) to pick topics and assign time. Again, this is all meant to be a suggestion, I'm happy to move things around but didn't want us to miss out on getting things booked. - Jay Faulkner Ironic PTL TC Member -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Fri Mar 10 07:20:57 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Fri, 10 Mar 2023 16:20:57 +0900 Subject: [all] broken pepe8 jobs caused by bandit 1.7.5 Message-ID: fyi; It seems the new release of bandit (1.7.5) just came out and this introduces a new lint rule to require defining the timeout parameter for all "requests" calls. https://github.com/PyCQA/bandit/commit/5ff73ff8ff956df7d63fde49c3bd671db8e821eb This is currently affecting heat and quick search shows some of the other projects contain some code not compliant with this rule(barbican, ceilometer, cinder, glance, manila, nova, ...). Also, it seems we do not pin bandit by u-c for some reason this likely affects all stable branches. Actually I first noticed this when I tried to backport one fix to 2023.1 branch of heat... Thank you, Takashi -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Fri Mar 10 07:27:59 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Fri, 10 Mar 2023 16:27:59 +0900 Subject: [all] broken pepe8 jobs caused by bandit 1.7.5 In-Reply-To: References: Message-ID: On Fri, Mar 10, 2023 at 4:20?PM Takashi Kajinami wrote: > fyi; > > It seems the new release of bandit (1.7.5) just came out and this > introduces a new lint rule > to require defining the timeout parameter for all "requests" calls. > > https://github.com/PyCQA/bandit/commit/5ff73ff8ff956df7d63fde49c3bd671db8e821eb > > This is currently affecting heat and quick search shows some of the other > projects contain some code > not compliant with this rule(barbican, ceilometer, cinder, glance, manila, > nova, ...). 
> Seems some of these (ceilometer, cinder, glance and manila) are not using bandit and others(nova) have the upper version defined. SO it might not affect limited number of projects using bandit without upper version but I'd recommend you check your own projects . > Also, it seems we do not pin bandit by u-c for some reason this likely > affects all stable branches. > Actually I first noticed this when I tried to backport one fix to 2023.1 > branch of heat... > > Thank you, > Takashi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Fri Mar 10 10:04:47 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 10 Mar 2023 11:04:47 +0100 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope In-Reply-To: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> References: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Message-ID: Le jeu. 9 mars 2023 ? 19:13, Kristin Barrientos a ?crit : > Hi everyone, > > As we get closer to the OpenStack release, I wanted to reach out to see if > any PTL?s were interested in providing their Antelope cycle highlights in > an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we > would get 4-6 projects represented. Previous examples of OpenStack release > episodes can be found here[2] > and here[3] > . > > Please let me know if you?re interested and I can provide next steps. If > you would like to provide a project update but that time doesn?t work for > you, please share a recording with me and I can get it added to the project > navigator. > > I can help again for the Nova project. > Thanks, > > Kristin Barrientos > Marketing Coordinator > OpenInfra Foundation > > [1] https://openinfra.dev/live/ > > [2] https://www.youtube.com/watch?v=hwPfjvshxOM > > [3] https://www.youtube.com/watch?v=MSbB3L9_MeY > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Fri Mar 10 10:44:30 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 10 Mar 2023 11:44:30 +0100 Subject: [all] broken pepe8 jobs caused by bandit 1.7.5 In-Reply-To: References: Message-ID: Le ven. 10 mars 2023 ? 08:33, Takashi Kajinami a ?crit : > > > On Fri, Mar 10, 2023 at 4:20?PM Takashi Kajinami > wrote: > >> fyi; >> >> It seems the new release of bandit (1.7.5) just came out and this >> introduces a new lint rule >> to require defining the timeout parameter for all "requests" calls. >> >> https://github.com/PyCQA/bandit/commit/5ff73ff8ff956df7d63fde49c3bd671db8e821eb >> >> This is currently affecting heat and quick search shows some of the other >> projects contain some code >> not compliant with this rule(barbican, ceilometer, cinder, glance, >> manila, nova, ...). >> > Seems some of these (ceilometer, cinder, glance and manila) are not using > bandit and others(nova) have > the upper version defined. SO it might not affect limited number of > projects using bandit without upper version > but I'd recommend you check your own projects . > > AFAIK, the Nova bandit specific tox target [1] isn't run on CI by any of the Zuul jobs we have [2] (we don't include a bandit check as part of a pep8 validation) I tested both 1.7.4 and 1.7.5 bandit versions on the tox target locally, and I don't see much of a difference. Sounds the issue is then unrelated to the Nova project, to clarify. 
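For projects that do gate on bandit, a sketch of the usual short-term options while the new B113 findings (requests calls without an explicit timeout) get addressed; the project name and paths are examples:

```shell
# Reproduce locally against the new release.
pip install 'bandit==1.7.5'
bandit -r myproject -x myproject/tests

# Option 1: cap bandit in test-requirements.txt until the code is fixed.
echo 'bandit>=1.7.0,<1.7.5 # cap until B113 request_without_timeout is handled' >> test-requirements.txt

# Option 2: fix the findings themselves by passing explicit timeouts, e.g.
#   requests.get(url)  ->  requests.get(url, timeout=30)
```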
-Sylvain

[1] https://github.com/openstack/nova/blob/master/tox.ini#L260-L265
[2] https://github.com/openstack/nova/blob/master/.zuul.yaml

>> Also, it seems we do not pin bandit by u-c for some reason, so this likely
>> affects all stable branches.
>> Actually I first noticed this when I tried to backport one fix to the 2023.1
>> branch of heat...
>>
>> Thank you,
>> Takashi
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From xek at redhat.com Fri Mar 10 10:48:56 2023
From: xek at redhat.com (Grzegorz Grasza)
Date: Fri, 10 Mar 2023 11:48:56 +0100
Subject: [barbican] Canceling weekly meeting (March 14th) + PTG discussion topics
Message-ID:

Hi all,

I'm on PTO next week, so I'm canceling the weekly meeting.

We have just one last meeting before the Virtual PTG, so I booked a time slot
for us on Tuesday, March 28th at 13:00 UTC. Please add topics you would like
to discuss to the etherpad:
https://etherpad.opendev.org/p/march2023-ptg-barbican

If there are more things to discuss, I'll book another time slot later in the
week.

Thanks,
/ Greg
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From smooney at redhat.com Fri Mar 10 11:57:28 2023
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 10 Mar 2023 11:57:28 +0000
Subject: [neutron] bonding sriov nic inside VMs
In-Reply-To:
References:
Message-ID:

On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote:
> Folks,
>
> As you know, SR-IOV doesn't support bonding so the only solution is to
> implement LACP bonding inside the VM.
>
> I did some tests in the lab to create two physnet and map them with two
> physical nic and create VF and attach them to VM. So far all good but one
> problem I am seeing is each neutron port I create has an IP address
> associated and I can use only one IP on bond but that is just a waste of IP
> in the Public IP pool.
>
> Are there any way to create sriov port but without IP address?
Technically we now support addressless ports in Neutron and Nova, so that
should be possible.
If you tried to do this with hardware-offloaded OVS rather than the standard
SR-IOV setup with the SR-IOV NIC agent, you will likely also need the
allowed_address_pairs extension to ensure that OVS does not drop the packets
based on the IP address. If you are using hierarchical port binding, where
your TOR is managed by an ML2 driver, you might also need the
allowed_address_pairs extension with the SR-IOV NIC agent to make sure the
packets are not dropped at the switch level.

As you likely already know, we do not support VF bonding in OpenStack, or
bonded ports in general in the Neutron API. There was an effort a few years
ago to make a bond port extension that mirrors how trunk ports work, i.e.
having 2 Neutron subports and a bond port that aggregates them, but we never
got that far with the design. That would have enabled bonding to be
implemented in the different ML2 drivers (OVS/SR-IOV/OVN etc.) with a
consistent/common API.

Some people have used Mellanox's VF LAG functionality, however that was never
actually enabled properly in Nova/Neutron, so it is not officially supported
upstream; but that functionality allows you to attach only a single VF to the
guest from bonded ports on a single card.

There is no official support in Nova/Neutron for that; as I said, it just
happens to work unintentionally, so I would not advise that you use it in
production unless you are happy to work through any issues you find yourself.
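To make the addressless-port idea above concrete, a rough sketch with the OpenStack client could look like the following. The network name, port name and address are placeholders, and exact flag support depends on your client and Neutron release:

```
# create an SR-IOV port with no fixed IP allocation
openstack port create --network provider-vlan100 --vnic-type direct \
    --no-fixed-ip sriov-bond-member-0

# let the bond's shared address through the port's anti-spoofing rules
openstack port set --allowed-address ip-address=203.0.113.10 sriov-bond-member-0
```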
From roberto.acosta at luizalabs.com Fri Mar 10 11:58:32 2023 From: roberto.acosta at luizalabs.com (Roberto Bartzen Acosta) Date: Fri, 10 Mar 2023 08:58:32 -0300 Subject: [neutron] Openstack Network Interconnection In-Reply-To: References: Message-ID: Hi Felix, Thanks for your feedback. The ovn-bgp-agent is a very powerful application to interconnect multi-tenancy networks using BGP evpn type 5. This application integrates the br-ext with FRR and provides the interconnect using the BGP session. That would be one way to do it, but the problem is that bgpvpn service plugin is only integrated with Neutron. Imagine in the future that we need to integrate the tenant network between different cloud solutions (e.g using OpenStack, Kubernetes, LXD, etc.)... this could be possible if everyone uses OVN as a network backend and ovn-ic to interconnect the LRPs between AZs. Maybe I'm missing some point and there's no community interest in something like that. But back to the OpenStack/Neutron case, it might be interesting to continue the work on Neutron interconnect (or something like that), but maybe this time with the service plugin for ovn-ic. Regards, Roberto Em qui., 9 de mar. de 2023 ?s 05:24, Felix H?ttner escreveu: > Hi Roberto, > > > > We will face a similar issue in the future and have also looked at > ovn-interconnect (but not yet tested it). > > There is also ovn-bgp-agent [1] which has an evpn mode that might be > relevant. > > > > Whatever you find I would definitely be interested in your results > > > > [1] https://opendev.org/x/ovn-bgp-agent > > > > -- > > Felix Huettner > > > > *From:* Roberto Bartzen Acosta > *Sent:* Wednesday, March 8, 2023 9:49 PM > *To:* openstack-discuss at lists.openstack.org > *Cc:* Tiago Pires > *Subject:* [neutron] Openstack Network Interconnection > > > > Hey folks. > > Does anyone have ideas on how to interconnect different Openstack > deployments? > Consider that we have multiple Datacenters and need to interconnect tenant > networks. How could this be done in the context of OpenStack (without using > VPN) ? > > We have some ideas about the usage of OVN-IC (OVN Interconnect). It looks > like a great solution to create a network layer between DCs/AZs with the > help of the OVN driver. However, Neutron does not support the Transit > Switches (OVN-IC design) that are required for this application. > > We've seen references to abandoned projects like [1] [2] [3]. > > Does anyone use something similar in production or have an idea about how > to do it? Imagine that we need to put workloads on two different AZs that > run different Openstack installations, and we want to communicate with the > local networks without using a FIP. > > I believe that the most coherent way to maintain databases consistent in > each Openstack would be an integration with Neutron, but I haven't seen any > movement on that. > > Regards, > Roberto > > [1] https://www.youtube.com/watch?v=GizLmSiH1Q0 > > [2] > https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html > > [3] https://opendev.org/x/neutron-interconnection > > > > > > > *?Esta mensagem ? direcionada apenas para os endere?os constantes no > cabe?alho inicial. Se voc? n?o est? 
listado nos endere?os constantes no > cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa > mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o > imediatamente anuladas e proibidas?.* > > * ?Apesar do Magazine Luiza tomar todas as precau??es razo?veis para > assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o > poder? aceitar a responsabilidade por quaisquer perdas ou danos causados > por esse e-mail ou por seus anexos?.* > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier . > -- _?Esta mensagem ? direcionada apenas para os endere?os constantes no cabe?alho inicial. Se voc? n?o est? listado nos endere?os constantes no cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o imediatamente anuladas e proibidas?._ *?**?Apesar do Magazine Luiza tomar todas as precau??es razo?veis para assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o poder? aceitar a responsabilidade por quaisquer perdas ou danos causados por esse e-mail ou por seus anexos?.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Fri Mar 10 12:27:01 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 10 Mar 2023 13:27:01 +0100 Subject: [neutron] Drivers meeting cancelled Message-ID: Hello Neutrinos: Due to the lack of agenda, today's drivers meeting is cancelled. Have a nice weekend! -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 10 13:30:21 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 10 Mar 2023 08:30:21 -0500 Subject: [neutron] bonding sriov nic inside VMs In-Reply-To: References: Message-ID: Thanks Sean, I don't have NIC which supports hardware offloading or any kind of feature. I am using intel nic 82599 just for SRIOV and looking for bonding support which is only possible inside VM. As you know we already run a large SRIOV environment with openstack but my biggest issue is to upgrade switches without downtime. I want to be more resilient to not worry about that. Do you still think it's dangerous or not a good idea to bond sriov nic inside VM? what could go wrong here just trying to understand before i go crazy :) On Fri, Mar 10, 2023 at 6:57?AM Sean Mooney wrote: > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote: > > Folks, > > > > As you know, SR-IOV doesn't support bonding so the only solution is to > > implement LACP bonding inside the VM. > > > > I did some tests in the lab to create two physnet and map them with two > > physical nic and create VF and attach them to VM. So far all good but one > > problem I am seeing is each neutron port I create has an IP address > > associated and I can use only one IP on bond but that is just a waste of > IP > > in the Public IP pool. > > > > Are there any way to create sriov port but without IP address? > techinially we now support adressless port in neutron and nova. > so that shoudl be possible. 
> if you tried to do this with hardware offloaed ovs rather then the
> standard sriov with the sriov
> nic agent you likel will need to also use the allowed_adress_pairs
> extension to ensure that ovs did not
> drop the packets based on the ip adress. if you are using heriarcical port
> binding where you TOR is manged
> by an ml2 driver you might also need the allowed_adress_pairs extension
> with the sriov nic agent to make sure
> the packets are not drop at the swtitch level.
>
> as you likely arlready no we do not support VF bonding in openstack or
> bonded ports in general in then neutron api.
> there was an effort a few years ago to make a bond port extention that
> mirror hwo trunk ports work
> i.e. hanving 2 neutron subport and a bond port that agreates them but we
> never got that far with
> the design. that would have enabeld boning to be implemtned in diffent ml2
> driver like ovs/sriov/ovn ectra with
> a consitent/common api.
>
> some people have used mellonox's VF lag functionalty howver that was never
> actully enable propelry in nova/neutron
> so its not officlaly supported upstream but that functional allow you to
> attach only a singel VF to the guest form
> bonded ports on a single card.
>
> there is no supprot in nova/neutron for that offically as i said it just
> happens to work unitnetionally so i would not
> advise that you use it in produciton unless your happy to work though any
> issues you find yourself.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From smooney at redhat.com Fri Mar 10 14:02:00 2023
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 10 Mar 2023 14:02:00 +0000
Subject: [neutron] bonding sriov nic inside VMs
In-Reply-To:
References:
Message-ID: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com>

On Fri, 2023-03-10 at 08:30 -0500, Satish Patel wrote:
> Thanks Sean,
>
> I don't have NIC which supports hardware offloading or any kind of feature.
> I am using intel nic 82599 just for SRIOV and looking for bonding
> support which is only possible inside VM. As you know we already run a
> large SRIOV environment with openstack but my biggest issue is to upgrade
> switches without downtime. I want to be more resilient to not worry
> about that.
>
> Do you still think it's dangerous or not a good idea to bond sriov nic
> inside VM? what could go wrong here just trying to understand before i go
> crazy :)
LACP bond modes generally don't work fully, but you should be able to get
basic failover bonding working, and perhaps TCP load balancing, provided it
does not require switch cooperation to work from inside the guest.

Just keep in mind that, by definition, if you declare a network as being on a
separate physnet from another, then you as the operator are asserting that
there is no L2 connectivity between those networks: VLAN 100 on physnet_1 is
intended to be a separate VLAN from VLAN 100 on physnet_2. If you break that
and use physnets to select PFs, you are also breaking the Neutron
multi-tenancy model, meaning it is not safe to allow end users to create VLAN
networks; instead you can only use provider-created VLAN networks.

So what you want to do is probably achievable, but you mention physnets per
PF, and that sounds like you are breaking the "physnets are separate isolated
physical networks" rule.
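As a rough sketch of the failover-only approach that needs no switch-side LACP cooperation, an active-backup bond inside the guest could be built along these lines (interface names and the address are made up for illustration):

```
# inside the VM: enslave the two VF interfaces into an active-backup bond
ip link add bond0 type bond mode active-backup miimon 100
ip link set ens4 down
ip link set ens4 master bond0
ip link set ens5 down
ip link set ens5 master bond0
ip link set bond0 up
ip addr add 203.0.113.10/24 dev bond0
```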
> > > > > On Fri, Mar 10, 2023 at 6:57?AM Sean Mooney wrote: > > > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote: > > > Folks, > > > > > > As you know, SR-IOV doesn't support bonding so the only solution is to > > > implement LACP bonding inside the VM. > > > > > > I did some tests in the lab to create two physnet and map them with two > > > physical nic and create VF and attach them to VM. So far all good but one > > > problem I am seeing is each neutron port I create has an IP address > > > associated and I can use only one IP on bond but that is just a waste of > > IP > > > in the Public IP pool. > > > > > > Are there any way to create sriov port but without IP address? > > techinially we now support adressless port in neutron and nova. > > so that shoudl be possible. > > if you tried to do this with hardware offloaed ovs rather then the > > standard sriov with the sriov > > nic agent you likel will need to also use the allowed_adress_pairs > > extension to ensure that ovs did not > > drop the packets based on the ip adress. if you are using heriarcical port > > binding where you TOR is manged > > by an ml2 driver you might also need the allowed_adress_pairs extension > > with the sriov nic agent to make sure > > the packets are not drop at the swtitch level. > > > > as you likely arlready no we do not support VF bonding in openstack or > > bonded ports in general in then neutron api. > > there was an effort a few years ago to make a bond port extention that > > mirror hwo trunk ports work > > i.e. hanving 2 neutron subport and a bond port that agreates them but we > > never got that far with > > the design. that would have enabeld boning to be implemtned in diffent ml2 > > driver like ovs/sriov/ovn ectra with > > a consitent/common api. > > > > some people have used mellonox's VF lag functionalty howver that was never > > actully enable propelry in nova/neutron > > so its not officlaly supported upstream but that functional allow you to > > attach only a singel VF to the guest form > > bonded ports on a single card. > > > > there is no supprot in nova/neutron for that offically as i said it just > > happens to work unitnetionally so i would not > > advise that you use it in produciton unless your happy to work though any > > issues you find yourself. > > > > From katonalala at gmail.com Fri Mar 10 14:39:15 2023 From: katonalala at gmail.com (Lajos Katona) Date: Fri, 10 Mar 2023 15:39:15 +0100 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> Message-ID: Hi, Could you please open a doc bug for this issue on launchpad: https://bugs.launchpad.net/neutron Thanks for the efforts. Lajos Katona (lajoskatona) Simon Jones ezt ?rta (id?pont: 2023. m?rc. 9., Cs, 15:29): > Hi, all > > At last, I got the root cause of this 2 problem. > And I suggest add these words to > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html: > ``` > Prerequisites: > libvirt >= 7.9.0 . Like ubuntu-22.04, which use libvirt-8.0.0 by default. > ``` > > Root cause of problem 1, which is "no valid host": > - Because libvirt version is too low. > > Root cause of problem 2, which is "why there are topology in DPU in > openstack create port command": > - Because add --binding-profile params in openstack create port command, > which is NOT right. > > ---- > Simon Jones > > > Dmitrii Shcherbakov ?2023?3?2??? > 20:30??? 
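For clarity, the physnet-per-PF pattern discussed above usually comes down to an SR-IOV agent mapping along these lines; the file path, physnet names and interface names are only an example of the pattern, not a recommendation:

```
# /etc/neutron/plugins/ml2/sriov_agent.ini (example values)
[sriov_nic]
physical_device_mappings = physnet_1:eth1,physnet_2:eth2
```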
> >> Hi {Sean, Simon}, >> >> > did you ever give a presentation on the DPU support >> >> Yes, there were a couple at different stages. >> >> The following is the one of the older ones that references the SMARTNIC >> VNIC type but we later switched to REMOTE_MANAGED in the final code: >> https://www.openvswitch.org/support/ovscon2021/slides/smartnic_port_binding.pdf, >> however, it has a useful diagram on page 15 which shows the interactions of >> different components. A lot of other content from it is present in the >> OpenStack docs now which we added during the feature development. >> >> There is also a presentation with a demo that we did at the Open Infra >> summit https://youtu.be/Amxp-9yEnsU (I could not attend but we prepared >> the material after the features got merged). >> >> Generally, as Sean described, the aim of this feature is to make the >> interaction between components present at the hypervisor and the DPU side >> automatic but, in order to make this workflow explicitly different from >> SR-IOV or offload at the hypervisor side, one has to use the >> "remote_managed" flag. This flag allows Nova to differentiate between >> "regular" VFs and the ones that have to be programmed by a remote host >> (DPU) - hence the name. >> >> A port needs to be pre-created with the remote-managed type - that way >> when Nova tries to schedule a VM with that port attached, it will find >> hosts which actually have PCI devices tagged with the "remote_managed": >> "true" in the PCI whitelist. >> >> The important thing to note here is that you must not use PCI passthrough >> directly for this - Nova will create a PCI device request automatically >> with the remote_managed flag included. There is currently no way to >> instruct Nova to choose one vendor/device ID vs the other for this (any >> remote_managed=true device from a pool will match) but maybe the work that >> was recently done to store PCI device information in the Placement service >> will pave the way for such granularity in the future. >> >> Best Regards, >> Dmitrii Shcherbakov >> LP/MM/oftc: dmitriis >> >> >> On Thu, Mar 2, 2023 at 1:54?PM Sean Mooney wrote: >> >>> adding Dmitrii who was the primary developer of the openstack >>> integration so >>> they can provide more insight. >>> >>> Dmitrii did you ever give a presentationon the DPU support and how its >>> configured/integrated >>> that might help fill in the gaps for simon? >>> >>> more inline. >>> >>> On Thu, 2023-03-02 at 11:05 +0800, Simon Jones wrote: >>> > E... >>> > >>> > But there are these things: >>> > >>> > 1) Show some real happened in my test: >>> > >>> > - Let me clear that, I use DPU in compute node: >>> > The graph in >>> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >>> . >>> > >>> > - I configure exactly follow >>> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, >>> > which is said bellow in "3) Let me post all what I do follow this >>> link". 
>>> > >>> > - In my test, I found after first three command (which is "openstack >>> > network create ...", "openstack subnet create", "openstack port create >>> ..."), >>> > there are network topology exist in DPU side, and there are rules >>> exist in >>> > OVN north DB, south DB of controller, like this: >>> > >>> > > ``` >>> > > root at c1:~# ovn-nbctl show >>> > > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 >>> > > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) >>> > > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 >>> > > addresses: ["unknown"] >>> > > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) >>> > > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] >>> > > >>> > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding >>> > > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a >>> > > chassis : [] >>> > > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 >>> > > encap : [] >>> > > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24 >>> ", >>> > > "neutron:device_id"="", "neutron:device_owner"="", >>> > > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, >>> > > "neutron:port_name"=pf0vf1, >>> > > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", >>> > > "neutron:revision_number"="1", >>> > > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} >>> > > >>> > > root at c1c2dpu:~# sudo ovs-vsctl show >>> > > 62cf78e5-2c02-471e-927e-1d69c2c22195 >>> > > Bridge br-int >>> > > fail_mode: secure >>> > > datapath_type: system >>> > > Port br-int >>> > > Interface br-int >>> > > type: internal >>> > > Port ovn--1 >>> > > Interface ovn--1 >>> > > type: geneve >>> > > options: {csum="true", key=flow, >>> remote_ip="172.168.2.98"} >>> > > Port pf0vf1 >>> > > Interface pf0vf1 >>> > > ovs_version: "2.17.2-24a81c8" >>> > > ``` >>> > > >>> > That's why I guess "first three command" has already create network >>> > topology, and "openstack server create" command only need to plug VF >>> into >>> > VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. >>> no that jsut looks like the standard bridge toplogy that gets created >>> when you provision >>> the dpu to be used with openstac vai ovn. >>> >>> that looks unrelated to the neuton comamnd you ran. >>> > >>> > - In my test, then I run "openstack server create" command, I got ERROR >>> > which said "No valid host...", which is what the email said above. >>> > The reason has already said, it's nova-scheduler's PCI filter module >>> report >>> > no valid host. The reason "nova-scheduler's PCI filter module report no >>> > valid host" is nova-scheduler could NOT see PCI information of compute >>> > node. The reason "nova-scheduler could NOT see PCI information of >>> compute >>> > node" is compute node's /etc/nova/nova.conf configure remote_managed >>> tag >>> > like this: >>> > >>> > > ``` >>> > > [pci] >>> > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", >>> > > "physical_network": null, "remote_managed": "true"} >>> > > alias = { "vendor_id":"15b3", "product_id":"101e", >>> > > "device_type":"type-VF", "name":"a1" } >>> > > ``` >>> > > >>> > >>> > 2) Discuss some detail design of "remote_managed" tag, I don't know if >>> this >>> > is right in the design of openstack with DPU: >>> > >>> > - In neutron-server side, use remote_managed tag in "openstack port >>> create >>> > ..." command. >>> > This command will make neutron-server / OVN / ovn-controller / ovs to >>> make >>> > the network topology done, like above said. 
>>> > I this this is right, because test shows that. >>> that is not correct >>> your test do not show what you think it does, they show the baisic bridge >>> toplogy and flow configuraiton that ovn installs by defualt when it >>> manages >>> as ovs. >>> >>> please read the design docs for this feature for both nova and neutron >>> to understand how the interacction works. >>> >>> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html >>> >>> https://specs.openstack.org/openstack/neutron-specs/specs/yoga/off-path-smartnic-dpu-port-binding-with-ovn.html >>> > >>> > - In nova side, there are 2 things should process, first is PCI >>> passthrough >>> > filter, second is nova-compute to plug VF into VM. >>> > >>> > If the link above is right, which remote_managed tag exists in >>> > /etc/nova/nova.conf of controller node and exists in >>> /etc/nova/nova.conf of >>> > compute node. >>> > As above ("- In my test, then I run "openstack server create" command") >>> > said, got ERROR in this step. >>> > So what should do in "PCI passthrough filter" ? How to configure ? >>> > >>> > Then, if "PCI passthrough filter" stage pass, what will do of >>> nova-compute >>> > in compute node? >>> > >>> > 3) Post all what I do follow this link: >>> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. >>> > - build openstack physical env, link plug DPU into compute mode, use >>> VM as >>> > controller ... etc. >>> > - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. >>> > - configure DPU side /etc/neutron/neutron.conf >>> > - configure host side /etc/nova/nova.conf >>> > - configure host side /etc/nova/nova-compute.conf >>> > - run first 3 command >>> > - last, run this command, got ERROR >>> > >>> > ---- >>> > Simon Jones >>> > >>> > >>> > Sean Mooney ?2023?3?1??? 18:35??? >>> > >>> > > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: >>> > > > Thanks a lot !!! >>> > > > >>> > > > As you say, I follow >>> > > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. >>> > > > And I want to use DPU mode. Not "disable DPU mode". >>> > > > So I think I should follow the link above exactlly, so I use >>> > > > vnic-type=remote_anaged. >>> > > > In my opnion, after I run first three command (which is "openstack >>> > > network >>> > > > create ...", "openstack subnet create", "openstack port create >>> ..."), the >>> > > > VF rep port and OVN and OVS rules are all ready. >>> > > not at that point nothign will have been done on ovn/ovs >>> > > >>> > > that will only happen after the port is bound to a vm and host. >>> > > >>> > > > What I should do in "openstack server create ..." is to JUST add >>> PCI >>> > > device >>> > > > into VM, do NOT call neutron-server in nova-compute of compute >>> node ( >>> > > like >>> > > > call port_binding or something). >>> > > this is incorrect. >>> > > > >>> > > > But as the log and steps said in the emails above, nova-compute >>> call >>> > > > port_binding to neutron-server while running the command "openstack >>> > > server >>> > > > create ...". >>> > > > >>> > > > So I still have questions is: >>> > > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do >>> NOT >>> > > call >>> > > > neutron-server in nova-compute of compute node ( like call >>> port_binding >>> > > or >>> > > > something)" . >>> > > no this is not how its designed. 
>>> > > until you attach the logical port to a vm (either at runtime or as >>> part of >>> > > vm create) >>> > > the logical port is not assocated with any host or phsical dpu/vf. >>> > > >>> > > so its not possibel to instanciate the openflow rules in ovs form the >>> > > logical switch model >>> > > in the ovn north db as no chassie info has been populated and we do >>> not >>> > > have the dpu serial >>> > > info in the port binding details. >>> > > > 2) If it's right, how to deal with this? Which is how to JUST add >>> PCI >>> > > > device into VM, do NOT call neutron-server? By command or by >>> configure? >>> > > Is >>> > > > there come document ? >>> > > no this happens automaticaly when nova does the port binding which >>> cannot >>> > > happen until after >>> > > teh vm is schduled to a host. >>> > > > >>> > > > ---- >>> > > > Simon Jones >>> > > > >>> > > > >>> > > > Sean Mooney ?2023?3?1??? 16:15??? >>> > > > >>> > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: >>> > > > > > BTW, this link ( >>> > > > > > >>> > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) >>> > > > > said >>> > > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that >>> WRONG ? >>> > > > > >>> > > > > no its not wrong but for dpu smart nics you have to make a >>> choice when >>> > > you >>> > > > > deploy >>> > > > > either they can be used in dpu mode in which case remote_managed >>> > > shoudl be >>> > > > > set to true >>> > > > > and you can only use them via neutron ports with >>> > > vnic-type=remote_managed >>> > > > > as descried in that doc >>> > > > > >>> > > > > >>> > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port >>> > > > > >>> > > > > >>> > > > > or if you disable dpu mode in the nic frimware then you shoudl >>> remvoe >>> > > > > remote_managed form the pci device list and >>> > > > > then it can be used liek a normal vf either for neutron sriov >>> ports >>> > > > > vnic-type=direct or via flavor based pci passthough. >>> > > > > >>> > > > > the issue you were havign is you configured the pci device list >>> to >>> > > contain >>> > > > > "remote_managed: ture" which means >>> > > > > the vf can only be consumed by a neutron port with >>> > > > > vnic-type=remote_managed, when you have "remote_managed: false" >>> or >>> > > unset >>> > > > > you can use it via vnic-type=direct i forgot that slight detail >>> that >>> > > > > vnic-type=remote_managed is required for "remote_managed: ture". >>> > > > > >>> > > > > >>> > > > > in either case you foudn the correct doc >>> > > > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >>> > > > > neutorn sriov port configuration is documented here >>> > > > > >>> https://docs.openstack.org/neutron/latest/admin/config-sriov.html >>> > > > > and nova flavor based pci passthough is documeted here >>> > > > > >>> https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >>> > > > > >>> > > > > all three server slightly differnt uses. both neutron >>> proceedures are >>> > > > > exclusivly fo network interfaces. >>> > > > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >>> > > > > requires the use of ovn deployed on the dpu >>> > > > > to configure the VF contolplane. >>> > > > > >>> https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses >>> > > > > the sriov nic agent >>> > > > > to manage the VF with ip tools. 
>>> > > > > >>> https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is >>> > > > > intended for pci passthough >>> > > > > of stateless acclerorators like qat devices. while the nova >>> flavor >>> > > approch >>> > > > > cna be used with nics it not how its generally >>> > > > > ment to be used and when used to passthough a nic expectation is >>> that >>> > > its >>> > > > > not related to a neuton network. >>> > > > > >>> > > > > >>> > > >>> > > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Fri Mar 10 14:48:05 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Fri, 10 Mar 2023 14:48:05 +0000 Subject: [release] Release countdown for week R-1, March 13-17 Message-ID: Development Focus ----------------- We are on the final mile of the 2023.1 Antelope development cycle! Remember that the 2023.1 Antelope final release will include the latest release candidate (for cycle-with-rc deliverables) or the latest intermediary release (for cycle-with-intermediary deliverables) available. March 17th, 2023 is the deadline for final 2023.1 Antelope release candidates as well as any last cycle-with-intermediary deliverables. We will then enter a quiet period until we tag the final release on March 22nd, 2023. Teams should be prioritizing fixing release-critical bugs, before that deadline. Otherwise it's time to start planning the 2023.2 Bobcat development cycle, including discussing PTG sessions content, in preparation of the 2023.2 Bobcat Virtual PTG (March 27-31, 2023). Actions ------- Watch for any translation patches coming through on the stable/2023.1 branch and merge them quickly. If you discover a release-critical issue, please make sure to fix it on the master branch first, then backport the bugfix to the stable/2023.1 branch before triggering a new release. Please drop by #openstack-release with any questions or concerns about the upcoming release ! Upcoming Deadlines & Dates -------------------------- Final 2023.1 Antelope release: March 22nd, 2023 2023.2 Bobcat Virtual PTG: March 27-31, 2023 El?d Ill?s irc: elodilles @ #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcetto at gmail.com Fri Mar 10 15:04:43 2023 From: garcetto at gmail.com (garcetto) Date: Fri, 10 Mar 2023 16:04:43 +0100 Subject: [manila] support for encryption Message-ID: good afternoon, does manila support encryption in some sort ? thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 10 16:54:40 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 10 Mar 2023 11:54:40 -0500 Subject: [neutron] bonding sriov nic inside VMs In-Reply-To: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com> References: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com> Message-ID: Hi Sean, I have a few questions and they are in-line. This is the reference doc i am trying to achieve in my private cloud - https://www.redpill-linpro.com/techblog/2021/01/30/bonding-sriov-nics-with-openstack.html On Fri, Mar 10, 2023 at 9:02?AM Sean Mooney wrote: > On Fri, 2023-03-10 at 08:30 -0500, Satish Patel wrote: > > Thanks Sean, > > > > I don't have NIC which supports hardware offloading or any kind of > feature. > > I am using intel nic 82599 just for SRIOV and looking for bonding > > support which is only possible inside VM. 
As you know we already run a > > large SRIOV environment with openstack but my biggest issue is to upgrade > > switches without downtime. I want to be more resilient to not worry > > about that. > > > > Do you still think it's dangerous or not a good idea to bond sriov nic > > inside VM? what could go wrong here just trying to understand before i > go > > crazy :) > lacp bond mode generaly dont work fully but you should be abel to get > basic failover bondign working > and perhaps tcp loadbalcing provide it does not require switch coperator > to work form inside the guest. > What do you mean by not working fully? Are you talking about active-active vs active-standby? > > just keep in mind that by defintion if you decalre a network as on a > seperate phsynet to another > then you as the operator are asserting that there is no l2 connectivity > between those networks. > > This is interesting why not both physnet have the same L2 segment? Are you worried STP about the loop? But that is how LACP works both physical interfaces on the same segments. > as vlan 100 on physnet_1 is intended ot be a sperate vlan form vlan 100 on > phsynet_2 > I did a test in the lab with physnet_1 and physnet_2 both on the same VLAN ID in the same L2 domain and all works. > > if you break that and use phsynets to select PFs you are also breaking > neutron multi teancy model > meaning it is not safy to aloow end uers to create vlan networks and > instead you can only use provider created > vlan networks. > This is a private cloud and we don't have any multi-tenancy model. We have all VLAN base providers and my Datacenter core router is the gateway for all my vlans providers. > > so what you want to do is proably achiveable but you menthion phsyntes per > pf and that sounds like you are breaking > the physnets are seperate isolagged phsycial netowrks rule. > I can understand each physnet should be in a different tenant but in my case its vlan base provider and not sure what rules it's going to break. > > > > > > > > > > > On Fri, Mar 10, 2023 at 6:57?AM Sean Mooney wrote: > > > > > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote: > > > > Folks, > > > > > > > > As you know, SR-IOV doesn't support bonding so the only solution is > to > > > > implement LACP bonding inside the VM. > > > > > > > > I did some tests in the lab to create two physnet and map them with > two > > > > physical nic and create VF and attach them to VM. So far all good > but one > > > > problem I am seeing is each neutron port I create has an IP address > > > > associated and I can use only one IP on bond but that is just a > waste of > > > IP > > > > in the Public IP pool. > > > > > > > > Are there any way to create sriov port but without IP address? > > > techinially we now support adressless port in neutron and nova. > > > so that shoudl be possible. > > > if you tried to do this with hardware offloaed ovs rather then the > > > standard sriov with the sriov > > > nic agent you likel will need to also use the allowed_adress_pairs > > > extension to ensure that ovs did not > > > drop the packets based on the ip adress. if you are using heriarcical > port > > > binding where you TOR is manged > > > by an ml2 driver you might also need the allowed_adress_pairs extension > > > with the sriov nic agent to make sure > > > the packets are not drop at the swtitch level. > > > > > > as you likely arlready no we do not support VF bonding in openstack or > > > bonded ports in general in then neutron api. 
> > > there was an effort a few years ago to make a bond port extention that > > > mirror hwo trunk ports work > > > i.e. hanving 2 neutron subport and a bond port that agreates them but > we > > > never got that far with > > > the design. that would have enabeld boning to be implemtned in diffent > ml2 > > > driver like ovs/sriov/ovn ectra with > > > a consitent/common api. > > > > > > some people have used mellonox's VF lag functionalty howver that was > never > > > actully enable propelry in nova/neutron > > > so its not officlaly supported upstream but that functional allow you > to > > > attach only a singel VF to the guest form > > > bonded ports on a single card. > > > > > > there is no supprot in nova/neutron for that offically as i said it > just > > > happens to work unitnetionally so i would not > > > advise that you use it in produciton unless your happy to work though > any > > > issues you find yourself. > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Fri Mar 10 17:07:53 2023 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 10 Mar 2023 18:07:53 +0100 Subject: [blazar][ptg] Bobcat PTG scheduling Message-ID: Hello, The Bobcat PTG will happen online during the week starting March 27. As the Blazar project has done in the past, I suggest we meet on Thursday, but starting 1400 UTC rather than the usual 1500 of our biweekly meeting. I have booked two hours in the Bexar room. If you want to join, please let me know if this works for you. To summarise, the Blazar project will meet on Thursday March 30 from 1400 UTC to 1600 UTC. We will prepare discussion topics on Etherpad. Cheers, Pierre Riteau (priteau) -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Fri Mar 10 17:37:27 2023 From: smooney at redhat.com (Sean Mooney) Date: Fri, 10 Mar 2023 17:37:27 +0000 Subject: [neutron] bonding sriov nic inside VMs In-Reply-To: References: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com> Message-ID: <40b0dfa26e2c0d869c1dfd9d0fb23d7bd719dc03.camel@redhat.com> On Fri, 2023-03-10 at 11:54 -0500, Satish Patel wrote: > Hi Sean, > > I have a few questions and they are in-line. This is the reference doc i am > trying to achieve in my private cloud - > https://www.redpill-linpro.com/techblog/2021/01/30/bonding-sriov-nics-with-openstack.html ^ is only safe in a multi tenant envionment if https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2.tenant_network_types does not container vlan or flat. it is technially breaking neutron rules for how to use phsyents. in private cloud where tenatn isolation is not required operators have abused this for years for things like selecting numa nodes and many other usecase which are unsafe in a public cloud. > > On Fri, Mar 10, 2023 at 9:02?AM Sean Mooney wrote: > > > On Fri, 2023-03-10 at 08:30 -0500, Satish Patel wrote: > > > Thanks Sean, > > > > > > I don't have NIC which supports hardware offloading or any kind of > > feature. > > > I am using intel nic 82599 just for SRIOV and looking for bonding > > > support which is only possible inside VM. As you know we already run a > > > large SRIOV environment with openstack but my biggest issue is to upgrade > > > switches without downtime. I want to be more resilient to not worry > > > about that. > > > > > > Do you still think it's dangerous or not a good idea to bond sriov nic > > > inside VM? 
what could go wrong here just trying to understand before i > > go > > > crazy :) > > lacp bond mode generaly dont work fully but you should be abel to get > > basic failover bondign working > > and perhaps tcp loadbalcing provide it does not require switch coperator > > to work form inside the guest. > > > > What do you mean by not working fully? Are you talking about active-active > vs active-standby? some lacp modes require configuration on the swtich others do not you can only really do that form the pf as at the switch level you can bring down the port fo ronly some vlans in a failover case. https://docs.rackspace.com/blog/lacp-bonding-and-linux-configuration/ i belive mode 0, 1, 2, 5 and 6 can work withour sepcial switgh config. 3 and 4 i think reuqired switch cooperation IEEE 802.3ad (mode 4) in particalar i think neeed coperation with the switch. """The link is set up dynamically between two LACP-supporting peers.""" https://en.wikipedia.org/wiki/Link_aggregation that peerign session can only really run on the PFs balance-tlb (5) and balance-alb(6) shoudl work fine for teh VFs in the guest however. > > > > > > just keep in mind that by defintion if you decalre a network as on a > > seperate phsynet to another > > then you as the operator are asserting that there is no l2 connectivity > > between those networks. > > > > > This is interesting why not both physnet have the same L2 segment? Are you > worried STP about the loop? But that is how LACP works both physical > interfaces on the same segments. if they are on the same l2 segment then there is no multi tancy when using vlan or flat netowrks. more on this below. > > > > > as vlan 100 on physnet_1 is intended ot be a sperate vlan form vlan 100 on > > phsynet_2 > > > > I did a test in the lab with physnet_1 and physnet_2 both on the same VLAN > ID in the same L2 domain and all works. if you create 2 neutron networks physnet_1_vlan_100 and physnet_2_vlan_100 and map phsynet_1 to eth1 and phsnet_2 to eth2 and plug the both into the same TOR with vlan 100 trunked to both then boot one vm on physnet_1_vlan_100 and a second on physnet_2_vlan_100 then a few things will hapen. the vms will boot fine and both will get ips. second there will be no isolation between the two networks so if you use the same subnet on both then they will be able to direcly ping each other. its unsafe to have teant cretable vlan networks in this if you have overlaping vlan ranges between physnet_1 and physnet_2 as there will be no tenant isolation enforeced at teh network level. form a neutron point of view physnet_1_vlan_100 and physnet_2_vlan_100 are two entrily differnt netowrks and its the oeprators responsiblity to ensure there network fabric ensure the same vlan on two phsnets cant comunicate. > > > > > > if you break that and use phsynets to select PFs you are also breaking > > neutron multi teancy model > > meaning it is not safy to aloow end uers to create vlan networks and > > instead you can only use provider created > > vlan networks. > > > > This is a private cloud and we don't have any multi-tenancy model. We have > all VLAN base providers and my Datacenter core router is the gateway for > all my vlans providers. ack in which case you can live with the fact that there is no mulit taenancy guarentees because the rules areound phsynets have been broken. this is prrety common in telco cloud by the way so you would not be the first to do this. 
> > > > > > so what you want to do is proably achiveable but you menthion phsyntes per > > pf and that sounds like you are breaking > > the physnets are seperate isolagged phsycial netowrks rule. > > > > I can understand each physnet should be in a different tenant but in my > case its vlan base provider and not sure what rules it's going to break. each physnet does not need to be a diffent tenatn the imporant thing is that neutron expects vlans on differnt physnets to be allcoateable seperatly. so the same vlan on 2 phsynets logically represnet 2 differnt networks. > > > > > > > > > > > > > > > > > > > On Fri, Mar 10, 2023 at 6:57?AM Sean Mooney wrote: > > > > > > > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote: > > > > > Folks, > > > > > > > > > > As you know, SR-IOV doesn't support bonding so the only solution is > > to > > > > > implement LACP bonding inside the VM. > > > > > > > > > > I did some tests in the lab to create two physnet and map them with > > two > > > > > physical nic and create VF and attach them to VM. So far all good > > but one > > > > > problem I am seeing is each neutron port I create has an IP address > > > > > associated and I can use only one IP on bond but that is just a > > waste of > > > > IP > > > > > in the Public IP pool. > > > > > > > > > > Are there any way to create sriov port but without IP address? > > > > techinially we now support adressless port in neutron and nova. > > > > so that shoudl be possible. > > > > if you tried to do this with hardware offloaed ovs rather then the > > > > standard sriov with the sriov > > > > nic agent you likel will need to also use the allowed_adress_pairs > > > > extension to ensure that ovs did not > > > > drop the packets based on the ip adress. if you are using heriarcical > > port > > > > binding where you TOR is manged > > > > by an ml2 driver you might also need the allowed_adress_pairs extension > > > > with the sriov nic agent to make sure > > > > the packets are not drop at the swtitch level. > > > > > > > > as you likely arlready no we do not support VF bonding in openstack or > > > > bonded ports in general in then neutron api. > > > > there was an effort a few years ago to make a bond port extention that > > > > mirror hwo trunk ports work > > > > i.e. hanving 2 neutron subport and a bond port that agreates them but > > we > > > > never got that far with > > > > the design. that would have enabeld boning to be implemtned in diffent > > ml2 > > > > driver like ovs/sriov/ovn ectra with > > > > a consitent/common api. > > > > > > > > some people have used mellonox's VF lag functionalty howver that was > > never > > > > actully enable propelry in nova/neutron > > > > so its not officlaly supported upstream but that functional allow you > > to > > > > attach only a singel VF to the guest form > > > > bonded ports on a single card. > > > > > > > > there is no supprot in nova/neutron for that offically as i said it > > just > > > > happens to work unitnetionally so i would not > > > > advise that you use it in produciton unless your happy to work though > > any > > > > issues you find yourself. > > > > > > > > > > > > From jay at gr-oss.io Fri Mar 10 18:27:34 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Fri, 10 Mar 2023 10:27:34 -0800 Subject: [security-sig][ironic] Ironic + the VMT In-Reply-To: <20230227181721.jkxd2q3pfr6d7fo6@yuggoth.org> References: <20230227181721.jkxd2q3pfr6d7fo6@yuggoth.org> Message-ID: I've reviewed the requirements, and it's my intention to set Ironic as under the VMT. 
I'll wait until it can be announced at Monday's meeting to make it official so folks can have a chance to object if they wish.

- Jay Faulkner
Ironic PTL
TC Member

On Mon, Feb 27, 2023 at 10:26 AM Jeremy Stanley wrote:
> On 2023-02-27 08:16:50 -0800 (-0800), Jay Faulkner wrote:
> [...]
> > Is there any reason Ironic should not be vulnerability-managed? Is the
> > security team willing to have us?
>
> As long as you make sure you're good with this checklist, just
> propose the specific repositories in question as an update to the
> top section of the document (in openstack/ossa):
>
> https://security.openstack.org/repos-overseen.html#requirements
>
> > The only potential complication is that Ironic may receive reports
> > for vendor libraries used by Ironic but not maintained by
> > Ironic -- I was hoping there might already be some historical
> > precedent for how we handle those; it can't be that unique to
> > Ironic.
> [...]
>
> 2. The VMT will not track or issue advisories for external
> software components. Only source code provided by official
> OpenStack project teams is eligible for oversight by the VMT.
> For example, base operating system components included in a
> server/container image or libraries vendored into compiled
> binary artifacts are not within the VMT's scope.
>
> Receiving bug reports about such things is fine, but the VMT doesn't
> coordinate those reports nor issue official security advisories
> about them since they need fixing by their upstream maintainers with
> whom we have no direct relationship. You can still propose security
> notes urging operators to update software in those situations, if it
> seems appropriate to do so:
>
> https://wiki.openstack.org/wiki/Security_Notes
>
> --
> Jeremy Stanley
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jay at gr-oss.io Fri Mar 10 18:39:53 2023
From: jay at gr-oss.io (Jay Faulkner)
Date: Fri, 10 Mar 2023 10:39:53 -0800
Subject: cryptography min version (non-rust) through 2024.1
In-Reply-To: <20230307204345.b5hvqarqyp25gqj3@yuggoth.org>
References: <20230307204345.b5hvqarqyp25gqj3@yuggoth.org>
Message-ID:

> I expect the TC is going to choose Ubuntu 22.04 LTS as a target
> platform for at least the OpenStack 2023.2 and 2024.1 coordinated
> releases, but almost certainly the 2024.2 coordinated release as
> well since Ubuntu 24.04 LTS won't be officially available before we
> start that development cycle. That means the first coordinated
> OpenStack release which would be able to effectively depend on
> features from a newer python3-cryptography package on Ubuntu is
> going to be 2025.1. Food for thought.
>

To be explicit, that testing platform means we require that OpenStack and its
dependencies are installable *in a virtualenv*, not using distro python. So
while this is a small limitation in this case (we cannot use a cryptography
release that might require newer build tooling than that Ubuntu LTS provides),
the real motivation behind holding onto a lower constraint for a longer time
is our partnership with the stable distros that ship OpenStack, whom we don't
want to alienate by giving them the extra work of adopting newer releases than
they are ready for.

I am OK with this approach; but there is a trade-off: we may be leading
OpenStack consumers who don't use distro packaging to believe that it's OK to
use an older cryptography which may not have security support.
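For consumers installing from PyPI, one partial mitigation is simply to install against the coordinated upper-constraints file for their release, which tracks a vetted version set; a quick sketch, where the floor shown is only an example and not a recommendation:

```
# install within the 2023.1 coordinated constraints while honouring a declared floor
pip install -c https://releases.openstack.org/constraints/upper/2023.1 \
    'cryptography>=3.4.8'
```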
It'd be interesting if we could ensure backwards compatibility through testing while also ensuring that anything installed with pip gets a new enough version... but frankly, I don't know offhand what approach would be best to do that, and I don't have the time to pursue it myself so the status quo wins :D. Thanks for talking this through, I like when we're explicit about the motivations for what we do! -- Jay Faulkner Ironic PTL TC Member -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 10 19:14:30 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 10 Mar 2023 14:14:30 -0500 Subject: [neutron] bonding sriov nic inside VMs In-Reply-To: <40b0dfa26e2c0d869c1dfd9d0fb23d7bd719dc03.camel@redhat.com> References: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com> <40b0dfa26e2c0d869c1dfd9d0fb23d7bd719dc03.camel@redhat.com> Message-ID: Thank you Sean for the detailed explanation, I agree on LACP mode and I think Active-Standby would be a better and safer option for me. Yes, as you said telco is abusing many neutron rules and i think i am one of them because pretty much we are running telco applications :) As I said, it's a private cloud so I can break and bend rules to just make applications available 24x7. We don't have any multi-tenancy where I should be worried about security. Last question, Related MAC Address change because neutron doesn't allow change of Mac address correct so i have to set the same MAC Address on both sriov port. As per reference blog. On Fri, Mar 10, 2023 at 12:37?PM Sean Mooney wrote: > On Fri, 2023-03-10 at 11:54 -0500, Satish Patel wrote: > > Hi Sean, > > > > I have a few questions and they are in-line. This is the reference doc i > am > > trying to achieve in my private cloud - > > > https://www.redpill-linpro.com/techblog/2021/01/30/bonding-sriov-nics-with-openstack.html > ^ is only safe in a multi tenant envionment if > > https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2.tenant_network_types > does not container vlan or flat. > > it is technially breaking neutron rules for how to use phsyents. > > in private cloud where tenatn isolation is not required operators have > abused this for years for things like selecting numa nodes > and many other usecase which are unsafe in a public cloud. > > > > > On Fri, Mar 10, 2023 at 9:02?AM Sean Mooney wrote: > > > > > On Fri, 2023-03-10 at 08:30 -0500, Satish Patel wrote: > > > > Thanks Sean, > > > > > > > > I don't have NIC which supports hardware offloading or any kind of > > > feature. > > > > I am using intel nic 82599 just for SRIOV and looking for bonding > > > > support which is only possible inside VM. As you know we already run > a > > > > large SRIOV environment with openstack but my biggest issue is to > upgrade > > > > switches without downtime. I want to be more resilient to not worry > > > > about that. > > > > > > > > Do you still think it's dangerous or not a good idea to bond sriov > nic > > > > inside VM? what could go wrong here just trying to understand > before i > > > go > > > > crazy :) > > > lacp bond mode generaly dont work fully but you should be abel to get > > > basic failover bondign working > > > and perhaps tcp loadbalcing provide it does not require switch > coperator > > > to work form inside the guest. > > > > > > > What do you mean by not working fully? Are you talking about > active-active > > vs active-standby? 
> some lacp modes require configuration on the swtich others do not > you can only really do that form the pf as at the switch level you can > bring down > the port fo ronly some vlans in a failover case. > > https://docs.rackspace.com/blog/lacp-bonding-and-linux-configuration/ > > i belive mode 0, 1, 2, 5 and 6 can work withour sepcial switgh config. > > 3 and 4 i think reuqired switch cooperation > > IEEE 802.3ad (mode 4) in particalar i think neeed coperation with the > switch. > """The link is set up dynamically between two LACP-supporting peers.""" > https://en.wikipedia.org/wiki/Link_aggregation > > that peerign session can only really run on the PFs > > balance-tlb (5) and balance-alb(6) shoudl work fine for teh VFs in the > guest however. > > > > > > > > > > > just keep in mind that by defintion if you decalre a network as on a > > > seperate phsynet to another > > > then you as the operator are asserting that there is no l2 connectivity > > > between those networks. > > > > > > > > This is interesting why not both physnet have the same L2 segment? Are > you > > worried STP about the loop? But that is how LACP works both physical > > interfaces on the same segments. > if they are on the same l2 segment then there is no multi tancy when using > vlan or flat netowrks. > more on this below. > > > > > > > > > as vlan 100 on physnet_1 is intended ot be a sperate vlan form vlan > 100 on > > > phsynet_2 > > > > > > > I did a test in the lab with physnet_1 and physnet_2 both on the same > VLAN > > ID in the same L2 domain and all works. > > if you create 2 neutron networks > > physnet_1_vlan_100 and physnet_2_vlan_100 > > and map phsynet_1 to eth1 and phsnet_2 to eth2 > and plug the both into the same TOR with vlan 100 trunked to both > > then boot one vm on physnet_1_vlan_100 and a second on physnet_2_vlan_100 > > then a few things will hapen. > > the vms will boot fine and both will get ips. > second there will be no isolation between the two networks > so if you use the same subnet on both then they will be able to direcly > ping each other. > > its unsafe to have teant cretable vlan networks in this if you have > overlaping vlan ranges between physnet_1 and physnet_2 > as there will be no tenant isolation enforeced at teh network level. > > form a neutron point of view physnet_1_vlan_100 and physnet_2_vlan_100 are > two entrily differnt netowrks and > its the oeprators responsiblity to ensure there network fabric ensure the > same vlan on two phsnets cant comunicate. > > > > > > > > > > > > if you break that and use phsynets to select PFs you are also breaking > > > neutron multi teancy model > > > meaning it is not safy to aloow end uers to create vlan networks and > > > instead you can only use provider created > > > vlan networks. > > > > > > > This is a private cloud and we don't have any multi-tenancy model. We > have > > all VLAN base providers and my Datacenter core router is the gateway for > > all my vlans providers. > ack in which case you can live with the fact that there is no mulit > taenancy > guarentees because the rules areound phsynets have been broken. > > this is prrety common in telco cloud by the way so you would not be the > first to do this. > > > > > > > > > > so what you want to do is proably achiveable but you menthion phsyntes > per > > > pf and that sounds like you are breaking > > > the physnets are seperate isolagged phsycial netowrks rule. 
> > I can understand each physnet should be in a different tenant, but in my
> > case it's a VLAN-based provider and I'm not sure what rules it's going to
> > break.
> Each physnet does not need to be a different tenant.
> The important thing is that neutron expects VLANs on different physnets to
> be allocatable separately.
>
> So the same VLAN on 2 physnets logically represents 2 different networks.
>
> > > On Fri, Mar 10, 2023 at 6:57 AM Sean Mooney wrote:
> > >
> > > > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote:
> > > > > Folks,
> > > > >
> > > > > As you know, SR-IOV doesn't support bonding, so the only solution is
> > > > > to implement LACP bonding inside the VM.
> > > > >
> > > > > I did some tests in the lab to create two physnets, map them to two
> > > > > physical NICs, create VFs and attach them to a VM. So far all good,
> > > > > but one problem I am seeing is that each neutron port I create has an
> > > > > IP address associated, and I can use only one IP on the bond; the
> > > > > other is just a waste of IP in the public IP pool.
> > > > >
> > > > > Is there any way to create an SR-IOV port but without an IP address?
> > > > Technically we now support addressless ports in neutron and nova,
> > > > so that should be possible.
> > > > If you tried to do this with hardware-offloaded OVS rather than the
> > > > standard SR-IOV with the SR-IOV NIC agent, you will likely also need the
> > > > allowed_address_pairs extension to ensure that OVS does not drop the
> > > > packets based on the IP address. If you are using hierarchical port
> > > > binding, where your TOR is managed by an ML2 driver, you might also need
> > > > the allowed_address_pairs extension with the SR-IOV NIC agent to make
> > > > sure the packets are not dropped at the switch level.
> > > >
> > > > As you likely already know, we do not support VF bonding in OpenStack,
> > > > or bonded ports in general in the neutron API.
> > > > There was an effort a few years ago to make a bond port extension that
> > > > mirrors how trunk ports work,
> > > > i.e. having 2 neutron subports and a bond port that aggregates them, but
> > > > we never got that far with the design. That would have enabled bonding
> > > > to be implemented in different ML2 drivers like ovs/sriov/ovn etc. with
> > > > a consistent/common API.
> > > >
> > > > Some people have used Mellanox's VF LAG functionality, however that was
> > > > never actually enabled properly in nova/neutron,
> > > > so it's not officially supported upstream, but that functionality allows
> > > > you to attach only a single VF to the guest from bonded ports on a
> > > > single card.
> > > >
> > > > There is no support in nova/neutron for that officially; as I said, it
> > > > just happens to work unintentionally, so I would not advise that you use
> > > > it in production unless you're happy to work through any issues you find
> > > > yourself.
> > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From fungi at yuggoth.org Fri Mar 10 19:19:25 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 10 Mar 2023 19:19:25 +0000 Subject: cryptography min version (non-rust) through 2024.1 In-Reply-To: References: <20230307204345.b5hvqarqyp25gqj3@yuggoth.org> Message-ID: <20230310191925.m5amjf3idd6mss2q@yuggoth.org> On 2023-03-10 10:39:53 -0800 (-0800), Jay Faulkner wrote: [...] > To be explicit, that testing platform means we require that > OpenStack and its dependencies are installable *in a virtualenv*, > not using distro python. Well, "sort of." Let's unpack those assertions: 1. "[we're testing] that OpenStack and its dependencies are installable in a virtualenv" Here I think you mean specifically its Python library dependencies, OpenStack has lots of (in many cases versioned) dependencies on non-Python components of a system and that's a big part of what we're checking by testing on multiple platforms. Where Python libraries are concerned, some may include non-Python "binary" extensions so it's not purely Python in the Python dependencies either. We also install some things like Javascript libraries from places other than the distribution (e.g. NPM) on those platforms. But we don't only test in virtualenvs/venvs either. Today at least, DevStack/Grenade jobs install a vast majority of those dependencies into the base system environment. That probably won't be the case in the future thanks to increasing adoption of the PEP 668 EXTERNALLY-MANAGED flag, but for now at least we're comingling pip-installed and distro-installed Python libraries in those jobs. 2. "[we're testing] OpenStack and its dependencies [...] not using distro python" This is definitely not true, but is maybe not what you meant. We definitely test with the exact build of Python interpreter and stdlib that these platforms ship, backported fixes and all. That's also a major reason why we test on multiple platforms, so that we can know our software will work with the Python that's provided on those platforms, which is how many of our users will be running our software too. > So while this is a small imitation for this case (we cannot use > cryptography that might require newer tooling to build than that > ubuntu LTS would), the real motivation behind holding onto a > lower-constraint for a longer time is in partnership with stable > distros that ship OpenStack, who we don't want to alienate by > giving them extra work of using newer releases than they are ready > to. Yes, more precisely these distros are not likely to backport an entire new Rust toolchain just so that new versions of OpenStack can be installed. Instead, they're going to patch the new versions of OpenStack so that they work with older libraries that don't require an entire new Rust toolchain, because that's the easier of the two options. If we can avoid requiring them to do either of those things, obviously it's even nicer for them. If they're going to do the second thing anyway, we could just say "hey send us those patches and we'll consider them bug fixes, because we want people to be able to easily use OpenStack on stable server platforms." > I am OK with this approach; but there is a trade-off: we may be > leading OpenStack consumers who don't use distro packaging that > it's OK to use an older cryptography which may not have security > support. I'd argue that we already do this, because our stable branches use frozen-in-time constraints lists specifying obsolete versions of dependencies which are not receiving upstream security support. 
The intent is that we only use those to test backported fixes without destabilizing our CI jobs, but some of our deployment projects even still build container images or deploy packages from PyPI based on what's in those frozen constraints lists, much to my dismay. > It'd be interesting if we could ensure backwards compatibility > through testing while also ensuring that anything installed with > pip gets a new enough version... but frankly, I don't know offhand > what approach would be best to do that, and I don't have the time > to pursue it myself so the status quo wins :D. [...] My preference would be to just keep testing with the latest version of PYCA/Cryptography from PyPI in our master branch jobs, and tell people who want to use OpenStack on stable server distributions that we'll accept bug reports and review their fixes if we start doing something that the python3-cryptography package on those distributions doesn't support. That's just one approach though, there are certainly other ways to go about it. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From alsotoes at gmail.com Fri Mar 10 20:48:34 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Fri, 10 Mar 2023 14:48:34 -0600 Subject: [manila] support for encryption In-Reply-To: References: Message-ID: You mean data or token encryption? Cheers! On Fri, Mar 10, 2023 at 9:08?AM garcetto wrote: > good afternoon, > does manila support encryption in some sort ? > > thank you > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Mar 10 20:55:49 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 10 Mar 2023 12:55:49 -0800 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> Message-ID: <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> ---- On Wed, 22 Feb 2023 10:13:32 -0800 James Slagle wrote --- > On Wed, Feb 22, 2023 at 12:43 PM Ghanshyam Mann gmann at ghanshyammann.com> wrote: > > Hi James, > > > > Just checking if you got a chance to discuss this with the TripleO team? > > Yes, I asked folks to reply here if there are any volunteers for > stable/zed maintenance, or any other feedback about the approach. I do > not personally know of any volunteers. Ok. We discussed the stable/zed case in the TC meeting and decided[1] to keep stable/zed as 'supported but no maintainers' (will update this information in stable/zed README.rst file). For the master branch, you can follow the normal deprecation process mentioned in the project-team-guide[2]. I have proposed step 1 in governance to mark it deprecated, please check and we need PTL +1 on that. 
- https://review.opendev.org/c/openstack/governance/+/877132 NOTE: As this is deprecated and not retired yet, we still need PTL nomination for TrilpeO[3] [1] https://meetings.opendev.org/meetings/tc/2023/tc.2023-03-08-15.59.log.html#l-256 [2] https://docs.openstack.org/project-team-guide/repository.html#deprecating-a-repository [3] https://etherpad.opendev.org/p/2023.2-leaderless#L26 -gmann' > > -- > -- James Slagle > -- > From tomas.bredar at gmail.com Fri Mar 10 23:02:20 2023 From: tomas.bredar at gmail.com (=?UTF-8?B?VG9tw6HFoSBCcmVkw6Fy?=) Date: Sat, 11 Mar 2023 00:02:20 +0100 Subject: [ovn] safely change bridge_mappings In-Reply-To: References: Message-ID: Hi Rodolfo, you helped a lot. I managed configure this, manually. Just for future reference let me write down what I did. - First I already had the interface br-ex2 configured and correctly assigned physical interfaces in it - I added the bridge mappings to the OVN DB: ovs-vsctl set open . external-ids:ovn-bridge-mappings=datacentre:br-ex,m-storage:br-ex2 - I added my nw m-storage to ml2_conf.ini: [ml2_type_vlan] network_vlan_ranges=datacentre:1:2700,m-storage:3700:4000 [ml2_type_flat] flat_networks=datacentre,m-storage - I restarted the neutron service - since I already had the m-storage nw created in openstack, but as provider "datacenter" and I already had instance ports using it (but it was not working), I had to create a new network and subnet. Delete the original ports and recreate and reassign it to the instances. If I may, now I have two questions: 1. Shouldn't I also define this in ml2_conf.ini [ovs] bridge_mappings = datacentre:br-ex,m-storage:br-ex2 or is the setting of the vswitch register via ovs-vsctl persistent between redeployments or reboots? 2. Which parameters in tripleo-heat-templates sets the above ml2_conf.ini? I found these params: NeutronFlatNetworks NeutronNetworkVLANRanges NeutronBridgeMappings Thanks for your help Tomas ut 7. 3. 2023 o 10:13 Rodolfo Alonso Hernandez nap?sal(a): > Hello Tom??: > > You need to follow the steps in [1]: > * You need to create the new physical bridge "br-ex2". > * Then you need to add to the bridge the physical interface. > * In the compute node you need to add the bridge mappings to the OVN > database Open vSwitch register > * In the controller, you need to add the reference for this second > provider network in "flat_networks" and "network_vlan_ranges" (in the > ml2.ini file). Then you need to restart the Neutron server to read these > new parameters (this step is not mentioned in this link). > $ cat ./etc/neutron/plugins/ml2/ml2_conf.ini > [ml2_type_flat] > flat_networks = public,public2 > [ml2_type_vlan] > network_vlan_ranges = public:11:200,public2:11:200 > > Regards. > > [1] > https://docs.openstack.org/networking-ovn/pike/admin/refarch/provider-networks.html > > On Tue, Mar 7, 2023 at 12:33?AM Tom?? Bred?r > wrote: > >> Hi, >> >> I have a running production OpenStack deployment - version Wallaby >> installed using TripleO. I'm using the default OVN/OVS networking. >> For provider networks I have two bridges on the compute nodes br-ex and >> br-ex2. Instances mainly use br-ex for provider networks, but there are >> some instances which started using a provider network which should be >> mapped to br-ex2, however I didn't specify "bridge_mappings" on >> ml2_conf.ini, so the traffic wants to flow through the default >> datacentre:br-ex. 
>> My questions is, what services should I restart on the controller and >> compute nodes after defining bridge_mappings in [ovs] in ml2_conf.ini. And >> if this operation is safe and if the instances already using br-ex will >> lose connectivity? >> >> Thanks for your help >> >> Tomas >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsmith at techwiki.info Fri Mar 10 23:22:56 2023 From: fsmith at techwiki.info (Frank Smith) Date: Fri, 10 Mar 2023 16:22:56 -0700 Subject: bare metal provisioner replacement for Fuel Message-ID: Hey all, quick question: Previously I would use Fuel to deploy Openstack. It was a good tool, as it built a deployment server where I would see new, un-provisioned servers show up in the list, and I could choose the roles, including stacking roles, onto the servers and then provision the whole cluster. There were no complex yaml or JSON files to deal with and it would retry the deployment on an error state. When done, I had a nice OpenStack cluster, a Ceph backend, and life was good. Now it seems Fuel has been retired. What is comparable to the abilities of Fuel today? What are people using for such deployments now? I get that devstack, microstack and packstack are great for all-in-one installs, but Fuel was great for deploying a whole rack of servers from bare metal. I am not asking for any free training here, but instead asking for info on a similar product, if there is one, so I can learn that and be OpenStack productive once more. Thank you for any help, --Francis Smith -------------- next part -------------- An HTML attachment was scrubbed... URL: From alsotoes at gmail.com Sat Mar 11 00:24:47 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Fri, 10 Mar 2023 18:24:47 -0600 Subject: bare metal provisioner replacement for Fuel In-Reply-To: References: Message-ID: Hey Frank, Take a look at kolla-ansible [1] and then OSA [2] 1.- https://docs.openstack.org/kolla-ansible/latest/ 2.- https://docs.openstack.org/openstack-ansible/latest/ IMHO, kolla-ansible will be the best one to use =) Cheers! On Fri, Mar 10, 2023 at 5:59?PM Frank Smith wrote: > Hey all, quick question: > > Previously I would use Fuel to deploy Openstack. It was a good tool, as it > built a deployment server where I would see new, un-provisioned servers > show up in the list, and I could choose the roles, including stacking > roles, onto the servers and then provision the whole cluster. There were no > complex yaml or JSON files to deal with and it would retry the deployment > on an error state. When done, I had a nice OpenStack cluster, a Ceph > backend, and life was good. > > Now it seems Fuel has been retired. What is comparable to the abilities of > Fuel today? What are people using for such deployments now? I get > that devstack, microstack and packstack are great for all-in-one installs, > but Fuel was great for deploying a whole rack of servers from bare metal. I > am not asking for any free training here, but instead asking for info on a > similar product, if there is one, so I can learn that and be OpenStack > productive once more. > > Thank you for any help, > --Francis Smith > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. 
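To make the kolla-ansible route more concrete, a rough sketch of a multinode run; the paths, the inventory name and the install-deps step follow the upstream quickstart and may differ per release, so treat this as an outline only:

  $ python3 -m venv ~/kolla-venv && source ~/kolla-venv/bin/activate
  $ pip install -U pip kolla-ansible
  $ sudo mkdir -p /etc/kolla && sudo chown $USER:$USER /etc/kolla
  $ cp -r ~/kolla-venv/share/kolla-ansible/etc_examples/kolla/* /etc/kolla/
  $ cp ~/kolla-venv/share/kolla-ansible/ansible/inventory/multinode .
  # edit /etc/kolla/globals.yml and the multinode inventory to map roles to your servers
  $ kolla-genpwd
  $ kolla-ansible install-deps                  # recent releases; pulls the Ansible Galaxy deps
  $ kolla-ansible -i ./multinode bootstrap-servers
  $ kolla-ansible -i ./multinode prechecks
  $ kolla-ansible -i ./multinode deploy
  $ kolla-ansible post-deploy                   # writes admin credentials under /etc/kolla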
-------------- next part -------------- An HTML attachment was scrubbed... URL: From kamrankhadijadj at gmail.com Sat Mar 11 21:25:58 2023 From: kamrankhadijadj at gmail.com (Khadija) Date: Sun, 12 Mar 2023 02:25:58 +0500 Subject: [Outreachy] Setting up development envirenment Message-ID: Hi Sofia! I have made myself familiar with the launchpad, storyboard and gerrit. I have also experimented in the Sandbox projects. I was trying to set up my development environment following https://docs.openstack.org/cinder/latest/contributor/development.environment.html but I was unable to run unit tests, on running command 'tox -e py3' I get error saying AttributeError: module 'py' has no attribute 'io' Kindly help me with this so that I can start working on my first issue :) Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Sat Mar 11 21:41:23 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Sat, 11 Mar 2023 13:41:23 -0800 Subject: bare metal provisioner replacement for Fuel In-Reply-To: References: Message-ID: Echoing and enhancing what Alvaro said, kolla-ansible has great docs on deploying Bifrost+Ironic: https://docs.openstack.org/kolla-ansible/latest/reference/deployment-and-bootstrapping/bifrost.html . I don't know what you class as "complex YAML or JSON"; bifrost does require an inventory file with BMC information and credentials at a minimum. Bifrost is kinda a hidden gem in OpenStack; give it a shot. If there's a specific reason it doesn't fit your use case, or you encounter a problem if you choose to try it, please close the loop and let us know! Thanks! Good luck with your project. - Jay Faulkner Ironic PTL OpenStack TC Member On Fri, Mar 10, 2023 at 4:08?PM Frank Smith wrote: > Hey all, quick question: > > Previously I would use Fuel to deploy Openstack. It was a good tool, as it > built a deployment server where I would see new, un-provisioned servers > show up in the list, and I could choose the roles, including stacking > roles, onto the servers and then provision the whole cluster. There were no > complex yaml or JSON files to deal with and it would retry the deployment > on an error state. When done, I had a nice OpenStack cluster, a Ceph > backend, and life was good. > > Now it seems Fuel has been retired. What is comparable to the abilities of > Fuel today? What are people using for such deployments now? I get > that devstack, microstack and packstack are great for all-in-one installs, > but Fuel was great for deploying a whole rack of servers from bare metal. I > am not asking for any free training here, but instead asking for info on a > similar product, if there is one, so I can learn that and be OpenStack > productive once more. > > Thank you for any help, > --Francis Smith > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincentlee676 at gmail.com Sun Mar 12 04:33:30 2023 From: vincentlee676 at gmail.com (vincent lee) Date: Sat, 11 Mar 2023 22:33:30 -0600 Subject: Member access calendar Message-ID: Hi all, I am using kolla-ansible for openstack deployment in yoga version. Is it possible to allow normal user (member role) to view the calendar graph under reservation (lease)? Best Vincent -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnaser at vexxhost.com Sun Mar 12 10:09:53 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sun, 12 Mar 2023 11:09:53 +0100 Subject: [neutron] detecting l3-agent readiness Message-ID: Hi folks, I'm working on improving the stability of rollouts when using Kubernetes as a control plane, specifically around the L3 agent, it seems that I have not found a clear way to detect in the code path where the L3 agent has finished it's initial sync.. Am I missing it somewhere or is the architecture built in a way that doesn't really answer that question? Thanks Mohammed -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincentlee676 at gmail.com Sun Mar 12 12:18:08 2023 From: vincentlee676 at gmail.com (vincent lee) Date: Sun, 12 Mar 2023 07:18:08 -0500 Subject: Allow user access calendar Message-ID: Hi all, I am using kolla-ansible for openstack deployment in yoga version. Is it possible to allow normal user (member role) to view the calendar graph under reservation (lease)? Best Vincent -------------- next part -------------- An HTML attachment was scrubbed... URL: From kamrankhadijadj at gmail.com Sun Mar 12 13:30:34 2023 From: kamrankhadijadj at gmail.com (Khadija Kamran) Date: Sun, 12 Mar 2023 18:30:34 +0500 Subject: [Outreachy] Setting up development envirenment In-Reply-To: References: Message-ID: On Sat, Mar 11, 2023 at 10:28:33PM +0000, Sofia Enriquez wrote: > ? Hi, > > I think you clone the cinder repository and try to run tox. Have you > install tox? > https://pypi.org/project/tox/ > > If you installed tox and this seeing a error: Please copy and paste the > full error on > https://paste.openstack.org/ and share the link here. > > Cheers, > Sofia > Hey Sofia! Yes I have installed tox. Here is the link to the error: https://paste.openstack.org/show/bQBHKrMiphG75uWHjkRj/ Kindly look into this. Thank you for your time :) > El El s?b, 11 mar 2023 a las 21:26, Khadija > escribi?: > > > Hi Sofia! > > I have made myself familiar with the launchpad, storyboard and gerrit. I > > have also experimented in the Sandbox projects. > > I was trying to set up my development environment following > > https://docs.openstack.org/cinder/latest/contributor/development.environment.html > > but I was unable to run unit tests, on running command 'tox -e py3' I get > > error saying AttributeError: module 'py' has no attribute 'io' > > Kindly help me with this so that I can start working on my first issue :) > > Thank you! > > > -- > Sofia Enriquez From thomas at goirand.fr Sat Mar 11 08:55:45 2023 From: thomas at goirand.fr (Thomas Goirand) Date: Sat, 11 Mar 2023 09:55:45 +0100 Subject: bare metal provisioner replacement for Fuel Message-ID: An HTML attachment was scrubbed... URL: From lsofia.enriquez at gmail.com Sat Mar 11 22:28:33 2023 From: lsofia.enriquez at gmail.com (Sofia Enriquez) Date: Sat, 11 Mar 2023 22:28:33 +0000 Subject: [Outreachy] Setting up development envirenment In-Reply-To: References: Message-ID: ? Hi, I think you clone the cinder repository and try to run tox. Have you install tox? https://pypi.org/project/tox/ If you installed tox and this seeing a error: Please copy and paste the full error on https://paste.openstack.org/ and share the link here. Cheers, Sofia El El s?b, 11 mar 2023 a las 21:26, Khadija escribi?: > Hi Sofia! > I have made myself familiar with the launchpad, storyboard and gerrit. I > have also experimented in the Sandbox projects. 
> I was trying to set up my development environment following > https://docs.openstack.org/cinder/latest/contributor/development.environment.html > but I was unable to run unit tests, on running command 'tox -e py3' I get > error saying AttributeError: module 'py' has no attribute 'io' > Kindly help me with this so that I can start working on my first issue :) > Thank you! > -- Sofia Enriquez -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Sun Mar 12 13:52:07 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 12 Mar 2023 13:52:07 +0000 Subject: [Outreachy] Setting up development envirenment In-Reply-To: References: Message-ID: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> On 2023-03-12 18:30:34 +0500 (+0500), Khadija Kamran wrote: [...] > Yes I have installed tox. > Here is the link to the error: > https://paste.openstack.org/show/bQBHKrMiphG75uWHjkRj/ > Kindly look into this. [...] Based on the slew of bug reports I found, this looks like an incompatibility between the latest versions of tox and pytest. Unfortunately there's a lot of finger-pointing, and the suggestions from their respective upstreams are to either uninstall one of the two packages, or pin one of them to an older version, or use an isolated environment such as a venv instead of using `sudo pip install ...` in the system Python's context. You could try one of the following: sudo pip install tox py sudo pip install tox 'pytest<7.2' sudo pip uninstall pytest ; sudo pip install tox Ultimately, though, we should be teaching people to install and run tox and similar tools with their distribution package manager or in a venv rather than running `sudo pip install ...` since PEP 668 is starting to disallow that workflow on newer platforms anyway. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From kamrankhadijadj at gmail.com Sun Mar 12 14:10:44 2023 From: kamrankhadijadj at gmail.com (Khadija Kamran) Date: Sun, 12 Mar 2023 19:10:44 +0500 Subject: [Outreachy] Setting up development envirenment In-Reply-To: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> References: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> Message-ID: Hey Jeremy! Thank you for the reply. I have tried all the above commands and it still doesn't seem to work. Also, I am using PyCharm with a venv using python3.9 Regards, Khadija From fungi at yuggoth.org Sun Mar 12 14:20:18 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 12 Mar 2023 14:20:18 +0000 Subject: [Outreachy] Setting up development envirenment In-Reply-To: References: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> Message-ID: <20230312142018.3ohzzex2f5pospp4@yuggoth.org> On 2023-03-12 19:10:44 +0500 (+0500), Khadija Kamran wrote: [...] > I have tried all the above commands and it still doesn't seem to > work. By "doesn't work" do you mean you get the exact same error message, or did you get different errors? > Also, I am using PyCharm with a venv using python3.9 I don't know much about PyCharm (someone with more familiarity would need to chime in on how it might influence this situation, if at all), but if you're using `sudo pip install tox` as indicated in your earlier paste then that's definitely not installing tox into a venv of any kind, as evidenced by the traceback you pasted referencing modules in /usr/lib/python3.10 instead of a venv path. 
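One way to do that and keep tox out of the system Python entirely, with the venv path being arbitrary:

  $ python3 -m venv ~/.venvs/tox
  $ ~/.venvs/tox/bin/pip install -U pip tox
  $ cd cinder
  $ ~/.venvs/tox/bin/tox -e py3

pipx (`pipx install tox`) achieves the same isolation if you prefer a managed tool.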
-- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From kamrankhadijadj at gmail.com Sun Mar 12 16:08:37 2023 From: kamrankhadijadj at gmail.com (Khadija Kamran) Date: Sun, 12 Mar 2023 21:08:37 +0500 Subject: [Outreachy] Setting up development envirenment In-Reply-To: <20230312142018.3ohzzex2f5pospp4@yuggoth.org> References: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> <20230312142018.3ohzzex2f5pospp4@yuggoth.org> Message-ID: On Sun, Mar 12, 2023 at 02:20:18PM +0000, Jeremy Stanley wrote: > On 2023-03-12 19:10:44 +0500 (+0500), Khadija Kamran wrote: > [...] > > I have tried all the above commands and it still doesn't seem to > > work. > Hi Jeremy, The command runs successfully now. Yes, I was getting the exact same errors. But it worked when I restarted the IDE. Thank you for your time. Regards, Khadija > By "doesn't work" do you mean you get the exact same error message, > or did you get different errors? > > > Also, I am using PyCharm with a venv using python3.9 > > I don't know much about PyCharm (someone with more familiarity would > need to chime in on how it might influence this situation, if at > all), but if you're using `sudo pip install tox` as indicated in > your earlier paste then that's definitely not installing tox into a > venv of any kind, as evidenced by the traceback you pasted > referencing modules in /usr/lib/python3.10 instead of a venv path. > -- > Jeremy Stanley From benjaminfaruna at gmail.com Sun Mar 12 21:03:34 2023 From: benjaminfaruna at gmail.com (Benjamin Faruna) Date: Sun, 12 Mar 2023 22:03:34 +0100 Subject: [outreachy][cinder] Cannot access code repository Message-ID: Hello, good day, I was selected for the outreachy internship program and I want to contribute to cinders' codebase, but I am having trouble accessing it. Whenever I open the code tab on launchpad I get 2 repositories that haven't been updated in a while but the main page shows a lot of activities on the codebase. Please I want to request help setting up and getting started making contributions. The images of what I see is attached to this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cinder code.png Type: image/png Size: 101436 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cinder2.png Type: image/png Size: 257583 bytes Desc: not available URL: From fungi at yuggoth.org Sun Mar 12 21:17:49 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 12 Mar 2023 21:17:49 +0000 Subject: [outreachy][cinder] Cannot access code repository In-Reply-To: References: Message-ID: <20230312211748.vhxztfd2yroj5lxl@yuggoth.org> On 2023-03-12 22:03:34 +0100 (+0100), Benjamin Faruna wrote: [...] > I want to contribute to cinders' codebase, but I am having trouble > accessing it. Whenever I open the code tab on launchpad I get 2 > repositories that haven't been updated in a while but the main > page shows a lot of activities on the codebase. [...] Like many OpenStack projects, Cinder uses Launchpad for defect tracking. OpenStack projects (including Cinder) do not use Launchpad for code hosting however, they use the OpenDev Collaboratory. 
Please see the OpenStack Code & Documentation Contributor Guide for details: https://docs.openstack.org/contributors/code-and-documentation/ -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jamesleong123098 at gmail.com Mon Mar 13 03:36:32 2023 From: jamesleong123098 at gmail.com (James Leong) Date: Sun, 12 Mar 2023 22:36:32 -0500 Subject: [Horizon]: allow user to access calendar on horizon Message-ID: Hi all, I am using kolla-ansible for OpenStack deployment in the yoga version. Is it possible to allow the user (member role) to view the calendar graph under the reservation tab (lease)? Currently, only the admin will be able to view the calendar graph with all the reserved leases. However, a user with other roles cannot load the calendar information. On the dashboard, I saw it displayed " Unable to load reservations." When I look into the log file, I get the below error message. "blazarclient.exception.BlazarClientException: ERROR: Policy doesn't allow blazar:oshosts:get to be performed." Is there a way to allow the policy? Thanks for your help. James -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Mar 13 06:15:54 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 13 Mar 2023 11:45:54 +0530 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: <7cbe477b-b4a6-8d63-17fa-43bce14179aa@gmail.com> References: <7cbe477b-b4a6-8d63-17fa-43bce14179aa@gmail.com> Message-ID: It's been a week and having heard no objections, I have added Jon Bernard to the cinder-core team. Jon, you should see a +2 and +W option in your review now. Welcome to the team! On Mon, Mar 6, 2023 at 7:32?PM Jay Bryant wrote: > No objections from me! I think Jon would be a great addition! > > Thanks, > > Jay > > On 3/3/2023 5:04 AM, Rajat Dhasmana wrote: > > Hello everyone, > > > > I would like to propose Jon Bernard as cinder core. Looking at the > > review stats > > for the past 60[1], 90[2], 120[3] days, he has been consistently in > > the top 5 > > reviewers with a good +/- ratio and leaving helpful comments > > indicating good > > quality of reviews. He has been managing the stable branch releases > > for the > > past 2 cycles (Zed and 2023.1) and has helped in releasing security > > issues as well. > > > > Jon has been part of the cinder and OpenStack community for a long > > time and > > has shown very active interest in upstream activities, be it release > > liaison, review > > contribution, attending cinder meetings and also involving in > > outreachy activities. > > He will be a very good addition to our team helping out with the > > review bandwidth > > and adding valuable input in our discussions. > > > > I will leave this thread open for a week and if there are no > > objections, I will add > > Jon Bernard to the cinder core team. 
> > > > [1] > > > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 > > < > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 > > > > [2] > > > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 > > < > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 > > > > [3] > > > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 > > < > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 > > > > > > Thanks > > Rajat Dhasmana > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Mar 13 07:13:41 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 13 Mar 2023 12:43:41 +0530 Subject: [cinder][PTG] Cinder 2023.2 (Bobcat) PTG Planning In-Reply-To: References: Message-ID: REMINDER! We have PTG in less than 2 weeks and only 1 topic has been added (apart from mine). Please add topics as soon as possible since it takes time to arrange them based on different parameters, like, availability of author, context of the topics, allocating a driver discussion day or not etc. Thanks Rajat Dhasmana On Tue, Mar 7, 2023 at 4:30?PM Rajat Dhasmana wrote: > Hello All, > > The 2023.2 (Bobcat) virtual PTG is approaching and will be held between > 27-31 March, 2023. > I've created a planning etherpad[1] and a PTG etherpad[2] to gather topics > for the PTG. > Note that you only need to add topics in the planning etherpad and those > will be arranged > in the PTG etherpad later. > > Dates: Tuesday (28th March) to Friday (31st March) 2023 > Time: 1300 to 1700 UTC > Etherpad: https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning > > Please add the topics as early as possible as finalizing and arranging > topics would require some > buffer time. > > [1] https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning > [2] https://etherpad.opendev.org/p/bobcat-ptg-cinder > > Thanks > Rajat Dhasmana > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Mon Mar 13 07:39:20 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 13 Mar 2023 16:39:20 +0900 Subject: [heat][PTG] 2023.2 (Bobcat) PTG Planning Message-ID: Hello, I've signed up for the upcoming virtual PTG so that we can have some slots for Heat discussion. In case you are interested in attending the sessions or have any topics you want to discuss, please put your name and the proposed topics in the etherpad. https://etherpad.opendev.org/p/march2023-ptg-heat-planning It'd be nice if we can update the planning etherpad this week so that I'll fix our slots and topics early next week. Thank you, Takashi Kajinami -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Mar 13 08:18:00 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 13 Mar 2023 13:48:00 +0530 Subject: [outreachy][cinder] In-Reply-To: References: Message-ID: Hi Desire, I'm the co-mentor for the "Extend automated validation of API" project. Good to know you have interest in the project. You can contact Sofia or me on IRC in the #openstack-cinder channel if you have any doubts/queries regarding the onboarding process. 
If you're not on IRC, then that would be my first recommendation to configure IRC, connect to OFTC network and join the #openstack-cinder channel, you can find Cinder team members and other outreachy applicants also there. IRC nicks: Rajat: whoami-rajat Sofia: enriquetaso Thanks Rajat Dhasmana On Thu, Mar 9, 2023 at 12:22?PM Desire Barine wrote: > Hello Sofia Enriquez, > > I'm Desire Barine, an Outreachy applicant. I would love to work on Extend > automated validation of API reference request/response samples project. I > would like to get started with the contribution. > I am currently going over the instructions on contributions given. This is > my first time contributing on an open source project but I'm really > excited to get started. > I'm proficient in python, bash and have worked on Rest api creation > before. I would love to hear from you. > > Desire. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Mar 13 08:26:13 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 13 Mar 2023 13:56:13 +0530 Subject: [cinder] Openstack-ansible and cinder GPFS backend In-Reply-To: References: <7e8cda5e51f7b02878ed92dc58920be4fff25f3e.camel@lunarc.lu.se> Message-ID: Hi, On Thu, Mar 9, 2023 at 5:09?PM Dmitriy Rabotyagov wrote: > Hi, Nicolas, > > No, we don't really maintain documentation for each cinder driver > that's available. So we assume using an override variable for > adjustment of cinder configuration to match the desired state. > > So basically, you can use smth like that in your user_variables.yml: > > cinder_backends: > GPFSNFS: > volume_backend_name: GPFSNFS > volume_driver: cinder.volume.drivers.ibm.gpfs.GPFSNFSDriver > > cinder_cinder_conf_overrides: > DEFAULT: > gpfs_hosts: ip.add.re.ss > gpfs_storage_pool: cinder > gpfs_images_share_mode: copy_on_write > .... > > I have no idea though if gpfs_* variables can be defined or not inside > the backend section, as they're referenced in DEFAULT in docs. But > overrides will work regardless. > > Backend related configuration should always go in the [BACKEND] section and not in the [DEFAULT] section so the documentation needs to be corrected for GPFS. > ??, 9 ???. 2023??. ? 11:41, Nicolas Melot : > > > > Hi, > > > > I can find doc on using various backends for cinder > > ( > https://docs.openstack.org/openstack-ansible-os_cinder/zed/configure-cinder.html#configuring-cinder-to-use-lvm > ) > > and some documentation to configure a GPFS backend for cinder > > ( > https://docs.openstack.org/cinder/zed/configuration/block-storage/drivers/ibm-gpfs-volume-driver.html > ) > > but I cannot find any documentation to deploy cinder with GPFS backend > > using openstack-ansible. Does this exist at all? Is there any > > documentation? > > > > /Nicolas > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.rohmann at inovex.de Mon Mar 13 08:43:40 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Mon, 13 Mar 2023 09:43:40 +0100 Subject: [keystone] Best-practice for (admin) endpoint config Message-ID: Hello Openstack-discuss, I am wondering what the current recommendations / best-practices are in regards to the endpoints of keystone (and other services). There are three types of endpoints: public, internal and admin: ?* "public" certainly is for API access done by cloud users - so measures like rate limiting are likely in place. ?* "internal" is an alternative URL to be used by other services. 
   Sounds reasonable to have an alternate path for the internal communication of OpenStack services, like having a management network. Also, reading https://docs.openstack.org/security-guide/api-endpoints/api-endpoint-configuration-recommendations.html, this seems to be the recommendation.
 * "admin" - This is the one that gives me a little headache. According to commits
   1) https://opendev.org/openstack/keystone/commit/4ec69218454d9f8be7150e2cee50c28765d50c94
   2) https://github.com/openstack/keystone/commit/ecf721a3c176daf67d00536c48e80e78bded1af6
   there should actually be no admin endpoint for Keystone anymore. Or should there?
   But looking at openstack-ansible doing the endpoint config
   (https://opendev.org/openstack/openstack-ansible-os_keystone/src/commit/a020ff87cde136a5c507b2cdc719d8c1dd85824d/tasks/main.yml#L246)
   all three types are still configured? Backwards compatibility for services still expecting this endpoint?
   1) Ceilometer - https://bugs.launchpad.net/ceilometer/+bug/1981207
   2) Heat - https://review.opendev.org/c/openstack/openstacksdk/+/777343

Apart from Keystone, other services also have "admin" endpoints which can be configured and placed as such into the service catalog. What is the reasoning behind that?

Thanks and with kind regards,

Christian

From garcetto at gmail.com  Mon Mar 13 09:30:55 2023
From: garcetto at gmail.com (garcetto)
Date: Mon, 13 Mar 2023 10:30:55 +0100
Subject: [manila] create snapshot from share not permitted
Message-ID: 

Good morning,
I am using Manila and the generic driver with dhss=true, but I cannot create snapshots from shares. Any help? Where can I look? (The Cinder backend is a Linux NFS server.)

Thank you

$ manila snapshot-create share-01 --name Snapshot1
ERROR: Snapshots cannot be created for share '2c8b1b3d-ef82-4372-94df-678539f0d843' since it does not have that capability. (HTTP 422) (Request-ID: req-cab23a46-37dc-4f2b-b26c-d6b21b7453ba)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From felix.huettner at mail.schwarz  Mon Mar 13 09:38:39 2023
From: felix.huettner at mail.schwarz (Felix Hüttner)
Date: Mon, 13 Mar 2023 09:38:39 +0000
Subject: [neutron] Openstack Network Interconnection
In-Reply-To: 
References: 
Message-ID: 

Hi Roberto,

yeah, then I guess ovn-interconnect sounds like the more correct solution. We are also aiming for that for similar reasons.
Our idea for now is to handle the creation of the Transit Logical Switches outside of neutron (as otherwise one neutron would rule over the other neutrons). As the transit switches are then created in the individual OVN deployments, we thought about treating them as provider networks.

So the creation flow would be:
1. Create a transit switch on the ic-northbound
2. Wait for it to replicate to all OVN deployments
3. Create the provider networks on the neutron side with a new `provider-network-type` and `provider-physical-network` set to the transit switch name

So we would probably only be interested in the provider-network-type and not in the handling of the transit switches themselves in neutron.

--
Felix Huettner

> Hi Felix,
>
> Thanks for your feedback.
>
> The ovn-bgp-agent is a very powerful application to interconnect multi-tenancy networks using BGP EVPN type 5. This application integrates the br-ext with FRR and provides the interconnect using the BGP session. That would be one way to do it, but the problem is that the bgpvpn service plugin is only integrated with Neutron.
> Imagine in the future that we need to integrate the tenant network between different cloud solutions (e.g. using OpenStack, Kubernetes, LXD, etc.)... this could be possible if everyone uses OVN as a network backend and ovn-ic to interconnect the LRPs between AZs.
>
> Maybe I'm missing some point and there's no community interest in something like that. But back to the OpenStack/Neutron case, it might be interesting to continue the work on Neutron interconnect (or something like that), but maybe this time with the service plugin for ovn-ic.
>
> Regards,
> Roberto
>
> On Thu, Mar 9, 2023 at 05:24, Felix Hüttner wrote:
> > Hi Roberto,
> >
> > We will face a similar issue in the future and have also looked at ovn-interconnect (but not yet tested it).
> > There is also ovn-bgp-agent [1] which has an evpn mode that might be relevant.
> >
> > Whatever you find, I would definitely be interested in your results.
> >
> > [1] https://opendev.org/x/ovn-bgp-agent
> >
> > --
> > Felix Huettner
> >
> > > From: Roberto Bartzen Acosta
> > > Sent: Wednesday, March 8, 2023 9:49 PM
> > > To: openstack-discuss at lists.openstack.org
> > > Cc: Tiago Pires
> > > Subject: [neutron] Openstack Network Interconnection
> > >
> > > Hey folks.
> > >
> > > Does anyone have ideas on how to interconnect different Openstack deployments?
> > > Consider that we have multiple Datacenters and need to interconnect tenant networks. How could this be done in the context of OpenStack (without using VPN)?
> > >
> > > We have some ideas about the usage of OVN-IC (OVN Interconnect). It looks like a great solution to create a network layer between DCs/AZs with the help of the OVN driver. However, Neutron does not support the Transit Switches (OVN-IC design) that are required for this application.
> > >
> > > We've seen references to abandoned projects like [1] [2] [3].
> > >
> > > Does anyone use something similar in production or have an idea about how to do it? Imagine that we need to put workloads on two different AZs that run different Openstack installations, and we want to communicate with the local networks without using a FIP.
> > > I believe that the most coherent way to keep the databases consistent in each Openstack would be an integration with Neutron, but I haven't seen any movement on that.
> > >
> > > Regards,
> > > Roberto
> > >
> > > [1] https://www.youtube.com/watch?v=GizLmSiH1Q0
> > > [2] https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html
> > > [3] https://opendev.org/x/neutron-interconnection

This e-mail may contain confidential content and is intended only for the designated recipient. If you are not the intended recipient, please notify the sender immediately and delete this e-mail. Information on data protection can be found here.

From senrique at redhat.com  Mon Mar 13 09:45:07 2023
From: senrique at redhat.com (Sofia Enriquez)
Date: Mon, 13 Mar 2023 09:45:07 +0000
Subject: [outreachy][cinder] Cannot access code repository
In-Reply-To: <20230312211748.vhxztfd2yroj5lxl@yuggoth.org>
References: <20230312211748.vhxztfd2yroj5lxl@yuggoth.org>
Message-ID: 

As Jeremy mentioned, we use OpenDev [1]. You may be more familiar with GitHub for hosting source code. However, you can find an openstack/cinder repository on GitHub; please don't push there, since it's only a mirror. The source code of Cinder is at https://opendev.org/openstack/cinder, and you can `git clone` and `git review` from there.
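For reference, that clone-and-review workflow condenses to roughly the following; the topic branch name is just an illustrative placeholder, and `git review -s` assumes a Gerrit account and SSH key set up as described in [1]:

  $ git clone https://opendev.org/openstack/cinder && cd cinder
  $ pip install git-review
  $ git review -s                 # verifies/creates the gerrit remote
  $ git checkout -b my-first-fix  # placeholder topic branch name
  # ...edit, then commit with a proper commit message...
  $ git commit -a
  $ git review                    # pushes the change to review.opendev.org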
Best, Sofia [1] https://docs.openstack.org/contributors/code-and-documentation/using-gerrit.html On Sun, Mar 12, 2023 at 9:20?PM Jeremy Stanley wrote: > On 2023-03-12 22:03:34 +0100 (+0100), Benjamin Faruna wrote: > [...] > > I want to contribute to cinders' codebase, but I am having trouble > > accessing it. Whenever I open the code tab on launchpad I get 2 > > repositories that haven't been updated in a while but the main > > page shows a lot of activities on the codebase. > [...] > > Like many OpenStack projects, Cinder uses Launchpad for defect > tracking. OpenStack projects (including Cinder) do not use Launchpad > for code hosting however, they use the OpenDev Collaboratory. > > Please see the OpenStack Code & Documentation Contributor Guide for > details: > > https://docs.openstack.org/contributors/code-and-documentation/ > > -- > Jeremy Stanley > -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Mon Mar 13 10:27:45 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 13 Mar 2023 11:27:45 +0100 Subject: [ovn] safely change bridge_mappings In-Reply-To: References: Message-ID: Hi Tom??: The ml2 config option "bridge_mappings" is used by ML2/OVS. The mechanism driver ML2/OVN read this config from the local OVS DB, reading the "ovn-bridge-mappings" option from the Open vSwitch register. In other words, this is not needed in ML2/OVN. NeutronFlatNetworks --> ml2_type_flat.flat_networks NeutronNetworkVLANRanges --> ml2_type_vlan.network_vlan_ranges NeutronBridgeMappings --> that will set "ovn_bridge_mappings" on the OVS register during the installation (when using ML2/OVN) Regards. On Sat, Mar 11, 2023 at 12:02?AM Tom?? Bred?r wrote: > Hi Rodolfo, > > you helped a lot. I managed configure this, manually. Just for future > reference let me write down what I did. > - First I already had the interface br-ex2 configured and correctly > assigned physical interfaces in it > - I added the bridge mappings to the OVN DB: > ovs-vsctl set open . > external-ids:ovn-bridge-mappings=datacentre:br-ex,m-storage:br-ex2 > > - I added my nw m-storage to ml2_conf.ini: > [ml2_type_vlan] > network_vlan_ranges=datacentre:1:2700,m-storage:3700:4000 > > [ml2_type_flat] > flat_networks=datacentre,m-storage > > - I restarted the neutron service > - since I already had the m-storage nw created in openstack, but as > provider "datacenter" and I already had instance ports using it (but it was > not working), I had to create a new network and subnet. Delete the original > ports and recreate and reassign it to the instances. > > If I may, now I have two questions: > 1. Shouldn't I also define this in ml2_conf.ini > [ovs] > bridge_mappings = datacentre:br-ex,m-storage:br-ex2 > > or is the setting of the vswitch register via ovs-vsctl persistent between > redeployments or reboots? > > 2. Which parameters in tripleo-heat-templates sets the above ml2_conf.ini? > I found these params: > NeutronFlatNetworks > NeutronNetworkVLANRanges > NeutronBridgeMappings > > Thanks for your help > > Tomas > > > ut 7. 3. 2023 o 10:13 Rodolfo Alonso Hernandez > nap?sal(a): > >> Hello Tom??: >> >> You need to follow the steps in [1]: >> * You need to create the new physical bridge "br-ex2". >> * Then you need to add to the bridge the physical interface. 
>> * In the compute node you need to add the bridge mappings to the OVN >> database Open vSwitch register >> * In the controller, you need to add the reference for this second >> provider network in "flat_networks" and "network_vlan_ranges" (in the >> ml2.ini file). Then you need to restart the Neutron server to read these >> new parameters (this step is not mentioned in this link). >> $ cat ./etc/neutron/plugins/ml2/ml2_conf.ini >> [ml2_type_flat] >> flat_networks = public,public2 >> [ml2_type_vlan] >> network_vlan_ranges = public:11:200,public2:11:200 >> >> Regards. >> >> [1] >> https://docs.openstack.org/networking-ovn/pike/admin/refarch/provider-networks.html >> >> On Tue, Mar 7, 2023 at 12:33?AM Tom?? Bred?r >> wrote: >> >>> Hi, >>> >>> I have a running production OpenStack deployment - version Wallaby >>> installed using TripleO. I'm using the default OVN/OVS networking. >>> For provider networks I have two bridges on the compute nodes br-ex and >>> br-ex2. Instances mainly use br-ex for provider networks, but there are >>> some instances which started using a provider network which should be >>> mapped to br-ex2, however I didn't specify "bridge_mappings" on >>> ml2_conf.ini, so the traffic wants to flow through the default >>> datacentre:br-ex. >>> My questions is, what services should I restart on the controller and >>> compute nodes after defining bridge_mappings in [ovs] in ml2_conf.ini. And >>> if this operation is safe and if the instances already using br-ex will >>> lose connectivity? >>> >>> Thanks for your help >>> >>> Tomas >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Mon Mar 13 10:31:40 2023 From: senrique at redhat.com (Sofia Enriquez) Date: Mon, 13 Mar 2023 10:31:40 +0000 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: References: <7cbe477b-b4a6-8d63-17fa-43bce14179aa@gmail.com> Message-ID: ? On Mon, Mar 13, 2023 at 6:20?AM Rajat Dhasmana wrote: > It's been a week and having heard no objections, I have added Jon Bernard > to the cinder-core team. > Jon, you should see a +2 and +W option in your review now. > Welcome to the team! > > On Mon, Mar 6, 2023 at 7:32?PM Jay Bryant wrote: > >> No objections from me! I think Jon would be a great addition! >> >> Thanks, >> >> Jay >> >> On 3/3/2023 5:04 AM, Rajat Dhasmana wrote: >> > Hello everyone, >> > >> > I would like to propose Jon Bernard as cinder core. Looking at the >> > review stats >> > for the past 60[1], 90[2], 120[3] days, he has been consistently in >> > the top 5 >> > reviewers with a good +/- ratio and leaving helpful comments >> > indicating good >> > quality of reviews. He has been managing the stable branch releases >> > for the >> > past 2 cycles (Zed and 2023.1) and has helped in releasing security >> > issues as well. >> > >> > Jon has been part of the cinder and OpenStack community for a long >> > time and >> > has shown very active interest in upstream activities, be it release >> > liaison, review >> > contribution, attending cinder meetings and also involving in >> > outreachy activities. >> > He will be a very good addition to our team helping out with the >> > review bandwidth >> > and adding valuable input in our discussions. >> > >> > I will leave this thread open for a week and if there are no >> > objections, I will add >> > Jon Bernard to the cinder core team. 
>> > >> > [1] >> > >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 >> > < >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 >> > >> > [2] >> > >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 >> > < >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 >> > >> > [3] >> > >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 >> > < >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 >> > >> > >> > Thanks >> > Rajat Dhasmana >> >> -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Mon Mar 13 11:07:32 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 13 Mar 2023 12:07:32 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: Hello Mohammed: So far we don't have any mechanism to report the sync status of an agent. I know that, for example, the DHCP agent reports an INFO message with the statement 'Synchronizing state complete'. But other agents don't provide this information or you need to manually observe the logs to detect that. Because this could be an interesting information, I'll open a RFE bug to try to bring this information to the existing agents. Regards. On Sun, Mar 12, 2023 at 11:11?AM Mohammed Naser wrote: > Hi folks, > > I'm working on improving the stability of rollouts when using Kubernetes > as a control plane, specifically around the L3 agent, it seems that I have > not found a clear way to detect in the code path where the L3 agent has > finished it's initial sync.. > > Am I missing it somewhere or is the architecture built in a way that > doesn't really answer that question? > > Thanks > Mohammed > > -- > Mohammed Naser > VEXXHOST, Inc. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Mon Mar 13 11:24:34 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 13 Mar 2023 12:24:34 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: It looks like this has sparked a cool ops discussion. I?ve tried an attempt here, though I am not sure how I feel about it yet. https://github.com/vexxhost/atmosphere/pull/359/files I have not extensively tested it but would be good to hear from Neutron team on this approach vs the approach from Felix. On Mon, Mar 13, 2023 at 12:07 PM Rodolfo Alonso Hernandez < ralonsoh at redhat.com> wrote: > Hello Mohammed: > > So far we don't have any mechanism to report the sync status of an agent. > I know that, for example, the DHCP agent reports an INFO message with the > statement 'Synchronizing state complete'. But other agents don't provide > this information or you need to manually observe the logs to detect that. > > Because this could be an interesting information, I'll open a RFE bug to > try to bring this information to the existing agents. > > Regards. 
> > On Sun, Mar 12, 2023 at 11:11?AM Mohammed Naser > wrote: > >> Hi folks, >> >> I'm working on improving the stability of rollouts when using Kubernetes >> as a control plane, specifically around the L3 agent, it seems that I have >> not found a clear way to detect in the code path where the L3 agent has >> finished it's initial sync.. >> >> Am I missing it somewhere or is the architecture built in a way that >> doesn't really answer that question? >> >> Thanks >> Mohammed >> >> -- >> Mohammed Naser >> VEXXHOST, Inc. >> > -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Mon Mar 13 11:46:41 2023 From: eblock at nde.ag (Eugen Block) Date: Mon, 13 Mar 2023 11:46:41 +0000 Subject: [openstack][backup] Experience for instance backup In-Reply-To: Message-ID: <20230313114641.Horde.IuxsgTM4j7xJeTh3Q4J16nB@webmail.nde.ag> Hi, could you be more specific what is "not too fast" for you? I don't really have too much information from my side, but I can describe how we do it. We use Ceph as back end for all services (nova, glance, cinder), and the most important machines are backed up by our backup server directly via rbd commands: - We create a snapshot of the running instances and export the snapshot to an external drive, this is a full backup. In addition to that many of our machines have their home or working directories mounted from CephFS which we also backup as a tar ball to an external drive once a week (also full backup). And then we have also a "real" backup solution in place (bacula) where we store incremental as well as full backups from individually configured resources for different intervals. All these different approaches have different runtimes, of course. Just as an example, one 40 GB VM (rbd image) which has areound 24 GB in-use takes around 6 minutes for the full backup. Although we also have the cinder-backup service up and running nobody is using it because the important volumes are attached to instances which we backup by our rbd solution, so there's no real need for that. Regards, Eugen Zitat von Nguy?n H?u Kh?i : > Hello guys. > I am looking for instance backup solution. I am using Cinder backup with > nfs backup but it looks not too fast. I am using a 10Gbps network. I would > like to know experience for best practice for instance backup solutions on > Openstack. > Thank you. > Nguyen Huu Khoi From ralonsoh at redhat.com Mon Mar 13 12:11:01 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 13 Mar 2023 13:11:01 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: Technically is correct but you can imagine what my answer is about enabling the green threads backdoors. This functionality is for troubleshooting only and should not be enabled in a production environment. Just as a temporary workaround, we can add INFO messages in the "periodic_sync_routers_task" method that you can easily parse reading the logs. This patch could be also backported to stable versions. Bug for reporting full sync state in Neutron agents: https://bugs.launchpad.net/neutron/+bug/2011422 On Mon, Mar 13, 2023 at 12:24?PM Mohammed Naser wrote: > It looks like this has sparked a cool ops discussion. > > I?ve tried an attempt here, though I am not sure how I feel about it yet. > > https://github.com/vexxhost/atmosphere/pull/359/files > > I have not extensively tested it but would be good to hear from Neutron > team on this approach vs the approach from Felix. 
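For operators who want a stopgap before such an RFE lands, a readiness probe can be approximated by watching the agent log for a sync-complete marker. The sketch below is a minimal, hypothetical Python example: the log path and the marker string are assumptions (the DHCP agent logs "Synchronizing state complete" today, and the L3 agent would only emit something comparable once an INFO message is added to periodic_sync_routers_task as suggested above).

#!/usr/bin/env python3
# Minimal readiness-probe sketch. The log path and marker below are
# assumptions, not upstream defaults.
import sys

LOG_FILE = "/var/log/neutron/l3-agent.log"   # hypothetical path
MARKER = "Synchronizing state complete"      # hypothetical L3-agent marker

def agent_ready(path, marker):
    try:
        with open(path, "r", errors="replace") as f:
            return any(marker in line for line in f)
    except OSError:
        return False

if __name__ == "__main__":
    # Exit 0 when the marker has been seen so Kubernetes marks the pod Ready.
    sys.exit(0 if agent_ready(LOG_FILE, MARKER) else 1)

Used as a Kubernetes readiness probe, the pod would only be marked Ready once the marker has appeared, which is the behaviour this thread is asking for.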
> > On Mon, Mar 13, 2023 at 12:07 PM Rodolfo Alonso Hernandez < > ralonsoh at redhat.com> wrote: > >> Hello Mohammed: >> >> So far we don't have any mechanism to report the sync status of an agent. >> I know that, for example, the DHCP agent reports an INFO message with the >> statement 'Synchronizing state complete'. But other agents don't provide >> this information or you need to manually observe the logs to detect that. >> >> Because this could be an interesting information, I'll open a RFE bug to >> try to bring this information to the existing agents. >> >> Regards. >> >> On Sun, Mar 12, 2023 at 11:11?AM Mohammed Naser >> wrote: >> >>> Hi folks, >>> >>> I'm working on improving the stability of rollouts when using Kubernetes >>> as a control plane, specifically around the L3 agent, it seems that I have >>> not found a clear way to detect in the code path where the L3 agent has >>> finished it's initial sync.. >>> >>> Am I missing it somewhere or is the architecture built in a way that >>> doesn't really answer that question? >>> >>> Thanks >>> Mohammed >>> >>> -- >>> Mohammed Naser >>> VEXXHOST, Inc. >>> >> -- > Mohammed Naser > VEXXHOST, Inc. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mister.mackarow at yandex.ru Mon Mar 13 07:55:46 2023 From: mister.mackarow at yandex.ru (=?utf-8?B?0JzQkNCa0JDQoNCe0JIg0JzQkNCa0KE=?=) Date: Mon, 13 Mar 2023 10:55:46 +0300 Subject: Magnum in yoga release on Ubuntu 22.04 Message-ID: <30941678692580@mail.yandex.ru> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From adivya1.singh at gmail.com Mon Mar 13 14:46:03 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Mon, 13 Mar 2023 20:16:03 +0530 Subject: *Migration)Red Hat Open Stack Migration Lift and Shift Message-ID: Hi Team, I am making a plan for Red hat OpenStack migration(Lift and Shift), The Director is in VM sourced in VMware, which we are migrated in other DC, But the IP and VLAN will change Will it be advisable to change the Provisioning IP and other IP , according to the New VLAN designed in the new DC, Configured in the template , Do a introspection of the new design and Configured the Red hat OpenStack, Any addition on this or guidance would be highly usefull Regards Adivya Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Mon Mar 13 15:05:23 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 13 Mar 2023 08:05:23 -0700 Subject: *Migration)Red Hat Open Stack Migration Lift and Shift In-Reply-To: References: Message-ID: Greetings, I suspect you might want to reach out to RH support. I think it is entirely going to depend on a union of your end user needs/requirements, as well as your planning needs/requirements. 
-Julia On Mon, Mar 13, 2023 at 7:49?AM Adivya Singh wrote: > Hi Team, > > I am making a plan for Red hat OpenStack migration(Lift and Shift), The > Director is in VM sourced in VMware, which we are migrated in other DC, But > the IP and VLAN will change > > Will it be advisable to change the Provisioning IP and other IP , > according to the New VLAN designed in the new DC, Configured in the > template , Do a introspection of the new design and Configured the Red hat > OpenStack, > > Any addition on this or guidance would be highly usefull > > Regards > Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix.huettner at mail.schwarz Mon Mar 13 15:35:43 2023 From: felix.huettner at mail.schwarz (=?utf-8?B?RmVsaXggSMO8dHRuZXI=?=) Date: Mon, 13 Mar 2023 15:35:43 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: Hi Mohammed, > Subject: [neutron] detecting l3-agent readiness > > Hi folks, > > I'm working on improving the stability of rollouts when using Kubernetes as a control plane, specifically around the L3 agent, it seems that I have not found a clear way to detect in the code path where the L3 agent has finished it's initial sync.. > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/-/blob/devel/files/startup_wait_for_ns.py Basically we are checking against the neutron api what routers should be on the node and then validate that all keepalived processes are up and running. > Am I missing it somewhere or is the architecture built in a way that doesn't really answer that question? > Adding a option in the neutron api would be a lot nicer. But i guess that also counts for l2 and dhcp agents. > Thanks > Mohammed > > > -- > Mohammed Naser > VEXXHOST, Inc. -- Felix Huettner Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From christian.rohmann at inovex.de Mon Mar 13 16:11:17 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Mon, 13 Mar 2023 17:11:17 +0100 Subject: [openstack][backup] Experience for instance backup In-Reply-To: References: Message-ID: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> Hey there, On 06/03/2023 22:34, Nguy?n H?u Kh?i wrote: > > I am looking for instance backup solution. I am using Cinder backup > with nfs backup but it looks not too fast. I am using a 10Gbps > network. I would like to know experience for best practice for > instance backup solutions?on Openstack. On 13/03/2023 12:46, Eugen Block wrote: > We use Ceph as back end for all services (nova, glance, cinder), and > the most important machines are backed up by our backup server > directly via rbd commands: There is RBD and "the other" drivers. While RBD uses the native export / import feature of Ceph, all other drivers (file, NFS, object storages like S3) are based on the abstract chunked driver (https://opendev.org/openstack/cinder/src/branch/master/cinder/backup/chunkeddriver.py). This driver reads the volume / image and treats it as chunks before making use of a concrete driver (e.g. NFS or S3) to send those chunks off somewhere to be stored. Restore works just the opposite way. The performance of the chunked driver based back-ends is not (yet) comparable to what RBD can achieve due to various reasons. 
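To make the difference concrete, here is an illustrative Python sketch of the general shape of a chunked backup; it is not the actual cinder ChunkedBackupDriver code. The chunk size and the put_object callable are assumptions for illustration only: put_object stands in for whatever the concrete driver (NFS file write, Swift/S3 object upload, ...) uses to persist a single piece.

import hashlib

CHUNK_SIZE = 32 * 1024 * 1024  # hypothetical chunk size

def backup_volume(volume_file, put_object):
    # Read the source in fixed-size pieces and hand each one to the store.
    chunks = []
    offset = 0
    while True:
        data = volume_file.read(CHUNK_SIZE)
        if not data:
            break
        name = "backup-chunk-%05d" % len(chunks)
        put_object(name, data)
        # Per-chunk metadata lets a restore reassemble the volume in order.
        chunks.append({"name": name, "offset": offset, "length": len(data),
                       "sha256": hashlib.sha256(data).hexdigest()})
        offset += len(data)
    return chunks

Every chunk passes through the backup host and the target store's own protocol, which is part of why the chunked back ends currently trail the native RBD export/import path.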
But again, while "RBD" uses Ceph's mechanisms internally all other "targets" for backup storage work differently. We ourselves were looking into using and S3-compatible storage and thus I started a dicsussion about the state of those other drivers at https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030263.html This then led to a discussion at the Cinder PTG https://etherpad.opendev.org/p/antelope-ptg-cinder#L119 with many observations. There also are changes in the works, like restore into sparse volumes (https://review.opendev.org/c/openstack/cinder/+/852654) when going via the chunked driver. But also features like "encryption" (https://review.opendev.org/c/openstack/cinder-specs/+/862601) are being discussed. Regards Christian From juliaashleykreger at gmail.com Mon Mar 13 18:43:35 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 13 Mar 2023 11:43:35 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: Greetings! Time slot wise, I think that works for me. Time wise, in regards to the amount, I'm wondering if we need more. By my count, we have 11 new topics, 4 topics to revisit, in about six hours of non-operator dedicated time, not accounting for breaks for coffee/tea. Granted, some topics might be super quick at the 10 minute quick poll of the room, whereas other topics I feel like will require extensive discussion. If I were to size them, I think we would have 6 large-ish topics along with 3-4 medium sized topics. -Julia On Thu, Mar 9, 2023 at 3:19?PM Jay Faulkner wrote: > Hey all, > > The vPTG will be upon us soon, the week of March 27. > > I booked the following times on behalf of Ironic + BM SIG Operator hour, > in accordance with what times worked in Antelope. It's my hope that since > we've had little contributor turnover, these times continue to work. I'm > completely open to having things moved around if it's more convenient to > participants. > > I've booked the following times, all in Folsom: > - Tuesday 1400 UTC - 1700 UTC > - Wednesday 1300 UTC Operator hour: baremetal SIG > - Wednesday 1400 UTC - 1600 UTC > - Wednesday 2200 - 2300 UTC > > > I propose that after the Ironic meeting on March 20, we shortly sync up in > the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) > to pick topics and assign time. > > > Again, this is all meant to be a suggestion, I'm happy to move things > around but didn't want us to miss out on getting things booked. > > > - > Jay Faulkner > Ironic PTL > TC Member > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at binero.com Mon Mar 13 18:46:38 2023 From: tobias.urdin at binero.com (Tobias Urdin) Date: Mon, 13 Mar 2023 18:46:38 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: <510A9181-1D22-41F2-AE3C-EE354CD6F895@binero.com> Hello, Interesting thread! We are also interested in this for use when we are upgrading services, we are currently doing our best to parse the logs but that?s only for OVS agent and I was going to look into this. I can imagine having something like this for containers would be crucial as well. Best regards Tobias > On 12 Mar 2023, at 11:09, Mohammed Naser wrote: > > Hi folks, > > I'm working on improving the stability of rollouts when using Kubernetes as a control plane, specifically around the L3 agent, it seems that I have not found a clear way to detect in the code path where the L3 agent has finished it's initial sync.. 
> > Am I missing it somewhere or is the architecture built in a way that doesn't really answer that question? > > Thanks > Mohammed > > -- > Mohammed Naser > VEXXHOST, Inc. From tobias.urdin at binero.com Mon Mar 13 18:48:50 2023 From: tobias.urdin at binero.com (Tobias Urdin) Date: Mon, 13 Mar 2023 18:48:50 +0000 Subject: [openstack][backup] Experience for instance backup In-Reply-To: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> References: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> Message-ID: <533DE7EF-DEE7-4D36-9580-FDB757DC6658@binero.com> Hello, Indeed and interesting topic for the PTG, we are using the Swift backup driver and also had issues with backups timing out due to Keystone tokens of which we have done some work to mitigate. Best regards Tobias > On 13 Mar 2023, at 17:11, Christian Rohmann wrote: > > Hey there, > > On 06/03/2023 22:34, Nguy?n H?u Kh?i wrote: >> >> I am looking for instance backup solution. I am using Cinder backup with nfs backup but it looks not too fast. I am using a 10Gbps network. I would like to know experience for best practice for instance backup solutions on Openstack. > > > On 13/03/2023 12:46, Eugen Block wrote: >> We use Ceph as back end for all services (nova, glance, cinder), and the most important machines are backed up by our backup server directly via rbd commands: > > There is RBD and "the other" drivers. While RBD uses the native export / import feature of Ceph, all other drivers (file, NFS, object storages like S3) are based on the abstract chunked driver > (https://opendev.org/openstack/cinder/src/branch/master/cinder/backup/chunkeddriver.py). > This driver reads the volume / image and treats it as chunks before making use of a concrete driver (e.g. NFS or S3) to send those chunks off somewhere to be stored. Restore works just the opposite way. The performance of the chunked driver based back-ends is not (yet) comparable to what RBD can achieve due to various reasons. > > But again, while "RBD" uses Ceph's mechanisms internally all other "targets" for backup storage work differently. > We ourselves were looking into using and S3-compatible storage and thus I started a dicsussion about the state of those other drivers at https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030263.html > > This then led to a discussion at the Cinder PTG https://etherpad.opendev.org/p/antelope-ptg-cinder#L119 with many observations. > > There also are changes in the works, like restore into sparse volumes (https://review.opendev.org/c/openstack/cinder/+/852654) when going via the chunked driver. > But also features like "encryption" (https://review.opendev.org/c/openstack/cinder-specs/+/862601) are being discussed. > > > > Regards > > > Christian > > From amy at demarco.com Mon Mar 13 20:57:30 2023 From: amy at demarco.com (Amy Marrich) Date: Mon, 13 Mar 2023 15:57:30 -0500 Subject: [Diversity] Diversity and Inclusion WG Meeting reminder Message-ID: This is a reminder that the Diversity and Inclusion WG will be meeting tomorrow at 14:00 UTC in the #openinfra-diversity channel on OFTC. We hope members of all OpenInfra projects join us as we look at the Code of Conduct, and continue working on planning for the OpenInfra Summit as well as Foundation-wide diversity surveys. 
Thanks, Amy (spotz) 0 - https://etherpad.opendev.org/p/diversity-wg-agenda From alsotoes at gmail.com Mon Mar 13 22:45:04 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Mon, 13 Mar 2023 16:45:04 -0600 Subject: [manila] create snapshot from share not permitted In-Reply-To: References: Message-ID: If you are inside this features support matrix https://docs.openstack.org/manila/latest/admin/share_back_ends_feature_support_mapping.html#share-back-ends-feature-support-mapping Examine your configuration as well: - snapshot_support indicates whether snapshots are supported for shares created on the pool/backend. When administrators do not set this capability as an extra-spec in a share type, the scheduler can place new shares of that type in pools without regard for whether snapshots are supported, and those shares will not support snapshots. https://docs.openstack.org/manila/latest/admin/capabilities_and_extra_specs.html Cheers! On Mon, Mar 13, 2023 at 3:35?AM garcetto wrote: > good morning, > i am using manila and generic driver with dhss true, but cannot create > snapshot from shares, any help? where can i look at? > (cinder backend is a linux nfs server) > > thank you > > $ manila snapshot-create share-01 --name Snapshot1 > ERROR: Snapshots cannot be created for share > '2c8b1b3d-ef82-4372-94df-678539f0d843' since it does not have that > capability. (HTTP 422) (Request-ID: > req-cab23a46-37dc-4f2b-b26c-d6b21b7453ba) > > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alsotoes at gmail.com Mon Mar 13 22:46:25 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Mon, 13 Mar 2023 16:46:25 -0600 Subject: [manila] support for encryption In-Reply-To: References: Message-ID: I don't think so, and it also doesn't sound like a feature that Manila should implement; try to do that in your backend storage. Cheers! On Sat, Mar 11, 2023 at 7:10?AM garcetto wrote: > good afternoon, > data encryption. > thank you > > Il Ven 10 Mar 2023, 21:48 Alvaro Soto ha scritto: > >> You mean data or token encryption? >> >> Cheers! >> >> On Fri, Mar 10, 2023 at 9:08?AM garcetto wrote: >> >>> good afternoon, >>> does manila support encryption in some sort ? >>> >>> thank you >>> >> >> >> -- >> >> Alvaro Soto >> >> *Note: My work hours may not be your work hours. Please do not feel the >> need to respond during a time that is not convenient for you.* >> ---------------------------------------------------------- >> Great people talk about ideas, >> ordinary people talk about things, >> small people talk... about other people. >> > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdhasman at redhat.com Tue Mar 14 05:18:02 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 14 Mar 2023 10:48:02 +0530 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope In-Reply-To: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> References: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Message-ID: Hi, I can provide updates for Cinder. Thanks Rajat Dhasmana On Thu, Mar 9, 2023 at 11:43?PM Kristin Barrientos wrote: > Hi everyone, > > As we get closer to the OpenStack release, I wanted to reach out to see if > any PTL?s were interested in providing their Antelope cycle highlights in > an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we > would get 4-6 projects represented. Previous examples of OpenStack release > episodes can be found here[2] > and here[3] > . > > Please let me know if you?re interested and I can provide next steps. If > you would like to provide a project update but that time doesn?t work for > you, please share a recording with me and I can get it added to the project > navigator. > > Thanks, > > Kristin Barrientos > Marketing Coordinator > OpenInfra Foundation > > [1] https://openinfra.dev/live/ > > [2] https://www.youtube.com/watch?v=hwPfjvshxOM > > [3] https://www.youtube.com/watch?v=MSbB3L9_MeY > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliver.weinmann at me.com Tue Mar 14 05:43:04 2023 From: oliver.weinmann at me.com (Oliver Weinmann) Date: Tue, 14 Mar 2023 06:43:04 +0100 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: <30941678692580@mail.yandex.ru> References: <30941678692580@mail.yandex.ru> Message-ID: <183F8F44-25BD-4E6F-A5D6-0BC93A8B6EAF@me.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: favicon.ico Type: image/png Size: 338 bytes Desc: not available URL: From zakhar at gmail.com Tue Mar 14 06:34:48 2023 From: zakhar at gmail.com (Zakhar Kirpichenko) Date: Tue, 14 Mar 2023 08:34:48 +0200 Subject: Wallaby on Ubuntu 20.04, Neutron 18.6.0 neutron-dhcp-agent RPC unusually slow Message-ID: Hi! We're running Openstack Wallaby on Ubuntu 20.04, 3 high-performance infra nodes with a RabbitMQ cluster. I updated Neutron components to version 18.6.0, which recently became available in the cloud repository ( http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby main). 
The exact package versions updated are as follows: Install: libunbound8:amd64 (1.9.4-2ubuntu1.4, automatic), openvswitch-common:amd64 (2.15.2-0ubuntu1~cloud0, automatic) Upgrade: neutron-common:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-werkzeug:amd64 (0.16.1+dfsg1-2, 0.16.1+dfsg1-2ubuntu0.1), neutron-dhcp-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-l3-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-neutron:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-server:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-plugin-ml2:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-metadata-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-linuxbridge-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1) Installed Neutron packages: ii neutron-common 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - common ii neutron-dhcp-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - DHCP agent Firewall-as-a-Service driver for OpenStack Neutron ii neutron-l3-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - l3 agent ii neutron-linuxbridge-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - linuxbridge agent ii neutron-metadata-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - metadata agent ii neutron-plugin-ml2 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - ML2 plugin ii neutron-server 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - server ii python3-neutron 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - Python library ii python3-neutron-lib 2.10.1-0ubuntu1~cloud0 all Neutron shared routines and utilities - Python 3.x ii python3-neutronclient 1:7.2.1-0ubuntu1~cloud0 all client API library for Neutron - Python 3.x Normally this would be an easy update, but this time neutron-dhcp-agent doesn't work properly: 2023-03-14 05:44:27.572 2534501 INFO neutron.agent.dhcp.agent [req-4a362701-cc1f-4b9d-87e6-045b6a388709 - - - - -] Synchronizing state complete 2023-03-14 05:44:38.868 2534501 ERROR neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 55 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c 2023-03-14 05:44:38.871 2534501 WARNING neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 120 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c 2023-03-14 05:45:34.244 2534501 ERROR neutron.agent.dhcp.agent [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout notifying server of ports ready. 
Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c 2023-03-14 05:47:10.876 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : bd97110b004e413cb2d6b05d9fb3b57c 2023-03-14 05:47:34.353 2534501 ERROR neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534 2023-03-14 05:47:34.354 2534501 WARNING neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 240 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534 2023-03-14 05:47:46.681 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : f254f735998243c4b0a58ce95c974534 2023-03-14 05:48:01.086 2534501 ERROR neutron.agent.dhcp.agent [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout notifying server of ports ready. Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534 2023-03-14 05:49:45.035 2534501 INFO neutron.agent.dhcp.agent [req-5935a0d0-a981-463c-a4ea-23ccbb54c896 - - - - -] DHCP configuration for ports ... (A successful configuration here). While neutron-dhcp-agent is waiting, neutron-server log gets filled up with: neutron-server.log:2023-03-14 05:47:05.761 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 1 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76 ... neutron-server.log:2023-03-14 05:47:10.727 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 10 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76 This repeats for each port of each network neutron-dhcp-agent needs to configure. Each subsequent configuration for each network takes about 1-2 minutes, depending on the network size. With earlier Neutron versions the whole process of configuring all networks would finish in under a minute, i.e. DHCP configuration per port (and network) is several orders of magnitude slower than it should be. Once neutron-dhcp-agent finishes synchronization, it seems to work without issues although there aren't that many changes in our cloud to tell whether it's fast or slow, individual port updates seem to happen quickly. All other services are working well, RabbitMQ cluster is working well, infra nodes are not overloaded and there are no apparent issues other than this one with Neutron, thus I am inclined to think that the issue is specific to version 18.6.0 of neutron-dhcp-agent or neutron-server. I would appreciate any advice! Best regards, Zakhar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From geguileo at redhat.com Tue Mar 14 08:46:01 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Tue, 14 Mar 2023 09:46:01 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: <20230313163251.xpnzyvzb65b6zaal@localhost> References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> <20230313163251.xpnzyvzb65b6zaal@localhost> Message-ID: <20230314084601.t2ez24gcljnu5plq@localhost> [Sending the email again as it seems it didn't reach the ML] On 13/03, Gorka Eguileor wrote: > On 11/03, Rishat Azizov wrote: > > Hi, Gorka, > > > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > > Hi, There are multiple things going on here: 1. There is a bug in os-brick, because the disconnect_volume should not fail, since it is being called with force=True and ignore_errors=True. The issues is that this call [1] is not wrapped in the ExceptionChainer context manager, and it should not even be a flush call, it should be a call to "multipathd remove map $map" instead. 2. The way multipath code is written [2][3], the error we see about "3624a93705842cfae35d7483200015fce is not a multipath device" means 2 different things: it is not a multipath or an error happened. So we don't really know what happened without enabling more verbose multipathd log levels. 3. The "multipath -f" call should not be failing in the first place, because the failure is happening on disconnecting the source volume, which has no data buffered to be written and therefore no reason to fail the flush (unless it's using a friendly name). I don't know if it could be happening that the first flush fails with a timeout (maybe because there is an extend operation happening), but multipathd keeps trying to flush it in the background and when it succeeds it removes the multipath device, which makes following calls fail. If that's the case we would need to change the retry from automatic [4] to manual and check in-between to see if the device has been removed in-between calls. The first issue is definitely a bug, the 2nd one is something that could be changed in the deployment to try to get additional information on the failure, and the 3rd one could be a bug. I'll see if I can find someone who wants to work on the 1st and 3rd points. Cheers, Gorka. [1]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 [2]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 [3]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 [4]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > > > On 06/03, Rishat Azizov wrote: > > > > Hi, > > > > > > > > It works with smaller volumes. > > > > > > > > multipath.conf attached to thist email. > > > > > > > > Cinder version - 18.2.0 Wallaby > > > From kamil.madac at gmail.com Tue Mar 14 09:46:07 2023 From: kamil.madac at gmail.com (Kamil Madac) Date: Tue, 14 Mar 2023 10:46:07 +0100 Subject: [neutron] Message-ID: Hi All, I'm in the process of planning a small public cloud based on OpenStack. I have quite experience with kolla-ansible deployments which use OVS networking and I have no issues with that. 
It works stable for my use cases (Vlan provider networks, DVR, tenant networks, floating IPs). For that new deployment I'm looking at OVN deployment which from what I read should be more performant (faster build of instances) and with ability to cover more networking features in OVN instead of needing external software like iptables/dnsmasq. Does anyone use OVN in production and what is your experience (pros/cons)? Is OVN mature enough to replace OVS in the production deployment (are there some basic features from OVS missing)? Thanks in advance for sharing the experience. -- Kamil Madac -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Tue Mar 14 13:15:29 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Tue, 14 Mar 2023 10:15:29 -0300 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope In-Reply-To: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> References: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Message-ID: I would like to share the updates for Manila! :) Em qui., 9 de mar. de 2023 ?s 15:12, Kristin Barrientos < kristin at openinfra.dev> escreveu: > Hi everyone, > > As we get closer to the OpenStack release, I wanted to reach out to see if > any PTL?s were interested in providing their Antelope cycle highlights in > an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we > would get 4-6 projects represented. Previous examples of OpenStack release > episodes can be found here[2] > and here[3] > . > > Please let me know if you?re interested and I can provide next steps. If > you would like to provide a project update but that time doesn?t work for > you, please share a recording with me and I can get it added to the project > navigator. > > Thanks, > > Kristin Barrientos > Marketing Coordinator > OpenInfra Foundation > > [1] https://openinfra.dev/live/ > > [2] https://www.youtube.com/watch?v=hwPfjvshxOM > > [3] https://www.youtube.com/watch?v=MSbB3L9_MeY > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Mon Mar 13 16:32:51 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 13 Mar 2023 17:32:51 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> Message-ID: <20230313163251.xpnzyvzb65b6zaal@localhost> On 11/03, Rishat Azizov wrote: > Hi, Gorka, > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > Hi, There are multiple things going on here: 1. There is a bug in os-brick, because the disconnect_volume should not fail, since it is being called with force=True and ignore_errors=True. The issues is that this call [1] is not wrapped in the ExceptionChainer context manager, and it should not even be a flush call, it should be a call to "multipathd remove map $map" instead. 2. The way multipath code is written [2][3], the error we see about "3624a93705842cfae35d7483200015fce is not a multipath device" means 2 different things: it is not a multipath or an error happened. So we don't really know what happened without enabling more verbose multipathd log levels. 3. The "multipath -f" call should not be failing in the first place, because the failure is happening on disconnecting the source volume, which has no data buffered to be written and therefore no reason to fail the flush (unless it's using a friendly name). 
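A rough sketch of what a more defensive cleanup could look like, following the "multipathd remove map" idea from point 1 and treating "is not a multipath device" as "already gone" rather than as a failure. This is not os-brick code; the retry count and delay are made-up values.

import subprocess
import time

def map_exists(wwid):
    # "multipath -l <map>" prints the map's topology only if it still exists.
    res = subprocess.run(["multipath", "-l", wwid],
                         capture_output=True, text=True)
    return wwid in res.stdout

def remove_map(wwid, attempts=3, delay=5.0):
    for _ in range(attempts):
        if not map_exists(wwid):
            return  # removed in the background; nothing left to flush
        res = subprocess.run(["multipathd", "remove", "map", wwid],
                             capture_output=True, text=True)
        if res.returncode == 0:
            return
        time.sleep(delay)  # manual retry, re-checking the device in between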
I don't know if it could be happening that the first flush fails with a timeout (maybe because there is an extend operation happening), but multipathd keeps trying to flush it in the background and when it succeeds it removes the multipath device, which makes following calls fail. If that's the case we would need to change the retry from automatic [4] to manual and check in-between to see if the device has been removed in-between calls. The first issue is definitely a bug, the 2nd one is something that could be changed in the deployment to try to get additional information on the failure, and the 3rd one could be a bug. I'll see if I can find someone who wants to work on the 1st and 3rd points. Cheers, Gorka. [1]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 [2]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 [3]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 [4]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > On 06/03, Rishat Azizov wrote: > > > Hi, > > > > > > It works with smaller volumes. > > > > > > multipath.conf attached to thist email. > > > > > > Cinder version - 18.2.0 Wallaby > > > > Hi, > > > > After giving it some thought I think I may know what is going on. > > > > If you have DEBUG logs enabled in cinder-backup when it fails, how many > > calls do you see in the cinder-backup to "multipath -f" from os-brick, > > only one or do you see more? > > > > Cheers, > > Gorka. > > > > > > > > ??, 6 ???. 2023??. ? 17:35, Gorka Eguileor : > > > > > > > On 16/02, Rishat Azizov wrote: > > > > > Hello! > > > > > > > > > > We have an error with creating backups from iscsi volume. Usually, > > this > > > > > happens with large backups over 100GB. > > > > > > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > [req-f6619913-6f96-4226-8d75-2da3fca722f1 > > > > 23de1b92e7674cf59486f07ac75b886b > > > > > a7585b47d1f143e9839c49b4e3bbe1b4 - - -] Exception during message > > > > handling: > > > > > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error > > > > while > > > > > running command. 
> > > > > Command: multipath -f 3624a93705842cfae35d7483200015ec6 > > > > > Exit code: 1 > > > > > Stdout: '' > > > > > Stderr: 'Feb 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a > > > > > multipath device\n' > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > Traceback > > > > > (most recent call last): > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line > > > > 165, > > > > > in _process_incoming > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server res > > = > > > > > self.dispatcher.dispatch(message) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", > > line > > > > > 309, in dispatch > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > self._do_dispatch(endpoint, method, ctxt, args) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", > > line > > > > > 229, in _do_dispatch > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > result = > > > > > func(ctxt, **new_args) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/utils.py", line 890, in > > wrapper > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > func(self, *args, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line > > 410, in > > > > > create_backup > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > volume_utils.update_backup_error(backup, str(err)) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, > > in > > > > > __exit__ > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > self.force_reraise() > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, > > in > > > > > force_reraise > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > raise > > > > > self.value > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line > > 399, in > > > > > create_backup > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > updates > > > > = > > > > > self._run_backup(context, backup, volume) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line > > 493, in > > > > > _run_backup > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > ignore_errors=True) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line > > 1066, > > > > in > > > > > _detach_device > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > force=force, ignore_errors=ignore_errors) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > 
"/usr/lib/python3.6/site-packages/os_brick/utils.py", line 141, in > > > > > trace_logging_wrapper > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > f(*args, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", > > line > > > > 360, > > > > > in inner > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > f(*args, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > > > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > > > > > line 880, in disconnect_volume > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > is_disconnect_call=True) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > > > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > > > > > line 942, in _cleanup_connection > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > self._linuxscsi.flush_multipath_device(multipath_name) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", > > line > > > > > 382, in flush_multipath_device > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > root_helper=self._root_helper) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in > > > > > _execute > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > result = > > > > > self.__execute(*args, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", > > line > > > > > 172, in execute > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > execute_root(*cmd, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line > > > > 247, > > > > > in _wrap > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > self.channel.remote_call(name, args, kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, > > in > > > > > remote_call > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > raise > > > > > exc_type(*result[2]) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error > > > > while > > > > > running command. > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Command: > > > > > multipath -f 3624a93705842cfae35d7483200015ec6 > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Exit > > code: 1 > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stdout: > > '' > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stderr: > > 'Feb > > > > > 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a multipath > > > > device\n' > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > > > > > > Could you please help with this error? 
> > > > > > > > Hi, > > > > > > > > Does it work for smaller volumes or does it also fail? > > > > > > > > What are your defaults in your /etc/multipath.conf file? > > > > > > > > What Cinder release are you using? > > > > > > > > Cheers, > > > > Gorka. > > > > > > > > > > > > > defaults { > > > user_friendly_names no > > > find_multipaths yes > > > enable_foreign "^$" > > > } > > > > > > blacklist_exceptions { > > > property "(SCSI_IDENT_|ID_WWN)" > > > } > > > > > > blacklist { > > > } > > > > > > devices { > > > device { > > > vendor "PURE" > > > product "FlashArray" > > > fast_io_fail_tmo 10 > > > path_grouping_policy "group_by_prio" > > > failback "immediate" > > > prio "alua" > > > hardware_handler "1 alua" > > > max_sectors_kb 4096 > > > } > > > } > > > > > 2023-03-10 16:42:41.785 2878341 DEBUG cinder.backup.drivers.ceph [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Transferred chunk 1600 of 1600 (84387K/s) _transfer_data /usr/lib/python3.6/site-packages/cinder/backup/drivers/ceph.py:426 > 2023-03-10 16:42:42.107 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Getting metadata type 'volume-base-metadata' _save_vol_base_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:79 > 2023-03-10 16:42:42.139 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Completed fetching metadata type 'volume-base-metadata' _save_vol_base_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:98 > 2023-03-10 16:42:42.139 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Getting metadata type 'volume-metadata' _save_vol_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:109 > 2023-03-10 16:42:42.147 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] No metadata type 'volume-metadata' available _save_vol_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:123 > 2023-03-10 16:42:42.148 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Getting metadata type 'volume-glance-metadata' _save_vol_glance_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:132 > 2023-03-10 16:42:42.156 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Completed fetching metadata type 'volume-glance-metadata' _save_vol_glance_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:145 > 2023-03-10 16:42:42.157 2878341 DEBUG cinder.backup.drivers.ceph [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Backing up metadata for volume 7e9beead-01a6-488e-bc2f-47093a8cdbd9. _backup_metadata /usr/lib/python3.6/site-packages/cinder/backup/drivers/ceph.py:946 > 2023-03-10 16:42:42.251 2878341 DEBUG cinder.backup.drivers.ceph [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Backup '4f32e959-509d-49d9-9674-acc3776b4b6a' of volume 7e9beead-01a6-488e-bc2f-47093a8cdbd9 finished. 
backup /usr/lib/python3.6/site-packages/cinder/backup/drivers/ceph.py:1011 > 2023-03-10 16:42:42.252 2878341 DEBUG oslo_concurrency.processutils [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf chown 0 /dev/dm-5 execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:42.613 2878341 DEBUG oslo_concurrency.processutils [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf chown 0 /dev/dm-5" returned: 0 in 0.361s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:42:42.614 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] ==> disconnect_volume: call "{'args': (, {'target_discovered': False, 'discard': True, 'target_luns': [1, 1, 1, 1], 'target_iqns': ['iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'], 'target_portals': ['10.224.18.46:3260', '10.224.18.47:3260', '10.224.18.48:3260', '10.224.18.49:3260'], 'wwn': '3624a93705842cfae35d7483200015fce', 'qos_specs': {'total_bytes_sec': '524288000', 'read_iops_sec': '12800', 'write_iops_sec': '6400'}, 'access_mode': 'rw', 'encrypted': False, 'cacheable': False}, {'type': 'block', 'scsi_wwn': '3624a93705842cfae35d7483200015fce', 'path': '/dev/dm-5', 'multipath_id': '3624a93705842cfae35d7483200015fce'}), 'kwargs': {'force': True, 'ignore_errors': True}}" trace_logging_wrapper /usr/lib/python3.6/site-packages/os_brick/utils.py:150 > 2023-03-10 16:42:42.616 2878341 DEBUG oslo_concurrency.lockutils [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Lock "connect_volume" acquired by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: waited 0.001s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359 > 2023-03-10 16:42:42.616 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Getting connected devices for (ips,iqns,luns)=[('10.224.18.46:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 1), ('10.224.18.47:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 1), ('10.224.18.48:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 1), ('10.224.18.49:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 1)] _get_connection_devices /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:821 > 2023-03-10 16:42:42.618 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:42.631 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node" returned: 0 in 0.013s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:42:42.632 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('10.224.18.46:3260,4294967295 
iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1\n10.224.18.47:3260,4294967295 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1\n10.224.18.48:3260,4294967295 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1\n10.224.18.49:3260,4294967295 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:42:42.634 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m session execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:42.647 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m session" returned: 0 in 0.013s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:42:42.647 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('tcp: [234] 10.224.18.46:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash)\ntcp: [235] 10.224.18.47:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash)\ntcp: [236] 10.224.18.49:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash)\ntcp: [237] 10.224.18.48:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash)\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:42:42.649 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('-m', 'session'): stdout=tcp: [234] 10.224.18.46:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [235] 10.224.18.47:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [236] 10.224.18.49:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [237] 10.224.18.48:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > stderr= _run_iscsiadm_bare /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1155 > 2023-03-10 16:42:42.649 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsi session list stdout=tcp: [234] 10.224.18.46:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [235] 10.224.18.47:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [236] 10.224.18.49:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [237] 10.224.18.48:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > stderr= _run_iscsi_session /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1144 > 2023-03-10 16:42:42.653 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Resulting device map defaultdict(. 
at 0x7fe6921bf268>, {('10.224.18.46:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'): ({'sdb'}, set()), ('10.224.18.47:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'): ({'sda'}, set()), ('10.224.18.48:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'): ({'sdc'}, set()), ('10.224.18.49:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'): ({'sdd'}, set())}) _get_connection_devices /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:853 > 2023-03-10 16:42:42.654 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Removing multipathed devices sdc, sdd, sda, sdb remove_connection /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:308 > 2023-03-10 16:42:42.655 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Flush multipath device 3624a93705842cfae35d7483200015fce flush_multipath_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:376 > 2023-03-10 16:42:42.657 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:46.675 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 4.019s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:42:46.676 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:478 > 2023-03-10 16:42:46.676 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:46.685 2880468 DEBUG os_brick.privileged.rootwrap [-] Sleeping for 20 seconds on_execute /usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py:106 > 2023-03-10 16:42:55.994 2878341 DEBUG oslo_service.periodic_task [req-b7d7b4ae-b092-40bf-9105-97bede752389 - - - - -] Running periodic task BackupManager.publish_service_capabilities run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:211 > 2023-03-10 16:42:55.995 2878341 DEBUG cinder.manager [req-b7d7b4ae-b092-40bf-9105-97bede752389 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python3.6/site-packages/cinder/manager.py:197 > 2023-03-10 16:43:06.707 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 20.031s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:06.707 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Retrying. 
execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:478 > 2023-03-10 16:43:06.707 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:06.714 2880468 DEBUG os_brick.privileged.rootwrap [-] Sleeping for 40 seconds on_execute /usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py:106 > 2023-03-10 16:43:46.753 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 40.046s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.754 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Not Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:474 > 2023-03-10 16:43:46.754 2880468 DEBUG oslo.privsep.daemon [-] privsep: Exception during request[140628271231200]: Unexpected error while running command. > Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:43:06 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' _process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490 > Traceback (most recent call last): > File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd > ret = func(*f_args, **f_kwargs) > File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 249, in _wrap > return func(*args, **kwargs) > File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 197, in execute_root > return custom_execute(*cmd, shell=False, run_as_root=False, **kwargs) > File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 146, in custom_execute > on_completion=on_completion, *cmd, **kwargs) > File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 441, in execute > cmd=sanitized_cmd) > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. > Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:43:06 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' > 2023-03-10 16:43:46.757 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (5, 'oslo_concurrency.processutils.ProcessExecutionError', ('', 'Mar 10 16:43:06 | 3624a93705842cfae35d7483200015fce is not a multipath device\n', 1, 'multipath -f 3624a93705842cfae35d7483200015fce', None)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.758 2878341 WARNING os_brick.exception [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Flushing 3624a93705842cfae35d7483200015fce failed: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. 
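For anyone retracing the failure above by hand: the flush that os-brick keeps retrying can be checked directly with standard multipath-tools commands. This is only a diagnostic sketch; the WWID is the one taken from the log, and on a node where the map has already been torn down the flush is expected to fail exactly as shown.

  WWID=3624a93705842cfae35d7483200015fce
  sudo multipath -ll "$WWID"                                  # shows the map and its paths, if the map still exists
  sudo multipathd show maps | grep "$WWID" || echo "map gone"  # confirms whether multipathd still knows the map
  sudo multipath -f "$WWID"                                    # the command os-brick retries; exits 1 once the map is gone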
> 2023-03-10 16:43:46.761 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipathd del path /dev/sdc execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.772 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipathd del path /dev/sdc" returned: 0 in 0.011s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.772 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('ok\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.774 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Remove SCSI device /dev/sdc with /sys/block/sdc/device/delete remove_scsi_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:75 > 2023-03-10 16:43:46.774 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): tee -a /sys/block/sdc/device/delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.814 2880468 DEBUG oslo_concurrency.processutils [-] CMD "tee -a /sys/block/sdc/device/delete" returned: 0 in 0.040s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.815 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('1', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.817 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipathd del path /dev/sdd execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.828 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipathd del path /dev/sdd" returned: 0 in 0.011s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.828 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('ok\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.829 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Remove SCSI device /dev/sdd with /sys/block/sdd/device/delete remove_scsi_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:75 > 2023-03-10 16:43:46.831 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): tee -a /sys/block/sdd/device/delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.869 2880468 DEBUG oslo_concurrency.processutils [-] CMD "tee -a /sys/block/sdd/device/delete" returned: 0 in 0.039s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.870 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('1', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.872 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipathd del path /dev/sda execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.883 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipathd del path /dev/sda" returned: 0 in 0.011s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.884 2880468 DEBUG 
oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('ok\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.885 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Remove SCSI device /dev/sda with /sys/block/sda/device/delete remove_scsi_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:75 > 2023-03-10 16:43:46.887 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): tee -a /sys/block/sda/device/delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.929 2880468 DEBUG oslo_concurrency.processutils [-] CMD "tee -a /sys/block/sda/device/delete" returned: 0 in 0.042s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.930 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('1', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.931 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipathd del path /dev/sdb execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.942 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipathd del path /dev/sdb" returned: 0 in 0.011s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.943 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('ok\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.944 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Remove SCSI device /dev/sdb with /sys/block/sdb/device/delete remove_scsi_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:75 > 2023-03-10 16:43:46.945 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): tee -a /sys/block/sdb/device/delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.979 2880468 DEBUG oslo_concurrency.processutils [-] CMD "tee -a /sys/block/sdb/device/delete" returned: 0 in 0.034s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.979 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('1', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.980 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Checking to see if SCSI volumes sdc, sdd, sda, sdb have been removed. wait_for_volumes_removal /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:83 > 2023-03-10 16:43:46.981 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] SCSI volumes sdc, sdd, sda, sdb have been removed. 
wait_for_volumes_removal /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:92 > 2023-03-10 16:43:46.982 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Disconnecting from: [('10.224.18.46:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'), ('10.224.18.47:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'), ('10.224.18.48:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'), ('10.224.18.49:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1')] _disconnect_connection /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1132 > 2023-03-10 16:43:46.983 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --op update -n node.startup -v manual execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.996 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --op update -n node.startup -v manual" returned: 0 in 0.013s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.996 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.997 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'manual'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:46.998 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --logout execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.026 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --logout" returned: 0 in 0.028s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.027 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('Logging out of session [sid: 234, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.46,3260]\nLogout of [sid: 234, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.46,3260] successful.\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.028 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--logout',): stdout=Logging out of session [sid: 234, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.46,3260] > Logout of [sid: 234, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.46,3260] successful. 
> stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.029 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --op delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.041 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --op delete" returned: 0 in 0.012s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.041 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.042 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'delete'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.043 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --op update -n node.startup -v manual execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.055 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --op update -n node.startup -v manual" returned: 0 in 0.012s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.055 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.056 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'manual'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.057 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --logout execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.084 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --logout" returned: 0 in 0.026s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.084 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('Logging out of session [sid: 235, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.47,3260]\nLogout of [sid: 235, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.47,3260] successful.\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.086 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - 
-] iscsiadm ('--logout',): stdout=Logging out of session [sid: 235, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.47,3260] > Logout of [sid: 235, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.47,3260] successful. > stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.088 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --op delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.101 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --op delete" returned: 0 in 0.013s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.102 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.103 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'delete'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.104 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --op update -n node.startup -v manual execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.114 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --op update -n node.startup -v manual" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.114 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.115 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'manual'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.116 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --logout execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.140 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --logout" returned: 0 in 0.024s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.140 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('Logging out of session [sid: 237, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.48,3260]\nLogout of [sid: 237, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 
10.224.18.48,3260] successful.\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.141 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--logout',): stdout=Logging out of session [sid: 237, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.48,3260] > Logout of [sid: 237, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.48,3260] successful. > stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.142 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --op delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.153 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --op delete" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.153 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.154 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'delete'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.155 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --op update -n node.startup -v manual execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.165 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --op update -n node.startup -v manual" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.165 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.166 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'manual'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.167 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --logout execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.189 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --logout" returned: 0 in 0.022s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.190 2880468 
DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('Logging out of session [sid: 236, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.49,3260]\nLogout of [sid: 236, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.49,3260] successful.\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.191 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--logout',): stdout=Logging out of session [sid: 236, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.49,3260] > Logout of [sid: 236, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.49,3260] successful. > stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.192 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --op delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.202 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --op delete" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.202 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.203 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'delete'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.204 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Flushing again multipath 3624a93705842cfae35d7483200015fce now that we removed the devices. _cleanup_connection /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:941 > 2023-03-10 16:43:47.204 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Flush multipath device 3624a93705842cfae35d7483200015fce flush_multipath_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:376 > 2023-03-10 16:43:47.205 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.215 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.216 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Retrying. 
execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:478 > 2023-03-10 16:43:47.216 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.221 2880468 DEBUG os_brick.privileged.rootwrap [-] Sleeping for 20 seconds on_execute /usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py:106 > 2023-03-10 16:43:55.994 2878341 DEBUG oslo_service.periodic_task [req-b7d7b4ae-b092-40bf-9105-97bede752389 - - - - -] Running periodic task BackupManager.publish_service_capabilities run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:211 > 2023-03-10 16:43:55.995 2878341 DEBUG cinder.manager [req-b7d7b4ae-b092-40bf-9105-97bede752389 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python3.6/site-packages/cinder/manager.py:197 > 2023-03-10 16:44:07.242 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 20.026s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:44:07.242 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:478 > 2023-03-10 16:44:07.242 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:44:07.249 2880468 DEBUG os_brick.privileged.rootwrap [-] Sleeping for 40 seconds on_execute /usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py:106 > 2023-03-10 16:44:47.287 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 40.045s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:44:47.287 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Not Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:474 > 2023-03-10 16:44:47.288 2880468 DEBUG oslo.privsep.daemon [-] privsep: Exception during request[140628271231200]: Unexpected error while running command. > Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' _process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490 > Traceback (most recent call last): > File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd > ret = func(*f_args, **f_kwargs) > File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 249, in _wrap > return func(*args, **kwargs) > File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 197, in execute_root > return custom_execute(*cmd, shell=False, run_as_root=False, **kwargs) > File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 146, in custom_execute > on_completion=on_completion, *cmd, **kwargs) > File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 441, in execute > cmd=sanitized_cmd) > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. 
> Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' > 2023-03-10 16:44:47.288 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (5, 'oslo_concurrency.processutils.ProcessExecutionError', ('', 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n', 1, 'multipath -f 3624a93705842cfae35d7483200015fce', None)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:44:47.289 2878341 DEBUG oslo_concurrency.lockutils [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Lock "connect_volume" released by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 124.674s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:371 > 2023-03-10 16:44:47.289 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] <== disconnect_volume: exception (124674ms) ProcessExecutionError('', 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n', 1, 'multipath -f 3624a93705842cfae35d7483200015fce', None) trace_logging_wrapper /usr/lib/python3.6/site-packages/os_brick/utils.py:160 > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Exception during message handling: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. > Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Traceback (most recent call last): > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/utils.py", line 890, in wrapper > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return func(self, *args, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 410, in create_backup > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server volume_utils.update_backup_error(backup, str(err)) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__ > 2023-03-10 16:44:47.314 2878341 ERROR 
oslo_messaging.rpc.server self.force_reraise() > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server raise self.value > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 399, in create_backup > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server updates = self._run_backup(context, backup, volume) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 493, in _run_backup > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server ignore_errors=True) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1066, in _detach_device > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server force=force, ignore_errors=ignore_errors) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 154, in trace_logging_wrapper > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 360, in inner > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return f(*args, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 880, in disconnect_volume > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server is_disconnect_call=True) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 942, in _cleanup_connection > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server self._linuxscsi.flush_multipath_device(multipath_name) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line 382, in flush_multipath_device > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server root_helper=self._root_helper) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server result = self.__execute(*args, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 172, in execute > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return execute_root(*cmd, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return self.channel.remote_call(name, args, kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server raise exc_type(*result[2]) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server 
oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Command: multipath -f 3624a93705842cfae35d7483200015fce > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Exit code: 1 > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Stdout: '' > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Stderr: 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server

From knikolla at bu.edu Tue Mar 14 14:40:57 2023
From: knikolla at bu.edu (Nikolla, Kristi)
Date: Tue, 14 Mar 2023 14:40:57 +0000
Subject: [tc][all] What's happening in the Technical Committee - 2023 March 14
Message-ID:

Hi all,

Please find below a summary for the last 2 weeks of the TC.

Meetings
=======
- March 1, 2023
A video recording of the meeting is available
https://www.youtube.com/watch?v=HA1owc9qGiE
- March 8, 2023
Meeting notes are available
https://meetings.opendev.org/meetings/tc/2023/tc.2023-03-08-15.59.html
- The next Technical Committee meeting will be tomorrow, March 14, 2023 at 16:00 UTC, chaired by Jay Faulkner. Agenda is available at the link below. Please contact me or Jay if you notice missing items that should be discussed.
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee

Merged Changes
==============
- Appoint Felipe Reyes as OpenStack_Charms PTL (formal-vote)
https://review.opendev.org/c/openstack/governance/+/874971
- Appoint Liu as Senlin PTL (formal-vote)
https://review.opendev.org/c/openstack/governance/+/874969
- Adding mailto link in upstream opportunities doc (documentation-change)
https://review.opendev.org/c/openstack/governance/+/874968
- Move heat-translator and tosca-parser to tacker's governance (project-update)
https://review.opendev.org/c/openstack/governance/+/876012
- Fix doc referencing 'admin_api' rule (typo-fix)
https://review.opendev.org/c/openstack/governance/+/875860
- Correct CHAIR.rst: reflect vice-chair nom practice
https://review.opendev.org/c/openstack/governance/+/875788
- Ironic program adopting x/virtualpdu
https://review.opendev.org/c/openstack/governance/+/876208
- Volunteer to serve at 2023.2 TC vice chair
https://review.opendev.org/c/openstack/governance/+/875787
- Add chair role to knikolla for 2023.2
https://review.opendev.org/c/openstack/governance/+/875742

Open Changes
============
There are 15 open changes to the Governance repo.
https://review.opendev.org/q/project:openstack/governance+status:open

Happenings
==========

New TC Chair and Vice Chair
----------------------------------------
The TC has a new Chair (Kristi Nikolla) and Vice Chair (Jay Faulkner). Thank you Ghanshyam Mann for your 2 years of service as the TC chair.
https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032506.html

TripleO is being deprecated
--------------------------------------
The TC voted to continue supporting the Zed release of TripleO, however without a project team.
https://review.opendev.org/c/openstack/governance/+/877132

Leaderless projects
--------------------------
We're now down to 7 projects without any PTL candidacies (Monasca, Rally, Sahara, Swift, TripleO, Vitrage, and Winstackers). The TC has started reaching out to the teams and former PTLs. If you have any information or interest, please reach out to the TC via the etherpad linked below.
https://etherpad.opendev.org/p/2023.2-leaderless

Release naming for projects
--------------------------------------
The TC continued discussing changes related to version naming for projects. Specifically, there are two proposals about allowing projects to omit the OpenStack version when referring to their project, or flipping the OpenStack and project version order.
https://review.opendev.org/c/openstack/governance/+/874484
https://review.opendev.org/c/openstack/governance/+/875942

Virtual PTG
----------------
The TC is considering scheduling the same PTG time slots for itself as the previous PTG. Namely, Thursday and Friday 15:00 - 19:00 UTC. I will be booking the slots later today.
https://etherpad.opendev.org/p/tc-2023-2-ptg

How to contact the TC:
==================
If you would like to discuss or give feedback to TC, you can reach out to us in multiple ways:
1. Email: you can send an email with the tag [tc] on the openstack-discuss mailing list.
2. Weekly meeting: The Technical Committee conducts a weekly meeting every Thursday at 16:00 UTC.
3. IRC: Ping us using the 'tc-members' keyword on the #openstack-tc IRC channel on OFTC.

From skaplons at redhat.com Tue Mar 14 16:06:52 2023
From: skaplons at redhat.com (Slawek Kaplonski)
Date: Tue, 14 Mar 2023 17:06:52 +0100
Subject: [neutron] Bug deputy - report from week of 6th March
Message-ID: <12170940.O9o76ZdvQC@p1>

Hi,

Sorry for sending it so late but I was off Monday and Tuesday and I forgot about it.
Here's the list of new bugs reported last week:

## High
* https://bugs.launchpad.net/neutron/+bug/2009678 - [OVN] The OVN agent config entry point is incorrect in setup.cfg - assigned to Rodolfo, fix proposed
* https://bugs.launchpad.net/neutron/+bug/2009632 - ARP requests from ovnmeta namespaces are sent to physical interfaces of computing nodes - **not assigned yet**
* https://bugs.launchpad.net/neutron/+bug/2009703 - [OVN] HW offload event "QoSMinimumBandwidthEvent" fails if the min-bw rule is removed - assigned to Rodolfo, fix proposed

## Medium
* https://bugs.launchpad.net/neutron/+bug/2009509 - Large number of FIPs and subnets causes slow sync_routers response - assigned to Adam Oswick
* https://bugs.launchpad.net/neutron/+bug/2009804 - [OVN] Method ``get_port_qos`` should always return 2 values - **not assigned yet, low hanging fruit bug**

## Low
* https://bugs.launchpad.net/neutron/+bug/2009728 - [OVS] "permitted_ethertypes" should be validated and filtered during the OVS agent initialization - **not assigned yet, low hanging fruit bug**
* https://bugs.launchpad.net/neutron/+bug/2009832 - FWaaS docs lack required packages - **not assigned yet, low hanging fruit bug**
* https://bugs.launchpad.net/neutron/+bug/2009831 - VPNaaS docs lack packages to install - **not assigned yet, low hanging fruit bug**

## Incomplete
* https://bugs.launchpad.net/neutron/+bug/2009807 - Not able to create external physical network - needs some more info, but for now it looks more like a user's issue rather than a bug

## Others
* https://bugs.launchpad.net/neutron/+bug/2009705 - [FWaaS ]Openstack Zed - firewall group status doesn't change to ACTIVE. - **unassigned**, it needs someone from FWaaS to look at it and ZhaoHeng is looking into it already.

-- 
Slawek Kaplonski
Principal Software Engineer
Red Hat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: This is a digitally signed message part.
URL: From nguyenhuukhoinw at gmail.com Tue Mar 14 16:12:03 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Tue, 14 Mar 2023 23:12:03 +0700 Subject: [openstack][backup] Experience for instance backup In-Reply-To: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> References: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> Message-ID: Thank Christian, I will try to follow it. *Hello * Eugen, I use SAN to back our openstack services and I planned to use NFS for Cinder backup. Because of that, We separate tenants for different departments. So they can back up by themself. Thank you for your sharing. Nguyen Huu Khoi On Mon, Mar 13, 2023 at 11:11?PM Christian Rohmann < christian.rohmann at inovex.de> wrote: > Hey there, > > On 06/03/2023 22:34, Nguy?n H?u Kh?i wrote: > > > > I am looking for instance backup solution. I am using Cinder backup > > with nfs backup but it looks not too fast. I am using a 10Gbps > > network. I would like to know experience for best practice for > > instance backup solutions on Openstack. > > > On 13/03/2023 12:46, Eugen Block wrote: > > We use Ceph as back end for all services (nova, glance, cinder), and > > the most important machines are backed up by our backup server > > directly via rbd commands: > > There is RBD and "the other" drivers. While RBD uses the native export / > import feature of Ceph, all other drivers (file, NFS, object storages > like S3) are based on the abstract chunked driver > ( > https://opendev.org/openstack/cinder/src/branch/master/cinder/backup/chunkeddriver.py > ). > This driver reads the volume / image and treats it as chunks before > making use of a concrete driver (e.g. NFS or S3) to send those chunks > off somewhere to be stored. Restore works just the opposite way. The > performance of the chunked driver based back-ends is not (yet) > comparable to what RBD can achieve due to various reasons. > > But again, while "RBD" uses Ceph's mechanisms internally all other > "targets" for backup storage work differently. > We ourselves were looking into using and S3-compatible storage and thus > I started a dicsussion about the state of those other drivers at > > https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030263.html > > This then led to a discussion at the Cinder PTG > https://etherpad.opendev.org/p/antelope-ptg-cinder#L119 with many > observations. > > There also are changes in the works, like restore into sparse volumes > (https://review.opendev.org/c/openstack/cinder/+/852654) when going via > the chunked driver. > But also features like "encryption" > (https://review.opendev.org/c/openstack/cinder-specs/+/862601) are being > discussed. > > > > Regards > > > Christian > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Mar 14 16:44:32 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 14 Mar 2023 17:44:32 +0100 Subject: [neutron] In-Reply-To: References: Message-ID: <5696019.DvuYhMxLoT@p1> Hi, Dnia wtorek, 14 marca 2023 10:46:07 CET Kamil Madac pisze: > Hi All, > > I'm in the process of planning a small public cloud based on OpenStack. I > have quite experience with kolla-ansible deployments which use OVS > networking and I have no issues with that. It works stable for my use cases > (Vlan provider networks, DVR, tenant networks, floating IPs). 
> > For that new deployment I'm looking at OVN deployment which from what I > read should be more performant (faster build of instances) and with ability > to cover more networking features in OVN instead of needing external > software like iptables/dnsmasq. > > Does anyone use OVN in production and what is your experience (pros/cons)? > Is OVN mature enough to replace OVS in the production deployment (are there > some basic features from OVS missing)? I'm not using it in production as I'm not cloud operator but I can say that it is stable and mature enough to use it. In Red Hat OpenStack (RH OSP) it's default networking backend since OSP16 (based on upstream Train version). Regarding list of the feature parity gaps You can check https://docs.openstack.org/neutron/latest/ovn/gaps.html - this list should be more or less up to date. In case of any doubts You can always ask on neutron channel on IRC about specific feature which You would need :) > > Thanks in advance for sharing the experience. > > -- > Kamil Madac > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From skaplons at redhat.com Tue Mar 14 16:47:27 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 14 Mar 2023 17:47:27 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: <2315188.ElGaqSPkdT@p1> Hi, Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > Hi Mohammed, > > > Subject: [neutron] detecting l3-agent readiness > > > > Hi folks, > > > > I'm working on improving the stability of rollouts when using Kubernetes as a control plane, specifically around the L3 agent, it seems that I have not found a clear way to detect in the code path where the L3 agent has finished it's initial sync.. > > > > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/-/blob/devel/files/startup_wait_for_ns.py > Basically we are checking against the neutron api what routers should be on the node and then validate that all keepalived processes are up and running. That would work only for HA routers. If You would also have routers which aren't "ha" this method may fail. > > > Am I missing it somewhere or is the architecture built in a way that doesn't really answer that question? > > > > Adding a option in the neutron api would be a lot nicer. But i guess that also counts for l2 and dhcp agents. > > > > Thanks > > Mohammed > > > > > > -- > > Mohammed Naser > > VEXXHOST, Inc. > > -- > Felix Huettner > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. 
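To make the readiness check discussed above a bit more concrete, here is a rough, hand-run sketch of the same idea using the OpenStack CLI and iproute2. It assumes a recent python-openstackclient where 'openstack router list --agent' is available and the default l3-agent state_path, and it only covers the keepalived part for HA routers, which is exactly the limitation pointed out above.

  HOST=$(hostname)
  AGENT_ID=$(openstack network agent list --agent-type l3 --host "$HOST" -f value -c ID)
  for ROUTER in $(openstack router list --agent "$AGENT_ID" -f value -c ID); do
      # every router scheduled to this node's L3 agent should have its namespace
      ip netns list | grep -q "qrouter-$ROUTER" || echo "namespace missing for $ROUTER"
      # keepalived only exists for HA routers; its config path contains the router id
      pgrep -f "ha_confs/$ROUTER" > /dev/null || echo "no keepalived for $ROUTER"
  done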
URL: From zakhar at gmail.com Tue Mar 14 18:28:32 2023 From: zakhar at gmail.com (Zakhar Kirpichenko) Date: Tue, 14 Mar 2023 20:28:32 +0200 Subject: Wallaby on Ubuntu 20.04, Neutron 18.6.0 neutron-dhcp-agent RPC unusually slow In-Reply-To: References: Message-ID: If anyone is interested, I reported the bug/regression: https://bugs.launchpad.net/cloud-archive/+bug/2011513 Is anyone else facing such issues? /Z On Tue, 14 Mar 2023 at 08:34, Zakhar Kirpichenko wrote: > Hi! > > We're running Openstack Wallaby on Ubuntu 20.04, 3 high-performance infra > nodes with a RabbitMQ cluster. I updated Neutron components to version > 18.6.0, which recently became available in the cloud repository ( > http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby > main). The exact package versions updated are as follows: > > Install: libunbound8:amd64 (1.9.4-2ubuntu1.4, automatic), > openvswitch-common:amd64 (2.15.2-0ubuntu1~cloud0, automatic) > Upgrade: neutron-common:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), python3-werkzeug:amd64 (0.16.1+dfsg1-2, > 0.16.1+dfsg1-2ubuntu0.1), neutron-dhcp-agent:amd64 > (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), > neutron-l3-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), python3-neutron:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), neutron-server:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), neutron-plugin-ml2:amd64 > (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), > neutron-metadata-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), neutron-linuxbridge-agent:amd64 > (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1) > > Installed Neutron packages: > > ii neutron-common 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - common > ii neutron-dhcp-agent 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - DHCP agent > Firewall-as-a-Service driver for OpenStack Neutron > ii neutron-l3-agent 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - l3 agent > ii neutron-linuxbridge-agent 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - linuxbridge agent > ii neutron-metadata-agent 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - metadata agent > ii neutron-plugin-ml2 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - ML2 plugin > ii neutron-server 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - server > ii python3-neutron 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - Python library > ii python3-neutron-lib 2.10.1-0ubuntu1~cloud0 > all Neutron shared routines and utilities - > Python 3.x > ii python3-neutronclient 1:7.2.1-0ubuntu1~cloud0 > all client API library for Neutron - Python > 3.x > > Normally this would be an easy update, but this time neutron-dhcp-agent > doesn't work properly: > > 2023-03-14 05:44:27.572 2534501 INFO neutron.agent.dhcp.agent > [req-4a362701-cc1f-4b9d-87e6-045b6a388709 - - - - -] Synchronizing state > complete > 2023-03-14 05:44:38.868 2534501 ERROR neutron_lib.rpc > [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout in RPC method > dhcp_ready_on_ports. Waiting for 55 seconds before next attempt. 
If the > server is not down, consider increasing the rpc_response_timeout option as > Neutron server(s) may be overloaded and unable to respond quickly enough.: > oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply > to message ID bd97110b004e413cb2d6b05d9fb3b57c > 2023-03-14 05:44:38.871 2534501 WARNING neutron_lib.rpc > [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Increasing timeout for > dhcp_ready_on_ports calls to 120 seconds. Restart the agent to restore it > to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed > out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c > 2023-03-14 05:45:34.244 2534501 ERROR neutron.agent.dhcp.agent > [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout notifying > server of ports ready. Retrying...: > oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply > to message ID bd97110b004e413cb2d6b05d9fb3b57c > 2023-03-14 05:47:10.876 2534501 INFO oslo_messaging._drivers.amqpdriver > [-] No calling threads waiting for msg_id : bd97110b004e413cb2d6b05d9fb3b57c > 2023-03-14 05:47:34.353 2534501 ERROR neutron_lib.rpc > [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout in RPC method > dhcp_ready_on_ports. Waiting for 27 seconds before next attempt. If the > server is not down, consider increasing the rpc_response_timeout option as > Neutron server(s) may be overloaded and unable to respond quickly enough.: > oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply > to message ID f254f735998243c4b0a58ce95c974534 > 2023-03-14 05:47:34.354 2534501 WARNING neutron_lib.rpc > [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Increasing timeout for > dhcp_ready_on_ports calls to 240 seconds. Restart the agent to restore it > to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed > out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534 > 2023-03-14 05:47:46.681 2534501 INFO oslo_messaging._drivers.amqpdriver > [-] No calling threads waiting for msg_id : f254f735998243c4b0a58ce95c974534 > 2023-03-14 05:48:01.086 2534501 ERROR neutron.agent.dhcp.agent > [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout notifying > server of ports ready. Retrying...: > oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply > to message ID f254f735998243c4b0a58ce95c974534 > 2023-03-14 05:49:45.035 2534501 INFO neutron.agent.dhcp.agent > [req-5935a0d0-a981-463c-a4ea-23ccbb54c896 - - - - -] DHCP configuration for > ports ... (A successful configuration here). > > While neutron-dhcp-agent is waiting, neutron-server log gets filled up > with: > > neutron-server.log:2023-03-14 05:47:05.761 4171971 INFO > neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - > - -] Attempt 1 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76 > ... > neutron-server.log:2023-03-14 05:47:10.727 4171971 INFO > neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - > - -] Attempt 10 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76 > > This repeats for each port of each network neutron-dhcp-agent needs to > configure. > > Each subsequent configuration for each network takes about 1-2 > minutes, depending on the network size. With earlier Neutron versions the > whole process of configuring all networks would finish in under a minute, > i.e. DHCP configuration per port (and network) is several orders of > magnitude slower than it should be. 
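For completeness, the knob the log message above points at is oslo.messaging's rpc_response_timeout (default 60 seconds) in the [DEFAULT] section of neutron.conf on the node running neutron-dhcp-agent. Raising it is only a sketch of a workaround that hides the symptom while the regression is investigated, not a fix; crudini is assumed to be installed, and editing the file by hand is equivalent.

  sudo crudini --set /etc/neutron/neutron.conf DEFAULT rpc_response_timeout 300
  sudo systemctl restart neutron-dhcp-agent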
Once neutron-dhcp-agent finishes > synchronization, it seems to work without issues although there aren't that > many changes in our cloud to tell whether it's fast or slow, individual > port updates seem to happen quickly. > > All other services are working well, RabbitMQ cluster is working well, > infra nodes are not overloaded and there are no apparent issues other than > this one with Neutron, thus I am inclined to think that the issue is > specific to version 18.6.0 of neutron-dhcp-agent or neutron-server. > > I would appreciate any advice! > > Best regards, > Zakhar > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fkr at hazardous.org Tue Mar 14 19:53:22 2023 From: fkr at hazardous.org (Felix Kronlage-Dammers) Date: Tue, 14 Mar 2023 20:53:22 +0100 Subject: [publiccloud-sig] Reminder - next meeting March 15th - 0800 UTC Message-ID: Hi everyone, better late than not at all ;) Here comes the reminder for the next meeting of the Public Cloud SIG: This is on March 15th (this wednesday) at 0800 UTC. We meet on IRC in #openstack-operators. A preliminary agenda can be found in the pad: https://etherpad.opendev.org/p/publiccloud-sig-meeting See also here for all other details: https://wiki.openstack.org/wiki/PublicCloudSIG read you on wednesday! felix From jay at gr-oss.io Tue Mar 14 21:08:20 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 14 Mar 2023 14:08:20 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: I'm not opposed to adding more time necessarily, but I wonder if the solution is to triage what we talk about better. Last PTG, we planned several features which we didn't have enough contribution to complete. IMO, it might be better to limit what we discuss to things that are likely to be accomplished next cycle. If we can't determine that without more discussion, then sure, let's add vPTG sessions.. Is there a suggestion as to specifically what times, and how much to add? Thanks, Jay Faulkner On Mon, Mar 13, 2023 at 11:43?AM Julia Kreger wrote: > Greetings! > > Time slot wise, I think that works for me. > > Time wise, in regards to the amount, I'm wondering if we need more. By my > count, we have 11 new topics, 4 topics to revisit, in about six hours > of non-operator dedicated time, not accounting for breaks for coffee/tea. > Granted, some topics might be super quick at the 10 minute quick poll of > the room, whereas other topics I feel like will require extensive > discussion. If I were to size them, I think we would have 6 large-ish > topics along with 3-4 medium sized topics. > > -Julia > > > On Thu, Mar 9, 2023 at 3:19?PM Jay Faulkner wrote: > >> Hey all, >> >> The vPTG will be upon us soon, the week of March 27. >> >> I booked the following times on behalf of Ironic + BM SIG Operator hour, >> in accordance with what times worked in Antelope. It's my hope that since >> we've had little contributor turnover, these times continue to work. I'm >> completely open to having things moved around if it's more convenient to >> participants. >> >> I've booked the following times, all in Folsom: >> - Tuesday 1400 UTC - 1700 UTC >> - Wednesday 1300 UTC Operator hour: baremetal SIG >> - Wednesday 1400 UTC - 1600 UTC >> - Wednesday 2200 - 2300 UTC >> >> >> I propose that after the Ironic meeting on March 20, we shortly sync up >> in the Bobcat PTG etherpad ( >> https://etherpad.opendev.org/p/ironic-bobcat-ptg) to pick topics and >> assign time. 
>> >> >> Again, this is all meant to be a suggestion, I'm happy to move things >> around but didn't want us to miss out on getting things booked. >> >> >> - >> Jay Faulkner >> Ironic PTL >> TC Member >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Tue Mar 14 21:41:18 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 14 Mar 2023 14:41:18 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: I think there is a bit of a challenge to navigate though in that the PTG as a sync point is needed especially on items which may take more than just one cycle to deliver. A great example is driver composition. The other unknown is if people are just not interested in some topics, which can result in that topic being very quick. One thing we also did in the past is guess how much time as a group in advance. I know for a few cycles we had a quick 15 minute call to discuss sizing. Based upon output from that, I think we could adjust the time slots accordingly. Maybe that might make sense to do? -Julia On Tue, Mar 14, 2023 at 2:08?PM Jay Faulkner wrote: > I'm not opposed to adding more time necessarily, but I wonder if the > solution is to triage what we talk about better. > > Last PTG, we planned several features which we didn't have enough > contribution to complete. IMO, it might be better to limit what we discuss > to things that are likely to be accomplished next cycle. If we can't > determine that without more discussion, then sure, let's add vPTG sessions.. > > Is there a suggestion as to specifically what times, and how much to add? > > Thanks, > Jay Faulkner > >> [trim] >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Tue Mar 14 21:53:59 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 14 Mar 2023 14:53:59 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: That sync is exactly what I hoped > I propose that after the Ironic meeting on March 20, we shortly sync up in the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) to pick topics and assign time. ^ that would be. Does that sound good to you? If it comes out in there we need more time, we should also have interested parties around to book it. -Jay On Tue, Mar 14, 2023 at 2:41?PM Julia Kreger wrote: > I think there is a bit of a challenge to navigate though in that the PTG > as a sync point is needed especially on items which may take more than just > one cycle to deliver. A great example is driver composition. The other > unknown is if people are just not interested in some topics, which can > result in that topic being very quick. > > One thing we also did in the past is guess how much time as a group in > advance. I know for a few cycles we had a quick 15 minute call to discuss > sizing. Based upon output from that, I think we could adjust the time > slots accordingly. Maybe that might make sense to do? > > -Julia > > On Tue, Mar 14, 2023 at 2:08?PM Jay Faulkner wrote: > >> I'm not opposed to adding more time necessarily, but I wonder if the >> solution is to triage what we talk about better. >> >> Last PTG, we planned several features which we didn't have enough >> contribution to complete. IMO, it might be better to limit what we discuss >> to things that are likely to be accomplished next cycle. If we can't >> determine that without more discussion, then sure, let's add vPTG sessions.. 
>> >> Is there a suggestion as to specifically what times, and how much to add? >> >> Thanks, >> Jay Faulkner >> >>> [trim] >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Tue Mar 14 22:07:05 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 14 Mar 2023 15:07:05 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: Sounds good to me! Thanks! -Julia On Tue, Mar 14, 2023 at 2:54?PM Jay Faulkner wrote: > > That sync is exactly what I hoped > > > > I propose that after the Ironic meeting on March 20, we shortly sync up in > the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) > to pick topics and assign time. > > ^ that would be. > > Does that sound good to you? If it comes out in there we need more time, > we should also have interested parties around to book it. > > -Jay > > On Tue, Mar 14, 2023 at 2:41?PM Julia Kreger > wrote: > >> I think there is a bit of a challenge to navigate though in that the PTG >> as a sync point is needed especially on items which may take more than just >> one cycle to deliver. A great example is driver composition. The other >> unknown is if people are just not interested in some topics, which can >> result in that topic being very quick. >> >> One thing we also did in the past is guess how much time as a group in >> advance. I know for a few cycles we had a quick 15 minute call to discuss >> sizing. Based upon output from that, I think we could adjust the time >> slots accordingly. Maybe that might make sense to do? >> >> -Julia >> >> On Tue, Mar 14, 2023 at 2:08?PM Jay Faulkner wrote: >> >>> I'm not opposed to adding more time necessarily, but I wonder if the >>> solution is to triage what we talk about better. >>> >>> Last PTG, we planned several features which we didn't have enough >>> contribution to complete. IMO, it might be better to limit what we discuss >>> to things that are likely to be accomplished next cycle. If we can't >>> determine that without more discussion, then sure, let's add vPTG sessions.. >>> >>> Is there a suggestion as to specifically what times, and how much to >>> add? >>> >>> Thanks, >>> Jay Faulkner >>> >>>> [trim] >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 15 09:33:03 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 15 Mar 2023 10:33:03 +0100 Subject: [nova][PTG] PTG etherpad, please ideally add your topics before next week Message-ID: As we discussed on the last nova meeting [1], I'll create an agenda for all the topics we will have for the next vPTG by next Tuesday. For this, it would be nice if most of the topics people would like to discuss are already on the PTG etherpad before Tuesday, so please look at this etherpad and add your own topics rather sooner than later ;-) https://etherpad.opendev.org/p/nova-bobcat-ptg I also add a courtesy ping list item for each of the existing topics. Please add your IRC nick if you can't be around for all of the PTG time, so basically at the beginning of every PTG topic, I'd ping all the folks for it. 
As a reminder, I booked those time slots for the Diablo room : - *Tuesday* *13:00 UTC - 17:00 UTC* - *Wednesday 13:00 UTC - 17:00 UTC* - *Thursday 13:00 UTC - 17:00 UTC* - *Friday 13:00 UTC - 17:00 UTC* Thanks, -S [1] https://meetings.opendev.org/meetings/nova/2023/nova.2023-03-14-16.00.log.html#l-150 -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Wed Mar 15 11:06:34 2023 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 15 Mar 2023 11:06:34 +0000 Subject: [cinder] Bug Report | 15-03-2023 Message-ID: Hello Argonauts, Medium - Extending SCSI multipath doesn't work if Nova configuration changed. - *Status*: Unassigned. - An Extended multipathed device should not require a reconfigure. - *Status*: Unassigned. - [rbac] Reader user able to delete a user message. - *Status*: Unassigned. - Cinder Message API creates failure "'NoneType' object is not subscriptable". - *Status*: Marked as duplicate of similar bug . Fix proposed to master . Low - [HPE] Volume name migration fails with keyerror . - *Status*: No fix proposed to master yet. Incomplete - Cannot delete snapshot: invalid backing file. Cinder removed: - [nova] size_iops_sec does behave differently than mentioned in docs. Cheers, -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Wed Mar 15 11:26:37 2023 From: zigo at debian.org (Thomas Goirand) Date: Wed, 15 Mar 2023 12:26:37 +0100 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: Message-ID: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > Watcher is not good because It need cpu metric > such as cpu load in Ceilometer?which is removed so we cannot use it. Hi! What do you mean by "Ceilometer [is] removed"? It certainly isn't dead, and it works well... If by that, you mean "ceilometer-api" is removed, then yes, but then you can use gnocchi. Cheers, Thomas Goirand (zigo) From smooney at redhat.com Wed Mar 15 11:57:58 2023 From: smooney at redhat.com (Sean Mooney) Date: Wed, 15 Mar 2023 11:57:58 -0000 Subject: live resize In-Reply-To: References: Message-ID: <4bfa4e3fa2f0396d8963bb0dff76cf8dfb557872.camel@redhat.com> On Tue, 2022-05-24 at 22:49 +0430, Parsa Aminian wrote: > hello > on openstack with ceph backend is it possible to live resize instances ? I > want to change flavor without any down time . no that is not support in nova with any storage backend or hypervior. From nguyenhuukhoinw at gmail.com Wed Mar 15 12:09:07 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 15 Mar 2023 19:09:07 +0700 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: Hello. I cannot use because missing cpu_util metric. I try to match it work but not yet. It need some code to make it work. It seem none care about balance reources on cloud. On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > Watcher is not good because It need cpu metric > > such as cpu load in Ceilometer which is removed so we cannot use it. > > Hi! > > What do you mean by "Ceilometer [is] removed"? It certainly isn't dead, > and it works well... 
If by that, you mean "ceilometer-api" is removed, > then yes, but then you can use gnocchi. > > Cheers, > > Thomas Goirand (zigo) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Wed Mar 15 12:10:37 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 15 Mar 2023 19:10:37 +0700 Subject: live resize In-Reply-To: <4bfa4e3fa2f0396d8963bb0dff76cf8dfb557872.camel@redhat.com> References: <4bfa4e3fa2f0396d8963bb0dff76cf8dfb557872.camel@redhat.com> Message-ID: Hello. It looks like no way at this time. On Wed, Mar 15, 2023, 7:05 PM Sean Mooney wrote: > On Tue, 2022-05-24 at 22:49 +0430, Parsa Aminian wrote: > > hello > > on openstack with ceph backend is it possible to live resize instances ? > I > > want to change flavor without any down time . > no that is not support in nova with any storage backend or hypervior. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Mar 15 12:25:05 2023 From: smooney at redhat.com (Sean Mooney) Date: Wed, 15 Mar 2023 12:25:05 +0000 Subject: live resize In-Reply-To: References: <4bfa4e3fa2f0396d8963bb0dff76cf8dfb557872.camel@redhat.com> Message-ID: On Wed, 2023-03-15 at 19:10 +0700, Nguy?n H?u Kh?i wrote: > Hello. > It looks like no way at this time. correct live reisze is not supproted and not planned to be supported in a future release. you can extend the disk live if its a boot form voluem instance but nova resize api is and will continue to be an offline operatoion. live resize wiht qemu/kvm has a lot of edge cases like startign the domain with the maxium ram set to the largeset value you might resize too and setting current to what is in the flavor. the same would be required for cpu. its really not something that is compatible with how we do flavor as there are too many ways that it could fail without signifciatly modifying how flavors work. > > On Wed, Mar 15, 2023, 7:05 PM Sean Mooney wrote: > > > On Tue, 2022-05-24 at 22:49 +0430, Parsa Aminian wrote: > > > hello > > > on openstack with ceph backend is it possible to live resize instances ? > > I > > > want to change flavor without any down time . > > no that is not support in nova with any storage backend or hypervior. > > > > > > From michal.arbet at ultimum.io Wed Mar 15 12:28:20 2023 From: michal.arbet at ultimum.io (Michal Arbet) Date: Wed, 15 Mar 2023 13:28:20 +0100 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: <30941678692580@mail.yandex.ru> References: <30941678692580@mail.yandex.ru> Message-ID: Hi, You can't import qcow and mark it as raw, before upload to glance you have to convert qcow2 to raw. openstack image create Fedora-CoreOS --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 --disk-format=raw --container-format=bare --property os_distro='fedora-coreos' --public to qemu-img convert fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 fedora-coreos-35.20220313.3.1-openstack.x86_64.raw openstack image create Fedora-CoreOS --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.raw --disk-format=raw --container-format=bare --property os_distro='fedora-coreos' --public Hope it will help you. Regards, Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook po 13. 3. 2023 v 13:25 odes?latel ??????? ???? napsal: > Hello, openstack team! Please help me! 
I'm trying to use magnum in the > yoga release on ubuntu 22.04 I can't understand why it doesn't work, when > creating a container I get an error > " > 2023-03-13 09:07:38.090 1507357 ERROR magnum.drivers.heat.driver > [req-b0b05017-af6b-4f4c-bf2c-003b34f17ba0 - - - - -]Nodegroup error, stack > status: CREATE_FAILED, stack_id: 54f98fee-c070-40f9b337-b9b6df49e73b, > reason:Resource CREATE failed: ResourceInError: > resources.kube_masters.resources[0].resources.kube-master: > Went to status ERROR due to "Message: Build of instance > 912d1432-c692-4569-a511-3fb9291c97dc aborted: Image > 54c026f6-6ff7-4be6-b76e-a0732d9b8814 is unacceptable: Image is not raw > format, Code: 500" > " > I downloaded the container image and tried to create it in raw with the > command and create a new Cluster template > " > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > --disk-format=raw --container-format=bare --property > os_distro='fedora-coreos' --public > " > and create a new Cluster template, when I tried to create a cluster, it > had the status CREATE_IN_PROGRESS for a very long time I got an error > " > 2023-03-13 10:21:58.110 1507357 ERROR magnum.drivers.heat.driver > [req-29753ae4-b9c3-4602-ba8e-acf9a0d024c5 - - - - -] Nodegroup error, stack > status: > CREATE_FAILED, stack_id: 5958bd59-c6f9-464d-a4f7-ddd530fdf804, reason: > Timed out= > " > why it doesn't work? I saw a massage on your website. > > Does this mean that magnum only works in the Zed release? > -- > With respect, > Makarov Maxim > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From bence.romsics at gmail.com Wed Mar 15 12:39:22 2023 From: bence.romsics at gmail.com (Bence Romsics) Date: Wed, 15 Mar 2023 13:39:22 +0100 Subject: [nova][cinder] future of rebuild without reimaging Message-ID: Hi All! We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future). As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging. However since the implementation of https://specs.openstack.org/openstack/nova-specs/specs/zed/implemented/volume-backed-server-rebuild.html rebuild without reimaging is only possible using an old microversion (<2.93). With that change merged, rebuild without reimaging seems to be a somewhat less than fully supported feature. A few examples of what I mean by that: First, there's this warning: https://opendev.org/openstack/python-openstackclient/src/commit/5eb89e4ca1cebad9245c27d58a0dafd7f363ece0/openstackclient/compute/v2/server.py#L3452-L3453 In which it is unclear to me what exactly will become an error in a future release. Rebuild with a different image? Or any rebuild with microversion <2.93? Then old nova microversions may get dropped. Though what I heard from nova folks, this is unlikely to happen. Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior. 
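For readers less familiar with the microversion mechanics being discussed, the two behaviours are selected purely by the negotiated version, not by a separate call. A hedged illustration with python-openstackclient, using placeholder IDs and assuming a volume-backed server:

# pre-2.93 semantics: the root volume content is kept as is
# (to my knowledge only the original image is accepted here for volume-backed servers)
openstack --os-compute-api-version 2.92 server rebuild --image <original-image-id> <server-id>

# 2.93 and later: the root volume is reimaged as part of the rebuild
openstack --os-compute-api-version 2.93 server rebuild --image <image-id> <server-id>

This is exactly the pinning in question: it works today, the open question is for how long.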
What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)? If the topic is worth further discussion beyond the ML, I can also bring it to the nova ptg. Thanks in advance, Bence Romsics (rubasov) ps: I'll be afk for a few days, but I'll follow up next Tuesday. From nguyenhuukhoinw at gmail.com Wed Mar 15 13:20:49 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 15 Mar 2023 20:20:49 +0700 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: References: <30941678692580@mail.yandex.ru> Message-ID: This is image error. Not magnum On Wed, Mar 15, 2023, 7:34 PM Michal Arbet wrote: > Hi, > > You can't import qcow and mark it as raw, before upload to glance you have > to convert qcow2 to raw. > > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > --disk-format=raw --container-format=bare --property > os_distro='fedora-coreos' --public > > to > > qemu-img convert fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > fedora-coreos-35.20220313.3.1-openstack.x86_64.raw > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.raw --disk-format=raw > --container-format=bare --property os_distro='fedora-coreos' --public > > Hope it will help you. > > Regards, > Michal Arbet > Openstack Engineer > > Ultimum Technologies a.s. > Na Po???? 1047/26, 11000 Praha 1 > Czech Republic > > +420 604 228 897 > michal.arbet at ultimum.io > *https://ultimum.io * > > LinkedIn | Twitter > | Facebook > > > > po 13. 3. 2023 v 13:25 odes?latel ??????? ???? > napsal: > >> Hello, openstack team! Please help me! I'm trying to use magnum in the >> yoga release on ubuntu 22.04 I can't understand why it doesn't work, when >> creating a container I get an error >> " >> 2023-03-13 09:07:38.090 1507357 ERROR magnum.drivers.heat.driver >> [req-b0b05017-af6b-4f4c-bf2c-003b34f17ba0 - - - - -]Nodegroup error, stack >> status: CREATE_FAILED, stack_id: 54f98fee-c070-40f9b337-b9b6df49e73b, >> reason:Resource CREATE failed: ResourceInError: >> resources.kube_masters.resources[0].resources.kube-master: >> Went to status ERROR due to "Message: Build of instance >> 912d1432-c692-4569-a511-3fb9291c97dc aborted: Image >> 54c026f6-6ff7-4be6-b76e-a0732d9b8814 is unacceptable: Image is not raw >> format, Code: 500" >> " >> I downloaded the container image and tried to create it in raw with the >> command and create a new Cluster template >> " >> openstack image create Fedora-CoreOS >> --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 >> --disk-format=raw --container-format=bare --property >> os_distro='fedora-coreos' --public >> " >> and create a new Cluster template, when I tried to create a cluster, it >> had the status CREATE_IN_PROGRESS for a very long time I got an error >> " >> 2023-03-13 10:21:58.110 1507357 ERROR magnum.drivers.heat.driver >> [req-29753ae4-b9c3-4602-ba8e-acf9a0d024c5 - - - - -] Nodegroup error, stack >> status: >> CREATE_FAILED, stack_id: 5958bd59-c6f9-464d-a4f7-ddd530fdf804, reason: >> Timed out= >> " >> why it doesn't work? I saw a massage on your website. >> >> Does this mean that magnum only works in the Zed release? 
>> -- >> With respect, >> Makarov Maxim >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From mister.mackarow at yandex.ru Wed Mar 15 13:35:04 2023 From: mister.mackarow at yandex.ru (=?utf-8?B?0JzQkNCa0JDQoNCe0JIg0JzQkNCa0KE=?=) Date: Wed, 15 Mar 2023 16:35:04 +0300 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: References: <30941678692580@mail.yandex.ru> Message-ID: <401541678887257@mail.yandex.ru> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From nguyenhuukhoinw at gmail.com Wed Mar 15 13:49:41 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 15 Mar 2023 20:49:41 +0700 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: <401541678887257@mail.yandex.ru> References: <30941678692580@mail.yandex.ru> <401541678887257@mail.yandex.ru> Message-ID: Just forcus heat log in k8s master instance. It is all you need to specify problems. On Wed, Mar 15, 2023, 8:46 PM ??????? ???? wrote: > Thanks for the answer! I solved this problem by specifying in the settings > nova.conf image-type = qcow2 now I can use qcow2, but unfortunately the > cluster does not create, it crashes with a time_out error, if I run heat > stack-list I see that kube_masters create in progress, now I'm stuck at > this step > > 15.03.2023, 15:28, "Michal Arbet" : > > Hi, > > You can't import qcow and mark it as raw, before upload to glance you have > to convert qcow2 to raw. > > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > --disk-format=raw --container-format=bare --property > os_distro='fedora-coreos' --public > > to > > qemu-img convert fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > fedora-coreos-35.20220313.3.1-openstack.x86_64.raw > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.raw --disk-format=raw > --container-format=bare --property os_distro='fedora-coreos' --public > > Hope it will help you. > > Regards, > *Michal Arbet* > Openstack Engineer > > Ultimum Technologies a.s. > Na Po???? 1047/26, 11000 Praha 1 > Czech Republic > > +420 604 228 897 > michal.arbet at ultimum.io > *https://ultimum.io * > > LinkedIn | Twitter > | Facebook > > > > > > po 13. 3. 2023 v 13:25 odes?latel ??????? ???? > napsal: > > Hello, openstack team! Please help me! 
I'm trying to use magnum in the > yoga release on ubuntu 22.04 I can't understand why it doesn't work, when > creating a container I get an error > " > 2023-03-13 09:07:38.090 1507357 ERROR magnum.drivers.heat.driver > [req-b0b05017-af6b-4f4c-bf2c-003b34f17ba0 - - - - -]Nodegroup error, stack > status: CREATE_FAILED, stack_id: 54f98fee-c070-40f9b337-b9b6df49e73b, > reason:Resource CREATE failed: ResourceInError: > resources.kube_masters.resources[0].resources.kube-master: > Went to status ERROR due to "Message: Build of instance > 912d1432-c692-4569-a511-3fb9291c97dc aborted: Image > 54c026f6-6ff7-4be6-b76e-a0732d9b8814 is unacceptable: Image is not raw > format, Code: 500" > " > I downloaded the container image and tried to create it in raw with the > command and create a new Cluster template > " > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > --disk-format=raw --container-format=bare --property > os_distro='fedora-coreos' --public > " > and create a new Cluster template, when I tried to create a cluster, it > had the status CREATE_IN_PROGRESS for a very long time I got an error > " > 2023-03-13 10:21:58.110 1507357 ERROR magnum.drivers.heat.driver > [req-29753ae4-b9c3-4602-ba8e-acf9a0d024c5 - - - - -] Nodegroup error, stack > status: > CREATE_FAILED, stack_id: 5958bd59-c6f9-464d-a4f7-ddd530fdf804, reason: > Timed out= > " > why it doesn't work? I saw a massage on your website. > > Does this mean that magnum only works in the Zed release? > -- > With respect, > Makarov Maxim > > > > > -- > With respect, > Makarov Maxim > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From felix.huettner at mail.schwarz Wed Mar 15 16:10:44 2023 From: felix.huettner at mail.schwarz (=?utf-8?B?RmVsaXggSMO8dHRuZXI=?=) Date: Wed, 15 Mar 2023 16:10:44 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: <2315188.ElGaqSPkdT@p1> References: <2315188.ElGaqSPkdT@p1> Message-ID: Hi, > Subject: Re: [neutron] detecting l3-agent readiness > > Hi, > > Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > > Hi Mohammed, > > > > > Subject: [neutron] detecting l3-agent readiness > > > > > > Hi folks, > > > > > > I'm working on improving the stability of rollouts when using Kubernetes as a control > plane, specifically around the L3 agent, it seems that I have not found a clear way to > detect in the code path where the L3 agent has finished it's initial sync.. > > > > > > > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/- > /blob/devel/files/startup_wait_for_ns.py > > Basically we are checking against the neutron api what routers should be on the node and > then validate that all keepalived processes are up and running. > > That would work only for HA routers. If You would also have routers which aren't "ha" this > method may fail. > Yep, since we only have HA routers that works fine for us. But I guess it should also work for non-ha routers without too much adoption (maybe just check for namespaces instead of keepalived). > > > > > Am I missing it somewhere or is the architecture built in a way that doesn't really > answer that question? 
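To make that concrete, roughly the same readiness check can be approximated from the outside with plain CLI calls; a rough sketch only (it assumes the usual qrouter- namespace naming, an OSC version whose "router list" supports the --agent filter, and it deliberately skips the keepalived part, which is why the linked startup_wait_for_ns.py goes further for HA routers):

# ID of the L3 agent on this node
AGENT_ID=$(openstack network agent list --host "$(hostname -f)" -f value -c ID -c "Agent Type" | awk '/L3 agent/ {print $1}')
# routers the Neutron API expects this agent to host vs. namespaces actually present
EXPECTED=$(openstack router list --agent "$AGENT_ID" -f value -c ID | wc -l)
ACTUAL=$(ip netns list | grep -c '^qrouter-')
[ "$ACTUAL" -ge "$EXPECTED" ] && echo "l3-agent looks synced ($ACTUAL/$EXPECTED)" || echo "still syncing ($ACTUAL/$EXPECTED)"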
> > > > > > > Adding a option in the neutron api would be a lot nicer. But i guess that also counts > for l2 and dhcp agents. > > > > > > > Thanks > > > Mohammed > > > > > > > > > -- > > > Mohammed Naser > > > VEXXHOST, Inc. > > > > -- > > Felix Huettner > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung > durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger > sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. > Hinweise zum Datenschutz finden Sie hier. > > > > > -- > Slawek Kaplonski > Principal Software Engineer > Red Hat -- Felix Huettner Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From adivya1.singh at gmail.com Wed Mar 15 16:48:22 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Wed, 15 Mar 2023 22:18:22 +0530 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack Message-ID: Hi Team, I am unable to open Open OpenStack horizon page, after installation When i am opening the link , it says Haproxy service seems up and running, I have tried to Flush IP tables also, Seeing this might be causing the issue Port 443 is also listening. Any thoughts on this [image: image.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 9294 bytes Desc: not available URL: From rdopiera at redhat.com Wed Mar 15 17:37:14 2023 From: rdopiera at redhat.com (Radomir Dopieralski) Date: Wed, 15 Mar 2023 18:37:14 +0100 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack In-Reply-To: References: Message-ID: try /dashboard On Wed, Mar 15, 2023 at 5:56?PM Adivya Singh wrote: > Hi Team, > > I am unable to open Open OpenStack horizon page, after installation > When i am opening the link , it says > > Haproxy service seems up and running, I have tried to Flush IP tables > also, Seeing this might be causing the issue > > Port 443 is also listening. > > Any thoughts on this > > [image: image.png] > -- Radomir Dopieralski -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 9294 bytes Desc: not available URL: From swogatpradhan22 at gmail.com Wed Mar 15 17:41:45 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 15 Mar 2023 23:11:45 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Brendan, Now i have deployed another site where i have used 2 linux bonds network template for both 3 compute nodes and 3 ceph nodes. The bonding options is set to mode=802.3ad (lacp=active). I used a cirros image to launch instance but the instance timed out so i waited for the volume to be created. 
Once the volume was created I tried launching the instance from the volume and still the instance is stuck in spawning state.

Here is the nova-compute log:

2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon starting
2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon running as pid 185437
2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command.
Command: blkid overlay -s UUID -o value
Exit code: 2
Stdout: ''
Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image

It is stuck at "Creating image"; do I need to run the template mentioned here?: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html

The volume is already created and I do not understand why the instance is stuck in spawning state.

With regards,
Swogat Pradhan

On Sun, Mar 5, 2023 at 4:02 PM Brendan Shephard wrote:

> Does your environment use different network interfaces for each of the
> networks? Or does it have a bond with everything on it?
>
> One issue I have seen before is that when launching instances, there is a
> lot of network traffic between nodes as the hypervisor needs to download
> the image from Glance. Along with various other services sending normal
> network traffic, it can be enough to cause issues if everything is running
> over a single 1Gbe interface.
>
> I have seen the same situation in fact when using a single active/backup
> bond on 1Gbe nics. It's worth checking the network traffic while you try to
> spawn the instance to see if you're dropping packets. In the situation I
> described, there were dropped packets which resulted in a loss of
> communication between nova_compute and RMQ, so the node appeared offline.
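To make the "check for drops" advice concrete, something along these lines on the hypervisor while reproducing the spawn is usually enough (bond0 is an assumption, substitute whichever bond or NIC carries the internal API and storage traffic):

# confirm the bond really negotiated 802.3ad/LACP with the switch
grep -E 'Bonding Mode|MII Status|LACP' /proc/net/bonding/bond0

# watch error/drop counters while the instance is spawning
watch -n 1 'ip -s link show bond0'

Climbing "dropped" or "errors" counters, or a bond stuck in a fallback mode, would point at the same root cause described here.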
> If there isn't any additional information in the debug logs I probably > would start "tearing down" rabbitmq. I didn't have to do that in a > production system yet so be careful. I can think of two routes: > > - Either remove queues, exchanges etc. while rabbit is running, this will > most likely impact client IO depending on your load. Check out the > rabbitmqctl commands. > - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes > and restart rabbitmq so the exchanges, queues etc. rebuild. > > I can imagine that the failed reply "survives" while being replicated > across the rabbit nodes. But I don't really know the rabbit internals too > well, so maybe someone else can chime in here and give a better advice. > > Regards, > Eugen > > Zitat von Swogat Pradhan : > > Hi, > Can someone please help me out on this issue? > > With regards, > Swogat Pradhan > > On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan > wrote: > > Hi > I don't see any major packet loss. > It seems the problem is somewhere in rabbitmq maybe but not due to packet > loss. > > with regards, > Swogat Pradhan > > On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan > wrote: > > Hi, > Yes the MTU is the same as the default '1500'. > Generally I haven't seen any packet loss, but never checked when > launching the instance. > I will check that and come back. > But everytime i launch an instance the instance gets stuck at spawning > state and there the hypervisor becomes down, so not sure if packet loss > causes this. > > With regards, > Swogat pradhan > > On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > > One more thing coming to mind is MTU size. Are they identical between > central and edge site? Do you see packet loss through the tunnel? > > Zitat von Swogat Pradhan : > > > Hi Eugen, > > Request you to please add my email either on 'to' or 'cc' as i am not > > getting email's from you. > > Coming to the issue: > > > > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p > / > > Listing policies for vhost "/" ... > > vhost name pattern apply-to definition priority > > / ha-all ^(?!amq\.).* queues > > > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > > > > I have the edge site compute nodes up, it only goes down when i am > trying > > to launch an instance and the instance comes to a spawning state and > then > > gets stuck. > > > > I have a tunnel setup between the central and the edge sites. > > > > With regards, > > Swogat Pradhan > > > > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > > wrote: > > > >> Hi Eugen, > >> For some reason i am not getting your email to me directly, i am > checking > >> the email digest and there i am able to find your reply. > >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >> Yes, these logs are from the time when the issue occurred. > >> > >> *Note: i am able to create vm's and perform other activities in the > >> central site, only facing this issue in the edge site.* > >> > >> With regards, > >> Swogat Pradhan > >> > >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >> wrote: > >> > >>> Hi Eugen, > >>> Thanks for your response. 
> >>> I have actually a 4 controller setup so here are the details: > >>> > >>> *PCS Status:* > >>> * Container bundle set: rabbitmq-bundle [ > >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-no-ceph-3 > >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-2 > >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-1 > >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-0 > >>> > >>> I have tried restarting the bundle multiple times but the issue is > still > >>> present. > >>> > >>> *Cluster status:* > >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>> Cluster status of node > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... > >>> Basics > >>> > >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>> > >>> Disk Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Running Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Versions > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > RabbitMQ > >>> 3.8.3 on Erlang 22.3.4.1 > >>> > >>> Alarms > >>> > >>> (none) > >>> > >>> Network Partitions > >>> > >>> (none) > >>> > >>> Listeners > >>> > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: 
rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: [::], port: 25672, protocol: clustering, purpose: > inter-node and > >>> CLI tool communication > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API > >>> > >>> Feature flags > >>> > >>> Flag: drop_unroutable_metric, state: enabled > >>> Flag: empty_basic_get_metric, state: enabled > >>> Flag: implicit_default_bindings, state: enabled > >>> Flag: quorum_queue, state: enabled > >>> Flag: virtual_host_metadata, state: enabled > >>> > >>> *Logs:* > >>> *(Attached)* > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>> wrote: > >>> > >>>> Hi, > >>>> Please find the nova conductor as well as nova api log. > >>>> > >>>> nova-conuctor: > >>>> > >>>> 2023-02-26 08:45:01.108 31 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 16152921c1eb45c2b1f562087140168b > >>>> 2023-02-26 08:45:02.144 26 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> 2023-02-26 08:45:02.314 32 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply > >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds > due to a > >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.282 35 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
> Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.303 33 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > b240e3e89d99489284cd731e75f2a5db > >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>> backend dogpile.cache.null. > >>>> 2023-02-26 08:50:01.264 27 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> wrote: > >>>> > >>>>> Hi, > >>>>> I currently have 3 compute nodes on edge site1 where i am trying to > >>>>> launch vm's. > >>>>> When the VM is in spawning state the node goes down (openstack > compute > >>>>> service list), the node comes backup when i restart the nova > compute > >>>>> service but then the launch of the vm fails. > >>>>> > >>>>> nova-compute.log > >>>>> > >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running > >>>>> instance usage > >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 > to > >>>>> 2023-02-26 08:00:00. 0 instances. > >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node > >>>>> dcn01-hci-0.bdxworld.com > >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device > name: > >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names > >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>>> backend dogpile.cache.null. > >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running > >>>>> privsep helper: > >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > 'privsep-helper', > >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new > privsep > >>>>> daemon via rootwrap > >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon starting > >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with uid/gid: 0/0 > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with capabilities (eff/prm/inh): > >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon running as pid 2647 > >>>>> 2023-02-26 08:49:55.956 7 WARNING > os_brick.initiator.connectors.nvmeof > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process > >>>>> execution error > >>>>> in _get_host_uuid: Unexpected error while running command. > >>>>> Command: blkid overlay -s UUID -o value > >>>>> Exit code: 2 > >>>>> Stdout: '' > >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > >>>>> Unexpected error while running command. > >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>>> > >>>>> Is there a way to solve this issue? > >>>>> > >>>>> > >>>>> With regards, > >>>>> > >>>>> Swogat Pradhan > >>>>> > >>>> > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From swogatpradhan22 at gmail.com Wed Mar 15 17:43:42 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 15 Mar 2023 23:13:42 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Update: In the hypervisor list the compute node state is showing down. On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan wrote: > Hi Brendan, > Now i have deployed another site where i have used 2 linux bonds network > template for both 3 compute nodes and 3 ceph nodes. > The bonding options is set to mode=802.3ad (lacp=active). > I used a cirros image to launch instance but the instance timed out so i > waited for the volume to be created. > Once the volume was created i tried launching the instance from the volume > and still the instance is stuck in spawning state. > > Here is the nova-compute log: > > 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon > starting > 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep > process running with uid/gid: 0/0 > 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep > process running with capabilities (eff/prm/inh): > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon > running as pid 185437 > 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error > in _get_host_uuid: Unexpected error while running command. > Command: blkid overlay -s UUID -o value > Exit code: 2 > Stdout: '' > Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > Unexpected error while running command. > 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > 450b749c-a10a-4308-80a9-3b8020fee758] Creating image > > It is stuck in creating image, do i need to run the template mentioned > here ?: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > > The volume is already created and i do not understand why the instance is > stuck in spawning state. > > With regards, > Swogat Pradhan > > > On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard > wrote: > >> Does your environment use different network interfaces for each of the >> networks? Or does it have a bond with everything on it? >> >> One issue I have seen before is that when launching instances, there is a >> lot of network traffic between nodes as the hypervisor needs to download >> the image from Glance. Along with various other services sending normal >> network traffic, it can be enough to cause issues if everything is running >> over a single 1Gbe interface. >> >> I have seen the same situation in fact when using a single active/backup >> bond on 1Gbe nics. It?s worth checking the network traffic while you try to >> spawn the instance to see if you?re dropping packets. In the situation I >> described, there were dropped packets which resulted in a loss of >> communication between nova_compute and RMQ, so the node appeared offline. 
>> You should also confirm that nova_compute is being disconnected in the >> nova_compute logs if you tail them on the Hypervisor while spawning the >> instance. >> >> In my case, changing from active/backup to LACP helped. So, based on that >> experience, from my perspective, is certainly sounds like some kind of >> network issue. >> >> Regards, >> >> Brendan Shephard >> Senior Software Engineer >> Red Hat Australia >> >> >> >> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >> >> Hi, >> >> I tried to help someone with a similar issue some time ago in this thread: >> >> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >> >> But apparently a neutron reinstallation fixed it for that user, not sure >> if that could apply here. But is it possible that your nova and neutron >> versions are different between central and edge site? Have you restarted >> nova and neutron services on the compute nodes after installation? Have you >> debug logs of nova-conductor and maybe nova-compute? Maybe they can help >> narrow down the issue. >> If there isn't any additional information in the debug logs I probably >> would start "tearing down" rabbitmq. I didn't have to do that in a >> production system yet so be careful. I can think of two routes: >> >> - Either remove queues, exchanges etc. while rabbit is running, this will >> most likely impact client IO depending on your load. Check out the >> rabbitmqctl commands. >> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes >> and restart rabbitmq so the exchanges, queues etc. rebuild. >> >> I can imagine that the failed reply "survives" while being replicated >> across the rabbit nodes. But I don't really know the rabbit internals too >> well, so maybe someone else can chime in here and give a better advice. >> >> Regards, >> Eugen >> >> Zitat von Swogat Pradhan : >> >> Hi, >> Can someone please help me out on this issue? >> >> With regards, >> Swogat Pradhan >> >> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >> wrote: >> >> Hi >> I don't see any major packet loss. >> It seems the problem is somewhere in rabbitmq maybe but not due to packet >> loss. >> >> with regards, >> Swogat Pradhan >> >> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >> wrote: >> >> Hi, >> Yes the MTU is the same as the default '1500'. >> Generally I haven't seen any packet loss, but never checked when >> launching the instance. >> I will check that and come back. >> But everytime i launch an instance the instance gets stuck at spawning >> state and there the hypervisor becomes down, so not sure if packet loss >> causes this. >> >> With regards, >> Swogat pradhan >> >> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >> >> One more thing coming to mind is MTU size. Are they identical between >> central and edge site? Do you see packet loss through the tunnel? >> >> Zitat von Swogat Pradhan : >> >> > Hi Eugen, >> > Request you to please add my email either on 'to' or 'cc' as i am not >> > getting email's from you. >> > Coming to the issue: >> > >> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >> / >> > Listing policies for vhost "/" ... >> > vhost name pattern apply-to definition priority >> > / ha-all ^(?!amq\.).* queues >> > >> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >> > >> > I have the edge site compute nodes up, it only goes down when i am >> trying >> > to launch an instance and the instance comes to a spawning state and >> then >> > gets stuck. 
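For what it is worth, the MessageUndeliverable errors shown later in this thread concern the transient reply_* queues created by the calling service for RPC replies. A quick, illustrative way to check whether those queues exist and are being consumed while a spawn is in flight (run wherever rabbitmqctl is available, for example inside the rabbitmq bundle container):

  rabbitmqctl list_queues name messages consumers | grep reply_
  rabbitmqctl list_connections user peer_host state | grep nova

A reply_* queue that has no consumer at the moment the conductor tries to answer matches the "failed to send after 60 seconds due to a missing queue" errors.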
>> > >> > I have a tunnel setup between the central and the edge sites. >> > >> > With regards, >> > Swogat Pradhan >> > >> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> > wrote: >> > >> >> Hi Eugen, >> >> For some reason i am not getting your email to me directly, i am >> checking >> >> the email digest and there i am able to find your reply. >> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> >> Yes, these logs are from the time when the issue occurred. >> >> >> >> *Note: i am able to create vm's and perform other activities in the >> >> central site, only facing this issue in the edge site.* >> >> >> >> With regards, >> >> Swogat Pradhan >> >> >> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >> wrote: >> >> >> >>> Hi Eugen, >> >>> Thanks for your response. >> >>> I have actually a 4 controller setup so here are the details: >> >>> >> >>> *PCS Status:* >> >>> * Container bundle set: rabbitmq-bundle [ >> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-no-ceph-3 >> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-2 >> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-1 >> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-0 >> >>> >> >>> I have tried restarting the bundle multiple times but the issue is >> still >> >>> present. >> >>> >> >>> *Cluster status:* >> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> >>> Cluster status of node >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>> >>> Basics >> >>> >> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >>> >> >>> Disk Nodes >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>> >> >>> Running Nodes >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>> >> >>> Versions >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >> RabbitMQ >> >>> 3.8.3 on Erlang 22.3.4.1 >> >>> >> >>> Alarms >> >>> >> >>> (none) >> >>> >> >>> Network Partitions >> >>> >> >>> (none) >> >>> >> >>> Listeners >> >>> >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> , >> >>> interface: [::], port: 25672, protocol: clustering, purpose: >> inter-node and >> >>> CLI tool communication >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> , >> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >> 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> , >> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >>> >> >>> Feature flags >> >>> >> >>> Flag: drop_unroutable_metric, state: enabled >> >>> Flag: empty_basic_get_metric, state: enabled >> >>> Flag: 
implicit_default_bindings, state: enabled >> >>> Flag: quorum_queue, state: enabled >> >>> Flag: virtual_host_metadata, state: enabled >> >>> >> >>> *Logs:* >> >>> *(Attached)* >> >>> >> >>> With regards, >> >>> Swogat Pradhan >> >>> >> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>> wrote: >> >>> >> >>>> Hi, >> >>>> Please find the nova conductor as well as nova api log. >> >>>> >> >>>> nova-conuctor: >> >>>> >> >>>> 2023-02-26 08:45:01.108 31 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 16152921c1eb45c2b1f562087140168b >> >>>> 2023-02-26 08:45:02.144 26 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >> >>>> 83dbe5f567a940b698acfe986f6194fa >> >>>> 2023-02-26 08:45:02.314 32 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >> >>>> f3bfd7f65bd542b18d84cea3033abb43: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >> due to a >> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:48:01.282 35 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >> due to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:01.303 33 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 897911a234a445d8a0d8af02ece40f6f: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >> due to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> b240e3e89d99489284cd731e75f2a5db >> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >> with >> >>>> backend dogpile.cache.null. 
>> >>>> 2023-02-26 08:50:01.264 27 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 8f723ceb10c3472db9a9f324861df2bb: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >> due to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >> >>>> With regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> wrote: >> >>>> >> >>>>> Hi, >> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >> >>>>> launch vm's. >> >>>>> When the VM is in spawning state the node goes down (openstack >> compute >> >>>>> service list), the node comes backup when i restart the nova >> compute >> >>>>> service but then the launch of the vm fails. >> >>>>> >> >>>>> nova-compute.log >> >>>>> >> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >> >>>>> instance usage >> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >> to >> >>>>> 2023-02-26 08:00:00. 0 instances. >> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >> >>>>> dcn01-hci-0.bdxworld.com >> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >> name: >> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >> with >> >>>>> backend dogpile.cache.null. 
>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >> >>>>> privsep helper: >> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >> 'privsep-helper', >> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >> privsep >> >>>>> daemon via rootwrap >> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> daemon starting >> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> process running with uid/gid: 0/0 >> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> process running with capabilities (eff/prm/inh): >> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> daemon running as pid 2647 >> >>>>> 2023-02-26 08:49:55.956 7 WARNING >> os_brick.initiator.connectors.nvmeof >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >> >>>>> execution error >> >>>>> in _get_host_uuid: Unexpected error while running command. >> >>>>> Command: blkid overlay -s UUID -o value >> >>>>> Exit code: 2 >> >>>>> Stdout: '' >> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >> >>>>> Unexpected error while running command. >> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >> >>>>> >> >>>>> Is there a way to solve this issue? >> >>>>> >> >>>>> >> >>>>> With regards, >> >>>>> >> >>>>> Swogat Pradhan >> >>>>> >> >>>> >> >> >> >> >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From adivya1.singh at gmail.com Wed Mar 15 17:56:56 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Wed, 15 Mar 2023 23:26:56 +0530 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack In-Reply-To: References: Message-ID: Same result On Wed, Mar 15, 2023 at 11:07?PM Radomir Dopieralski wrote: > try /dashboard > > On Wed, Mar 15, 2023 at 5:56?PM Adivya Singh > wrote: > >> Hi Team, >> >> I am unable to open Open OpenStack horizon page, after installation >> When i am opening the link , it says >> >> Haproxy service seems up and running, I have tried to Flush IP tables >> also, Seeing this might be causing the issue >> >> Port 443 is also listening. >> >> Any thoughts on this >> >> [image: image.png] >> > > > -- > Radomir Dopieralski > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 9294 bytes Desc: not available URL: From sbauza at redhat.com Wed Mar 15 18:28:58 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 15 Mar 2023 19:28:58 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Le mer. 15 mars 2023 ? 13:45, Bence Romsics a ?crit : > Hi All! > > We have users who use 'rebuild' on volume booted servers before nova > microversion 2.93, relying on the behavior that it keeps the volume as > is. And they would like to keep doing this even after the openstack > distro moves to a(n at least) zed base (sometime in the future). > > As a naive user, it seems to me both behaviors make sense. I can > easily imagine use cases for rebuild with and without reimaging. > However since the implementation of > > https://specs.openstack.org/openstack/nova-specs/specs/zed/implemented/volume-backed-server-rebuild.html > rebuild without reimaging is only possible using an old microversion > (<2.93). With that change merged, rebuild without reimaging seems to > be a somewhat less than fully supported feature. A few examples of > what I mean by that: > > That's not really true : the new microversion just means we change the default behaviour, but you can still opt into the previous behaviour by requesting an older microversion. That being said, I do understand your concerns, further below. > First, there's this warning: > > https://opendev.org/openstack/python-openstackclient/src/commit/5eb89e4ca1cebad9245c27d58a0dafd7f363ece0/openstackclient/compute/v2/server.py#L3452-L3453 > > In which it is unclear to me what exactly will become an error in a > future release. Rebuild with a different image? Or any rebuild with > microversion <2.93? > > The latter (in theory) : if you opt into a microversion older or equal than 2.93, you shouldn't expect your volume to *not* be rebuilt. Then old nova microversions may get dropped. Though what I heard from > nova folks, this is unlikely to happen. > > Correct, I never want to say never, but we don't have any plans in any subsequent futures to bump the minimum versions, for many many reasons, not only due to the tech debt but also and mainly because of the interoperatibility we must guarantee. > Then there are a few hypothetical situations like: > a) Rebuild gets a new api feature (in a new microversion) which can > never be combined with the do-not-reimage behavior. > b) Rebuild may have a bug, whose fix requires a microversion bump. > This again can never be combined with the old behavior. > > What do you think, are these concerns purely theoretical or real? > If we would like to keep having rebuild without reimaging, can we rely > on the old microversion indefinitely? > Alternatively shall we propose and implement a nova spec to explicitly > expose the choice in the rebuild api (just to express the idea: osc > server rebuild --reimage|--no-reimage)? > I'm not opposed to challenge the usecases in a spec, for sure. > > If the topic is worth further discussion beyond the ML, I can also > bring it to the nova ptg. > That's already the case. Add yourself to the courtesy ping list of that topic. https://etherpad.opendev.org/p/nova-bobcat-ptg#L152 -Sylvain > > Thanks in advance, > Bence Romsics (rubasov) > > ps: I'll be afk for a few days, but I'll follow up next Tuesday. > > -------------- next part -------------- An HTML attachment was scrubbed... 
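To make the point about opting into the older behaviour concrete, a hedged sketch with python-openstackclient (placeholders in angle brackets; the exact semantics for volume-backed servers depend on the microversion and the image supplied, so treat this as illustrative rather than authoritative):

  # pre-2.93 behaviour: the root volume of a volume-backed server is left untouched
  openstack --os-compute-api-version 2.92 server rebuild --image <image-uuid> <server>

  # 2.93 and later: the rebuild reimages the root volume
  openstack --os-compute-api-version 2.93 server rebuild --image <image-uuid> <server>

Pinning the microversion per request is what keeps the old behaviour available without any server-side configuration change.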
URL: From alsotoes at gmail.com Wed Mar 15 18:33:47 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Wed, 15 Mar 2023 12:33:47 -0600 Subject: [manila] create snapshot from share not permitted In-Reply-To: References: Message-ID: So you can test and get fresh log outputs, please do the following manila create manila list manila show Then go to https://pastebin.com/ and paste the CLI output and errors you see on the manila log files Friendly reminder: please click on 'reply all' so this thread can help more people Cheers! On Tue, Mar 14, 2023 at 1:53?AM garcetto wrote: > seems enabled...dont know why is not working... > > $ manila pool-list --detail > > +--------------------------------------+----------------------------------------------+ > | Property | Value > | > > +--------------------------------------+----------------------------------------------+ > | name | ostack-test at generic#GENERIC > | > | share_backend_name | GENERIC > | > | driver_handles_share_servers | True > | > | vendor_name | Open Source > | > | driver_version | 1.0 > | > | storage_protocol | NFS_CIFS > | > | total_capacity_gb | unknown > | > | free_capacity_gb | unknown > | > | reserved_percentage | 0 > | > | reserved_snapshot_percentage | 0 > | > | reserved_share_extend_percentage | 0 > | > | qos | False > | > | pools | None > | > | snapshot_support | True > | > | create_share_from_snapshot_support | True > | > | revert_to_snapshot_support | False > | > | mount_snapshot_support | False > | > | replication_domain | None > | > | filter_function | None > | > | goodness_function | None > | > | security_service_update_support | False > | > | network_allocation_update_support | False > | > | share_server_multiple_subnet_support | False > | > | max_shares_per_share_server | -1 > | > | max_share_server_size | -1 > | > | share_group_stats | {'consistent_snapshot_support': > None} | > | ipv4_support | True > | > | ipv6_support | False > | > | server_pools_mapping | > {'2242964f-be38-4f11-8c90-cdcfcd20c20a': []} | > | timestamp | 2023-03-13T09:13:06.331713 > | > > +--------------------------------------+----------------------------------------------+ > > > On Mon, Mar 13, 2023 at 11:45?PM Alvaro Soto wrote: > >> If you are inside this features support matrix >> >> >> https://docs.openstack.org/manila/latest/admin/share_back_ends_feature_support_mapping.html#share-back-ends-feature-support-mapping >> >> Examine your configuration as well: >> >> >> - >> >> snapshot_support indicates whether snapshots are supported for shares >> created on the pool/backend. When administrators do not set this capability >> as an extra-spec in a share type, the scheduler can place new shares of >> that type in pools without regard for whether snapshots are supported, and >> those shares will not support snapshots. >> >> >> https://docs.openstack.org/manila/latest/admin/capabilities_and_extra_specs.html >> >> Cheers! >> >> On Mon, Mar 13, 2023 at 3:35?AM garcetto wrote: >> >>> good morning, >>> i am using manila and generic driver with dhss true, but cannot create >>> snapshot from shares, any help? where can i look at? >>> (cinder backend is a linux nfs server) >>> >>> thank you >>> >>> $ manila snapshot-create share-01 --name Snapshot1 >>> ERROR: Snapshots cannot be created for share >>> '2c8b1b3d-ef82-4372-94df-678539f0d843' since it does not have that >>> capability. (HTTP 422) (Request-ID: >>> req-cab23a46-37dc-4f2b-b26c-d6b21b7453ba) >>> >>> >> >> -- >> >> Alvaro Soto >> >> *Note: My work hours may not be your work hours. 
Please do not feel the >> need to respond during a time that is not convenient for you.* >> ---------------------------------------------------------- >> Great people talk about ideas, >> ordinary people talk about things, >> small people talk... about other people. >> > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Wed Mar 15 18:54:32 2023 From: dms at danplanet.com (Dan Smith) Date: Wed, 15 Mar 2023 11:54:32 -0700 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: (Sylvain Bauza's message of "Wed, 15 Mar 2023 19:28:58 +0100") References: Message-ID: > We have users who use 'rebuild' on volume booted servers before nova > microversion 2.93, relying on the behavior that it keeps the volume as > is. And they would like to keep doing this even after the openstack > distro moves to a(n at least) zed base (sometime in the future). Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it? I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that. Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user. > As a naive user, it seems to me both behaviors make sense. I can > easily imagine use cases for rebuild with and without reimaging. I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other. > Then there are a few hypothetical situations like: > a) Rebuild gets a new api feature (in a new microversion) which can > never be combined with the do-not-reimage behavior. > b) Rebuild may have a bug, whose fix requires a microversion bump. > This again can never be combined with the old behavior. > > What do you think, are these concerns purely theoretical or real? > If we would like to keep having rebuild without reimaging, can we rely > on the old microversion indefinitely? > Alternatively shall we propose and implement a nova spec to explicitly > expose the choice in the rebuild api (just to express the idea: osc > server rebuild --reimage|--no-reimage)? > > I'm not opposed to challenge the usecases in a spec, for sure. I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly: "This operation recreates the root disk of the server." That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature. 
I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me. --Dan [0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action From alsotoes at gmail.com Wed Mar 15 19:17:53 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Wed, 15 Mar 2023 13:17:53 -0600 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack In-Reply-To: References: Message-ID: Try to curl from the controller node where the dashboard lives; if it works, the dashboard is up and running, but maybe the access is behind a firewall on an upper level and you will need to tunnel your way. Cheers! On Wed, Mar 15, 2023 at 12:01?PM Adivya Singh wrote: > Same result > > On Wed, Mar 15, 2023 at 11:07?PM Radomir Dopieralski > wrote: > >> try /dashboard >> >> On Wed, Mar 15, 2023 at 5:56?PM Adivya Singh >> wrote: >> >>> Hi Team, >>> >>> I am unable to open Open OpenStack horizon page, after installation >>> When i am opening the link , it says >>> >>> Haproxy service seems up and running, I have tried to Flush IP tables >>> also, Seeing this might be causing the issue >>> >>> Port 443 is also listening. >>> >>> Any thoughts on this >>> >>> [image: image.png] >>> >> >> >> -- >> Radomir Dopieralski >> > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 9294 bytes Desc: not available URL: From gouthampravi at gmail.com Wed Mar 15 20:27:16 2023 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Wed, 15 Mar 2023 13:27:16 -0700 Subject: [manila] create snapshot from share not permitted In-Reply-To: References: Message-ID: On Wed, Mar 15, 2023 at 11:34?AM Alvaro Soto wrote: > > So you can test and get fresh log outputs, please do the following > > manila create > manila list > manila show > > Then go to https://pastebin.com/ and paste the CLI output and errors you see on the manila log files > > Friendly reminder: please click on 'reply all' so this thread can help more people ++ thanks Alvaro! I'd like to point back to the doc that Alvaro linked in his original response: https://docs.openstack.org/manila/latest/admin/capabilities_and_extra_specs.html With "manila pool-list --detail", you are able to see the backend's capabilities. You would use this information to create share types. The share type you're using needs to have the extra-spec "snapshot_support=True". Without it, shares created of that type will not support snapshots. > > Cheers! > > On Tue, Mar 14, 2023 at 1:53?AM garcetto wrote: >> >> seems enabled...dont know why is not working... 
>> >> $ manila pool-list --detail >> +--------------------------------------+----------------------------------------------+ >> | Property | Value | >> +--------------------------------------+----------------------------------------------+ >> | name | ostack-test at generic#GENERIC | >> | share_backend_name | GENERIC | >> | driver_handles_share_servers | True | >> | vendor_name | Open Source | >> | driver_version | 1.0 | >> | storage_protocol | NFS_CIFS | >> | total_capacity_gb | unknown | >> | free_capacity_gb | unknown | >> | reserved_percentage | 0 | >> | reserved_snapshot_percentage | 0 | >> | reserved_share_extend_percentage | 0 | >> | qos | False | >> | pools | None | >> | snapshot_support | True | >> | create_share_from_snapshot_support | True | >> | revert_to_snapshot_support | False | >> | mount_snapshot_support | False | >> | replication_domain | None | >> | filter_function | None | >> | goodness_function | None | >> | security_service_update_support | False | >> | network_allocation_update_support | False | >> | share_server_multiple_subnet_support | False | >> | max_shares_per_share_server | -1 | >> | max_share_server_size | -1 | >> | share_group_stats | {'consistent_snapshot_support': None} | >> | ipv4_support | True | >> | ipv6_support | False | >> | server_pools_mapping | {'2242964f-be38-4f11-8c90-cdcfcd20c20a': []} | >> | timestamp | 2023-03-13T09:13:06.331713 | >> +--------------------------------------+----------------------------------------------+ >> >> >> On Mon, Mar 13, 2023 at 11:45?PM Alvaro Soto wrote: >>> >>> If you are inside this features support matrix >>> >>> https://docs.openstack.org/manila/latest/admin/share_back_ends_feature_support_mapping.html#share-back-ends-feature-support-mapping >>> >>> Examine your configuration as well: >>> >>> snapshot_support indicates whether snapshots are supported for shares created on the pool/backend. When administrators do not set this capability as an extra-spec in a share type, the scheduler can place new shares of that type in pools without regard for whether snapshots are supported, and those shares will not support snapshots. >>> >>> https://docs.openstack.org/manila/latest/admin/capabilities_and_extra_specs.html >>> >>> Cheers! >>> >>> On Mon, Mar 13, 2023 at 3:35?AM garcetto wrote: >>>> >>>> good morning, >>>> i am using manila and generic driver with dhss true, but cannot create snapshot from shares, any help? where can i look at? >>>> (cinder backend is a linux nfs server) >>>> >>>> thank you >>>> >>>> $ manila snapshot-create share-01 --name Snapshot1 >>>> ERROR: Snapshots cannot be created for share '2c8b1b3d-ef82-4372-94df-678539f0d843' since it does not have that capability. (HTTP 422) (Request-ID: req-cab23a46-37dc-4f2b-b26c-d6b21b7453ba) >>>> >>> >>> >>> -- >>> >>> Alvaro Soto >>> >>> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. >>> ---------------------------------------------------------- >>> Great people talk about ideas, >>> ordinary people talk about things, >>> small people talk... about other people. > > > > -- > > Alvaro Soto > > Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > ---------------------------------------------------------- > Great people talk about ideas, > ordinary people talk about things, > small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... 
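As a sketch of the share-type fix described above (the type name is an example; the DHSS value must match the backend, which reports driver_handles_share_servers=True here):

  manila type-create generic_snapshots True --snapshot_support True
  # or add the extra-spec to an existing type
  manila type-key <share-type> set snapshot_support=True
  manila create NFS 1 --name share-02 --share-type generic_snapshots
  manila snapshot-create share-02 --name Snapshot1

Note that shares already created from a type without snapshot_support keep reporting the original error; the capability is stamped on the share at creation time, so a new share of the updated type is needed.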
URL: From rdhasman at redhat.com Wed Mar 15 23:25:47 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Thu, 16 Mar 2023 04:55:47 +0530 Subject: [cinder] Canceling upstream meeting 22nd March Message-ID: Hello Argonauts, As discussed in this week's meeting[1], we will be canceling the Cinder upstream meeting next week i.e. 22nd March, 2023. Since we have RC2 this week, 2023.1 release next week and PTG after that, we don't expect many topics next week, but if you still have any, please add them to the PTG planning etherpad[2]. See you all at the PTG! [1] https://meetings.opendev.org/irclogs/%23openstack-meeting-alt/%23openstack-meeting-alt.2023-03-15.log.html#t2023-03-15T14:12:41 [2] https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Wed Mar 15 23:28:19 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Thu, 16 Mar 2023 04:58:19 +0530 Subject: [cinder] festival of feature reviews 17th March 2023 Message-ID: Hello Argonauts, We will be having our monthly festival of reviews tomorrow i.e. 17th March (Friday) from 1400-1600 UTC. We are close to the PTG but we still have backlog so good to have a head start in reviews for the next (2023.2) cycle. Following are some additional details: Date: 17th March, 2023 Time: 1400-1600 UTC Meeting link: https://bluejeans.com/556681290 etherpad: https://etherpad.opendev.org/p/cinder-festival-of-reviews See you there! Thanks Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Wed Mar 15 19:20:12 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 16 Mar 2023 00:50:12 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Update: After restarting the nova services on the controller and running the deploy script on the edge site, I was able to launch the VM from volume. Right now the instance creation is failing as the block device creation is stuck in creating state, it is taking more than 10 mins for the volume to be created, whereas the image has already been imported to the edge glance. I will try and create a new fresh image and test again then update. With regards, Swogat Pradhan On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan wrote: > Update: > In the hypervisor list the compute node state is showing down. > > > On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan > wrote: > >> Hi Brendan, >> Now i have deployed another site where i have used 2 linux bonds network >> template for both 3 compute nodes and 3 ceph nodes. >> The bonding options is set to mode=802.3ad (lacp=active). >> I used a cirros image to launch instance but the instance timed out so i >> waited for the volume to be created. >> Once the volume was created i tried launching the instance from the >> volume and still the instance is stuck in spawning state. 
>> >> Here is the nova-compute log: >> >> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep >> daemon starting >> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep >> process running with uid/gid: 0/0 >> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >> process running with capabilities (eff/prm/inh): >> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >> daemon running as pid 185437 >> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >> in _get_host_uuid: Unexpected error while running command. >> Command: blkid overlay -s UUID -o value >> Exit code: 2 >> Stdout: '' >> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >> Unexpected error while running command. >> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >> >> It is stuck in creating image, do i need to run the template mentioned >> here ?: >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> >> The volume is already created and i do not understand why the instance is >> stuck in spawning state. >> >> With regards, >> Swogat Pradhan >> >> >> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard >> wrote: >> >>> Does your environment use different network interfaces for each of the >>> networks? Or does it have a bond with everything on it? >>> >>> One issue I have seen before is that when launching instances, there is >>> a lot of network traffic between nodes as the hypervisor needs to download >>> the image from Glance. Along with various other services sending normal >>> network traffic, it can be enough to cause issues if everything is running >>> over a single 1Gbe interface. >>> >>> I have seen the same situation in fact when using a single active/backup >>> bond on 1Gbe nics. It?s worth checking the network traffic while you try to >>> spawn the instance to see if you?re dropping packets. In the situation I >>> described, there were dropped packets which resulted in a loss of >>> communication between nova_compute and RMQ, so the node appeared offline. >>> You should also confirm that nova_compute is being disconnected in the >>> nova_compute logs if you tail them on the Hypervisor while spawning the >>> instance. >>> >>> In my case, changing from active/backup to LACP helped. So, based on >>> that experience, from my perspective, is certainly sounds like some kind of >>> network issue. >>> >>> Regards, >>> >>> Brendan Shephard >>> Senior Software Engineer >>> Red Hat Australia >>> >>> >>> >>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>> >>> Hi, >>> >>> I tried to help someone with a similar issue some time ago in this >>> thread: >>> >>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>> >>> But apparently a neutron reinstallation fixed it for that user, not sure >>> if that could apply here. But is it possible that your nova and neutron >>> versions are different between central and edge site? Have you restarted >>> nova and neutron services on the compute nodes after installation? 
Have you >>> debug logs of nova-conductor and maybe nova-compute? Maybe they can help >>> narrow down the issue. >>> If there isn't any additional information in the debug logs I probably >>> would start "tearing down" rabbitmq. I didn't have to do that in a >>> production system yet so be careful. I can think of two routes: >>> >>> - Either remove queues, exchanges etc. while rabbit is running, this >>> will most likely impact client IO depending on your load. Check out the >>> rabbitmqctl commands. >>> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes >>> and restart rabbitmq so the exchanges, queues etc. rebuild. >>> >>> I can imagine that the failed reply "survives" while being replicated >>> across the rabbit nodes. But I don't really know the rabbit internals too >>> well, so maybe someone else can chime in here and give a better advice. >>> >>> Regards, >>> Eugen >>> >>> Zitat von Swogat Pradhan : >>> >>> Hi, >>> Can someone please help me out on this issue? >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >> > >>> wrote: >>> >>> Hi >>> I don't see any major packet loss. >>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>> loss. >>> >>> with regards, >>> Swogat Pradhan >>> >>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >> > >>> wrote: >>> >>> Hi, >>> Yes the MTU is the same as the default '1500'. >>> Generally I haven't seen any packet loss, but never checked when >>> launching the instance. >>> I will check that and come back. >>> But everytime i launch an instance the instance gets stuck at spawning >>> state and there the hypervisor becomes down, so not sure if packet loss >>> causes this. >>> >>> With regards, >>> Swogat pradhan >>> >>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>> >>> One more thing coming to mind is MTU size. Are they identical between >>> central and edge site? Do you see packet loss through the tunnel? >>> >>> Zitat von Swogat Pradhan : >>> >>> > Hi Eugen, >>> > Request you to please add my email either on 'to' or 'cc' as i am not >>> > getting email's from you. >>> > Coming to the issue: >>> > >>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>> / >>> > Listing policies for vhost "/" ... >>> > vhost name pattern apply-to definition priority >>> > / ha-all ^(?!amq\.).* queues >>> > >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> > >>> > I have the edge site compute nodes up, it only goes down when i am >>> trying >>> > to launch an instance and the instance comes to a spawning state and >>> then >>> > gets stuck. >>> > >>> > I have a tunnel setup between the central and the edge sites. >>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> > wrote: >>> > >>> >> Hi Eugen, >>> >> For some reason i am not getting your email to me directly, i am >>> checking >>> >> the email digest and there i am able to find your reply. >>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >> Yes, these logs are from the time when the issue occurred. >>> >> >>> >> *Note: i am able to create vm's and perform other activities in the >>> >> central site, only facing this issue in the edge site.* >>> >> >>> >> With regards, >>> >> Swogat Pradhan >>> >> >>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >> wrote: >>> >> >>> >>> Hi Eugen, >>> >>> Thanks for your response. 
>>> >>> I have actually a 4 controller setup so here are the details: >>> >>> >>> >>> *PCS Status:* >>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-no-ceph-3 >>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-2 >>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-1 >>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-0 >>> >>> >>> >>> I have tried restarting the bundle multiple times but the issue is >>> still >>> >>> present. >>> >>> >>> >>> *Cluster status:* >>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>> >>> Cluster status of node >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>> >>> Basics >>> >>> >>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>> >>> >>> Disk Nodes >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> >>> >>> Running Nodes >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> >>> >>> Versions >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> RabbitMQ >>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>> >>> >>> Alarms >>> >>> >>> >>> (none) >>> >>> >>> >>> Network Partitions >>> >>> >>> >>> (none) >>> >>> >>> >>> Listeners >>> >>> >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> 
communication >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>> inter-node and >>> >>> CLI tool communication >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> >>> >>> Feature flags >>> >>> >>> >>> Flag: drop_unroutable_metric, state: enabled >>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>> Flag: implicit_default_bindings, state: enabled >>> >>> Flag: quorum_queue, state: enabled >>> >>> Flag: virtual_host_metadata, state: enabled >>> >>> >>> >>> *Logs:* >>> >>> *(Attached)* >>> >>> >>> >>> With regards, >>> >>> Swogat Pradhan >>> >>> >>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>> wrote: >>> >>> >>> >>>> Hi, >>> >>>> Please find the nova conductor as well as nova api log. >>> >>>> >>> >>>> nova-conuctor: >>> >>>> >>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> b240e3e89d99489284cd731e75f2a5db >>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>> with >>> >>>> backend dogpile.cache.null. >>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>> >>>>> Hi, >>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>> >>>>> launch vm's. >>> >>>>> When the VM is in spawning state the node goes down (openstack >>> compute >>> >>>>> service list), the node comes backup when i restart the nova >>> compute >>> >>>>> service but then the launch of the vm fails. >>> >>>>> >>> >>>>> nova-compute.log >>> >>>>> >>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>> >>>>> instance usage >>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>> to >>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>> name: >>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names >>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>> with >>> >>>>> backend dogpile.cache.null. >>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>> >>>>> privsep helper: >>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> 'privsep-helper', >>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>> privsep >>> >>>>> daemon via rootwrap >>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> daemon starting >>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> process running with uid/gid: 0/0 >>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> daemon running as pid 2647 >>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> os_brick.initiator.connectors.nvmeof >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>> >>>>> execution error >>> >>>>> in _get_host_uuid: Unexpected error while running command. >>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>>> Exit code: 2 >>> >>>>> Stdout: '' >>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>> >>>>> Unexpected error while running command. >>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>>> >>> >>>>> Is there a way to solve this issue? >>> >>>>> >>> >>>>> >>> >>>>> With regards, >>> >>>>> >>> >>>>> Swogat Pradhan >>> >>>>> >>> >>>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 16 01:03:25 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 16 Mar 2023 02:03:25 +0100 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: Eventually I don't fully understand reasons behind need of such service. 
As fighting with high load by migrating instances between computes is fighting with consequences rather then with root cause, not saying that it brings more negative effects then positive for experience of the end-users, as you're just moving problem to another place affecting more workloads with degraded performance. If you struggling from high load on a daily basis - then you have too high cpu_allocation_ratio set for computes. As high load issues always come from attempts to oversell too agressively. If you have workloads in the cloud that always utilize all CPUs available - then you should consider having flavors and aggregates with cpu-pinning, meaning providing physical CPUs for such workloads. Also don't forget, that it's worth setting more realistic numbers for reserved resources on computes, because default 2gb of RAM is usually too small. ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i : > Hello. > I cannot use because missing cpu_util metric. I try to match it work but > not yet. It need some code to make it work. It seem none care about balance > reources on cloud. > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > >> On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: >> > Watcher is not good because It need cpu metric >> > such as cpu load in Ceilometer which is removed so we cannot use it. >> >> Hi! >> >> What do you mean by "Ceilometer [is] removed"? It certainly isn't dead, >> and it works well... If by that, you mean "ceilometer-api" is removed, >> then yes, but then you can use gnocchi. >> >> Cheers, >> >> Thomas Goirand (zigo) >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 16 08:46:51 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 16 Mar 2023 08:46:51 +0000 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote: > Eventually I don't fully understand reasons behind need of such service. > > As fighting with high load by migrating instances between computes is > fighting with consequences rather then with root cause, not saying that it > brings more negative effects then positive for experience of the end-users, > as you're just moving problem to another place affecting more workloads > with degraded performance. > > If you struggling from high load on a daily basis - then you have too high > cpu_allocation_ratio set for computes. As high load issues always come from > attempts to oversell too agressively. > > If you have workloads in the cloud that always utilize all CPUs available - > then you should consider having flavors and aggregates with cpu-pinning, > meaning providing physical CPUs for such workloads. > > Also don't forget, that it's worth setting more realistic numbers for > reserved resources on computes, because default 2gb of RAM is usually too > small. i tend to agree although there are some thing you can do in the nova schduler ot help e.g. prefering spreading over packing. for cpu load in particalar you can also enable the metric weigher i have not read this thread in detail altough skiming i see refrences to ceilometer. nova's metrics weigher has no depency on it. 
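As a concrete illustration of the tuning suggested earlier in the thread, a minimal sketch could look like the following (option names are the standard nova.conf and flavor extra-spec ones, the values are purely illustrative and not recommendations):

# nova.conf on the compute nodes
[DEFAULT]
cpu_allocation_ratio = 4.0
reserved_host_memory_mb = 8192
reserved_host_cpus = 2

# a pinned flavor for workloads that constantly use all of their CPUs
$ openstack flavor create --vcpus 4 --ram 8192 --disk 40 pinned.medium
$ openstack flavor set pinned.medium --property hw:cpu_policy=dedicated

In practice such a flavor is normally paired with a dedicated host aggregate so that pinned and over-committed instances do not share the same hosts.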
the metrics weigher https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py is configured by adding weight_setting in the schduler config https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting [metrics] weight_setting = name1=1.0, name2=-1.0 and enabeling the monitors in the nova-comptue config https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors [DEFAULT] compute_monitors = cpu.virt_driver ^ that is the only one we support the datafiles we report are set here https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101 the more intersting values are "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent" we have a fairly large internal cloud that is used for dev and ci and as of about 12 to 18 months ago they have been using this to help balance the schduling fo instance as we have a mix of hyperviros skus and this help blance systme load. [metrics] weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, cpu.idle.percent=1.0 you want iowait and cpu.percent to be negitive since you want to avoid host with high iowait or high cpu utilsation. and you woudl want to prefer idle host if your intent is to blance load. iowait is actully included in cpu.percent and infact cpu.percent is basicaly cpu load - idel so [metrics] weight_setting = cpu.percent=-1.0 would have a simialreffect but you might want the extra granularity to weight iowait vs idle differntly so if you find the normal cpu/ram/disk weigher are not sufficent to blance based onload check out the metrics weigher and see it that helps. just be awere that collecting the cpu metrics and providing them to the schduelr will increase rabbitmq load a little since we perodicly have ot update those values for each compute. if you have a lot of compute that might be problematic. its one of the reasons we decided not to add more metrics like this. > > > > ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i : > > > Hello. > > I cannot use because missing cpu_util metric. I try to match it work but > > not yet. It need some code to make it work. It seem none care about balance > > reources on cloud. > > > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > > > > > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > > > Watcher is not good because It need cpu metric > > > > such as cpu load in Ceilometer which is removed so we cannot use it. > > > > > > Hi! > > > > > > What do you mean by "Ceilometer [is] removed"? It certainly isn't dead, > > > and it works well... If by that, you mean "ceilometer-api" is removed, > > > then yes, but then you can use gnocchi. > > > > > > Cheers, > > > > > > Thomas Goirand (zigo) > > > > > > From noonedeadpunk at gmail.com Thu Mar 16 09:35:52 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 16 Mar 2023 10:35:52 +0100 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: Oh, thanks for that detailed explanation! I was looking at metrics weighter for years and looked through code couple of times but never got it properly configured. That is very helpful, thanks a lot! ??, 16 ???. 2023 ?., 09:46 Sean Mooney : > On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote: > > Eventually I don't fully understand reasons behind need of such service. 
> > > > As fighting with high load by migrating instances between computes is > > fighting with consequences rather then with root cause, not saying that > it > > brings more negative effects then positive for experience of the > end-users, > > as you're just moving problem to another place affecting more workloads > > with degraded performance. > > > > If you struggling from high load on a daily basis - then you have too > high > > cpu_allocation_ratio set for computes. As high load issues always come > from > > attempts to oversell too agressively. > > > > If you have workloads in the cloud that always utilize all CPUs > available - > > then you should consider having flavors and aggregates with cpu-pinning, > > meaning providing physical CPUs for such workloads. > > > > Also don't forget, that it's worth setting more realistic numbers for > > reserved resources on computes, because default 2gb of RAM is usually too > > small. > i tend to agree although there are some thing you can do in the nova > schduler ot help > e.g. prefering spreading over packing. > > for cpu load in particalar you can also enable the metric weigher > > i have not read this thread in detail altough skiming i see refrences to > ceilometer. > nova's metrics weigher has no depency on it. > the metrics weigher > > https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py > is configured by adding weight_setting in the schduler config > > https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting > > [metrics] > weight_setting = name1=1.0, name2=-1.0 > and enabeling the monitors in the nova-comptue config > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors > [DEFAULT] > compute_monitors = cpu.virt_driver > > ^ that is the only one we support > > the datafiles we report are set here > > https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101 > > the more intersting values are > "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent" > > we have a fairly large internal cloud that is used for dev and ci and as > of about 12 to 18 months ago they > have been using this to help balance the schduling fo instance as we have > a mix of hyperviros skus > and this help blance systme load. > > [metrics] > weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, > cpu.idle.percent=1.0 > > you want iowait and cpu.percent to be negitive since you want to avoid > host with high iowait or high cpu utilsation. > and you woudl want to prefer idle host if your intent is to blance load. > > iowait is actully included in cpu.percent and infact cpu.percent is > basicaly cpu load - idel so > [metrics] > weight_setting = cpu.percent=-1.0 > would have a simialreffect but you might want the extra granularity to > weight iowait vs idle differntly > > so if you find the normal cpu/ram/disk weigher are not sufficent to blance > based onload check out the > metrics weigher and see it that helps. just be awere that collecting the > cpu metrics and providing them > to the schduelr will increase rabbitmq load a little since we perodicly > have ot update those values for > each compute. if you have a lot of compute that might be problematic. its > one of the reasons we > decided not to add more metrics like this. > > > > > > > > > > > ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i : > > > > > Hello. > > > I cannot use because missing cpu_util metric. I try to match it work > but > > > not yet. 
It need some code to make it work. It seem none care about > balance > > > reources on cloud. > > > > > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > > > > > > > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > > > > Watcher is not good because It need cpu metric > > > > > such as cpu load in Ceilometer which is removed so we cannot use > it. > > > > > > > > Hi! > > > > > > > > What do you mean by "Ceilometer [is] removed"? It certainly isn't > dead, > > > > and it works well... If by that, you mean "ceilometer-api" is > removed, > > > > then yes, but then you can use gnocchi. > > > > > > > > Cheers, > > > > > > > > Thomas Goirand (zigo) > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 16 09:46:20 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 16 Mar 2023 09:46:20 +0000 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: <6c936677c510a3888ee113f26b91231b7b78a8ec.camel@redhat.com> On Thu, 2023-03-16 at 10:35 +0100, Dmitriy Rabotyagov wrote: > Oh, thanks for that detailed explanation! > I was looking at metrics weighter for years and looked through code couple > of times but never got it properly configured. That is very helpful, thanks > a lot! that tells me i sure porbaly update the docs... > > ??, 16 ???. 2023 ?., 09:46 Sean Mooney : > > > On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote: > > > Eventually I don't fully understand reasons behind need of such service. > > > > > > As fighting with high load by migrating instances between computes is > > > fighting with consequences rather then with root cause, not saying that > > it > > > brings more negative effects then positive for experience of the > > end-users, > > > as you're just moving problem to another place affecting more workloads > > > with degraded performance. > > > > > > If you struggling from high load on a daily basis - then you have too > > high > > > cpu_allocation_ratio set for computes. As high load issues always come > > from > > > attempts to oversell too agressively. > > > > > > If you have workloads in the cloud that always utilize all CPUs > > available - > > > then you should consider having flavors and aggregates with cpu-pinning, > > > meaning providing physical CPUs for such workloads. > > > > > > Also don't forget, that it's worth setting more realistic numbers for > > > reserved resources on computes, because default 2gb of RAM is usually too > > > small. > > i tend to agree although there are some thing you can do in the nova > > schduler ot help > > e.g. prefering spreading over packing. > > > > for cpu load in particalar you can also enable the metric weigher > > > > i have not read this thread in detail altough skiming i see refrences to > > ceilometer. > > nova's metrics weigher has no depency on it. 
> > the metrics weigher > > > > https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py > > is configured by adding weight_setting in the schduler config > > > > https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting > > > > [metrics] > > weight_setting = name1=1.0, name2=-1.0 > > and enabeling the monitors in the nova-comptue config > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors > > [DEFAULT] > > compute_monitors = cpu.virt_driver > > > > ^ that is the only one we support > > > > the datafiles we report are set here > > > > https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101 > > > > the more intersting values are > > "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent" > > > > we have a fairly large internal cloud that is used for dev and ci and as > > of about 12 to 18 months ago they > > have been using this to help balance the schduling fo instance as we have > > a mix of hyperviros skus > > and this help blance systme load. > > > > [metrics] > > weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, > > cpu.idle.percent=1.0 > > > > you want iowait and cpu.percent to be negitive since you want to avoid > > host with high iowait or high cpu utilsation. > > and you woudl want to prefer idle host if your intent is to blance load. > > > > iowait is actully included in cpu.percent and infact cpu.percent is > > basicaly cpu load - idel so > > [metrics] > > weight_setting = cpu.percent=-1.0 > > would have a simialreffect but you might want the extra granularity to > > weight iowait vs idle differntly > > > > so if you find the normal cpu/ram/disk weigher are not sufficent to blance > > based onload check out the > > metrics weigher and see it that helps. just be awere that collecting the > > cpu metrics and providing them > > to the schduelr will increase rabbitmq load a little since we perodicly > > have ot update those values for > > each compute. if you have a lot of compute that might be problematic. its > > one of the reasons we > > decided not to add more metrics like this. > > > > > > > > > > > > > > > > > > ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i : > > > > > > > Hello. > > > > I cannot use because missing cpu_util metric. I try to match it work > > but > > > > not yet. It need some code to make it work. It seem none care about > > balance > > > > reources on cloud. > > > > > > > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > > > > > > > > > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > > > > > Watcher is not good because It need cpu metric > > > > > > such as cpu load in Ceilometer which is removed so we cannot use > > it. > > > > > > > > > > Hi! > > > > > > > > > > What do you mean by "Ceilometer [is] removed"? It certainly isn't > > dead, > > > > > and it works well... If by that, you mean "ceilometer-api" is > > removed, > > > > > then yes, but then you can use gnocchi. > > > > > > > > > > Cheers, > > > > > > > > > > Thomas Goirand (zigo) > > > > > > > > > > > > > > From christian.rohmann at inovex.de Thu Mar 16 11:09:35 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Thu, 16 Mar 2023 12:09:35 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: <510A9181-1D22-41F2-AE3C-EE354CD6F895@binero.com> References: <510A9181-1D22-41F2-AE3C-EE354CD6F895@binero.com> Message-ID: <0f117b65-7006-d6b3-7b96-ba5e01bbf09e@inovex.de> On 13/03/2023 19:46, Tobias Urdin wrote: > Interesting thread! 
+1 Most installations run into this issue of wondering when a network node is really ready / fully synced. While the tooling that Mohammed or Felix does work in "observing" or "determining" the sync state independently, I strongly believe a network agent should report it's sync state back to the control plane. Orchestration of e.g. rolling upgrades of agents should be possible with state information provided by neutron itself and not require external tooling. By implementing the state data structure and then having the drivers (OVN, OVS, linuxbridge) report this back, this is independent from the particular implementation details (network NS, certain processes running, ...). Looking at this problem the taints and tolerations model use for node "readiness" from Kubernetes come to mind (https://kubernetes.io/docs/reference/labels-annotations-taints/#node-kubernetes-io-network-unavailable). Regards Christian From nguyenhuukhoinw at gmail.com Thu Mar 16 11:34:33 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 16 Mar 2023 18:34:33 +0700 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: <6c936677c510a3888ee113f26b91231b7b78a8ec.camel@redhat.com> References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> <6c936677c510a3888ee113f26b91231b7b78a8ec.camel@redhat.com> Message-ID: Thank you very much for sharing! I will dig dive with it. Nguyen Huu Khoi On Thu, Mar 16, 2023 at 4:54?PM Sean Mooney wrote: > On Thu, 2023-03-16 at 10:35 +0100, Dmitriy Rabotyagov wrote: > > Oh, thanks for that detailed explanation! > > I was looking at metrics weighter for years and looked through code > couple > > of times but never got it properly configured. That is very helpful, > thanks > > a lot! > > that tells me i sure porbaly update the docs... > > > > ??, 16 ???. 2023 ?., 09:46 Sean Mooney : > > > > > On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote: > > > > Eventually I don't fully understand reasons behind need of such > service. > > > > > > > > As fighting with high load by migrating instances between computes is > > > > fighting with consequences rather then with root cause, not saying > that > > > it > > > > brings more negative effects then positive for experience of the > > > end-users, > > > > as you're just moving problem to another place affecting more > workloads > > > > with degraded performance. > > > > > > > > If you struggling from high load on a daily basis - then you have too > > > high > > > > cpu_allocation_ratio set for computes. As high load issues always > come > > > from > > > > attempts to oversell too agressively. > > > > > > > > If you have workloads in the cloud that always utilize all CPUs > > > available - > > > > then you should consider having flavors and aggregates with > cpu-pinning, > > > > meaning providing physical CPUs for such workloads. > > > > > > > > Also don't forget, that it's worth setting more realistic numbers for > > > > reserved resources on computes, because default 2gb of RAM is > usually too > > > > small. > > > i tend to agree although there are some thing you can do in the nova > > > schduler ot help > > > e.g. prefering spreading over packing. > > > > > > for cpu load in particalar you can also enable the metric weigher > > > > > > i have not read this thread in detail altough skiming i see refrences > to > > > ceilometer. > > > nova's metrics weigher has no depency on it. 
> > > the metrics weigher > > > > > > > https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py > > > is configured by adding weight_setting in the schduler config > > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting > > > > > > [metrics] > > > weight_setting = name1=1.0, name2=-1.0 > > > and enabeling the monitors in the nova-comptue config > > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors > > > [DEFAULT] > > > compute_monitors = cpu.virt_driver > > > > > > ^ that is the only one we support > > > > > > the datafiles we report are set here > > > > > > > https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101 > > > > > > the more intersting values are > > > "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent" > > > > > > we have a fairly large internal cloud that is used for dev and ci and > as > > > of about 12 to 18 months ago they > > > have been using this to help balance the schduling fo instance as we > have > > > a mix of hyperviros skus > > > and this help blance systme load. > > > > > > [metrics] > > > weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, > > > cpu.idle.percent=1.0 > > > > > > you want iowait and cpu.percent to be negitive since you want to avoid > > > host with high iowait or high cpu utilsation. > > > and you woudl want to prefer idle host if your intent is to blance > load. > > > > > > iowait is actully included in cpu.percent and infact cpu.percent is > > > basicaly cpu load - idel so > > > [metrics] > > > weight_setting = cpu.percent=-1.0 > > > would have a simialreffect but you might want the extra granularity to > > > weight iowait vs idle differntly > > > > > > so if you find the normal cpu/ram/disk weigher are not sufficent to > blance > > > based onload check out the > > > metrics weigher and see it that helps. just be awere that collecting > the > > > cpu metrics and providing them > > > to the schduelr will increase rabbitmq load a little since we perodicly > > > have ot update those values for > > > each compute. if you have a lot of compute that might be problematic. > its > > > one of the reasons we > > > decided not to add more metrics like this. > > > > > > > > > > > > > > > > > > > > > > > > > ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i < > nguyenhuukhoinw at gmail.com>: > > > > > > > > > Hello. > > > > > I cannot use because missing cpu_util metric. I try to match it > work > > > but > > > > > not yet. It need some code to make it work. It seem none care about > > > balance > > > > > reources on cloud. > > > > > > > > > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand > wrote: > > > > > > > > > > > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > > > > > > Watcher is not good because It need cpu metric > > > > > > > such as cpu load in Ceilometer which is removed so we cannot > use > > > it. > > > > > > > > > > > > Hi! > > > > > > > > > > > > What do you mean by "Ceilometer [is] removed"? It certainly isn't > > > dead, > > > > > > and it works well... If by that, you mean "ceilometer-api" is > > > removed, > > > > > > then yes, but then you can use gnocchi. > > > > > > > > > > > > Cheers, > > > > > > > > > > > > Thomas Goirand (zigo) > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From johfulto at redhat.com Thu Mar 16 11:54:36 2023 From: johfulto at redhat.com (John Fulton) Date: Thu, 16 Mar 2023 07:54:36 -0400 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan wrote: > > Update: After restarting the nova services on the controller and running the deploy script on the edge site, I was able to launch the VM from volume. > > Right now the instance creation is failing as the block device creation is stuck in creating state, it is taking more than 10 mins for the volume to be created, whereas the image has already been imported to the edge glance. Try following this document and making the same observations in your environment for AZs and their local ceph cluster. https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites On a DCN site if you run a command like this: $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring /etc/ceph/dcn0.client.admin.keyring $ rbd --cluster dcn0 -p volumes ls -l NAME SIZE PARENT FMT PROT LOCK volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl $ Then, you should see the parent of the volume is the image which is on the same local ceph cluster. I wonder if something is misconfigured and thus you're encountering the streaming behavior described here: Ideally all images should reside in the central Glance and be copied to DCN sites before instances of those images are booted on DCN sites. If an image is not copied to a DCN site before it is booted, then the image will be streamed to the DCN site and then the image will boot as an instance. This happens because Glance at the DCN site has access to the images store at the Central ceph cluster. Though the booting of the image will take time because it has not been copied in advance, this is still preferable to failing to boot the image. You can also exec into the cinder container at the DCN site and confirm it's using it's local ceph cluster. John > > I will try and create a new fresh image and test again then update. > > With regards, > Swogat Pradhan > > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan wrote: >> >> Update: >> In the hypervisor list the compute node state is showing down. >> >> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan wrote: >>> >>> Hi Brendan, >>> Now i have deployed another site where i have used 2 linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>> The bonding options is set to mode=802.3ad (lacp=active). >>> I used a cirros image to launch instance but the instance timed out so i waited for the volume to be created. >>> Once the volume was created i tried launching the instance from the volume and still the instance is stuck in spawning state. 
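To illustrate the "exec into the cinder container at the DCN site" check suggested above, something along these lines can be run on the DCN node hosting cinder-volume (the container name placeholder and the exact option names are assumptions and may differ per deployment):

$ sudo podman ps --format '{{.Names}}' | grep -i cinder
$ sudo podman exec <cinder-volume-container> \
    grep -E 'enabled_backends|rbd_ceph_conf|rbd_cluster_name|rbd_pool|rbd_user' \
    /etc/cinder/cinder.conf

The rbd_* values should point at the edge site's own ceph cluster (the dcn0-style cluster in the example above) rather than at the central one.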
>>> >>> Here is the nova-compute log: >>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon starting >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command. >>> Command: blkid overlay -s UUID -o value >>> Exit code: 2 >>> Stdout: '' >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>> >>> It is stuck in creating image, do i need to run the template mentioned here ?: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> >>> The volume is already created and i do not understand why the instance is stuck in spawning state. >>> >>> With regards, >>> Swogat Pradhan >>> >>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard wrote: >>>> >>>> Does your environment use different network interfaces for each of the networks? Or does it have a bond with everything on it? >>>> >>>> One issue I have seen before is that when launching instances, there is a lot of network traffic between nodes as the hypervisor needs to download the image from Glance. Along with various other services sending normal network traffic, it can be enough to cause issues if everything is running over a single 1Gbe interface. >>>> >>>> I have seen the same situation in fact when using a single active/backup bond on 1Gbe nics. It?s worth checking the network traffic while you try to spawn the instance to see if you?re dropping packets. In the situation I described, there were dropped packets which resulted in a loss of communication between nova_compute and RMQ, so the node appeared offline. You should also confirm that nova_compute is being disconnected in the nova_compute logs if you tail them on the Hypervisor while spawning the instance. >>>> >>>> In my case, changing from active/backup to LACP helped. So, based on that experience, from my perspective, is certainly sounds like some kind of network issue. >>>> >>>> Regards, >>>> >>>> Brendan Shephard >>>> Senior Software Engineer >>>> Red Hat Australia >>>> >>>> >>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>>> >>>> Hi, >>>> >>>> I tried to help someone with a similar issue some time ago in this thread: >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>> >>>> But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. 
>>>> If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. I didn't have to do that in a production system yet so be careful. I can think of two routes: >>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>> >>>> I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. >>>> >>>> Regards, >>>> Eugen >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> Hi, >>>> Can someone please help me out on this issue? >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >>>> wrote: >>>> >>>> Hi >>>> I don't see any major packet loss. >>>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>>> loss. >>>> >>>> with regards, >>>> Swogat Pradhan >>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >>>> wrote: >>>> >>>> Hi, >>>> Yes the MTU is the same as the default '1500'. >>>> Generally I haven't seen any packet loss, but never checked when >>>> launching the instance. >>>> I will check that and come back. >>>> But everytime i launch an instance the instance gets stuck at spawning >>>> state and there the hypervisor becomes down, so not sure if packet loss >>>> causes this. >>>> >>>> With regards, >>>> Swogat pradhan >>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>>> >>>> One more thing coming to mind is MTU size. Are they identical between >>>> central and edge site? Do you see packet loss through the tunnel? >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> > Hi Eugen, >>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>> > getting email's from you. >>>> > Coming to the issue: >>>> > >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>> / >>>> > Listing policies for vhost "/" ... >>>> > vhost name pattern apply-to definition priority >>>> > / ha-all ^(?!amq\.).* queues >>>> > >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> > >>>> > I have the edge site compute nodes up, it only goes down when i am >>>> trying >>>> > to launch an instance and the instance comes to a spawning state and >>>> then >>>> > gets stuck. >>>> > >>>> > I have a tunnel setup between the central and the edge sites. >>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> > wrote: >>>> > >>>> >> Hi Eugen, >>>> >> For some reason i am not getting your email to me directly, i am >>>> checking >>>> >> the email digest and there i am able to find your reply. >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >> Yes, these logs are from the time when the issue occurred. >>>> >> >>>> >> *Note: i am able to create vm's and perform other activities in the >>>> >> central site, only facing this issue in the edge site.* >>>> >> >>>> >> With regards, >>>> >> Swogat Pradhan >>>> >> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >> wrote: >>>> >> >>>> >>> Hi Eugen, >>>> >>> Thanks for your response. 
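Before removing anything on the rabbit side, a few read-only checks along these lines are usually a safe first step (run inside one of the rabbitmq nodes or containers; the queue name is the one from the nova-conductor errors quoted earlier in the thread):

$ rabbitmqctl cluster_status
$ rabbitmqctl list_queues name messages consumers | grep reply_
$ rabbitmqctl list_queues name | grep 349bcb075f8c49329435a0f884b33066

For the full teardown option, the mnesia data usually lives under /var/lib/rabbitmq/mnesia on each node, though the exact path may differ in containerized deployments.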
>>>> >>> I have actually a 4 controller setup so here are the details: >>>> >>> >>>> >>> *PCS Status:* >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-no-ceph-3 >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-2 >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-1 >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-0 >>>> >>> >>>> >>> I have tried restarting the bundle multiple times but the issue is >>>> still >>>> >>> present. >>>> >>> >>>> >>> *Cluster status:* >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>> >>> Cluster status of node >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>> >>> Basics >>>> >>> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>> >>>> >>> Disk Nodes >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>> >>>> >>> Running Nodes >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>> >>>> >>> Versions >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>> >>>> >>> Alarms >>>> >>> >>>> >>> (none) >>>> >>> >>>> >>> Network Partitions >>>> >>> >>>> >>> (none) >>>> >>> >>>> >>> Listeners >>>> >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: 
>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and >>>> >>> CLI tool communication >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>> 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> >>>> >>> Feature flags >>>> >>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>> Flag: quorum_queue, state: enabled >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>> >>>> >>> *Logs:* >>>> >>> *(Attached)* >>>> >>> >>>> >>> With regards, >>>> >>> Swogat Pradhan >>>> >>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>> wrote: >>>> >>> >>>> >>>> Hi, >>>> >>>> Please find the nova conductor as well as nova api log. >>>> >>>> >>>> >>>> nova-conuctor: >>>> >>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> with >>>> >>>> backend dogpile.cache.null. >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>> >>>> >>>>> Hi, >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> >>>>> launch vm's. >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>> compute >>>> >>>>> service list), the node comes backup when i restart the nova >>>> compute >>>> >>>>> service but then the launch of the vm fails. >>>> >>>>> >>>> >>>>> nova-compute.log >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> >>>>> instance usage >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>> to >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>> name: >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> with >>>> >>>>> backend dogpile.cache.null. >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> >>>>> privsep helper: >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> 'privsep-helper', >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>> privsep >>>> >>>>> daemon via rootwrap >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> daemon starting >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> daemon running as pid 2647 >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> os_brick.initiator.connectors.nvmeof >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> >>>>> execution error >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>>> Exit code: 2 >>>> >>>>> Stdout: '' >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>>> Unexpected error while running command. 
>>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>> >>>>> >>>> >>>>> >>>> >>>>> With regards, >>>> >>>>> >>>> >>>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> From geguileo at redhat.com Thu Mar 16 12:10:23 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 16 Mar 2023 13:10:23 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> <20230313163251.xpnzyvzb65b6zaal@localhost> <20230314084601.t2ez24gcljnu5plq@localhost> Message-ID: <20230316121023.tdzgu6zinm7spvjp@localhost> On 16/03, Rishat Azizov wrote: > Hi Gorka, > > Thanks! > I fixed issue by adding to multipathd config uxsock_timeout directive: > uxsock_timeout 10000 > > Because in multipathd logs I saw this error: > 3624a93705842cfae35d7483200015fd8: map flushed > cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after > 4.858561 secs > > Now large disk backups work fine. > > 2. This happens because despite the timeout of the first attempt and exit > code 1, the multipath device was disconnected, so the next attempts ended > with an error "is not a multipath device", since the multipath device had > already disconnected. > Hi, That's a nice workaround until we fix it upstream!! Thanks for confirming my suspicions were right. This is the 3rd thing I mentioned was happening, flush call failed but it actually removed the device. We'll proceed to fix the flushing code in master. Cheers, Gorka. > > ??, 14 ???. 2023??. ? 14:46, Gorka Eguileor : > > > [Sending the email again as it seems it didn't reach the ML] > > > > > > On 13/03, Gorka Eguileor wrote: > > > On 11/03, Rishat Azizov wrote: > > > > Hi, Gorka, > > > > > > > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > > > > > > > > > > > > Hi, > > > > There are multiple things going on here: > > > > 1. There is a bug in os-brick, because the disconnect_volume should not > > fail, since it is being called with force=True and > > ignore_errors=True. > > > > The issues is that this call [1] is not wrapped in the > > ExceptionChainer context manager, and it should not even be a flush > > call, it should be a call to "multipathd remove map $map" instead. > > > > 2. The way multipath code is written [2][3], the error we see about > > "3624a93705842cfae35d7483200015fce is not a multipath device" means 2 > > different things: it is not a multipath or an error happened. > > > > So we don't really know what happened without enabling more verbose > > multipathd log levels. > > > > 3. The "multipath -f" call should not be failing in the first place, > > because the failure is happening on disconnecting the source volume, > > which has no data buffered to be written and therefore no reason to > > fail the flush (unless it's using a friendly name). > > > > I don't know if it could be happening that the first flush fails with > > a timeout (maybe because there is an extend operation happening), but > > multipathd keeps trying to flush it in the background and when it > > succeeds it removes the multipath device, which makes following calls > > fail. 
> > > > If that's the case we would need to change the retry from automatic > > [4] to manual and check in-between to see if the device has been > > removed in-between calls. > > > > The first issue is definitely a bug, the 2nd one is something that could > > be changed in the deployment to try to get additional information on the > > failure, and the 3rd one could be a bug. > > > > I'll see if I can find someone who wants to work on the 1st and 3rd > > points. > > > > Cheers, > > Gorka. > > > > [1]: > > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 > > [2]: > > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 > > [3]: > > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 > > [4]: > > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > > > > > > > > > > > > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > > > > > > > On 06/03, Rishat Azizov wrote: > > > > > > Hi, > > > > > > > > > > > > It works with smaller volumes. > > > > > > > > > > > > multipath.conf attached to thist email. > > > > > > > > > > > > Cinder version - 18.2.0 Wallaby > > > > > > > > > From dtantsur at protonmail.com Thu Mar 16 12:21:54 2023 From: dtantsur at protonmail.com (Dmitry Tantsur) Date: Thu, 16 Mar 2023 12:21:54 +0000 Subject: [ironic] [infra] Cleaning up old IPA images from tarballs.o.o Message-ID: Hi all, I would like to do a clean-up of old or wrongly created IPA images on the tarballs site. Before I do that, could you please check the proposed list to make sure we don't remove something we expect to be used? The list is https://paste.opendev.org/show/btDps0HFoYG9TKMmv1LB/ Dmitry From noonedeadpunk at gmail.com Thu Mar 16 12:35:10 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 16 Mar 2023 13:35:10 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: > Maybe I'm missing something, but what are the reasons you would want to > rebuild an instance without ... rebuilding it? I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own. ??, 15 ???. 2023??. ? 19:57, Dan Smith : > > > We have users who use 'rebuild' on volume booted servers before nova > > microversion 2.93, relying on the behavior that it keeps the volume as > > is. And they would like to keep doing this even after the openstack > > distro moves to a(n at least) zed base (sometime in the future). > > Maybe I'm missing something, but what are the reasons you would want to > rebuild an instance without ... rebuilding it? > > I assume it's because you want to redefine the metadata or name or > something. There's a reason why those things are not easily mutable > today, and why we had a lot of discussion on how to make user metadata > mutable on an existing instance in the last cycle. However, I would > really suggest that we not override "recreate the thing" to "maybe > recreate the thing or just update a few fields". Instead, for things we > think really should be mutable on a server at runtime, we should > probably just do that. 
> > Imagine if the way you changed permissions recursively was to run 'rm > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > that is (IMHO) what "recreate but don't just change $name" means to a > user. > > > As a naive user, it seems to me both behaviors make sense. I can > > easily imagine use cases for rebuild with and without reimaging. > > I think that's because you're already familiar with the difference. For > users not already in that mindset, I think it probably seems very weird > that rebuild is destructive in one case and not the other. > > > Then there are a few hypothetical situations like: > > a) Rebuild gets a new api feature (in a new microversion) which can > > never be combined with the do-not-reimage behavior. > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > This again can never be combined with the old behavior. > > > > What do you think, are these concerns purely theoretical or real? > > If we would like to keep having rebuild without reimaging, can we rely > > on the old microversion indefinitely? > > Alternatively shall we propose and implement a nova spec to explicitly > > expose the choice in the rebuild api (just to express the idea: osc > > server rebuild --reimage|--no-reimage)? > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > I really want to know what the use-case is for "rebuild but not > really". And also what "rebuild" means to a user if --no-reimage is > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > "This operation recreates the root disk of the server." > > That was a lie for volume-backed instances for technical reasons. It was > a bug, not a feature. > > I also strongly believe that if we're going to add a "but not > really" flag, it needs to apply to volume-backed and regular instances > identically. Because that's what the change here was doing - unifying > the behavior for a single API operation. Going the other direction does > not seem useful to me. > > --Dan > > [0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > From rishat.azizov at gmail.com Thu Mar 16 11:02:07 2023 From: rishat.azizov at gmail.com (Rishat Azizov) Date: Thu, 16 Mar 2023 17:02:07 +0600 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: <20230314084601.t2ez24gcljnu5plq@localhost> References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> <20230313163251.xpnzyvzb65b6zaal@localhost> <20230314084601.t2ez24gcljnu5plq@localhost> Message-ID: Hi Gorka, Thanks! I fixed issue by adding to multipathd config uxsock_timeout directive: uxsock_timeout 10000 Because in multipathd logs I saw this error: 3624a93705842cfae35d7483200015fd8: map flushed cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after 4.858561 secs Now large disk backups work fine. 2. This happens because despite the timeout of the first attempt and exit code 1, the multipath device was disconnected, so the next attempts ended with an error "is not a multipath device", since the multipath device had already disconnected. ??, 14 ???. 2023??. ? 14:46, Gorka Eguileor : > [Sending the email again as it seems it didn't reach the ML] > > > On 13/03, Gorka Eguileor wrote: > > On 11/03, Rishat Azizov wrote: > > > Hi, Gorka, > > > > > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > > > > > > > Hi, > > There are multiple things going on here: > > 1. 
There is a bug in os-brick, because the disconnect_volume should not > fail, since it is being called with force=True and > ignore_errors=True. > > The issues is that this call [1] is not wrapped in the > ExceptionChainer context manager, and it should not even be a flush > call, it should be a call to "multipathd remove map $map" instead. > > 2. The way multipath code is written [2][3], the error we see about > "3624a93705842cfae35d7483200015fce is not a multipath device" means 2 > different things: it is not a multipath or an error happened. > > So we don't really know what happened without enabling more verbose > multipathd log levels. > > 3. The "multipath -f" call should not be failing in the first place, > because the failure is happening on disconnecting the source volume, > which has no data buffered to be written and therefore no reason to > fail the flush (unless it's using a friendly name). > > I don't know if it could be happening that the first flush fails with > a timeout (maybe because there is an extend operation happening), but > multipathd keeps trying to flush it in the background and when it > succeeds it removes the multipath device, which makes following calls > fail. > > If that's the case we would need to change the retry from automatic > [4] to manual and check in-between to see if the device has been > removed in-between calls. > > The first issue is definitely a bug, the 2nd one is something that could > be changed in the deployment to try to get additional information on the > failure, and the 3rd one could be a bug. > > I'll see if I can find someone who wants to work on the 1st and 3rd > points. > > Cheers, > Gorka. > > [1]: > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 > [2]: > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 > [3]: > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 > [4]: > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > > > > > > > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > > > > > On 06/03, Rishat Azizov wrote: > > > > > Hi, > > > > > > > > > > It works with smaller volumes. > > > > > > > > > > multipath.conf attached to thist email. > > > > > > > > > > Cinder version - 18.2.0 Wallaby > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shr.chauhan at gmail.com Thu Mar 16 12:00:04 2023 From: shr.chauhan at gmail.com (Shrey Chauhan) Date: Thu, 16 Mar 2023 17:30:04 +0530 Subject: Neutron Message-ID: <661430E8-438A-4619-AA52-E7FD09DA5966@hxcore.ol> An HTML attachment was scrubbed... URL: From eblock at nde.ag Thu Mar 16 13:57:45 2023 From: eblock at nde.ag (Eugen Block) Date: Thu, 16 Mar 2023 13:57:45 +0000 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack In-Reply-To: References: Message-ID: <20230316135745.Horde.JMkdtl7hq-QngIyx4x6MI1S@webmail.nde.ag> Can you also try /horizon ? 
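For the os-brick/multipath discussion above: Gorka's third point suggests replacing the automatic retry around "multipath -f" with a manual loop that checks between attempts whether multipathd has already removed the map in the background. The sketch below only illustrates that idea in plain Python; it is not the actual os-brick code, which goes through privsep/rootwrap and its own retry helpers, and the /dev/mapper existence check is an assumption about how one might detect that the map is already gone.

import os
import subprocess
import time

def flush_multipath_map(map_name, attempts=3, interval=1):
    """Flush a multipath map, treating 'already removed' as success."""
    # Manual retry instead of an automatic retry decorator: before each
    # attempt, check whether multipathd already removed the map in the
    # background, in which case an earlier "failed" flush actually worked.
    dm_path = os.path.join('/dev/mapper', map_name)
    last_error = ''
    for _ in range(attempts):
        if not os.path.exists(dm_path):
            return  # map is gone, nothing left to flush
        result = subprocess.run(['multipath', '-f', map_name],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return
        last_error = result.stderr.strip()
        time.sleep(interval)
    if not os.path.exists(dm_path):
        return  # removed while we were waiting between attempts
    raise RuntimeError('flushing %s kept failing: %s' % (map_name, last_error))

Treating "the map has disappeared" as success is what would make the later "is not a multipath device" errors harmless in the scenario Gorka describes.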
I have an Ubuntu based Victoria test environment and we had to modify the apache openstack-dashboard.conf because the default didn't work for me as well: # before openstack-dashboard.conf: WSGIScriptAlias /horizon /usr/share/openstack-dashboard/openstack_dashboard/wsgi.py process-group=horizon # after openstack-dashboard.conf: WSGIScriptAlias / /usr/share/openstack-dashboard/openstack_dashboard/wsgi.py process-group=horizon Regards, Eugen Zitat von Adivya Singh : > Same result > > On Wed, Mar 15, 2023 at 11:07?PM Radomir Dopieralski > wrote: > >> try /dashboard >> >> On Wed, Mar 15, 2023 at 5:56?PM Adivya Singh >> wrote: >> >>> Hi Team, >>> >>> I am unable to open Open OpenStack horizon page, after installation >>> When i am opening the link , it says >>> >>> Haproxy service seems up and running, I have tried to Flush IP tables >>> also, Seeing this might be causing the issue >>> >>> Port 443 is also listening. >>> >>> Any thoughts on this >>> >>> [image: image.png] >>> >> >> >> -- >> Radomir Dopieralski >> From smooney at redhat.com Thu Mar 16 14:03:47 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 16 Mar 2023 14:03:47 +0000 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: <1f9b4be304b2a9c1e463eb28420635630374529b.camel@redhat.com> On Thu, 2023-03-16 at 13:35 +0100, Dmitriy Rabotyagov wrote: > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > I think it might be the case of rescheduling the VM to other compute > without a long-lasting shelve/unshelve and when you don't need to > change the flavor. So kind of self-service when the user does detect > some weirdness, but before bothering the tech team will attempt to > reschedule to another compute on their own. rebuild is __not__ a move operation the curernt special case is a hack to alow image metadata properties to be updated for an exsitng vm but it will not reschedule the vm to another host. we talks about this in paste ptg where i propsoed adding a recreate api. i do not think we should ever make rebuilt a move oepratation but we could supprot a new api to enable the vm to recreate (keeping its data) on a new host with updated flavor/image extra specs based on teh current value of either. i really wish we coudl remvoe the current rebuild beahvior but when we discussed doing that before we decied it woudl break to many people. > > ??, 15 ???. 2023??. ? 19:57, Dan Smith : > > > > > ?We have users who use 'rebuild' on volume booted servers before nova > > > ?microversion 2.93, relying on the behavior that it keeps the volume as > > > ?is. And they would like to keep doing this even after the openstack > > > ?distro moves to a(n at least) zed base (sometime in the future). > > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > > > I assume it's because you want to redefine the metadata or name or > > something. There's a reason why those things are not easily mutable > > today, and why we had a lot of discussion on how to make user metadata > > mutable on an existing instance in the last cycle. However, I would > > really suggest that we not override "recreate the thing" to "maybe > > recreate the thing or just update a few fields". Instead, for things we > > think really should be mutable on a server at runtime, we should > > probably just do that. 
> > > > Imagine if the way you changed permissions recursively was to run 'rm > > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > > that is (IMHO) what "recreate but don't just change $name" means to a > > user. > > > > > ?As a naive user, it seems to me both behaviors make sense. I can > > > ?easily imagine use cases for rebuild with and without reimaging. > > > > I think that's because you're already familiar with the difference. For > > users not already in that mindset, I think it probably seems very weird > > that rebuild is destructive in one case and not the other. > > > > > ?Then there are a few hypothetical situations like: > > > ?a) Rebuild gets a new api feature (in a new microversion) which can > > > ?never be combined with the do-not-reimage behavior. > > > ?b) Rebuild may have a bug, whose fix requires a microversion bump. > > > ?This again can never be combined with the old behavior. > > > > > > ?What do you think, are these concerns purely theoretical or real? > > > ?If we would like to keep having rebuild without reimaging, can we rely > > > ?on the old microversion indefinitely? > > > ?Alternatively shall we propose and implement a nova spec to explicitly > > > ?expose the choice in the rebuild api (just to express the idea: osc > > > ?server rebuild --reimage|--no-reimage)? > > > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > > > I really want to know what the use-case is for "rebuild but not > > really". And also what "rebuild" means to a user if --no-reimage is > > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > > > "This operation recreates the root disk of the server." > > > > That was a lie for volume-backed instances for technical reasons. It was > > a bug, not a feature. > > > > I also strongly believe that if we're going to add a "but not > > really" flag, it needs to apply to volume-backed and regular instances > > identically. Because that's what the change here was doing - unifying > > the behavior for a single API operation. Going the other direction does > > not seem useful to me. > > > > --Dan > > > > [0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > > > From sylvain.bauza at gmail.com Thu Mar 16 14:21:14 2023 From: sylvain.bauza at gmail.com (Sylvain Bauza) Date: Thu, 16 Mar 2023 15:21:14 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Le jeu. 16 mars 2023 ? 13:38, Dmitriy Rabotyagov a ?crit : > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > I think it might be the case of rescheduling the VM to other compute > without a long-lasting shelve/unshelve and when you don't need to > change the flavor. So kind of self-service when the user does detect > some weirdness, but before bothering the tech team will attempt to > reschedule to another compute on their own. > > We already have an existing API method for this, which is 'cold-migrate' (and it does the same that resize, without changing the flavor) ??, 15 ???. 2023??. ? 19:57, Dan Smith : > > > > > We have users who use 'rebuild' on volume booted servers before nova > > > microversion 2.93, relying on the behavior that it keeps the volume as > > > is. And they would like to keep doing this even after the openstack > > > distro moves to a(n at least) zed base (sometime in the future). 
> > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > > > I assume it's because you want to redefine the metadata or name or > > something. There's a reason why those things are not easily mutable > > today, and why we had a lot of discussion on how to make user metadata > > mutable on an existing instance in the last cycle. However, I would > > really suggest that we not override "recreate the thing" to "maybe > > recreate the thing or just update a few fields". Instead, for things we > > think really should be mutable on a server at runtime, we should > > probably just do that. > > > > Imagine if the way you changed permissions recursively was to run 'rm > > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > > that is (IMHO) what "recreate but don't just change $name" means to a > > user. > > > > > As a naive user, it seems to me both behaviors make sense. I can > > > easily imagine use cases for rebuild with and without reimaging. > > > > I think that's because you're already familiar with the difference. For > > users not already in that mindset, I think it probably seems very weird > > that rebuild is destructive in one case and not the other. > > > > > Then there are a few hypothetical situations like: > > > a) Rebuild gets a new api feature (in a new microversion) which can > > > never be combined with the do-not-reimage behavior. > > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > > This again can never be combined with the old behavior. > > > > > > What do you think, are these concerns purely theoretical or real? > > > If we would like to keep having rebuild without reimaging, can we rely > > > on the old microversion indefinitely? > > > Alternatively shall we propose and implement a nova spec to explicitly > > > expose the choice in the rebuild api (just to express the idea: osc > > > server rebuild --reimage|--no-reimage)? > > > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > > > I really want to know what the use-case is for "rebuild but not > > really". And also what "rebuild" means to a user if --no-reimage is > > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > > > "This operation recreates the root disk of the server." > > > > That was a lie for volume-backed instances for technical reasons. It was > > a bug, not a feature. > > > > I also strongly believe that if we're going to add a "but not > > really" flag, it needs to apply to volume-backed and regular instances > > identically. Because that's what the change here was doing - unifying > > the behavior for a single API operation. Going the other direction does > > not seem useful to me. > > > > --Dan > > > > [0] > https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Thu Mar 16 14:26:53 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Thu, 16 Mar 2023 11:26:53 -0300 Subject: [manila] Bobcat vPTG slots and topics Message-ID: Hello, Zorillas! PTG is right around the corner and I would like to remind you to please add the topics you would like to bring up during our sessions to the planning etherpad [1] until next Tuesday (Mar 21st). 
I have already allocated some slots for our sessions: - Monday: 16:00 to 17:00 UTC - Wednesday: 14:00 to 16:00 UTC - Thursday: 14:00 to 16:00 UTC - Friday: 14:00 to 17:00 UTC We will be meeting in the Austin room, you can access the meeting room through the PTG page [2]. If you have a preference of date/time for your topic to be discussed, please let me know and I will try to accommodate it. Looking forward to meeting you! [1] https://etherpad.opendev.org/p/manila-bobcat-ptg-planning [2] https://ptg.opendev.org/ptg.html Thanks, carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 16 15:40:14 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 16 Mar 2023 08:40:14 -0700 Subject: [ironic] [infra] Cleaning up old IPA images from tarballs.o.o In-Reply-To: References: Message-ID: It's good that we get these out of the way of folks looking for modern images. I'm on board. Thanks, Jay Faulkner Ironic PTL TC Member On Thu, Mar 16, 2023 at 5:29?AM Dmitry Tantsur wrote: > Hi all, > > I would like to do a clean-up of old or wrongly created IPA images on > the tarballs site. Before I do that, could you please check the proposed > list to make sure we don't remove something we expect to be used? > > The list is https://paste.opendev.org/show/btDps0HFoYG9TKMmv1LB/ > > Dmitry > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Mar 16 15:44:20 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 16 Mar 2023 16:44:20 +0100 Subject: [neutron] PTG poll for meeting slots Message-ID: Hello Neutrinos: This is the link [1] (that I should have sent yesterday) to vote for the meeting slots during the PTG week. I think that 3 hours per day, from Tuesday to Friday, will be enough to cover the topics we need to discuss. Please feel free to send your vote here. Next week I'll close the poll and schedule the meetings. Thank you! [1]https://doodle.com/meeting/participate/id/eZ83mmvb -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Mar 16 16:15:42 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 16 Mar 2023 17:15:42 +0100 Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) Message-ID: Hello: I'm sending this mail in advance to propose transitioning Neutron and all related projects to EOL. I'll propose this topic too during the next Neutron meeting. The announcement is the first step [1] to transition a stable branch to EOL. The patch to mark these branches as EOL will be pushed in two weeks. If you have any inconvenience, please let me know in this mail chain or in IRC (ralonsoh, #openstack-neutron channel). You can also contact any Neutron core reviewer in the IRC channel. Regards. [1] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Thu Mar 16 16:24:07 2023 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 16 Mar 2023 09:24:07 -0700 Subject: [ironic] [infra] Cleaning up old IPA images from tarballs.o.o In-Reply-To: References: Message-ID: <519d69da-a1fb-4db9-8594-a1417bbe2eac@app.fastmail.com> On Thu, Mar 16, 2023, at 8:40 AM, Jay Faulkner wrote: > It's good that we get these out of the way of folks looking for modern > images. I'm on board. 
I'm not familiar enough with IPA to know the answers to these questions, but I think they play an important role in the decision making here. Can a user use the latest version of IPA with an old deployment of Ironic? If so why do we bother to publish a bunch of version and distro specific images? You should be able to keep an up to date image published that users find and use? If the versions do matter then you should be very careful to avoid deleting images that a user may need to run with their version of Ironic. > > Thanks, > Jay Faulkner > Ironic PTL > TC Member > > On Thu, Mar 16, 2023 at 5:29?AM Dmitry Tantsur wrote: >> Hi all, >> >> I would like to do a clean-up of old or wrongly created IPA images on >> the tarballs site. Before I do that, could you please check the proposed >> list to make sure we don't remove something we expect to be used? >> >> The list is https://paste.opendev.org/show/btDps0HFoYG9TKMmv1LB/ >> >> Dmitry >> >> From Danny.Webb at thehutgroup.com Thu Mar 16 16:43:02 2023 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Thu, 16 Mar 2023 16:43:02 +0000 Subject: [neutron] In-Reply-To: References: Message-ID: Hi Kamil, We're currently running 4 (soon to be 5) production regions all using kolla ansible as our deployer with OVN as our neutron backend. It's been fairly solid for us and we've had less issues with OVN than the traditional hybrid OVS / Iptables neutron driver (which we ran for about a year before switching to OVN). Our regions are anywhere from 50-60 compute hosts with 1-2k+ VMs per region. As far as I know most of the new development is going into OVN so would be a good place to start. Ultimately, we've only really had 2 real issues whilst running it. First was an issue where we had the provider network spamming gateway changes into southbound as we had our anycast SVI bound to our top of rack switches which made OVN keep updating it's location. We mitigated this by moving the provider SVIs to our border routers and the issue went away and dropped the load on our OVN controllers significantly. Only other real issue we had was during an upgrade of a region we ended up with what we believed to be some sort of stale flows that resulted in some hypervisors losing connectivity until we rebooted them. Hope this helps! Cheers, Danny ________________________________ From: Kamil Madac Sent: 14 March 2023 09:46 To: openstack-discuss Subject: [neutron] CAUTION: This email originates from outside THG ________________________________ Hi All, I'm in the process of planning a small public cloud based on OpenStack. I have quite experience with kolla-ansible deployments which use OVS networking and I have no issues with that. It works stable for my use cases (Vlan provider networks, DVR, tenant networks, floating IPs). For that new deployment I'm looking at OVN deployment which from what I read should be more performant (faster build of instances) and with ability to cover more networking features in OVN instead of needing external software like iptables/dnsmasq. Does anyone use OVN in production and what is your experience (pros/cons)? Is OVN mature enough to replace OVS in the production deployment (are there some basic features from OVS missing)? Thanks in advance for sharing the experience. -- Kamil Madac Danny Webb Principal OpenStack Engineer Danny.Webb at thehutgroup.com [THG Ingenuity Logo] [https://i.imgur.com/wbpVRW6.png] [https://i.imgur.com/c3040tr.png] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jay at gr-oss.io Thu Mar 16 16:48:43 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 16 Mar 2023 09:48:43 -0700 Subject: [ironic] [infra] Cleaning up old IPA images from tarballs.o.o In-Reply-To: <519d69da-a1fb-4db9-8594-a1417bbe2eac@app.fastmail.com> References: <519d69da-a1fb-4db9-8594-a1417bbe2eac@app.fastmail.com> Message-ID: The first half of the posted list are the ramdisk artifacts corresponding to now-retired bugfix branches. I could see an argument being made that we should continue providing those deliverables, as we do on PyPI -- I am OK with deleting them even with that in mind, as they contain massively out of date software (beyond IPA) that is likely unfit for running on production servers anymore. These are potential targets for movement to a deprecated location instead of deletion, if we feel we should continue providing them. The second half of the list are extra dangerous; they are advertised as builds from "master" branch, but they are very old and out of date due to us no longer creating images for those distributions or using those tools anymore. The CoreOS images mentioned are from 2016, to put this in perspective. These should be deleted IMO, even if we find a softer way for the retired bugfix branch ramdisks. To be frank, if someone *is* consuming these old images, and deleting them forced them to make a support contact with upstream, it'd likely end up with a better solution for them overall than running years-old software on their bare metal. -- Jay Faulkner Ironic PTL TC Member On Thu, Mar 16, 2023 at 9:32?AM Clark Boylan wrote: > On Thu, Mar 16, 2023, at 8:40 AM, Jay Faulkner wrote: > > It's good that we get these out of the way of folks looking for modern > > images. I'm on board. > > I'm not familiar enough with IPA to know the answers to these questions, > but I think they play an important role in the decision making here. > > Can a user use the latest version of IPA with an old deployment of Ironic? > If so why do we bother to publish a bunch of version and distro specific > images? You should be able to keep an up to date image published that users > find and use? > > If the versions do matter then you should be very careful to avoid > deleting images that a user may need to run with their version of Ironic. > > > > > Thanks, > > Jay Faulkner > > Ironic PTL > > TC Member > > > > On Thu, Mar 16, 2023 at 5:29?AM Dmitry Tantsur > wrote: > >> Hi all, > >> > >> I would like to do a clean-up of old or wrongly created IPA images on > >> the tarballs site. Before I do that, could you please check the proposed > >> list to make sure we don't remove something we expect to be used? > >> > >> The list is https://paste.opendev.org/show/btDps0HFoYG9TKMmv1LB/ > >> > >> Dmitry > >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thremes172 at gmail.com Thu Mar 16 17:05:23 2023 From: thremes172 at gmail.com (kaqiu pi) Date: Fri, 17 Mar 2023 01:05:23 +0800 Subject: Fwd: [kolla-ansible] Whether the cluster of two control nodes is available In-Reply-To: References: Message-ID: Hi? I'm a newer in kolla-ansibe. And I could deploy a cluster of two controll nodes by kolla-ansible. But I don't konw whether the cluster is anvailable? I would like to ask, when the number of control nodes is 2, the status of mariadb and rabbitmq clusters, are they safe and available for production? Thanks for any guidance. Good Luck -------------- next part -------------- An HTML attachment was scrubbed... 
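The replies that follow spell out why two control nodes give no fault tolerance for quorum-based services such as MariaDB/Galera and RabbitMQ. The snippet below is only the usual majority-quorum arithmetic, not tied to any particular Galera or RabbitMQ setting:

def quorum(n):
    # majority rule: strictly more than half of the members must be reachable
    return n // 2 + 1

def tolerated_failures(n):
    return n - quorum(n)

for n in (1, 2, 3, 5):
    print('%d node(s): quorum %d, tolerates %d failure(s)'
          % (n, quorum(n), tolerated_failures(n)))

With two members the quorum is two, so losing either member (or splitting the network between them) stops the cluster, which is why the replies recommend one, three or more controllers.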
URL: From Danny.Webb at thehutgroup.com Thu Mar 16 18:05:48 2023 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Thu, 16 Mar 2023 18:05:48 +0000 Subject: [kolla-ansible] Whether the cluster of two control nodes is available In-Reply-To: References: Message-ID: You can't do 2 control nodes with kolla as you need an odd number of mariadb nodes for quorum purposes (1 or 3 or more). ________________________________ From: kaqiu pi Sent: 16 March 2023 17:05 To: openstack-discuss at lists.openstack.org Subject: Fwd: [kolla-ansible] Whether the cluster of two control nodes is available CAUTION: This email originates from outside THG ________________________________ Hi? I'm a newer in kolla-ansibe. And I could deploy a cluster of two controll nodes by kolla-ansible. But I don't konw whether the cluster is anvailable? I would like to ask, when the number of control nodes is 2, the status of mariadb and rabbitmq clusters, are they safe and available for production? Thanks for any guidance. Good Luck Danny Webb Principal OpenStack Engineer Danny.Webb at thehutgroup.com [THG Ingenuity Logo] [https://i.imgur.com/wbpVRW6.png] [https://i.imgur.com/c3040tr.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Mar 16 16:56:26 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 16 Mar 2023 17:56:26 +0100 Subject: Neutron In-Reply-To: <661430E8-438A-4619-AA52-E7FD09DA5966@hxcore.ol> References: <661430E8-438A-4619-AA52-E7FD09DA5966@hxcore.ol> Message-ID: Hello Shrey: First of all, let me say that the ML2 Linux Bridge mechanism driver is now considered as "experimental support". That means we no longer have active developers working on this driver and we always recommend using others like ML2/OVS or ML2/OVN (or ML2/SR-IOV in case you have the needed hardware). Let me also point you to launchpad [1] that is the place to report a defect like this one. Please open a bug in this link. In order to debug and try to reproduce this issue, can you please print the values you are passing in [2] (the name, the namespace name and kwargs)? Thanks! [1]https://bugs.launchpad.net/neutron/ [2] https://github.com/openstack/neutron/blob/85b82d4452ed3199c7f1f7c2455d2a75faaa2991/neutron/agent/linux/ip_lib.py#L321 On Thu, Mar 16, 2023 at 2:19?PM Shrey Chauhan wrote: > Hi, > > I am running an openstack xena environment > > > > My neutron version > > dnf list installed | grep neutron > > > > > > > > > *openstack-neutron.noarch 1:19.4.0-2.el8 > @ecnlocalrepoopenstack-neutron-common.noarch 1:19.4.0-2.el8 > @ecnlocalrepoopenstack-neutron-linuxbridge.noarch 1:19.4.0-2.el8 > @ecnlocalrepoopenstack-neutron-ml2.noarch 1:19.4.0-2.el8 > @ecnlocalrepopython3-neutron.noarch 1:19.4.0-2.el8 > @ecnlocalrepopython3-neutron-lib.noarch 2.15.2-1.el8 > @ecnlocalrepopython3-neutronclient.noarch 7.6.0-1.el8 @ecnlocalrepo* > > > I observed when I create a vm, sometimes the vm was not getting the right > dhcp ip which was getting assigned from openstack: > I looked inside the vm dhcp call was just timing out, I am still not able > to figure out what is wrong with the setup. > One thing that we have noticed the Linux bridge logs and are just filled > with these errors, the whole log file is just filled with these errors: > 2023-03-16 07:38:29.182 118098 INFO > neutron.plugins.ml2.drivers.agent._common_agent > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Port tapc0ef9dda-42 > updated. 
Details: {'device': 'tapc0ef9dda-42', 'network_id': > 'e67e45d1-cf29-416b-89ef-353db4ef3586', 'port_id': > 'c0ef9dda-4278-4e97-a014-09bc125bb57d', 'mac_address': 'fa:16:3e:32:f4:94', > 'admin_state_up': True, 'network_type': 'vxlan', 'segmentation_id': 1908, > 'physical_network': None, 'mtu': 1450, 'fixed_ips': [{'subnet_id': > '67d6e448-0976-4fc8-a54e-7df40d0a438d', 'ip_address': '169.254.195.108'}], > 'device_owner': 'network:router_ha_interface', 'allowed_address_pairs': [], > 'port_security_enabled': False, 'qos_policy_id': None, > 'network_qos_policy_id': None, 'profile': {}, 'propagate_uplink_status': > False} > > 2023-03-16 07:38:29.191 118098 INFO > neutron.plugins.ml2.drivers.linuxbridge.agent.arp_protect > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Skipping ARP spoofing > rules for port 'tapc0ef9dda-42' because it has port security disabled > > 2023-03-16 07:38:29.239 118393 ERROR pr2modules.netlink [-] File > "/usr/lib64/python3.6/threading.py", line 905, in _bootstrap > > self._bootstrap_inner() > > File "/usr/lib64/python3.6/threading.py", line 937, in _bootstrap_inner > > self.run() > > File "/usr/lib64/python3.6/threading.py", line 885, in run > > self._target(*self._args, **self._kwargs) > > File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 69, in > _worker > > work_item.run() > > File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run > > result = self.fn(*self.args, **self.kwargs) > > File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line > 477, in _process_cmd > > ret = func(*f_args, **f_kwargs) > > File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", > line 274, in _wrap > > return func(*args, **kwargs) > > File > "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", > line 317, in create_interface > > return ip.link("add", ifname=ifname, kind=kind, **kwargs) > > File "/usr/lib/python3.6/site-packages/pr2modules/iproute/linux.py", > line 1461, in link > > msg_flags=msg_flags) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/nlsocket.py", > line 397, in nlm_request > > return tuple(self._genlm_request(*argv, **kwarg)) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/nlsocket.py", > line 888, in nlm_request > > self.put(msg, msg_type, msg_flags, msg_seq=msg_seq) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/nlsocket.py", > line 636, in put > > self.sendto_gate(msg, addr) > > File > "/usr/lib/python3.6/site-packages/pr2modules/netlink/rtnl/iprsocket.py", > line 61, in _gate_linux > > msg.encode() > > File > "/usr/lib/python3.6/site-packages/pr2modules/netlink/rtnl/ifinfmsg/__init__.py", > line 511, in encode > > return super(ifinfbase, self).encode() > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1062, in encode > > offset = self.encode_nlas(offset) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1323, in encode_nlas > > nla_instance.encode() > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1062, in encode > > offset = self.encode_nlas(offset) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1323, in encode_nlas > > nla_instance.encode() > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1062, in encode > > offset = self.encode_nlas(offset) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1323, in encode_nlas > > nla_instance.encode() > 
> File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1059, in encode > > offset, diff = self.ft_encode(offset) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1493, in ft_encode > > log.error(''.join(traceback.format_stack())) > > > > 2023-03-16 07:38:29.239 118393 ERROR pr2modules.netlink [-] Traceback > (most recent call last): > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1491, in ft_encode > > struct.pack_into(efmt, self.data, offset, value) > > struct.error: required argument is not an integer > > > > 2023-03-16 07:38:29.239 118393 ERROR pr2modules.netlink [-] error pack: B > b'inherit' > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Error in agent loop. > Devices info: {'current': {'tapc0ef9dda-42'}, 'timestamps': > {'tapc0ef9dda-42': 27}, 'added': {'tapc0ef9dda-42'}, 'removed': set(), > 'updated': set()}: struct.error: required argument is not an integer > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent Traceback (most recent call > last): > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", > line 465, in daemon_loop > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent sync = > self.process_network_devices(device_info) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in > wrapper > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent result = f(*args, > **kwargs) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", > line 214, in process_network_devices > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent resync_a = > self.treat_devices_added_updated(devices_added_updated) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in > wrapper > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent result = f(*args, > **kwargs) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", > line 231, in treat_devices_added_updated > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent > self._process_device_if_exists(device_details) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", > line 258, in _process_device_if_exists > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent device, > device_details['device_owner']) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 585, in plug_interface > > 2023-03-16 07:38:29.242 118098 ERROR > 
neutron.plugins.ml2.drivers.agent._common_agent network_segment.mtu) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 520, in add_tap_interface > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent return False > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in > __exit__ > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent self.force_reraise() > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in > force_reraise > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent raise self.value > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 512, in add_tap_interface > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent tap_device_name, > device_owner, mtu) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 545, in _add_tap_interface > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent mtu): > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 484, in ensure_physical_in_bridge > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent return > self.ensure_vxlan_bridge(network_id, segmentation_id, mtu) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 259, in ensure_vxlan_bridge > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent interface = > self.ensure_vxlan(segmentation_id, mtu) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 356, in ensure_vxlan > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent self.local_int, **args) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 322, > in add_vxlan > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent > privileged.create_interface(name, self.namespace, "vxlan", **kwargs) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 272, > in _wrap > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent r_call_timeout) > > 2023-03-16 07:38:29.242 118098 ERROR > 
neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 216, in > remote_call > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent raise > exc_type(*result[2]) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent struct.error: required > argument is not an integer > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent > > 2023-03-16 07:38:30.849 118098 INFO > neutron.plugins.ml2.drivers.agent._common_agent > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Linux bridge agent > Agent out of sync with plugin! > > 2023-03-16 07:38:30.850 118098 INFO neutron.agent.securitygroups_rpc > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Preparing filters for > devices {'tapc0ef9dda-42'} > > > > I have been struggling with these in our environment, any suggestion what > could be the reason behind this? > What is wrong in my setup here? > Thanks in advance for any help > > > > Sent from Mail for > Windows > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toheeb.olawale.to23 at gmail.com Thu Mar 16 19:10:54 2023 From: toheeb.olawale.to23 at gmail.com (Toheeb Oyekola) Date: Thu, 16 Mar 2023 20:10:54 +0100 Subject: [outreachy][cinder] Running test in dev Enviornment Message-ID: Hi everyone, I am having some error when i run "tox -e py3", here is a linke to the error https://paste.openstack.org/show/b5SAyZ35FdDuK2sRwTxD/. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Thu Mar 16 20:06:36 2023 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 16 Mar 2023 13:06:36 -0700 Subject: [outreachy][cinder] Running test in dev Enviornment In-Reply-To: References: Message-ID: <2b73d833-e1b9-46a0-a7fb-f0fd956dec4b@app.fastmail.com> On Thu, Mar 16, 2023, at 12:10 PM, Toheeb Oyekola wrote: > Hi everyone, I am having some error when i run "tox -e py3", here is a > linke to the error > https://paste.openstack.org/show/b5SAyZ35FdDuK2sRwTxD/. The paste indicates "Error: pg_config executable not found." which appears to be necessary to install the psycopg2 PostgreSQL python library. Cinder's bindep.txt file [0] captures the system level dependencies for PostgreSQL which i expect cover this. You can use the bindep tool to ensure you've got all the necessary system libs installed. However, I note in your paste that you have windows filesystem paths and I'm not sure if bindep will run properly in that environment. We'd be happy to hear if it works or not as that is good info to have. [0] https://opendev.org/openstack/cinder/src/branch/master/bindep.txt#L28-L31 From toheeb.olawale.to23 at gmail.com Thu Mar 16 20:51:32 2023 From: toheeb.olawale.to23 at gmail.com (Toheeb Oyekola) Date: Thu, 16 Mar 2023 21:51:32 +0100 Subject: [outreachy][cinder] Running test in dev Enviornment In-Reply-To: <2b73d833-e1b9-46a0-a7fb-f0fd956dec4b@app.fastmail.com> References: <2b73d833-e1b9-46a0-a7fb-f0fd956dec4b@app.fastmail.com> Message-ID: Thanks, I'll check it out now. On Thu, Mar 16, 2023 at 9:07?PM Clark Boylan wrote: > On Thu, Mar 16, 2023, at 12:10 PM, Toheeb Oyekola wrote: > > Hi everyone, I am having some error when i run "tox -e py3", here is a > > linke to the error > > https://paste.openstack.org/show/b5SAyZ35FdDuK2sRwTxD/. > > The paste indicates "Error: pg_config executable not found." 
which appears > to be necessary to install the psycopg2 PostgreSQL python library. Cinder's > bindep.txt file [0] captures the system level dependencies for PostgreSQL > which i expect cover this. You can use the bindep tool to ensure you've got > all the necessary system libs installed. However, I note in your paste that > you have windows filesystem paths and I'm not sure if bindep will run > properly in that environment. We'd be happy to hear if it works or not as > that is good info to have. > > [0] > https://opendev.org/openstack/cinder/src/branch/master/bindep.txt#L28-L31 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 16 22:22:51 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 16 Mar 2023 15:22:51 -0700 Subject: [ptls] PyPI maintainer cleanup - Action needed: Contact extra maintainers Message-ID: Hi PTLs, The TC recently voted[1] to require humans be removed from PyPI access for OpenStack-managed projects. This helps ensure all releases are created via releases team tooling and makes it less likely for a user account compromise to impact OpenStack packages. Many projects have already updated https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L33 with a list of packages that contain extra maintainers. We'd like to request that PTLs, or their designate, reach out to any extra maintainers listed for projects you are responsible for and request they remove their access in accordance with policy. An example email, and detailed steps to follow have been provided at https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup-email-template . Thank you for your cooperation as we work to improve our security posture and harden against supply chain attacks. Thank you, Jay Faulkner TC Vice-Chair 1: https://opendev.org/openstack/governance/commit/979e339f899ef62d2a6871a99c99537744c5808d -------------- next part -------------- An HTML attachment was scrubbed... URL: From alsotoes at gmail.com Thu Mar 16 23:18:51 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Thu, 16 Mar 2023 17:18:51 -0600 Subject: [kolla-ansible] Whether the cluster of two control nodes is available In-Reply-To: References: Message-ID: Just to complement Danny's comment: this applies to any kind of distributed cluster; if you have an even number of members or only one member in a cluster that requires a quorum to work, you will be prone to a split-brain situation. So it's not safe for production. https://en.wikipedia.org/wiki/Split-brain_(computing) Cheers! On Thu, Mar 16, 2023 at 12:12?PM Danny Webb wrote: > You can't do 2 control nodes with kolla as you need an odd number of > mariadb nodes for quorum purposes (1 or 3 or more). > ------------------------------ > *From:* kaqiu pi > *Sent:* 16 March 2023 17:05 > *To:* openstack-discuss at lists.openstack.org < > openstack-discuss at lists.openstack.org> > *Subject:* Fwd: [kolla-ansible] Whether the cluster of two control nodes > is available > > > * CAUTION: This email originates from outside THG * > ------------------------------ > Hi? > > I'm a newer in kolla-ansibe. And I could deploy a cluster of two controll > nodes by kolla-ansible. But I don't konw whether the cluster is anvailable? > > I would like to ask, when the number of control nodes is 2, the status of > mariadb and rabbitmq clusters, are they safe and available for production? > > Thanks for any guidance. 
> Good Luck > > *Danny Webb* > Principal OpenStack Engineer > Danny.Webb at thehutgroup.com > [image: THG Ingenuity Logo] > > > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Fri Mar 17 00:22:26 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 17 Mar 2023 07:22:26 +0700 Subject: [kolla-ansible] migrate from OVS to OVN Message-ID: Hello guys. Can we use kolla ansible to migrate from OVS to OVN? If then will it have downtime or impacts? Thank you much, Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Fri Mar 17 00:55:11 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 17 Mar 2023 07:55:11 +0700 Subject: [magnum][kolla ansible]Ask about use higher magnum image on previous opentack version Message-ID: Hello guys. I use Openstack Xena by using Kolla Ansible tool. due to some reason, I want to use Zed Magnum on my current system(Xena). Can I do this task by rebuilding Magnum images from code then retagging and replacing the Magnum container with new images? Any experience for this? Thank you very much. Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Fri Mar 17 04:50:38 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 17 Mar 2023 04:50:38 +0000 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: IMHO, 0.001% of the time someone might be running rebuild to do something that?s to fix an issue in metadata or something (and probably an operator too) and 99.999% of the time it?s a user expecting a fresh VM Get Outlook for iOS ________________________________ From: Sylvain Bauza Sent: Thursday, March 16, 2023 10:21:14 AM To: Dmitriy Rabotyagov Cc: openstack-discuss Subject: Re: [nova][cinder] future of rebuild without reimaging Le jeu. 16 mars 2023 ? 13:38, Dmitriy Rabotyagov > a ?crit : > Maybe I'm missing something, but what are the reasons you would want to > rebuild an instance without ... rebuilding it? I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own. We already have an existing API method for this, which is 'cold-migrate' (and it does the same that resize, without changing the flavor) ??, 15 ???. 2023??. ? 19:57, Dan Smith >: > > > We have users who use 'rebuild' on volume booted servers before nova > > microversion 2.93, relying on the behavior that it keeps the volume as > > is. And they would like to keep doing this even after the openstack > > distro moves to a(n at least) zed base (sometime in the future). > > Maybe I'm missing something, but what are the reasons you would want to > rebuild an instance without ... rebuilding it? > > I assume it's because you want to redefine the metadata or name or > something. 
There's a reason why those things are not easily mutable > today, and why we had a lot of discussion on how to make user metadata > mutable on an existing instance in the last cycle. However, I would > really suggest that we not override "recreate the thing" to "maybe > recreate the thing or just update a few fields". Instead, for things we > think really should be mutable on a server at runtime, we should > probably just do that. > > Imagine if the way you changed permissions recursively was to run 'rm > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > that is (IMHO) what "recreate but don't just change $name" means to a > user. > > > As a naive user, it seems to me both behaviors make sense. I can > > easily imagine use cases for rebuild with and without reimaging. > > I think that's because you're already familiar with the difference. For > users not already in that mindset, I think it probably seems very weird > that rebuild is destructive in one case and not the other. > > > Then there are a few hypothetical situations like: > > a) Rebuild gets a new api feature (in a new microversion) which can > > never be combined with the do-not-reimage behavior. > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > This again can never be combined with the old behavior. > > > > What do you think, are these concerns purely theoretical or real? > > If we would like to keep having rebuild without reimaging, can we rely > > on the old microversion indefinitely? > > Alternatively shall we propose and implement a nova spec to explicitly > > expose the choice in the rebuild api (just to express the idea: osc > > server rebuild --reimage|--no-reimage)? > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > I really want to know what the use-case is for "rebuild but not > really". And also what "rebuild" means to a user if --no-reimage is > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > "This operation recreates the root disk of the server." > > That was a lie for volume-backed instances for technical reasons. It was > a bug, not a feature. > > I also strongly believe that if we're going to add a "but not > really" flag, it needs to apply to volume-backed and regular instances > identically. Because that's what the change here was doing - unifying > the behavior for a single API operation. Going the other direction does > not seem useful to me. > > --Dan > > [0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 17 07:31:35 2023 From: mnasiadka at gmail.com (=?UTF-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 17 Mar 2023 08:31:35 +0100 Subject: [kolla-ansible] migrate from OVS to OVN In-Reply-To: References: Message-ID: Hello, Kolla-Ansible does not support migration from OVS to OVN yet. Best regards, Michal W dniu pt., 17.03.2023 o 01:26 Nguy?n H?u Kh?i napisa?(a): > Hello guys. > Can we use kolla ansible to migrate from OVS to OVN? If then will it have > downtime or impacts? > Thank you much, > Nguyen Huu Khoi > -- Micha? Nasiadka mnasiadka at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nguyenhuukhoinw at gmail.com Fri Mar 17 07:35:18 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 17 Mar 2023 14:35:18 +0700 Subject: [kolla-ansible] migrate from OVS to OVN In-Reply-To: References: Message-ID: Thank you for your information. Will we do it in the future? Nguyen Huu Khoi On Fri, Mar 17, 2023 at 2:31?PM Micha? Nasiadka wrote: > Hello, > > Kolla-Ansible does not support migration from OVS to OVN yet. > > Best regards, > Michal > > W dniu pt., 17.03.2023 o 01:26 Nguy?n H?u Kh?i > napisa?(a): > >> Hello guys. >> Can we use kolla ansible to migrate from OVS to OVN? If then will it have >> downtime or impacts? >> Thank you much, >> Nguyen Huu Khoi >> > -- > Micha? Nasiadka > mnasiadka at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 17 07:51:38 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 17 Mar 2023 08:51:38 +0100 Subject: [kolla-ansible] migrate from OVS to OVN In-Reply-To: References: Message-ID: <5898EAF6-16E1-4BB8-8F57-070E84D6431F@gmail.com> No contributors have mentioned that they want to contribute this feature for now, but I?ll add this topic for the upcoming PTG. Best regards, Michal > On 17 Mar 2023, at 08:35, Nguy?n H?u Kh?i wrote: > > Thank you for your information. > Will we do it in the future? > Nguyen Huu Khoi > > > On Fri, Mar 17, 2023 at 2:31?PM Micha? Nasiadka > wrote: >> Hello, >> >> Kolla-Ansible does not support migration from OVS to OVN yet. >> >> Best regards, >> Michal >> >> W dniu pt., 17.03.2023 o 01:26 Nguy?n H?u Kh?i > napisa?(a): >>> Hello guys. >>> Can we use kolla ansible to migrate from OVS to OVN? If then will it have downtime or impacts? >>> Thank you much, >>> Nguyen Huu Khoi >> -- >> Micha? Nasiadka >> mnasiadka at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Fri Mar 17 08:04:53 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Fri, 17 Mar 2023 09:04:53 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Just in case I wasn't saying anything about how legit or widespread this use case is, I was just providing an example of how rebuild without real rebuild could be leveraged by operators. Regarding cold migrate, I'd love to have then another policy, like os_compute_api:os-migrate-server:migrate-specify-host or smth, so that non-admins could not pick any arbitrary compute and had to rely on scheduler only. ??, 17 ???. 2023 ?., 05:50 Mohammed Naser : > IMHO, 0.001% of the time someone might be running rebuild to do something > that?s to fix an issue in metadata or something (and probably an operator > too) and 99.999% of the time it?s a user expecting a fresh VM > > Get Outlook for iOS > ------------------------------ > *From:* Sylvain Bauza > *Sent:* Thursday, March 16, 2023 10:21:14 AM > *To:* Dmitriy Rabotyagov > *Cc:* openstack-discuss > *Subject:* Re: [nova][cinder] future of rebuild without reimaging > > > > Le jeu. 16 mars 2023 ? 13:38, Dmitriy Rabotyagov > a ?crit : > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > I think it might be the case of rescheduling the VM to other compute > without a long-lasting shelve/unshelve and when you don't need to > change the flavor. 
So kind of self-service when the user does detect > some weirdness, but before bothering the tech team will attempt to > reschedule to another compute on their own. > > > We already have an existing API method for this, which is 'cold-migrate' > (and it does the same that resize, without changing the flavor) > > > ??, 15 ???. 2023??. ? 19:57, Dan Smith : > > > > > We have users who use 'rebuild' on volume booted servers before nova > > > microversion 2.93, relying on the behavior that it keeps the volume as > > > is. And they would like to keep doing this even after the openstack > > > distro moves to a(n at least) zed base (sometime in the future). > > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > > > I assume it's because you want to redefine the metadata or name or > > something. There's a reason why those things are not easily mutable > > today, and why we had a lot of discussion on how to make user metadata > > mutable on an existing instance in the last cycle. However, I would > > really suggest that we not override "recreate the thing" to "maybe > > recreate the thing or just update a few fields". Instead, for things we > > think really should be mutable on a server at runtime, we should > > probably just do that. > > > > Imagine if the way you changed permissions recursively was to run 'rm > > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > > that is (IMHO) what "recreate but don't just change $name" means to a > > user. > > > > > As a naive user, it seems to me both behaviors make sense. I can > > > easily imagine use cases for rebuild with and without reimaging. > > > > I think that's because you're already familiar with the difference. For > > users not already in that mindset, I think it probably seems very weird > > that rebuild is destructive in one case and not the other. > > > > > Then there are a few hypothetical situations like: > > > a) Rebuild gets a new api feature (in a new microversion) which can > > > never be combined with the do-not-reimage behavior. > > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > > This again can never be combined with the old behavior. > > > > > > What do you think, are these concerns purely theoretical or real? > > > If we would like to keep having rebuild without reimaging, can we rely > > > on the old microversion indefinitely? > > > Alternatively shall we propose and implement a nova spec to explicitly > > > expose the choice in the rebuild api (just to express the idea: osc > > > server rebuild --reimage|--no-reimage)? > > > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > > > I really want to know what the use-case is for "rebuild but not > > really". And also what "rebuild" means to a user if --no-reimage is > > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > > > "This operation recreates the root disk of the server." > > > > That was a lie for volume-backed instances for technical reasons. It was > > a bug, not a feature. > > > > I also strongly believe that if we're going to add a "but not > > really" flag, it needs to apply to volume-backed and regular instances > > identically. Because that's what the change here was doing - unifying > > the behavior for a single API operation. Going the other direction does > > not seem useful to me. 
> > > > --Dan > > > > [0] > https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Fri Mar 17 08:05:26 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 17 Mar 2023 15:05:26 +0700 Subject: [kolla-ansible] migrate from OVS to OVN In-Reply-To: <5898EAF6-16E1-4BB8-8F57-070E84D6431F@gmail.com> References: <5898EAF6-16E1-4BB8-8F57-070E84D6431F@gmail.com> Message-ID: Thank you very much. :) Nguyen Huu Khoi On Fri, Mar 17, 2023 at 2:51?PM Micha? Nasiadka wrote: > No contributors have mentioned that they want to contribute this feature > for now, but I?ll add this topic for the upcoming PTG. > > Best regards, > Michal > > On 17 Mar 2023, at 08:35, Nguy?n H?u Kh?i > wrote: > > Thank you for your information. > Will we do it in the future? > Nguyen Huu Khoi > > > On Fri, Mar 17, 2023 at 2:31?PM Micha? Nasiadka > wrote: > >> Hello, >> >> Kolla-Ansible does not support migration from OVS to OVN yet. >> >> Best regards, >> Michal >> >> W dniu pt., 17.03.2023 o 01:26 Nguy?n H?u Kh?i >> napisa?(a): >> >>> Hello guys. >>> Can we use kolla ansible to migrate from OVS to OVN? If then will it >>> have downtime or impacts? >>> Thank you much, >>> Nguyen Huu Khoi >>> >> -- >> Micha? Nasiadka >> mnasiadka at gmail.com >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Fri Mar 17 08:52:55 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 17 Mar 2023 09:52:55 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Le ven. 17 mars 2023 ? 09:10, Dmitriy Rabotyagov a ?crit : > Just in case I wasn't saying anything about how legit or widespread this > use case is, I was just providing an example of how rebuild without real > rebuild could be leveraged by operators. > > Regarding cold migrate, I'd love to have then another policy, like os_compute_api:os-migrate-server:migrate-specify-host > or smth, so that non-admins could not pick any arbitrary compute and had > to rely on scheduler only. > > Ah, I see your point, I'll add it for the vPTG agenda. -Sylvain ??, 17 ???. 2023 ?., 05:50 Mohammed Naser : > IMHO, 0.001% of the time someone might be running rebuild to do something > that?s to fix an issue in metadata or something (and probably an operator > too) and 99.999% of the time it?s a user expecting a fresh VM > > Get Outlook for iOS > ------------------------------ > *From:* Sylvain Bauza > *Sent:* Thursday, March 16, 2023 10:21:14 AM > *To:* Dmitriy Rabotyagov > *Cc:* openstack-discuss > *Subject:* Re: [nova][cinder] future of rebuild without reimaging > > > > Le jeu. 16 mars 2023 ? 13:38, Dmitriy Rabotyagov > a ?crit : > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > I think it might be the case of rescheduling the VM to other compute > without a long-lasting shelve/unshelve and when you don't need to > change the flavor. So kind of self-service when the user does detect > some weirdness, but before bothering the tech team will attempt to > reschedule to another compute on their own. > > > We already have an existing API method for this, which is 'cold-migrate' > (and it does the same that resize, without changing the flavor) > > > ??, 15 ???. 2023??. ? 
19:57, Dan Smith : > > > > > We have users who use 'rebuild' on volume booted servers before nova > > > microversion 2.93, relying on the behavior that it keeps the volume as > > > is. And they would like to keep doing this even after the openstack > > > distro moves to a(n at least) zed base (sometime in the future). > > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > > > I assume it's because you want to redefine the metadata or name or > > something. There's a reason why those things are not easily mutable > > today, and why we had a lot of discussion on how to make user metadata > > mutable on an existing instance in the last cycle. However, I would > > really suggest that we not override "recreate the thing" to "maybe > > recreate the thing or just update a few fields". Instead, for things we > > think really should be mutable on a server at runtime, we should > > probably just do that. > > > > Imagine if the way you changed permissions recursively was to run 'rm > > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > > that is (IMHO) what "recreate but don't just change $name" means to a > > user. > > > > > As a naive user, it seems to me both behaviors make sense. I can > > > easily imagine use cases for rebuild with and without reimaging. > > > > I think that's because you're already familiar with the difference. For > > users not already in that mindset, I think it probably seems very weird > > that rebuild is destructive in one case and not the other. > > > > > Then there are a few hypothetical situations like: > > > a) Rebuild gets a new api feature (in a new microversion) which can > > > never be combined with the do-not-reimage behavior. > > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > > This again can never be combined with the old behavior. > > > > > > What do you think, are these concerns purely theoretical or real? > > > If we would like to keep having rebuild without reimaging, can we rely > > > on the old microversion indefinitely? > > > Alternatively shall we propose and implement a nova spec to explicitly > > > expose the choice in the rebuild api (just to express the idea: osc > > > server rebuild --reimage|--no-reimage)? > > > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > > > I really want to know what the use-case is for "rebuild but not > > really". And also what "rebuild" means to a user if --no-reimage is > > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > > > "This operation recreates the root disk of the server." > > > > That was a lie for volume-backed instances for technical reasons. It was > > a bug, not a feature. > > > > I also strongly believe that if we're going to add a "but not > > really" flag, it needs to apply to volume-backed and regular instances > > identically. Because that's what the change here was doing - unifying > > the behavior for a single API operation. Going the other direction does > > not seem useful to me. > > > > --Dan > > > > [0] > https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
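To make the cold-migrate suggestion above concrete, here is a minimal sketch of driving it through openstacksdk. The cloud name and server id are placeholders, no target host is passed (so the scheduler picks one), and whether non-admin users may call it at all depends on the os_compute_api:os-migrate-server policy in a given deployment:

```python
# Sketch only: cold-migrate a server without reimaging it, via openstacksdk.
# "mycloud" and SERVER_UUID are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")
server = conn.compute.get_server("SERVER_UUID")

# Cold migrate: same flavor, root disk preserved, scheduler chooses the host.
conn.compute.migrate_server(server)

# Like a resize, a cold migration ends up in VERIFY_RESIZE and must be confirmed.
server = conn.compute.wait_for_server(server, status="VERIFY_RESIZE", wait=600)
conn.compute.confirm_server_resize(server)
```

Rebuild, by contrast, is the call that recreates the root disk, which is the distinction being discussed above.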
URL: From hiromu.asahina.az at hco.ntt.co.jp Fri Mar 17 09:57:35 2023 From: hiromu.asahina.az at hco.ntt.co.jp (Hiromu Asahina) Date: Fri, 17 Mar 2023 18:57:35 +0900 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> Message-ID: <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Thank you for your reply. I'd like to decide the time slot for this topic. I just checked PTG schedule [1]. We have the following time slots. Which one is convenient to gether? (I didn't get reply but I listed Barbican, as its cores are almost the same as Keystone) Mon, 27: - 14 (keystone) - 15 (keystone) Tue, 28 - 13 (barbican) - 14 (keystone, ironic) - 15 (keysonte, ironic) - 16 (ironic) Wed, 29 - 13 (ironic) - 14 (keystone, ironic) - 15 (keystone, ironic) - 21 (ironic) Thanks, [1] https://ptg.opendev.org/ptg.html Hiromu Asahina On 2023/02/11 1:41, Jay Faulkner wrote: > I think it's safe to say the Ironic community would be very invested in > such an effort. Let's make sure the time chosen for vPTG with this is such > that Ironic contributors can attend as well. > > Thanks, > Jay Faulkner > Ironic PTL > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > >> Hello Everyone, >> >> Recently, Tacker and Keystone have been working together on a new Keystone >> Middleware that can work with external authentication >> services, such as Keycloak. The code has already been submitted [1], but >> we want to make this middleware a generic plugin that works >> with as many OpenStack services as possible. To that end, we would like to >> hear from other projects with similar use cases >> (especially Ironic and Barbican, which run as standalone services). We >> will make a time slot to discuss this topic at the next vPTG. >> Please contact me if you are interested and available to participate. >> >> [1] https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >> >> -- >> Hiromu Asahina >> >> >> >> > -- ?-------------------------------------? NTT Network Innovation Center Hiromu Asahina ------------------------------------- 3-9-11, Midori-cho, Musashino-shi Tokyo 180-8585, Japan ? Phone: +81-422-59-7008 ? Email: hiromu.asahina.az at hco.ntt.co.jp ?-------------------------------------? From mnasiadka at gmail.com Fri Mar 17 10:52:32 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 17 Mar 2023 11:52:32 +0100 Subject: [kolla] Bobcat vPTG slots and topics Message-ID: Hello, Koalas! Allocated slots for Kolla sessions: 27-30 March 2023: Monday - 13.00 - 17.00 UTC (general, Kolla and Kolla-Ansible) Tuesday - 13.00 - 15.00 UTC (Kolla-Ansible) Tuesday - 15.00 - 17.00 UTC (Operator Hours Kolla) Thursday - 13.00 - 15.00 UTC (Kayobe) Please look at Kolla planning etherpad [1] and fill out topic proposals. Looking forward to meeting you! [1] https://etherpad.opendev.org/p/manila-bobcat-ptg-planning [2] https://ptg.opendev.org/ptg.html Thanks, mnasiadka -------------- next part -------------- An HTML attachment was scrubbed... URL: From manchandavishal143 at gmail.com Fri Mar 17 10:59:37 2023 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Fri, 17 Mar 2023 16:29:37 +0530 Subject: [horizon] Bobcat PTG Schedule Message-ID: Hello Team, Please Find the Schedule for Horizon Bobcat PTG in the eherpad [1]. Feel Free to add the topics you want to discuss in the PTG. 
Don't forget to register for PTG, if not done yet [2]. See you at the PTG! Thanks & Regards, Vishal Manchanda (irc: vishalmanchanda) [1] https://etherpad.opendev.org/p/horizon-bobcat-ptg [2] https://openinfra-ptg.eventbrite.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Fri Mar 17 13:20:17 2023 From: mkopec at redhat.com (Martin Kopec) Date: Fri, 17 Mar 2023 14:20:17 +0100 Subject: [qa][ptg] Virtual Bobcat vPTG Planning Message-ID: Hello everyone, here is [1] our etherpad for the 2023.2 Bobcat PTG. Please, add your topics there if there is anything you would like to discuss / propose ... You can also vote for time slots for our sessions so that they fit your schedule at [2]. We will go most likely with 1-hour slot per day, as they usually fit easier into everyone's schedule. The number of slots will depend on the number of topics proposed in [1]. [1] https://etherpad.opendev.org/p/qa-bobcat-ptg [2] https://framadate.org/sLZppMVkFw2FcEhO Thanks, -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA IM: kopecmartin -------------- next part -------------- An HTML attachment was scrubbed... URL: From hberaud at redhat.com Fri Mar 17 14:19:12 2023 From: hberaud at redhat.com (Herve Beraud) Date: Fri, 17 Mar 2023 15:19:12 +0100 Subject: [release] Release countdown for week R-0, March 20-24 Message-ID: Development Focus ----------------- We will be releasing the coordinated OpenStack Antelope 2023.1 release next week, on March 22. Thanks to everyone involved in the Antelope 2023.1 cycle! We are now in pre-release freeze, so no new deliverable will be created until final release, unless a release-critical regression is spotted. Otherwise, teams attending the virtual PTG should start to plan what they will be discussing there, by creating and filling team etherpads. You can access the list of PTG etherpads at: http://ptg.openstack.org/etherpads.html General Information ------------------- On release day, the release team will produce final versions of deliverables following the cycle-with-rc release model, by re-tagging the commit used for the last RC. A patch doing just that will be proposed. PTLs and release liaisons should watch for that final release patch from the release team. While not required, we would appreciate having an ack from each team before we approve it on the 22nd, so that their approval is included in the metadata that goes onto the signed tag. Upcoming Deadlines & Dates -------------------------- Final Antelope 2023.1 release: March 22 Virtual PTG: March 27-31 -- Herv? Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Fri Mar 17 14:29:31 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Fri, 17 Mar 2023 07:29:31 -0700 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Message-ID: I'm not sure how many Ironic contributors would be the ones to attend a discussion, in part because this is disjointed from the items they need to focus on. It is much more of a "big picture" item for those of us who are leaders in the project. 
I think it would help to understand how much time you expect the discussion to take to determine a path forward and how we can collaborate. Ironic has a huge number of topics we want to discuss during the PTG, and I suspect our team meeting on Monday next week should yield more interest/awareness as well as an amount of time for each topic which will aid us in scheduling. If you can let us know how long, then I think we can figure out when the best day/time will be. Thanks! -Julia On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < hiromu.asahina.az at hco.ntt.co.jp> wrote: > Thank you for your reply. > > I'd like to decide the time slot for this topic. > I just checked PTG schedule [1]. > > We have the following time slots. Which one is convenient to gether? > (I didn't get reply but I listed Barbican, as its cores are almost the > same as Keystone) > > Mon, 27: > > - 14 (keystone) > - 15 (keystone) > > Tue, 28 > > - 13 (barbican) > - 14 (keystone, ironic) > - 15 (keysonte, ironic) > - 16 (ironic) > > Wed, 29 > > - 13 (ironic) > - 14 (keystone, ironic) > - 15 (keystone, ironic) > - 21 (ironic) > > Thanks, > > [1] https://ptg.opendev.org/ptg.html > > Hiromu Asahina > > > On 2023/02/11 1:41, Jay Faulkner wrote: > > I think it's safe to say the Ironic community would be very invested in > > such an effort. Let's make sure the time chosen for vPTG with this is > such > > that Ironic contributors can attend as well. > > > > Thanks, > > Jay Faulkner > > Ironic PTL > > > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > >> Hello Everyone, > >> > >> Recently, Tacker and Keystone have been working together on a new > Keystone > >> Middleware that can work with external authentication > >> services, such as Keycloak. The code has already been submitted [1], but > >> we want to make this middleware a generic plugin that works > >> with as many OpenStack services as possible. To that end, we would like > to > >> hear from other projects with similar use cases > >> (especially Ironic and Barbican, which run as standalone services). We > >> will make a time slot to discuss this topic at the next vPTG. > >> Please contact me if you are interested and available to participate. > >> > >> [1] https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > >> > >> -- > >> Hiromu Asahina > >> > >> > >> > >> > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ihrachys at redhat.com Fri Mar 17 15:07:44 2023 From: ihrachys at redhat.com (Ihar Hrachyshka) Date: Fri, 17 Mar 2023 11:07:44 -0400 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 Message-ID: Hi all, (I've tagged the thread with [ovn] because this question was raised in the context of OVN, but it really is about the intent of neutron stateless SG API.) 
Neutron API supports 'stateless' field for security groups: https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group The API reference doesn't explain the intent of the API, merely walking through the field mechanics, as in "The stateful security group extension (stateful-security-group) adds the stateful field to security groups, allowing users to configure stateful or stateless security groups for ports. The existing security groups will all be considered as stateful. Update of the stateful attribute is allowed when there is no port associated with the security group." The meaning of the API is left for users to deduce. It's customary understood as something like "allowing to bypass connection tracking in the firewall, potentially providing performance and simplicity benefits" (while imposing additional complexity onto rule definitions - the user now has to explicitly define rules for both directions of a duplex connection.) [This is not an official definition, nor it's quoted from a respected source, please don't criticize it. I don't think this is an important point here.] Either way, the definition doesn't explain what should happen with basic network services that a user of Neutron SG API is used to rely on. Specifically, what happens for a port related to a stateless SG when it trying to fetch metadata from 169.254.169.254 (or its IPv6 equivalent), or what happens when it attempts to use SLAAC / DHCPv6 procedure to configure its IPv6 stack. As part of our testing of stateless SG implementation for OVN backend, we've noticed that VMs fail to configure via metadata, or use SLAAC to configure IPv6. metadata: https://bugs.launchpad.net/neutron/+bug/2009053 slaac: https://bugs.launchpad.net/neutron/+bug/2006949 We've noticed that adding explicit SG rules to allow 'returning' communication for 169.254.169.254:80 and RA / NA fixes the problem. I figured that these services are "base" / "basic" and should be provided to ports regardless of the stateful-ness of SG. I proposed patches for this here: metadata series: https://review.opendev.org/q/topic:bug%252F2009053 RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 Discussion in the patch that adjusts the existing stateless SG test scenarios to not create explicit SG rules for metadata and ICMP replies suggests that it's not a given / common understanding that these "base" services should work by default for stateless SGs. See discussion in comments here: https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 While this discussion is happening in the context of OVN, I think it should be resolved in a broader context. Specifically, a decision should be made about what Neutron API "means" by stateless SGs, and how "base" services are supposed to behave. Then backends can act accordingly. There's also an open question of how this should be implemented. Whether Neutron would like to create explicit SG rules visible in API that would allow for the returning traffic and that could be deleted as needed, or whether backends should do it implicitly. We already have "default" egress rules, so there's a precedent here. On the other hand, the egress rules are broad (allowing everything) and there's more rationale to delete them and replace them with tighter filters. In my OVN series, I implement ACLs directly in OVN database, without creating SG rules in Neutron API. 
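To make the current user-visible workaround concrete, the explicit rules mentioned above look roughly like the following openstacksdk sketch. The cloud name is a placeholder, the stateful=False field assumes an SDK and Neutron recent enough to expose the stateful extension, and the exact rule set required may differ per backend and per the bugs linked above:

```python
# Sketch of the per-group workaround: explicitly allow "returning" metadata
# traffic and RA/NA into a stateless security group.
import openstack

conn = openstack.connect(cloud="mycloud")

sg = conn.network.create_security_group(name="stateless-demo", stateful=False)

# Replies from the metadata service (169.254.169.254:80) are not conntracked
# for a stateless group, so they need an explicit ingress allow.
conn.network.create_security_group_rule(
    security_group_id=sg.id, direction="ingress", ethertype="IPv4",
    protocol="tcp", remote_ip_prefix="169.254.169.254/32")

# Router advertisements (ICMPv6 type 134) and neighbor advertisements (type 136)
# so that SLAAC and address resolution keep working.
for icmp_type in (134, 136):
    conn.network.create_security_group_rule(
        security_group_id=sg.id, direction="ingress", ethertype="IPv6",
        protocol="ipv6-icmp", port_range_min=icmp_type)
```

Whether rules of this kind should stay explicit in the API or be handled implicitly by backends is exactly what the questions below are about.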
So, questions for the community to clarify: - whether Neutron API should define behavior of stateless SGs in general, - if so, whether Neutron API should also define behavior of stateless SGs in terms of "base" services like metadata and DHCP, - if so, whether backends should implement the necessary filters themselves, or Neutron will create default SG rules itself. I hope I laid the problem out clearly, let me know if anything needs clarification or explanation. Yours, Ihar From dpawlik at redhat.com Fri Mar 17 15:42:49 2023 From: dpawlik at redhat.com (Daniel Pawlik) Date: Fri, 17 Mar 2023 16:42:49 +0100 Subject: Opensearch service upgrade Message-ID: Hello, We would like to notify you that the Opensearch service [1] would be updated to a newer version on 03.04.2023 at 12:00 PM UTC. This procedure might take a while, depending of the cluster size. During that time, the Opensearch service would not be available. If anyone has any doubts, please reply to the email. Dan [1] - https://opensearch.logs.openstack.org/_dashboards/app/home -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Mar 17 21:19:22 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 17 Mar 2023 14:19:22 -0700 Subject: [ptl][tc][ops][ptg] Operator + Developers interaction (operator-hours) slots in 2023.2 Bobcat PTG Message-ID: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> Hello Everyone/PTL, To improve the interaction/feedback between operators and developers, one of the efforts is to schedule the 'operator-hour' in developers' events. We scheduled the 'operator-hour' in the last vPTG, which had mixed productivity feedback[1]. The TC discussed it and thinks we should continue the 'operator-hour' in March vPTG also. TC will not book the placeholder this time so that slots can be booked in the project room itself, and operators can join developers to have a joint discussion. But at the same time, we need to avoid slot conflict for operators. Every project needs to make sure its 'operator-hour' does not overlap with the related projects (integrated projects which might have common operators, for example. nova, cinder, neutron etc needs to avoid conflict) 'operator-hour'. Guidelines for the project team to book 'operator-hour' --------------------------------------------------------------------------------------- * Request in #openinfra-events IRC channel to register the new track 'operator-hour-'. For example, 'operator-hour-nova' * Once the track is registered, find a spot in your project slots where no other project (which you think is related/integrated project and might have common operators) has already booked their operator-hour. Accordingly, book with the newly registered track 'operator-hour-'. For example, #operator-hour-nova book essex-WedB1 . * Do not book more than one slot (1 hour) so that other projects will have enough slots open to book. If more discussion is needed on anything, it can be continued in project-specific slots. We request that every project book an 'operator hour' slot for operators to join your PTG session. For any query/conflict, ping TC in #openstack-tc or #openinfra-events IRC channel. 
[1] https://etherpad.opendev.org/p/Oct2022_PTGFeedback#L32 -gmann From jamesleong123098 at gmail.com Sat Mar 18 04:49:09 2023 From: jamesleong123098 at gmail.com (James Leong) Date: Fri, 17 Mar 2023 23:49:09 -0500 Subject: [zun] allow zun to get information from blazar database Message-ID: Hi all, I am using kolla-ansible for OpenStack deployment in the yoga version. Is It possible to allow zun to retrieve information from the blazar database in zun_api container? I have tried to include the blazar database connection information in the zun.conf file. However, when I try to use the newly added blazar information, I am getting the following error message. oslo_config.cfg.NoSuchOptError: no such option connectionTest in group [database] It seems like I have to set up some option somewhere else in the code. But I was not able to identify them. Thanks for your help James -------------- next part -------------- An HTML attachment was scrubbed... URL: From udaydikshit2007 at gmail.com Sat Mar 18 05:18:35 2023 From: udaydikshit2007 at gmail.com (Uday Dikshit) Date: Sat, 18 Mar 2023 10:48:35 +0530 Subject: Autoscaling in Kolla Ansible wallaby series Message-ID: Hello Team I am looking forward to have an autoscaling feature on Kolla Ansible wallaby series openstack. I am using Senlin to create cluster, gnocchi for metrics and aodh for alarm. However I am facing issue with aodh alarm state as it gets stuck in the state in which it is. I also found once the load is hiking, the gnocchi metrics get inconsistent. Due to this also the alarm state sticks and do not trigger alarm. To solve this, i relied upon collectd. But collectd service does not push any metric from the Hypervisor to gnocchi. My objective is that, in case of a metric reaching the threshold, alarm should automatically trigger and cluster should scale up or down based on the load. Once the scale up or down task is successfully completed, alarm should return to normal state automatically. Does anybody have any experience with this use case or wanna propose any other solution? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 19 12:23:57 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 19 Mar 2023 19:23:57 +0700 Subject: [openstack][masakari] Ask about Masakari segment Message-ID: Hello guys. I want to ask if I create two Masakari segments then instances will failover on only segment has a group of computes? Because I do test with this scenario, my instance failover on a different segment which has different compute hosts? Do I understand Masakari wrong? Thank you. Regards Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From techstep at gmail.com Sun Mar 19 15:00:30 2023 From: techstep at gmail.com (Rob Jefferson) Date: Sun, 19 Mar 2023 11:00:30 -0400 Subject: [openstack][masakari] Ask about Masakari segment In-Reply-To: References: Message-ID: On Sun, Mar 19, 2023 at 8:32?AM Nguy?n H?u Kh?i wrote: > > Hello guys. > I want to ask if I create two Masakari segments then instances will failover on only segment has a group of computes? Because I do test with this scenario, my instance failover on a different segment which has different compute hosts? Do I understand Masakari wrong? I would check which recovery method you're using. 
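One quick way to check is to ask the instance-ha (masakari) API directly, for example with openstacksdk. This is only a sketch: the cloud name is a placeholder and the attribute and method names are as I understand them from the SDK's instance_ha proxy, so double-check against your SDK version:

```python
# Sketch: list each failover segment, its recovery method, and its hosts.
# Assumes masakari endpoints are in the catalog and "mycloud" is in clouds.yaml.
import openstack

conn = openstack.connect(cloud="mycloud")

for segment in conn.instance_ha.segments():
    print(f"segment {segment.name}: recovery_method={segment.recovery_method}")
    for host in conn.instance_ha.hosts(segment.uuid):
        print(f"  host {host.name} reserved={host.reserved} "
              f"on_maintenance={host.on_maintenance}")
```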
If you have two failover segments, and you set the recovery method to `reserved_host`, the failover will happen on a node in the non-active segment. If you set the method to `rh_priority`, it will try that first, but then attempt fall back on a machine in the active segment. If you want to recover on a host in the *same* segment (possibly the same host), use `auto` or `auto_priority` as the recovery method. Rob From nguyenhuukhoinw at gmail.com Sun Mar 19 15:10:49 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 19 Mar 2023 22:10:49 +0700 Subject: [openstack][masakari] Ask about Masakari segment In-Reply-To: References: Message-ID: Hello., thanks for sharing, I use auto as recovery method but instance recovery on a different segment, I just want to separate segment by different hypervisor hosts. Nguyen Huu Khoi On Sun, Mar 19, 2023 at 10:00?PM Rob Jefferson wrote: > On Sun, Mar 19, 2023 at 8:32?AM Nguy?n H?u Kh?i > wrote: > > > > Hello guys. > > I want to ask if I create two Masakari segments then instances will > failover on only segment has a group of computes? Because I do test with > this scenario, my instance failover on a different segment which has > different compute hosts? Do I understand Masakari wrong? > > I would check which recovery method you're using. > > If you have two failover segments, and you set the recovery method to > `reserved_host`, the failover will happen on a node in the non-active > segment. If you set the method to `rh_priority`, it will try that > first, but then attempt fall back on a machine in the active segment. > > If you want to recover on a host in the *same* segment (possibly the > same host), use `auto` or `auto_priority` as the recovery method. > > Rob > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 19 15:19:08 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 19 Mar 2023 22:19:08 +0700 Subject: [openstack][masakari] Ask about Masakari segment In-Reply-To: References: Message-ID: I have read it https://docs.openstack.org/masakari/xena/install/overview.html I tested with two segments but It dont have my desired result. Segment A: compute01 compute02 compute03 Segment B: compute04 compute05 When I turn off compute01, I hope that instance will recover on compute02 or compute03 but it recovered on segment B. I feel strange. Nguyen Huu Khoi On Sun, Mar 19, 2023 at 10:10?PM Nguy?n H?u Kh?i wrote: > Hello., thanks for sharing, > I use auto as recovery method but instance recovery on a > different segment, I just want to separate segment by different hypervisor > hosts. > Nguyen Huu Khoi > > > On Sun, Mar 19, 2023 at 10:00?PM Rob Jefferson wrote: > >> On Sun, Mar 19, 2023 at 8:32?AM Nguy?n H?u Kh?i >> wrote: >> > >> > Hello guys. >> > I want to ask if I create two Masakari segments then instances will >> failover on only segment has a group of computes? Because I do test with >> this scenario, my instance failover on a different segment which has >> different compute hosts? Do I understand Masakari wrong? >> >> I would check which recovery method you're using. >> >> If you have two failover segments, and you set the recovery method to >> `reserved_host`, the failover will happen on a node in the non-active >> segment. If you set the method to `rh_priority`, it will try that >> first, but then attempt fall back on a machine in the active segment. 
>> >> If you want to recover on a host in the *same* segment (possibly the >> same host), use `auto` or `auto_priority` as the recovery method. >> >> Rob >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Sun Mar 19 16:01:43 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 20 Mar 2023 01:01:43 +0900 Subject: [storlets] Proposal to make Train/Ussuri/Victoria EOL Message-ID: Hello, Currently we have multiple stable branches open but we haven't seen any backport proposed so far. To reduce number of branches we have to maintain, I'd like to propose retiring old stable branches(train, ussuri and victoria). In case you have any concerns, please let me know. Thank you, Takashi Kajinami -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Sun Mar 19 17:36:18 2023 From: wodel.youchi at gmail.com (wodel youchi) Date: Sun, 19 Mar 2023 18:36:18 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod Message-ID: Hi, I am trying to attach a cinder volume to my pod, but it does not work. The long story, the default version of kubernetes used in Yoga is 1.23.3 fcore35. When creating a default kubernetes cluster we got : > Image: quay.io/k8scsi/csi-attacher:v2.0.0 > Image: quay.io/k8scsi/csi-provisioner:v1.4.0 > Image: quay.io/k8scsi/csi-snapshotter:v1.2.2 > Image: quay.io/k8scsi/csi-resizer:v0.3.0 > Image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.18.0 > Image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0 > Image: > docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.18.1 > Which 1 - Does not correspond to the documentation of Magnum, the documentation states these defaults for yoga : > Image: 10.0.0.165:4000/csi-attacher:v3.3.0 > Image: 10.0.0.165:4000/csi-provisioner:v3.0.0 > Image: 10.0.0.165:4000/csi-snapshotter:v4.2.1 > Image: 10.0.0.165:4000/csi-resizer:v1.3.0 > Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 > Image: 10.0.0.165:4000/csi-node-driver-registrar:v2.4.0 > Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 > (cinder-csi-plugin:v1.23.0 which does not exists anymore) > 2 - And does not work, csi-cinder-controllerplugin keeps crashing. 
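For reference, image overrides like the ones listed above are normally fed in through cluster template labels pointing at the local registry. This is only a sketch of that mechanism: the label names come from the Magnum user guide, the image and network names are placeholders, other required template fields are omitted, and the available labels can differ between releases:

```python
# Sketch: create a cluster template whose CSI images come from a local registry.
# Label names are from the Magnum user guide; values mirror the tags quoted above.
import openstack

conn = openstack.connect(cloud="mycloud")

labels = {
    "container_infra_prefix": "10.0.0.165:4000/",
    "cinder_csi_enabled": "true",
    "cinder_csi_plugin_tag": "v1.26.2",
    "csi_attacher_tag": "v3.3.0",
    "csi_provisioner_tag": "v3.0.0",
    "csi_snapshotter_tag": "v4.2.1",
    "csi_resizer_tag": "v1.3.0",
    "csi_node_driver_registrar_tag": "v2.4.0",
}

conn.container_infrastructure_management.create_cluster_template(
    name="k8s-fcos35-local-registry",   # placeholder name
    coe="kubernetes",
    image_id="fedora-coreos-35",        # placeholder glance image
    external_network_id="public",       # placeholder network
    labels=labels,
)
```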
I tried to use the updates images (using a local registry), but I couldn't attach the cinder-volume, I got : Volumes: html-volume: Type: Cinder (a Persistent Disk resource in OpenStack) VolumeID: f780cb46-ed2a-405d-b901-7201b49c3df1 FSType: ext4 ReadOnly: false SecretRef: nil kube-api-access-slqf4: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: DownwardAPI: true QoS Class: Burstable Node-Selectors: Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- * Warning FailedMount 26m (x10 over 135m) kubelet Unable to attach or mount volumes: unmounted volumes=[html-volume], unattached volumes=[kube-api-access-slqf4 html-volume]: timed out waiting for the condition Warning FailedAttachVolume 3m39s (x40 over 146m) attachdetach-controller AttachVolume.Attach failed for volume "cinder.csi.openstack.org-f780cb46-ed2a-405d-b901-7201b49c3df1" : Attach timeout for volume f780cb46-ed2a-405d-b901-7201b49c3df1 Warning FailedMount 104s (x54 over 146m) kubelet Unable to attach or mount volumes: unmounted volumes=[html-volume], unattached volumes=[html-volume kube-api-access-slqf4]: timed out waiting for the condition* "volume":{"capacity_bytes":5368709120,"volume_id":"7e377933-4ae6-47b7-a685-f484d35153af"}},{"status":{"published_node_ids":["c2531ccf-842e-44d1-85bd-72c811cea199"]},"volume":{"capacity_bytes":1073741824,"volume_id":"f9d5273b-e73d-4b37-8b50-1fcecb910b2a"}}]} I0319 12:36:50.910443 1 connection.go:201] GRPC error: I0319 12:36:56.925658 1 controller.go:210] Started VA processing "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925682 1 csi_handler.go:224] CSIHandler: processing VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925687 1 csi_handler.go:251] Attaching "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925691 1 csi_handler.go:421] Starting attach operation for "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925705 1 csi_handler.go:740] Found NodeID 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode k8intcalnewer-56bgom6jntbm-node-0 I0319 12:36:56.925828 1 csi_handler.go:312] VA finalizer added to "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925836 1 csi_handler.go:326] NodeID annotation added to "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.947632 1 connection.go:193] GRPC call: /csi.v1.Controller/ControllerPublishVolume I0319 12:36:56.947646 1 connection.go:194] GRPC request: {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"f780cb46-ed2a-405d-b901-7201b49c3df1"} I0319 12:36:58.343821 1 connection.go:200] GRPC response: {"publish_context":{"DevicePath":"/dev/vdc"}} I0319 12:36:58.343834 1 connection.go:201] GRPC error: I0319 12:36:58.343841 1 csi_handler.go:264] Attached "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.343848 1 util.go:38] Marking as attached "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" *I0319 12:36:58.348467 1 csi_handler.go:234] Error processing "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": failed to mark as 
attached: volumeattachments.storage.k8s.io "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" is forbidden: User "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot patch resource "volumeattachments/status" in API group "storage.k8s.io " at the cluster scope* I0319 12:36:58.348503 1 controller.go:210] Started VA processing "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348509 1 csi_handler.go:224] CSIHandler: processing VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348513 1 csi_handler.go:251] Attaching "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348517 1 csi_handler.go:421] Starting attach operation for "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348525 1 csi_handler.go:740] Found NodeID 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode k8intcalnewer-56bgom6jntbm-node-0 I0319 12:36:58.348540 1 csi_handler.go:304] VA finalizer is already set on "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348552 1 csi_handler.go:318] NodeID annotation is already set on "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348564 1 connection.go:193] GRPC call: /csi.v1.Controller/ControllerPublishVolume I0319 12:36:58.348567 1 connection.go:194] GRPC request: {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext: The I tried even the most updated images : > 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 > 10.0.0.165:4000/csi-provisioner:v3.4.0 > 10.0.0.165:4000/csi-resizer:v1.7.0 > 10.0.0.165:4000/csi-snapshotter:v6.2.1 > 10.0.0.165:4000/csi-attacher:v4.2.0 > 10.0.0.165:4000/csi-node-driver-registrar:v2.7.0 > I had the same problem. Then I tried to use an older version of kubernetes : 1.21.11 with the older images shown above (following this link https://www.roksblog.de/deploy-kubernetes-clusters-in-openstack-within-minutes-with-magnum/), and it worked, the cinder volume was successfully mounted inside my nginx pod. - What is the meaning of the error I am having? - Is it magnum related or kubernetes related or both? Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliver.weinmann at me.com Sun Mar 19 20:31:28 2023 From: oliver.weinmann at me.com (Oliver Weinmann) Date: Sun, 19 Mar 2023 21:31:28 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Mon Mar 20 02:34:11 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 20 Mar 2023 09:34:11 +0700 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: Hello. Are you enable enable_cluster_user_trust? Nguyen Huu Khoi On Mon, Mar 20, 2023 at 12:42?AM wodel youchi wrote: > Hi, > > I am trying to attach a cinder volume to my pod, but it does not work. > > The long story, the default version of kubernetes used in Yoga is 1.23.3 > fcore35. 
When creating a default kubernetes cluster we got : > >> Image: quay.io/k8scsi/csi-attacher:v2.0.0 >> Image: quay.io/k8scsi/csi-provisioner:v1.4.0 >> Image: quay.io/k8scsi/csi-snapshotter:v1.2.2 >> Image: quay.io/k8scsi/csi-resizer:v0.3.0 >> Image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.18.0 >> Image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0 >> Image: >> docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.18.1 >> > > Which > 1 - Does not correspond to the documentation of Magnum, the documentation > states these defaults for yoga : > >> Image: 10.0.0.165:4000/csi-attacher:v3.3.0 >> Image: 10.0.0.165:4000/csi-provisioner:v3.0.0 >> Image: 10.0.0.165:4000/csi-snapshotter:v4.2.1 >> Image: 10.0.0.165:4000/csi-resizer:v1.3.0 >> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >> Image: 10.0.0.165:4000/csi-node-driver-registrar:v2.4.0 >> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >> (cinder-csi-plugin:v1.23.0 which does not exists anymore) >> > > 2 - And does not work, csi-cinder-controllerplugin keeps crashing. > > I tried to use the updates images (using a local registry), but I couldn't > attach the cinder-volume, I got : > > Volumes: > html-volume: > Type: Cinder (a Persistent Disk resource in OpenStack) > VolumeID: f780cb46-ed2a-405d-b901-7201b49c3df1 > FSType: ext4 > ReadOnly: false > SecretRef: nil > kube-api-access-slqf4: > Type: Projected (a volume that contains injected > data from multiple sources) > TokenExpirationSeconds: 3607 > ConfigMapName: kube-root-ca.crt > ConfigMapOptional: > DownwardAPI: true > QoS Class: Burstable > Node-Selectors: > Tolerations: node.kubernetes.io/not-ready:NoExecute > op=Exists for 300s > node.kubernetes.io/unreachable:NoExecute > op=Exists for 300s > Events: > Type Reason Age From > Message > ---- ------ ---- ---- > ------- > > > * Warning FailedMount 26m (x10 over 135m) kubelet > Unable to attach or mount volumes: unmounted volumes=[html-volume], > unattached volumes=[kube-api-access-slqf4 html-volume]: timed out waiting > for the condition Warning FailedAttachVolume 3m39s (x40 over 146m) > attachdetach-controller AttachVolume.Attach failed for volume > "cinder.csi.openstack.org-f780cb46-ed2a-405d-b901-7201b49c3df1" : Attach > timeout for volume f780cb46-ed2a-405d-b901-7201b49c3df1 Warning > FailedMount 104s (x54 over 146m) kubelet Unable > to attach or mount volumes: unmounted volumes=[html-volume], unattached > volumes=[html-volume kube-api-access-slqf4]: timed out waiting for the > condition* > > > > "volume":{"capacity_bytes":5368709120,"volume_id":"7e377933-4ae6-47b7-a685-f484d35153af"}},{"status":{"published_node_ids":["c2531ccf-842e-44d1-85bd-72c811cea199"]},"volume":{"capacity_bytes":1073741824,"volume_id":"f9d5273b-e73d-4b37-8b50-1fcecb910b2a"}}]} > I0319 12:36:50.910443 1 connection.go:201] GRPC error: > I0319 12:36:56.925658 1 controller.go:210] Started VA processing > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925682 1 csi_handler.go:224] CSIHandler: processing > VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925687 1 csi_handler.go:251] Attaching > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925691 1 csi_handler.go:421] Starting attach > operation for > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925705 1 csi_handler.go:740] Found NodeID > 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode > k8intcalnewer-56bgom6jntbm-node-0 > I0319 
12:36:56.925828 1 csi_handler.go:312] VA finalizer added to > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925836 1 csi_handler.go:326] NodeID annotation added > to "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.947632 1 connection.go:193] GRPC call: > /csi.v1.Controller/ControllerPublishVolume > I0319 12:36:56.947646 1 connection.go:194] GRPC request: > {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"f780cb46-ed2a-405d-b901-7201b49c3df1"} > I0319 12:36:58.343821 1 connection.go:200] GRPC response: > {"publish_context":{"DevicePath":"/dev/vdc"}} > I0319 12:36:58.343834 1 connection.go:201] GRPC error: > I0319 12:36:58.343841 1 csi_handler.go:264] Attached > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.343848 1 util.go:38] Marking as attached > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > *I0319 12:36:58.348467 1 csi_handler.go:234] Error processing > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": > failed to mark as attached: volumeattachments.storage.k8s.io > > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" is > forbidden: User > "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot patch > resource "volumeattachments/status" in API group "storage.k8s.io > " at the cluster scope* > I0319 12:36:58.348503 1 controller.go:210] Started VA processing > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348509 1 csi_handler.go:224] CSIHandler: processing > VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348513 1 csi_handler.go:251] Attaching > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348517 1 csi_handler.go:421] Starting attach > operation for > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348525 1 csi_handler.go:740] Found NodeID > 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode > k8intcalnewer-56bgom6jntbm-node-0 > I0319 12:36:58.348540 1 csi_handler.go:304] VA finalizer is already > set on > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348552 1 csi_handler.go:318] NodeID annotation is > already set on > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348564 1 connection.go:193] GRPC call: > /csi.v1.Controller/ControllerPublishVolume > I0319 12:36:58.348567 1 connection.go:194] GRPC request: > {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext: > > > The I tried even the most updated images : > >> 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >> 10.0.0.165:4000/csi-provisioner:v3.4.0 >> 10.0.0.165:4000/csi-resizer:v1.7.0 >> 10.0.0.165:4000/csi-snapshotter:v6.2.1 >> 10.0.0.165:4000/csi-attacher:v4.2.0 >> 10.0.0.165:4000/csi-node-driver-registrar:v2.7.0 >> > > I had the same problem. > > Then I tried to use an older version of kubernetes : 1.21.11 with the > older images shown above (following this link > https://www.roksblog.de/deploy-kubernetes-clusters-in-openstack-within-minutes-with-magnum/), > and it worked, the cinder volume was successfully mounted inside my nginx > pod. > > > > - What is the meaning of the error I am having? 
> - Is it magnum related or kubernetes related or both? > > > Regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnaud.morin at gmail.com Mon Mar 20 09:17:52 2023 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Mon, 20 Mar 2023 09:17:52 +0000 Subject: [neutron] Extra routes Message-ID: Hey all, When using DVR, is there any way to set extra-routes only on the snat network nodes? I want routes to apply only on north/south communication, not on east/west. I can't find something like this is API. Cheers, From jake.yip at ardc.edu.au Mon Mar 20 10:33:58 2023 From: jake.yip at ardc.edu.au (Jake Yip) Date: Mon, 20 Mar 2023 21:33:58 +1100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: <7014a240-19c1-ae44-795d-0123d3c0b7b1@ardc.edu.au> Hi, I had a feeling the below two issues are due to a missing backport[1] to Yoga. I tried to backport it locally but it failed devstack, so it might take a while before we have something. Regards, Jake [1] https://review.opendev.org/c/openstack/magnum/+/833354 On 20/3/2023 4:36 am, wodel youchi wrote: > Hi, > > I am trying to attach a cinder volume to my pod, but it does not work. > > The long story, the default version of kubernetes used in Yoga is 1.23.3 > fcore35. When creating a default kubernetes cluster we got : > > ? ? Image: quay.io/k8scsi/csi-attacher:v2.0.0 > > ... > > Which > 1 - Does not correspond to the documentation of Magnum, the > documentation states these defaults for yoga : > > ??? Image: 10.0.0.165:4000/csi-attacher:v3.3.0 > > ... > > > > I tried to use the updates images (using a local registry), but I > couldn't attach the cinder-volume, I got : > > ... > *I0319 12:36:58.348467 ? ? ? 1 csi_handler.go:234] Error processing > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": > failed to mark as attached: volumeattachments.storage.k8s.io > > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > is forbidden: User > "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot > patch resource "volumeattachments/status" in API group "storage.k8s.io > " at the cluster scope* From wodel.youchi at gmail.com Mon Mar 20 10:40:35 2023 From: wodel.youchi at gmail.com (wodel youchi) Date: Mon, 20 Mar 2023 11:40:35 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: Hi, @Oliver, thanks to you for your blog, it was simple yet it helped me a lot. I am a newbie in the kubernetes world. @Nguyen, yes I do have enable_cluster_user_trust enabled in my globals.yml >From these two threads (https://github.com/rook/rook/issues/6457, https://bugzilla.redhat.com/show_bug.cgi?id=1769693), I think it's an access right problem, a missing access right, what I don't know, is should I add this access right manually? should I update the rest of the images in the cluster, maybe one of them contains the missing right? In the first thread it is said : Solution:- - apiGroups: ["storage.k8s.io"] resources: ["volumeattachments/status"] verbs: ["patch"] need to be added to rbd-external-provisioner-runner and cephfs-external-provisioner-runner ClusterRole In the second thread : csi-external-attacher has changed in 4.3 external attacher needs extra privileges to patch various API objects. Regards. Le lun. 20 mars 2023 ? 03:34, Nguy?n H?u Kh?i a ?crit : > Hello. > Are you enable enable_cluster_user_trust? 
> Nguyen Huu Khoi > > > On Mon, Mar 20, 2023 at 12:42?AM wodel youchi > wrote: > >> Hi, >> >> I am trying to attach a cinder volume to my pod, but it does not work. >> >> The long story, the default version of kubernetes used in Yoga is 1.23.3 >> fcore35. When creating a default kubernetes cluster we got : >> >>> Image: quay.io/k8scsi/csi-attacher:v2.0.0 >>> Image: quay.io/k8scsi/csi-provisioner:v1.4.0 >>> Image: quay.io/k8scsi/csi-snapshotter:v1.2.2 >>> Image: quay.io/k8scsi/csi-resizer:v0.3.0 >>> Image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.18.0 >>> Image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0 >>> Image: >>> docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.18.1 >>> >> >> Which >> 1 - Does not correspond to the documentation of Magnum, the documentation >> states these defaults for yoga : >> >>> Image: 10.0.0.165:4000/csi-attacher:v3.3.0 >>> Image: 10.0.0.165:4000/csi-provisioner:v3.0.0 >>> Image: 10.0.0.165:4000/csi-snapshotter:v4.2.1 >>> Image: 10.0.0.165:4000/csi-resizer:v1.3.0 >>> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>> Image: 10.0.0.165:4000/csi-node-driver-registrar:v2.4.0 >>> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>> (cinder-csi-plugin:v1.23.0 which does not exists anymore) >>> >> >> 2 - And does not work, csi-cinder-controllerplugin keeps crashing. >> >> I tried to use the updates images (using a local registry), but I >> couldn't attach the cinder-volume, I got : >> >> Volumes: >> html-volume: >> Type: Cinder (a Persistent Disk resource in OpenStack) >> VolumeID: f780cb46-ed2a-405d-b901-7201b49c3df1 >> FSType: ext4 >> ReadOnly: false >> SecretRef: nil >> kube-api-access-slqf4: >> Type: Projected (a volume that contains injected >> data from multiple sources) >> TokenExpirationSeconds: 3607 >> ConfigMapName: kube-root-ca.crt >> ConfigMapOptional: >> DownwardAPI: true >> QoS Class: Burstable >> Node-Selectors: >> Tolerations: node.kubernetes.io/not-ready:NoExecute >> op=Exists for 300s >> node.kubernetes.io/unreachable:NoExecute >> op=Exists for 300s >> Events: >> Type Reason Age From >> Message >> ---- ------ ---- ---- >> ------- >> >> >> * Warning FailedMount 26m (x10 over 135m) kubelet >> Unable to attach or mount volumes: unmounted volumes=[html-volume], >> unattached volumes=[kube-api-access-slqf4 html-volume]: timed out waiting >> for the condition Warning FailedAttachVolume 3m39s (x40 over 146m) >> attachdetach-controller AttachVolume.Attach failed for volume >> "cinder.csi.openstack.org-f780cb46-ed2a-405d-b901-7201b49c3df1" : Attach >> timeout for volume f780cb46-ed2a-405d-b901-7201b49c3df1 Warning >> FailedMount 104s (x54 over 146m) kubelet Unable >> to attach or mount volumes: unmounted volumes=[html-volume], unattached >> volumes=[html-volume kube-api-access-slqf4]: timed out waiting for the >> condition* >> >> >> >> "volume":{"capacity_bytes":5368709120,"volume_id":"7e377933-4ae6-47b7-a685-f484d35153af"}},{"status":{"published_node_ids":["c2531ccf-842e-44d1-85bd-72c811cea199"]},"volume":{"capacity_bytes":1073741824,"volume_id":"f9d5273b-e73d-4b37-8b50-1fcecb910b2a"}}]} >> I0319 12:36:50.910443 1 connection.go:201] GRPC error: >> I0319 12:36:56.925658 1 controller.go:210] Started VA processing >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925682 1 csi_handler.go:224] CSIHandler: processing >> VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925687 1 csi_handler.go:251] Attaching >> 
"csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925691 1 csi_handler.go:421] Starting attach >> operation for >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925705 1 csi_handler.go:740] Found NodeID >> 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode >> k8intcalnewer-56bgom6jntbm-node-0 >> I0319 12:36:56.925828 1 csi_handler.go:312] VA finalizer added to >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925836 1 csi_handler.go:326] NodeID annotation added >> to "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.947632 1 connection.go:193] GRPC call: >> /csi.v1.Controller/ControllerPublishVolume >> I0319 12:36:56.947646 1 connection.go:194] GRPC request: >> {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"f780cb46-ed2a-405d-b901-7201b49c3df1"} >> I0319 12:36:58.343821 1 connection.go:200] GRPC response: >> {"publish_context":{"DevicePath":"/dev/vdc"}} >> I0319 12:36:58.343834 1 connection.go:201] GRPC error: >> I0319 12:36:58.343841 1 csi_handler.go:264] Attached >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.343848 1 util.go:38] Marking as attached >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> *I0319 12:36:58.348467 1 csi_handler.go:234] Error processing >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": >> failed to mark as attached: volumeattachments.storage.k8s.io >> >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" is >> forbidden: User >> "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot patch >> resource "volumeattachments/status" in API group "storage.k8s.io >> " at the cluster scope* >> I0319 12:36:58.348503 1 controller.go:210] Started VA processing >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348509 1 csi_handler.go:224] CSIHandler: processing >> VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348513 1 csi_handler.go:251] Attaching >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348517 1 csi_handler.go:421] Starting attach >> operation for >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348525 1 csi_handler.go:740] Found NodeID >> 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode >> k8intcalnewer-56bgom6jntbm-node-0 >> I0319 12:36:58.348540 1 csi_handler.go:304] VA finalizer is already >> set on >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348552 1 csi_handler.go:318] NodeID annotation is >> already set on >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348564 1 connection.go:193] GRPC call: >> /csi.v1.Controller/ControllerPublishVolume >> I0319 12:36:58.348567 1 connection.go:194] GRPC request: >> {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext: >> >> >> The I tried even the most updated images : >> >>> 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>> 10.0.0.165:4000/csi-provisioner:v3.4.0 >>> 10.0.0.165:4000/csi-resizer:v1.7.0 >>> 10.0.0.165:4000/csi-snapshotter:v6.2.1 >>> 10.0.0.165:4000/csi-attacher:v4.2.0 >>> 
10.0.0.165:4000/csi-node-driver-registrar:v2.7.0 >>> >> >> I had the same problem. >> >> Then I tried to use an older version of kubernetes : 1.21.11 with the >> older images shown above (following this link >> https://www.roksblog.de/deploy-kubernetes-clusters-in-openstack-within-minutes-with-magnum/), >> and it worked, the cinder volume was successfully mounted inside my nginx >> pod. >> >> >> >> - What is the meaning of the error I am having? >> - Is it magnum related or kubernetes related or both? >> >> >> Regards. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Mon Mar 20 10:42:47 2023 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 20 Mar 2023 11:42:47 +0100 Subject: [largescale-sig] Next meeting: March 22, 15utc Message-ID: <2afa2e24-b2b9-4954-ad89-9112a7714f1b@openstack.org> Hi everyone, The Large Scale SIG will be meeting this Wednesday, the Antelope release day, in #openstack-operators on OFTC IRC, at 15UTC, our EU+US-friendly time. Since we currently are in DST hell, you should doublecheck how that UTC time translates locally at: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20230322T15 Feel free to add topics to the agenda: https://etherpad.opendev.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez From pdeore at redhat.com Mon Mar 20 13:45:07 2023 From: pdeore at redhat.com (Pranali Deore) Date: Mon, 20 Mar 2023 19:15:07 +0530 Subject: [Glance][PTG] Glance 2023.2 (Bobcat) vPTG Schedule Message-ID: Hello All, The 2023.2 (Bobcat) virtual PTG is going to start next week and we have created our PTG etherpad [1] and also added day wise topics along with timings we are going to discuss. Kindly let me know if you have any concerns with allotted time slots. Friday is reserved for any unplanned discussions. So please feel free to add your topics if you haven't added yet. As a reminder, these are the time slots for our discussion. Tuesday 28 MARCH 2023 1400 UTC to 1700 UTC Wednesday 29 MARCH 2023 1400 UTC to 1700 UTC Thursday 30 MARCH 2023 1400 UTC to 1700 UTC Friday 31 MARCH 2023 1400 UTC to 1700 UTC NOTE: We have scheduled glance operator hours on Thursday at 16:20 UTC(we can extend it if required), let us know your availability for the same. At the moment we don't have any sessions scheduled on Friday, if there are any last moment request(s)/topic(s) we will discuss that on Friday else we will conclude our PTG on Thursday 30th March. We will be using bluejeans for our discussion, kindly try to use it once before the actual discussion. The meeting URL is mentioned in etherpad [1] and will be the same throughout the PTG. [1]: https://etherpad.opendev.org/p/glance-bobcat-ptg Hope to see you there!! Thanks & Regards, Pranali Deore -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliver.weinmann at me.com Mon Mar 20 14:53:35 2023 From: oliver.weinmann at me.com (Oliver Weinmann) Date: Mon, 20 Mar 2023 15:53:35 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: <681B9B94-F589-475F-BB4C-D8959445AFCB@me.com> An HTML attachment was scrubbed... URL: From ts-takahashi at nec.com Mon Mar 20 15:28:25 2023 From: ts-takahashi at nec.com (=?iso-2022-jp?B?VEFLQUhBU0hJIFRPU0hJQUtJKBskQjliNjYhIUlSTEAbKEIp?=) Date: Mon, 20 Mar 2023 15:28:25 +0000 Subject: [openstack-helm][tacker] Message-ID: Hi Openstack-helm team, I?m Toshiaki Takahashi, Tacker?s core developer. 
To my understanding, OpenStack-helm does not currently provide a Tacker?s helm chart. Recently there have been some requests to deploy Tacker with helm, and we would like to proceed with development of it if possible. Do we need to take any action such as to participate meeting of Openstack-helm project and propose our plan? Or, if OpenStack-helm is planning to have a PTG, I?d like to propose the plan at PTG, but is it planned? (I don't see any schedule at the moment). Regards, Toshiaki -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5764 bytes Desc: not available URL: From J.Horstmann at mittwald.de Mon Mar 20 15:33:26 2023 From: J.Horstmann at mittwald.de (Jan Horstmann) Date: Mon, 20 Mar 2023 15:33:26 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: <2315188.ElGaqSPkdT@p1> Message-ID: <14af9155a882030464f4adce1bf71f8ffac74d0f.camel@mittwald.de> On Wed, 2023-03-15 at 16:10 +0000, Felix H?ttner wrote: > Hi, > > > Subject: Re: [neutron] detecting l3-agent readiness > > > > Hi, > > > > Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > > > Hi Mohammed, > > > > > > > Subject: [neutron] detecting l3-agent readiness > > > > > > > > Hi folks, > > > > > > > > I'm working on improving the stability of rollouts when using Kubernetes as a control > > plane, specifically around the L3 agent, it seems that I have not found a clear way to > > detect in the code path where the L3 agent has finished it's initial sync.. > > > > > > > > > > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/- > > /blob/devel/files/startup_wait_for_ns.py > > > Basically we are checking against the neutron api what routers should be on the node and > > then validate that all keepalived processes are up and running. > > > > That would work only for HA routers. If You would also have routers which aren't "ha" this > > method may fail. > > > > Yep, since we only have HA routers that works fine for us. But I guess it should also work for non-ha routers without too much adoption (maybe just check for namespaces instead of keepalived). > Instead of counting processes I have been using the l3 agent's `configurations.routers` field to determine its readiness. From my understanding comparing this number with the number of active routers hosted by the agent should be a good indicator of its sync status. Using two api calls for this is inherently racy, but could be a sufficient workaround for environments with a moderate number of router events. So a simple test snippet for the sync status of all agents could be: ``` import sys import openstack client = openstack.connection.Connection( ... ) l3_agent_synced = [ len([ router for router in client.network.agent_hosted_routers(agent) if router.is_admin_state_up ]) <= client.network.get_agent(agent).configuration["routers"] for agent in client.network.agents() if agent.agent_type == "L3 agent" and (agent.configuration["agent_mode"] == "dvr_snat" or agent.configuration["agent_mode"] == "legacy") ] if not all(l3_agent_synced): sys.exit(1) ``` Please let me know if I am way off with this approach :) > > > > > > > Am I missing it somewhere or is the architecture built in a way that doesn't really > > answer that question? > > > > > > > > > > Adding a option in the neutron api would be a lot nicer. But i guess that also counts > > for l2 and dhcp agents. 
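For the original use case in this thread (gating a rollout on the L3 agent having finished its initial sync), a one-shot check like the snippet above is usually wrapped in a poll loop. Below is a minimal sketch of such a wait, reusing the same openstacksdk calls as the snippet; the timeout and poll interval are arbitrary choices, not anything Neutron prescribes, and credentials are assumed to come from the usual clouds.yaml or environment variables:

```
import sys
import time

import openstack


def l3_agents_synced(client):
    # An agent is treated as synced once its reported
    # configurations["routers"] has caught up with the number of
    # admin-up routers the API says it should be hosting.
    for agent in client.network.agents():
        if agent.agent_type != "L3 agent":
            continue
        if agent.configuration.get("agent_mode") not in ("dvr_snat", "legacy"):
            continue
        expected = sum(
            1
            for router in client.network.agent_hosted_routers(agent)
            if router.is_admin_state_up
        )
        if expected > agent.configuration.get("routers", 0):
            return False
    return True


def wait_for_sync(timeout=600, interval=10):
    client = openstack.connect()
    deadline = time.time() + timeout
    while time.time() < deadline:
        if l3_agents_synced(client):
            return 0
        time.sleep(interval)
    return 1


if __name__ == "__main__":
    sys.exit(wait_for_sync())
```

The same caveat applies as for the original snippet: the two API calls per agent are inherently racy, so this is only a heuristic readiness gate until something like the proposed sync-status reporting lands in the agent API.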
> > > > > > > > > > Thanks > > > > Mohammed > > > > > > > > > > > > -- > > > > Mohammed Naser > > > > VEXXHOST, Inc. > > > > > > -- > > > Felix Huettner > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung > > durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger > > sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. > > Hinweise zum Datenschutz finden Sie hier. > > > > > > > > > -- > > Slawek Kaplonski > > Principal Software Engineer > > Red Hat > > -- > Felix Huettner > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. -- Jan Horstmann From ts-takahashi at nec.com Mon Mar 20 15:33:35 2023 From: ts-takahashi at nec.com (=?iso-2022-jp?B?VEFLQUhBU0hJIFRPU0hJQUtJKBskQjliNjYhIUlSTEAbKEIp?=) Date: Mon, 20 Mar 2023 15:33:35 +0000 Subject: [openstack-helm][tacker] Proposal to provide HelmChart for Tacker In-Reply-To: References: Message-ID: Sorry, I forgot to put the subject in my email ... From: TAKAHASHI TOSHIAKI(?????) Sent: Tuesday, March 21, 2023 12:28 AM To: openstack-discuss at lists.openstack.org Subject: [openstack-helm][tacker] Hi Openstack-helm team, I?m Toshiaki Takahashi, Tacker?s core developer. To my understanding, OpenStack-helm does not currently provide a Tacker?s helm chart. Recently there have been some requests to deploy Tacker with helm, and we would like to proceed with development of it if possible. Do we need to take any action such as to participate meeting of Openstack-helm project and propose our plan? Or, if OpenStack-helm is planning to have a PTG, I?d like to propose the plan at PTG, but is it planned? (I don't see any schedule at the moment). Regards, Toshiaki -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5764 bytes Desc: not available URL: From mnaser at vexxhost.com Mon Mar 20 15:42:35 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 20 Mar 2023 15:42:35 +0000 Subject: [openstack-helm][tacker] In-Reply-To: References: Message-ID: Hi! I think we?re always open to folks who are looking to contribute new charts. As a matter of fact, we?re in the process of adding Manila support. We?ve got a new PTL so perhaps it might be good to get them up to date on the PTG and reserve ?space?. I added Vladimir to this email thread so they can hopefully provide some input too. Thanks Mohammed From: TAKAHASHI TOSHIAKI(?????) Date: Monday, March 20, 2023 at 11:36 AM To: openstack-discuss at lists.openstack.org Subject: [openstack-helm][tacker] Hi Openstack-helm team, I?m Toshiaki Takahashi, Tacker?s core developer. To my understanding, OpenStack-helm does not currently provide a Tacker?s helm chart. Recently there have been some requests to deploy Tacker with helm, and we would like to proceed with development of it if possible. Do we need to take any action such as to participate meeting of Openstack-helm project and propose our plan? Or, if OpenStack-helm is planning to have a PTG, I?d like to propose the plan at PTG, but is it planned? (I don't see any schedule at the moment). 
Regards, Toshiaki -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Mon Mar 20 15:50:03 2023 From: wodel.youchi at gmail.com (wodel youchi) Date: Mon, 20 Mar 2023 16:50:03 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: <681B9B94-F589-475F-BB4C-D8959445AFCB@me.com> References: <681B9B94-F589-475F-BB4C-D8959445AFCB@me.com> Message-ID: Hi, As stated by @Jake, there is some code lingering in https://review.opendev.org/c/openstack/magnum/+/833354 but it has not been merged, it does not exist even in the master branch of Magnum. It looks like we have no choice but the 1.21 version of kubernetes for now. Regards. Le lun. 20 mars 2023 ? 15:53, Oliver Weinmann a ?crit : > Hi, > > Good point: > > external attacher needs extra privileges to patch various API objects > > > I remember that I played around with this an tried to apply some yaml files but couldn?t make it work. > > > Cheers, > > Oliver > > > Von meinem iPhone gesendet > > Am 20.03.2023 um 11:43 schrieb wodel youchi : > > ? > Hi, > @Oliver, thanks to you for your blog, it was simple yet it helped me a > lot. I am a newbie in the kubernetes world. > > @Nguyen, yes I do have enable_cluster_user_trust enabled in my globals.yml > > From these two threads (https://github.com/rook/rook/issues/6457, > https://bugzilla.redhat.com/show_bug.cgi?id=1769693), I think it's an > access right problem, a missing access right, what I don't know, is should > I add this access right manually? should I update the rest of the images in > the cluster, maybe one of them contains the missing right? > > In the first thread it is said : > > Solution:- > > - apiGroups: ["storage.k8s.io"] > resources: ["volumeattachments/status"] > verbs: ["patch"] > need to be added to rbd-external-provisioner-runner and > cephfs-external-provisioner-runner ClusterRole > > In the second thread : > > csi-external-attacher has changed in 4.3 > > external attacher needs extra privileges to patch various API objects. > > > > Regards. > > Le lun. 20 mars 2023 ? 03:34, Nguy?n H?u Kh?i > a ?crit : > >> Hello. >> Are you enable enable_cluster_user_trust? >> Nguyen Huu Khoi >> >> >> On Mon, Mar 20, 2023 at 12:42?AM wodel youchi >> wrote: >> >>> Hi, >>> >>> I am trying to attach a cinder volume to my pod, but it does not work. >>> >>> The long story, the default version of kubernetes used in Yoga is 1.23.3 >>> fcore35. 
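For reference, the "Solution" rule quoted above corresponds to the 'cannot patch resource "volumeattachments/status"' failure in the attacher log. A hedged sketch of adding that rule with the Kubernetes Python client follows; the ClusterRole name used here is an assumption (check which role is actually bound to the csi-cinder-controller-sa service account in your cluster), and editing the role with kubectl achieves the same result:

```
# Sketch only: grant the external-attacher permission to patch
# volumeattachments/status, per the rule quoted above.
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

# Assumed name; verify with: kubectl get clusterrolebinding
role_name = "csi-attacher-role"
role = rbac.read_cluster_role(role_name)

missing_rule = client.V1PolicyRule(
    api_groups=["storage.k8s.io"],
    resources=["volumeattachments/status"],
    verbs=["patch"],
)

rules = role.rules or []
already_present = any(
    r.resources and "volumeattachments/status" in r.resources for r in rules
)
if not already_present:
    rules.append(missing_rule)
    role.rules = rules
    rbac.patch_cluster_role(role_name, role)
```
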
When creating a default kubernetes cluster we got : >>> >>>> Image: quay.io/k8scsi/csi-attacher:v2.0.0 >>>> Image: quay.io/k8scsi/csi-provisioner:v1.4.0 >>>> Image: quay.io/k8scsi/csi-snapshotter:v1.2.2 >>>> Image: quay.io/k8scsi/csi-resizer:v0.3.0 >>>> Image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.18.0 >>>> Image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0 >>>> Image: >>>> docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.18.1 >>>> >>> >>> Which >>> 1 - Does not correspond to the documentation of Magnum, the >>> documentation states these defaults for yoga : >>> >>>> Image: 10.0.0.165:4000/csi-attacher:v3.3.0 >>>> Image: 10.0.0.165:4000/csi-provisioner:v3.0.0 >>>> Image: 10.0.0.165:4000/csi-snapshotter:v4.2.1 >>>> Image: 10.0.0.165:4000/csi-resizer:v1.3.0 >>>> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>>> Image: 10.0.0.165:4000/csi-node-driver-registrar:v2.4.0 >>>> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>>> (cinder-csi-plugin:v1.23.0 which does not exists anymore) >>>> >>> >>> 2 - And does not work, csi-cinder-controllerplugin keeps crashing. >>> >>> I tried to use the updates images (using a local registry), but I >>> couldn't attach the cinder-volume, I got : >>> >>> Volumes: >>> html-volume: >>> Type: Cinder (a Persistent Disk resource in OpenStack) >>> VolumeID: f780cb46-ed2a-405d-b901-7201b49c3df1 >>> FSType: ext4 >>> ReadOnly: false >>> SecretRef: nil >>> kube-api-access-slqf4: >>> Type: Projected (a volume that contains injected >>> data from multiple sources) >>> TokenExpirationSeconds: 3607 >>> ConfigMapName: kube-root-ca.crt >>> ConfigMapOptional: >>> DownwardAPI: true >>> QoS Class: Burstable >>> Node-Selectors: >>> Tolerations: node.kubernetes.io/not-ready:NoExecute >>> op=Exists for 300s >>> node.kubernetes.io/unreachable:NoExecute >>> op=Exists for 300s >>> Events: >>> Type Reason Age From >>> Message >>> ---- ------ ---- ---- >>> ------- >>> >>> >>> * Warning FailedMount 26m (x10 over 135m) kubelet >>> Unable to attach or mount volumes: unmounted volumes=[html-volume], >>> unattached volumes=[kube-api-access-slqf4 html-volume]: timed out waiting >>> for the condition Warning FailedAttachVolume 3m39s (x40 over 146m) >>> attachdetach-controller AttachVolume.Attach failed for volume >>> "cinder.csi.openstack.org-f780cb46-ed2a-405d-b901-7201b49c3df1" : Attach >>> timeout for volume f780cb46-ed2a-405d-b901-7201b49c3df1 Warning >>> FailedMount 104s (x54 over 146m) kubelet Unable >>> to attach or mount volumes: unmounted volumes=[html-volume], unattached >>> volumes=[html-volume kube-api-access-slqf4]: timed out waiting for the >>> condition* >>> >>> >>> >>> "volume":{"capacity_bytes":5368709120,"volume_id":"7e377933-4ae6-47b7-a685-f484d35153af"}},{"status":{"published_node_ids":["c2531ccf-842e-44d1-85bd-72c811cea199"]},"volume":{"capacity_bytes":1073741824,"volume_id":"f9d5273b-e73d-4b37-8b50-1fcecb910b2a"}}]} >>> I0319 12:36:50.910443 1 connection.go:201] GRPC error: >>> I0319 12:36:56.925658 1 controller.go:210] Started VA processing >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925682 1 csi_handler.go:224] CSIHandler: processing >>> VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925687 1 csi_handler.go:251] Attaching >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925691 1 csi_handler.go:421] Starting attach >>> operation for >>> 
"csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925705 1 csi_handler.go:740] Found NodeID >>> 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode >>> k8intcalnewer-56bgom6jntbm-node-0 >>> I0319 12:36:56.925828 1 csi_handler.go:312] VA finalizer added to >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925836 1 csi_handler.go:326] NodeID annotation >>> added to >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.947632 1 connection.go:193] GRPC call: >>> /csi.v1.Controller/ControllerPublishVolume >>> I0319 12:36:56.947646 1 connection.go:194] GRPC request: >>> {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"f780cb46-ed2a-405d-b901-7201b49c3df1"} >>> I0319 12:36:58.343821 1 connection.go:200] GRPC response: >>> {"publish_context":{"DevicePath":"/dev/vdc"}} >>> I0319 12:36:58.343834 1 connection.go:201] GRPC error: >>> I0319 12:36:58.343841 1 csi_handler.go:264] Attached >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.343848 1 util.go:38] Marking as attached >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> *I0319 12:36:58.348467 1 csi_handler.go:234] Error processing >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": >>> failed to mark as attached: volumeattachments.storage.k8s.io >>> >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" is >>> forbidden: User >>> "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot patch >>> resource "volumeattachments/status" in API group "storage.k8s.io >>> " at the cluster scope* >>> I0319 12:36:58.348503 1 controller.go:210] Started VA processing >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348509 1 csi_handler.go:224] CSIHandler: processing >>> VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348513 1 csi_handler.go:251] Attaching >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348517 1 csi_handler.go:421] Starting attach >>> operation for >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348525 1 csi_handler.go:740] Found NodeID >>> 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode >>> k8intcalnewer-56bgom6jntbm-node-0 >>> I0319 12:36:58.348540 1 csi_handler.go:304] VA finalizer is >>> already set on >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348552 1 csi_handler.go:318] NodeID annotation is >>> already set on >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348564 1 connection.go:193] GRPC call: >>> /csi.v1.Controller/ControllerPublishVolume >>> I0319 12:36:58.348567 1 connection.go:194] GRPC request: >>> {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext: >>> >>> >>> The I tried even the most updated images : >>> >>>> 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>>> 10.0.0.165:4000/csi-provisioner:v3.4.0 >>>> 10.0.0.165:4000/csi-resizer:v1.7.0 >>>> 10.0.0.165:4000/csi-snapshotter:v6.2.1 >>>> 10.0.0.165:4000/csi-attacher:v4.2.0 >>>> 10.0.0.165:4000/csi-node-driver-registrar:v2.7.0 >>>> >>> >>> I had the same problem. 
>>> >>> Then I tried to use an older version of kubernetes : 1.21.11 with the >>> older images shown above (following this link >>> https://www.roksblog.de/deploy-kubernetes-clusters-in-openstack-within-minutes-with-magnum/), >>> and it worked, the cinder volume was successfully mounted inside my nginx >>> pod. >>> >>> >>> >>> - What is the meaning of the error I am having? >>> - Is it magnum related or kubernetes related or both? >>> >>> >>> Regards. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Mon Mar 20 16:02:55 2023 From: mkopec at redhat.com (Martin Kopec) Date: Mon, 20 Mar 2023 17:02:55 +0100 Subject: [interop][ptg] Virtual Bobcat vPTG Planning Message-ID: Hello everyone, here is [1] our etherpad for the 2023.2 Bobcat PTG. Please, add your topics there if there is anything you would like to discuss / propose ... You can also vote for time slots for our session(s), so that they fit your schedule, at [2]. If you have any questions, feel free to reach out to me. [1] https://etherpad.opendev.org/p/bobcat-ptg-interop [2] https://framadate.org/2IPXOCvJNNoSGqHu Thanks, -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA IM: kopecmartin -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Mar 20 16:03:07 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 20 Mar 2023 17:03:07 +0100 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: Message-ID: <3840757.STTH5IQzZg@p1> Hi, Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > Hi all, > > (I've tagged the thread with [ovn] because this question was raised in > the context of OVN, but it really is about the intent of neutron > stateless SG API.) > > Neutron API supports 'stateless' field for security groups: > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > > The API reference doesn't explain the intent of the API, merely > walking through the field mechanics, as in > > "The stateful security group extension (stateful-security-group) adds > the stateful field to security groups, allowing users to configure > stateful or stateless security groups for ports. The existing security > groups will all be considered as stateful. Update of the stateful > attribute is allowed when there is no port associated with the > security group." > > The meaning of the API is left for users to deduce. It's customary > understood as something like > > "allowing to bypass connection tracking in the firewall, potentially > providing performance and simplicity benefits" (while imposing > additional complexity onto rule definitions - the user now has to > explicitly define rules for both directions of a duplex connection.) > [This is not an official definition, nor it's quoted from a respected > source, please don't criticize it. I don't think this is an important > point here.] > > Either way, the definition doesn't explain what should happen with > basic network services that a user of Neutron SG API is used to rely > on. Specifically, what happens for a port related to a stateless SG > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > procedure to configure its IPv6 stack. 
> > As part of our testing of stateless SG implementation for OVN backend, > we've noticed that VMs fail to configure via metadata, or use SLAAC to > configure IPv6. > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 > > We've noticed that adding explicit SG rules to allow 'returning' > communication for 169.254.169.254:80 and RA / NA fixes the problem. > > I figured that these services are "base" / "basic" and should be > provided to ports regardless of the stateful-ness of SG. I proposed > patches for this here: > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 > > Discussion in the patch that adjusts the existing stateless SG test > scenarios to not create explicit SG rules for metadata and ICMP > replies suggests that it's not a given / common understanding that > these "base" services should work by default for stateless SGs. > > See discussion in comments here: > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 > > While this discussion is happening in the context of OVN, I think it > should be resolved in a broader context. Specifically, a decision > should be made about what Neutron API "means" by stateless SGs, and > how "base" services are supposed to behave. Then backends can act > accordingly. > > There's also an open question of how this should be implemented. > Whether Neutron would like to create explicit SG rules visible in API > that would allow for the returning traffic and that could be deleted > as needed, or whether backends should do it implicitly. We already > have "default" egress rules, so there's a precedent here. On the other > hand, the egress rules are broad (allowing everything) and there's > more rationale to delete them and replace them with tighter filters. > In my OVN series, I implement ACLs directly in OVN database, without > creating SG rules in Neutron API. > > So, questions for the community to clarify: > - whether Neutron API should define behavior of stateless SGs in general, > - if so, whether Neutron API should also define behavior of stateless > SGs in terms of "base" services like metadata and DHCP, > - if so, whether backends should implement the necessary filters > themselves, or Neutron will create default SG rules itself. I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user. We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it. If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there? 
> > I hope I laid the problem out clearly, let me know if anything needs > clarification or explanation. Yes :) At least for me. > > Yours, > Ihar > > > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From ralonsoh at redhat.com Mon Mar 20 16:09:12 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 20 Mar 2023 17:09:12 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: <14af9155a882030464f4adce1bf71f8ffac74d0f.camel@mittwald.de> References: <2315188.ElGaqSPkdT@p1> <14af9155a882030464f4adce1bf71f8ffac74d0f.camel@mittwald.de> Message-ID: Hello: I think I'm repeating myself here but we have two approaches to solve this problem: * The easiest one, at least for the L3 agent, is to report an INFO level log before and after the full sync. That could be parsed by any tool to detect that. You can propose a patch to the Neutron repository. * https://bugs.launchpad.net/neutron/+bug/2011422: a more elaborated way to report the agent status. That could provide the start flag, the revived flag, the sync processing flag and many other ones that could be defined only for this specific agent. Regards. On Mon, Mar 20, 2023 at 4:33?PM Jan Horstmann wrote: > On Wed, 2023-03-15 at 16:10 +0000, Felix H?ttner wrote: > > Hi, > > > > > Subject: Re: [neutron] detecting l3-agent readiness > > > > > > Hi, > > > > > > Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > > > > Hi Mohammed, > > > > > > > > > Subject: [neutron] detecting l3-agent readiness > > > > > > > > > > Hi folks, > > > > > > > > > > I'm working on improving the stability of rollouts when using > Kubernetes as a control > > > plane, specifically around the L3 agent, it seems that I have not > found a clear way to > > > detect in the code path where the L3 agent has finished it's initial > sync.. > > > > > > > > > > > > > We build such a solution here: > https://gitlab.com/yaook/images/neutron-l3-agent/- > > > /blob/devel/files/startup_wait_for_ns.py > > > > Basically we are checking against the neutron api what routers > should be on the node and > > > then validate that all keepalived processes are up and running. > > > > > > That would work only for HA routers. If You would also have routers > which aren't "ha" this > > > method may fail. > > > > > > > Yep, since we only have HA routers that works fine for us. But I guess > it should also work for non-ha routers without too much adoption (maybe > just check for namespaces instead of keepalived). > > > > Instead of counting processes I have been using the l3 agent's > `configurations.routers` field to determine its readiness. > From my understanding comparing this number with the number of active > routers hosted by the agent should be a good indicator of its sync > status. > Using two api calls for this is inherently racy, but could be a > sufficient workaround for environments with a moderate number of > router events. > So a simple test snippet for the sync status of all agents could be: > > ``` > import sys > import openstack > client = openstack.connection.Connection( > ... 
> ) > l3_agent_synced = [ > len([ > router > for router in client.network.agent_hosted_routers(agent) > if router.is_admin_state_up > ]) <= client.network.get_agent(agent).configuration["routers"] > for agent in client.network.agents() > if agent.agent_type == "L3 agent" > and (agent.configuration["agent_mode"] == "dvr_snat" > or agent.configuration["agent_mode"] == "legacy") > ] > if not all(l3_agent_synced): > sys.exit(1) > ``` > > Please let me know if I am way off with this approach :) > > > > > > > > > > > Am I missing it somewhere or is the architecture built in a way > that doesn't really > > > answer that question? > > > > > > > > > > > > > Adding a option in the neutron api would be a lot nicer. But i guess > that also counts > > > for l2 and dhcp agents. > > > > > > > > > > > > > Thanks > > > > > Mohammed > > > > > > > > > > > > > > > -- > > > > > Mohammed Naser > > > > > VEXXHOST, Inc. > > > > > > > > -- > > > > Felix Huettner > > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur > f?r die Verwertung > > > durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der > vorgesehene Empf?nger > > > sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und > l?schen diese E Mail. > > > Hinweise zum Datenschutz finden Sie hier< > https://www.datenschutz.schwarz>. > > > > > > > > > > > > > -- > > > Slawek Kaplonski > > > Principal Software Engineer > > > Red Hat > > > > -- > > Felix Huettner > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier. > > -- > Jan Horstmann > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ts-takahashi at nec.com Mon Mar 20 16:15:01 2023 From: ts-takahashi at nec.com (=?iso-2022-jp?B?VEFLQUhBU0hJIFRPU0hJQUtJKBskQjliNjYhIUlSTEAbKEIp?=) Date: Mon, 20 Mar 2023 16:15:01 +0000 Subject: [openstack-helm][tacker] Proposal to provide HelmChart for Tacker In-Reply-To: References: Message-ID: Hi Mohammed, Thank you for your quick response. If openstack-helm team will reserve PTG time and I can participate at the time, I?ll participate it. (My timezone is Asia Tokyo, and Tacker will have PTG at 6-8 UTC on 28th, 29th and 30th.) Anyway, I?d like to proceed with development for Tacker?s helm chart! Regards, Toshiaki From: Mohammed Naser Sent: Tuesday, March 21, 2023 12:43 AM To: TAKAHASHI TOSHIAKI(?????) ; openstack-discuss at lists.openstack.org; kozhukalov at gmail.com Subject: Re: [openstack-helm][tacker] Hi! I think we?re always open to folks who are looking to contribute new charts. As a matter of fact, we?re in the process of adding Manila support. We?ve got a new PTL so perhaps it might be good to get them up to date on the PTG and reserve ?space?. I added Vladimir to this email thread so they can hopefully provide some input too. Thanks Mohammed From: TAKAHASHI TOSHIAKI(?????) > Date: Monday, March 20, 2023 at 11:36 AM To: openstack-discuss at lists.openstack.org > Subject: [openstack-helm][tacker] Hi Openstack-helm team, I?m Toshiaki Takahashi, Tacker?s core developer. To my understanding, OpenStack-helm does not currently provide a Tacker?s helm chart. Recently there have been some requests to deploy Tacker with helm, and we would like to proceed with development of it if possible. 
Do we need to take any action such as to participate meeting of Openstack-helm project and propose our plan? Or, if OpenStack-helm is planning to have a PTG, I?d like to propose the plan at PTG, but is it planned? (I don't see any schedule at the moment). Regards, Toshiaki -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5764 bytes Desc: not available URL: From haleyb.dev at gmail.com Mon Mar 20 18:46:59 2023 From: haleyb.dev at gmail.com (Brian Haley) Date: Mon, 20 Mar 2023 14:46:59 -0400 Subject: [neutron] Bug deputy report for week of March 13th Message-ID: <2c8cea57-7d58-4a2f-f416-be16a31dbea0@gmail.com> Hi, I was Neutron bug deputy last week. Below is a short summary about the reported bugs. -Brian High bugs --------- * https://bugs.launchpad.net/neutron/+bug/2011573 - [ovn-octavia-provider] Job pep8 failing due to bandit new lint rule - https://review.opendev.org/c/openstack/ovn-octavia-provider/+/877357 * https://bugs.launchpad.net/neutron/+bug/2011590 - Startup times for large OVN dbs is greatly increased by frozen_row() calls - https://review.opendev.org/c/openstack/neutron/+/877383 * https://bugs.launchpad.net/neutron/+bug/2011600 - functional test_get_datapath_id fails with neutron.common.utils.WaitTimeout: Timed out after 5 seconds - needs owner * https://bugs.launchpad.net/neutron/+bug/2011800 - ovn qos extension: update router does not remove no longer present qos rules - https://review.opendev.org/c/openstack/neutron/+/877603 (test only) - Needs code fix still Medium bugs ----------- * https://bugs.launchpad.net/neutron/+bug/2011377 - test_agent_resync_on_non_existing_bridge failing intermittently sp - https://review.opendev.org/c/openstack/neutron/+/877535 * https://bugs.launchpad.net/neutron/+bug/2011724 - [OVN] Method "create_metadata_port" should pass the "fixed_ips" when creating the port - https://review.opendev.org/c/openstack/neutron/+/877528 * https://bugs.launchpad.net/neutron/+bug/2012104 - Neutron picking incorrect ovn records - Possibly related to https://bugs.launchpad.net/neutron/+bug/1951149 - OVN chassis deleted issue - does user need to manually clean-up? 
Low bugs -------- * https://bugs.launchpad.net/neutron/+bug/2011687 - O flag is not enabled when ipv6_ra_mode is dhcpv6-stateful - O=1 is actually unnecessary in this case with M=1 based on the RFC, will need to figure out how to update code and/or docs to be in sync - https://review.opendev.org/c/openstack/neutron/+/877601 Misc bugs --------- * https://bugs.launchpad.net/neutron/+bug/2012144 - [OVN] adding/removing floating IPs neutron server errors about binding port - Mech driver shows ovn-bridge-mappings=[], but ovn-sbctl has them - Asked for more information Wishlist bugs ------------- * https://bugs.launchpad.net/neutron/+bug/2011422 - [RFE] The Neutron agents should report the sync process status * https://bugs.launchpad.net/neutron/+bug/2012069 - [OVN] Flooding issue on provider networks with disabled port security - https://review.opendev.org/c/openstack/neutron/+/877675 From swogatpradhan22 at gmail.com Mon Mar 20 16:56:20 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Mon, 20 Mar 2023 22:26:20 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Jhon, I checked in the ceph od dcn02, I can see the images created after importing from the central site. But launching an instance normally fails as it takes a long time for the volume to get created. When launching an instance from volume the instance is getting created properly without any errors. I tried to cache images in nova using https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html but getting checksum failed error. With regards, Swogat Pradhan On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: > On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan > wrote: > > > > Update: After restarting the nova services on the controller and running > the deploy script on the edge site, I was able to launch the VM from volume. > > > > Right now the instance creation is failing as the block device creation > is stuck in creating state, it is taking more than 10 mins for the volume > to be created, whereas the image has already been imported to the edge > glance. > > Try following this document and making the same observations in your > environment for AZs and their local ceph cluster. > > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites > > On a DCN site if you run a command like this: > > $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring > /etc/ceph/dcn0.client.admin.keyring > $ rbd --cluster dcn0 -p volumes ls -l > NAME SIZE PARENT > FMT PROT LOCK > volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB > images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl > $ > > Then, you should see the parent of the volume is the image which is on > the same local ceph cluster. > > I wonder if something is misconfigured and thus you're encountering > the streaming behavior described here: > > Ideally all images should reside in the central Glance and be copied > to DCN sites before instances of those images are booted on DCN sites. > If an image is not copied to a DCN site before it is booted, then the > image will be streamed to the DCN site and then the image will boot as > an instance. 
This happens because Glance at the DCN site has access to > the images store at the Central ceph cluster. Though the booting of > the image will take time because it has not been copied in advance, > this is still preferable to failing to boot the image. > > You can also exec into the cinder container at the DCN site and > confirm it's using it's local ceph cluster. > > John > > > > > I will try and create a new fresh image and test again then update. > > > > With regards, > > Swogat Pradhan > > > > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >> > >> Update: > >> In the hypervisor list the compute node state is showing down. > >> > >> > >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>> > >>> Hi Brendan, > >>> Now i have deployed another site where i have used 2 linux bonds > network template for both 3 compute nodes and 3 ceph nodes. > >>> The bonding options is set to mode=802.3ad (lacp=active). > >>> I used a cirros image to launch instance but the instance timed out so > i waited for the volume to be created. > >>> Once the volume was created i tried launching the instance from the > volume and still the instance is stuck in spawning state. > >>> > >>> Here is the nova-compute log: > >>> > >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep > daemon starting > >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep > process running with uid/gid: 0/0 > >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep > process running with capabilities (eff/prm/inh): > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep > daemon running as pid 185437 > >>> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error > in _get_host_uuid: Unexpected error while running command. > >>> Command: blkid overlay -s UUID -o value > >>> Exit code: 2 > >>> Stdout: '' > >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > Unexpected error while running command. > >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > 450b749c-a10a-4308-80a9-3b8020fee758] Creating image > >>> > >>> It is stuck in creating image, do i need to run the template mentioned > here ?: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > >>> > >>> The volume is already created and i do not understand why the instance > is stuck in spawning state. > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> > >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard > wrote: > >>>> > >>>> Does your environment use different network interfaces for each of > the networks? Or does it have a bond with everything on it? > >>>> > >>>> One issue I have seen before is that when launching instances, there > is a lot of network traffic between nodes as the hypervisor needs to > download the image from Glance. Along with various other services sending > normal network traffic, it can be enough to cause issues if everything is > running over a single 1Gbe interface. > >>>> > >>>> I have seen the same situation in fact when using a single > active/backup bond on 1Gbe nics. 
It?s worth checking the network traffic > while you try to spawn the instance to see if you?re dropping packets. In > the situation I described, there were dropped packets which resulted in a > loss of communication between nova_compute and RMQ, so the node appeared > offline. You should also confirm that nova_compute is being disconnected in > the nova_compute logs if you tail them on the Hypervisor while spawning the > instance. > >>>> > >>>> In my case, changing from active/backup to LACP helped. So, based on > that experience, from my perspective, is certainly sounds like some kind of > network issue. > >>>> > >>>> Regards, > >>>> > >>>> Brendan Shephard > >>>> Senior Software Engineer > >>>> Red Hat Australia > >>>> > >>>> > >>>> > >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: > >>>> > >>>> Hi, > >>>> > >>>> I tried to help someone with a similar issue some time ago in this > thread: > >>>> > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > >>>> > >>>> But apparently a neutron reinstallation fixed it for that user, not > sure if that could apply here. But is it possible that your nova and > neutron versions are different between central and edge site? Have you > restarted nova and neutron services on the compute nodes after > installation? Have you debug logs of nova-conductor and maybe nova-compute? > Maybe they can help narrow down the issue. > >>>> If there isn't any additional information in the debug logs I > probably would start "tearing down" rabbitmq. I didn't have to do that in a > production system yet so be careful. I can think of two routes: > >>>> > >>>> - Either remove queues, exchanges etc. while rabbit is running, this > will most likely impact client IO depending on your load. Check out the > rabbitmqctl commands. > >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all > nodes and restart rabbitmq so the exchanges, queues etc. rebuild. > >>>> > >>>> I can imagine that the failed reply "survives" while being replicated > across the rabbit nodes. But I don't really know the rabbit internals too > well, so maybe someone else can chime in here and give a better advice. > >>>> > >>>> Regards, > >>>> Eugen > >>>> > >>>> Zitat von Swogat Pradhan : > >>>> > >>>> Hi, > >>>> Can someone please help me out on this issue? > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>> wrote: > >>>> > >>>> Hi > >>>> I don't see any major packet loss. > >>>> It seems the problem is somewhere in rabbitmq maybe but not due to > packet > >>>> loss. > >>>> > >>>> with regards, > >>>> Swogat Pradhan > >>>> > >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>> wrote: > >>>> > >>>> Hi, > >>>> Yes the MTU is the same as the default '1500'. > >>>> Generally I haven't seen any packet loss, but never checked when > >>>> launching the instance. > >>>> I will check that and come back. > >>>> But everytime i launch an instance the instance gets stuck at spawning > >>>> state and there the hypervisor becomes down, so not sure if packet > loss > >>>> causes this. > >>>> > >>>> With regards, > >>>> Swogat pradhan > >>>> > >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > >>>> > >>>> One more thing coming to mind is MTU size. Are they identical between > >>>> central and edge site? Do you see packet loss through the tunnel? 
> >>>> > >>>> Zitat von Swogat Pradhan : > >>>> > >>>> > Hi Eugen, > >>>> > Request you to please add my email either on 'to' or 'cc' as i am > not > >>>> > getting email's from you. > >>>> > Coming to the issue: > >>>> > > >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies > -p > >>>> / > >>>> > Listing policies for vhost "/" ... > >>>> > vhost name pattern apply-to definition priority > >>>> > / ha-all ^(?!amq\.).* queues > >>>> > > >>>> > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > >>>> > > >>>> > I have the edge site compute nodes up, it only goes down when i am > >>>> trying > >>>> > to launch an instance and the instance comes to a spawning state and > >>>> then > >>>> > gets stuck. > >>>> > > >>>> > I have a tunnel setup between the central and the edge sites. > >>>> > > >>>> > With regards, > >>>> > Swogat Pradhan > >>>> > > >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> > >>>> > wrote: > >>>> > > >>>> >> Hi Eugen, > >>>> >> For some reason i am not getting your email to me directly, i am > >>>> checking > >>>> >> the email digest and there i am able to find your reply. > >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >>>> >> Yes, these logs are from the time when the issue occurred. > >>>> >> > >>>> >> *Note: i am able to create vm's and perform other activities in the > >>>> >> central site, only facing this issue in the edge site.* > >>>> >> > >>>> >> With regards, > >>>> >> Swogat Pradhan > >>>> >> > >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> > >>>> >> wrote: > >>>> >> > >>>> >>> Hi Eugen, > >>>> >>> Thanks for your response. > >>>> >>> I have actually a 4 controller setup so here are the details: > >>>> >>> > >>>> >>> *PCS Status:* > >>>> >>> * Container bundle set: rabbitmq-bundle [ > >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > >>>> Started > >>>> >>> overcloud-controller-no-ceph-3 > >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > >>>> Started > >>>> >>> overcloud-controller-2 > >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > >>>> Started > >>>> >>> overcloud-controller-1 > >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > >>>> Started > >>>> >>> overcloud-controller-0 > >>>> >>> > >>>> >>> I have tried restarting the bundle multiple times but the issue is > >>>> still > >>>> >>> present. > >>>> >>> > >>>> >>> *Cluster status:* > >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>>> >>> Cluster status of node > >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
> >>>> >>> Basics > >>>> >>> > >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>>> >>> > >>>> >>> Disk Nodes > >>>> >>> > >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>> > >>>> >>> Running Nodes > >>>> >>> > >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>> > >>>> >>> Versions > >>>> >>> > >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ > >>>> 3.8.3 > >>>> >>> on Erlang 22.3.4.1 > >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ > >>>> 3.8.3 > >>>> >>> on Erlang 22.3.4.1 > >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ > >>>> 3.8.3 > >>>> >>> on Erlang 22.3.4.1 > >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > >>>> RabbitMQ > >>>> >>> 3.8.3 on Erlang 22.3.4.1 > >>>> >>> > >>>> >>> Alarms > >>>> >>> > >>>> >>> (none) > >>>> >>> > >>>> >>> Network Partitions > >>>> >>> > >>>> >>> (none) > >>>> >>> > >>>> >>> Listeners > >>>> >>> > >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and > CLI > >>>> tool > >>>> >>> communication > >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>>> >>> and AMQP 1.0 > >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and > CLI > >>>> tool > >>>> >>> communication > >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>>> >>> and AMQP 1.0 > >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and > CLI > >>>> tool > >>>> >>> communication > >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>>> >>> and AMQP 1.0 > >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> , > >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: > >>>> inter-node and > >>>> >>> CLI tool communication > >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> , > >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, 
purpose: > AMQP > >>>> 0-9-1 > >>>> >>> and AMQP 1.0 > >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> , > >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>> > >>>> >>> Feature flags > >>>> >>> > >>>> >>> Flag: drop_unroutable_metric, state: enabled > >>>> >>> Flag: empty_basic_get_metric, state: enabled > >>>> >>> Flag: implicit_default_bindings, state: enabled > >>>> >>> Flag: quorum_queue, state: enabled > >>>> >>> Flag: virtual_host_metadata, state: enabled > >>>> >>> > >>>> >>> *Logs:* > >>>> >>> *(Attached)* > >>>> >>> > >>>> >>> With regards, > >>>> >>> Swogat Pradhan > >>>> >>> > >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> > >>>> >>> wrote: > >>>> >>> > >>>> >>>> Hi, > >>>> >>>> Please find the nova conductor as well as nova api log. > >>>> >>>> > >>>> >>>> nova-conuctor: > >>>> >>>> > >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply > to > >>>> >>>> 16152921c1eb45c2b1f562087140168b > >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply > to > >>>> >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply > to > >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply > >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds > >>>> due to a > >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). > >>>> Abandoning...: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply > to > >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds > >>>> due to a > >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
> >>>> Abandoning...: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply > to > >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds > >>>> due to a > >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>> Abandoning...: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > >>>> with > >>>> >>>> backend dogpile.cache.null. > >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply > to > >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds > >>>> due to a > >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>> Abandoning...: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> > >>>> >>>> With regards, > >>>> >>>> Swogat Pradhan > >>>> >>>> > >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> >>>> swogatpradhan22 at gmail.com> wrote: > >>>> >>>> > >>>> >>>>> Hi, > >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am > trying to > >>>> >>>>> launch vm's. > >>>> >>>>> When the VM is in spawning state the node goes down (openstack > >>>> compute > >>>> >>>>> service list), the node comes backup when i restart the nova > >>>> compute > >>>> >>>>> service but then the launch of the vm fails. > >>>> >>>>> > >>>> >>>>> nova-compute.log > >>>> >>>>> > >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running > >>>> >>>>> instance usage > >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 > 07:00:00 > >>>> to > >>>> >>>>> 2023-02-26 08:00:00. 0 instances. > >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node > >>>> >>>>> dcn01-hci-0.bdxworld.com > >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device > >>>> name: > >>>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names > >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache > enabled > >>>> with > >>>> >>>>> backend dogpile.cache.null. > >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running > >>>> >>>>> privsep helper: > >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > >>>> 'privsep-helper', > >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new > >>>> privsep > >>>> >>>>> daemon via rootwrap > >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] > privsep > >>>> >>>>> daemon starting > >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] > privsep > >>>> >>>>> process running with uid/gid: 0/0 > >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] > privsep > >>>> >>>>> process running with capabilities (eff/prm/inh): > >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] > privsep > >>>> >>>>> daemon running as pid 2647 > >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING > >>>> os_brick.initiator.connectors.nvmeof > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process > >>>> >>>>> execution error > >>>> >>>>> in _get_host_uuid: Unexpected error while running command. > >>>> >>>>> Command: blkid overlay -s UUID -o value > >>>> >>>>> Exit code: 2 > >>>> >>>>> Stdout: '' > >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > >>>> >>>>> Unexpected error while running command. > >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>> >>>>> > >>>> >>>>> Is there a way to solve this issue? > >>>> >>>>> > >>>> >>>>> > >>>> >>>>> With regards, > >>>> >>>>> > >>>> >>>>> Swogat Pradhan > >>>> >>>>> > >>>> >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yasufum.o at gmail.com Mon Mar 20 19:38:46 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Tue, 21 Mar 2023 04:38:46 +0900 Subject: [tacker][ptg] Bobcat vPTG Planning Message-ID: <79abe530-5ce0-1ad1-d3f6-4cb61cc970cf@gmail.com> Hi team, We are going to have the Bobcat vPTG through three days, 28-30 Mar 6am-8am UTC as agreed at the IRC meeting last week. I've booked rooms for the sessions and uploaded etherpad [1]. Please feel free to add your proposal on the etherpad. [1] https://etherpad.opendev.org/p/tacker-bobcat-ptg Thanks, Yasufumi From yasufum.o at gmail.com Mon Mar 20 19:50:33 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Tue, 21 Mar 2023 04:50:33 +0900 Subject: [tacker] Cancelling next two IRC meetings Message-ID: <2aa1abc9-5a7b-3af2-6104-3b3fa4043e2c@gmail.com> Hi, I'd like to skip the next two IRC meetings due to a holiday tomorrow for many of us joining from Japan and next week for the bobcat vPTG. I'm looking forward to meet you guys on the vPTG! Cheers, Yasufumi From ihrachys at redhat.com Mon Mar 20 21:18:00 2023 From: ihrachys at redhat.com (Ihar Hrachyshka) Date: Mon, 20 Mar 2023 17:18:00 -0400 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: <3840757.STTH5IQzZg@p1> References: <3840757.STTH5IQzZg@p1> Message-ID: On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski wrote: > > Hi, > > > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > > > Hi all, > > > > > > (I've tagged the thread with [ovn] because this question was raised in > > > the context of OVN, but it really is about the intent of neutron > > > stateless SG API.) > > > > > > Neutron API supports 'stateless' field for security groups: > > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > > > > > > The API reference doesn't explain the intent of the API, merely > > > walking through the field mechanics, as in > > > > > > "The stateful security group extension (stateful-security-group) adds > > > the stateful field to security groups, allowing users to configure > > > stateful or stateless security groups for ports. The existing security > > > groups will all be considered as stateful. Update of the stateful > > > attribute is allowed when there is no port associated with the > > > security group." > > > > > > The meaning of the API is left for users to deduce. It's customary > > > understood as something like > > > > > > "allowing to bypass connection tracking in the firewall, potentially > > > providing performance and simplicity benefits" (while imposing > > > additional complexity onto rule definitions - the user now has to > > > explicitly define rules for both directions of a duplex connection.) > > > [This is not an official definition, nor it's quoted from a respected > > > source, please don't criticize it. I don't think this is an important > > > point here.] > > > > > > Either way, the definition doesn't explain what should happen with > > > basic network services that a user of Neutron SG API is used to rely > > > on. Specifically, what happens for a port related to a stateless SG > > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > > > procedure to configure its IPv6 stack. > > > > > > As part of our testing of stateless SG implementation for OVN backend, > > > we've noticed that VMs fail to configure via metadata, or use SLAAC to > > > configure IPv6. 
> > >
> > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053
> > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949
> > >
> > > We've noticed that adding explicit SG rules to allow 'returning'
> > > communication for 169.254.169.254:80 and RA / NA fixes the problem.
> > >
> > > I figured that these services are "base" / "basic" and should be
> > > provided to ports regardless of the stateful-ness of SG. I proposed
> > > patches for this here:
> > >
> > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053
> > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049
> > >
> > > Discussion in the patch that adjusts the existing stateless SG test
> > > scenarios to not create explicit SG rules for metadata and ICMP
> > > replies suggests that it's not a given / common understanding that
> > > these "base" services should work by default for stateless SGs.
> > >
> > > See discussion in comments here:
> > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692
> > >
> > > While this discussion is happening in the context of OVN, I think it
> > > should be resolved in a broader context. Specifically, a decision
> > > should be made about what Neutron API "means" by stateless SGs, and
> > > how "base" services are supposed to behave. Then backends can act
> > > accordingly.
> > >
> > > There's also an open question of how this should be implemented.
> > > Whether Neutron would like to create explicit SG rules visible in API
> > > that would allow for the returning traffic and that could be deleted
> > > as needed, or whether backends should do it implicitly. We already
> > > have "default" egress rules, so there's a precedent here. On the other
> > > hand, the egress rules are broad (allowing everything) and there's
> > > more rationale to delete them and replace them with tighter filters.
> > > In my OVN series, I implement ACLs directly in OVN database, without
> > > creating SG rules in Neutron API.
> > >
> > > So, questions for the community to clarify:
> > > - whether Neutron API should define behavior of stateless SGs in general,
> > > - if so, whether Neutron API should also define behavior of stateless
> > > SGs in terms of "base" services like metadata and DHCP,
> > > - if so, whether backends should implement the necessary filters
> > > themselves, or Neutron will create default SG rules itself.
>
> I think we should be transparent here: if we need SG rules like that to allow some traffic, those rules should be added in a way that is visible to the user.
> We also have the in-progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators define the set of default SG rules placed in each new SG. So if we make those additional ACLs visible as SG rules now, it will be easier to customize them later.
> If we hard-code ACLs to allow ingress traffic from the metadata server or RA/NA packets, there will IMO be an inconsistency in behaviour between stateful and stateless SGs: with a stateful SG a user can disallow traffic between the VM and the metadata service (probably there's no real use case for that, but it is possible), while with a stateless SG that would no longer be possible because the ingress rules would always be there. Also, a user who knows how stateless SGs work may even treat it as a bug, since from the Neutron API PoV this traffic to/from the metadata server would behave as stateful - there would be a rule allowing the egress traffic, but what actually allows the ingress response?
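(For anyone reproducing the workaround mentioned above, the explicit per-direction rules look roughly like the following. This is only a sketch against the stock openstack CLI; "stateless-sg" is a placeholder group name, and whether you also need rules for DHCPv6 replies depends on the environment.)

$ # returning metadata traffic (cannot be narrowed to source port 80; the SG API has no remote-port attribute)
$ openstack security group rule create --ingress --ethertype IPv4 \
      --protocol tcp --remote-ip 169.254.169.254/32 stateless-sg
$ # ICMPv6 router advertisements (type 134) and neighbor advertisements (type 136)
$ openstack security group rule create --ingress --ethertype IPv6 \
      --protocol ipv6-icmp --icmp-type 134 stateless-sg
$ openstack security group rule create --ingress --ethertype IPv6 \
      --protocol ipv6-icmp --icmp-type 136 stateless-sg

The fact that the first rule cannot be restricted to the metadata server's source port is part of the granularity argument discussed below.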
> Thanks for clarifying the rationale on picking SG rules and not per-backend implementation. What would be your answer to the two other questions in the list above, specifically, "whether Neutron API should define behavior of stateless SGs in general" and "whether Neutron API should define behavior of stateless SGs in relation to metadata / RA / NA". Once we have agreement on these points, we can discuss the exact mechanism - whether to implement in backend or in API. But these two questions are first order in my view. (To give an idea of my thinking, I believe API definition should not only define fields and their mechanics but also semantics, so - yes, api-ref should define the meaning ("behavior") of stateless SG in general, and - yes, api-ref should also define the meaning ("behavior") of stateless SG in relation to "standard" services like ipv6 addressing or metadata. As to the last question - whether it's up to ml2 backend to implement the behavior, or up to the core SG database plugin - I don't have a strong opinion. I lean to "backend" solution just because it allows for more granular definition because SG rules may not express some filter rules, e.g. source port for metadata replies (an unfortunate limitation of SG API that we inherited from AWS?). But perhaps others prefer paying the price for having neutron ml2 plugin enforcing the behavior consistently across all backends. > > > > > > I hope I laid the problem out clearly, let me know if anything needs > > > clarification or explanation. > > > Yes :) At least for me. > > > > > > > Yours, > > > Ihar > > > > > > > > > > > > > -- > > Slawek Kaplonski > > Principal Software Engineer > > Red Hat From jay at gr-oss.io Mon Mar 20 21:39:24 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Mon, 20 Mar 2023 14:39:24 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: Hey all, Based on the results of our quick vPTG sync this morning, I've done the following: * I booked one additional slot for Ironic; Wednesday 1600 UTC - 1700 UTC, to ensure we'd have plenty of discussion time once accounting for breaks that we'll certainly need. * I've tentatively scheduled all topics here: https://etherpad.opendev.org/p/ironic-bobcat-ptg -- please review, if there's anything that creates a hardship lets work it out, in the the etherpad, IRC, or on the mail list :). Thanks, looking forward to planning another release of Ironic with you all! -Jay On Thu, Mar 9, 2023 at 3:15?PM Jay Faulkner wrote: > Hey all, > > The vPTG will be upon us soon, the week of March 27. > > I booked the following times on behalf of Ironic + BM SIG Operator hour, > in accordance with what times worked in Antelope. It's my hope that since > we've had little contributor turnover, these times continue to work. I'm > completely open to having things moved around if it's more convenient to > participants. > > I've booked the following times, all in Folsom: > - Tuesday 1400 UTC - 1700 UTC > - Wednesday 1300 UTC Operator hour: baremetal SIG > - Wednesday 1400 UTC - 1600 UTC > - Wednesday 2200 - 2300 UTC > > > I propose that after the Ironic meeting on March 20, we shortly sync up in > the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) > to pick topics and assign time. > > > Again, this is all meant to be a suggestion, I'm happy to move things > around but didn't want us to miss out on getting things booked. > > > - > Jay Faulkner > Ironic PTL > TC Member > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kamil.madac at gmail.com Tue Mar 21 10:27:37 2023 From: kamil.madac at gmail.com (Kamil Madac) Date: Tue, 21 Mar 2023 11:27:37 +0100 Subject: [neutron] In-Reply-To: References: Message-ID: Hi Danny, thanks for sharing your positive experience. I'm going to deploy OVN in dev environment with kolla-ansible. Maybe one more question. Is there any official way to migrate from OVS to OVN with kolla-ansible, or have you used the official migration script https://docs.openstack.org/networking-ovn/latest/install/migration.html? On Thu, Mar 16, 2023 at 5:43?PM Danny Webb wrote: > Hi Kamil, > > We're currently running 4 (soon to be 5) production regions all using > kolla ansible as our deployer with OVN as our neutron backend. It's been > fairly solid for us and we've had less issues with OVN than the > traditional hybrid OVS / Iptables neutron driver (which we ran for about a > year before switching to OVN). Our regions are anywhere from 50-60 compute > hosts with 1-2k+ VMs per region. As far as I know most of the new > development is going into OVN so would be a good place to start. > Ultimately, we've only really had 2 real issues whilst running it. First > was an issue where we had the provider network spamming gateway changes > into southbound as we had our anycast SVI bound to our top of rack switches > which made OVN keep updating it's location. We mitigated this by moving > the provider SVIs to our border routers and the issue went away and dropped > the load on our OVN controllers significantly. Only other real issue we > had was during an upgrade of a region we ended up with what we believed to > be some sort of stale flows that resulted in some hypervisors losing > connectivity until we rebooted them. > > Hope this helps! > > Cheers, > > Danny > ------------------------------ > *From:* Kamil Madac > *Sent:* 14 March 2023 09:46 > *To:* openstack-discuss > *Subject:* [neutron] > > > * CAUTION: This email originates from outside THG * > ------------------------------ > Hi All, > > I'm in the process of planning a small public cloud based on OpenStack. I > have quite experience with kolla-ansible deployments which use OVS > networking and I have no issues with that. It works stable for my use cases > (Vlan provider networks, DVR, tenant networks, floating IPs). > > For that new deployment I'm looking at OVN deployment which from what I > read should be more performant (faster build of instances) and with ability > to cover more networking features in OVN instead of needing external > software like iptables/dnsmasq. > > Does anyone use OVN in production and what is your experience (pros/cons)? > Is OVN mature enough to replace OVS in the production deployment (are > there some basic features from OVS missing)? > > Thanks in advance for sharing the experience. > > -- > Kamil Madac > > *Danny Webb* > Principal OpenStack Engineer > Danny.Webb at thehutgroup.com > [image: THG Ingenuity Logo] > > > -- Kamil Madac -------------- next part -------------- An HTML attachment was scrubbed... URL: From Danny.Webb at thehutgroup.com Tue Mar 21 11:22:43 2023 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Tue, 21 Mar 2023 11:22:43 +0000 Subject: [neutron] In-Reply-To: References: Message-ID: Not via kolla-ansible as far as I know, tripleo had some migration steps built in that you may be able to mimic. When we did our migration we had our internal tenants evacuate the region and we rebuilt in situ. 
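(For reference, since the plan is a fresh dev deployment rather than a conversion: in kolla-ansible the OVN backend is selected in globals.yml. A minimal sketch, assuming a recent kolla-ansible release; please double-check the variable name against the documentation for your exact version:

    # /etc/kolla/globals.yml
    neutron_plugin_agent: "ovn"

then run the usual "kolla-ansible -i <inventory> bootstrap-servers / prechecks / deploy" sequence. An in-place OVS-to-OVN conversion of an already deployed kolla-ansible cloud is not automated, which matches the evacuate-and-rebuild approach described above.)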
________________________________ From: Kamil Madac Sent: 21 March 2023 10:27 To: Danny Webb Cc: openstack-discuss Subject: Re: [neutron] CAUTION: This email originates from outside THG ________________________________ Hi Danny, thanks for sharing your positive experience. I'm going to deploy OVN in dev environment with kolla-ansible. Maybe one more question. Is there any official way to migrate from OVS to OVN with kolla-ansible, or have you used the official migration script https://docs.openstack.org/networking-ovn/latest/install/migration.html? On Thu, Mar 16, 2023 at 5:43?PM Danny Webb > wrote: Hi Kamil, We're currently running 4 (soon to be 5) production regions all using kolla ansible as our deployer with OVN as our neutron backend. It's been fairly solid for us and we've had less issues with OVN than the traditional hybrid OVS / Iptables neutron driver (which we ran for about a year before switching to OVN). Our regions are anywhere from 50-60 compute hosts with 1-2k+ VMs per region. As far as I know most of the new development is going into OVN so would be a good place to start. Ultimately, we've only really had 2 real issues whilst running it. First was an issue where we had the provider network spamming gateway changes into southbound as we had our anycast SVI bound to our top of rack switches which made OVN keep updating it's location. We mitigated this by moving the provider SVIs to our border routers and the issue went away and dropped the load on our OVN controllers significantly. Only other real issue we had was during an upgrade of a region we ended up with what we believed to be some sort of stale flows that resulted in some hypervisors losing connectivity until we rebooted them. Hope this helps! Cheers, Danny ________________________________ From: Kamil Madac > Sent: 14 March 2023 09:46 To: openstack-discuss > Subject: [neutron] CAUTION: This email originates from outside THG ________________________________ Hi All, I'm in the process of planning a small public cloud based on OpenStack. I have quite experience with kolla-ansible deployments which use OVS networking and I have no issues with that. It works stable for my use cases (Vlan provider networks, DVR, tenant networks, floating IPs). For that new deployment I'm looking at OVN deployment which from what I read should be more performant (faster build of instances) and with ability to cover more networking features in OVN instead of needing external software like iptables/dnsmasq. Does anyone use OVN in production and what is your experience (pros/cons)? Is OVN mature enough to replace OVS in the production deployment (are there some basic features from OVS missing)? Thanks in advance for sharing the experience. -- Kamil Madac Danny Webb Principal OpenStack Engineer Danny.Webb at thehutgroup.com [THG Ingenuity Logo] [https://i.imgur.com/wbpVRW6.png] [https://i.imgur.com/c3040tr.png] -- Kamil Madac Danny Webb Principal OpenStack Engineer Danny.Webb at thehutgroup.com [THG Ingenuity Logo] [https://i.imgur.com/wbpVRW6.png] [https://i.imgur.com/c3040tr.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: From pdeore at redhat.com Tue Mar 21 11:30:55 2023 From: pdeore at redhat.com (Pranali Deore) Date: Tue, 21 Mar 2023 17:00:55 +0530 Subject: [Glance]Weekly Meeting Cancelled for this week Message-ID: Hello, As discussed during last weekly meeting[1], Glance upstream weekly meeting for this week i.e., 23rd March, 2023 has been cancelled. See you all at PTG ! 
Thanks & Regards, Pranali [1]: https://meetings.opendev.org/irclogs/%23openstack-meeting/%23openstack-meeting.2023-03-16.log.html#t2023-03-16T14:38:06 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gthiemonge at redhat.com Tue Mar 21 11:41:57 2023 From: gthiemonge at redhat.com (Gregory Thiemonge) Date: Tue, 21 Mar 2023 12:41:57 +0100 Subject: [Octavia][PTG] Bobcat PTG planning Message-ID: Hi Folks, A reminder: the Octavia PTG will be on March 28th (14:00-18:00 UTC), There's a dedicated etherpad for this session: https://etherpad.opendev.org/p/bobcat-ptg-octavia Feel free to add your topic, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 21 04:06:08 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 21 Mar 2023 09:36:08 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Update: I uploaded an image directly to the dcn02 store, and it takes around 10,15 minutes to create a volume with image in dcn02. The image size is 389 MB. On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan wrote: > Hi Jhon, > I checked in the ceph od dcn02, I can see the images created after > importing from the central site. > But launching an instance normally fails as it takes a long time for the > volume to get created. > > When launching an instance from volume the instance is getting created > properly without any errors. > > I tried to cache images in nova using > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > but getting checksum failed error. > > With regards, > Swogat Pradhan > > On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: > >> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >> wrote: >> > >> > Update: After restarting the nova services on the controller and >> running the deploy script on the edge site, I was able to launch the VM >> from volume. >> > >> > Right now the instance creation is failing as the block device creation >> is stuck in creating state, it is taking more than 10 mins for the volume >> to be created, whereas the image has already been imported to the edge >> glance. >> >> Try following this document and making the same observations in your >> environment for AZs and their local ceph cluster. >> >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >> >> On a DCN site if you run a command like this: >> >> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >> /etc/ceph/dcn0.client.admin.keyring >> $ rbd --cluster dcn0 -p volumes ls -l >> NAME SIZE PARENT >> FMT PROT LOCK >> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >> $ >> >> Then, you should see the parent of the volume is the image which is on >> the same local ceph cluster. >> >> I wonder if something is misconfigured and thus you're encountering >> the streaming behavior described here: >> >> Ideally all images should reside in the central Glance and be copied >> to DCN sites before instances of those images are booted on DCN sites. 
>> If an image is not copied to a DCN site before it is booted, then the >> image will be streamed to the DCN site and then the image will boot as >> an instance. This happens because Glance at the DCN site has access to >> the images store at the Central ceph cluster. Though the booting of >> the image will take time because it has not been copied in advance, >> this is still preferable to failing to boot the image. >> >> You can also exec into the cinder container at the DCN site and >> confirm it's using it's local ceph cluster. >> >> John >> >> > >> > I will try and create a new fresh image and test again then update. >> > >> > With regards, >> > Swogat Pradhan >> > >> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >> >> >> Update: >> >> In the hypervisor list the compute node state is showing down. >> >> >> >> >> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>> >> >>> Hi Brendan, >> >>> Now i have deployed another site where i have used 2 linux bonds >> network template for both 3 compute nodes and 3 ceph nodes. >> >>> The bonding options is set to mode=802.3ad (lacp=active). >> >>> I used a cirros image to launch instance but the instance timed out >> so i waited for the volume to be created. >> >>> Once the volume was created i tried launching the instance from the >> volume and still the instance is stuck in spawning state. >> >>> >> >>> Here is the nova-compute log: >> >>> >> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep >> daemon starting >> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep >> process running with uid/gid: 0/0 >> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >> process running with capabilities (eff/prm/inh): >> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >> daemon running as pid 185437 >> >>> 2023-03-15 17:35:47.974 8 WARNING >> os_brick.initiator.connectors.nvmeof >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >> in _get_host_uuid: Unexpected error while running command. >> >>> Command: blkid overlay -s UUID -o value >> >>> Exit code: 2 >> >>> Stdout: '' >> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >> Unexpected error while running command. >> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >> >>> >> >>> It is stuck in creating image, do i need to run the template >> mentioned here ?: >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> >>> >> >>> The volume is already created and i do not understand why the >> instance is stuck in spawning state. >> >>> >> >>> With regards, >> >>> Swogat Pradhan >> >>> >> >>> >> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard >> wrote: >> >>>> >> >>>> Does your environment use different network interfaces for each of >> the networks? Or does it have a bond with everything on it? >> >>>> >> >>>> One issue I have seen before is that when launching instances, there >> is a lot of network traffic between nodes as the hypervisor needs to >> download the image from Glance. 
Along with various other services sending >> normal network traffic, it can be enough to cause issues if everything is >> running over a single 1Gbe interface. >> >>>> >> >>>> I have seen the same situation in fact when using a single >> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >> while you try to spawn the instance to see if you?re dropping packets. In >> the situation I described, there were dropped packets which resulted in a >> loss of communication between nova_compute and RMQ, so the node appeared >> offline. You should also confirm that nova_compute is being disconnected in >> the nova_compute logs if you tail them on the Hypervisor while spawning the >> instance. >> >>>> >> >>>> In my case, changing from active/backup to LACP helped. So, based on >> that experience, from my perspective, is certainly sounds like some kind of >> network issue. >> >>>> >> >>>> Regards, >> >>>> >> >>>> Brendan Shephard >> >>>> Senior Software Engineer >> >>>> Red Hat Australia >> >>>> >> >>>> >> >>>> >> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >> >>>> >> >>>> Hi, >> >>>> >> >>>> I tried to help someone with a similar issue some time ago in this >> thread: >> >>>> >> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >> >>>> >> >>>> But apparently a neutron reinstallation fixed it for that user, not >> sure if that could apply here. But is it possible that your nova and >> neutron versions are different between central and edge site? Have you >> restarted nova and neutron services on the compute nodes after >> installation? Have you debug logs of nova-conductor and maybe nova-compute? >> Maybe they can help narrow down the issue. >> >>>> If there isn't any additional information in the debug logs I >> probably would start "tearing down" rabbitmq. I didn't have to do that in a >> production system yet so be careful. I can think of two routes: >> >>>> >> >>>> - Either remove queues, exchanges etc. while rabbit is running, this >> will most likely impact client IO depending on your load. Check out the >> rabbitmqctl commands. >> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all >> nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >> >>>> >> >>>> I can imagine that the failed reply "survives" while being >> replicated across the rabbit nodes. But I don't really know the rabbit >> internals too well, so maybe someone else can chime in here and give a >> better advice. >> >>>> >> >>>> Regards, >> >>>> Eugen >> >>>> >> >>>> Zitat von Swogat Pradhan : >> >>>> >> >>>> Hi, >> >>>> Can someone please help me out on this issue? >> >>>> >> >>>> With regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>>> wrote: >> >>>> >> >>>> Hi >> >>>> I don't see any major packet loss. >> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to >> packet >> >>>> loss. >> >>>> >> >>>> with regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>>> wrote: >> >>>> >> >>>> Hi, >> >>>> Yes the MTU is the same as the default '1500'. >> >>>> Generally I haven't seen any packet loss, but never checked when >> >>>> launching the instance. >> >>>> I will check that and come back. 
>> >>>> But everytime i launch an instance the instance gets stuck at >> spawning >> >>>> state and there the hypervisor becomes down, so not sure if packet >> loss >> >>>> causes this. >> >>>> >> >>>> With regards, >> >>>> Swogat pradhan >> >>>> >> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >> >>>> >> >>>> One more thing coming to mind is MTU size. Are they identical between >> >>>> central and edge site? Do you see packet loss through the tunnel? >> >>>> >> >>>> Zitat von Swogat Pradhan : >> >>>> >> >>>> > Hi Eugen, >> >>>> > Request you to please add my email either on 'to' or 'cc' as i am >> not >> >>>> > getting email's from you. >> >>>> > Coming to the issue: >> >>>> > >> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >> list_policies -p >> >>>> / >> >>>> > Listing policies for vhost "/" ... >> >>>> > vhost name pattern apply-to definition priority >> >>>> > / ha-all ^(?!amq\.).* queues >> >>>> > >> >>>> >> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >> >>>> > >> >>>> > I have the edge site compute nodes up, it only goes down when i am >> >>>> trying >> >>>> > to launch an instance and the instance comes to a spawning state >> and >> >>>> then >> >>>> > gets stuck. >> >>>> > >> >>>> > I have a tunnel setup between the central and the edge sites. >> >>>> > >> >>>> > With regards, >> >>>> > Swogat Pradhan >> >>>> > >> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> >> >>>> > wrote: >> >>>> > >> >>>> >> Hi Eugen, >> >>>> >> For some reason i am not getting your email to me directly, i am >> >>>> checking >> >>>> >> the email digest and there i am able to find your reply. >> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> >>>> >> Yes, these logs are from the time when the issue occurred. >> >>>> >> >> >>>> >> *Note: i am able to create vm's and perform other activities in >> the >> >>>> >> central site, only facing this issue in the edge site.* >> >>>> >> >> >>>> >> With regards, >> >>>> >> Swogat Pradhan >> >>>> >> >> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> >> >>>> >> wrote: >> >>>> >> >> >>>> >>> Hi Eugen, >> >>>> >>> Thanks for your response. >> >>>> >>> I have actually a 4 controller setup so here are the details: >> >>>> >>> >> >>>> >>> *PCS Status:* >> >>>> >>> * Container bundle set: rabbitmq-bundle [ >> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >> >>>> Started >> >>>> >>> overcloud-controller-no-ceph-3 >> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >> >>>> Started >> >>>> >>> overcloud-controller-2 >> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >> >>>> Started >> >>>> >>> overcloud-controller-1 >> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >> >>>> Started >> >>>> >>> overcloud-controller-0 >> >>>> >>> >> >>>> >>> I have tried restarting the bundle multiple times but the issue >> is >> >>>> still >> >>>> >>> present. >> >>>> >>> >> >>>> >>> *Cluster status:* >> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> >>>> >>> Cluster status of node >> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>> >>>> >>> Basics >> >>>> >>> >> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >>>> >>> >> >>>> >>> Disk Nodes >> >>>> >>> >> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> >>> >> >>>> >>> Running Nodes >> >>>> >>> >> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> >>> >> >>>> >>> Versions >> >>>> >>> >> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >> >>>> 3.8.3 >> >>>> >>> on Erlang 22.3.4.1 >> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >> >>>> 3.8.3 >> >>>> >>> on Erlang 22.3.4.1 >> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >> >>>> 3.8.3 >> >>>> >>> on Erlang 22.3.4.1 >> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >> >>>> RabbitMQ >> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >> >>>> >>> >> >>>> >>> Alarms >> >>>> >>> >> >>>> >>> (none) >> >>>> >>> >> >>>> >>> Network Partitions >> >>>> >>> >> >>>> >>> (none) >> >>>> >>> >> >>>> >>> Listeners >> >>>> >>> >> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI >> >>>> tool >> >>>> >>> communication >> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>>> >>> and AMQP 1.0 >> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI >> >>>> tool >> >>>> >>> communication >> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>>> >>> and AMQP 1.0 >> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI >> >>>> tool >> >>>> >>> communication >> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>>> >>> and AMQP 1.0 >> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> , >> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >> >>>> inter-node and >> >>>> >>> CLI tool communication >> >>>> >>> Node: >> rabbit at 
overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> , >> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: >> AMQP >> >>>> 0-9-1 >> >>>> >>> and AMQP 1.0 >> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> , >> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >>>> >>> >> >>>> >>> Feature flags >> >>>> >>> >> >>>> >>> Flag: drop_unroutable_metric, state: enabled >> >>>> >>> Flag: empty_basic_get_metric, state: enabled >> >>>> >>> Flag: implicit_default_bindings, state: enabled >> >>>> >>> Flag: quorum_queue, state: enabled >> >>>> >>> Flag: virtual_host_metadata, state: enabled >> >>>> >>> >> >>>> >>> *Logs:* >> >>>> >>> *(Attached)* >> >>>> >>> >> >>>> >>> With regards, >> >>>> >>> Swogat Pradhan >> >>>> >>> >> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> >> >>>> >>> wrote: >> >>>> >>> >> >>>> >>>> Hi, >> >>>> >>>> Please find the nova conductor as well as nova api log. >> >>>> >>>> >> >>>> >>>> nova-conuctor: >> >>>> >>>> >> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >> reply to >> >>>> >>>> 16152921c1eb45c2b1f562087140168b >> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >> reply to >> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >> reply to >> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >> >>>> due to a >> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >> >>>> Abandoning...: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >> reply to >> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >> >>>> due to a >> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>> >>>> Abandoning...: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >> reply to >> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >> >>>> due to a >> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>> Abandoning...: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >> enabled >> >>>> with >> >>>> >>>> backend dogpile.cache.null. >> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >> reply to >> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >> >>>> due to a >> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>> Abandoning...: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> >> >>>> >>>> With regards, >> >>>> >>>> Swogat Pradhan >> >>>> >>>> >> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >> >>>> >>>> >> >>>> >>>>> Hi, >> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am >> trying to >> >>>> >>>>> launch vm's. >> >>>> >>>>> When the VM is in spawning state the node goes down (openstack >> >>>> compute >> >>>> >>>>> service list), the node comes backup when i restart the nova >> >>>> compute >> >>>> >>>>> service but then the launch of the vm fails. >> >>>> >>>>> >> >>>> >>>>> nova-compute.log >> >>>> >>>>> >> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >> >>>> >>>>> instance usage >> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 >> 07:00:00 >> >>>> to >> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >> >>>> >>>>> dcn01-hci-0.bdxworld.com >> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >> >>>> name: >> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >> enabled >> >>>> with >> >>>> >>>>> backend dogpile.cache.null. >> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >> >>>> >>>>> privsep helper: >> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >> >>>> 'privsep-helper', >> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >> >>>> privsep >> >>>> >>>>> daemon via rootwrap >> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] >> privsep >> >>>> >>>>> daemon starting >> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] >> privsep >> >>>> >>>>> process running with uid/gid: 0/0 >> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >> privsep >> >>>> >>>>> process running with capabilities (eff/prm/inh): >> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >> privsep >> >>>> >>>>> daemon running as pid 2647 >> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >> >>>> os_brick.initiator.connectors.nvmeof >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >> >>>> >>>>> execution error >> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >> >>>> >>>>> Command: blkid overlay -s UUID -o value >> >>>> >>>>> Exit code: 2 >> >>>> >>>>> Stdout: '' >> >>>> >>>>> Stderr: '': >> oslo_concurrency.processutils.ProcessExecutionError: >> >>>> >>>>> Unexpected error while running command. 
>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >> >>>> >>>>> >> >>>> >>>>> Is there a way to solve this issue? >> >>>> >>>>> >> >>>> >>>>> >> >>>> >>>>> With regards, >> >>>> >>>>> >> >>>> >>>>> Swogat Pradhan >> >>>> >>>>> >> >>>> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Tue Mar 21 10:23:44 2023 From: johfulto at redhat.com (John Fulton) Date: Tue, 21 Mar 2023 06:23:44 -0400 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: in my last message under the line "On a DCN site if you run a command like this:" I suggested some steps you could try to confirm the image is a COW from the local glance as well as how to look at your cinder config. On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan wrote: > Update: > I uploaded an image directly to the dcn02 store, and it takes around 10,15 > minutes to create a volume with image in dcn02. > The image size is 389 MB. > > On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan > wrote: > >> Hi Jhon, >> I checked in the ceph od dcn02, I can see the images created after >> importing from the central site. >> But launching an instance normally fails as it takes a long time for the >> volume to get created. >> >> When launching an instance from volume the instance is getting created >> properly without any errors. >> >> I tried to cache images in nova using >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> but getting checksum failed error. >> >> With regards, >> Swogat Pradhan >> >> On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: >> >>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>> wrote: >>> > >>> > Update: After restarting the nova services on the controller and >>> running the deploy script on the edge site, I was able to launch the VM >>> from volume. >>> > >>> > Right now the instance creation is failing as the block device >>> creation is stuck in creating state, it is taking more than 10 mins for the >>> volume to be created, whereas the image has already been imported to the >>> edge glance. >>> >>> Try following this document and making the same observations in your >>> environment for AZs and their local ceph cluster. >>> >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>> >>> On a DCN site if you run a command like this: >>> >>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>> /etc/ceph/dcn0.client.admin.keyring >>> $ rbd --cluster dcn0 -p volumes ls -l >>> NAME SIZE PARENT >>> FMT PROT LOCK >>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>> $ >>> >>> Then, you should see the parent of the volume is the image which is on >>> the same local ceph cluster. 
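(To make those two checks concrete: the store names "central"/"dcn02", the $IMAGE_ID variable and the cinder_volume container name below are assumptions for a typical TripleO DCN layout, so treat this as a sketch and refer to the distributed_multibackend_storage document linked above for the exact procedure:

    $ glance image-show $IMAGE_ID | grep stores   # the image should list the local dcn02 store
    $ glance image-import $IMAGE_ID --stores dcn02 --import-method copy-image
    $ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_cluster_name|rbd_pool' /etc/cinder/cinder.conf

If the image exists only in the central store, or if cinder's RBD backend on the edge nodes points at the central cluster's conf/keyring instead of the local one, a volume created from that image is streamed over the WAN rather than COW-cloned locally, which would explain a 10-15 minute creation time for a 389 MB image.)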
>>> >>> I wonder if something is misconfigured and thus you're encountering >>> the streaming behavior described here: >>> >>> Ideally all images should reside in the central Glance and be copied >>> to DCN sites before instances of those images are booted on DCN sites. >>> If an image is not copied to a DCN site before it is booted, then the >>> image will be streamed to the DCN site and then the image will boot as >>> an instance. This happens because Glance at the DCN site has access to >>> the images store at the Central ceph cluster. Though the booting of >>> the image will take time because it has not been copied in advance, >>> this is still preferable to failing to boot the image. >>> >>> You can also exec into the cinder container at the DCN site and >>> confirm it's using it's local ceph cluster. >>> >>> John >>> >>> > >>> > I will try and create a new fresh image and test again then update. >>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >> >>> >> Update: >>> >> In the hypervisor list the compute node state is showing down. >>> >> >>> >> >>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>> >>> >>> Hi Brendan, >>> >>> Now i have deployed another site where i have used 2 linux bonds >>> network template for both 3 compute nodes and 3 ceph nodes. >>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>> >>> I used a cirros image to launch instance but the instance timed out >>> so i waited for the volume to be created. >>> >>> Once the volume was created i tried launching the instance from the >>> volume and still the instance is stuck in spawning state. >>> >>> >>> >>> Here is the nova-compute log: >>> >>> >>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep >>> daemon starting >>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep >>> process running with uid/gid: 0/0 >>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >>> process running with capabilities (eff/prm/inh): >>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >>> daemon running as pid 185437 >>> >>> 2023-03-15 17:35:47.974 8 WARNING >>> os_brick.initiator.connectors.nvmeof >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>> in _get_host_uuid: Unexpected error while running command. >>> >>> Command: blkid overlay -s UUID -o value >>> >>> Exit code: 2 >>> >>> Stdout: '' >>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>> Unexpected error while running command. >>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>> >>> >>> >>> It is stuck in creating image, do i need to run the template >>> mentioned here ?: >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> >>> >>> >>> The volume is already created and i do not understand why the >>> instance is stuck in spawning state. 
>>> >>> >>> >>> With regards, >>> >>> Swogat Pradhan >>> >>> >>> >>> >>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard >>> wrote: >>> >>>> >>> >>>> Does your environment use different network interfaces for each of >>> the networks? Or does it have a bond with everything on it? >>> >>>> >>> >>>> One issue I have seen before is that when launching instances, >>> there is a lot of network traffic between nodes as the hypervisor needs to >>> download the image from Glance. Along with various other services sending >>> normal network traffic, it can be enough to cause issues if everything is >>> running over a single 1Gbe interface. >>> >>>> >>> >>>> I have seen the same situation in fact when using a single >>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>> while you try to spawn the instance to see if you?re dropping packets. In >>> the situation I described, there were dropped packets which resulted in a >>> loss of communication between nova_compute and RMQ, so the node appeared >>> offline. You should also confirm that nova_compute is being disconnected in >>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>> instance. >>> >>>> >>> >>>> In my case, changing from active/backup to LACP helped. So, based >>> on that experience, from my perspective, is certainly sounds like some kind >>> of network issue. >>> >>>> >>> >>>> Regards, >>> >>>> >>> >>>> Brendan Shephard >>> >>>> Senior Software Engineer >>> >>>> Red Hat Australia >>> >>>> >>> >>>> >>> >>>> >>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>> >>>> >>> >>>> Hi, >>> >>>> >>> >>>> I tried to help someone with a similar issue some time ago in this >>> thread: >>> >>>> >>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>> >>>> >>> >>>> But apparently a neutron reinstallation fixed it for that user, not >>> sure if that could apply here. But is it possible that your nova and >>> neutron versions are different between central and edge site? Have you >>> restarted nova and neutron services on the compute nodes after >>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>> Maybe they can help narrow down the issue. >>> >>>> If there isn't any additional information in the debug logs I >>> probably would start "tearing down" rabbitmq. I didn't have to do that in a >>> production system yet so be careful. I can think of two routes: >>> >>>> >>> >>>> - Either remove queues, exchanges etc. while rabbit is running, >>> this will most likely impact client IO depending on your load. Check out >>> the rabbitmqctl commands. >>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all >>> nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>> >>>> >>> >>>> I can imagine that the failed reply "survives" while being >>> replicated across the rabbit nodes. But I don't really know the rabbit >>> internals too well, so maybe someone else can chime in here and give a >>> better advice. >>> >>>> >>> >>>> Regards, >>> >>>> Eugen >>> >>>> >>> >>>> Zitat von Swogat Pradhan : >>> >>>> >>> >>>> Hi, >>> >>>> Can someone please help me out on this issue? >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>> wrote: >>> >>>> >>> >>>> Hi >>> >>>> I don't see any major packet loss. >>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to >>> packet >>> >>>> loss. 
>>> >>>> >>> >>>> with regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>> wrote: >>> >>>> >>> >>>> Hi, >>> >>>> Yes the MTU is the same as the default '1500'. >>> >>>> Generally I haven't seen any packet loss, but never checked when >>> >>>> launching the instance. >>> >>>> I will check that and come back. >>> >>>> But everytime i launch an instance the instance gets stuck at >>> spawning >>> >>>> state and there the hypervisor becomes down, so not sure if packet >>> loss >>> >>>> causes this. >>> >>>> >>> >>>> With regards, >>> >>>> Swogat pradhan >>> >>>> >>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>> >>>> >>> >>>> One more thing coming to mind is MTU size. Are they identical >>> between >>> >>>> central and edge site? Do you see packet loss through the tunnel? >>> >>>> >>> >>>> Zitat von Swogat Pradhan : >>> >>>> >>> >>>> > Hi Eugen, >>> >>>> > Request you to please add my email either on 'to' or 'cc' as i am >>> not >>> >>>> > getting email's from you. >>> >>>> > Coming to the issue: >>> >>>> > >>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>> list_policies -p >>> >>>> / >>> >>>> > Listing policies for vhost "/" ... >>> >>>> > vhost name pattern apply-to definition priority >>> >>>> > / ha-all ^(?!amq\.).* queues >>> >>>> > >>> >>>> >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> >>>> > >>> >>>> > I have the edge site compute nodes up, it only goes down when i am >>> >>>> trying >>> >>>> > to launch an instance and the instance comes to a spawning state >>> and >>> >>>> then >>> >>>> > gets stuck. >>> >>>> > >>> >>>> > I have a tunnel setup between the central and the edge sites. >>> >>>> > >>> >>>> > With regards, >>> >>>> > Swogat Pradhan >>> >>>> > >>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> >>> >>>> > wrote: >>> >>>> > >>> >>>> >> Hi Eugen, >>> >>>> >> For some reason i am not getting your email to me directly, i am >>> >>>> checking >>> >>>> >> the email digest and there i am able to find your reply. >>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >>>> >> Yes, these logs are from the time when the issue occurred. >>> >>>> >> >>> >>>> >> *Note: i am able to create vm's and perform other activities in >>> the >>> >>>> >> central site, only facing this issue in the edge site.* >>> >>>> >> >>> >>>> >> With regards, >>> >>>> >> Swogat Pradhan >>> >>>> >> >>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> >>> >>>> >> wrote: >>> >>>> >> >>> >>>> >>> Hi Eugen, >>> >>>> >>> Thanks for your response. 
>>> >>>> >>> I have actually a 4 controller setup so here are the details: >>> >>>> >>> >>> >>>> >>> *PCS Status:* >>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest >>> ]: >>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> >>>> Started >>> >>>> >>> overcloud-controller-no-ceph-3 >>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> >>>> Started >>> >>>> >>> overcloud-controller-2 >>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> >>>> Started >>> >>>> >>> overcloud-controller-1 >>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> >>>> Started >>> >>>> >>> overcloud-controller-0 >>> >>>> >>> >>> >>>> >>> I have tried restarting the bundle multiple times but the issue >>> is >>> >>>> still >>> >>>> >>> present. >>> >>>> >>> >>> >>>> >>> *Cluster status:* >>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>> >>>> >>> Cluster status of node >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>> >>>> >>> Basics >>> >>>> >>> >>> >>>> >>> Cluster name: >>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>>> >>> >>> >>>> >>> Disk Nodes >>> >>>> >>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> >>> >>> >>>> >>> Running Nodes >>> >>>> >>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> >>> >>> >>>> >>> Versions >>> >>>> >>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>> 3.8.3 >>> >>>> >>> on Erlang 22.3.4.1 >>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>> 3.8.3 >>> >>>> >>> on Erlang 22.3.4.1 >>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>> 3.8.3 >>> >>>> >>> on Erlang 22.3.4.1 >>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> >>>> RabbitMQ >>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>>> >>> >>> >>>> >>> Alarms >>> >>>> >>> >>> >>>> >>> (none) >>> >>>> >>> >>> >>>> >>> Network Partitions >>> >>>> >>> >>> >>>> >>> (none) >>> >>>> >>> >>> >>>> >>> Listeners >>> >>>> >>> >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>> and CLI >>> >>>> tool >>> >>>> >>> communication >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>>> >>> and AMQP 1.0 >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>> and CLI >>> >>>> tool >>> >>>> >>> communication >>> >>>> >>> Node: rabbit at 
overcloud-controller-1.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>>> >>> and AMQP 1.0 >>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>> and CLI >>> >>>> tool >>> >>>> >>> communication >>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>>> >>> and AMQP 1.0 >>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> , >>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>> >>>> inter-node and >>> >>>> >>> CLI tool communication >>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> , >>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: >>> AMQP >>> >>>> 0-9-1 >>> >>>> >>> and AMQP 1.0 >>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> , >>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>> >>> >>> >>>> >>> Feature flags >>> >>>> >>> >>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>> >>>> >>> Flag: quorum_queue, state: enabled >>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>> >>>> >>> >>> >>>> >>> *Logs:* >>> >>>> >>> *(Attached)* >>> >>>> >>> >>> >>>> >>> With regards, >>> >>>> >>> Swogat Pradhan >>> >>>> >>> >>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> >>> >>>> >>> wrote: >>> >>>> >>> >>> >>>> >>>> Hi, >>> >>>> >>>> Please find the nova conductor as well as nova api log. 
>>> >>>> >>>> >>> >>>> >>>> nova-conuctor: >>> >>>> >>>> >>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>> reply to >>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >>> reply to >>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >>> reply to >>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >>> seconds >>> >>>> due to a >>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>> >>>> Abandoning...: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>> reply to >>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >>> seconds >>> >>>> due to a >>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>> Abandoning...: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>> reply to >>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >>> seconds >>> >>>> due to a >>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>> Abandoning...: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >>> enabled >>> >>>> with >>> >>>> >>>> backend dogpile.cache.null. 
>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>> reply to >>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >>> seconds >>> >>>> due to a >>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>> Abandoning...: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> >>> >>>> >>>> With regards, >>> >>>> >>>> Swogat Pradhan >>> >>>> >>>> >>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>>> >>> >>>> >>>>> Hi, >>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am >>> trying to >>> >>>> >>>>> launch vm's. >>> >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>> >>>> compute >>> >>>> >>>>> service list), the node comes backup when i restart the nova >>> >>>> compute >>> >>>> >>>>> service but then the launch of the vm fails. >>> >>>> >>>>> >>> >>>> >>>>> nova-compute.log >>> >>>> >>>>> >>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>> >>>> >>>>> instance usage >>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 >>> 07:00:00 >>> >>>> to >>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>> >>>> name: >>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >>> enabled >>> >>>> with >>> >>>> >>>>> backend dogpile.cache.null. 
>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>> >>>> >>>>> privsep helper: >>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> >>>> 'privsep-helper', >>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned >>> new >>> >>>> privsep >>> >>>> >>>>> daemon via rootwrap >>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] >>> privsep >>> >>>> >>>>> daemon starting >>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] >>> privsep >>> >>>> >>>>> process running with uid/gid: 0/0 >>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >>> privsep >>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >>> privsep >>> >>>> >>>>> daemon running as pid 2647 >>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> >>>> os_brick.initiator.connectors.nvmeof >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>> >>>> >>>>> execution error >>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>> >>>>> Exit code: 2 >>> >>>> >>>>> Stdout: '' >>> >>>> >>>>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: >>> >>>> >>>>> Unexpected error while running command. >>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>> >>>>> >>> >>>> >>>>> Is there a way to solve this issue? >>> >>>> >>>>> >>> >>>> >>>>> >>> >>>> >>>>> With regards, >>> >>>> >>>>> >>> >>>> >>>>> Swogat Pradhan >>> >>>> >>>>> >>> >>>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 21 12:03:20 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 21 Mar 2023 17:33:20 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi, Seems like cinder is not using the local ceph. 
Ceph Output: [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l NAME SIZE PARENT FMT PROT LOCK 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 excl 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l NAME SIZE PARENT FMT PROT LOCK volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 [ceph: root at dcn02-ceph-all-0 /]# Attached the cinder config. Please let me know how I can solve this issue. With regards, Swogat Pradhan On Tue, Mar 21, 2023 at 3:53?PM John Fulton wrote: > in my last message under the line "On a DCN site if you run a command like > this:" I suggested some steps you could try to confirm the image is a COW > from the local glance as well as how to look at your cinder config. > > On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan > wrote: > >> Update: >> I uploaded an image directly to the dcn02 store, and it takes >> around 10,15 minutes to create a volume with image in dcn02. >> The image size is 389 MB. >> >> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>> Hi Jhon, >>> I checked in the ceph od dcn02, I can see the images created after >>> importing from the central site. >>> But launching an instance normally fails as it takes a long time for the >>> volume to get created. >>> >>> When launching an instance from volume the instance is getting created >>> properly without any errors. >>> >>> I tried to cache images in nova using >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> but getting checksum failed error. >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: >>> >>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>> wrote: >>>> > >>>> > Update: After restarting the nova services on the controller and >>>> running the deploy script on the edge site, I was able to launch the VM >>>> from volume. >>>> > >>>> > Right now the instance creation is failing as the block device >>>> creation is stuck in creating state, it is taking more than 10 mins for the >>>> volume to be created, whereas the image has already been imported to the >>>> edge glance. >>>> >>>> Try following this document and making the same observations in your >>>> environment for AZs and their local ceph cluster. 
>>>> >>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>> >>>> On a DCN site if you run a command like this: >>>> >>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>> /etc/ceph/dcn0.client.admin.keyring >>>> $ rbd --cluster dcn0 -p volumes ls -l >>>> NAME SIZE PARENT >>>> FMT PROT LOCK >>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>> $ >>>> >>>> Then, you should see the parent of the volume is the image which is on >>>> the same local ceph cluster. >>>> >>>> I wonder if something is misconfigured and thus you're encountering >>>> the streaming behavior described here: >>>> >>>> Ideally all images should reside in the central Glance and be copied >>>> to DCN sites before instances of those images are booted on DCN sites. >>>> If an image is not copied to a DCN site before it is booted, then the >>>> image will be streamed to the DCN site and then the image will boot as >>>> an instance. This happens because Glance at the DCN site has access to >>>> the images store at the Central ceph cluster. Though the booting of >>>> the image will take time because it has not been copied in advance, >>>> this is still preferable to failing to boot the image. >>>> >>>> You can also exec into the cinder container at the DCN site and >>>> confirm it's using it's local ceph cluster. >>>> >>>> John >>>> >>>> > >>>> > I will try and create a new fresh image and test again then update. >>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >> >>>> >> Update: >>>> >> In the hypervisor list the compute node state is showing down. >>>> >> >>>> >> >>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>> >>>> >>> Hi Brendan, >>>> >>> Now i have deployed another site where i have used 2 linux bonds >>>> network template for both 3 compute nodes and 3 ceph nodes. >>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>> >>> I used a cirros image to launch instance but the instance timed out >>>> so i waited for the volume to be created. >>>> >>> Once the volume was created i tried launching the instance from the >>>> volume and still the instance is stuck in spawning state. >>>> >>> >>>> >>> Here is the nova-compute log: >>>> >>> >>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep >>>> daemon starting >>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep >>>> process running with uid/gid: 0/0 >>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >>>> process running with capabilities (eff/prm/inh): >>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >>>> daemon running as pid 185437 >>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>> os_brick.initiator.connectors.nvmeof >>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>> in _get_host_uuid: Unexpected error while running command. >>>> >>> Command: blkid overlay -s UUID -o value >>>> >>> Exit code: 2 >>>> >>> Stdout: '' >>>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> Unexpected error while running command. 
>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>> >>> >>>> >>> It is stuck in creating image, do i need to run the template >>>> mentioned here ?: >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>> >>> >>>> >>> The volume is already created and i do not understand why the >>>> instance is stuck in spawning state. >>>> >>> >>>> >>> With regards, >>>> >>> Swogat Pradhan >>>> >>> >>>> >>> >>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>> bshephar at redhat.com> wrote: >>>> >>>> >>>> >>>> Does your environment use different network interfaces for each of >>>> the networks? Or does it have a bond with everything on it? >>>> >>>> >>>> >>>> One issue I have seen before is that when launching instances, >>>> there is a lot of network traffic between nodes as the hypervisor needs to >>>> download the image from Glance. Along with various other services sending >>>> normal network traffic, it can be enough to cause issues if everything is >>>> running over a single 1Gbe interface. >>>> >>>> >>>> >>>> I have seen the same situation in fact when using a single >>>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>>> while you try to spawn the instance to see if you?re dropping packets. In >>>> the situation I described, there were dropped packets which resulted in a >>>> loss of communication between nova_compute and RMQ, so the node appeared >>>> offline. You should also confirm that nova_compute is being disconnected in >>>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>>> instance. >>>> >>>> >>>> >>>> In my case, changing from active/backup to LACP helped. So, based >>>> on that experience, from my perspective, is certainly sounds like some kind >>>> of network issue. >>>> >>>> >>>> >>>> Regards, >>>> >>>> >>>> >>>> Brendan Shephard >>>> >>>> Senior Software Engineer >>>> >>>> Red Hat Australia >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>>> >>>> >>>> >>>> Hi, >>>> >>>> >>>> >>>> I tried to help someone with a similar issue some time ago in this >>>> thread: >>>> >>>> >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>> >>>> >>>> >>>> But apparently a neutron reinstallation fixed it for that user, >>>> not sure if that could apply here. But is it possible that your nova and >>>> neutron versions are different between central and edge site? Have you >>>> restarted nova and neutron services on the compute nodes after >>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>> Maybe they can help narrow down the issue. >>>> >>>> If there isn't any additional information in the debug logs I >>>> probably would start "tearing down" rabbitmq. I didn't have to do that in a >>>> production system yet so be careful. I can think of two routes: >>>> >>>> >>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, >>>> this will most likely impact client IO depending on your load. Check out >>>> the rabbitmqctl commands. >>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all >>>> nodes and restart rabbitmq so the exchanges, queues etc. rebuild. 
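If RabbitMQ still needs to be ruled out, a rough sketch of the two routes Eugen describes above; these are standard rabbitmqctl/pcs commands rather than steps verified on this deployment, and the second route is destructive, so his warning applies:

# route 1: inspect queues/exchanges while rabbit is running
# (inside a rabbitmq-bundle container on a controller)
$ rabbitmqctl list_queues -p / name messages consumers | grep reply_
$ rabbitmqctl list_exchanges -p /
# stale queues can then be removed with the rabbitmqctl commands Eugen refers to
# (the exact subcommand depends on the RabbitMQ version)

# route 2 (destructive): stop the cluster, wipe mnesia on every controller, let it rebuild
$ sudo pcs resource disable rabbitmq-bundle
$ sudo rm -rf /var/lib/rabbitmq/mnesia/*
$ sudo pcs resource enable rabbitmq-bundle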
>>>> >>>> >>>> >>>> I can imagine that the failed reply "survives" while being >>>> replicated across the rabbit nodes. But I don't really know the rabbit >>>> internals too well, so maybe someone else can chime in here and give a >>>> better advice. >>>> >>>> >>>> >>>> Regards, >>>> >>>> Eugen >>>> >>>> >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> >>>> >>>> Hi, >>>> >>>> Can someone please help me out on this issue? >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>>> wrote: >>>> >>>> >>>> >>>> Hi >>>> >>>> I don't see any major packet loss. >>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to >>>> packet >>>> >>>> loss. >>>> >>>> >>>> >>>> with regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>>> wrote: >>>> >>>> >>>> >>>> Hi, >>>> >>>> Yes the MTU is the same as the default '1500'. >>>> >>>> Generally I haven't seen any packet loss, but never checked when >>>> >>>> launching the instance. >>>> >>>> I will check that and come back. >>>> >>>> But everytime i launch an instance the instance gets stuck at >>>> spawning >>>> >>>> state and there the hypervisor becomes down, so not sure if packet >>>> loss >>>> >>>> causes this. >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat pradhan >>>> >>>> >>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>>> >>>> >>>> >>>> One more thing coming to mind is MTU size. Are they identical >>>> between >>>> >>>> central and edge site? Do you see packet loss through the tunnel? >>>> >>>> >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> >>>> >>>> > Hi Eugen, >>>> >>>> > Request you to please add my email either on 'to' or 'cc' as i >>>> am not >>>> >>>> > getting email's from you. >>>> >>>> > Coming to the issue: >>>> >>>> > >>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>> list_policies -p >>>> >>>> / >>>> >>>> > Listing policies for vhost "/" ... >>>> >>>> > vhost name pattern apply-to definition priority >>>> >>>> > / ha-all ^(?!amq\.).* queues >>>> >>>> > >>>> >>>> >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> >>>> > >>>> >>>> > I have the edge site compute nodes up, it only goes down when i >>>> am >>>> >>>> trying >>>> >>>> > to launch an instance and the instance comes to a spawning state >>>> and >>>> >>>> then >>>> >>>> > gets stuck. >>>> >>>> > >>>> >>>> > I have a tunnel setup between the central and the edge sites. >>>> >>>> > >>>> >>>> > With regards, >>>> >>>> > Swogat Pradhan >>>> >>>> > >>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>> > wrote: >>>> >>>> > >>>> >>>> >> Hi Eugen, >>>> >>>> >> For some reason i am not getting your email to me directly, i am >>>> >>>> checking >>>> >>>> >> the email digest and there i am able to find your reply. >>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >>>> >> Yes, these logs are from the time when the issue occurred. 
>>>> >>>> >> >>>> >>>> >> *Note: i am able to create vm's and perform other activities in >>>> the >>>> >>>> >> central site, only facing this issue in the edge site.* >>>> >>>> >> >>>> >>>> >> With regards, >>>> >>>> >> Swogat Pradhan >>>> >>>> >> >>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>> >> wrote: >>>> >>>> >> >>>> >>>> >>> Hi Eugen, >>>> >>>> >>> Thanks for your response. >>>> >>>> >>> I have actually a 4 controller setup so here are the details: >>>> >>>> >>> >>>> >>>> >>> *PCS Status:* >>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest >>>> ]: >>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>> Started >>>> >>>> >>> overcloud-controller-no-ceph-3 >>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>> Started >>>> >>>> >>> overcloud-controller-2 >>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>> Started >>>> >>>> >>> overcloud-controller-1 >>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>> Started >>>> >>>> >>> overcloud-controller-0 >>>> >>>> >>> >>>> >>>> >>> I have tried restarting the bundle multiple times but the >>>> issue is >>>> >>>> still >>>> >>>> >>> present. >>>> >>>> >>> >>>> >>>> >>> *Cluster status:* >>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>> >>>> >>> Cluster status of node >>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>> >>>> >>> Basics >>>> >>>> >>> >>>> >>>> >>> Cluster name: >>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>>> >>> >>>> >>>> >>> Disk Nodes >>>> >>>> >>> >>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> >>> >>>> >>>> >>> Running Nodes >>>> >>>> >>> >>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> >>> >>>> >>>> >>> Versions >>>> >>>> >>> >>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>> 3.8.3 >>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>> 3.8.3 >>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>> 3.8.3 >>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> : >>>> >>>> RabbitMQ >>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>>> >>> >>>> >>>> >>> Alarms >>>> >>>> >>> >>>> >>>> >>> (none) >>>> >>>> >>> >>>> >>>> >>> Network Partitions >>>> >>>> >>> >>>> >>>> >>> (none) >>>> >>>> >>> >>>> >>>> >>> Listeners >>>> >>>> >>> >>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>>> and CLI >>>> >>>> tool >>>> >>>> >>> communication >>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>> interface: >>>> 
>>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>> >>> and AMQP 1.0 >>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>>> and CLI >>>> >>>> tool >>>> >>>> >>> communication >>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>> >>> and AMQP 1.0 >>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>>> and CLI >>>> >>>> tool >>>> >>>> >>> communication >>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>> >>> and AMQP 1.0 >>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> , >>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>> >>>> inter-node and >>>> >>>> >>> CLI tool communication >>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> , >>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>>> purpose: AMQP >>>> >>>> 0-9-1 >>>> >>>> >>> and AMQP 1.0 >>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> , >>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>> >>> >>>> >>>> >>> Feature flags >>>> >>>> >>> >>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>>> >>> Flag: quorum_queue, state: enabled >>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>>> >>> >>>> >>>> >>> *Logs:* >>>> >>>> >>> *(Attached)* >>>> >>>> >>> >>>> >>>> >>> With regards, >>>> >>>> >>> Swogat Pradhan >>>> >>>> >>> >>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>> >>> wrote: >>>> >>>> >>> >>>> >>>> >>>> Hi, >>>> >>>> >>>> Please find the nova conductor as well as nova api log. 
>>>> >>>> >>>> >>>> >>>> >>>> nova-conuctor: >>>> >>>> >>>> >>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>>> reply to >>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >>>> reply to >>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >>>> reply to >>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >>>> seconds >>>> >>>> due to a >>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>>> >>>> Abandoning...: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>>> reply to >>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >>>> seconds >>>> >>>> due to a >>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>> Abandoning...: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>>> reply to >>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >>>> seconds >>>> >>>> due to a >>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>> Abandoning...: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >>>> enabled >>>> >>>> with >>>> >>>> >>>> backend dogpile.cache.null. 
>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>>> reply to >>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >>>> seconds >>>> >>>> due to a >>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>> Abandoning...: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> >>>> >>>> >>>> With regards, >>>> >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>> >>>> >>>> >>>> >>>>> Hi, >>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am >>>> trying to >>>> >>>> >>>>> launch vm's. >>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>> (openstack >>>> >>>> compute >>>> >>>> >>>>> service list), the node comes backup when i restart the nova >>>> >>>> compute >>>> >>>> >>>>> service but then the launch of the vm fails. >>>> >>>> >>>>> >>>> >>>> >>>>> nova-compute.log >>>> >>>> >>>>> >>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> >>>> >>>>> instance usage >>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 >>>> 07:00:00 >>>> >>>> to >>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on >>>> node >>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied >>>> device >>>> >>>> name: >>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >>>> enabled >>>> >>>> with >>>> >>>> >>>>> backend dogpile.cache.null. 
>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> >>>> >>>>> privsep helper: >>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> >>>> 'privsep-helper', >>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned >>>> new >>>> >>>> privsep >>>> >>>> >>>>> daemon via rootwrap >>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] >>>> privsep >>>> >>>> >>>>> daemon starting >>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] >>>> privsep >>>> >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >>>> privsep >>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >>>> privsep >>>> >>>> >>>>> daemon running as pid 2647 >>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> >>>> os_brick.initiator.connectors.nvmeof >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> >>>> >>>>> execution error >>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>> >>>>> Exit code: 2 >>>> >>>> >>>>> Stdout: '' >>>> >>>> >>>>> Stderr: '': >>>> oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>> >>>>> Unexpected error while running command. >>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>> >>>>> >>>> >>>> >>>>> Is there a way to solve this issue? >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> With regards, >>>> >>>> >>>>> >>>> >>>> >>>>> Swogat Pradhan >>>> >>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cinder.conf Type: application/octet-stream Size: 2768 bytes Desc: not available URL: From johfulto at redhat.com Tue Mar 21 12:22:22 2023 From: johfulto at redhat.com (John Fulton) Date: Tue, 21 Mar 2023 08:22:22 -0400 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan wrote: > > Hi, > Seems like cinder is not using the local ceph. That explains the issue. It's a misconfiguration. I hope this is not a production system since the mailing list now has the cinder.conf which contains passwords. The section that looks like this: [tripleo_ceph] volume_backend_name=tripleo_ceph volume_driver=cinder.volume.drivers.rbd.RBDDriver rbd_ceph_conf=/etc/ceph/ceph.conf rbd_user=openstack rbd_pool=volumes rbd_flatten_volume_from_snapshot=False rbd_secret_uuid= report_discard_supported=True Should be updated to refer to the local DCN ceph cluster and not the central one. Use the ceph conf file for that cluster and ensure the rbd_secret_uuid corresponds to that one. TripleO?s convention is to set the rbd_secret_uuid to the FSID of the Ceph cluster. The FSID should be in the ceph.conf file. The tripleo_nova_libvirt role will use virsh secret-* commands so that libvirt can retrieve the cephx secret using the FSID as a key. This can be confirmed with `podman exec nova_virtsecretd virsh secret-get-value $FSID`. The documentation describes how to configure the central and DCN sites correctly but an error seems to have occurred while you were following it. https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html John > > Ceph Output: > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l > NAME SIZE PARENT FMT PROT LOCK > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 excl > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes > > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l > NAME SIZE PARENT FMT PROT LOCK > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 > [ceph: root at dcn02-ceph-all-0 /]# > > Attached the cinder config. > Please let me know how I can solve this issue. > > With regards, > Swogat Pradhan > > On Tue, Mar 21, 2023 at 3:53?PM John Fulton wrote: >> >> in my last message under the line "On a DCN site if you run a command like this:" I suggested some steps you could try to confirm the image is a COW from the local glance as well as how to look at your cinder config. >> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan wrote: >>> >>> Update: >>> I uploaded an image directly to the dcn02 store, and it takes around 10,15 minutes to create a volume with image in dcn02. 
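A quick way to verify the fix John describes above once the dcn02 backend has been corrected. The cinder_volume container name, the file name dcn02.conf and the FSID are assumptions for illustration, not values from this environment:

# on a dcn02 node running the cinder_volume container
$ sudo podman exec cinder_volume grep -E '^(rbd_ceph_conf|rbd_secret_uuid)' /etc/cinder/cinder.conf
# expected: rbd_ceph_conf=/etc/ceph/dcn02.conf  (the site-local cluster, not the central ceph.conf)
# expected: rbd_secret_uuid=<fsid of the dcn02 cluster>

# the FSID the secret should match
$ sudo grep fsid /etc/ceph/dcn02.conf

# confirm libvirt can return the cephx key under that FSID, as John notes
$ sudo podman exec nova_virtsecretd virsh secret-get-value <fsid of the dcn02 cluster>

Since TripleO renders cinder.conf from the deployment templates, the durable fix is to correct the dcn02 deployment per the document John links above rather than hand-editing the file inside the container.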
>>> The image size is 389 MB. >>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan wrote: >>>> >>>> Hi Jhon, >>>> I checked in the ceph od dcn02, I can see the images created after importing from the central site. >>>> But launching an instance normally fails as it takes a long time for the volume to get created. >>>> >>>> When launching an instance from volume the instance is getting created properly without any errors. >>>> >>>> I tried to cache images in nova using https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html but getting checksum failed error. >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: >>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>> wrote: >>>>> > >>>>> > Update: After restarting the nova services on the controller and running the deploy script on the edge site, I was able to launch the VM from volume. >>>>> > >>>>> > Right now the instance creation is failing as the block device creation is stuck in creating state, it is taking more than 10 mins for the volume to be created, whereas the image has already been imported to the edge glance. >>>>> >>>>> Try following this document and making the same observations in your >>>>> environment for AZs and their local ceph cluster. >>>>> >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>> >>>>> On a DCN site if you run a command like this: >>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>> NAME SIZE PARENT >>>>> FMT PROT LOCK >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>>> $ >>>>> >>>>> Then, you should see the parent of the volume is the image which is on >>>>> the same local ceph cluster. >>>>> >>>>> I wonder if something is misconfigured and thus you're encountering >>>>> the streaming behavior described here: >>>>> >>>>> Ideally all images should reside in the central Glance and be copied >>>>> to DCN sites before instances of those images are booted on DCN sites. >>>>> If an image is not copied to a DCN site before it is booted, then the >>>>> image will be streamed to the DCN site and then the image will boot as >>>>> an instance. This happens because Glance at the DCN site has access to >>>>> the images store at the Central ceph cluster. Though the booting of >>>>> the image will take time because it has not been copied in advance, >>>>> this is still preferable to failing to boot the image. >>>>> >>>>> You can also exec into the cinder container at the DCN site and >>>>> confirm it's using it's local ceph cluster. >>>>> >>>>> John >>>>> >>>>> > >>>>> > I will try and create a new fresh image and test again then update. >>>>> > >>>>> > With regards, >>>>> > Swogat Pradhan >>>>> > >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan wrote: >>>>> >> >>>>> >> Update: >>>>> >> In the hypervisor list the compute node state is showing down. >>>>> >> >>>>> >> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan wrote: >>>>> >>> >>>>> >>> Hi Brendan, >>>>> >>> Now i have deployed another site where i have used 2 linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). 
>>>>> >>> I used a cirros image to launch instance but the instance timed out so i waited for the volume to be created. >>>>> >>> Once the volume was created i tried launching the instance from the volume and still the instance is stuck in spawning state. >>>>> >>> >>>>> >>> Here is the nova-compute log: >>>>> >>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon starting >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command. >>>>> >>> Command: blkid overlay -s UUID -o value >>>>> >>> Exit code: 2 >>>>> >>> Stdout: '' >>>>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>> >>> >>>>> >>> It is stuck in creating image, do i need to run the template mentioned here ?: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>> >>> >>>>> >>> The volume is already created and i do not understand why the instance is stuck in spawning state. >>>>> >>> >>>>> >>> With regards, >>>>> >>> Swogat Pradhan >>>>> >>> >>>>> >>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard wrote: >>>>> >>>> >>>>> >>>> Does your environment use different network interfaces for each of the networks? Or does it have a bond with everything on it? >>>>> >>>> >>>>> >>>> One issue I have seen before is that when launching instances, there is a lot of network traffic between nodes as the hypervisor needs to download the image from Glance. Along with various other services sending normal network traffic, it can be enough to cause issues if everything is running over a single 1Gbe interface. >>>>> >>>> >>>>> >>>> I have seen the same situation in fact when using a single active/backup bond on 1Gbe nics. It?s worth checking the network traffic while you try to spawn the instance to see if you?re dropping packets. In the situation I described, there were dropped packets which resulted in a loss of communication between nova_compute and RMQ, so the node appeared offline. You should also confirm that nova_compute is being disconnected in the nova_compute logs if you tail them on the Hypervisor while spawning the instance. >>>>> >>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, based on that experience, from my perspective, is certainly sounds like some kind of network issue. 
>>>>> >>>> >>>>> >>>> Regards, >>>>> >>>> >>>>> >>>> Brendan Shephard >>>>> >>>> Senior Software Engineer >>>>> >>>> Red Hat Australia >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>>>> >>>> >>>>> >>>> Hi, >>>>> >>>> >>>>> >>>> I tried to help someone with a similar issue some time ago in this thread: >>>>> >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>> >>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. >>>>> >>>> If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. I didn't have to do that in a production system yet so be careful. I can think of two routes: >>>>> >>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>> >>>> >>>>> >>>> I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. >>>>> >>>> >>>>> >>>> Regards, >>>>> >>>> Eugen >>>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>> >>>> >>>>> >>>> Hi, >>>>> >>>> Can someone please help me out on this issue? >>>>> >>>> >>>>> >>>> With regards, >>>>> >>>> Swogat Pradhan >>>>> >>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >>>>> >>>> wrote: >>>>> >>>> >>>>> >>>> Hi >>>>> >>>> I don't see any major packet loss. >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>>>> >>>> loss. >>>>> >>>> >>>>> >>>> with regards, >>>>> >>>> Swogat Pradhan >>>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >>>>> >>>> wrote: >>>>> >>>> >>>>> >>>> Hi, >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>> >>>> Generally I haven't seen any packet loss, but never checked when >>>>> >>>> launching the instance. >>>>> >>>> I will check that and come back. >>>>> >>>> But everytime i launch an instance the instance gets stuck at spawning >>>>> >>>> state and there the hypervisor becomes down, so not sure if packet loss >>>>> >>>> causes this. >>>>> >>>> >>>>> >>>> With regards, >>>>> >>>> Swogat pradhan >>>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>>>> >>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical between >>>>> >>>> central and edge site? Do you see packet loss through the tunnel? >>>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>> >>>> >>>>> >>>> > Hi Eugen, >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>>> >>>> > getting email's from you. >>>>> >>>> > Coming to the issue: >>>>> >>>> > >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>>> >>>> / >>>>> >>>> > Listing policies for vhost "/" ... 
>>>>> >>>> > vhost name pattern apply-to definition priority >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>> >>>> > >>>>> >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>> >>>> > >>>>> >>>> > I have the edge site compute nodes up, it only goes down when i am >>>>> >>>> trying >>>>> >>>> > to launch an instance and the instance comes to a spawning state and >>>>> >>>> then >>>>> >>>> > gets stuck. >>>>> >>>> > >>>>> >>>> > I have a tunnel setup between the central and the edge sites. >>>>> >>>> > >>>>> >>>> > With regards, >>>>> >>>> > Swogat Pradhan >>>>> >>>> > >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>> > wrote: >>>>> >>>> > >>>>> >>>> >> Hi Eugen, >>>>> >>>> >> For some reason i am not getting your email to me directly, i am >>>>> >>>> checking >>>>> >>>> >> the email digest and there i am able to find your reply. >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >>>>> >>>> >> >>>>> >>>> >> *Note: i am able to create vm's and perform other activities in the >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>>> >>>> >> >>>>> >>>> >> With regards, >>>>> >>>> >> Swogat Pradhan >>>>> >>>> >> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>> >> wrote: >>>>> >>>> >> >>>>> >>>> >>> Hi Eugen, >>>>> >>>> >>> Thanks for your response. >>>>> >>>> >>> I have actually a 4 controller setup so here are the details: >>>>> >>>> >>> >>>>> >>>> >>> *PCS Status:* >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>> Started >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>> Started >>>>> >>>> >>> overcloud-controller-2 >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>> Started >>>>> >>>> >>> overcloud-controller-1 >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>> Started >>>>> >>>> >>> overcloud-controller-0 >>>>> >>>> >>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the issue is >>>>> >>>> still >>>>> >>>> >>> present. >>>>> >>>> >>> >>>>> >>>> >>> *Cluster status:* >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>>> >>>> >>> Cluster status of node >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>> >>>> >>> Basics >>>>> >>>> >>> >>>>> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>> >>>> >>> >>>>> >>>> >>> Disk Nodes >>>>> >>>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> >>> >>>>> >>>> >>> Running Nodes >>>>> >>>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> >>> >>>>> >>>> >>> Versions >>>>> >>>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>> >>>> 3.8.3 >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>> >>>> 3.8.3 >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>> >>>> 3.8.3 >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>> >>>> RabbitMQ >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>> >>>> >>> >>>>> >>>> >>> Alarms >>>>> >>>> >>> >>>>> >>>> >>> (none) >>>>> >>>> >>> >>>>> >>>> >>> Network Partitions >>>>> >>>> >>> >>>>> >>>> >>> (none) >>>>> >>>> >>> >>>>> >>>> >>> Listeners >>>>> >>>> >>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> >>>> tool >>>>> >>>> >>> communication >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> >>>> tool >>>>> >>>> >>> communication >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> >>>> tool >>>>> >>>> >>> communication >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>> >>> Node: rabbit at 
overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> , >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>>> >>>> inter-node and >>>>> >>>> >>> CLI tool communication >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> , >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>>> >>>> 0-9-1 >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> , >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>> >>> >>>>> >>>> >>> Feature flags >>>>> >>>> >>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>> >>>> >>> >>>>> >>>> >>> *Logs:* >>>>> >>>> >>> *(Attached)* >>>>> >>>> >>> >>>>> >>>> >>> With regards, >>>>> >>>> >>> Swogat Pradhan >>>>> >>>> >>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>> >>> wrote: >>>>> >>>> >>> >>>>> >>>> >>>> Hi, >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >>>>> >>>> >>>> >>>>> >>>> >>>> nova-conuctor: >>>>> >>>> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>>> >>>> due to a >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>>>> >>>> Abandoning...: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>>> >>>> due to a >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>>>> >>>> Abandoning...: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>>> >>>> due to a >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> >>>> Abandoning...: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>>> >>>> with >>>>> >>>> >>>> backend dogpile.cache.null. >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>>> >>>> due to a >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> >>>> Abandoning...: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> >>>>> >>>> >>>> With regards, >>>>> >>>> >>>> Swogat Pradhan >>>>> >>>> >>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>> >>>> >>>>> >>>> >>>>> Hi, >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>>> >>>> >>>>> launch vm's. >>>>> >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>>> >>>> compute >>>>> >>>> >>>>> service list), the node comes backup when i restart the nova >>>>> >>>> compute >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>> >>>> >>>>> >>>>> >>>> >>>>> nova-compute.log >>>>> >>>> >>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>>> >>>> >>>>> instance usage >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>>> >>>> to >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>>> >>>> name: >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>>> >>>> with >>>>> >>>> >>>>> backend dogpile.cache.null. >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>>> >>>> >>>>> privsep helper: >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>> >>>> 'privsep-helper', >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>>> >>>> privsep >>>>> >>>> >>>>> daemon via rootwrap >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>> >>>>> daemon starting >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>> >>>>> daemon running as pid 2647 >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>>> >>>> >>>>> execution error >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. 
>>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>> >>>> >>>>> Exit code: 2 >>>>> >>>> >>>>> Stdout: '' >>>>> >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>>> >>>> >>>>> Unexpected error while running command. >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>> >>>> >>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>>> >>>> >>>>> >>>>> >>>> >>>>> >>>>> >>>> >>>>> With regards, >>>>> >>>> >>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>> >>>> >>>>> >>>>> >>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> From rafaelweingartner at gmail.com Tue Mar 21 12:39:23 2023 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 21 Mar 2023 09:39:23 -0300 Subject: [CloudKitty] Virtual PTG March 2023 Message-ID: Hello everyone, As you probably heard our next PTG will be held virtually in March, 2023. We've marked March 31, at 13:00-15:00 UTC [1]. However, if you guys would like some other dates and/or time, just let me know. The room we selected is called "Bexar". I've also created an etherpad [2] to collect ideas/topics for the PTG sessions. If you have anything to discuss, please don't hesitate to write it there. [1] https://ptg.opendev.org/ptg.html [2] https://etherpad.opendev.org/p/march2023-ptg-cloudkitty -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From pshchelokovskyy at mirantis.com Tue Mar 21 12:58:01 2023 From: pshchelokovskyy at mirantis.com (Pavlo Shchelokovskyy) Date: Tue, 21 Mar 2023 14:58:01 +0200 Subject: [barbican] database is growing and can not be purged In-Reply-To: References: Message-ID: Hi all, after having some thoughts, I came to another solution, that I think is the most appropriate here, kind of a variation of option 1: 4. Castellan should cleanup intermediate resources before returning secret ID(s) to the caller As I see it now, the root of the problem is in castellan's BarbicanKeyManager and the way it hides implementation details from the user. Since it returns only IDs of created secrets to the user, the api caller has no notion that something else has to be deleted once it is time for this. Since Barbican API is perfectly capable to delete orders and containers without deleting the secrets they reference, this is what castellan should do just before it returns IDs of generated secrets to the API caller. The only small trouble is that with default 'legacy' API policies in Barbican, not everybody who can create orders can delete them.. but this can be accounted for with try..except. 
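As a stop-gap until a fix along those lines lands, a project admin can at least find and remove the orders that have already accumulated by hand, assuming the python-barbicanclient plugin is installed and the policy in use lets that user delete orders:

$ openstack secret order list                  # each row shows the order href plus the secret/container it references
$ openstack secret order delete <order href>   # removes the order only; the referenced secret is left in place

Once the stale orders are gone, the database cleanup script should again be able to purge the soft-deleted secrets they were pinning.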
Please review the patch in this regard https://review.opendev.org/c/openstack/castellan/+/877423 Best regards, On Mon, Mar 6, 2023 at 7:32?PM Pavlo Shchelokovskyy < pshchelokovskyy at mirantis.com> wrote: > Hi all, > > we are observing the following behavior in Barbican: > - OpenStack environment is using both encrypted Cinder volumes and > encrypted local storage (lvm) for Nova instances > - over the time, the secrets and orders tables are growing > - many soft-deleted entries in secrets DB can not be purged by the db > cleanup script > > As I understand what is happening - both Cinder and Nova create secrets in > Barbican on behalf of the user when creating an encrypted volume or booting > an instance with encrypted local storage. They both do it via castellan > library, that under the hood creates orders in Barbican, waits for them to > become active and returns to the caller only the ID of the generated > secret. When time comes to delete the thing (volume or instance) > Cinder/Nova again use castellan, but only delete the secret, not the order > (they are not aware that there was any 'order' created anyway). As a > result, the orders are left in place, and DB cleanup procedure does not > delete soft-deleted secrets when there's an ACTIVE order referencing such > secret. > > This is troublesomes on many levels - users who use Cinder or Nova may not > even be aware that they are creating something in Barbican. Orders > accumulating like that may eventually result in cryptic errors when e.g. > when you run out of quota for orders. And what's more, default Barbican > policies do allow 'normal' (creator) users to create an order, but not > delete it (only project admin can do it), so even if the users are aware of > Barbican involvement, they can not delete those orders manually anyway. > Plus there's no good way in API to determine outright which orders are > referencing deleted secrets. > > I see several ways of dealing with that and would like to ask for your > opinion on what would be the best one: > 1. Amend Barbican API to allow filtering orders by the secrets, when > castellan deletes a secret - search for corresponding order and delete it > as well, change default policy to actually allow order deletion by the same > users who can create them. > 2. Cascade-delete orders when deleting secrets - this is easy but probably > violates that very policy that disallowed normal users to delete orders. > 3. improve the database cleanup so it first marks any order that > references a deleted secret also as deleted, so later when time comes both > could be purged (or something like that). This also has a similar downside > to the previous option by not being explicit enough. > > I've filed a bug for that > https://storyboard.openstack.org/#!/story/2010625 and proposed a patch > for option 2 (cascade delete), but would like to ask what would you see as > the most appropriate way or may be there's something else that I've missed. > > Btw, the problem is probably even more pronounced with keypairs - when > castellan is used to create those, under the hood both order and container > are created besides the actual secrets, and again only the secret ids are > returned to the caller. When time comes to delete things, the caller only > knows about secret IDs, and can only delete them, leaving both container > and order behind. > Luckily, I did not find any place across OpenStack that actually creates > keypairs using castellan... but the problem is definitely there. > > Best regards, > -- > Dr. 
Pavlo Shchelokovskyy > Principal Software Engineer > Mirantis Inc > www.mirantis.com > -- Dr. Pavlo Shchelokovskyy Principal Software Engineer Mirantis Inc www.mirantis.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 21 12:40:19 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 21 Mar 2023 18:10:19 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Jhon, This seems to be an issue. When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster parameter was specified to the respective cluster names but the config files were created in the name of ceph.conf and keyring was ceph.client.openstack.keyring. Which created issues in glance as well as the naming convention of the files didn't match the cluster names, so i had to manually rename the central ceph conf file as such: [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ [root at dcn02-compute-0 ceph]# ll total 16 -rw-------. 1 root root 257 Mar 13 13:56 ceph_central.client.openstack.keyring -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf [root at dcn02-compute-0 ceph]# ceph.conf and ceph.client.openstack.keyring contain the fsid of the respective clusters in both dcn01 and dcn02. In the above cli output, the ceph.conf and ceph.client... are the files used to access dcn02 ceph cluster and ceph_central* files are used in for accessing central ceph cluster. glance multistore config: [dcn02] rbd_store_ceph_conf=/etc/ceph/ceph.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=dcn02 rbd glance store [ceph_central] rbd_store_ceph_conf=/etc/ceph/ceph_central.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=Default glance store backend. With regards, Swogat Pradhan On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: > On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan > wrote: > > > > Hi, > > Seems like cinder is not using the local ceph. > > That explains the issue. It's a misconfiguration. > > I hope this is not a production system since the mailing list now has > the cinder.conf which contains passwords. > > The section that looks like this: > > [tripleo_ceph] > volume_backend_name=tripleo_ceph > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=volumes > rbd_flatten_volume_from_snapshot=False > rbd_secret_uuid= > report_discard_supported=True > > Should be updated to refer to the local DCN ceph cluster and not the > central one. Use the ceph conf file for that cluster and ensure the > rbd_secret_uuid corresponds to that one. > > TripleO?s convention is to set the rbd_secret_uuid to the FSID of the > Ceph cluster. The FSID should be in the ceph.conf file. The > tripleo_nova_libvirt role will use virsh secret-* commands so that > libvirt can retrieve the cephx secret using the FSID as a key. This > can be confirmed with `podman exec nova_virtsecretd virsh > secret-get-value $FSID`. 
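As a rough sketch of where that ends up (the file names and FSID below are placeholders — in this deployment the local cluster's conf may simply be named ceph.conf on the DCN nodes; what matters is that the referenced file points at the dcn02 monitors and that rbd_secret_uuid is the dcn02 FSID), the backend section on the dcn02 cinder-volume host would look something like:

[tripleo_ceph]
volume_backend_name=tripleo_ceph
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf=/etc/ceph/dcn02.conf
rbd_user=openstack
rbd_pool=volumes
rbd_flatten_volume_from_snapshot=False
rbd_secret_uuid=<fsid taken from "grep fsid /etc/ceph/dcn02.conf">
report_discard_supported=True

and the libvirt side can be checked on a dcn02 compute/HCI node with:

$ sudo grep fsid /etc/ceph/dcn02.conf
$ sudo podman exec nova_virtsecretd virsh secret-list
$ sudo podman exec nova_virtsecretd virsh secret-get-value <that fsid>

If the only secret libvirt holds is the central cluster's FSID, attaching volumes from the local cluster is still likely to fail even once cinder.conf points at it.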
> > The documentation describes how to configure the central and DCN sites > correctly but an error seems to have occurred while you were following > it. > > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html > > John > > > > > Ceph Output: > > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l > > NAME SIZE PARENT FMT PROT > LOCK > > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 > excl > > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 > > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes > > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 > > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes > > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 > > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes > > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 > > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes > > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 > > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes > > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 > > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes > > > > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l > > NAME SIZE PARENT FMT PROT > LOCK > > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 > > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 > > [ceph: root at dcn02-ceph-all-0 /]# > > > > Attached the cinder config. > > Please let me know how I can solve this issue. > > > > With regards, > > Swogat Pradhan > > > > On Tue, Mar 21, 2023 at 3:53?PM John Fulton wrote: > >> > >> in my last message under the line "On a DCN site if you run a command > like this:" I suggested some steps you could try to confirm the image is a > COW from the local glance as well as how to look at your cinder config. > >> > >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>> > >>> Update: > >>> I uploaded an image directly to the dcn02 store, and it takes around > 10,15 minutes to create a volume with image in dcn02. > >>> The image size is 389 MB. > >>> > >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> > >>>> Hi Jhon, > >>>> I checked in the ceph od dcn02, I can see the images created after > importing from the central site. > >>>> But launching an instance normally fails as it takes a long time for > the volume to get created. > >>>> > >>>> When launching an instance from volume the instance is getting > created properly without any errors. > >>>> > >>>> I tried to cache images in nova using > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > but getting checksum failed error. > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton > wrote: > >>>>> > >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan > >>>>> wrote: > >>>>> > > >>>>> > Update: After restarting the nova services on the controller and > running the deploy script on the edge site, I was able to launch the VM > from volume. > >>>>> > > >>>>> > Right now the instance creation is failing as the block device > creation is stuck in creating state, it is taking more than 10 mins for the > volume to be created, whereas the image has already been imported to the > edge glance. > >>>>> > >>>>> Try following this document and making the same observations in your > >>>>> environment for AZs and their local ceph cluster. 
> >>>>> > >>>>> > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites > >>>>> > >>>>> On a DCN site if you run a command like this: > >>>>> > >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring > >>>>> /etc/ceph/dcn0.client.admin.keyring > >>>>> $ rbd --cluster dcn0 -p volumes ls -l > >>>>> NAME SIZE PARENT > >>>>> FMT PROT LOCK > >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB > >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl > >>>>> $ > >>>>> > >>>>> Then, you should see the parent of the volume is the image which is > on > >>>>> the same local ceph cluster. > >>>>> > >>>>> I wonder if something is misconfigured and thus you're encountering > >>>>> the streaming behavior described here: > >>>>> > >>>>> Ideally all images should reside in the central Glance and be copied > >>>>> to DCN sites before instances of those images are booted on DCN > sites. > >>>>> If an image is not copied to a DCN site before it is booted, then the > >>>>> image will be streamed to the DCN site and then the image will boot > as > >>>>> an instance. This happens because Glance at the DCN site has access > to > >>>>> the images store at the Central ceph cluster. Though the booting of > >>>>> the image will take time because it has not been copied in advance, > >>>>> this is still preferable to failing to boot the image. > >>>>> > >>>>> You can also exec into the cinder container at the DCN site and > >>>>> confirm it's using it's local ceph cluster. > >>>>> > >>>>> John > >>>>> > >>>>> > > >>>>> > I will try and create a new fresh image and test again then update. > >>>>> > > >>>>> > With regards, > >>>>> > Swogat Pradhan > >>>>> > > >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>>> >> > >>>>> >> Update: > >>>>> >> In the hypervisor list the compute node state is showing down. > >>>>> >> > >>>>> >> > >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>>> >>> > >>>>> >>> Hi Brendan, > >>>>> >>> Now i have deployed another site where i have used 2 linux bonds > network template for both 3 compute nodes and 3 ceph nodes. > >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). > >>>>> >>> I used a cirros image to launch instance but the instance timed > out so i waited for the volume to be created. > >>>>> >>> Once the volume was created i tried launching the instance from > the volume and still the instance is stuck in spawning state. > >>>>> >>> > >>>>> >>> Here is the nova-compute log: > >>>>> >>> > >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] > privsep daemon starting > >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] > privsep process running with uid/gid: 0/0 > >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] > privsep process running with capabilities (eff/prm/inh): > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] > privsep daemon running as pid 185437 > >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING > os_brick.initiator.connectors.nvmeof > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error > in _get_host_uuid: Unexpected error while running command. 
> >>>>> >>> Command: blkid overlay -s UUID -o value > >>>>> >>> Exit code: 2 > >>>>> >>> Stdout: '' > >>>>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > Unexpected error while running command. > >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > 450b749c-a10a-4308-80a9-3b8020fee758] Creating image > >>>>> >>> > >>>>> >>> It is stuck in creating image, do i need to run the template > mentioned here ?: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > >>>>> >>> > >>>>> >>> The volume is already created and i do not understand why the > instance is stuck in spawning state. > >>>>> >>> > >>>>> >>> With regards, > >>>>> >>> Swogat Pradhan > >>>>> >>> > >>>>> >>> > >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < > bshephar at redhat.com> wrote: > >>>>> >>>> > >>>>> >>>> Does your environment use different network interfaces for each > of the networks? Or does it have a bond with everything on it? > >>>>> >>>> > >>>>> >>>> One issue I have seen before is that when launching instances, > there is a lot of network traffic between nodes as the hypervisor needs to > download the image from Glance. Along with various other services sending > normal network traffic, it can be enough to cause issues if everything is > running over a single 1Gbe interface. > >>>>> >>>> > >>>>> >>>> I have seen the same situation in fact when using a single > active/backup bond on 1Gbe nics. It?s worth checking the network traffic > while you try to spawn the instance to see if you?re dropping packets. In > the situation I described, there were dropped packets which resulted in a > loss of communication between nova_compute and RMQ, so the node appeared > offline. You should also confirm that nova_compute is being disconnected in > the nova_compute logs if you tail them on the Hypervisor while spawning the > instance. > >>>>> >>>> > >>>>> >>>> In my case, changing from active/backup to LACP helped. So, > based on that experience, from my perspective, is certainly sounds like > some kind of network issue. > >>>>> >>>> > >>>>> >>>> Regards, > >>>>> >>>> > >>>>> >>>> Brendan Shephard > >>>>> >>>> Senior Software Engineer > >>>>> >>>> Red Hat Australia > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: > >>>>> >>>> > >>>>> >>>> Hi, > >>>>> >>>> > >>>>> >>>> I tried to help someone with a similar issue some time ago in > this thread: > >>>>> >>>> > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > >>>>> >>>> > >>>>> >>>> But apparently a neutron reinstallation fixed it for that user, > not sure if that could apply here. But is it possible that your nova and > neutron versions are different between central and edge site? Have you > restarted nova and neutron services on the compute nodes after > installation? Have you debug logs of nova-conductor and maybe nova-compute? > Maybe they can help narrow down the issue. > >>>>> >>>> If there isn't any additional information in the debug logs I > probably would start "tearing down" rabbitmq. I didn't have to do that in a > production system yet so be careful. I can think of two routes: > >>>>> >>>> > >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, > this will most likely impact client IO depending on your load. 
Check out > the rabbitmqctl commands. > >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from > all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. > >>>>> >>>> > >>>>> >>>> I can imagine that the failed reply "survives" while being > replicated across the rabbit nodes. But I don't really know the rabbit > internals too well, so maybe someone else can chime in here and give a > better advice. > >>>>> >>>> > >>>>> >>>> Regards, > >>>>> >>>> Eugen > >>>>> >>>> > >>>>> >>>> Zitat von Swogat Pradhan : > >>>>> >>>> > >>>>> >>>> Hi, > >>>>> >>>> Can someone please help me out on this issue? > >>>>> >>>> > >>>>> >>>> With regards, > >>>>> >>>> Swogat Pradhan > >>>>> >>>> > >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>>> >>>> wrote: > >>>>> >>>> > >>>>> >>>> Hi > >>>>> >>>> I don't see any major packet loss. > >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due > to packet > >>>>> >>>> loss. > >>>>> >>>> > >>>>> >>>> with regards, > >>>>> >>>> Swogat Pradhan > >>>>> >>>> > >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>>> >>>> wrote: > >>>>> >>>> > >>>>> >>>> Hi, > >>>>> >>>> Yes the MTU is the same as the default '1500'. > >>>>> >>>> Generally I haven't seen any packet loss, but never checked when > >>>>> >>>> launching the instance. > >>>>> >>>> I will check that and come back. > >>>>> >>>> But everytime i launch an instance the instance gets stuck at > spawning > >>>>> >>>> state and there the hypervisor becomes down, so not sure if > packet loss > >>>>> >>>> causes this. > >>>>> >>>> > >>>>> >>>> With regards, > >>>>> >>>> Swogat pradhan > >>>>> >>>> > >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block > wrote: > >>>>> >>>> > >>>>> >>>> One more thing coming to mind is MTU size. Are they identical > between > >>>>> >>>> central and edge site? Do you see packet loss through the > tunnel? > >>>>> >>>> > >>>>> >>>> Zitat von Swogat Pradhan : > >>>>> >>>> > >>>>> >>>> > Hi Eugen, > >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' as > i am not > >>>>> >>>> > getting email's from you. > >>>>> >>>> > Coming to the issue: > >>>>> >>>> > > >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl > list_policies -p > >>>>> >>>> / > >>>>> >>>> > Listing policies for vhost "/" ... > >>>>> >>>> > vhost name pattern apply-to definition > priority > >>>>> >>>> > / ha-all ^(?!amq\.).* queues > >>>>> >>>> > > >>>>> >>>> > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > >>>>> >>>> > > >>>>> >>>> > I have the edge site compute nodes up, it only goes down when > i am > >>>>> >>>> trying > >>>>> >>>> > to launch an instance and the instance comes to a spawning > state and > >>>>> >>>> then > >>>>> >>>> > gets stuck. > >>>>> >>>> > > >>>>> >>>> > I have a tunnel setup between the central and the edge sites. > >>>>> >>>> > > >>>>> >>>> > With regards, > >>>>> >>>> > Swogat Pradhan > >>>>> >>>> > > >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > >>>>> >>>> swogatpradhan22 at gmail.com> > >>>>> >>>> > wrote: > >>>>> >>>> > > >>>>> >>>> >> Hi Eugen, > >>>>> >>>> >> For some reason i am not getting your email to me directly, > i am > >>>>> >>>> checking > >>>>> >>>> >> the email digest and there i am able to find your reply. > >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. 
> >>>>> >>>> >> > >>>>> >>>> >> *Note: i am able to create vm's and perform other activities > in the > >>>>> >>>> >> central site, only facing this issue in the edge site.* > >>>>> >>>> >> > >>>>> >>>> >> With regards, > >>>>> >>>> >> Swogat Pradhan > >>>>> >>>> >> > >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > >>>>> >>>> swogatpradhan22 at gmail.com> > >>>>> >>>> >> wrote: > >>>>> >>>> >> > >>>>> >>>> >>> Hi Eugen, > >>>>> >>>> >>> Thanks for your response. > >>>>> >>>> >>> I have actually a 4 controller setup so here are the > details: > >>>>> >>>> >>> > >>>>> >>>> >>> *PCS Status:* > >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ > >>>>> >>>> >>> > 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > >>>>> >>>> Started > >>>>> >>>> >>> overcloud-controller-no-ceph-3 > >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > >>>>> >>>> Started > >>>>> >>>> >>> overcloud-controller-2 > >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > >>>>> >>>> Started > >>>>> >>>> >>> overcloud-controller-1 > >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > >>>>> >>>> Started > >>>>> >>>> >>> overcloud-controller-0 > >>>>> >>>> >>> > >>>>> >>>> >>> I have tried restarting the bundle multiple times but the > issue is > >>>>> >>>> still > >>>>> >>>> >>> present. > >>>>> >>>> >>> > >>>>> >>>> >>> *Cluster status:* > >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>>>> >>>> >>> Cluster status of node > >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... > >>>>> >>>> >>> Basics > >>>>> >>>> >>> > >>>>> >>>> >>> Cluster name: > rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>>>> >>>> >>> > >>>>> >>>> >>> Disk Nodes > >>>>> >>>> >>> > >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> >>> > >>>>> >>>> >>> Running Nodes > >>>>> >>>> >>> > >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> >>> > >>>>> >>>> >>> Versions > >>>>> >>>> >>> > >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: > RabbitMQ > >>>>> >>>> 3.8.3 > >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: > RabbitMQ > >>>>> >>>> 3.8.3 > >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: > RabbitMQ > >>>>> >>>> 3.8.3 > >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > >>>>> >>>> RabbitMQ > >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 > >>>>> >>>> >>> > >>>>> >>>> >>> Alarms > >>>>> >>>> >>> > >>>>> >>>> >>> (none) > >>>>> >>>> >>> > >>>>> >>>> >>> Network Partitions > >>>>> >>>> >>> > >>>>> >>>> >>> (none) > >>>>> >>>> >>> > >>>>> >>>> >>> Listeners > >>>>> >>>> >>> > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>>> >>>> interface: > 
>>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>>> >>>> tool > >>>>> >>>> >>> communication > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>>>> >>>> >>> and AMQP 1.0 > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>>> >>>> tool > >>>>> >>>> >>> communication > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>>>> >>>> >>> and AMQP 1.0 > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>>> >>>> tool > >>>>> >>>> >>> communication > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>>>> >>>> >>> and AMQP 1.0 > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> , > >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: > >>>>> >>>> inter-node and > >>>>> >>>> >>> CLI tool communication > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> , > >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, > purpose: AMQP > >>>>> >>>> 0-9-1 > >>>>> >>>> >>> and AMQP 1.0 > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> , > >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP > API > >>>>> >>>> >>> > >>>>> >>>> >>> Feature flags > >>>>> >>>> >>> > >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled > >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled > >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled > >>>>> >>>> >>> Flag: quorum_queue, state: enabled > >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled > >>>>> >>>> >>> > >>>>> >>>> >>> *Logs:* > >>>>> >>>> >>> *(Attached)* > >>>>> >>>> >>> > >>>>> >>>> >>> With regards, > >>>>> >>>> >>> Swogat Pradhan > >>>>> >>>> >>> > >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > >>>>> >>>> swogatpradhan22 at gmail.com> > >>>>> >>>> >>> wrote: > >>>>> >>>> >>> > >>>>> >>>> >>>> Hi, > >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. 
> >>>>> >>>> >>>> > >>>>> >>>> >>>> nova-conuctor: > >>>>> >>>> >>>> > >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop > reply to > >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b > >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop > reply to > >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa > >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop > reply to > >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR > oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The > reply > >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 > seconds > >>>>> >>>> due to a > >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). > >>>>> >>>> Abandoning...: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop > reply to > >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR > oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The > reply > >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 > seconds > >>>>> >>>> due to a > >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>>> >>>> Abandoning...: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop > reply to > >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR > oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The > reply > >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 > seconds > >>>>> >>>> due to a > >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>>> >>>> Abandoning...: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache > enabled > >>>>> >>>> with > >>>>> >>>> >>>> backend dogpile.cache.null. 
> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop > reply to > >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR > oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The > reply > >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 > seconds > >>>>> >>>> due to a > >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>>> >>>> Abandoning...: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> > >>>>> >>>> >>>> With regards, > >>>>> >>>> >>>> Swogat Pradhan > >>>>> >>>> >>>> > >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: > >>>>> >>>> >>>> > >>>>> >>>> >>>>> Hi, > >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am > trying to > >>>>> >>>> >>>>> launch vm's. > >>>>> >>>> >>>>> When the VM is in spawning state the node goes down > (openstack > >>>>> >>>> compute > >>>>> >>>> >>>>> service list), the node comes backup when i restart the > nova > >>>>> >>>> compute > >>>>> >>>> >>>>> service but then the launch of the vm fails. > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> nova-compute.log > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] > Running > >>>>> >>>> >>>>> instance usage > >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 > 07:00:00 > >>>>> >>>> to > >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. > >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on > node > >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com > >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied > device > >>>>> >>>> name: > >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names > >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache > enabled > >>>>> >>>> with > >>>>> >>>> >>>>> backend dogpile.cache.null. 
> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Running > >>>>> >>>> >>>>> privsep helper: > >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > >>>>> >>>> 'privsep-helper', > >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Spawned new > >>>>> >>>> privsep > >>>>> >>>> >>>>> daemon via rootwrap > >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] > privsep > >>>>> >>>> >>>>> daemon starting > >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] > privsep > >>>>> >>>> >>>>> process running with uid/gid: 0/0 > >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] > privsep > >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): > >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] > privsep > >>>>> >>>> >>>>> daemon running as pid 2647 > >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING > >>>>> >>>> os_brick.initiator.connectors.nvmeof > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Process > >>>>> >>>> >>>>> execution error > >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. > >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value > >>>>> >>>> >>>>> Exit code: 2 > >>>>> >>>> >>>>> Stdout: '' > >>>>> >>>> >>>>> Stderr: '': > oslo_concurrency.processutils.ProcessExecutionError: > >>>>> >>>> >>>>> Unexpected error while running command. > >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> Is there a way to solve this issue? > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> With regards, > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> Swogat Pradhan > >>>>> >>>> >>>>> > >>>>> >>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bence.romsics at gmail.com Tue Mar 21 14:05:03 2023 From: bence.romsics at gmail.com (Bence Romsics) Date: Tue, 21 Mar 2023 15:05:03 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Hi, Thanks for all the answers! I went back to ask what our users are using this for. At the moment I'm not sure what they do is really supported. But you tell me. To me it makes some sense. 
Basically they have an additional and unusual compute host recovery process,
where a compute host after a failure is brought back by the same name. Then
they rebuild the servers on the same compute host where the servers were
running before. When the server's disk was backed by a volume, so its content
was not lost by the compute host failure, they don't want to lose it either in
the recovery process. The evacuate operation clearly would be a better fit to
do this, but that disallows evacuating to the "same" host. For a long time
rebuild just allowed "evacuating to the same host", so they went with it.

At the moment I did not find a prohibition in the documentation against
bringing back a failed compute host by the same name. If I missed it or this
is not recommended for any reason, please let me know.

Clearly in many clouds evacuating can fully replace what they do here. I
believe they may have chosen this unusual compute host recovery option to have
some kind of recovery process for very small deployments, where you don't
always have space to evacuate before you have rebuilt the failed compute host.
And this collided with a deployment system which reuses host names.

At this point I'm not sure if this really belongs in the rebuild operation. It
could easily be better addressed in evacuate, or in the deployment system not
reusing hostnames. Please let me know what you think!

Thanks in advance,
Bence

From dms at danplanet.com  Tue Mar 21 14:56:43 2023
From: dms at danplanet.com (Dan Smith)
Date: Tue, 21 Mar 2023 07:56:43 -0700
Subject: [nova][cinder] future of rebuild without reimaging
In-Reply-To: (Bence Romsics's message of "Tue, 21 Mar 2023 15:05:03 +0100")
References: 
Message-ID: 

> Basically they have an additional and unusual compute host recovery
> process, where a compute host after a failure is brought back by the
> same name. Then they rebuild the servers on the same compute host
> where the servers were running before. When the server's disk was
> backed by a volume, so its content was not lost by the compute host
> failure, they don't want to lose it either in the recovery process.
> The evacuate operation clearly would be a better fit to do this, but
> that disallows evacuating to the "same" host. For a long time rebuild
> just allowed "evacuating to the same host", so they went with it.

Aside from the "should this be possible" question, is rebuild even
required in this case? For the non-volume-backed instances, we need
rebuild to re-download the image and create the root disk. If it's
really required for volume-backed instances, I'm guessing there's just
some trivial amount of state that isn't in place on recovery that the
rebuild "solves". It is indeed a very odd fringe use-case that is an
obvious mis-use of the function.

> At the moment I did not find a prohibition in the documentation against
> bringing back a failed compute host by the same name. If I missed it or
> this is not recommended for any reason, please let me know.

I'm not sure why this would be specifically documented, but since
compute nodes are not fully stateless, your scenario is basically
"delete part of the state of the system and expect things to keep
working", which I don't think is reasonable (nor something we should
need to document). Your scenario is basically the same as one where
your /var/lib/nova is mounted on a disk that doesn't come up after
reboot, or on NFS that was unavailable at boot.
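The workflow described above boils down to something like the following
sketch. It is only an illustration: host and server names are placeholders,
the commands are standard openstackclient ones, and whether rebuild leaves a
boot volume's content untouched depends on the release and microversion in
use, which is exactly the open question of this thread.

  # the failed host is reinstalled and comes back under the same name
  $ openstack compute service list --service nova-compute --host compute-0.example.com
  # find the servers that were running there before the failure
  $ openstack server list --all-projects --host compute-0.example.com
  # rebuild a volume-backed server in place, reusing its current image;
  # with older microversions this has not reimaged the root volume
  $ openstack server rebuild <server-id>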
If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host" then it would potentially destroy (or desynchronize) actual state in other nodes (i.e. the database) for a transient/accidental situation. TBH, we might should even explicitly *block* rebuild on an instance that appears to be missing its on-disk state to avoid users, who don't know the state of the infra, from doing this to try to unblock their instances while ops are doing maintenance. I will point out that bringing back a compute node under the same name (without cleaning the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost in the system. So just FYI that an actual blocker to your scenario is coming :) > Clearly in many clouds evacuating can fully replace what they do here. > I believe they may have chosen this unusual compute host recovery > option to have some kind of recovery process for very small > deployments, where you don't always have space to evacuate before you > rebuilt the failed compute host. And this collided with a deployment > system which reuses host names. > > At this point I'm not sure if this really belongs to the rebuild > operation. Could easily be better addressed in evacuate. Or in the > deployment system not reusing hostnames. Evacuate can't work for this case either because it requires the compute node to be down to perform. As you note, bringing it back under a different name would solve that problem. However, neither "evacuate to same host" or "use rebuild for this recovery procedure" are reasonable, IMHO. --Dan From knikolla at bu.edu Tue Mar 21 15:42:22 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Tue, 21 Mar 2023 15:42:22 +0000 Subject: [tc][all] Technical Committee next weekly meeting on March 22 at 1600 UTC Message-ID: <70D4500C-29CA-48F6-890A-22A294CBE5D0@bu.edu> Hi all, This is a reminder that the next weekly Technical Committee meeting is to be held tomorrow (March 22) at 1600 UTC on #openstack-tc on OFTC IRC. The meeting will be chaired by Kristi Nikolla. A copy of the agenda can be found below. Items can still be proposed by editing the wiki page at https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Deciding on meeting time * Gate health check * 2023.2 cycle Leaderless projects ** https://etherpad.opendev.org/p/2023.2-leaderless * Virtual PTG Planning ** March 27-31, 2023, there's the Virtual PTG. ** https://etherpad.opendev.org/p/tc-2023-2-ptg * TC 2023.1 tracker status checks ** https://etherpad.opendev.org/p/tc-2023.1-tracker * Cleanup of PyPI maintainer list for OpenStack Projects ** Etherpad for audit and cleanup of additional PyPi maintainers *** https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup ** ML discussion *** https://lists.openstack.org/pipermail/openstack-discuss/2023-January/031848.html * Recurring tasks check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open There are no noted absences. 
Thank you, Kristi Nikolla From hiromu.asahina.az at hco.ntt.co.jp Tue Mar 21 16:00:06 2023 From: hiromu.asahina.az at hco.ntt.co.jp (Hiromu Asahina) Date: Wed, 22 Mar 2023 01:00:06 +0900 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Message-ID: I apologize that I couldn't reply before the Ironic meeting on Monday. I need one slot to discuss this topic. I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, 27)[1,2] works for them. Does this work for Ironic? I understand not all Ironic members will join this discussion, so I hope we can arrange a convenient date for you two at least and, hopefully, for those interested in this topic. [1] https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z [2] https://ptg.opendev.org/ptg.html Thanks, Hiromu Asahina On 2023/03/17 23:29, Julia Kreger wrote: > I'm not sure how many Ironic contributors would be the ones to attend a > discussion, in part because this is disjointed from the items they need to > focus on. It is much more of a "big picture" item for those of us who are > leaders in the project. > > I think it would help to understand how much time you expect the discussion > to take to determine a path forward and how we can collaborate. Ironic has > a huge number of topics we want to discuss during the PTG, and I suspect > our team meeting on Monday next week should yield more interest/awareness > as well as an amount of time for each topic which will aid us in scheduling. > > If you can let us know how long, then I think we can figure out when the > best day/time will be. > > Thanks! > > -Julia > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > >> Thank you for your reply. >> >> I'd like to decide the time slot for this topic. >> I just checked PTG schedule [1]. >> >> We have the following time slots. Which one is convenient to gether? >> (I didn't get reply but I listed Barbican, as its cores are almost the >> same as Keystone) >> >> Mon, 27: >> >> - 14 (keystone) >> - 15 (keystone) >> >> Tue, 28 >> >> - 13 (barbican) >> - 14 (keystone, ironic) >> - 15 (keysonte, ironic) >> - 16 (ironic) >> >> Wed, 29 >> >> - 13 (ironic) >> - 14 (keystone, ironic) >> - 15 (keystone, ironic) >> - 21 (ironic) >> >> Thanks, >> >> [1] https://ptg.opendev.org/ptg.html >> >> Hiromu Asahina >> >> >> On 2023/02/11 1:41, Jay Faulkner wrote: >>> I think it's safe to say the Ironic community would be very invested in >>> such an effort. Let's make sure the time chosen for vPTG with this is >> such >>> that Ironic contributors can attend as well. >>> >>> Thanks, >>> Jay Faulkner >>> Ironic PTL >>> >>> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < >>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>> >>>> Hello Everyone, >>>> >>>> Recently, Tacker and Keystone have been working together on a new >> Keystone >>>> Middleware that can work with external authentication >>>> services, such as Keycloak. The code has already been submitted [1], but >>>> we want to make this middleware a generic plugin that works >>>> with as many OpenStack services as possible. To that end, we would like >> to >>>> hear from other projects with similar use cases >>>> (especially Ironic and Barbican, which run as standalone services). 
We >>>> will make a time slot to discuss this topic at the next vPTG. >>>> Please contact me if you are interested and available to participate. >>>> >>>> [1] https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >>>> >>>> -- >>>> Hiromu Asahina >>>> >>>> >>>> >>>> >>> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> > -- ?-------------------------------------? NTT Network Innovation Center Hiromu Asahina ------------------------------------- 3-9-11, Midori-cho, Musashino-shi Tokyo 180-8585, Japan ? Phone: +81-422-59-7008 ? Email: hiromu.asahina.az at hco.ntt.co.jp ?-------------------------------------? From jay at gr-oss.io Tue Mar 21 16:03:51 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 21 Mar 2023 09:03:51 -0700 Subject: [ptls] PyPI maintainer cleanup - Action needed: Contact extra maintainers In-Reply-To: References: Message-ID: Thanks to those who have already taken action! Fifty extra maintainers have already been removed, with around three hundred to go. Please reach out to me if you're having trouble finding current email addresses for anyone, or having trouble with the process at all. Thanks, Jay Faulkner TC Vice-Chair On Thu, Mar 16, 2023 at 3:22?PM Jay Faulkner wrote: > Hi PTLs, > > The TC recently voted[1] to require humans be removed from PyPI access for > OpenStack-managed projects. This helps ensure all releases are created via > releases team tooling and makes it less likely for a user account > compromise to impact OpenStack packages. > > Many projects have already updated > https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L33 > with a list of packages that contain extra maintainers. We'd like to > request that PTLs, or their designate, reach out to any extra maintainers > listed for projects you are responsible for and request they remove their > access in accordance with policy. An example email, and detailed steps to > follow have been provided at > https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup-email-template > . > > Thank you for your cooperation as we work to improve our security posture > and harden against supply chain attacks. > > Thank you, > Jay Faulkner > TC Vice-Chair > > 1: > https://opendev.org/openstack/governance/commit/979e339f899ef62d2a6871a99c99537744c5808d > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Tue Mar 21 16:05:58 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 21 Mar 2023 09:05:58 -0700 Subject: [ironic] No meeting Monday 3/27/2023 Message-ID: Hello, Monday March 27, 2023 is when the vPTG is set to begin. I'm cancelling the Ironic weekly meeting to ensure any Ironic contributors can participate in sessions occurring Monday. Thanks! Jay Faulkner Ironic PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Tue Mar 21 16:06:46 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Tue, 21 Mar 2023 17:06:46 +0100 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: <3840757.STTH5IQzZg@p1> Message-ID: Hello: I agree with having a single API meaning for all backends. 
We currently support stateless SGs in iptables and ML2/OVN and both backends provide the same behaviour: a rule won't create an opposite direction counterpart by default, the user needs to define it explicitly. The discussion here could be the default behaviour for standard services: * DHCP service is currently supported in iptables, native OVS and OVN. This should be supported even without any rule allowed (as is now). Of course, we need to explicitly document that. * DHCPv6 [1]: unlike Slawek, I'm in favor of allowing this traffic by default, as part of the DHCP protocol traffic allowance. * Metadata service: this is not a network protocol and we should not consider it. Actually this service is working now (with stateful SGs) because of the default SG egress rules we add. So I'm not in favor of [2] Regards. [1]https://review.opendev.org/c/openstack/neutron/+/877049 [2]https://review.opendev.org/c/openstack/neutron/+/876659 On Mon, Mar 20, 2023 at 10:19?PM Ihar Hrachyshka wrote: > On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski > wrote: > > > > Hi, > > > > > > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > > > > > Hi all, > > > > > > > > > > (I've tagged the thread with [ovn] because this question was raised in > > > > > the context of OVN, but it really is about the intent of neutron > > > > > stateless SG API.) > > > > > > > > > > Neutron API supports 'stateless' field for security groups: > > > > > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > > > > > > > > > > The API reference doesn't explain the intent of the API, merely > > > > > walking through the field mechanics, as in > > > > > > > > > > "The stateful security group extension (stateful-security-group) adds > > > > > the stateful field to security groups, allowing users to configure > > > > > stateful or stateless security groups for ports. The existing security > > > > > groups will all be considered as stateful. Update of the stateful > > > > > attribute is allowed when there is no port associated with the > > > > > security group." > > > > > > > > > > The meaning of the API is left for users to deduce. It's customary > > > > > understood as something like > > > > > > > > > > "allowing to bypass connection tracking in the firewall, potentially > > > > > providing performance and simplicity benefits" (while imposing > > > > > additional complexity onto rule definitions - the user now has to > > > > > explicitly define rules for both directions of a duplex connection.) > > > > > [This is not an official definition, nor it's quoted from a respected > > > > > source, please don't criticize it. I don't think this is an important > > > > > point here.] > > > > > > > > > > Either way, the definition doesn't explain what should happen with > > > > > basic network services that a user of Neutron SG API is used to rely > > > > > on. Specifically, what happens for a port related to a stateless SG > > > > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > > > > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > > > > > procedure to configure its IPv6 stack. > > > > > > > > > > As part of our testing of stateless SG implementation for OVN backend, > > > > > we've noticed that VMs fail to configure via metadata, or use SLAAC to > > > > > configure IPv6. 
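As a concrete sketch of the explicit-rule model described above, the
"returning" traffic for metadata replies and for IPv6 RA/NA has to be opened
by hand. This assumes a python-openstackclient recent enough to expose the
--stateless flag; the group name is a placeholder, and the rules are
deliberately broad because the SG API cannot match on source port.

  $ openstack security group create --stateless stateless-demo
  # replies from the metadata service (cannot be narrowed to source port 80)
  $ openstack security group rule create --ingress --protocol tcp \
        --remote-ip 169.254.169.254/32 stateless-demo
  # ICMPv6, so that the RA/NA packets used by SLAAC reach the port
  $ openstack security group rule create --ingress --ethertype IPv6 \
        --protocol ipv6-icmp stateless-demo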
> > > > > > > > > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 > > > > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 > > > > > > > > > > We've noticed that adding explicit SG rules to allow 'returning' > > > > > communication for 169.254.169.254:80 and RA / NA fixes the problem. > > > > > > > > > > I figured that these services are "base" / "basic" and should be > > > > > provided to ports regardless of the stateful-ness of SG. I proposed > > > > > patches for this here: > > > > > > > > > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 > > > > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 > > > > > > > > > > Discussion in the patch that adjusts the existing stateless SG test > > > > > scenarios to not create explicit SG rules for metadata and ICMP > > > > > replies suggests that it's not a given / common understanding that > > > > > these "base" services should work by default for stateless SGs. > > > > > > > > > > See discussion in comments here: > > > > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 > > > > > > > > > > While this discussion is happening in the context of OVN, I think it > > > > > should be resolved in a broader context. Specifically, a decision > > > > > should be made about what Neutron API "means" by stateless SGs, and > > > > > how "base" services are supposed to behave. Then backends can act > > > > > accordingly. > > > > > > > > > > There's also an open question of how this should be implemented. > > > > > Whether Neutron would like to create explicit SG rules visible in API > > > > > that would allow for the returning traffic and that could be deleted > > > > > as needed, or whether backends should do it implicitly. We already > > > > > have "default" egress rules, so there's a precedent here. On the other > > > > > hand, the egress rules are broad (allowing everything) and there's > > > > > more rationale to delete them and replace them with tighter filters. > > > > > In my OVN series, I implement ACLs directly in OVN database, without > > > > > creating SG rules in Neutron API. > > > > > > > > > > So, questions for the community to clarify: > > > > > - whether Neutron API should define behavior of stateless SGs in > general, > > > > > - if so, whether Neutron API should also define behavior of stateless > > > > > SGs in terms of "base" services like metadata and DHCP, > > > > > - if so, whether backends should implement the necessary filters > > > > > themselves, or Neutron will create default SG rules itself. > > > > > > I think that we should be transparent and if we need any SG rules like > that to allow some traffic, those rules should be be added in visible way > for user. > > > > We also have in progress RFE > https://bugs.launchpad.net/neutron/+bug/1983053 which may help > administrators to define set of default SG rules which will be in each new > SG. So if we will now make those additional ACLs to be visible as SG rules > in SG it may be later easier to customize it. > > > > If we will hard code ACLs to allow ingress traffic from metadata server > or RA/NA packets there will be IMO inconsistency in behaviour between > stateful and stateless SGs as for stateful user will be able to disallow > traffic between vm and metadata service (probably there's no real use case > for that but it's possible) and for stateless it will not be possible as > ingress rules will be always there. 
Also use who knows how stateless SG > works may even treat it as bug as from Neutron API PoV this traffic to/from > metadata server would work as stateful - there would be rule to allow > egress traffic but what actually allows ingress response there? > > > > Thanks for clarifying the rationale on picking SG rules and not > per-backend implementation. > > What would be your answer to the two other questions in the list > above, specifically, "whether Neutron API should define behavior of > stateless SGs in general" and "whether Neutron API should define > behavior of stateless SGs in relation to metadata / RA / NA". Once we > have agreement on these points, we can discuss the exact mechanism - > whether to implement in backend or in API. But these two questions are > first order in my view. > > (To give an idea of my thinking, I believe API definition should not > only define fields and their mechanics but also semantics, so > > - yes, api-ref should define the meaning ("behavior") of stateless SG > in general, and > - yes, api-ref should also define the meaning ("behavior") of > stateless SG in relation to "standard" services like ipv6 addressing > or metadata. > > As to the last question - whether it's up to ml2 backend to implement > the behavior, or up to the core SG database plugin - I don't have a > strong opinion. I lean to "backend" solution just because it allows > for more granular definition because SG rules may not express some > filter rules, e.g. source port for metadata replies (an unfortunate > limitation of SG API that we inherited from AWS?). But perhaps others > prefer paying the price for having neutron ml2 plugin enforcing the > behavior consistently across all backends. > > > > > > > > > > > I hope I laid the problem out clearly, let me know if anything needs > > > > > clarification or explanation. > > > > > > Yes :) At least for me. > > > > > > > > > > > > Yours, > > > > > Ihar > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Slawek Kaplonski > > > > Principal Software Engineer > > > > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Tue Mar 21 16:29:47 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 21 Mar 2023 09:29:47 -0700 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Message-ID: No worries! I think that time works for me. I'm not sure it will work for everyone, but I can proxy information back to the whole of the ironic project as we also have the question of this functionality listed for our Operator Hour in order to help ironic gauge interest. -Julia On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < hiromu.asahina.az at hco.ntt.co.jp> wrote: > I apologize that I couldn't reply before the Ironic meeting on Monday. > > I need one slot to discuss this topic. > > I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, > 27)[1,2] works for them. Does this work for Ironic? I understand not all > Ironic members will join this discussion, so I hope we can arrange a > convenient date for you two at least and, hopefully, for those > interested in this topic. 
> > [1] > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z > [2] https://ptg.opendev.org/ptg.html > > Thanks, > Hiromu Asahina > > On 2023/03/17 23:29, Julia Kreger wrote: > > I'm not sure how many Ironic contributors would be the ones to attend a > > discussion, in part because this is disjointed from the items they need > to > > focus on. It is much more of a "big picture" item for those of us who are > > leaders in the project. > > > > I think it would help to understand how much time you expect the > discussion > > to take to determine a path forward and how we can collaborate. Ironic > has > > a huge number of topics we want to discuss during the PTG, and I suspect > > our team meeting on Monday next week should yield more interest/awareness > > as well as an amount of time for each topic which will aid us in > scheduling. > > > > If you can let us know how long, then I think we can figure out when the > > best day/time will be. > > > > Thanks! > > > > -Julia > > > > > > > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > >> Thank you for your reply. > >> > >> I'd like to decide the time slot for this topic. > >> I just checked PTG schedule [1]. > >> > >> We have the following time slots. Which one is convenient to gether? > >> (I didn't get reply but I listed Barbican, as its cores are almost the > >> same as Keystone) > >> > >> Mon, 27: > >> > >> - 14 (keystone) > >> - 15 (keystone) > >> > >> Tue, 28 > >> > >> - 13 (barbican) > >> - 14 (keystone, ironic) > >> - 15 (keysonte, ironic) > >> - 16 (ironic) > >> > >> Wed, 29 > >> > >> - 13 (ironic) > >> - 14 (keystone, ironic) > >> - 15 (keystone, ironic) > >> - 21 (ironic) > >> > >> Thanks, > >> > >> [1] https://ptg.opendev.org/ptg.html > >> > >> Hiromu Asahina > >> > >> > >> On 2023/02/11 1:41, Jay Faulkner wrote: > >>> I think it's safe to say the Ironic community would be very invested in > >>> such an effort. Let's make sure the time chosen for vPTG with this is > >> such > >>> that Ironic contributors can attend as well. > >>> > >>> Thanks, > >>> Jay Faulkner > >>> Ironic PTL > >>> > >>> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > >>> hiromu.asahina.az at hco.ntt.co.jp> wrote: > >>> > >>>> Hello Everyone, > >>>> > >>>> Recently, Tacker and Keystone have been working together on a new > >> Keystone > >>>> Middleware that can work with external authentication > >>>> services, such as Keycloak. The code has already been submitted [1], > but > >>>> we want to make this middleware a generic plugin that works > >>>> with as many OpenStack services as possible. To that end, we would > like > >> to > >>>> hear from other projects with similar use cases > >>>> (especially Ironic and Barbican, which run as standalone services). We > >>>> will make a time slot to discuss this topic at the next vPTG. > >>>> Please contact me if you are interested and available to participate. > >>>> > >>>> [1] > https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > >>>> > >>>> -- > >>>> Hiromu Asahina > >>>> > >>>> > >>>> > >>>> > >>> > >> > >> -- > >> ?-------------------------------------? > >> NTT Network Innovation Center > >> Hiromu Asahina > >> ------------------------------------- > >> 3-9-11, Midori-cho, Musashino-shi > >> Tokyo 180-8585, Japan > >> Phone: +81-422-59-7008 > >> Email: hiromu.asahina.az at hco.ntt.co.jp > >> ?-------------------------------------? > >> > >> > > > > -- > ?-------------------------------------? 
> NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Tue Mar 21 17:10:33 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 21 Mar 2023 18:10:33 +0100 Subject: [vptg][ptg][openstack-ansible] Bobcat virtual Project Team Gathering and Operator Hours Message-ID: Hi everyone! I'm happy to inform you that the OpenStack-Ansible team is going to have a virtual PTG next Tuesday, on March 28 from 15:00 till 18:00 UTC in Kilo room [1]. Everyone who is interested in participating in further development or regarding project plans for the next releases are warmly welcome to join us. We're also continuing the tradition to have a project Operator Hours. So all operators or folks who are wondering about OpenStack-Ansible concept, designs or just want to share their experience with the project are warmly welcome to join us on Wednesday, March 29 from 17:00 till 18:00 UTC in Havana room [2] So add dates to your calendar and hope seeing/hearing everyone next week! [1] PTG room: https://www.openstack.org/ptg/rooms/kilo [2] Operator hours room: https://www.openstack.org/ptg/rooms/havana From gmann at ghanshyammann.com Tue Mar 21 17:36:44 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 21 Mar 2023 10:36:44 -0700 Subject: [ptl][tc][ops][ptg] Operator + Developers interaction (operator-hours) slots in 2023.2 Bobcat PTG In-Reply-To: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> References: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> Message-ID: <187053ea474.d7018e84897536.6821147710012863509@ghanshyammann.com> Hello Everyone, This is a gentle reminder to book your project's operator-hours asap. Till now we have only 5 projects booked it, -gmann ---- On Fri, 17 Mar 2023 14:19:22 -0700 Ghanshyam Mann wrote --- > Hello Everyone/PTL, > > To improve the interaction/feedback between operators and developers, one of the efforts is to schedule > the 'operator-hour' in developers' events. We scheduled the 'operator-hour' in the last vPTG, which had mixed > productivity feedback[1]. The TC discussed it and thinks we should continue the 'operator-hour' in March > vPTG also. > > TC will not book the placeholder this time so that slots can be booked in the project room itself, and operators > can join developers to have a joint discussion. But at the same time, we need to avoid slot conflict for operators. > Every project needs to make sure its 'operator-hour' does not overlap with the related projects (integrated projects > which might have common operators, for example. nova, cinder, neutron etc needs to avoid conflict) 'operator-hour'. > > Guidelines for the project team to book 'operator-hour' > --------------------------------------------------------------------------------------- > * Request in #openinfra-events IRC channel to register the new track 'operator-hour-'. > For example, 'operator-hour-nova' > > * Once the track is registered, find a spot in your project slots where no other project (which you think is related/integrated > project and might have common operators) has already booked their operator-hour. Accordingly, book with the newly > registered track 'operator-hour-'. 
For example, #operator-hour-nova book essex-WedB1 . > > * Do not book more than one slot (1 hour) so that other projects will have enough slots open to book. If more discussion is > needed on anything, it can be continued in project-specific slots. > > We request that every project book an 'operator hour' slot for operators to join your PTG session. > For any query/conflict, ping TC in #openstack-tc or #openinfra-events IRC channel. > > [1] https://etherpad.opendev.org/p/Oct2022_PTGFeedback#L32 > > -gmann > > From manchandavishal143 at gmail.com Tue Mar 21 18:00:59 2023 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Tue, 21 Mar 2023 23:30:59 +0530 Subject: [horizon] Cancelling next two weekly meetings Message-ID: Hello Team, As agreed, during the last weekly meeting, we are canceling our weekly meeting on 22nd March and 29th March. The next weekly meeting will be on 5th April. See you at the PTG! Also, Please add topics for PTG discussion [1]. Thanks & regards, Vishal Manchanda [1] https://etherpad.opendev.org/p/horizon-bobcat-ptg -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Tue Mar 21 18:30:54 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 21 Mar 2023 13:30:54 -0500 Subject: PTG March 2023 Registration & Schedule Message-ID: Hello Everyone! The March 2023 Project Teams Gathering is right around the corner (March 27-31) and the schedule is being setup by your team leads! Slots are going fast, so make sure to get your time booked ASAP if you haven't already! You can find the schedule and available slots on the PTGbot website [1]. The PTGbot site is the during-event website to keep track of what's being discussed and any last-minute schedule changes. It is driven via commands in the #openinfra-events IRC channel (on the OFTC network) where the PTGbot listens. If you have questions about the commands that you can give the bot, check out the documentation here[2]. Also, if you haven?t connected to IRC before, here are some docs on how to get setup![3] Lastly, please don't forget to register[4] (it is free after all!). Please let us know if you have any questions via email to ptg at openinfra.dev. Thanks! -Kendall (diablo_rojo) [1] PTGbot Site: https://ptg.opendev.org/ptg.html [2] PTGbot Documentation: https://github.com/openstack/ptgbot#open-infrastructure-ptg-bot [3] Setup IRC: https://docs.openstack.org/contributors/common/irc.html [4] PTG Registration: https://openinfra-ptg.eventbrite.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From felipe.reyes at canonical.com Tue Mar 21 21:40:42 2023 From: felipe.reyes at canonical.com (Felipe Reyes) Date: Tue, 21 Mar 2023 18:40:42 -0300 Subject: [ptl][tc][ops][ptg] Operator + Developers interaction (operator-hours) slots in 2023.2 Bobcat PTG In-Reply-To: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> References: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> Message-ID: <8e7b678c3d4e0aad8ab74436ed8ca6065cc1735f.camel@canonical.com> Hi Ghanshyam, On Fri, 2023-03-17 at 14:19 -0700, Ghanshyam Mann wrote: > Hello Everyone/PTL, > > To improve the interaction/feedback between operators and developers, one of the efforts is to > schedule > the 'operator-hour' in developers' events. We scheduled the 'operator-hour' in the last vPTG, > which had mixed > productivity feedback[1]. The TC discussed it and thinks we should continue the 'operator-hour' in > March > vPTG also. 
At the OpenStack-charms project we thought it was a good idea; can we get the
track 'operator-hour-openstackcharms' registered?

Thanks,
-- 
Felipe Reyes
Software Engineer @ Canonical
felipe.reyes at canonical.com (GPG:0x9B1FFF39)
Launchpad: ~freyes | IRC: freyes

From nguyenhuukhoinw at gmail.com  Wed Mar 22 00:07:49 2023
From: nguyenhuukhoinw at gmail.com (Nguyễn Hữu Khôi)
Date: Wed, 22 Mar 2023 07:07:49 +0700
Subject: [openstack][kolla-ansible]
Message-ID: 

Hello guys,
I am using Xena (Ubuntu 20.04) and want to upgrade to Zed.
From my reading of
https://docs.openstack.org/kolla-ansible/latest/user/operating-kolla.html
I plan to do it like this:
Upgrade from Xena (20.04 container base) to Yoga (22.04 container base), then
upgrade the host to 22.04 >> upgrade from Yoga to Zed.
*** Can you confirm that we need to run the kolla-ansible upgrade first and
then upgrade the host? Is that right?
I am also wondering about MariaDB: how can we verify the schema upgrade, and
will RabbitMQ crash with the new version?
What should we do if something goes wrong, for example if the database upgrade
or RabbitMQ fails?
Thank you.
Regards
Nguyen Huu Khoi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gmann at ghanshyammann.com  Wed Mar 22 00:36:41 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Tue, 21 Mar 2023 17:36:41 -0700
Subject: [all][tc][policy] Canceling policy popup team next week meeting
Message-ID: <18706bf1fab.d4504cb1909651.6328907432889493469@ghanshyammann.com>

Hello Everyone,

Due to vPTG weeks, I am cancelling the policy pop-up next meeting scheduled
for 28th Mar.

https://wiki.openstack.org/wiki/Consistent_and_Secure_Default_Policies_Popup_Team#Meeting

-gmann

From ricolin at ricolky.com  Wed Mar 22 01:58:52 2023
From: ricolin at ricolky.com (Rico Lin)
Date: Wed, 22 Mar 2023 09:58:52 +0800
Subject: [magnum] Secure RBAC implementation
Message-ID: 
[1] https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html *Rico Lin* -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 22 09:42:24 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 22 Mar 2023 10:42:24 +0100 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) Message-ID: Hey folks, As a reminder, the Nova community will discuss at the vPTG. You can see the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg Our agenda will be from Tuesday to Friday, everyday between 1300UTC and 1700UTC. Connection details are in the etherpad above, but you can also use PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo room for all the discussions) You can't stick around for 4 hours x 4 days ? Heh, no worries ! If you (as an operator or a developer) want to engage with us (and we'd love this honestly), you have two possibilities : - either you prefer to listen (and talk) to some topics you've seen in the agenda, and then add your IRC nick (details how to use IRC are explained by [1]) on the topics you want. Once we start to discuss about those topics, I'll ping the courtesy ping list of each topic on #openstack-nova. Just make sure you're around in the IRC channel. - or you prefer to engage with us about some pain points or some feature requests, and then the right time is the Nova Operator Hour that will be on *Tuesday 1500UTC*. We have a specific etherpad for this session : https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where you can preemptively add your thoughts or concerns. Anyway, we are eager to meet you all ! Oh, last point, given we will be at the vPTG, next week's weekly meeting on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk the #openstack-nova channel ;-) See you next week ! -Sylvain [1] https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Mar 22 10:54:58 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 22 Mar 2023 11:54:58 +0100 Subject: [neutron] Neutron PTG sessions schedule Message-ID: Hello all: Please check the Neutron sessions schedule for the PTG week [1]. All sessions will take place in Juno channel [2]. This is a summary of the sessions and days: * Tuesday (13UTC - 17UTC): retrospective, releases, migrations and project deprecations. * Wednesday (13UTC - 17UTC): new RFEs and operator hour. * Thursday (13UTC - 17UTC): Nova-Neutron sessions, ovn-bgp-agent roadmap and neutron-dynamic-routing RFE. * Friday (13UTC - 17UTC): open hour for core candidates!! If you have any questions, please reply to this email or ping me in IRC (#openstack-neutron, ). Regards. [1]https://etherpad.opendev.org/p/neutron-bobcat-ptg [2]https://ptg.opendev.org/ptg.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Mar 22 10:57:41 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 22 Mar 2023 11:57:41 +0100 Subject: [nova][neutron] Nova-Neutron PTG sessions Message-ID: Hello all: The Nova-Neutron PTG sessions will take place on Thursday, from 13UTC to 15UTC, in the Juno channel. Please check the agenda [1]. 
We have 3 topics: * (artom) delete_on_termination for Neutron ports * (dvo-plv) Blueprint: "Add support for Napatech LinkVirt SmartNICs" review * (ralonsoh, artom): https://bugs.launchpad.net/neutron/+bug/1986003 (How to handle the duplicated port binding activate request from Nova, both in Nova and Neutron) Regards. [1]https://etherpad.opendev.org/p/neutron-bobcat-ptg -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Mar 22 11:00:28 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 22 Mar 2023 12:00:28 +0100 Subject: [neutron] Neutron meetings cancelled next week Message-ID: Hello Neutrinos: The regular Neutron meetings (team, CI and drivers) will be cancelled next week because of the PTG. Join us during the next one! Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hiromu.asahina.az at hco.ntt.co.jp Wed Mar 22 11:01:05 2023 From: hiromu.asahina.az at hco.ntt.co.jp (Hiromu Asahina) Date: Wed, 22 Mar 2023 20:01:05 +0900 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Message-ID: <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Thanks! I look forward to your reply. On 2023/03/22 1:29, Julia Kreger wrote: > No worries! > > I think that time works for me. I'm not sure it will work for everyone, but > I can proxy information back to the whole of the ironic project as we also > have the question of this functionality listed for our Operator Hour in > order to help ironic gauge interest. > > -Julia > > On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > >> I apologize that I couldn't reply before the Ironic meeting on Monday. >> >> I need one slot to discuss this topic. >> >> I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, >> 27)[1,2] works for them. Does this work for Ironic? I understand not all >> Ironic members will join this discussion, so I hope we can arrange a >> convenient date for you two at least and, hopefully, for those >> interested in this topic. >> >> [1] >> >> https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z >> [2] https://ptg.opendev.org/ptg.html >> >> Thanks, >> Hiromu Asahina >> >> On 2023/03/17 23:29, Julia Kreger wrote: >>> I'm not sure how many Ironic contributors would be the ones to attend a >>> discussion, in part because this is disjointed from the items they need >> to >>> focus on. It is much more of a "big picture" item for those of us who are >>> leaders in the project. >>> >>> I think it would help to understand how much time you expect the >> discussion >>> to take to determine a path forward and how we can collaborate. Ironic >> has >>> a huge number of topics we want to discuss during the PTG, and I suspect >>> our team meeting on Monday next week should yield more interest/awareness >>> as well as an amount of time for each topic which will aid us in >> scheduling. >>> >>> If you can let us know how long, then I think we can figure out when the >>> best day/time will be. >>> >>> Thanks! >>> >>> -Julia >>> >>> >>> >>> >>> >>> On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < >>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>> >>>> Thank you for your reply. >>>> >>>> I'd like to decide the time slot for this topic. 
>>>> I just checked PTG schedule [1]. >>>> >>>> We have the following time slots. Which one is convenient to gether? >>>> (I didn't get reply but I listed Barbican, as its cores are almost the >>>> same as Keystone) >>>> >>>> Mon, 27: >>>> >>>> - 14 (keystone) >>>> - 15 (keystone) >>>> >>>> Tue, 28 >>>> >>>> - 13 (barbican) >>>> - 14 (keystone, ironic) >>>> - 15 (keysonte, ironic) >>>> - 16 (ironic) >>>> >>>> Wed, 29 >>>> >>>> - 13 (ironic) >>>> - 14 (keystone, ironic) >>>> - 15 (keystone, ironic) >>>> - 21 (ironic) >>>> >>>> Thanks, >>>> >>>> [1] https://ptg.opendev.org/ptg.html >>>> >>>> Hiromu Asahina >>>> >>>> >>>> On 2023/02/11 1:41, Jay Faulkner wrote: >>>>> I think it's safe to say the Ironic community would be very invested in >>>>> such an effort. Let's make sure the time chosen for vPTG with this is >>>> such >>>>> that Ironic contributors can attend as well. >>>>> >>>>> Thanks, >>>>> Jay Faulkner >>>>> Ironic PTL >>>>> >>>>> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < >>>>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>>>> >>>>>> Hello Everyone, >>>>>> >>>>>> Recently, Tacker and Keystone have been working together on a new >>>> Keystone >>>>>> Middleware that can work with external authentication >>>>>> services, such as Keycloak. The code has already been submitted [1], >> but >>>>>> we want to make this middleware a generic plugin that works >>>>>> with as many OpenStack services as possible. To that end, we would >> like >>>> to >>>>>> hear from other projects with similar use cases >>>>>> (especially Ironic and Barbican, which run as standalone services). We >>>>>> will make a time slot to discuss this topic at the next vPTG. >>>>>> Please contact me if you are interested and available to participate. >>>>>> >>>>>> [1] >> https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >>>>>> >>>>>> -- >>>>>> Hiromu Asahina >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> ?-------------------------------------? >>>> NTT Network Innovation Center >>>> Hiromu Asahina >>>> ------------------------------------- >>>> 3-9-11, Midori-cho, Musashino-shi >>>> Tokyo 180-8585, Japan >>>> Phone: +81-422-59-7008 >>>> Email: hiromu.asahina.az at hco.ntt.co.jp >>>> ?-------------------------------------? >>>> >>>> >>> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> > -- ?-------------------------------------? NTT Network Innovation Center Hiromu Asahina ------------------------------------- 3-9-11, Midori-cho, Musashino-shi Tokyo 180-8585, Japan ? Phone: +81-422-59-7008 ? Email: hiromu.asahina.az at hco.ntt.co.jp ?-------------------------------------? From mnasiadka at gmail.com Wed Mar 22 11:09:33 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Wed, 22 Mar 2023 12:09:33 +0100 Subject: [kolla] next weekly meeting cancelled Message-ID: Hello Koalas, Next weekly meeting (29th March) is cancelled because of PTG on Mon/Tue/Thu - let?s meet there! 
Michal From swogatpradhan22 at gmail.com Wed Mar 22 11:25:08 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 22 Mar 2023 16:55:08 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Update: Here is the log when creating a volume using cirros image: 2023-03-22 11:04:38.449 109 INFO cinder.volume.flows.manager.create_volume [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s 2023-03-22 11:07:54.023 109 WARNING py.warnings [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: FutureWarning: The human format is deprecated and the format parameter will be removed. 
Use explicitly json instead in version 'xena' category=FutureWarning) 2023-03-22 11:11:12.161 109 WARNING py.warnings [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: FutureWarning: The human format is deprecated and the format parameter will be removed. Use explicitly json instead in version 'xena' category=FutureWarning) 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 MB/s 2023-03-22 11:11:14.998 109 INFO cinder.volume.flows.manager.create_volume [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. The image is present in dcn02 store but still it downloaded the image in 0.16 MB/s and then created the volume. With regards, Swogat Pradhan On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan wrote: > Hi Jhon, > This seems to be an issue. > When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster > parameter was specified to the respective cluster names but the config > files were created in the name of ceph.conf and keyring was > ceph.client.openstack.keyring. > > Which created issues in glance as well as the naming convention of the > files didn't match the cluster names, so i had to manually rename the > central ceph conf file as such: > > [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ > [root at dcn02-compute-0 ceph]# ll > total 16 > -rw-------. 1 root root 257 Mar 13 13:56 > ceph_central.client.openstack.keyring > -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf > -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring > -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf > [root at dcn02-compute-0 ceph]# > > ceph.conf and ceph.client.openstack.keyring contain the fsid of the > respective clusters in both dcn01 and dcn02. > In the above cli output, the ceph.conf and ceph.client... are the files > used to access dcn02 ceph cluster and ceph_central* files are used in for > accessing central ceph cluster. > > glance multistore config: > [dcn02] > rbd_store_ceph_conf=/etc/ceph/ceph.conf > rbd_store_user=openstack > rbd_store_pool=images > rbd_thin_provisioning=False > store_description=dcn02 rbd glance store > > [ceph_central] > rbd_store_ceph_conf=/etc/ceph/ceph_central.conf > rbd_store_user=openstack > rbd_store_pool=images > rbd_thin_provisioning=False > store_description=Default glance store backend. > > > With regards, > Swogat Pradhan > > On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: > >> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >> wrote: >> > >> > Hi, >> > Seems like cinder is not using the local ceph. >> >> That explains the issue. It's a misconfiguration. >> >> I hope this is not a production system since the mailing list now has >> the cinder.conf which contains passwords. 
>> >> The section that looks like this: >> >> [tripleo_ceph] >> volume_backend_name=tripleo_ceph >> volume_driver=cinder.volume.drivers.rbd.RBDDriver >> rbd_ceph_conf=/etc/ceph/ceph.conf >> rbd_user=openstack >> rbd_pool=volumes >> rbd_flatten_volume_from_snapshot=False >> rbd_secret_uuid= >> report_discard_supported=True >> >> Should be updated to refer to the local DCN ceph cluster and not the >> central one. Use the ceph conf file for that cluster and ensure the >> rbd_secret_uuid corresponds to that one. >> >> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >> Ceph cluster. The FSID should be in the ceph.conf file. The >> tripleo_nova_libvirt role will use virsh secret-* commands so that >> libvirt can retrieve the cephx secret using the FSID as a key. This >> can be confirmed with `podman exec nova_virtsecretd virsh >> secret-get-value $FSID`. >> >> The documentation describes how to configure the central and DCN sites >> correctly but an error seems to have occurred while you were following >> it. >> >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >> >> John >> >> > >> > Ceph Output: >> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >> > NAME SIZE PARENT FMT PROT >> LOCK >> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >> excl >> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >> > >> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >> > NAME SIZE PARENT FMT >> PROT LOCK >> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >> > [ceph: root at dcn02-ceph-all-0 /]# >> > >> > Attached the cinder config. >> > Please let me know how I can solve this issue. >> > >> > With regards, >> > Swogat Pradhan >> > >> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >> wrote: >> >> >> >> in my last message under the line "On a DCN site if you run a command >> like this:" I suggested some steps you could try to confirm the image is a >> COW from the local glance as well as how to look at your cinder config. >> >> >> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>> >> >>> Update: >> >>> I uploaded an image directly to the dcn02 store, and it takes around >> 10,15 minutes to create a volume with image in dcn02. >> >>> The image size is 389 MB. >> >>> >> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>>> >> >>>> Hi Jhon, >> >>>> I checked in the ceph od dcn02, I can see the images created after >> importing from the central site. >> >>>> But launching an instance normally fails as it takes a long time for >> the volume to get created. >> >>>> >> >>>> When launching an instance from volume the instance is getting >> created properly without any errors. 
>> >>>> >> >>>> I tried to cache images in nova using >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> but getting checksum failed error. >> >>>> >> >>>> With regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >> wrote: >> >>>>> >> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >> >>>>> wrote: >> >>>>> > >> >>>>> > Update: After restarting the nova services on the controller and >> running the deploy script on the edge site, I was able to launch the VM >> from volume. >> >>>>> > >> >>>>> > Right now the instance creation is failing as the block device >> creation is stuck in creating state, it is taking more than 10 mins for the >> volume to be created, whereas the image has already been imported to the >> edge glance. >> >>>>> >> >>>>> Try following this document and making the same observations in your >> >>>>> environment for AZs and their local ceph cluster. >> >>>>> >> >>>>> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >> >>>>> >> >>>>> On a DCN site if you run a command like this: >> >>>>> >> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >> >>>>> /etc/ceph/dcn0.client.admin.keyring >> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >> >>>>> NAME SIZE PARENT >> >>>>> FMT PROT LOCK >> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >> >>>>> $ >> >>>>> >> >>>>> Then, you should see the parent of the volume is the image which is >> on >> >>>>> the same local ceph cluster. >> >>>>> >> >>>>> I wonder if something is misconfigured and thus you're encountering >> >>>>> the streaming behavior described here: >> >>>>> >> >>>>> Ideally all images should reside in the central Glance and be copied >> >>>>> to DCN sites before instances of those images are booted on DCN >> sites. >> >>>>> If an image is not copied to a DCN site before it is booted, then >> the >> >>>>> image will be streamed to the DCN site and then the image will boot >> as >> >>>>> an instance. This happens because Glance at the DCN site has access >> to >> >>>>> the images store at the Central ceph cluster. Though the booting of >> >>>>> the image will take time because it has not been copied in advance, >> >>>>> this is still preferable to failing to boot the image. >> >>>>> >> >>>>> You can also exec into the cinder container at the DCN site and >> >>>>> confirm it's using it's local ceph cluster. >> >>>>> >> >>>>> John >> >>>>> >> >>>>> > >> >>>>> > I will try and create a new fresh image and test again then >> update. >> >>>>> > >> >>>>> > With regards, >> >>>>> > Swogat Pradhan >> >>>>> > >> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>>>> >> >> >>>>> >> Update: >> >>>>> >> In the hypervisor list the compute node state is showing down. >> >>>>> >> >> >>>>> >> >> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>>>> >>> >> >>>>> >>> Hi Brendan, >> >>>>> >>> Now i have deployed another site where i have used 2 linux >> bonds network template for both 3 compute nodes and 3 ceph nodes. >> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >> >>>>> >>> I used a cirros image to launch instance but the instance timed >> out so i waited for the volume to be created. 
>> >>>>> >>> Once the volume was created i tried launching the instance from >> the volume and still the instance is stuck in spawning state. >> >>>>> >>> >> >>>>> >>> Here is the nova-compute log: >> >>>>> >>> >> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >> privsep daemon starting >> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >> privsep process running with uid/gid: 0/0 >> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >> privsep process running with capabilities (eff/prm/inh): >> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >> privsep daemon running as pid 185437 >> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >> os_brick.initiator.connectors.nvmeof >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >> in _get_host_uuid: Unexpected error while running command. >> >>>>> >>> Command: blkid overlay -s UUID -o value >> >>>>> >>> Exit code: 2 >> >>>>> >>> Stdout: '' >> >>>>> >>> Stderr: '': >> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >> running command. >> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >> >>>>> >>> >> >>>>> >>> It is stuck in creating image, do i need to run the template >> mentioned here ?: >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> >>>>> >>> >> >>>>> >>> The volume is already created and i do not understand why the >> instance is stuck in spawning state. >> >>>>> >>> >> >>>>> >>> With regards, >> >>>>> >>> Swogat Pradhan >> >>>>> >>> >> >>>>> >>> >> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >> bshephar at redhat.com> wrote: >> >>>>> >>>> >> >>>>> >>>> Does your environment use different network interfaces for >> each of the networks? Or does it have a bond with everything on it? >> >>>>> >>>> >> >>>>> >>>> One issue I have seen before is that when launching instances, >> there is a lot of network traffic between nodes as the hypervisor needs to >> download the image from Glance. Along with various other services sending >> normal network traffic, it can be enough to cause issues if everything is >> running over a single 1Gbe interface. >> >>>>> >>>> >> >>>>> >>>> I have seen the same situation in fact when using a single >> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >> while you try to spawn the instance to see if you?re dropping packets. In >> the situation I described, there were dropped packets which resulted in a >> loss of communication between nova_compute and RMQ, so the node appeared >> offline. You should also confirm that nova_compute is being disconnected in >> the nova_compute logs if you tail them on the Hypervisor while spawning the >> instance. >> >>>>> >>>> >> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >> based on that experience, from my perspective, is certainly sounds like >> some kind of network issue. 
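>> >>>>> >>>> As a rough sketch (adjust the bond and interface names to your
>> >>>>> >>>> environment), the bonding mode and the drop counters can be checked on
>> >>>>> >>>> the hypervisor while the instance is spawning:
>> >>>>> >>>>
>> >>>>> >>>> # current bonding mode and slave status
>> >>>>> >>>> cat /proc/net/bonding/bond0
>> >>>>> >>>> # watch RX/TX errors and drops while the image is being streamed
>> >>>>> >>>> watch -n 1 'ip -s link show bond0'
>> >>>>> >>>>
>> >>>>> >>>> If the drop counters only climb while Glance is transferring the image,
>> >>>>> >>>> that points to the same saturation problem described above.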
>> >>>>> >>>> >> >>>>> >>>> Regards, >> >>>>> >>>> >> >>>>> >>>> Brendan Shephard >> >>>>> >>>> Senior Software Engineer >> >>>>> >>>> Red Hat Australia >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >> >>>>> >>>> >> >>>>> >>>> Hi, >> >>>>> >>>> >> >>>>> >>>> I tried to help someone with a similar issue some time ago in >> this thread: >> >>>>> >>>> >> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >> >>>>> >>>> >> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >> user, not sure if that could apply here. But is it possible that your nova >> and neutron versions are different between central and edge site? Have you >> restarted nova and neutron services on the compute nodes after >> installation? Have you debug logs of nova-conductor and maybe nova-compute? >> Maybe they can help narrow down the issue. >> >>>>> >>>> If there isn't any additional information in the debug logs I >> probably would start "tearing down" rabbitmq. I didn't have to do that in a >> production system yet so be careful. I can think of two routes: >> >>>>> >>>> >> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >> running, this will most likely impact client IO depending on your load. >> Check out the rabbitmqctl commands. >> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from >> all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >> >>>>> >>>> >> >>>>> >>>> I can imagine that the failed reply "survives" while being >> replicated across the rabbit nodes. But I don't really know the rabbit >> internals too well, so maybe someone else can chime in here and give a >> better advice. >> >>>>> >>>> >> >>>>> >>>> Regards, >> >>>>> >>>> Eugen >> >>>>> >>>> >> >>>>> >>>> Zitat von Swogat Pradhan : >> >>>>> >>>> >> >>>>> >>>> Hi, >> >>>>> >>>> Can someone please help me out on this issue? >> >>>>> >>>> >> >>>>> >>>> With regards, >> >>>>> >>>> Swogat Pradhan >> >>>>> >>>> >> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>>>> >>>> wrote: >> >>>>> >>>> >> >>>>> >>>> Hi >> >>>>> >>>> I don't see any major packet loss. >> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >> due to packet >> >>>>> >>>> loss. >> >>>>> >>>> >> >>>>> >>>> with regards, >> >>>>> >>>> Swogat Pradhan >> >>>>> >>>> >> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>>>> >>>> wrote: >> >>>>> >>>> >> >>>>> >>>> Hi, >> >>>>> >>>> Yes the MTU is the same as the default '1500'. >> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >> when >> >>>>> >>>> launching the instance. >> >>>>> >>>> I will check that and come back. >> >>>>> >>>> But everytime i launch an instance the instance gets stuck at >> spawning >> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >> packet loss >> >>>>> >>>> causes this. >> >>>>> >>>> >> >>>>> >>>> With regards, >> >>>>> >>>> Swogat pradhan >> >>>>> >>>> >> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >> wrote: >> >>>>> >>>> >> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical >> between >> >>>>> >>>> central and edge site? Do you see packet loss through the >> tunnel? 
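>> >>>>> >>>> A quick way to check this (a rough sketch, assuming ICMP is allowed
>> >>>>> >>>> through the tunnel) is to ping from the edge site towards the central
>> >>>>> >>>> site with the don't-fragment bit set:
>> >>>>> >>>>
>> >>>>> >>>> # 1472 = 1500 - 20 (IP header) - 8 (ICMP header); reduce it further if
>> >>>>> >>>> # the tunnel adds its own encapsulation overhead
>> >>>>> >>>> ping -M do -s 1472 -c 20 $CENTRAL_SITE_IP   # placeholder address
>> >>>>> >>>>
>> >>>>> >>>> If that fails while a smaller payload succeeds, the effective path MTU
>> >>>>> >>>> through the tunnel is lower than the interface MTU.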
>> >>>>> >>>> >> >>>>> >>>> Zitat von Swogat Pradhan : >> >>>>> >>>> >> >>>>> >>>> > Hi Eugen, >> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' as >> i am not >> >>>>> >>>> > getting email's from you. >> >>>>> >>>> > Coming to the issue: >> >>>>> >>>> > >> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >> list_policies -p >> >>>>> >>>> / >> >>>>> >>>> > Listing policies for vhost "/" ... >> >>>>> >>>> > vhost name pattern apply-to definition >> priority >> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >> >>>>> >>>> > >> >>>>> >>>> >> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >> >>>>> >>>> > >> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >> when i am >> >>>>> >>>> trying >> >>>>> >>>> > to launch an instance and the instance comes to a spawning >> state and >> >>>>> >>>> then >> >>>>> >>>> > gets stuck. >> >>>>> >>>> > >> >>>>> >>>> > I have a tunnel setup between the central and the edge sites. >> >>>>> >>>> > >> >>>>> >>>> > With regards, >> >>>>> >>>> > Swogat Pradhan >> >>>>> >>>> > >> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >> >>>>> >>>> swogatpradhan22 at gmail.com> >> >>>>> >>>> > wrote: >> >>>>> >>>> > >> >>>>> >>>> >> Hi Eugen, >> >>>>> >>>> >> For some reason i am not getting your email to me directly, >> i am >> >>>>> >>>> checking >> >>>>> >>>> >> the email digest and there i am able to find your reply. >> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >> >>>>> >>>> >> >> >>>>> >>>> >> *Note: i am able to create vm's and perform other >> activities in the >> >>>>> >>>> >> central site, only facing this issue in the edge site.* >> >>>>> >>>> >> >> >>>>> >>>> >> With regards, >> >>>>> >>>> >> Swogat Pradhan >> >>>>> >>>> >> >> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >> >>>>> >>>> swogatpradhan22 at gmail.com> >> >>>>> >>>> >> wrote: >> >>>>> >>>> >> >> >>>>> >>>> >>> Hi Eugen, >> >>>>> >>>> >>> Thanks for your response. >> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >> details: >> >>>>> >>>> >>> >> >>>>> >>>> >>> *PCS Status:* >> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >> >>>>> >>>> >>> >> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >> >>>>> >>>> Started >> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >> >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >> >>>>> >>>> Started >> >>>>> >>>> >>> overcloud-controller-2 >> >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >> >>>>> >>>> Started >> >>>>> >>>> >>> overcloud-controller-1 >> >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >> >>>>> >>>> Started >> >>>>> >>>> >>> overcloud-controller-0 >> >>>>> >>>> >>> >> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the >> issue is >> >>>>> >>>> still >> >>>>> >>>> >>> present. >> >>>>> >>>> >>> >> >>>>> >>>> >>> *Cluster status:* >> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >> cluster_status >> >>>>> >>>> >>> Cluster status of node >> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>> >>>>> >>>> >>> Basics >> >>>>> >>>> >>> >> >>>>> >>>> >>> Cluster name: >> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >>>>> >>>> >>> >> >>>>> >>>> >>> Disk Nodes >> >>>>> >>>> >>> >> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>>>> >>>> >>> >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> >>> >> >>>>> >>>> >>> Running Nodes >> >>>>> >>>> >>> >> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>>>> >>>> >>> >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> >>> >> >>>>> >>>> >>> Versions >> >>>>> >>>> >>> >> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >> RabbitMQ >> >>>>> >>>> 3.8.3 >> >>>>> >>>> >>> on Erlang 22.3.4.1 >> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >> RabbitMQ >> >>>>> >>>> 3.8.3 >> >>>>> >>>> >>> on Erlang 22.3.4.1 >> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >> RabbitMQ >> >>>>> >>>> 3.8.3 >> >>>>> >>>> >>> on Erlang 22.3.4.1 >> >>>>> >>>> >>> >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >> >>>>> >>>> RabbitMQ >> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >> >>>>> >>>> >>> >> >>>>> >>>> >>> Alarms >> >>>>> >>>> >>> >> >>>>> >>>> >>> (none) >> >>>>> >>>> >>> >> >>>>> >>>> >>> Network Partitions >> >>>>> >>>> >>> >> >>>>> >>>> >>> (none) >> >>>>> >>>> >>> >> >>>>> >>>> >>> Listeners >> >>>>> >>>> >>> >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >> inter-node and CLI >> >>>>> >>>> tool >> >>>>> >>>> >>> communication >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP >> 0-9-1 >> >>>>> >>>> >>> and AMQP 1.0 >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >> inter-node and CLI >> >>>>> >>>> tool >> >>>>> >>>> >>> communication >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP >> 0-9-1 >> >>>>> >>>> >>> and AMQP 1.0 >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >> inter-node and CLI >> >>>>> >>>> tool >> >>>>> >>>> >>> communication >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: 
AMQP >> 0-9-1 >> >>>>> >>>> >>> and AMQP 1.0 >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> , >> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >> purpose: >> >>>>> >>>> inter-node and >> >>>>> >>>> >>> CLI tool communication >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> , >> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >> purpose: AMQP >> >>>>> >>>> 0-9-1 >> >>>>> >>>> >>> and AMQP 1.0 >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> , >> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >> HTTP API >> >>>>> >>>> >>> >> >>>>> >>>> >>> Feature flags >> >>>>> >>>> >>> >> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >> >>>>> >>>> >>> >> >>>>> >>>> >>> *Logs:* >> >>>>> >>>> >>> *(Attached)* >> >>>>> >>>> >>> >> >>>>> >>>> >>> With regards, >> >>>>> >>>> >>> Swogat Pradhan >> >>>>> >>>> >>> >> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >> >>>>> >>>> swogatpradhan22 at gmail.com> >> >>>>> >>>> >>> wrote: >> >>>>> >>>> >>> >> >>>>> >>>> >>>> Hi, >> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >> >>>>> >>>> >>>> >> >>>>> >>>> >>>> nova-conuctor: >> >>>>> >>>> >>>> >> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The >> reply >> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >> seconds >> >>>>> >>>> due to a >> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>> >>>>> >>>> Abandoning...: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >> reply >> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >> seconds >> >>>>> >>>> due to a >> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>>> >>>> Abandoning...: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >> reply >> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >> seconds >> >>>>> >>>> due to a >> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>>> >>>> Abandoning...: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >> enabled >> >>>>> >>>> with >> >>>>> >>>> >>>> backend dogpile.cache.null. >> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >> reply >> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >> seconds >> >>>>> >>>> due to a >> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>>> >>>> Abandoning...: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> >> >>>>> >>>> >>>> With regards, >> >>>>> >>>> >>>> Swogat Pradhan >> >>>>> >>>> >>>> >> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >> >>>>> >>>> >>>> >> >>>>> >>>> >>>>> Hi, >> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i >> am trying to >> >>>>> >>>> >>>>> launch vm's. 
>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >> (openstack >> >>>>> >>>> compute >> >>>>> >>>> >>>>> service list), the node comes backup when i restart the >> nova >> >>>>> >>>> compute >> >>>>> >>>> >>>>> service but then the launch of the vm fails. >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> nova-compute.log >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >> Running >> >>>>> >>>> >>>>> instance usage >> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 >> 07:00:00 >> >>>>> >>>> to >> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> [instance: >> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful >> on node >> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> [instance: >> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied >> device >> >>>>> >>>> name: >> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> [instance: >> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> Cache enabled >> >>>>> >>>> with >> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> Running >> >>>>> >>>> >>>>> privsep helper: >> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >> >>>>> >>>> 'privsep-helper', >> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> Spawned new >> >>>>> >>>> privsep >> >>>>> >>>> >>>>> daemon via rootwrap >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >> [-] privsep >> >>>>> >>>> >>>>> daemon starting >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >> [-] privsep >> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >> [-] privsep >> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >> [-] privsep >> >>>>> >>>> >>>>> daemon running as pid 2647 >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >> >>>>> >>>> os_brick.initiator.connectors.nvmeof >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> Process >> >>>>> >>>> >>>>> execution error >> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >> command. >> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >> >>>>> >>>> >>>>> Exit code: 2 >> >>>>> >>>> >>>>> Stdout: '' >> >>>>> >>>> >>>>> Stderr: '': >> oslo_concurrency.processutils.ProcessExecutionError: >> >>>>> >>>> >>>>> Unexpected error while running command. >> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> [instance: >> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> Is there a way to solve this issue? >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> With regards, >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> Swogat Pradhan >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From asimkon at otenet.gr Wed Mar 22 12:40:05 2023 From: asimkon at otenet.gr (Konstantinos Asimakopoulos) Date: Wed, 22 Mar 2023 14:40:05 +0200 Subject: App Deployment to OpenStack (Alfresco Community) Message-ID: <7013fdcfdeb17cbf174585edb769f5de@otenet.gr> Hello! I am new to OpenStack technology (cloud) and of course willing to dive into this interesting infrastructure. 
I would like to get information (a step-by-step wiki) on how to deploy open
source applications, especially Alfresco Community Edition

https://www.alfresco.com/ecm-software/alfresco-community-editions

to an OpenStack cloud and operate it globally. Is that possible, and how?

That would help me a lot!

Regards
Kostas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nguyenhuukhoinw at gmail.com  Wed Mar 22 13:45:00 2023
From: nguyenhuukhoinw at gmail.com (Nguyễn Hữu Khôi)
Date: Wed, 22 Mar 2023 20:45:00 +0700
Subject: App Deployment to OpenStack (Alfresco Community)
In-Reply-To: <7013fdcfdeb17cbf174585edb769f5de@otenet.gr>
References: <7013fdcfdeb17cbf174585edb769f5de@otenet.gr>
Message-ID: 

I think it is just like installing and configuring the application on a VM.

On Wed, Mar 22, 2023, 8:41 PM Konstantinos Asimakopoulos wrote:

> Hello!
>
> I am new to OpenStack technology (cloud) and of course willing to dive
> into this interesting infrastructure. I would like to get information (step
> by step wiki) on how to deploy open source applications especially
> Alfresco Community Edition
>
> https://www.alfresco.com/ecm-software/alfresco-community-editions
>
> to openstack cloud and operate it globally. Is that possible and how?
>
> That would help me a lot!
>
> Regards
> Kostas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From swogatpradhan22 at gmail.com  Wed Mar 22 13:41:50 2023
From: swogatpradhan22 at gmail.com (Swogat Pradhan)
Date: Wed, 22 Mar 2023 19:11:50 +0530
Subject: DCN compute service goes down when a instance is scheduled to
 launch | wallaby | tripleo
In-Reply-To: 
References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag>
 <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag>
 <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com>
Message-ID: 

Hi John,
After some changes I think cinder is now trying to pull the image from the
local glance, as I am getting the following error in the cinder-volume log:

2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server
cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error
finding address for
http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538:
Unable to establish connection to
http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538:
HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded
with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by
NewConnectionError(': Failed to establish a new connection: [Errno 111]
ECONNREFUSED',))

The endpoint it is trying to reach is the dcn02 IP address.
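As a quick sanity check (a rough sketch, assuming TripleO's usual container
names), it is worth confirming which Glance endpoint cinder-volume is
configured to use and whether anything answers on it:

# which glance endpoint cinder is pointed at
sudo podman exec cinder_volume grep -i glance /etc/cinder/cinder.conf
# does anything answer on the local glance API port?
curl -s http://172.25.228.253:9292/ | head

If the curl gets connection refused, nothing is listening on 9292 on the
dcn02 side, which matches the ECONNREFUSED above.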
But when i check the ports i don't find the port 9292 running: [root at dcn02-compute-2 ceph]# netstat -nultp Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 0.0.0.0:2022 0.0.0.0:* LISTEN 656800/sshd tcp 0 0 127.0.0.1:199 0.0.0.0:* LISTEN 4878/snmpd tcp 0 0 172.25.228.253:2379 0.0.0.0:* LISTEN 6232/etcd tcp 0 0 172.25.228.253:2380 0.0.0.0:* LISTEN 6232/etcd tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1/systemd tcp 0 0 127.0.0.1:6640 0.0.0.0:* LISTEN 2779/ovsdb-server tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 4918/sshd tcp6 0 0 :::2022 :::* LISTEN 656800/sshd tcp6 0 0 :::111 :::* LISTEN 1/systemd tcp6 0 0 :::22 :::* LISTEN 4918/sshd udp 0 0 0.0.0.0:111 0.0.0.0:* 1/systemd udp 0 0 0.0.0.0:161 0.0.0.0:* 4878/snmpd udp 0 0 127.0.0.1:323 0.0.0.0:* 2609/chronyd udp 0 0 0.0.0.0:6081 0.0.0.0:* - udp6 0 0 :::111 :::* 1/systemd udp6 0 0 ::1:161 :::* 4878/snmpd udp6 0 0 ::1:323 :::* 2609/chronyd udp6 0 0 :::6081 :::* - I see in the glance-api.conf that bind port parameter is set to 9292 but the port is not listed in netstat command. Can you please guide me in getting this port up and running as i feel like this would solve the issue i am facing right now. With regards, Swogat Pradhan On Wed, Mar 22, 2023 at 4:55?PM Swogat Pradhan wrote: > Update: > Here is the log when creating a volume using cirros image: > > 2023-03-22 11:04:38.449 109 INFO cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, > 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': > datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 
'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s > 2023-03-22 11:07:54.023 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > category=FutureWarning) > > 2023-03-22 11:11:12.161 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > category=FutureWarning) > > 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 > MB/s > 2023-03-22 11:11:14.998 109 INFO cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f > (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully > 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. > > The image is present in dcn02 store but still it downloaded the image in > 0.16 MB/s and then created the volume. > > With regards, > Swogat Pradhan > > On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan > wrote: > >> Hi Jhon, >> This seems to be an issue. >> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >> parameter was specified to the respective cluster names but the config >> files were created in the name of ceph.conf and keyring was >> ceph.client.openstack.keyring. >> >> Which created issues in glance as well as the naming convention of the >> files didn't match the cluster names, so i had to manually rename the >> central ceph conf file as such: >> >> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >> [root at dcn02-compute-0 ceph]# ll >> total 16 >> -rw-------. 1 root root 257 Mar 13 13:56 >> ceph_central.client.openstack.keyring >> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >> [root at dcn02-compute-0 ceph]# >> >> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >> respective clusters in both dcn01 and dcn02. >> In the above cli output, the ceph.conf and ceph.client... are the files >> used to access dcn02 ceph cluster and ceph_central* files are used in for >> accessing central ceph cluster. 
>> >> glance multistore config: >> [dcn02] >> rbd_store_ceph_conf=/etc/ceph/ceph.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=dcn02 rbd glance store >> >> [ceph_central] >> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=Default glance store backend. >> >> >> With regards, >> Swogat Pradhan >> >> On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: >> >>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>> wrote: >>> > >>> > Hi, >>> > Seems like cinder is not using the local ceph. >>> >>> That explains the issue. It's a misconfiguration. >>> >>> I hope this is not a production system since the mailing list now has >>> the cinder.conf which contains passwords. >>> >>> The section that looks like this: >>> >>> [tripleo_ceph] >>> volume_backend_name=tripleo_ceph >>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>> rbd_ceph_conf=/etc/ceph/ceph.conf >>> rbd_user=openstack >>> rbd_pool=volumes >>> rbd_flatten_volume_from_snapshot=False >>> rbd_secret_uuid= >>> report_discard_supported=True >>> >>> Should be updated to refer to the local DCN ceph cluster and not the >>> central one. Use the ceph conf file for that cluster and ensure the >>> rbd_secret_uuid corresponds to that one. >>> >>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>> Ceph cluster. The FSID should be in the ceph.conf file. The >>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>> libvirt can retrieve the cephx secret using the FSID as a key. This >>> can be confirmed with `podman exec nova_virtsecretd virsh >>> secret-get-value $FSID`. >>> >>> The documentation describes how to configure the central and DCN sites >>> correctly but an error seems to have occurred while you were following >>> it. >>> >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>> >>> John >>> >>> > >>> > Ceph Output: >>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>> > NAME SIZE PARENT FMT PROT >>> LOCK >>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>> excl >>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>> > >>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>> > NAME SIZE PARENT FMT >>> PROT LOCK >>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>> > [ceph: root at dcn02-ceph-all-0 /]# >>> > >>> > Attached the cinder config. >>> > Please let me know how I can solve this issue. 
>>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>> wrote: >>> >> >>> >> in my last message under the line "On a DCN site if you run a command >>> like this:" I suggested some steps you could try to confirm the image is a >>> COW from the local glance as well as how to look at your cinder config. >>> >> >>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>> >>> >>> Update: >>> >>> I uploaded an image directly to the dcn02 store, and it takes around >>> 10,15 minutes to create a volume with image in dcn02. >>> >>> The image size is 389 MB. >>> >>> >>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>> >>>> Hi Jhon, >>> >>>> I checked in the ceph od dcn02, I can see the images created after >>> importing from the central site. >>> >>>> But launching an instance normally fails as it takes a long time >>> for the volume to get created. >>> >>>> >>> >>>> When launching an instance from volume the instance is getting >>> created properly without any errors. >>> >>>> >>> >>>> I tried to cache images in nova using >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> but getting checksum failed error. >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>> wrote: >>> >>>>> >>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>> >>>>> wrote: >>> >>>>> > >>> >>>>> > Update: After restarting the nova services on the controller and >>> running the deploy script on the edge site, I was able to launch the VM >>> from volume. >>> >>>>> > >>> >>>>> > Right now the instance creation is failing as the block device >>> creation is stuck in creating state, it is taking more than 10 mins for the >>> volume to be created, whereas the image has already been imported to the >>> edge glance. >>> >>>>> >>> >>>>> Try following this document and making the same observations in >>> your >>> >>>>> environment for AZs and their local ceph cluster. >>> >>>>> >>> >>>>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>> >>>>> >>> >>>>> On a DCN site if you run a command like this: >>> >>>>> >>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>> >>>>> NAME SIZE PARENT >>> >>>>> FMT PROT LOCK >>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>> >>>>> $ >>> >>>>> >>> >>>>> Then, you should see the parent of the volume is the image which >>> is on >>> >>>>> the same local ceph cluster. >>> >>>>> >>> >>>>> I wonder if something is misconfigured and thus you're encountering >>> >>>>> the streaming behavior described here: >>> >>>>> >>> >>>>> Ideally all images should reside in the central Glance and be >>> copied >>> >>>>> to DCN sites before instances of those images are booted on DCN >>> sites. >>> >>>>> If an image is not copied to a DCN site before it is booted, then >>> the >>> >>>>> image will be streamed to the DCN site and then the image will >>> boot as >>> >>>>> an instance. This happens because Glance at the DCN site has >>> access to >>> >>>>> the images store at the Central ceph cluster. 
Though the booting of >>> >>>>> the image will take time because it has not been copied in advance, >>> >>>>> this is still preferable to failing to boot the image. >>> >>>>> >>> >>>>> You can also exec into the cinder container at the DCN site and >>> >>>>> confirm it's using it's local ceph cluster. >>> >>>>> >>> >>>>> John >>> >>>>> >>> >>>>> > >>> >>>>> > I will try and create a new fresh image and test again then >>> update. >>> >>>>> > >>> >>>>> > With regards, >>> >>>>> > Swogat Pradhan >>> >>>>> > >>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >> >>> >>>>> >> Update: >>> >>>>> >> In the hypervisor list the compute node state is showing down. >>> >>>>> >> >>> >>>>> >> >>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >>> >>> >>>>> >>> Hi Brendan, >>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>> >>>>> >>> I used a cirros image to launch instance but the instance >>> timed out so i waited for the volume to be created. >>> >>>>> >>> Once the volume was created i tried launching the instance >>> from the volume and still the instance is stuck in spawning state. >>> >>>>> >>> >>> >>>>> >>> Here is the nova-compute log: >>> >>>>> >>> >>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>> privsep daemon starting >>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>> privsep process running with uid/gid: 0/0 >>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>> privsep process running with capabilities (eff/prm/inh): >>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>> privsep daemon running as pid 185437 >>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>> os_brick.initiator.connectors.nvmeof >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>> in _get_host_uuid: Unexpected error while running command. >>> >>>>> >>> Command: blkid overlay -s UUID -o value >>> >>>>> >>> Exit code: 2 >>> >>>>> >>> Stdout: '' >>> >>>>> >>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>> running command. >>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>> >>>>> >>> >>> >>>>> >>> It is stuck in creating image, do i need to run the template >>> mentioned here ?: >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> >>>>> >>> >>> >>>>> >>> The volume is already created and i do not understand why the >>> instance is stuck in spawning state. >>> >>>>> >>> >>> >>>>> >>> With regards, >>> >>>>> >>> Swogat Pradhan >>> >>>>> >>> >>> >>>>> >>> >>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>> bshephar at redhat.com> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Does your environment use different network interfaces for >>> each of the networks? Or does it have a bond with everything on it? 
>>> >>>>> >>>> >>> >>>>> >>>> One issue I have seen before is that when launching >>> instances, there is a lot of network traffic between nodes as the >>> hypervisor needs to download the image from Glance. Along with various >>> other services sending normal network traffic, it can be enough to cause >>> issues if everything is running over a single 1Gbe interface. >>> >>>>> >>>> >>> >>>>> >>>> I have seen the same situation in fact when using a single >>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>> while you try to spawn the instance to see if you?re dropping packets. In >>> the situation I described, there were dropped packets which resulted in a >>> loss of communication between nova_compute and RMQ, so the node appeared >>> offline. You should also confirm that nova_compute is being disconnected in >>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>> instance. >>> >>>>> >>>> >>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >>> based on that experience, from my perspective, is certainly sounds like >>> some kind of network issue. >>> >>>>> >>>> >>> >>>>> >>>> Regards, >>> >>>>> >>>> >>> >>>>> >>>> Brendan Shephard >>> >>>>> >>>> Senior Software Engineer >>> >>>>> >>>> Red Hat Australia >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> >>> >>>>> >>>> I tried to help someone with a similar issue some time ago in >>> this thread: >>> >>>>> >>>> >>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>> >>>>> >>>> >>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>> user, not sure if that could apply here. But is it possible that your nova >>> and neutron versions are different between central and edge site? Have you >>> restarted nova and neutron services on the compute nodes after >>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>> Maybe they can help narrow down the issue. >>> >>>>> >>>> If there isn't any additional information in the debug logs I >>> probably would start "tearing down" rabbitmq. I didn't have to do that in a >>> production system yet so be careful. I can think of two routes: >>> >>>>> >>>> >>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>> running, this will most likely impact client IO depending on your load. >>> Check out the rabbitmqctl commands. >>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from >>> all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>> >>>>> >>>> >>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>> replicated across the rabbit nodes. But I don't really know the rabbit >>> internals too well, so maybe someone else can chime in here and give a >>> better advice. >>> >>>>> >>>> >>> >>>>> >>>> Regards, >>> >>>>> >>>> Eugen >>> >>>>> >>>> >>> >>>>> >>>> Zitat von Swogat Pradhan : >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> Can someone please help me out on this issue? >>> >>>>> >>>> >>> >>>>> >>>> With regards, >>> >>>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi >>> >>>>> >>>> I don't see any major packet loss. >>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >>> due to packet >>> >>>>> >>>> loss. 
>>> >>>>> >>>> >>> >>>>> >>>> with regards, >>> >>>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >>> when >>> >>>>> >>>> launching the instance. >>> >>>>> >>>> I will check that and come back. >>> >>>>> >>>> But everytime i launch an instance the instance gets stuck at >>> spawning >>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >>> packet loss >>> >>>>> >>>> causes this. >>> >>>>> >>>> >>> >>>>> >>>> With regards, >>> >>>>> >>>> Swogat pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical >>> between >>> >>>>> >>>> central and edge site? Do you see packet loss through the >>> tunnel? >>> >>>>> >>>> >>> >>>>> >>>> Zitat von Swogat Pradhan : >>> >>>>> >>>> >>> >>>>> >>>> > Hi Eugen, >>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' >>> as i am not >>> >>>>> >>>> > getting email's from you. >>> >>>>> >>>> > Coming to the issue: >>> >>>>> >>>> > >>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>> list_policies -p >>> >>>>> >>>> / >>> >>>>> >>>> > Listing policies for vhost "/" ... >>> >>>>> >>>> > vhost name pattern apply-to definition >>> priority >>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>> >>>>> >>>> > >>> >>>>> >>>> >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> >>>>> >>>> > >>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>> when i am >>> >>>>> >>>> trying >>> >>>>> >>>> > to launch an instance and the instance comes to a spawning >>> state and >>> >>>>> >>>> then >>> >>>>> >>>> > gets stuck. >>> >>>>> >>>> > >>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>> sites. >>> >>>>> >>>> > >>> >>>>> >>>> > With regards, >>> >>>>> >>>> > Swogat Pradhan >>> >>>>> >>>> > >>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> > wrote: >>> >>>>> >>>> > >>> >>>>> >>>> >> Hi Eugen, >>> >>>>> >>>> >> For some reason i am not getting your email to me >>> directly, i am >>> >>>>> >>>> checking >>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >>> >>>>> >>>> >> >>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>> activities in the >>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>> >>>>> >>>> >> >>> >>>>> >>>> >> With regards, >>> >>>>> >>>> >> Swogat Pradhan >>> >>>>> >>>> >> >>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> >> wrote: >>> >>>>> >>>> >> >>> >>>>> >>>> >>> Hi Eugen, >>> >>>>> >>>> >>> Thanks for your response. 
>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>> details: >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *PCS Status:* >>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>>>> >>>> >>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>> >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-2 >>> >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-1 >>> >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-0 >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the >>> issue is >>> >>>>> >>>> still >>> >>>>> >>>> >>> present. >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *Cluster status:* >>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>> cluster_status >>> >>>>> >>>> >>> Cluster status of node >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> ... >>> >>>>> >>>> >>> Basics >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Cluster name: >>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Disk Nodes >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Running Nodes >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Versions >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> >>>>> >>>> RabbitMQ >>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Alarms >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> (none) >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Network Partitions >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> (none) >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Listeners >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: 
amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>> purpose: >>> >>>>> >>>> inter-node and >>> >>>>> >>>> >>> CLI tool communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>> purpose: AMQP >>> >>>>> >>>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>> HTTP API >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Feature flags >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *Logs:* >>> >>>>> >>>> >>> *(Attached)* >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> With regards, >>> >>>>> >>>> >>> Swogat Pradhan >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> >>> wrote: >>> >>>>> >>>> >>> >>> >>>>> >>>> >>>> Hi, >>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. 
>>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> nova-conuctor: >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The >>> reply >>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Cache enabled >>> >>>>> >>>> with >>> >>>>> >>>> >>>> backend dogpile.cache.null. >>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> With regards, >>> >>>>> >>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>>> Hi, >>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i >>> am trying to >>> >>>>> >>>> >>>>> launch vm's. >>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>> (openstack >>> >>>>> >>>> compute >>> >>>>> >>>> >>>>> service list), the node comes backup when i restart the >>> nova >>> >>>>> >>>> compute >>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> nova-compute.log >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>> Running >>> >>>>> >>>> >>>>> instance usage >>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>> 2023-02-26 07:00:00 >>> >>>>> >>>> to >>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful >>> on node >>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied >>> device >>> >>>>> >>>> name: >>> >>>>> >>>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names >>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>> volume >>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Cache enabled >>> >>>>> >>>> with >>> >>>>> >>>> >>>>> backend dogpile.cache.null. >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Running >>> >>>>> >>>> >>>>> privsep helper: >>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> >>>>> >>>> 'privsep-helper', >>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Spawned new >>> >>>>> >>>> privsep >>> >>>>> >>>> >>>>> daemon via rootwrap >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> daemon starting >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> daemon running as pid 2647 >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Process >>> >>>>> >>>> >>>>> execution error >>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>> command. >>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>>> >>>> >>>>> Exit code: 2 >>> >>>>> >>>> >>>>> Stdout: '' >>> >>>>> >>>> >>>>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: >>> >>>>> >>>> >>>>> Unexpected error while running command. 
>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> With regards, >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> Swogat Pradhan >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Wed Mar 22 13:46:05 2023 From: johfulto at redhat.com (John Fulton) Date: Wed, 22 Mar 2023 09:46:05 -0400 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Wed, Mar 22, 2023 at 9:42?AM Swogat Pradhan wrote: > > Hi Jhon, > After some changes i feel like the cinder is now trying to pull the image from local glance as i am getting the following error in cinder-colume log: > > 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error finding address for http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: Unable to establish connection to http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] ECONNREFUSED',)) > > As the endpoint it is trying to reach is the dcn02 IP address. > > But when i check the ports i don't find the port 9292 running: > [root at dcn02-compute-2 ceph]# netstat -nultp > Active Internet connections (only servers) > Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name > tcp 0 0 0.0.0.0:2022 0.0.0.0:* LISTEN 656800/sshd > tcp 0 0 127.0.0.1:199 0.0.0.0:* LISTEN 4878/snmpd > tcp 0 0 172.25.228.253:2379 0.0.0.0:* LISTEN 6232/etcd > tcp 0 0 172.25.228.253:2380 0.0.0.0:* LISTEN 6232/etcd > tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1/systemd > tcp 0 0 127.0.0.1:6640 0.0.0.0:* LISTEN 2779/ovsdb-server > tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 4918/sshd > tcp6 0 0 :::2022 :::* LISTEN 656800/sshd > tcp6 0 0 :::111 :::* LISTEN 1/systemd > tcp6 0 0 :::22 :::* LISTEN 4918/sshd > udp 0 0 0.0.0.0:111 0.0.0.0:* 1/systemd > udp 0 0 0.0.0.0:161 0.0.0.0:* 4878/snmpd > udp 0 0 127.0.0.1:323 0.0.0.0:* 2609/chronyd > udp 0 0 0.0.0.0:6081 0.0.0.0:* - > udp6 0 0 :::111 :::* 1/systemd > udp6 0 0 ::1:161 :::* 4878/snmpd > udp6 0 0 ::1:323 :::* 2609/chronyd > udp6 0 0 :::6081 :::* - > > I see in the glance-api.conf that bind port parameter is set to 9292 but the port is not listed in netstat command. > Can you please guide me in getting this port up and running as i feel like this would solve the issue i am facing right now. Looks like your glance container stopped running. 
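On a Wallaby TripleO node something like the following usually shows what happened to it (glance_api is the container name TripleO uses; the log path and systemd unit name are the TripleO defaults, so adjust if your deployment differs):

$ sudo podman ps -a --filter name=glance
$ sudo podman inspect --format '{{.State.Status}} exit={{.State.ExitCode}}' glance_api
$ sudo podman logs --tail 200 glance_api
$ sudo systemctl status tripleo_glance_api    # the unit TripleO normally wraps the container in
$ sudo tail -n 200 /var/log/containers/glance/api.log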
Ask podman to show you all containers (including stopped ones) and investigate why the glance container stopped. > > With regards, > Swogat Pradhan > > On Wed, Mar 22, 2023 at 4:55?PM Swogat Pradhan wrote: >> >> Update: >> Here is the log when creating a volume using cirros image: >> >> 2023-03-22 11:04:38.449 109 INFO cinder.volume.flows.manager.create_volume [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } >> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >> 2023-03-22 11:07:54.023 109 WARNING py.warnings [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: FutureWarning: The human format is deprecated and the format parameter will be removed. Use explicitly json instead in version 'xena' >> category=FutureWarning) >> >> 2023-03-22 11:11:12.161 109 WARNING py.warnings [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: FutureWarning: The human format is deprecated and the format parameter will be removed. 
Use explicitly json instead in version 'xena' >> category=FutureWarning) >> >> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 MB/s >> 2023-03-22 11:11:14.998 109 INFO cinder.volume.flows.manager.create_volume [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >> >> The image is present in dcn02 store but still it downloaded the image in 0.16 MB/s and then created the volume. >> >> With regards, >> Swogat Pradhan >> >> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan wrote: >>> >>> Hi Jhon, >>> This seems to be an issue. >>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster parameter was specified to the respective cluster names but the config files were created in the name of ceph.conf and keyring was ceph.client.openstack.keyring. >>> >>> Which created issues in glance as well as the naming convention of the files didn't match the cluster names, so i had to manually rename the central ceph conf file as such: >>> >>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>> [root at dcn02-compute-0 ceph]# ll >>> total 16 >>> -rw-------. 1 root root 257 Mar 13 13:56 ceph_central.client.openstack.keyring >>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>> [root at dcn02-compute-0 ceph]# >>> >>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the respective clusters in both dcn01 and dcn02. >>> In the above cli output, the ceph.conf and ceph.client... are the files used to access dcn02 ceph cluster and ceph_central* files are used in for accessing central ceph cluster. >>> >>> glance multistore config: >>> [dcn02] >>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=dcn02 rbd glance store >>> >>> [ceph_central] >>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=Default glance store backend. >>> >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: >>>> >>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>> wrote: >>>> > >>>> > Hi, >>>> > Seems like cinder is not using the local ceph. >>>> >>>> That explains the issue. It's a misconfiguration. >>>> >>>> I hope this is not a production system since the mailing list now has >>>> the cinder.conf which contains passwords. >>>> >>>> The section that looks like this: >>>> >>>> [tripleo_ceph] >>>> volume_backend_name=tripleo_ceph >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_user=openstack >>>> rbd_pool=volumes >>>> rbd_flatten_volume_from_snapshot=False >>>> rbd_secret_uuid= >>>> report_discard_supported=True >>>> >>>> Should be updated to refer to the local DCN ceph cluster and not the >>>> central one. 
Use the ceph conf file for that cluster and ensure the >>>> rbd_secret_uuid corresponds to that one. >>>> >>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>> secret-get-value $FSID`. >>>> >>>> The documentation describes how to configure the central and DCN sites >>>> correctly but an error seems to have occurred while you were following >>>> it. >>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>> >>>> John >>>> >>>> > >>>> > Ceph Output: >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>> > NAME SIZE PARENT FMT PROT LOCK >>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 excl >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>>> > >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>> > NAME SIZE PARENT FMT PROT LOCK >>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>> > >>>> > Attached the cinder config. >>>> > Please let me know how I can solve this issue. >>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton wrote: >>>> >> >>>> >> in my last message under the line "On a DCN site if you run a command like this:" I suggested some steps you could try to confirm the image is a COW from the local glance as well as how to look at your cinder config. >>>> >> >>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan wrote: >>>> >>> >>>> >>> Update: >>>> >>> I uploaded an image directly to the dcn02 store, and it takes around 10,15 minutes to create a volume with image in dcn02. >>>> >>> The image size is 389 MB. >>>> >>> >>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan wrote: >>>> >>>> >>>> >>>> Hi Jhon, >>>> >>>> I checked in the ceph od dcn02, I can see the images created after importing from the central site. >>>> >>>> But launching an instance normally fails as it takes a long time for the volume to get created. >>>> >>>> >>>> >>>> When launching an instance from volume the instance is getting created properly without any errors. >>>> >>>> >>>> >>>> I tried to cache images in nova using https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html but getting checksum failed error. 
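Going back to the rbd_secret_uuid / FSID point above, a quick sanity check on the DCN node running cinder-volume could look like the following. The ceph.conf path is the one shown earlier in this thread, the cinder_volume and nova_virtsecretd container names assume TripleO defaults, and <local FSID> is a placeholder for the fsid printed by the first command:

$ sudo grep fsid /var/lib/tripleo-config/ceph/ceph.conf
$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_secret_uuid' /etc/cinder/cinder.conf
# both should point at the local DCN cluster, not the central one
$ sudo podman exec nova_virtsecretd virsh secret-list
$ sudo podman exec nova_virtsecretd virsh secret-get-value <local FSID>
# the secret should resolve for the local FSID, per the convention described above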
>>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: >>>> >>>>> >>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>> >>>>> wrote: >>>> >>>>> > >>>> >>>>> > Update: After restarting the nova services on the controller and running the deploy script on the edge site, I was able to launch the VM from volume. >>>> >>>>> > >>>> >>>>> > Right now the instance creation is failing as the block device creation is stuck in creating state, it is taking more than 10 mins for the volume to be created, whereas the image has already been imported to the edge glance. >>>> >>>>> >>>> >>>>> Try following this document and making the same observations in your >>>> >>>>> environment for AZs and their local ceph cluster. >>>> >>>>> >>>> >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>> >>>>> >>>> >>>>> On a DCN site if you run a command like this: >>>> >>>>> >>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>> >>>>> NAME SIZE PARENT >>>> >>>>> FMT PROT LOCK >>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>> >>>>> $ >>>> >>>>> >>>> >>>>> Then, you should see the parent of the volume is the image which is on >>>> >>>>> the same local ceph cluster. >>>> >>>>> >>>> >>>>> I wonder if something is misconfigured and thus you're encountering >>>> >>>>> the streaming behavior described here: >>>> >>>>> >>>> >>>>> Ideally all images should reside in the central Glance and be copied >>>> >>>>> to DCN sites before instances of those images are booted on DCN sites. >>>> >>>>> If an image is not copied to a DCN site before it is booted, then the >>>> >>>>> image will be streamed to the DCN site and then the image will boot as >>>> >>>>> an instance. This happens because Glance at the DCN site has access to >>>> >>>>> the images store at the Central ceph cluster. Though the booting of >>>> >>>>> the image will take time because it has not been copied in advance, >>>> >>>>> this is still preferable to failing to boot the image. >>>> >>>>> >>>> >>>>> You can also exec into the cinder container at the DCN site and >>>> >>>>> confirm it's using it's local ceph cluster. >>>> >>>>> >>>> >>>>> John >>>> >>>>> >>>> >>>>> > >>>> >>>>> > I will try and create a new fresh image and test again then update. >>>> >>>>> > >>>> >>>>> > With regards, >>>> >>>>> > Swogat Pradhan >>>> >>>>> > >>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan wrote: >>>> >>>>> >> >>>> >>>>> >> Update: >>>> >>>>> >> In the hypervisor list the compute node state is showing down. >>>> >>>>> >> >>>> >>>>> >> >>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan wrote: >>>> >>>>> >>> >>>> >>>>> >>> Hi Brendan, >>>> >>>>> >>> Now i have deployed another site where i have used 2 linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>> >>>>> >>> I used a cirros image to launch instance but the instance timed out so i waited for the volume to be created. >>>> >>>>> >>> Once the volume was created i tried launching the instance from the volume and still the instance is stuck in spawning state. 
>>>> >>>>> >>> >>>> >>>>> >>> Here is the nova-compute log: >>>> >>>>> >>> >>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon starting >>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command. >>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>> >>>>> >>> Exit code: 2 >>>> >>>>> >>> Stdout: '' >>>> >>>>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. >>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>> >>>>> >>> >>>> >>>>> >>> It is stuck in creating image, do i need to run the template mentioned here ?: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>> >>>>> >>> >>>> >>>>> >>> The volume is already created and i do not understand why the instance is stuck in spawning state. >>>> >>>>> >>> >>>> >>>>> >>> With regards, >>>> >>>>> >>> Swogat Pradhan >>>> >>>>> >>> >>>> >>>>> >>> >>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Does your environment use different network interfaces for each of the networks? Or does it have a bond with everything on it? >>>> >>>>> >>>> >>>> >>>>> >>>> One issue I have seen before is that when launching instances, there is a lot of network traffic between nodes as the hypervisor needs to download the image from Glance. Along with various other services sending normal network traffic, it can be enough to cause issues if everything is running over a single 1Gbe interface. >>>> >>>>> >>>> >>>> >>>>> >>>> I have seen the same situation in fact when using a single active/backup bond on 1Gbe nics. It?s worth checking the network traffic while you try to spawn the instance to see if you?re dropping packets. In the situation I described, there were dropped packets which resulted in a loss of communication between nova_compute and RMQ, so the node appeared offline. You should also confirm that nova_compute is being disconnected in the nova_compute logs if you tail them on the Hypervisor while spawning the instance. >>>> >>>>> >>>> >>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, based on that experience, from my perspective, is certainly sounds like some kind of network issue. 
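A quick way to watch for exactly that while an instance is spawning would be something like the following on the hypervisor, where bond0 is a placeholder for whatever bond or NIC carries the internal API and storage traffic:

$ cat /proc/net/bonding/bond0    # bonding mode and, for LACP, partner/aggregator state
$ ip -s link show bond0          # RX/TX errors and dropped counters; re-run during the spawn and compare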
>>>> >>>>> >>>> >>>> >>>>> >>>> Regards, >>>> >>>>> >>>> >>>> >>>>> >>>> Brendan Shephard >>>> >>>>> >>>> Senior Software Engineer >>>> >>>>> >>>> Red Hat Australia >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> >>>> >>>>> >>>> I tried to help someone with a similar issue some time ago in this thread: >>>> >>>>> >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>> >>>>> >>>> >>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. >>>> >>>>> >>>> If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. I didn't have to do that in a production system yet so be careful. I can think of two routes: >>>> >>>>> >>>> >>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. >>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>> >>>>> >>>> >>>> >>>>> >>>> I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. >>>> >>>>> >>>> >>>> >>>>> >>>> Regards, >>>> >>>>> >>>> Eugen >>>> >>>>> >>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> Can someone please help me out on this issue? >>>> >>>>> >>>> >>>> >>>>> >>>> With regards, >>>> >>>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >>>> >>>>> >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi >>>> >>>>> >>>> I don't see any major packet loss. >>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>>> >>>>> >>>> loss. >>>> >>>>> >>>> >>>> >>>>> >>>> with regards, >>>> >>>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >>>> >>>>> >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked when >>>> >>>>> >>>> launching the instance. >>>> >>>>> >>>> I will check that and come back. >>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck at spawning >>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if packet loss >>>> >>>>> >>>> causes this. >>>> >>>>> >>>> >>>> >>>>> >>>> With regards, >>>> >>>>> >>>> Swogat pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical between >>>> >>>>> >>>> central and edge site? Do you see packet loss through the tunnel? 
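One way to answer both questions from an edge compute node is a do-not-fragment ping sized for a 1500-byte MTU. The interface name and address below are placeholders (a central controller's internal API IP); any loss or "message too long" errors point at the tunnel:

$ ip link show bond0 | grep -o 'mtu [0-9]*'    # repeat on a central node and compare
$ ping -M do -s 1472 -c 100 <central-internal-api-ip>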
>>>> >>>>> >>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>> >>>>> >>>> >>>> >>>>> >>>> > Hi Eugen, >>>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>> >>>>> >>>> > getting email's from you. >>>> >>>>> >>>> > Coming to the issue: >>>> >>>>> >>>> > >>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>> >>>>> >>>> / >>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>> >>>>> >>>> > vhost name pattern apply-to definition priority >>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>> >>>>> >>>> > >>>> >>>>> >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> >>>>> >>>> > >>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down when i am >>>> >>>>> >>>> trying >>>> >>>>> >>>> > to launch an instance and the instance comes to a spawning state and >>>> >>>>> >>>> then >>>> >>>>> >>>> > gets stuck. >>>> >>>>> >>>> > >>>> >>>>> >>>> > I have a tunnel setup between the central and the edge sites. >>>> >>>>> >>>> > >>>> >>>>> >>>> > With regards, >>>> >>>>> >>>> > Swogat Pradhan >>>> >>>>> >>>> > >>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> > wrote: >>>> >>>>> >>>> > >>>> >>>>> >>>> >> Hi Eugen, >>>> >>>>> >>>> >> For some reason i am not getting your email to me directly, i am >>>> >>>>> >>>> checking >>>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other activities in the >>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> With regards, >>>> >>>>> >>>> >> Swogat Pradhan >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> >> wrote: >>>> >>>>> >>>> >> >>>> >>>>> >>>> >>> Hi Eugen, >>>> >>>>> >>>> >>> Thanks for your response. >>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the details: >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *PCS Status:* >>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>>>> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>> >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>> >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-2 >>>> >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-1 >>>> >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-0 >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the issue is >>>> >>>>> >>>> still >>>> >>>>> >>>> >>> present. >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *Cluster status:* >>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>> >>>>> >>>> >>> Cluster status of node >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>> >>>>> >>>> >>> Basics >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Disk Nodes >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Running Nodes >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Versions >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>> >>>>> >>>> RabbitMQ >>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Alarms >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> (none) >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Network Partitions >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> (none) >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Listeners >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> 
>>>> interface: >>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>> >>>>> >>>> inter-node and >>>> >>>>> >>>> >>> CLI tool communication >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>> >>>>> >>>> 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Feature flags >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *Logs:* >>>> >>>>> >>>> >>> *(Attached)* >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> With regards, >>>> >>>>> >>>> >>> Swogat Pradhan >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> >>> wrote: >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>>> Hi, >>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> nova-conuctor: >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> >>>>> >>>> with >>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> With regards, >>>> >>>>> >>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>>> Hi, >>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> >>>>> >>>> >>>>> launch vm's. 
>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>> >>>>> >>>> compute >>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart the nova >>>> >>>>> >>>> compute >>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> nova-compute.log >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> >>>>> >>>> >>>>> instance usage >>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>> >>>>> >>>> to >>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>> >>>>> >>>> name: >>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> >>>>> >>>> with >>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> >>>>> >>>> >>>>> privsep helper: >>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> >>>>> >>>> 'privsep-helper', >>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>> >>>>> >>>> privsep >>>> >>>>> >>>> >>>>> daemon via rootwrap >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> >>>> >>>>> daemon starting >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> >>>>> >>>> >>>>> execution error >>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>>> >>>> >>>>> Exit code: 2 >>>> >>>>> >>>> >>>>> Stdout: '' >>>> >>>>> >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
>>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> With regards, >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> Swogat Pradhan >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> From swogatpradhan22 at gmail.com Wed Mar 22 13:54:32 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 22 Mar 2023 19:24:32 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: My glance container is running but is in an unhealthy state. I don't see any errors in podman logs glance_api or anywhere. [root at dcn02-compute-0 ~]# podman ps --all | grep glance 03a07452704a 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo 9 days ago Exited (0) 41 minutes ago container-puppet-glance_api b61e96e9f504 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo /bin/bash -c chow... 9 days ago Exited (0) 36 minutes ago glance_init_logs ec1734dfb072 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo /usr/bin/bootstra... 34 minutes ago Exited (0) 34 minutes ago glance_api_db_sync a8eb5d18b8d6 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo kolla_start 31 minutes ago Up 32 minutes ago (healthy) glance_api_cron 74a92f45a4a2 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo kolla_start 31 minutes ago Up 32 minutes ago (unhealthy) glance_api With regards, Swogat Pradhan On Wed, Mar 22, 2023 at 7:16?PM John Fulton wrote: > On Wed, Mar 22, 2023 at 9:42?AM Swogat Pradhan > wrote: > > > > Hi Jhon, > > After some changes i feel like the cinder is now trying to pull the > image from local glance as i am getting the following error in > cinder-colume log: > > > > 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server > cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error > finding address for > http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: > Unable to establish connection to > http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: > HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded > with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by > NewConnectionError(' 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] > ECONNREFUSED',)) > > > > As the endpoint it is trying to reach is the dcn02 IP address. 
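A quick way to narrow this down is to ask podman why it flags glance_api as unhealthy and to probe port 9292 directly. A rough sketch, assuming the container is named glance_api as in the podman ps output above and that a healthcheck is defined for it:

# Run the container's own healthcheck and show its exit code (0 = healthy)
$ sudo podman healthcheck run glance_api; echo $?
# Recent service output, in case glance-api died after kolla_start
$ sudo podman logs --tail 50 glance_api
# Is anything listening on the glance port on this node?
$ sudo ss -tlnp | grep 9292
# Probe the endpoint cinder is failing to reach; a running glance-api answers with a version document
$ curl -s http://172.25.228.253:9292/

If nothing is listening on 9292 even though the container shows Up, the glance-api process inside it has most likely died, which would explain both the unhealthy flag and the ECONNREFUSED seen by cinder.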
> > > > But when i check the ports i don't find the port 9292 running: > > [root at dcn02-compute-2 ceph]# netstat -nultp > > Active Internet connections (only servers) > > Proto Recv-Q Send-Q Local Address Foreign Address > State PID/Program name > > tcp 0 0 0.0.0.0:2022 0.0.0.0:* > LISTEN 656800/sshd > > tcp 0 0 127.0.0.1:199 0.0.0.0:* > LISTEN 4878/snmpd > > tcp 0 0 172.25.228.253:2379 0.0.0.0:* > LISTEN 6232/etcd > > tcp 0 0 172.25.228.253:2380 0.0.0.0:* > LISTEN 6232/etcd > > tcp 0 0 0.0.0.0:111 0.0.0.0:* > LISTEN 1/systemd > > tcp 0 0 127.0.0.1:6640 0.0.0.0:* > LISTEN 2779/ovsdb-server > > tcp 0 0 0.0.0.0:22 0.0.0.0:* > LISTEN 4918/sshd > > tcp6 0 0 :::2022 :::* > LISTEN 656800/sshd > > tcp6 0 0 :::111 :::* > LISTEN 1/systemd > > tcp6 0 0 :::22 :::* > LISTEN 4918/sshd > > udp 0 0 0.0.0.0:111 0.0.0.0:* > 1/systemd > > udp 0 0 0.0.0.0:161 0.0.0.0:* > 4878/snmpd > > udp 0 0 127.0.0.1:323 0.0.0.0:* > 2609/chronyd > > udp 0 0 0.0.0.0:6081 0.0.0.0:* > - > > udp6 0 0 :::111 :::* > 1/systemd > > udp6 0 0 ::1:161 :::* > 4878/snmpd > > udp6 0 0 ::1:323 :::* > 2609/chronyd > > udp6 0 0 :::6081 :::* > - > > > > I see in the glance-api.conf that bind port parameter is set to 9292 but > the port is not listed in netstat command. > > Can you please guide me in getting this port up and running as i feel > like this would solve the issue i am facing right now. > > Looks like your glance container stopped running. Ask podman to show > you all containers (including stopped ones) and investigate why the > glance container stopped. > > > > > With regards, > > Swogat Pradhan > > > > On Wed, Mar 22, 2023 at 4:55?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >> > >> Update: > >> Here is the log when creating a volume using cirros image: > >> > >> 2023-03-22 11:04:38.449 109 INFO > cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, > 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': > datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 
'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > >> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s > >> 2023-03-22 11:07:54.023 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > >> category=FutureWarning) > >> > >> 2023-03-22 11:11:12.161 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > >> category=FutureWarning) > >> > >> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 > MB/s > >> 2023-03-22 11:11:14.998 109 INFO > cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f > (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully > >> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. > >> > >> The image is present in dcn02 store but still it downloaded the image > in 0.16 MB/s and then created the volume. > >> > >> With regards, > >> Swogat Pradhan > >> > >> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>> > >>> Hi Jhon, > >>> This seems to be an issue. > >>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster > parameter was specified to the respective cluster names but the config > files were created in the name of ceph.conf and keyring was > ceph.client.openstack.keyring. > >>> > >>> Which created issues in glance as well as the naming convention of the > files didn't match the cluster names, so i had to manually rename the > central ceph conf file as such: > >>> > >>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ > >>> [root at dcn02-compute-0 ceph]# ll > >>> total 16 > >>> -rw-------. 1 root root 257 Mar 13 13:56 > ceph_central.client.openstack.keyring > >>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf > >>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring > >>> -rw-r--r--. 
1 root root 362 Mar 15 18:45 ceph.conf > >>> [root at dcn02-compute-0 ceph]# > >>> > >>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the > respective clusters in both dcn01 and dcn02. > >>> In the above cli output, the ceph.conf and ceph.client... are the > files used to access dcn02 ceph cluster and ceph_central* files are used in > for accessing central ceph cluster. > >>> > >>> glance multistore config: > >>> [dcn02] > >>> rbd_store_ceph_conf=/etc/ceph/ceph.conf > >>> rbd_store_user=openstack > >>> rbd_store_pool=images > >>> rbd_thin_provisioning=False > >>> store_description=dcn02 rbd glance store > >>> > >>> [ceph_central] > >>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf > >>> rbd_store_user=openstack > >>> rbd_store_pool=images > >>> rbd_thin_provisioning=False > >>> store_description=Default glance store backend. > >>> > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton > wrote: > >>>> > >>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan > >>>> wrote: > >>>> > > >>>> > Hi, > >>>> > Seems like cinder is not using the local ceph. > >>>> > >>>> That explains the issue. It's a misconfiguration. > >>>> > >>>> I hope this is not a production system since the mailing list now has > >>>> the cinder.conf which contains passwords. > >>>> > >>>> The section that looks like this: > >>>> > >>>> [tripleo_ceph] > >>>> volume_backend_name=tripleo_ceph > >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver > >>>> rbd_ceph_conf=/etc/ceph/ceph.conf > >>>> rbd_user=openstack > >>>> rbd_pool=volumes > >>>> rbd_flatten_volume_from_snapshot=False > >>>> rbd_secret_uuid= > >>>> report_discard_supported=True > >>>> > >>>> Should be updated to refer to the local DCN ceph cluster and not the > >>>> central one. Use the ceph conf file for that cluster and ensure the > >>>> rbd_secret_uuid corresponds to that one. > >>>> > >>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the > >>>> Ceph cluster. The FSID should be in the ceph.conf file. The > >>>> tripleo_nova_libvirt role will use virsh secret-* commands so that > >>>> libvirt can retrieve the cephx secret using the FSID as a key. This > >>>> can be confirmed with `podman exec nova_virtsecretd virsh > >>>> secret-get-value $FSID`. > >>>> > >>>> The documentation describes how to configure the central and DCN sites > >>>> correctly but an error seems to have occurred while you were following > >>>> it. 
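As a concrete way to apply that convention, the FSID, the rbd_secret_uuid and the secret libvirt holds can be cross-checked by hand. This is only a sketch: the cinder_volume container name and the /etc/ceph paths inside the containers are assumptions based on the cinder.conf snippet above, and <fsid> is a placeholder for the value returned by the first grep.

# What cinder-volume is actually pointed at
$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_secret_uuid' /etc/cinder/cinder.conf
$ sudo podman exec cinder_volume grep fsid /etc/ceph/ceph.conf
# Secrets known to libvirt; TripleO keys them by cluster FSID
$ sudo podman exec nova_virtsecretd virsh secret-list
# The value stored under the local FSID should match the key in the dcn02 openstack keyring
$ sudo podman exec nova_virtsecretd virsh secret-get-value <fsid>

If rbd_secret_uuid (or the libvirt secret) still carries the central cluster's FSID while ceph.conf points at dcn02, that mismatch alone would be enough to break volume attach and boot-from-volume at the edge site.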
> >>>> > >>>> > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html > >>>> > >>>> John > >>>> > >>>> > > >>>> > Ceph Output: > >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l > >>>> > NAME SIZE PARENT FMT > PROT LOCK > >>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 > excl > >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 > >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 > yes > >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 > >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 > yes > >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 > >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 > yes > >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 > >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 > yes > >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 > >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 > yes > >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 > >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 > yes > >>>> > > >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l > >>>> > NAME SIZE PARENT FMT > PROT LOCK > >>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 > >>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 > >>>> > [ceph: root at dcn02-ceph-all-0 /]# > >>>> > > >>>> > Attached the cinder config. > >>>> > Please let me know how I can solve this issue. > >>>> > > >>>> > With regards, > >>>> > Swogat Pradhan > >>>> > > >>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton > wrote: > >>>> >> > >>>> >> in my last message under the line "On a DCN site if you run a > command like this:" I suggested some steps you could try to confirm the > image is a COW from the local glance as well as how to look at your cinder > config. > >>>> >> > >>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> >>> > >>>> >>> Update: > >>>> >>> I uploaded an image directly to the dcn02 store, and it takes > around 10,15 minutes to create a volume with image in dcn02. > >>>> >>> The image size is 389 MB. > >>>> >>> > >>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> >>>> > >>>> >>>> Hi Jhon, > >>>> >>>> I checked in the ceph od dcn02, I can see the images created > after importing from the central site. > >>>> >>>> But launching an instance normally fails as it takes a long time > for the volume to get created. > >>>> >>>> > >>>> >>>> When launching an instance from volume the instance is getting > created properly without any errors. > >>>> >>>> > >>>> >>>> I tried to cache images in nova using > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > but getting checksum failed error. > >>>> >>>> > >>>> >>>> With regards, > >>>> >>>> Swogat Pradhan > >>>> >>>> > >>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton > wrote: > >>>> >>>>> > >>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan > >>>> >>>>> wrote: > >>>> >>>>> > > >>>> >>>>> > Update: After restarting the nova services on the controller > and running the deploy script on the edge site, I was able to launch the VM > from volume. > >>>> >>>>> > > >>>> >>>>> > Right now the instance creation is failing as the block > device creation is stuck in creating state, it is taking more than 10 mins > for the volume to be created, whereas the image has already been imported > to the edge glance. 
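For what it's worth, the rbd -p volumes ls -l output above shows an empty PARENT column for both volumes, which is consistent with the image having been downloaded and written out rather than COW-cloned from the local images pool. A single volume can be double-checked from the same cephadm shell (volume name taken from the listing above):

[ceph: root at dcn02-ceph-all-0 /]# rbd info volumes/volume-c644086f-d3cf-406d-b0f1-7691bde5981d | grep parent

A COW clone would show a parent: images/<image-id>@snap line here. Note also that the cirros image in the volume-creation log elsewhere in the thread is qcow2; as far as I know the cinder RBD driver only COW-clones raw images, so a qcow2 image would be downloaded and converted even with a correctly configured local glance.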
> >>>> >>>>> > >>>> >>>>> Try following this document and making the same observations in > your > >>>> >>>>> environment for AZs and their local ceph cluster. > >>>> >>>>> > >>>> >>>>> > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites > >>>> >>>>> > >>>> >>>>> On a DCN site if you run a command like this: > >>>> >>>>> > >>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring > >>>> >>>>> /etc/ceph/dcn0.client.admin.keyring > >>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l > >>>> >>>>> NAME SIZE PARENT > >>>> >>>>> FMT PROT LOCK > >>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB > >>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl > >>>> >>>>> $ > >>>> >>>>> > >>>> >>>>> Then, you should see the parent of the volume is the image > which is on > >>>> >>>>> the same local ceph cluster. > >>>> >>>>> > >>>> >>>>> I wonder if something is misconfigured and thus you're > encountering > >>>> >>>>> the streaming behavior described here: > >>>> >>>>> > >>>> >>>>> Ideally all images should reside in the central Glance and be > copied > >>>> >>>>> to DCN sites before instances of those images are booted on DCN > sites. > >>>> >>>>> If an image is not copied to a DCN site before it is booted, > then the > >>>> >>>>> image will be streamed to the DCN site and then the image will > boot as > >>>> >>>>> an instance. This happens because Glance at the DCN site has > access to > >>>> >>>>> the images store at the Central ceph cluster. Though the > booting of > >>>> >>>>> the image will take time because it has not been copied in > advance, > >>>> >>>>> this is still preferable to failing to boot the image. > >>>> >>>>> > >>>> >>>>> You can also exec into the cinder container at the DCN site and > >>>> >>>>> confirm it's using it's local ceph cluster. > >>>> >>>>> > >>>> >>>>> John > >>>> >>>>> > >>>> >>>>> > > >>>> >>>>> > I will try and create a new fresh image and test again then > update. > >>>> >>>>> > > >>>> >>>>> > With regards, > >>>> >>>>> > Swogat Pradhan > >>>> >>>>> > > >>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> >>>>> >> > >>>> >>>>> >> Update: > >>>> >>>>> >> In the hypervisor list the compute node state is showing > down. > >>>> >>>>> >> > >>>> >>>>> >> > >>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> >>>>> >>> > >>>> >>>>> >>> Hi Brendan, > >>>> >>>>> >>> Now i have deployed another site where i have used 2 linux > bonds network template for both 3 compute nodes and 3 ceph nodes. > >>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). > >>>> >>>>> >>> I used a cirros image to launch instance but the instance > timed out so i waited for the volume to be created. > >>>> >>>>> >>> Once the volume was created i tried launching the instance > from the volume and still the instance is stuck in spawning state. 
> >>>> >>>>> >>> > >>>> >>>>> >>> Here is the nova-compute log: > >>>> >>>>> >>> > >>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] > privsep daemon starting > >>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] > privsep process running with uid/gid: 0/0 > >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] > privsep process running with capabilities (eff/prm/inh): > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] > privsep daemon running as pid 185437 > >>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING > os_brick.initiator.connectors.nvmeof > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error > in _get_host_uuid: Unexpected error while running command. > >>>> >>>>> >>> Command: blkid overlay -s UUID -o value > >>>> >>>>> >>> Exit code: 2 > >>>> >>>>> >>> Stdout: '' > >>>> >>>>> >>> Stderr: '': > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while > running command. > >>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > 450b749c-a10a-4308-80a9-3b8020fee758] Creating image > >>>> >>>>> >>> > >>>> >>>>> >>> It is stuck in creating image, do i need to run the > template mentioned here ?: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > >>>> >>>>> >>> > >>>> >>>>> >>> The volume is already created and i do not understand why > the instance is stuck in spawning state. > >>>> >>>>> >>> > >>>> >>>>> >>> With regards, > >>>> >>>>> >>> Swogat Pradhan > >>>> >>>>> >>> > >>>> >>>>> >>> > >>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < > bshephar at redhat.com> wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> Does your environment use different network interfaces for > each of the networks? Or does it have a bond with everything on it? > >>>> >>>>> >>>> > >>>> >>>>> >>>> One issue I have seen before is that when launching > instances, there is a lot of network traffic between nodes as the > hypervisor needs to download the image from Glance. Along with various > other services sending normal network traffic, it can be enough to cause > issues if everything is running over a single 1Gbe interface. > >>>> >>>>> >>>> > >>>> >>>>> >>>> I have seen the same situation in fact when using a single > active/backup bond on 1Gbe nics. It?s worth checking the network traffic > while you try to spawn the instance to see if you?re dropping packets. In > the situation I described, there were dropped packets which resulted in a > loss of communication between nova_compute and RMQ, so the node appeared > offline. You should also confirm that nova_compute is being disconnected in > the nova_compute logs if you tail them on the Hypervisor while spawning the > instance. > >>>> >>>>> >>>> > >>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. > So, based on that experience, from my perspective, is certainly sounds like > some kind of network issue. 
> >>>> >>>>> >>>> > >>>> >>>>> >>>> Regards, > >>>> >>>>> >>>> > >>>> >>>>> >>>> Brendan Shephard > >>>> >>>>> >>>> Senior Software Engineer > >>>> >>>>> >>>> Red Hat Australia > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block > wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> Hi, > >>>> >>>>> >>>> > >>>> >>>>> >>>> I tried to help someone with a similar issue some time ago > in this thread: > >>>> >>>>> >>>> > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > >>>> >>>>> >>>> > >>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that > user, not sure if that could apply here. But is it possible that your nova > and neutron versions are different between central and edge site? Have you > restarted nova and neutron services on the compute nodes after > installation? Have you debug logs of nova-conductor and maybe nova-compute? > Maybe they can help narrow down the issue. > >>>> >>>>> >>>> If there isn't any additional information in the debug > logs I probably would start "tearing down" rabbitmq. I didn't have to do > that in a production system yet so be careful. I can think of two routes: > >>>> >>>>> >>>> > >>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is > running, this will most likely impact client IO depending on your load. > Check out the rabbitmqctl commands. > >>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables > from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. > >>>> >>>>> >>>> > >>>> >>>>> >>>> I can imagine that the failed reply "survives" while being > replicated across the rabbit nodes. But I don't really know the rabbit > internals too well, so maybe someone else can chime in here and give a > better advice. > >>>> >>>>> >>>> > >>>> >>>>> >>>> Regards, > >>>> >>>>> >>>> Eugen > >>>> >>>>> >>>> > >>>> >>>>> >>>> Zitat von Swogat Pradhan : > >>>> >>>>> >>>> > >>>> >>>>> >>>> Hi, > >>>> >>>>> >>>> Can someone please help me out on this issue? > >>>> >>>>> >>>> > >>>> >>>>> >>>> With regards, > >>>> >>>>> >>>> Swogat Pradhan > >>>> >>>>> >>>> > >>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> Hi > >>>> >>>>> >>>> I don't see any major packet loss. > >>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but > not due to packet > >>>> >>>>> >>>> loss. > >>>> >>>>> >>>> > >>>> >>>>> >>>> with regards, > >>>> >>>>> >>>> Swogat Pradhan > >>>> >>>>> >>>> > >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> Hi, > >>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. > >>>> >>>>> >>>> Generally I haven't seen any packet loss, but never > checked when > >>>> >>>>> >>>> launching the instance. > >>>> >>>>> >>>> I will check that and come back. > >>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck > at spawning > >>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure > if packet loss > >>>> >>>>> >>>> causes this. > >>>> >>>>> >>>> > >>>> >>>>> >>>> With regards, > >>>> >>>>> >>>> Swogat pradhan > >>>> >>>>> >>>> > >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block > wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they > identical between > >>>> >>>>> >>>> central and edge site? 
Do you see packet loss through the > tunnel? > >>>> >>>>> >>>> > >>>> >>>>> >>>> Zitat von Swogat Pradhan : > >>>> >>>>> >>>> > >>>> >>>>> >>>> > Hi Eugen, > >>>> >>>>> >>>> > Request you to please add my email either on 'to' or > 'cc' as i am not > >>>> >>>>> >>>> > getting email's from you. > >>>> >>>>> >>>> > Coming to the issue: > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl > list_policies -p > >>>> >>>>> >>>> / > >>>> >>>>> >>>> > Listing policies for vhost "/" ... > >>>> >>>>> >>>> > vhost name pattern apply-to definition > priority > >>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down > when i am > >>>> >>>>> >>>> trying > >>>> >>>>> >>>> > to launch an instance and the instance comes to a > spawning state and > >>>> >>>>> >>>> then > >>>> >>>>> >>>> > gets stuck. > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > I have a tunnel setup between the central and the edge > sites. > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > With regards, > >>>> >>>>> >>>> > Swogat Pradhan > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > >>>> >>>>> >>>> swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> > wrote: > >>>> >>>>> >>>> > > >>>> >>>>> >>>> >> Hi Eugen, > >>>> >>>>> >>>> >> For some reason i am not getting your email to me > directly, i am > >>>> >>>>> >>>> checking > >>>> >>>>> >>>> >> the email digest and there i am able to find your reply. > >>>> >>>>> >>>> >> Here is the log for download: > https://we.tl/t-L8FEkGZFSq > >>>> >>>>> >>>> >> Yes, these logs are from the time when the issue > occurred. > >>>> >>>>> >>>> >> > >>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other > activities in the > >>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* > >>>> >>>>> >>>> >> > >>>> >>>>> >>>> >> With regards, > >>>> >>>>> >>>> >> Swogat Pradhan > >>>> >>>>> >>>> >> > >>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > >>>> >>>>> >>>> swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> >> wrote: > >>>> >>>>> >>>> >> > >>>> >>>>> >>>> >>> Hi Eugen, > >>>> >>>>> >>>> >>> Thanks for your response. > >>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the > details: > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> *PCS Status:* > >>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ > >>>> >>>>> >>>> >>> > 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>>> >>>>> >>>> >>> * rabbitmq-bundle-0 > (ocf::heartbeat:rabbitmq-cluster): > >>>> >>>>> >>>> Started > >>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 > >>>> >>>>> >>>> >>> * rabbitmq-bundle-1 > (ocf::heartbeat:rabbitmq-cluster): > >>>> >>>>> >>>> Started > >>>> >>>>> >>>> >>> overcloud-controller-2 > >>>> >>>>> >>>> >>> * rabbitmq-bundle-2 > (ocf::heartbeat:rabbitmq-cluster): > >>>> >>>>> >>>> Started > >>>> >>>>> >>>> >>> overcloud-controller-1 > >>>> >>>>> >>>> >>> * rabbitmq-bundle-3 > (ocf::heartbeat:rabbitmq-cluster): > >>>> >>>>> >>>> Started > >>>> >>>>> >>>> >>> overcloud-controller-0 > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but > the issue is > >>>> >>>>> >>>> still > >>>> >>>>> >>>> >>> present. 
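Before tearing anything down, it may also be worth checking, from the same place cluster_status is run below, whether the reply_* queues named in the nova-conductor errors further down the thread exist at all and whether the edge compute nodes still hold AMQP connections. A rough sketch (the edge compute IP is a placeholder):

# Do the reply queues exist, and do they have consumers?
rabbitmqctl list_queues name messages consumers | grep reply_
# Any connections from the edge compute nodes?
rabbitmqctl list_connections user peer_host state | grep <edge-compute-ip>

If the reply queue disappears the moment the compute side drops its connection, that points back at the network between the sites rather than at RabbitMQ itself.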
> >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> *Cluster status:* > >>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl > cluster_status > >>>> >>>>> >>>> >>> Cluster status of node > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > ... > >>>> >>>>> >>>> >>> Basics > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Cluster name: > rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Disk Nodes > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Running Nodes > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Versions > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: > RabbitMQ > >>>> >>>>> >>>> 3.8.3 > >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: > RabbitMQ > >>>> >>>>> >>>> 3.8.3 > >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: > RabbitMQ > >>>> >>>>> >>>> 3.8.3 > >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>> >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > >>>> >>>>> >>>> RabbitMQ > >>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Alarms > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> (none) > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Network Partitions > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> (none) > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Listeners > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>> >>>>> >>>> tool > >>>> >>>>> >>>> >>> communication > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: > AMQP 0-9-1 > >>>> >>>>> >>>> >>> and AMQP 1.0 > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>> >>>>> >>>> tool > >>>> >>>>> >>>> >>> communication > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: > AMQP 0-9-1 > >>>> >>>>> >>>> >>> and AMQP 1.0 > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> 
>>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>> >>>>> >>>> tool > >>>> >>>>> >>>> >>> communication > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: > AMQP 0-9-1 > >>>> >>>>> >>>> >>> and AMQP 1.0 > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> , > >>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, > purpose: > >>>> >>>>> >>>> inter-node and > >>>> >>>>> >>>> >>> CLI tool communication > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> , > >>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, > purpose: AMQP > >>>> >>>>> >>>> 0-9-1 > >>>> >>>>> >>>> >>> and AMQP 1.0 > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> , > >>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: > HTTP API > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Feature flags > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled > >>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled > >>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled > >>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled > >>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> *Logs:* > >>>> >>>>> >>>> >>> *(Attached)* > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> With regards, > >>>> >>>>> >>>> >>> Swogat Pradhan > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > >>>> >>>>> >>>> swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> >>> wrote: > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>>> Hi, > >>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api > log. 
> >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>> nova-conuctor: > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b > >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > The reply > >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after > 60 seconds > >>>> >>>>> >>>> due to a > >>>> >>>>> >>>> >>>> missing queue > (reply_276049ec36a84486a8a406911d9802f4). > >>>> >>>>> >>>> Abandoning...: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > The reply > >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after > 60 seconds > >>>> >>>>> >>>> due to a > >>>> >>>>> >>>> >>>> missing queue > (reply_349bcb075f8c49329435a0f884b33066). > >>>> >>>>> >>>> Abandoning...: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > The reply > >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after > 60 seconds > >>>> >>>>> >>>> due to a > >>>> >>>>> >>>> >>>> missing queue > (reply_349bcb075f8c49329435a0f884b33066). 
> >>>> >>>>> >>>> Abandoning...: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Cache enabled > >>>> >>>>> >>>> with > >>>> >>>>> >>>> >>>> backend dogpile.cache.null. > >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > The reply > >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after > 60 seconds > >>>> >>>>> >>>> due to a > >>>> >>>>> >>>> >>>> missing queue > (reply_349bcb075f8c49329435a0f884b33066). > >>>> >>>>> >>>> Abandoning...: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>> With regards, > >>>> >>>>> >>>> >>>> Swogat Pradhan > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>>> Hi, > >>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where > i am trying to > >>>> >>>>> >>>> >>>>> launch vm's. > >>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down > (openstack > >>>> >>>>> >>>> compute > >>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart > the nova > >>>> >>>>> >>>> compute > >>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> nova-compute.log > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] > Running > >>>> >>>>> >>>> >>>>> instance usage > >>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from > 2023-02-26 07:00:00 > >>>> >>>>> >>>> to > >>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim > successful on node > >>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO > nova.virt.libvirt.driver > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring > supplied device > >>>> >>>>> >>>> name: > >>>> >>>>> >>>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev > names > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with > volume > >>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Cache enabled > >>>> >>>>> >>>> with > >>>> >>>>> >>>> >>>>> backend dogpile.cache.null. > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Running > >>>> >>>>> >>>> >>>>> privsep helper: > >>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > >>>> >>>>> >>>> 'privsep-helper', > >>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', > '--config-file', > >>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Spawned new > >>>> >>>>> >>>> privsep > >>>> >>>>> >>>> >>>>> daemon via rootwrap > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO > oslo.privsep.daemon [-] privsep > >>>> >>>>> >>>> >>>>> daemon starting > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO > oslo.privsep.daemon [-] privsep > >>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO > oslo.privsep.daemon [-] privsep > >>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): > >>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO > oslo.privsep.daemon [-] privsep > >>>> >>>>> >>>> >>>>> daemon running as pid 2647 > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING > >>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Process > >>>> >>>>> >>>> >>>>> execution error > >>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running > command. > >>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value > >>>> >>>>> >>>> >>>>> Exit code: 2 > >>>> >>>>> >>>> >>>>> Stdout: '' > >>>> >>>>> >>>> >>>>> Stderr: '': > oslo_concurrency.processutils.ProcessExecutionError: > >>>> >>>>> >>>> >>>>> Unexpected error while running command. 
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO > nova.virt.libvirt.driver > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> With regards, > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> Swogat Pradhan > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> > >>>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abishop at redhat.com Wed Mar 22 14:41:27 2023 From: abishop at redhat.com (Alan Bishop) Date: Wed, 22 Mar 2023 07:41:27 -0700 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan wrote: > Update: > Here is the log when creating a volume using cirros image: > > 2023-03-22 11:04:38.449 109 INFO cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, > 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': > datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'tags': [], 'file': 
'/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s > As Adam Savage would say, well there's your problem ^^ (Image download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s suggests you have a network issue. John Fulton previously stated your cinder-volume service at the edge site is not using the local ceph image store. Assuming you are deploying GlanceApiEdge service [1], then the cinder-volume service should be configured to use the local glance service [2]. You should check cinder's glance_api_servers to confirm it's the edge site's glance service. [1] https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 [2] https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 Alan > 2023-03-22 11:07:54.023 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > category=FutureWarning) > > 2023-03-22 11:11:12.161 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > category=FutureWarning) > > 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 > MB/s > 2023-03-22 11:11:14.998 109 INFO cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f > (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully > 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. > > The image is present in dcn02 store but still it downloaded the image in > 0.16 MB/s and then created the volume. > > With regards, > Swogat Pradhan > > On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan > wrote: > >> Hi Jhon, >> This seems to be an issue. >> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >> parameter was specified to the respective cluster names but the config >> files were created in the name of ceph.conf and keyring was >> ceph.client.openstack.keyring. 
>> >> Which created issues in glance as well as the naming convention of the >> files didn't match the cluster names, so i had to manually rename the >> central ceph conf file as such: >> >> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >> [root at dcn02-compute-0 ceph]# ll >> total 16 >> -rw-------. 1 root root 257 Mar 13 13:56 >> ceph_central.client.openstack.keyring >> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >> [root at dcn02-compute-0 ceph]# >> >> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >> respective clusters in both dcn01 and dcn02. >> In the above cli output, the ceph.conf and ceph.client... are the files >> used to access dcn02 ceph cluster and ceph_central* files are used in for >> accessing central ceph cluster. >> >> glance multistore config: >> [dcn02] >> rbd_store_ceph_conf=/etc/ceph/ceph.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=dcn02 rbd glance store >> >> [ceph_central] >> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=Default glance store backend. >> >> >> With regards, >> Swogat Pradhan >> >> On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: >> >>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>> wrote: >>> > >>> > Hi, >>> > Seems like cinder is not using the local ceph. >>> >>> That explains the issue. It's a misconfiguration. >>> >>> I hope this is not a production system since the mailing list now has >>> the cinder.conf which contains passwords. >>> >>> The section that looks like this: >>> >>> [tripleo_ceph] >>> volume_backend_name=tripleo_ceph >>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>> rbd_ceph_conf=/etc/ceph/ceph.conf >>> rbd_user=openstack >>> rbd_pool=volumes >>> rbd_flatten_volume_from_snapshot=False >>> rbd_secret_uuid= >>> report_discard_supported=True >>> >>> Should be updated to refer to the local DCN ceph cluster and not the >>> central one. Use the ceph conf file for that cluster and ensure the >>> rbd_secret_uuid corresponds to that one. >>> >>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>> Ceph cluster. The FSID should be in the ceph.conf file. The >>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>> libvirt can retrieve the cephx secret using the FSID as a key. This >>> can be confirmed with `podman exec nova_virtsecretd virsh >>> secret-get-value $FSID`. >>> >>> The documentation describes how to configure the central and DCN sites >>> correctly but an error seems to have occurred while you were following >>> it. 
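To act on Alan's suggestion above, the value cinder-volume is actually using can be read out of the running container (again assuming the container is named cinder_volume):

$ sudo podman exec cinder_volume grep glance_api_servers /etc/cinder/cinder.conf

If this still points at the central glance VIP rather than the dcn02 internal API address, every volume-from-image request will pull the image across the WAN, which matches the 0.16 MB/s download figure above.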
>>> >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>> >>> John >>> >>> > >>> > Ceph Output: >>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>> > NAME SIZE PARENT FMT PROT >>> LOCK >>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>> excl >>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>> > >>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>> > NAME SIZE PARENT FMT >>> PROT LOCK >>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>> > [ceph: root at dcn02-ceph-all-0 /]# >>> > >>> > Attached the cinder config. >>> > Please let me know how I can solve this issue. >>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>> wrote: >>> >> >>> >> in my last message under the line "On a DCN site if you run a command >>> like this:" I suggested some steps you could try to confirm the image is a >>> COW from the local glance as well as how to look at your cinder config. >>> >> >>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>> >>> >>> Update: >>> >>> I uploaded an image directly to the dcn02 store, and it takes around >>> 10,15 minutes to create a volume with image in dcn02. >>> >>> The image size is 389 MB. >>> >>> >>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>> >>>> Hi Jhon, >>> >>>> I checked in the ceph od dcn02, I can see the images created after >>> importing from the central site. >>> >>>> But launching an instance normally fails as it takes a long time >>> for the volume to get created. >>> >>>> >>> >>>> When launching an instance from volume the instance is getting >>> created properly without any errors. >>> >>>> >>> >>>> I tried to cache images in nova using >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> but getting checksum failed error. >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>> wrote: >>> >>>>> >>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>> >>>>> wrote: >>> >>>>> > >>> >>>>> > Update: After restarting the nova services on the controller and >>> running the deploy script on the edge site, I was able to launch the VM >>> from volume. >>> >>>>> > >>> >>>>> > Right now the instance creation is failing as the block device >>> creation is stuck in creating state, it is taking more than 10 mins for the >>> volume to be created, whereas the image has already been imported to the >>> edge glance. >>> >>>>> >>> >>>>> Try following this document and making the same observations in >>> your >>> >>>>> environment for AZs and their local ceph cluster. 
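One more observation that goes with the AZ check mentioned here: the backend and availability zone that actually served a volume can be read from the volume itself (admin credentials are needed for the host attribute; the volume ID below is the cirros test volume from the log earlier in the thread):

$ openstack volume show bf341343-6609-4b8c-b9e0-93e2a89c8c8f -c status -c availability_zone -c "os-vol-host-attr:host"

The host field should reference the dcn02 cinder backend (something like ...@tripleo_ceph); if it names a central backend instead, that would suggest the scheduler is not honouring the edge availability zone.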
>>> >>>>> >>> >>>>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>> >>>>> >>> >>>>> On a DCN site if you run a command like this: >>> >>>>> >>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>> >>>>> NAME SIZE PARENT >>> >>>>> FMT PROT LOCK >>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>> >>>>> $ >>> >>>>> >>> >>>>> Then, you should see the parent of the volume is the image which >>> is on >>> >>>>> the same local ceph cluster. >>> >>>>> >>> >>>>> I wonder if something is misconfigured and thus you're encountering >>> >>>>> the streaming behavior described here: >>> >>>>> >>> >>>>> Ideally all images should reside in the central Glance and be >>> copied >>> >>>>> to DCN sites before instances of those images are booted on DCN >>> sites. >>> >>>>> If an image is not copied to a DCN site before it is booted, then >>> the >>> >>>>> image will be streamed to the DCN site and then the image will >>> boot as >>> >>>>> an instance. This happens because Glance at the DCN site has >>> access to >>> >>>>> the images store at the Central ceph cluster. Though the booting of >>> >>>>> the image will take time because it has not been copied in advance, >>> >>>>> this is still preferable to failing to boot the image. >>> >>>>> >>> >>>>> You can also exec into the cinder container at the DCN site and >>> >>>>> confirm it's using it's local ceph cluster. >>> >>>>> >>> >>>>> John >>> >>>>> >>> >>>>> > >>> >>>>> > I will try and create a new fresh image and test again then >>> update. >>> >>>>> > >>> >>>>> > With regards, >>> >>>>> > Swogat Pradhan >>> >>>>> > >>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >> >>> >>>>> >> Update: >>> >>>>> >> In the hypervisor list the compute node state is showing down. >>> >>>>> >> >>> >>>>> >> >>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >>> >>> >>>>> >>> Hi Brendan, >>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>> >>>>> >>> I used a cirros image to launch instance but the instance >>> timed out so i waited for the volume to be created. >>> >>>>> >>> Once the volume was created i tried launching the instance >>> from the volume and still the instance is stuck in spawning state. 
>>> >>>>> >>> >>> >>>>> >>> Here is the nova-compute log: >>> >>>>> >>> >>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>> privsep daemon starting >>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>> privsep process running with uid/gid: 0/0 >>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>> privsep process running with capabilities (eff/prm/inh): >>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>> privsep daemon running as pid 185437 >>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>> os_brick.initiator.connectors.nvmeof >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>> in _get_host_uuid: Unexpected error while running command. >>> >>>>> >>> Command: blkid overlay -s UUID -o value >>> >>>>> >>> Exit code: 2 >>> >>>>> >>> Stdout: '' >>> >>>>> >>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>> running command. >>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>> >>>>> >>> >>> >>>>> >>> It is stuck in creating image, do i need to run the template >>> mentioned here ?: >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> >>>>> >>> >>> >>>>> >>> The volume is already created and i do not understand why the >>> instance is stuck in spawning state. >>> >>>>> >>> >>> >>>>> >>> With regards, >>> >>>>> >>> Swogat Pradhan >>> >>>>> >>> >>> >>>>> >>> >>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>> bshephar at redhat.com> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Does your environment use different network interfaces for >>> each of the networks? Or does it have a bond with everything on it? >>> >>>>> >>>> >>> >>>>> >>>> One issue I have seen before is that when launching >>> instances, there is a lot of network traffic between nodes as the >>> hypervisor needs to download the image from Glance. Along with various >>> other services sending normal network traffic, it can be enough to cause >>> issues if everything is running over a single 1Gbe interface. >>> >>>>> >>>> >>> >>>>> >>>> I have seen the same situation in fact when using a single >>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>> while you try to spawn the instance to see if you?re dropping packets. In >>> the situation I described, there were dropped packets which resulted in a >>> loss of communication between nova_compute and RMQ, so the node appeared >>> offline. You should also confirm that nova_compute is being disconnected in >>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>> instance. >>> >>>>> >>>> >>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >>> based on that experience, from my perspective, is certainly sounds like >>> some kind of network issue. 
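One way to watch for that while reproducing the problem (a sketch; bond1 is a placeholder, substitute the bond/interface that carries the internal API and storage traffic):

$ cat /proc/net/bonding/bond1        # should report "Bonding Mode: IEEE 802.3ad Dynamic link aggregation" with all slaves up
$ ip -s link show bond1              # note the RX/TX "dropped" and "errors" counters
$ watch -n1 "ip -s link show bond1"  # re-check the counters while the instance is spawning

If the drop/error counters climb during the spawn, that points to the same kind of saturation or bonding problem described above.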
>>> >>>>> >>>> >>> >>>>> >>>> Regards, >>> >>>>> >>>> >>> >>>>> >>>> Brendan Shephard >>> >>>>> >>>> Senior Software Engineer >>> >>>>> >>>> Red Hat Australia >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> >>> >>>>> >>>> I tried to help someone with a similar issue some time ago in >>> this thread: >>> >>>>> >>>> >>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>> >>>>> >>>> >>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>> user, not sure if that could apply here. But is it possible that your nova >>> and neutron versions are different between central and edge site? Have you >>> restarted nova and neutron services on the compute nodes after >>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>> Maybe they can help narrow down the issue. >>> >>>>> >>>> If there isn't any additional information in the debug logs I >>> probably would start "tearing down" rabbitmq. I didn't have to do that in a >>> production system yet so be careful. I can think of two routes: >>> >>>>> >>>> >>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>> running, this will most likely impact client IO depending on your load. >>> Check out the rabbitmqctl commands. >>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from >>> all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>> >>>>> >>>> >>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>> replicated across the rabbit nodes. But I don't really know the rabbit >>> internals too well, so maybe someone else can chime in here and give a >>> better advice. >>> >>>>> >>>> >>> >>>>> >>>> Regards, >>> >>>>> >>>> Eugen >>> >>>>> >>>> >>> >>>>> >>>> Zitat von Swogat Pradhan : >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> Can someone please help me out on this issue? >>> >>>>> >>>> >>> >>>>> >>>> With regards, >>> >>>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi >>> >>>>> >>>> I don't see any major packet loss. >>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >>> due to packet >>> >>>>> >>>> loss. >>> >>>>> >>>> >>> >>>>> >>>> with regards, >>> >>>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >>> when >>> >>>>> >>>> launching the instance. >>> >>>>> >>>> I will check that and come back. >>> >>>>> >>>> But everytime i launch an instance the instance gets stuck at >>> spawning >>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >>> packet loss >>> >>>>> >>>> causes this. >>> >>>>> >>>> >>> >>>>> >>>> With regards, >>> >>>>> >>>> Swogat pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical >>> between >>> >>>>> >>>> central and edge site? Do you see packet loss through the >>> tunnel? 
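A quick way to check both (a sketch; <central-internal-ip> is a placeholder for an address at the central site reachable through the tunnel, and bond1 for the local interface):

$ ip link show bond1 | grep -o 'mtu [0-9]*'            # run on both central and edge nodes and compare
$ ping -c 5 -M do -s 1472 <central-internal-ip>        # 1472 bytes + 28 bytes of headers = 1500; "message too long" replies mean the tunnel path MTU is smaller
$ ping -c 100 -i 0.2 <central-internal-ip> | tail -2   # packet loss summary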
>>> >>>>> >>>> >>> >>>>> >>>> Zitat von Swogat Pradhan : >>> >>>>> >>>> >>> >>>>> >>>> > Hi Eugen, >>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' >>> as i am not >>> >>>>> >>>> > getting email's from you. >>> >>>>> >>>> > Coming to the issue: >>> >>>>> >>>> > >>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>> list_policies -p >>> >>>>> >>>> / >>> >>>>> >>>> > Listing policies for vhost "/" ... >>> >>>>> >>>> > vhost name pattern apply-to definition >>> priority >>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>> >>>>> >>>> > >>> >>>>> >>>> >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> >>>>> >>>> > >>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>> when i am >>> >>>>> >>>> trying >>> >>>>> >>>> > to launch an instance and the instance comes to a spawning >>> state and >>> >>>>> >>>> then >>> >>>>> >>>> > gets stuck. >>> >>>>> >>>> > >>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>> sites. >>> >>>>> >>>> > >>> >>>>> >>>> > With regards, >>> >>>>> >>>> > Swogat Pradhan >>> >>>>> >>>> > >>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> > wrote: >>> >>>>> >>>> > >>> >>>>> >>>> >> Hi Eugen, >>> >>>>> >>>> >> For some reason i am not getting your email to me >>> directly, i am >>> >>>>> >>>> checking >>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >>> >>>>> >>>> >> >>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>> activities in the >>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>> >>>>> >>>> >> >>> >>>>> >>>> >> With regards, >>> >>>>> >>>> >> Swogat Pradhan >>> >>>>> >>>> >> >>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> >> wrote: >>> >>>>> >>>> >> >>> >>>>> >>>> >>> Hi Eugen, >>> >>>>> >>>> >>> Thanks for your response. >>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>> details: >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *PCS Status:* >>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>>>> >>>> >>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>> >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-2 >>> >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-1 >>> >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-0 >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the >>> issue is >>> >>>>> >>>> still >>> >>>>> >>>> >>> present. >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *Cluster status:* >>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>> cluster_status >>> >>>>> >>>> >>> Cluster status of node >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> ... 
>>> >>>>> >>>> >>> Basics >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Cluster name: >>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Disk Nodes >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Running Nodes >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Versions >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> >>>>> >>>> RabbitMQ >>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Alarms >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> (none) >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Network Partitions >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> (none) >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Listeners >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> 
>>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>> purpose: >>> >>>>> >>>> inter-node and >>> >>>>> >>>> >>> CLI tool communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>> purpose: AMQP >>> >>>>> >>>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>> HTTP API >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Feature flags >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *Logs:* >>> >>>>> >>>> >>> *(Attached)* >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> With regards, >>> >>>>> >>>> >>> Swogat Pradhan >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> >>> wrote: >>> >>>>> >>>> >>> >>> >>>>> >>>> >>>> Hi, >>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> nova-conuctor: >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The >>> reply >>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Cache enabled >>> >>>>> >>>> with >>> >>>>> >>>> >>>> backend dogpile.cache.null. >>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> With regards, >>> >>>>> >>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>>> Hi, >>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i >>> am trying to >>> >>>>> >>>> >>>>> launch vm's. 
>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>> (openstack >>> >>>>> >>>> compute >>> >>>>> >>>> >>>>> service list), the node comes backup when i restart the >>> nova >>> >>>>> >>>> compute >>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> nova-compute.log >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>> Running >>> >>>>> >>>> >>>>> instance usage >>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>> 2023-02-26 07:00:00 >>> >>>>> >>>> to >>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful >>> on node >>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied >>> device >>> >>>>> >>>> name: >>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>> volume >>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Cache enabled >>> >>>>> >>>> with >>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Running >>> >>>>> >>>> >>>>> privsep helper: >>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> >>>>> >>>> 'privsep-helper', >>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Spawned new >>> >>>>> >>>> privsep >>> >>>>> >>>> >>>>> daemon via rootwrap >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> daemon starting >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> daemon running as pid 2647 >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Process >>> >>>>> >>>> >>>>> execution error >>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>> command. >>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>>> >>>> >>>>> Exit code: 2 >>> >>>>> >>>> >>>>> Stdout: '' >>> >>>>> >>>> >>>>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: >>> >>>>> >>>> >>>>> Unexpected error while running command. >>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> With regards, >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> Swogat Pradhan >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Wed Mar 22 15:14:21 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Wed, 22 Mar 2023 08:14:21 -0700 Subject: [ironic] Meeting cancelled April 10; PTL availabliity Message-ID: After a quick poll of some Ironic contributors, it looks like most of us will be out April 10 for holiday or vacation. 
I'm cancelling the team meeting as we likely won't have quorum. Along those lines; I will be out the entire week of April 10. If you have anything that will need my attention specifically, please reach out before then. I trust any one of the PTLs-emeritus that are Ironic contributors to handle any event requiring PTL approval while I'm gone. Thanks, Jay Faulkner Ironic PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From hberaud at redhat.com Wed Mar 22 15:18:47 2023 From: hberaud at redhat.com (Herve Beraud) Date: Wed, 22 Mar 2023 16:18:47 +0100 Subject: OpenStack 2023.1 Antelope is officially released! Message-ID: Hello OpenStack community, I'm excited to announce the final releases for the components of OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope development cycle. You will find a complete list of all components, their latest versions, and links to individual project release notes documents listed on the new release site. https://releases.openstack.org/antelope/ Congratulations to all of the teams who have contributed to this release! Our next production cycle, 2023.2 Bobcat, has already started. We will meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan the work for the upcoming cycle. I hope to see you there! Thanks, OpenStack Release Management team -- Herv? Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Mar 22 15:26:36 2023 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 22 Mar 2023 16:26:36 +0100 Subject: [largescale-sig] Next meeting: March 22, 15utc In-Reply-To: <2afa2e24-b2b9-4954-ad89-9112a7714f1b@openstack.org> References: <2afa2e24-b2b9-4954-ad89-9112a7714f1b@openstack.org> Message-ID: Here is the summary of our SIG meeting today. We discussed our next OpenInfra Live episode on April 6, featuring Societe Generale. We also decided to alter IRC meeting times to account for DST and make them slightly friendlier to our APAC friends. You can read the detailed meeting logs at: https://meetings.opendev.org/meetings/large_scale_sig/2023/large_scale_sig.2023-03-22-15.01.html Our next IRC meeting will be April 19, 14:00UTC on #openstack-operators on OFTC. Regards, -- Thierry Carrez (ttx) From gmann at ghanshyammann.com Wed Mar 22 15:45:49 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 08:45:49 -0700 Subject: [ptl][tc] OpenStack packages PyPi additional external maintainers audit & cleanup In-Reply-To: <185d18a20aa.1206b91ad115363.5205111285046207324@ghanshyammann.com> References: <185d18a20aa.1206b91ad115363.5205111285046207324@ghanshyammann.com> Message-ID: <18709ff76be.10ad4bda1984477.2001967889741209449@ghanshyammann.com> ---- On Fri, 20 Jan 2023 15:36:08 -0800 Ghanshyam Mann wrote --- > Hi PTLs, > > As you might know or have seen for your project package on PyPi, OpenStack deliverables on PyPi have > additional maintainers, For example, https://pypi.org/project/murano/, https://pypi.org/project/glance/ > > We should keep only 'openstackci' as a maintainer in PyPi so that releases of OpenStack deliverables > can be managed in a single place. Otherwise, we might face the two sets of maintainers' places and > packages might get released in PyPi by additional maintainers without the OpenStack project team > knowing about it. 
One such case is in Horizon repo 'xstatic-font-awesome' where a new maintainer is > added by an existing additional maintainer and this package was released without the Horizon team > knowing about the changes and release. > - https://github.com/openstack/xstatic-font-awesome/pull/2 > > To avoid the 'xstatic-font-awesome' case for other packages, TC discussed it in their weekly meetings[1] > and agreed to audit all the OpenStack packages and then clean up the additional maintainers in PyPi > (keep only 'openstackci' as maintainers). > > To help in this task, TC requests project PTL to perform the audit for their project's repo and add comments > in the below etherpad. > > - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup Hello Everyone, To update, there is an extra step for project PTLs in this task: * Step 1.1: Project PTL/team needs to communicate to the additional maintainers about removing themselves and transferring ownership to 'openstackci' - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L23 Initially, TC thought we could do a cleanup with the help of openstackci admin for all repo. But, to avoid any issue or misunderstanding/panic among additional maintainers on removal, it is better that projects communicate with additional maintainers and ask them to remove themself. JayF sent the email format to communicate to additional maintainers[1]. Please use that and let TC know if any queries/issues you are facing. [1] https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032780.html -gmann > > Thanks to knikolla to automate the listing of the OpenStack packages with additional maintainers in PyPi which > you can find the result in output.txt at the bottom of this link. I have added the project list of who needs to check > their repo in etherpad. > > - https://gist.github.com/knikolla/7303a65a5ddaa2be553fc6e54619a7a1 > > Please complete the audit for your project before March 15 so that TC can discuss the next step in vPTG. > > [1] https://meetings.opendev.org/meetings/tc/2023/tc.2023-01-11-16.00.log.html#l-41 > > > -gmann > > From gmann at ghanshyammann.com Wed Mar 22 15:57:45 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 08:57:45 -0700 Subject: [ptl][tc][ops][ptg] Operator + Developers interaction (operator-hours) slots in 2023.2 Bobcat PTG In-Reply-To: <8e7b678c3d4e0aad8ab74436ed8ca6065cc1735f.camel@canonical.com> References: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> <8e7b678c3d4e0aad8ab74436ed8ca6065cc1735f.camel@canonical.com> Message-ID: <1870a0a60f6.10ebbdd99985795.2857132943834492770@ghanshyammann.com> ---- On Tue, 21 Mar 2023 14:40:42 -0700 Felipe Reyes wrote --- > Hi Ghanshyam, > > On Fri, 2023-03-17 at 14:19 -0700, Ghanshyam Mann wrote: > > Hello Everyone/PTL, > > > > To improve the interaction/feedback between operators and developers, one of the efforts is to > > schedule > > the 'operator-hour' in developers' events. We scheduled the 'operator-hour' in the last vPTG, > > which had mixed > > productivity feedback[1]. The TC discussed it and thinks we should continue the 'operator-hour' in > > March > > vPTG also. > > At OpenStack-charms project we thought it was a good idea, can we get the track 'operator-hour- > openstackcharms' registered? Hi Felipe, Just in case you have noticed in IRC, 'operator-hour-openstackcharms' track is now registered (thanks fungi), you can book the slot. 
-gmann > > Thanks, > > -- > Felipe Reyes > Software Engineer @ Canonical > felipe.reyes at canonical.com (GPG:0x9B1FFF39) > Launchpad: ~freyes | IRC: freyes > > > From elod.illes at est.tech Wed Mar 22 15:58:16 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Wed, 22 Mar 2023 15:58:16 +0000 Subject: OpenStack 2023.1 Antelope is officially released! In-Reply-To: References: Message-ID: Let me join and thank to all who were part of the 2023.1 Antelope development cycle! Also note, that this marks the official opening of the openstack/releases repository for 2023.2 Bobcat releases, and freezes are now lifted. stable/2023.1 is now a fully normal stable branch, and the normal stable policy applies from now on. Thanks, El?d Ill?s ________________________________ From: Herve Beraud Sent: Wednesday, March 22, 2023 4:18 PM To: openstack-discuss Subject: OpenStack 2023.1 Antelope is officially released! Hello OpenStack community, I'm excited to announce the final releases for the components of OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope development cycle. You will find a complete list of all components, their latest versions, and links to individual project release notes documents listed on the new release site. https://releases.openstack.org/antelope/ Congratulations to all of the teams who have contributed to this release! Our next production cycle, 2023.2 Bobcat, has already started. We will meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan the work for the upcoming cycle. I hope to see you there! Thanks, OpenStack Release Management team -- Herv? Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Mar 22 16:03:47 2023 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 22 Mar 2023 17:03:47 +0100 Subject: OpenStack 2023.1 Antelope is officially released! In-Reply-To: References: Message-ID: <2814547e-1463-22c9-8c22-61b52ca69d2d@openstack.org> Woohoo! El?d Ill?s wrote: > Let me join and thank to all who were part of the 2023.1 Antelope > development cycle! Also note, that this marks the official opening of > the openstack/releases repository for 2023.2 Bobcat releases, and > freezes are now lifted. stable/2023.1 is now a fully normal stable > branch, and the normal stable policy applies from now on. Thanks, El?d > Ill?s > > > ------------------------------------------------------------------------ > *From:* Herve Beraud > *Sent:* Wednesday, March 22, 2023 4:18 PM > *To:* openstack-discuss > *Subject:* OpenStack 2023.1 Antelope is officially released! > > Hello OpenStack community, > > I'm excited to announce the final releases for the components of > OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope > development cycle. > > You will find a complete list of all components, their latest > versions, and links to individual project release notes documents > listed on the new release site. > > https://releases.openstack.org/antelope/ > > Congratulations to all of the teams who have contributed to this > release! > > Our next production cycle, 2023.2 Bobcat, has already started. We will > meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan > the work for the upcoming cycle. I hope to see you there! > > Thanks, > > OpenStack Release Management team > > -- > Herv? 
Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > -- Thierry Carrez (ttx) From amy at demarco.com Wed Mar 22 16:19:14 2023 From: amy at demarco.com (Amy Marrich) Date: Wed, 22 Mar 2023 11:19:14 -0500 Subject: OpenStack 2023.1 Antelope is officially released! In-Reply-To: References: Message-ID: Congrats everyone!! Amy(spotz) On Wed, Mar 22, 2023 at 10:25?AM Herve Beraud wrote: > > Hello OpenStack community, > > I'm excited to announce the final releases for the components of > OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope > development cycle. > > You will find a complete list of all components, their latest > versions, and links to individual project release notes documents > listed on the new release site. > > https://releases.openstack.org/antelope/ > > Congratulations to all of the teams who have contributed to this > release! > > Our next production cycle, 2023.2 Bobcat, has already started. We will > meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan > the work for the upcoming cycle. I hope to see you there! > > Thanks, > > OpenStack Release Management team > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > From jay at gr-oss.io Wed Mar 22 16:19:38 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Wed, 22 Mar 2023 09:19:38 -0700 Subject: [ptls] PyPI maintainer cleanup - Action needed: Contact extra maintainers In-Reply-To: References: Message-ID: Hey all, Wanted to remind you all: vPTG is a great time to address this issue! Even if the PyPI maintainers you would need to contact are emeritus contributors; you may have someone still on the project team who has contact with them. I strongly recommend you utilize this time to help clean your projects up. Thanks, Jay Faulkner TC Vice-Chair On Tue, Mar 21, 2023 at 9:03?AM Jay Faulkner wrote: > Thanks to those who have already taken action! Fifty extra maintainers > have already been removed, with around three hundred to go. > > Please reach out to me if you're having trouble finding current email > addresses for anyone, or having trouble with the process at all. > > Thanks, > Jay Faulkner > TC Vice-Chair > > > On Thu, Mar 16, 2023 at 3:22?PM Jay Faulkner wrote: > >> Hi PTLs, >> >> The TC recently voted[1] to require humans be removed from PyPI access >> for OpenStack-managed projects. This helps ensure all releases are created >> via releases team tooling and makes it less likely for a user account >> compromise to impact OpenStack packages. >> >> Many projects have already updated >> https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L33 >> with a list of packages that contain extra maintainers. We'd like to >> request that PTLs, or their designate, reach out to any extra maintainers >> listed for projects you are responsible for and request they remove their >> access in accordance with policy. An example email, and detailed steps to >> follow have been provided at >> https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup-email-template >> . >> >> Thank you for your cooperation as we work to improve our security posture >> and harden against supply chain attacks. >> >> Thank you, >> Jay Faulkner >> TC Vice-Chair >> >> 1: >> https://opendev.org/openstack/governance/commit/979e339f899ef62d2a6871a99c99537744c5808d >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kristin at openinfra.dev Wed Mar 22 16:33:23 2023 From: kristin at openinfra.dev (Kristin Barrientos) Date: Wed, 22 Mar 2023 11:33:23 -0500 Subject: OpenInfra Live episode: March 23, 2023, at 9 a.m. CT (14:00 UTC) Message-ID: Hi everyone, This week?s OpenInfra Live episode is brought to you by the OpenStack community! Episode: OpenStack Antelope: A New Era The OpenStack community released Antelope, the 27th version of the world?s most widely deployed open source cloud infrastructure software, this week. Join us to learn about the latest from community leaders about what was delivered in Antelope and what we can expect in Bobcat, OpenStack's 28th release targeting October 2023. Speakers: Carlos Silva, Rajat Dhasmana, Sylvain Bauza, Jay Faulkner, Kendall Nelson Date and time: March 23, 2023, at 9 a.m. CT (14:00 UTC) You can watch us live on: YouTube: https://www.youtube.com/watch?v=YdLTUTyJ1eU LinkedIn: https://www.linkedin.com/events/7042534262494941185/comments/ WeChat: recording will be posted on OpenStack WeChat after the live stream Have an idea for a future episode? Share it now at ideas.openinfra.live. Thanks, Kristin Barrientos Marketing Coordinator OpenInfra Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephenfin at redhat.com Wed Mar 22 16:38:06 2023 From: stephenfin at redhat.com (Stephen Finucane) Date: Wed, 22 Mar 2023 16:38:06 +0000 Subject: [oslo][heat][masakari][senlin][venus][all] oslo.db 13.0.0 will remove sqlalchemy-migrate support Message-ID: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> tl;dr: Projects still relying on sqlalchemy-migrate for migrations need to start their switch to alembic immediately. Projects with "legacy" sqlalchemy-migrated based migrations need to drop them. A quick heads up that oslo.db 13.0.0 will be release in the next month or so and will remove sqlalchemy-migrate support and formally add support for sqlalchemy 2.x. The removal of sqlalchemy-migrate support should only affect projects using oslo.db's sqlalchemy-migrate wrappers, as opposed to using sqlalchemy-migrate directly. For any projects that rely on this functionality, a short-term fix is to vendor the removed code [1] in your project. However, I must emphasise that we're not removing sqlalchemy-migrate integration for the fun of it: it's not compatible with sqlalchemy 2.x and is no longer maintained. If your project uses sqlalchemy-migrate and you haven't migrated to alembic yet, you need to start doing so immediately. If you have migrated to alembic but still have sqlalchemy- migrate "legacy" migrations in-tree, you need to look at dropping these asap. Anything less will result in broken master when we bump upper-constraints to allow sqlalchemy 2.x in Bobcat. I've listed projects in $subject that appear to be using the removed modules. For more advice on migrating to sqlalchemy 2.x and alembic, please look at my previous post on the matter [2]. Cheers, Stephen [1] https://review.opendev.org/c/openstack/oslo.db/+/853025 [2] https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html From allison at openinfra.dev Wed Mar 22 16:40:54 2023 From: allison at openinfra.dev (Allison Price) Date: Wed, 22 Mar 2023 11:40:54 -0500 Subject: OpenStack 2023.1 Antelope is officially released! In-Reply-To: References: Message-ID: <2F187FB1-D47C-4288-BD04-8EEFED614FF2@openinfra.dev> Congratulations everyone! Thank you for all of your contributions! 
I hope everyone has a chance in your part of the world to celebrate for a minute or two today :) > On Mar 22, 2023, at 11:19 AM, Amy Marrich wrote: > > Congrats everyone!! > > Amy(spotz) > > On Wed, Mar 22, 2023 at 10:25?AM Herve Beraud wrote: >> >> Hello OpenStack community, >> >> I'm excited to announce the final releases for the components of >> OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope >> development cycle. >> >> You will find a complete list of all components, their latest >> versions, and links to individual project release notes documents >> listed on the new release site. >> >> https://releases.openstack.org/antelope/ >> >> Congratulations to all of the teams who have contributed to this >> release! >> >> Our next production cycle, 2023.2 Bobcat, has already started. We will >> meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan >> the work for the upcoming cycle. I hope to see you there! >> >> Thanks, >> >> OpenStack Release Management team >> >> -- >> Herv? Beraud >> Senior Software Engineer at Red Hat >> irc: hberaud >> https://github.com/4383/ >> > From ihrachys at redhat.com Wed Mar 22 16:55:05 2023 From: ihrachys at redhat.com (Ihar Hrachyshka) Date: Wed, 22 Mar 2023 12:55:05 -0400 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: <3840757.STTH5IQzZg@p1> Message-ID: On Tue, Mar 21, 2023 at 12:07?PM Rodolfo Alonso Hernandez wrote: > > Hello: > > I agree with having a single API meaning for all backends. We currently support stateless SGs in iptables and ML2/OVN and both backends provide the same behaviour: a rule won't create an opposite direction counterpart by default, the user needs to define it explicitly. Thanks for this, I didn't realize that iptables may be considered prior art. > > The discussion here could be the default behaviour for standard services: > * DHCP service is currently supported in iptables, native OVS and OVN. This should be supported even without any rule allowed (as is now). Of course, we need to explicitly document that. > * DHCPv6 [1]: unlike Slawek, I'm in favor of allowing this traffic by default, as part of the DHCP protocol traffic allowance. Agreed DHCPv6 rules are closer to "base" and that the argument for RA / NA flows is stronger because of the parallel to DHCPv4 operation. > * Metadata service: this is not a network protocol and we should not consider it. Actually this service is working now (with stateful SGs) because of the default SG egress rules we add. So I'm not in favor of [2] At this point I am more ambivalent to the decision of whether to include metadata into the list of "base" services, as long as we define the list (behavior) in api-ref. But to address the point, since Slawek leans to creating SG rules in Neutron API to handle ICMP traffic necessary for RA / NA (which seems to have a merit and internal logic) anyway, we could as well at this point create another "default" rule for metadata replies. But - I will repeat - as long as a decision on what the list of "base" services enabled for any SG by default is, I can live with metadata out of the list. It may not be as convenient to users (which is my concern), but that's probably a matter of taste in API design. BTW Rodolfo, thanks for allocating a time slot for this discussion at vPTG. I hope we get to the bottom of it then. See you all next Wed @13:00. (As per https://etherpad.opendev.org/p/neutron-bobcat-ptg) Ihar > > Regards. 
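For reference, the explicit per-group workaround a user has to apply today so that these flows keep working with a stateless SG looks roughly like this (illustrative only: my-stateless-sg is a placeholder, and because SG rules cannot match on source port, the metadata rule has to admit all TCP from 169.254.169.254):

$ openstack security group rule create --ingress --ethertype IPv4 \
    --protocol tcp --remote-ip 169.254.169.254/32 my-stateless-sg   # metadata replies
$ openstack security group rule create --ingress --ethertype IPv6 \
    --protocol ipv6-icmp --icmp-type 134 my-stateless-sg            # router advertisements
$ openstack security group rule create --ingress --ethertype IPv6 \
    --protocol ipv6-icmp --icmp-type 136 my-stateless-sg            # neighbor advertisements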
> > [1]https://review.opendev.org/c/openstack/neutron/+/877049 > [2]https://review.opendev.org/c/openstack/neutron/+/876659 > > On Mon, Mar 20, 2023 at 10:19?PM Ihar Hrachyshka wrote: >> >> On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski wrote: >> > >> > Hi, >> > >> > >> > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: >> > >> > > Hi all, >> > >> > > >> > >> > > (I've tagged the thread with [ovn] because this question was raised in >> > >> > > the context of OVN, but it really is about the intent of neutron >> > >> > > stateless SG API.) >> > >> > > >> > >> > > Neutron API supports 'stateless' field for security groups: >> > >> > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group >> > >> > > >> > >> > > The API reference doesn't explain the intent of the API, merely >> > >> > > walking through the field mechanics, as in >> > >> > > >> > >> > > "The stateful security group extension (stateful-security-group) adds >> > >> > > the stateful field to security groups, allowing users to configure >> > >> > > stateful or stateless security groups for ports. The existing security >> > >> > > groups will all be considered as stateful. Update of the stateful >> > >> > > attribute is allowed when there is no port associated with the >> > >> > > security group." >> > >> > > >> > >> > > The meaning of the API is left for users to deduce. It's customary >> > >> > > understood as something like >> > >> > > >> > >> > > "allowing to bypass connection tracking in the firewall, potentially >> > >> > > providing performance and simplicity benefits" (while imposing >> > >> > > additional complexity onto rule definitions - the user now has to >> > >> > > explicitly define rules for both directions of a duplex connection.) >> > >> > > [This is not an official definition, nor it's quoted from a respected >> > >> > > source, please don't criticize it. I don't think this is an important >> > >> > > point here.] >> > >> > > >> > >> > > Either way, the definition doesn't explain what should happen with >> > >> > > basic network services that a user of Neutron SG API is used to rely >> > >> > > on. Specifically, what happens for a port related to a stateless SG >> > >> > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 >> > >> > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 >> > >> > > procedure to configure its IPv6 stack. >> > >> > > >> > >> > > As part of our testing of stateless SG implementation for OVN backend, >> > >> > > we've noticed that VMs fail to configure via metadata, or use SLAAC to >> > >> > > configure IPv6. >> > >> > > >> > >> > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 >> > >> > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 >> > >> > > >> > >> > > We've noticed that adding explicit SG rules to allow 'returning' >> > >> > > communication for 169.254.169.254:80 and RA / NA fixes the problem. >> > >> > > >> > >> > > I figured that these services are "base" / "basic" and should be >> > >> > > provided to ports regardless of the stateful-ness of SG. 
I proposed >> > >> > > patches for this here: >> > >> > > >> > >> > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 >> > >> > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 >> > >> > > >> > >> > > Discussion in the patch that adjusts the existing stateless SG test >> > >> > > scenarios to not create explicit SG rules for metadata and ICMP >> > >> > > replies suggests that it's not a given / common understanding that >> > >> > > these "base" services should work by default for stateless SGs. >> > >> > > >> > >> > > See discussion in comments here: >> > >> > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 >> > >> > > >> > >> > > While this discussion is happening in the context of OVN, I think it >> > >> > > should be resolved in a broader context. Specifically, a decision >> > >> > > should be made about what Neutron API "means" by stateless SGs, and >> > >> > > how "base" services are supposed to behave. Then backends can act >> > >> > > accordingly. >> > >> > > >> > >> > > There's also an open question of how this should be implemented. >> > >> > > Whether Neutron would like to create explicit SG rules visible in API >> > >> > > that would allow for the returning traffic and that could be deleted >> > >> > > as needed, or whether backends should do it implicitly. We already >> > >> > > have "default" egress rules, so there's a precedent here. On the other >> > >> > > hand, the egress rules are broad (allowing everything) and there's >> > >> > > more rationale to delete them and replace them with tighter filters. >> > >> > > In my OVN series, I implement ACLs directly in OVN database, without >> > >> > > creating SG rules in Neutron API. >> > >> > > >> > >> > > So, questions for the community to clarify: >> > >> > > - whether Neutron API should define behavior of stateless SGs in general, >> > >> > > - if so, whether Neutron API should also define behavior of stateless >> > >> > > SGs in terms of "base" services like metadata and DHCP, >> > >> > > - if so, whether backends should implement the necessary filters >> > >> > > themselves, or Neutron will create default SG rules itself. >> > >> > >> > I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user. >> > >> > We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it. >> > >> > If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there? >> > >> >> Thanks for clarifying the rationale on picking SG rules and not >> per-backend implementation. 
>> >> What would be your answer to the two other questions in the list >> above, specifically, "whether Neutron API should define behavior of >> stateless SGs in general" and "whether Neutron API should define >> behavior of stateless SGs in relation to metadata / RA / NA". Once we >> have agreement on these points, we can discuss the exact mechanism - >> whether to implement in backend or in API. But these two questions are >> first order in my view. >> >> (To give an idea of my thinking, I believe API definition should not >> only define fields and their mechanics but also semantics, so >> >> - yes, api-ref should define the meaning ("behavior") of stateless SG >> in general, and >> - yes, api-ref should also define the meaning ("behavior") of >> stateless SG in relation to "standard" services like ipv6 addressing >> or metadata. >> >> As to the last question - whether it's up to ml2 backend to implement >> the behavior, or up to the core SG database plugin - I don't have a >> strong opinion. I lean to "backend" solution just because it allows >> for more granular definition because SG rules may not express some >> filter rules, e.g. source port for metadata replies (an unfortunate >> limitation of SG API that we inherited from AWS?). But perhaps others >> prefer paying the price for having neutron ml2 plugin enforcing the >> behavior consistently across all backends. >> >> > >> > > >> > >> > > I hope I laid the problem out clearly, let me know if anything needs >> > >> > > clarification or explanation. >> > >> > >> > Yes :) At least for me. >> > >> > >> > > >> > >> > > Yours, >> > >> > > Ihar >> > >> > > >> > >> > > >> > >> > > >> > >> > >> > >> > -- >> > >> > Slawek Kaplonski >> > >> > Principal Software Engineer >> > >> > Red Hat >> >> From gmann at ghanshyammann.com Wed Mar 22 17:09:03 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 10:09:03 -0700 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> Message-ID: <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> ---- On Fri, 10 Mar 2023 12:55:49 -0800 Ghanshyam Mann wrote --- > ---- On Wed, 22 Feb 2023 10:13:32 -0800 James Slagle wrote --- > > On Wed, Feb 22, 2023 at 12:43 PM Ghanshyam Mann gmann at ghanshyammann.com> wrote: > > > Hi James, > > > > > > Just checking if you got a chance to discuss this with the TripleO team? > > > > Yes, I asked folks to reply here if there are any volunteers for > > stable/zed maintenance, or any other feedback about the approach. I do > > not personally know of any volunteers. > > Ok. We discussed the stable/zed case in the TC meeting and decided[1] to keep stable/zed as 'supported > but no maintainers' (will update this information in stable/zed README.rst file). > > For the master branch, you can follow the normal deprecation process mentioned in the project-team-guide[2]. > I have proposed step 1 in governance to mark it deprecated, please check and we need PTL +1 > on that. 
> > - https://review.opendev.org/c/openstack/governance/+/877132 > > NOTE: As this is deprecated and not retired yet, we still need PTL nomination for TrilpeO[3] Hi James, TripleO team, Is there anyone volunteering to be PTL for train and wallaby maintenance? Please note we need PTL as it is deprecated (wallaby is maintained), and we have tripleo in leaderless projects - https://etherpad.opendev.org/p/2023.2-leaderless -gmann > > [1] https://meetings.opendev.org/meetings/tc/2023/tc.2023-03-08-15.59.log.html#l-256 > [2] https://docs.openstack.org/project-team-guide/repository.html#deprecating-a-repository > [3] https://etherpad.opendev.org/p/2023.2-leaderless#L26 > > -gmann' > > > > > -- > > -- James Slagle > > -- > > > > From swogatpradhan22 at gmail.com Wed Mar 22 15:37:28 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 22 Mar 2023 21:07:28 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Adam, The systems are in same LAN, in this case it seemed like the image was getting pulled from the central site which was caused due to an misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ directory, which seems to have been resolved after the changes i made to fix it. Right now the glance api podman is running in unhealthy state and the podman logs don't show any error whatsoever and when issued the command netstat -nultp i do not see any entry for glance port i.e. 9292 in the dcn site, which is why cinder is throwing an error stating: 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error finding address for http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: Unable to establish connection to http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] ECONNREFUSED',)) Now i need to find out why the port is not listed as the glance service is running, which i am not sure how to find out. 
With regards, Swogat Pradhan On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: > > > On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan > wrote: > >> Update: >> Here is the log when creating a volume using cirros image: >> >> 2023-03-22 11:04:38.449 109 INFO >> cinder.volume.flows.manager.create_volume >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >> specification: {'status': 'creating', 'volume_name': >> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >> tzinfo=datetime.timezone.utc), 'locations': [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'metadata': {'store': 'dcn02'}}], 'direct_url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >> 'owner_specified.openstack.object': 'images/cirros', >> 'owner_specified.openstack.sha256': ''}}, 'image_service': >> } >> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >> > > As Adam Savage would say, well there's your problem ^^ (Image download > 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s > suggests you have a network issue. > > John Fulton previously stated your cinder-volume service at the edge site > is not using the local ceph image store. Assuming you are deploying > GlanceApiEdge service [1], then the cinder-volume service should be > configured to use the local glance service [2]. You should check cinder's > glance_api_servers to confirm it's the edge site's glance service. 
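For reference, that setting can usually be checked on a TripleO node without entering the container by grepping the rendered config that gets bind mounted into it; the path below follows the usual config-data layout and may differ between releases:

$ sudo grep -n glance_api_servers \
    /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf
# expected: the edge site's internal glance endpoint, not the central VIP

If that points at the central site, it would explain the slow streaming download seen above.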
> > [1] > https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 > [2] > https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 > > Alan > > >> 2023-03-22 11:07:54.023 109 WARNING py.warnings >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] >> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >> FutureWarning: The human format is deprecated and the format parameter will >> be removed. Use explicitly json instead in version 'xena' >> category=FutureWarning) >> >> 2023-03-22 11:11:12.161 109 WARNING py.warnings >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] >> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >> FutureWarning: The human format is deprecated and the format parameter will >> be removed. Use explicitly json instead in version 'xena' >> category=FutureWarning) >> >> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >> MB/s >> 2023-03-22 11:11:14.998 109 INFO >> cinder.volume.flows.manager.create_volume >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >> >> The image is present in dcn02 store but still it downloaded the image in >> 0.16 MB/s and then created the volume. >> >> With regards, >> Swogat Pradhan >> >> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan >> wrote: >> >>> Hi Jhon, >>> This seems to be an issue. >>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>> parameter was specified to the respective cluster names but the config >>> files were created in the name of ceph.conf and keyring was >>> ceph.client.openstack.keyring. >>> >>> Which created issues in glance as well as the naming convention of the >>> files didn't match the cluster names, so i had to manually rename the >>> central ceph conf file as such: >>> >>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>> [root at dcn02-compute-0 ceph]# ll >>> total 16 >>> -rw-------. 1 root root 257 Mar 13 13:56 >>> ceph_central.client.openstack.keyring >>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>> [root at dcn02-compute-0 ceph]# >>> >>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>> respective clusters in both dcn01 and dcn02. >>> In the above cli output, the ceph.conf and ceph.client... are the files >>> used to access dcn02 ceph cluster and ceph_central* files are used in for >>> accessing central ceph cluster. 
>>> >>> glance multistore config: >>> [dcn02] >>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=dcn02 rbd glance store >>> >>> [ceph_central] >>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=Default glance store backend. >>> >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: >>> >>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>> wrote: >>>> > >>>> > Hi, >>>> > Seems like cinder is not using the local ceph. >>>> >>>> That explains the issue. It's a misconfiguration. >>>> >>>> I hope this is not a production system since the mailing list now has >>>> the cinder.conf which contains passwords. >>>> >>>> The section that looks like this: >>>> >>>> [tripleo_ceph] >>>> volume_backend_name=tripleo_ceph >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_user=openstack >>>> rbd_pool=volumes >>>> rbd_flatten_volume_from_snapshot=False >>>> rbd_secret_uuid= >>>> report_discard_supported=True >>>> >>>> Should be updated to refer to the local DCN ceph cluster and not the >>>> central one. Use the ceph conf file for that cluster and ensure the >>>> rbd_secret_uuid corresponds to that one. >>>> >>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>> secret-get-value $FSID`. >>>> >>>> The documentation describes how to configure the central and DCN sites >>>> correctly but an error seems to have occurred while you were following >>>> it. >>>> >>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>> >>>> John >>>> >>>> > >>>> > Ceph Output: >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>> > NAME SIZE PARENT FMT >>>> PROT LOCK >>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>> excl >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>>> > >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>> > NAME SIZE PARENT FMT >>>> PROT LOCK >>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>> > >>>> > Attached the cinder config. >>>> > Please let me know how I can solve this issue. 
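Putting John's checks together, a short cross-check on the DCN node could look something like this; the ceph.conf path follows the listing shown above and the cinder config-data location is an assumption, so adjust as needed:

$ FSID=$(sudo awk -F'[ =]+' '/^fsid/ {print $2}' /var/lib/tripleo-config/ceph/ceph.conf)
$ echo $FSID
# libvirt should hold a cephx secret keyed by that FSID
$ sudo podman exec nova_virtsecretd virsh secret-get-value $FSID
# cinder's tripleo_ceph backend should reference the same FSID and conf file
$ sudo grep -E 'rbd_secret_uuid|rbd_ceph_conf' \
    /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf

If the rbd_secret_uuid printed by the last command is the central cluster's FSID rather than the local one, cinder-volume is still pointed at the central Ceph.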
>>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>> wrote: >>>> >> >>>> >> in my last message under the line "On a DCN site if you run a >>>> command like this:" I suggested some steps you could try to confirm the >>>> image is a COW from the local glance as well as how to look at your cinder >>>> config. >>>> >> >>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>> >>>> >>> Update: >>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>> around 10,15 minutes to create a volume with image in dcn02. >>>> >>> The image size is 389 MB. >>>> >>> >>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>> >>>> >>>> Hi Jhon, >>>> >>>> I checked in the ceph od dcn02, I can see the images created after >>>> importing from the central site. >>>> >>>> But launching an instance normally fails as it takes a long time >>>> for the volume to get created. >>>> >>>> >>>> >>>> When launching an instance from volume the instance is getting >>>> created properly without any errors. >>>> >>>> >>>> >>>> I tried to cache images in nova using >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>> but getting checksum failed error. >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>>> wrote: >>>> >>>>> >>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>> >>>>> wrote: >>>> >>>>> > >>>> >>>>> > Update: After restarting the nova services on the controller >>>> and running the deploy script on the edge site, I was able to launch the VM >>>> from volume. >>>> >>>>> > >>>> >>>>> > Right now the instance creation is failing as the block device >>>> creation is stuck in creating state, it is taking more than 10 mins for the >>>> volume to be created, whereas the image has already been imported to the >>>> edge glance. >>>> >>>>> >>>> >>>>> Try following this document and making the same observations in >>>> your >>>> >>>>> environment for AZs and their local ceph cluster. >>>> >>>>> >>>> >>>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>> >>>>> >>>> >>>>> On a DCN site if you run a command like this: >>>> >>>>> >>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>> >>>>> NAME SIZE PARENT >>>> >>>>> FMT PROT LOCK >>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>> >>>>> $ >>>> >>>>> >>>> >>>>> Then, you should see the parent of the volume is the image which >>>> is on >>>> >>>>> the same local ceph cluster. >>>> >>>>> >>>> >>>>> I wonder if something is misconfigured and thus you're >>>> encountering >>>> >>>>> the streaming behavior described here: >>>> >>>>> >>>> >>>>> Ideally all images should reside in the central Glance and be >>>> copied >>>> >>>>> to DCN sites before instances of those images are booted on DCN >>>> sites. >>>> >>>>> If an image is not copied to a DCN site before it is booted, then >>>> the >>>> >>>>> image will be streamed to the DCN site and then the image will >>>> boot as >>>> >>>>> an instance. 
This happens because Glance at the DCN site has >>>> access to >>>> >>>>> the images store at the Central ceph cluster. Though the booting >>>> of >>>> >>>>> the image will take time because it has not been copied in >>>> advance, >>>> >>>>> this is still preferable to failing to boot the image. >>>> >>>>> >>>> >>>>> You can also exec into the cinder container at the DCN site and >>>> >>>>> confirm it's using it's local ceph cluster. >>>> >>>>> >>>> >>>>> John >>>> >>>>> >>>> >>>>> > >>>> >>>>> > I will try and create a new fresh image and test again then >>>> update. >>>> >>>>> > >>>> >>>>> > With regards, >>>> >>>>> > Swogat Pradhan >>>> >>>>> > >>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> >> >>>> >>>>> >> Update: >>>> >>>>> >> In the hypervisor list the compute node state is showing down. >>>> >>>>> >> >>>> >>>>> >> >>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> >>> >>>> >>>>> >>> Hi Brendan, >>>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>> timed out so i waited for the volume to be created. >>>> >>>>> >>> Once the volume was created i tried launching the instance >>>> from the volume and still the instance is stuck in spawning state. >>>> >>>>> >>> >>>> >>>>> >>> Here is the nova-compute log: >>>> >>>>> >>> >>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>>> privsep daemon starting >>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>>> privsep process running with uid/gid: 0/0 >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>> privsep process running with capabilities (eff/prm/inh): >>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>> privsep daemon running as pid 185437 >>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>> os_brick.initiator.connectors.nvmeof >>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>> >>>>> >>> Exit code: 2 >>>> >>>>> >>> Stdout: '' >>>> >>>>> >>> Stderr: '': >>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>> running command. >>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>> >>>>> >>> >>>> >>>>> >>> It is stuck in creating image, do i need to run the template >>>> mentioned here ?: >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>> >>>>> >>> >>>> >>>>> >>> The volume is already created and i do not understand why the >>>> instance is stuck in spawning state. 
>>>> >>>>> >>> >>>> >>>>> >>> With regards, >>>> >>>>> >>> Swogat Pradhan >>>> >>>>> >>> >>>> >>>>> >>> >>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>> bshephar at redhat.com> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Does your environment use different network interfaces for >>>> each of the networks? Or does it have a bond with everything on it? >>>> >>>>> >>>> >>>> >>>>> >>>> One issue I have seen before is that when launching >>>> instances, there is a lot of network traffic between nodes as the >>>> hypervisor needs to download the image from Glance. Along with various >>>> other services sending normal network traffic, it can be enough to cause >>>> issues if everything is running over a single 1Gbe interface. >>>> >>>>> >>>> >>>> >>>>> >>>> I have seen the same situation in fact when using a single >>>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>>> while you try to spawn the instance to see if you?re dropping packets. In >>>> the situation I described, there were dropped packets which resulted in a >>>> loss of communication between nova_compute and RMQ, so the node appeared >>>> offline. You should also confirm that nova_compute is being disconnected in >>>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>>> instance. >>>> >>>>> >>>> >>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >>>> based on that experience, from my perspective, is certainly sounds like >>>> some kind of network issue. >>>> >>>>> >>>> >>>> >>>>> >>>> Regards, >>>> >>>>> >>>> >>>> >>>>> >>>> Brendan Shephard >>>> >>>>> >>>> Senior Software Engineer >>>> >>>>> >>>> Red Hat Australia >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> >>>> >>>>> >>>> I tried to help someone with a similar issue some time ago >>>> in this thread: >>>> >>>>> >>>> >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>> >>>>> >>>> >>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>>> user, not sure if that could apply here. But is it possible that your nova >>>> and neutron versions are different between central and edge site? Have you >>>> restarted nova and neutron services on the compute nodes after >>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>> Maybe they can help narrow down the issue. >>>> >>>>> >>>> If there isn't any additional information in the debug logs >>>> I probably would start "tearing down" rabbitmq. I didn't have to do that in >>>> a production system yet so be careful. I can think of two routes: >>>> >>>>> >>>> >>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>> running, this will most likely impact client IO depending on your load. >>>> Check out the rabbitmqctl commands. >>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>> >>>>> >>>> >>>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>>> replicated across the rabbit nodes. But I don't really know the rabbit >>>> internals too well, so maybe someone else can chime in here and give a >>>> better advice. 
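For reference, the packet-drop check Brendan describes can be as simple as the following on the hypervisor while an instance is spawning; the bond name and remote IP are placeholders, and the log path follows the usual TripleO layout:

# watch interface error/drop counters before and after a spawn attempt
$ ip -s link show bond1
# send don't-fragment pings sized for a 1500 MTU across the tunnel
$ ping -M do -s 1472 -c 20 <central-internal-api-ip>
# follow nova-compute during the spawn to catch any RMQ disconnect
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'error|amqp|rabbit'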
>>>> >>>>> >>>> >>>> >>>>> >>>> Regards, >>>> >>>>> >>>> Eugen >>>> >>>>> >>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> Can someone please help me out on this issue? >>>> >>>>> >>>> >>>> >>>>> >>>> With regards, >>>> >>>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi >>>> >>>>> >>>> I don't see any major packet loss. >>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >>>> due to packet >>>> >>>>> >>>> loss. >>>> >>>>> >>>> >>>> >>>>> >>>> with regards, >>>> >>>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >>>> when >>>> >>>>> >>>> launching the instance. >>>> >>>>> >>>> I will check that and come back. >>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck >>>> at spawning >>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >>>> packet loss >>>> >>>>> >>>> causes this. >>>> >>>>> >>>> >>>> >>>>> >>>> With regards, >>>> >>>>> >>>> Swogat pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>> identical between >>>> >>>>> >>>> central and edge site? Do you see packet loss through the >>>> tunnel? >>>> >>>>> >>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>> >>>>> >>>> >>>> >>>>> >>>> > Hi Eugen, >>>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' >>>> as i am not >>>> >>>>> >>>> > getting email's from you. >>>> >>>>> >>>> > Coming to the issue: >>>> >>>>> >>>> > >>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>> list_policies -p >>>> >>>>> >>>> / >>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>> >>>>> >>>> > vhost name pattern apply-to definition >>>> priority >>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>> >>>>> >>>> > >>>> >>>>> >>>> >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> >>>>> >>>> > >>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>>> when i am >>>> >>>>> >>>> trying >>>> >>>>> >>>> > to launch an instance and the instance comes to a spawning >>>> state and >>>> >>>>> >>>> then >>>> >>>>> >>>> > gets stuck. >>>> >>>>> >>>> > >>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>> sites. >>>> >>>>> >>>> > >>>> >>>>> >>>> > With regards, >>>> >>>>> >>>> > Swogat Pradhan >>>> >>>>> >>>> > >>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> > wrote: >>>> >>>>> >>>> > >>>> >>>>> >>>> >> Hi Eugen, >>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>> directly, i am >>>> >>>>> >>>> checking >>>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. 
>>>> >>>>> >>>> >> >>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>> activities in the >>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> With regards, >>>> >>>>> >>>> >> Swogat Pradhan >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> >> wrote: >>>> >>>>> >>>> >> >>>> >>>>> >>>> >>> Hi Eugen, >>>> >>>>> >>>> >>> Thanks for your response. >>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>> details: >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *PCS Status:* >>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>>>> >>>> >>> >>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>> (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>> (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-2 >>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>> (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-1 >>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>> (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-0 >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but >>>> the issue is >>>> >>>>> >>>> still >>>> >>>>> >>>> >>> present. >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *Cluster status:* >>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>> cluster_status >>>> >>>>> >>>> >>> Cluster status of node >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> ... 
>>>> >>>>> >>>> >>> Basics >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Cluster name: >>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Disk Nodes >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Running Nodes >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Versions >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>> >>>>> >>>> RabbitMQ >>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Alarms >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> (none) >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Network Partitions >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> (none) >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Listeners >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>> AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>> AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication 
>>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>> AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>> purpose: >>>> >>>>> >>>> inter-node and >>>> >>>>> >>>> >>> CLI tool communication >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>>> purpose: AMQP >>>> >>>>> >>>> 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>>> HTTP API >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Feature flags >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *Logs:* >>>> >>>>> >>>> >>> *(Attached)* >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> With regards, >>>> >>>>> >>>> >>> Swogat Pradhan >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> >>> wrote: >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>>> Hi, >>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> nova-conuctor: >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> The reply >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after >>>> 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> The reply >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after >>>> 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> The reply >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after >>>> 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Cache enabled >>>> >>>>> >>>> with >>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> The reply >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after >>>> 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> With regards, >>>> >>>>> >>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>>> Hi, >>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i >>>> am trying to >>>> >>>>> >>>> >>>>> launch vm's. 
>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>> (openstack >>>> >>>>> >>>> compute >>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart >>>> the nova >>>> >>>>> >>>> compute >>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> nova-compute.log >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>>> Running >>>> >>>>> >>>> >>>>> instance usage >>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>> 2023-02-26 07:00:00 >>>> >>>>> >>>> to >>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful >>>> on node >>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>> supplied device >>>> >>>>> >>>> name: >>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>> volume >>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Cache enabled >>>> >>>>> >>>> with >>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Running >>>> >>>>> >>>> >>>>> privsep helper: >>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> >>>>> >>>> 'privsep-helper', >>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>> '--config-file', >>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Spawned new >>>> >>>>> >>>> privsep >>>> >>>>> >>>> >>>>> daemon via rootwrap >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >>>> [-] privsep >>>> >>>>> >>>> >>>>> daemon starting >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >>>> [-] privsep >>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>>> [-] privsep >>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>>> [-] privsep >>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Process >>>> >>>>> >>>> >>>>> execution error >>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>> command. >>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>>> >>>> >>>>> Exit code: 2 >>>> >>>>> >>>> >>>>> Stdout: '' >>>> >>>>> >>>> >>>>> Stderr: '': >>>> oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> With regards, >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> Swogat Pradhan >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abishop at redhat.com Wed Mar 22 15:49:37 2023 From: abishop at redhat.com (Alan Bishop) Date: Wed, 22 Mar 2023 08:49:37 -0700 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan wrote: > Hi Adam, > The systems are in same LAN, in this case it seemed like the image was > getting pulled from the central site which was caused due to an > misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ > directory, which seems to have been resolved after the changes i made to > fix it. > > Right now the glance api podman is running in unhealthy state and the > podman logs don't show any error whatsoever and when issued the command > netstat -nultp i do not see any entry for glance port i.e. 9292 in the dcn > site, which is why cinder is throwing an error stating: > > 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server > cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error > finding address for > http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: > Unable to establish connection to > http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: > HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded > with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by > NewConnectionError(' 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] > ECONNREFUSED',)) > > Now i need to find out why the port is not listed as the glance service is > running, which i am not sure how to find out. > One other thing to investigate is whether your deployment includes this patch [1]. If it does, then bear in mind the glance-api service running at the edge site will be an "internal" (non public facing) instance that uses port 9293 instead of 9292. You should familiarize yourself with the release note [2]. 
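A quick way to see which port the edge glance is actually binding, and whether the unhealthy container is the culprit, is to check both ports at once on the DCN node (the IP is a placeholder):

$ sudo podman ps --format '{{.Names}} {{.Status}}' | grep -i glance
$ sudo ss -lntp | grep -E ':(9292|9293)'
# if 9293 is listening, try it directly, e.g.:
$ curl -s http://<edge-internal-api-ip>:9293/ | head

If only 9293 shows up, that matches the internal-only edge glance described in the release note referenced below.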
[1] https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 [2] https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml Alan > With regards, > Swogat Pradhan > > On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: > >> >> >> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan >> wrote: >> >>> Update: >>> Here is the log when creating a volume using cirros image: >>> >>> 2023-03-22 11:04:38.449 109 INFO >>> cinder.volume.flows.manager.create_volume >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>> specification: {'status': 'creating', 'volume_name': >>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>> 'owner_specified.openstack.object': 'images/cirros', >>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>> } >>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>> >> >> As Adam Savage would say, well there's your problem ^^ (Image download >> 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s >> suggests you have a network issue. >> >> John Fulton previously stated your cinder-volume service at the edge site >> is not using the local ceph image store. 
Assuming you are deploying >> GlanceApiEdge service [1], then the cinder-volume service should be >> configured to use the local glance service [2]. You should check cinder's >> glance_api_servers to confirm it's the edge site's glance service. >> >> [1] >> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >> [2] >> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >> >> Alan >> >> >>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] >>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>> FutureWarning: The human format is deprecated and the format parameter will >>> be removed. Use explicitly json instead in version 'xena' >>> category=FutureWarning) >>> >>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] >>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>> FutureWarning: The human format is deprecated and the format parameter will >>> be removed. Use explicitly json instead in version 'xena' >>> category=FutureWarning) >>> >>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>> MB/s >>> 2023-03-22 11:11:14.998 109 INFO >>> cinder.volume.flows.manager.create_volume >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>> >>> The image is present in dcn02 store but still it downloaded the image in >>> 0.16 MB/s and then created the volume. >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Hi Jhon, >>>> This seems to be an issue. >>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>> parameter was specified to the respective cluster names but the config >>>> files were created in the name of ceph.conf and keyring was >>>> ceph.client.openstack.keyring. >>>> >>>> Which created issues in glance as well as the naming convention of the >>>> files didn't match the cluster names, so i had to manually rename the >>>> central ceph conf file as such: >>>> >>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>> [root at dcn02-compute-0 ceph]# ll >>>> total 16 >>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>> ceph_central.client.openstack.keyring >>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>> [root at dcn02-compute-0 ceph]# >>>> >>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>> respective clusters in both dcn01 and dcn02. >>>> In the above cli output, the ceph.conf and ceph.client... 
are the files >>>> used to access dcn02 ceph cluster and ceph_central* files are used in for >>>> accessing central ceph cluster. >>>> >>>> glance multistore config: >>>> [dcn02] >>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=dcn02 rbd glance store >>>> >>>> [ceph_central] >>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=Default glance store backend. >>>> >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>> wrote: >>>> >>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>> wrote: >>>>> > >>>>> > Hi, >>>>> > Seems like cinder is not using the local ceph. >>>>> >>>>> That explains the issue. It's a misconfiguration. >>>>> >>>>> I hope this is not a production system since the mailing list now has >>>>> the cinder.conf which contains passwords. >>>>> >>>>> The section that looks like this: >>>>> >>>>> [tripleo_ceph] >>>>> volume_backend_name=tripleo_ceph >>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>> rbd_user=openstack >>>>> rbd_pool=volumes >>>>> rbd_flatten_volume_from_snapshot=False >>>>> rbd_secret_uuid= >>>>> report_discard_supported=True >>>>> >>>>> Should be updated to refer to the local DCN ceph cluster and not the >>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>> rbd_secret_uuid corresponds to that one. >>>>> >>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>> secret-get-value $FSID`. >>>>> >>>>> The documentation describes how to configure the central and DCN sites >>>>> correctly but an error seems to have occurred while you were following >>>>> it. 
>>>>> >>>>> >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>> >>>>> John >>>>> >>>>> > >>>>> > Ceph Output: >>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>> > NAME SIZE PARENT FMT >>>>> PROT LOCK >>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>> excl >>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>>>> > >>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>> > NAME SIZE PARENT FMT >>>>> PROT LOCK >>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>> > >>>>> > Attached the cinder config. >>>>> > Please let me know how I can solve this issue. >>>>> > >>>>> > With regards, >>>>> > Swogat Pradhan >>>>> > >>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>> wrote: >>>>> >> >>>>> >> in my last message under the line "On a DCN site if you run a >>>>> command like this:" I suggested some steps you could try to confirm the >>>>> image is a COW from the local glance as well as how to look at your cinder >>>>> config. >>>>> >> >>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>> >>>>> >>> Update: >>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>> >>> The image size is 389 MB. >>>>> >>> >>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>> >>>>> >>>> Hi Jhon, >>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>> after importing from the central site. >>>>> >>>> But launching an instance normally fails as it takes a long time >>>>> for the volume to get created. >>>>> >>>> >>>>> >>>> When launching an instance from volume the instance is getting >>>>> created properly without any errors. >>>>> >>>> >>>>> >>>> I tried to cache images in nova using >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>> but getting checksum failed error. >>>>> >>>> >>>>> >>>> With regards, >>>>> >>>> Swogat Pradhan >>>>> >>>> >>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>>>> wrote: >>>>> >>>>> >>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>> >>>>> wrote: >>>>> >>>>> > >>>>> >>>>> > Update: After restarting the nova services on the controller >>>>> and running the deploy script on the edge site, I was able to launch the VM >>>>> from volume. 
>>>>> >>>>> > >>>>> >>>>> > Right now the instance creation is failing as the block device >>>>> creation is stuck in creating state, it is taking more than 10 mins for the >>>>> volume to be created, whereas the image has already been imported to the >>>>> edge glance. >>>>> >>>>> >>>>> >>>>> Try following this document and making the same observations in >>>>> your >>>>> >>>>> environment for AZs and their local ceph cluster. >>>>> >>>>> >>>>> >>>>> >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>> >>>>> >>>>> >>>>> On a DCN site if you run a command like this: >>>>> >>>>> >>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>> >>>>> NAME SIZE PARENT >>>>> >>>>> FMT PROT LOCK >>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>>> >>>>> $ >>>>> >>>>> >>>>> >>>>> Then, you should see the parent of the volume is the image which >>>>> is on >>>>> >>>>> the same local ceph cluster. >>>>> >>>>> >>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>> encountering >>>>> >>>>> the streaming behavior described here: >>>>> >>>>> >>>>> >>>>> Ideally all images should reside in the central Glance and be >>>>> copied >>>>> >>>>> to DCN sites before instances of those images are booted on DCN >>>>> sites. >>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>> then the >>>>> >>>>> image will be streamed to the DCN site and then the image will >>>>> boot as >>>>> >>>>> an instance. This happens because Glance at the DCN site has >>>>> access to >>>>> >>>>> the images store at the Central ceph cluster. Though the booting >>>>> of >>>>> >>>>> the image will take time because it has not been copied in >>>>> advance, >>>>> >>>>> this is still preferable to failing to boot the image. >>>>> >>>>> >>>>> >>>>> You can also exec into the cinder container at the DCN site and >>>>> >>>>> confirm it's using it's local ceph cluster. >>>>> >>>>> >>>>> >>>>> John >>>>> >>>>> >>>>> >>>>> > >>>>> >>>>> > I will try and create a new fresh image and test again then >>>>> update. >>>>> >>>>> > >>>>> >>>>> > With regards, >>>>> >>>>> > Swogat Pradhan >>>>> >>>>> > >>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>> >> >>>>> >>>>> >> Update: >>>>> >>>>> >> In the hypervisor list the compute node state is showing down. >>>>> >>>>> >> >>>>> >>>>> >> >>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>> >>> >>>>> >>>>> >>> Hi Brendan, >>>>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>>>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>>> timed out so i waited for the volume to be created. >>>>> >>>>> >>> Once the volume was created i tried launching the instance >>>>> from the volume and still the instance is stuck in spawning state. 
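Picking up John's earlier suggestion to exec into the cinder container at the DCN site
and confirm it uses the local ceph cluster, a rough check from a dcn02 node might look
like this (the container name, config paths and the FSID-as-secret convention are
assumptions based on a typical TripleO deployment):

$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_secret_uuid|rbd_user' /etc/cinder/cinder.conf
# the FSID of the local DCN cluster, for comparison with rbd_secret_uuid
$ sudo grep fsid /etc/ceph/ceph.conf
# libvirt on the same node should hold a cephx secret keyed by that FSID
$ sudo podman exec nova_virtsecretd virsh secret-get-value $FSID
# rbd_secret_uuid should match the local DCN FSID, not the central cluster's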
>>>>> >>>>> >>> >>>>> >>>>> >>> Here is the nova-compute log: >>>>> >>>>> >>> >>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>>>> privsep daemon starting >>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>>>> privsep process running with uid/gid: 0/0 >>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>>> privsep process running with capabilities (eff/prm/inh): >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>>> privsep daemon running as pid 185437 >>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>> os_brick.initiator.connectors.nvmeof >>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>> in _get_host_uuid: Unexpected error while running command. >>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>> >>>>> >>> Exit code: 2 >>>>> >>>>> >>> Stdout: '' >>>>> >>>>> >>> Stderr: '': >>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>> running command. >>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>> >>>>> >>> >>>>> >>>>> >>> It is stuck in creating image, do i need to run the template >>>>> mentioned here ?: >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>> >>>>> >>> >>>>> >>>>> >>> The volume is already created and i do not understand why >>>>> the instance is stuck in spawning state. >>>>> >>>>> >>> >>>>> >>>>> >>> With regards, >>>>> >>>>> >>> Swogat Pradhan >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>> bshephar at redhat.com> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> Does your environment use different network interfaces for >>>>> each of the networks? Or does it have a bond with everything on it? >>>>> >>>>> >>>> >>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>> instances, there is a lot of network traffic between nodes as the >>>>> hypervisor needs to download the image from Glance. Along with various >>>>> other services sending normal network traffic, it can be enough to cause >>>>> issues if everything is running over a single 1Gbe interface. >>>>> >>>>> >>>> >>>>> >>>>> >>>> I have seen the same situation in fact when using a single >>>>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>>>> while you try to spawn the instance to see if you?re dropping packets. In >>>>> the situation I described, there were dropped packets which resulted in a >>>>> loss of communication between nova_compute and RMQ, so the node appeared >>>>> offline. You should also confirm that nova_compute is being disconnected in >>>>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>>>> instance. >>>>> >>>>> >>>> >>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >>>>> based on that experience, from my perspective, is certainly sounds like >>>>> some kind of network issue. 
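A rough way to check both of those things while re-trying the spawn could be something
like this (the log path and the address are assumptions for a typical TripleO edge
node; adjust to the actual environment):

# tail the compute log and watch for AMQP/RabbitMQ disconnects during the spawn
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'amqp|rabbit|heartbeat|error'
# in a second terminal, check for packet loss and for a too-small path MTU
# towards the central internal API VIP (1472 bytes payload + 28 bytes headers = 1500)
$ ping -c 100 -i 0.2 <central internal-api VIP>
$ ping -M do -s 1472 -c 5 <central internal-api VIP>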
>>>>> >>>>> >>>> >>>>> >>>>> >>>> Regards, >>>>> >>>>> >>>> >>>>> >>>>> >>>> Brendan Shephard >>>>> >>>>> >>>> Senior Software Engineer >>>>> >>>>> >>>> Red Hat Australia >>>>> >>>>> >>>> >>>>> >>>>> >>>> >>>>> >>>>> >>>> >>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> Hi, >>>>> >>>>> >>>> >>>>> >>>>> >>>> I tried to help someone with a similar issue some time ago >>>>> in this thread: >>>>> >>>>> >>>> >>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>> >>>>> >>>> >>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>>>> user, not sure if that could apply here. But is it possible that your nova >>>>> and neutron versions are different between central and edge site? Have you >>>>> restarted nova and neutron services on the compute nodes after >>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>> Maybe they can help narrow down the issue. >>>>> >>>>> >>>> If there isn't any additional information in the debug logs >>>>> I probably would start "tearing down" rabbitmq. I didn't have to do that in >>>>> a production system yet so be careful. I can think of two routes: >>>>> >>>>> >>>> >>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>> running, this will most likely impact client IO depending on your load. >>>>> Check out the rabbitmqctl commands. >>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>> >>>>> >>>> >>>>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>>>> replicated across the rabbit nodes. But I don't really know the rabbit >>>>> internals too well, so maybe someone else can chime in here and give a >>>>> better advice. >>>>> >>>>> >>>> >>>>> >>>>> >>>> Regards, >>>>> >>>>> >>>> Eugen >>>>> >>>>> >>>> >>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>> >>>>> >>>> >>>>> >>>>> >>>> Hi, >>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>> >>>>> >>>> >>>>> >>>>> >>>> With regards, >>>>> >>>>> >>>> Swogat Pradhan >>>>> >>>>> >>>> >>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> Hi >>>>> >>>>> >>>> I don't see any major packet loss. >>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >>>>> due to packet >>>>> >>>>> >>>> loss. >>>>> >>>>> >>>> >>>>> >>>>> >>>> with regards, >>>>> >>>>> >>>> Swogat Pradhan >>>>> >>>>> >>>> >>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> Hi, >>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >>>>> when >>>>> >>>>> >>>> launching the instance. >>>>> >>>>> >>>> I will check that and come back. >>>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck >>>>> at spawning >>>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >>>>> packet loss >>>>> >>>>> >>>> causes this. >>>>> >>>>> >>>> >>>>> >>>>> >>>> With regards, >>>>> >>>>> >>>> Swogat pradhan >>>>> >>>>> >>>> >>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>>>> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> One more thing coming to mind is MTU size. 
Are they >>>>> identical between >>>>> >>>>> >>>> central and edge site? Do you see packet loss through the >>>>> tunnel? >>>>> >>>>> >>>> >>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>> >>>>> >>>> >>>>> >>>>> >>>> > Hi Eugen, >>>>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' >>>>> as i am not >>>>> >>>>> >>>> > getting email's from you. >>>>> >>>>> >>>> > Coming to the issue: >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>> list_policies -p >>>>> >>>>> >>>> / >>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>> priority >>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>> >>>>> >>>> > >>>>> >>>>> >>>> >>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>>>> when i am >>>>> >>>>> >>>> trying >>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>> spawning state and >>>>> >>>>> >>>> then >>>>> >>>>> >>>> > gets stuck. >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>>> sites. >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > With regards, >>>>> >>>>> >>>> > Swogat Pradhan >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> > wrote: >>>>> >>>>> >>>> > >>>>> >>>>> >>>> >> Hi Eugen, >>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>> directly, i am >>>>> >>>>> >>>> checking >>>>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>>>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>> occurred. >>>>> >>>>> >>>> >> >>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>> activities in the >>>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>>> >>>>> >>>> >> >>>>> >>>>> >>>> >> With regards, >>>>> >>>>> >>>> >> Swogat Pradhan >>>>> >>>>> >>>> >> >>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> >> wrote: >>>>> >>>>> >>>> >> >>>>> >>>>> >>>> >>> Hi Eugen, >>>>> >>>>> >>>> >>> Thanks for your response. >>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>>> details: >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> *PCS Status:* >>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>> >>>>> >>>> >>> >>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>>> >>>> Started >>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>>> >>>> Started >>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>>> >>>> Started >>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>>> >>>> Started >>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but >>>>> the issue is >>>>> >>>>> >>>> still >>>>> >>>>> >>>> >>> present. 
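Besides restarting the bundle, it may also be worth checking whether the stale reply
queues named in the nova-conductor errors (quoted further below) still exist; a sketch,
assuming the pacemaker bundle container is named rabbitmq-bundle-podman-0:

$ sudo podman exec rabbitmq-bundle-podman-0 rabbitmqctl list_queues -p / name messages consumers | grep '^reply_'
# a reply_... queue that appears in the "missing queue" errors but shows up here
# with zero consumers (or not at all) points at a stale RPC reply queue rather
# than a general connectivity problem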
>>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> *Cluster status:* >>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>> cluster_status >>>>> >>>>> >>>> >>> Cluster status of node >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> ... >>>>> >>>>> >>>> >>> Basics >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Cluster name: >>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Disk Nodes >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Running Nodes >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Versions >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>>> RabbitMQ >>>>> >>>>> >>>> 3.8.3 >>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>>> RabbitMQ >>>>> >>>>> >>>> 3.8.3 >>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>>> RabbitMQ >>>>> >>>>> >>>> 3.8.3 >>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>>> >>>> >>> >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>> >>>>> >>>> RabbitMQ >>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Alarms >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> (none) >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Network Partitions >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> (none) >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Listeners >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>> inter-node and CLI >>>>> >>>>> >>>> tool >>>>> >>>>> >>>> >>> communication >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>>> AMQP 0-9-1 >>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>> inter-node and CLI >>>>> >>>>> >>>> tool >>>>> >>>>> >>>> >>> communication >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>>> AMQP 0-9-1 >>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, 
>>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>> inter-node and CLI >>>>> >>>>> >>>> tool >>>>> >>>>> >>>> >>> communication >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>>> AMQP 0-9-1 >>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> , >>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>> purpose: >>>>> >>>>> >>>> inter-node and >>>>> >>>>> >>>> >>> CLI tool communication >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> , >>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>>>> purpose: AMQP >>>>> >>>>> >>>> 0-9-1 >>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> , >>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>>>> HTTP API >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Feature flags >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> *Logs:* >>>>> >>>>> >>>> >>> *(Attached)* >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> With regards, >>>>> >>>>> >>>> >>> Swogat Pradhan >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> >>> wrote: >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>>> Hi, >>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. 
>>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>> nova-conuctor: >>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> The reply >>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after >>>>> 60 seconds >>>>> >>>>> >>>> due to a >>>>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>>>> >>>>> >>>> Abandoning...: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> The reply >>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after >>>>> 60 seconds >>>>> >>>>> >>>> due to a >>>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> >>>>> >>>> Abandoning...: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> The reply >>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after >>>>> 60 seconds >>>>> >>>>> >>>> due to a >>>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>>>> >>>>> >>>> Abandoning...: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Cache enabled >>>>> >>>>> >>>> with >>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> The reply >>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after >>>>> 60 seconds >>>>> >>>>> >>>> due to a >>>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> >>>>> >>>> Abandoning...: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>> With regards, >>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>>> Hi, >>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where >>>>> i am trying to >>>>> >>>>> >>>> >>>>> launch vm's. >>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>>> (openstack >>>>> >>>>> >>>> compute >>>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart >>>>> the nova >>>>> >>>>> >>>> compute >>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>> >>>>> >>>> >>>>> >>>>> >>>>> >>>> >>>>> nova-compute.log >>>>> >>>>> >>>> >>>>> >>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>>>> Running >>>>> >>>>> >>>> >>>>> instance usage >>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>> 2023-02-26 07:00:00 >>>>> >>>>> >>>> to >>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> [instance: >>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>> successful on node >>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>> nova.virt.libvirt.driver >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> [instance: >>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>> supplied device >>>>> >>>>> >>>> name: >>>>> >>>>> >>>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> [instance: >>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>>> volume >>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Cache enabled >>>>> >>>>> >>>> with >>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Running >>>>> >>>>> >>>> >>>>> privsep helper: >>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>> >>>>> >>>> 'privsep-helper', >>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>> '--config-file', >>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Spawned new >>>>> >>>>> >>>> privsep >>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >>>>> [-] privsep >>>>> >>>>> >>>> >>>>> daemon starting >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >>>>> [-] privsep >>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>>>> [-] privsep >>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>>>> [-] privsep >>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Process >>>>> >>>>> >>>> >>>>> execution error >>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>> command. >>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>> >>>>> >>>> >>>>> Stdout: '' >>>>> >>>>> >>>> >>>>> Stderr: '': >>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>> >>>>> >>>> >>>>> Unexpected error while running command. 
>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver
>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance:
>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image
>>>>> >>>>> >>>> >>>>>
>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue?
>>>>> >>>>> >>>> >>>>>
>>>>> >>>>> >>>> >>>>> With regards,
>>>>> >>>>> >>>> >>>>> Swogat Pradhan
>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gmann at ghanshyammann.com Wed Mar 22 17:20:28 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Wed, 22 Mar 2023 10:20:28 -0700
Subject: OpenStack 2023.1 Antelope is officially released!
In-Reply-To:
References:
Message-ID: <1870a561cc9.c41d7834993030.554598293953461081@ghanshyammann.com>

Congratulations and thanks, everyone. A special thanks to the release team for
another *on-time* release and great work.

-gmann

 ---- On Wed, 22 Mar 2023 08:58:16 -0700 Előd Illés wrote ---
 > Let me join and thank all who were part of the 2023.1 Antelope
 > development cycle!
 > Also note that this marks the official opening of the openstack/releases
 > repository for 2023.2 Bobcat releases, and freezes are now lifted.
 > stable/2023.1 is now a fully normal stable branch, and the normal stable
 > policy applies from now on.
 > Thanks,
 > Előd Illés
 >
 > From: Herve Beraud hberaud at redhat.com>
 > Sent: Wednesday, March 22, 2023 4:18 PM
 > To: openstack-discuss openstack-discuss at lists.openstack.org>
 > Subject: OpenStack 2023.1 Antelope is officially released!
 >
 > Hello OpenStack community,
 > I'm excited to announce the final releases for the components of
 > OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope
 > development cycle.
 > You will find a complete list of all components, their latest
 > versions, and links to individual project release notes documents
 > listed on the new release site. https://releases.openstack.org/antelope/
 > Congratulations to all of the teams who have contributed to this
 > release!
 > Our next production cycle, 2023.2 Bobcat, has already started. We will
 > meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan
 > the work for the upcoming cycle. I hope to see you there!
 > Thanks,
 > OpenStack Release Management team
 > --
 > Hervé Beraud
 > Senior Software Engineer at Red Hat
 > irc: hberaud
 > https://github.com/4383/

From gmann at ghanshyammann.com Wed Mar 22 17:43:48 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Wed, 22 Mar 2023 10:43:48 -0700
Subject: [ptl] Need PTL volunteer for OpenStack Winstackers
Message-ID: <1870a6b7a1d.114e70a2d994244.3514791188773000084@ghanshyammann.com>

Hi Lukas,

I am reaching out to you as you were PTL for the OpenStack Winstackers project in the
last cycle. There is no PTL candidate for the next cycle (2023.2), and it is on the
leaderless project list. Please check if you or anyone you know would like to lead
this project.

- https://etherpad.opendev.org/p/2023.2-leaderless

Also, if anyone else would like to help lead this project, this is the time to let
the TC know.
-gmann From gmann at ghanshyammann.com Wed Mar 22 17:43:55 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 10:43:55 -0700 Subject: [ptl] Need PTL volunteer for OpenStack Vitrage Message-ID: <1870a6b95c5.105ea64ba994248.8904640311035076666@ghanshyammann.com> Hi Eyal, I am reaching out to you as you were PTL for OpenStack Vitrage project in the last cycle. There is no PTL candidate for the next cycle (2023.2), and it is on the leaderless project list. Please check if you or anyone you know would like to lead this project. - https://etherpad.opendev.org/p/2023.2-leaderless Also, if anyone else would like to help leading this project, this is time to let TC knows. -gmann From rdhasman at redhat.com Wed Mar 22 18:14:06 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Wed, 22 Mar 2023 23:44:06 +0530 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: <20230316121023.tdzgu6zinm7spvjp@localhost> References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> <20230313163251.xpnzyvzb65b6zaal@localhost> <20230314084601.t2ez24gcljnu5plq@localhost> <20230316121023.tdzgu6zinm7spvjp@localhost> Message-ID: Hi Gorka and Rishat, As discussed with Gorka, I will be working on the issues reported. I've reported 2 bugs for case 1) and 3) since we aren't sure on case 2) yet. *Bug 1*: https://bugs.launchpad.net/os-brick/+bug/2012251 *Fix 1*: https://review.opendev.org/c/openstack/os-brick/+/878045 *Bug 2*: https://bugs.launchpad.net/os-brick/+bug/2012352 *Fix 2*: https://review.opendev.org/c/openstack/os-brick/+/878242 I'm not 100% sure that the approach in *Fix 2* is the best way to do it but it works with my test scenarios and reviews are always appreciated. Thanks Rajat Dhasmana On Thu, Mar 16, 2023 at 5:45?PM Gorka Eguileor wrote: > On 16/03, Rishat Azizov wrote: > > Hi Gorka, > > > > Thanks! > > I fixed issue by adding to multipathd config uxsock_timeout directive: > > uxsock_timeout 10000 > > > > Because in multipathd logs I saw this error: > > 3624a93705842cfae35d7483200015fd8: map flushed > > cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after > > 4.858561 secs > > > > Now large disk backups work fine. > > > > 2. This happens because despite the timeout of the first attempt and exit > > code 1, the multipath device was disconnected, so the next attempts ended > > with an error "is not a multipath device", since the multipath device had > > already disconnected. > > > > Hi, > > That's a nice workaround until we fix it upstream!! > > Thanks for confirming my suspicions were right. This is the 3rd thing I > mentioned was happening, flush call failed but it actually removed the > device. > > We'll proceed to fix the flushing code in master. > > Cheers, > Gorka. > > > > > ??, 14 ???. 2023??. ? 14:46, Gorka Eguileor : > > > > > [Sending the email again as it seems it didn't reach the ML] > > > > > > > > > On 13/03, Gorka Eguileor wrote: > > > > On 11/03, Rishat Azizov wrote: > > > > > Hi, Gorka, > > > > > > > > > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > > > > > > > > > > > > > > > > > Hi, > > > > > > There are multiple things going on here: > > > > > > 1. There is a bug in os-brick, because the disconnect_volume should not > > > fail, since it is being called with force=True and > > > ignore_errors=True. 
> > > > > > The issues is that this call [1] is not wrapped in the > > > ExceptionChainer context manager, and it should not even be a flush > > > call, it should be a call to "multipathd remove map $map" instead. > > > > > > 2. The way multipath code is written [2][3], the error we see about > > > "3624a93705842cfae35d7483200015fce is not a multipath device" means > 2 > > > different things: it is not a multipath or an error happened. > > > > > > So we don't really know what happened without enabling more verbose > > > multipathd log levels. > > > > > > 3. The "multipath -f" call should not be failing in the first place, > > > because the failure is happening on disconnecting the source volume, > > > which has no data buffered to be written and therefore no reason to > > > fail the flush (unless it's using a friendly name). > > > > > > I don't know if it could be happening that the first flush fails > with > > > a timeout (maybe because there is an extend operation happening), > but > > > multipathd keeps trying to flush it in the background and when it > > > succeeds it removes the multipath device, which makes following > calls > > > fail. > > > > > > If that's the case we would need to change the retry from automatic > > > [4] to manual and check in-between to see if the device has been > > > removed in-between calls. > > > > > > The first issue is definitely a bug, the 2nd one is something that > could > > > be changed in the deployment to try to get additional information on > the > > > failure, and the 3rd one could be a bug. > > > > > > I'll see if I can find someone who wants to work on the 1st and 3rd > > > points. > > > > > > Cheers, > > > Gorka. > > > > > > [1]: > > > > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 > > > [2]: > > > > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 > > > [3]: > > > > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 > > > [4]: > > > > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > > > > > > > > > > > > > > > > > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > > > > > > > > > On 06/03, Rishat Azizov wrote: > > > > > > > Hi, > > > > > > > > > > > > > > It works with smaller volumes. > > > > > > > > > > > > > > multipath.conf attached to thist email. > > > > > > > > > > > > > > Cinder version - 18.2.0 Wallaby > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Mar 22 18:35:29 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 11:35:29 -0700 Subject: [ptl] Need PTL volunteer for OpenStack Sahara Message-ID: <1870a9accaa.ca4d3931996653.2888367201531485088@ghanshyammann.com> Hi Qiu, I am reaching out to you as you were PTL for OpenStack Sahara project in the last cycle. There is no PTL candidate for the next cycle (2023.2), and it is on the leaderless project list. Please check if you or anyone you know would like to lead this project. - https://etherpad.opendev.org/p/2023.2-leaderless Also, if anyone else would like to help leading this project, this is time to let TC knows. 
-gmann From mnaser at vexxhost.com Wed Mar 22 19:27:20 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 22 Mar 2023 19:27:20 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: <2315188.ElGaqSPkdT@p1> <14af9155a882030464f4adce1bf71f8ffac74d0f.camel@mittwald.de> Message-ID: From: Rodolfo Alonso Hernandez Date: Monday, March 20, 2023 at 12:09 PM To: Jan Horstmann Cc: Mohammed Naser , felix.huettner at mail.schwarz , skaplons at redhat.com , openstack-discuss at lists.openstack.org Subject: Re: [neutron] detecting l3-agent readiness Hello: I think I'm repeating myself here but we have two approaches to solve this problem: * The easiest one, at least for the L3 agent, is to report an INFO level log before and after the full sync. That could be parsed by any tool to detect that. You can propose a patch to the Neutron repository. I?ve kicked this off with this: https://review.opendev.org/c/openstack/neutron/+/878248 fix: add log message for periodic_sync_routers_task fullsync [NEW] * https://bugs.launchpad.net/neutron/+bug/2011422: a more elaborated way to report the agent status. That could provide the start flag, the revived flag, the sync processing flag and many other ones that could be defined only for this specific agent. Regards. On Mon, Mar 20, 2023 at 4:33?PM Jan Horstmann > wrote: On Wed, 2023-03-15 at 16:10 +0000, Felix H?ttner wrote: > Hi, > > > Subject: Re: [neutron] detecting l3-agent readiness > > > > Hi, > > > > Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > > > Hi Mohammed, > > > > > > > Subject: [neutron] detecting l3-agent readiness > > > > > > > > Hi folks, > > > > > > > > I'm working on improving the stability of rollouts when using Kubernetes as a control > > plane, specifically around the L3 agent, it seems that I have not found a clear way to > > detect in the code path where the L3 agent has finished it's initial sync.. > > > > > > > > > > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/- > > /blob/devel/files/startup_wait_for_ns.py > > > Basically we are checking against the neutron api what routers should be on the node and > > then validate that all keepalived processes are up and running. > > > > That would work only for HA routers. If You would also have routers which aren't "ha" this > > method may fail. > > > > Yep, since we only have HA routers that works fine for us. But I guess it should also work for non-ha routers without too much adoption (maybe just check for namespaces instead of keepalived). > Instead of counting processes I have been using the l3 agent's `configurations.routers` field to determine its readiness. From my understanding comparing this number with the number of active routers hosted by the agent should be a good indicator of its sync status. Using two api calls for this is inherently racy, but could be a sufficient workaround for environments with a moderate number of router events. So a simple test snippet for the sync status of all agents could be: ``` import sys import openstack client = openstack.connection.Connection( ... 
) l3_agent_synced = [ len([ router for router in client.network.agent_hosted_routers(agent) if router.is_admin_state_up ]) <= client.network.get_agent(agent).configuration["routers"] for agent in client.network.agents() if agent.agent_type == "L3 agent" and (agent.configuration["agent_mode"] == "dvr_snat" or agent.configuration["agent_mode"] == "legacy") ] if not all(l3_agent_synced): sys.exit(1) ``` Please let me know if I am way off with this approach :) > > > > > > > Am I missing it somewhere or is the architecture built in a way that doesn't really > > answer that question? > > > > > > > > > > Adding a option in the neutron api would be a lot nicer. But i guess that also counts > > for l2 and dhcp agents. > > > > > > > > > > Thanks > > > > Mohammed > > > > > > > > > > > > -- > > > > Mohammed Naser > > > > VEXXHOST, Inc. > > > > > > -- > > > Felix Huettner > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung > > durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger > > sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. > > Hinweise zum Datenschutz finden Sie hier>. > > > > > > > > > -- > > Slawek Kaplonski > > Principal Software Engineer > > Red Hat > > -- > Felix Huettner > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier>. -- Jan Horstmann -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Wed Mar 22 20:59:18 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 02:29:18 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: I still have the same issue, I'm not sure what's left to try. All the pods are now in a healthy state, I am getting log entries 3 mins after I hit the create volume button in cinder-volume when I try to create a volume with an image. And the volumes are just stuck in creating state for more than 20 mins now. Cinder logs: 2023-03-22 20:32:44.010 108 INFO cinder.rpc [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected cinder-volume RPC version 3.17 as minimum service version. 
2023-03-22 20:34:59.166 108 INFO cinder.volume.flows.manager.create_volume [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } With regards, Swogat Pradhan On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: > > > On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan > wrote: > >> Hi Adam, >> The systems are in same LAN, in this case it seemed like the image was >> getting pulled from the central site which was caused due to an >> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >> directory, which seems to have been resolved after the changes i made to >> fix it. >> >> Right now the glance api podman is running in unhealthy state and the >> podman logs don't show any error whatsoever and when issued the command >> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >> site, which is why cinder is throwing an error stating: >> >> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >> finding address for >> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >> Unable to establish connection to >> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >> NewConnectionError('> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >> ECONNREFUSED',)) >> >> Now i need to find out why the port is not listed as the glance service >> is running, which i am not sure how to find out. >> > > One other thing to investigate is whether your deployment includes this > patch [1]. If it does, then bear in mind > the glance-api service running at the edge site will be an "internal" (non > public facing) instance that uses port 9293 > instead of 9292. You should familiarize yourself with the release note [2]. > > [1] > https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 > [2] > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml > > Alan > > >> With regards, >> Swogat Pradhan >> >> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: >> >>> >>> >>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Update: >>>> Here is the log when creating a volume using cirros image: >>>> >>>> 2023-03-22 11:04:38.449 109 INFO >>>> cinder.volume.flows.manager.create_volume >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>> specification: {'status': 'creating', 'volume_name': >>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> [{'url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'metadata': {'store': 'ceph'}}, {'url': >>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'metadata': {'store': 'ceph'}}, {'url': >>>> 
'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>> 'owner_specified.openstack.object': 'images/cirros', >>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>> } >>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>> >>> >>> As Adam Savage would say, well there's your problem ^^ (Image download >>> 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s >>> suggests you have a network issue. >>> >>> John Fulton previously stated your cinder-volume service at the edge >>> site is not using the local ceph image store. Assuming you are deploying >>> GlanceApiEdge service [1], then the cinder-volume service should be >>> configured to use the local glance service [2]. You should check cinder's >>> glance_api_servers to confirm it's the edge site's glance service. >>> >>> [1] >>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>> [2] >>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>> >>> Alan >>> >>> >>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>> FutureWarning: The human format is deprecated and the format parameter will >>>> be removed. Use explicitly json instead in version 'xena' >>>> category=FutureWarning) >>>> >>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>> FutureWarning: The human format is deprecated and the format parameter will >>>> be removed. Use explicitly json instead in version 'xena' >>>> category=FutureWarning) >>>> >>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>> MB/s >>>> 2023-03-22 11:11:14.998 109 INFO >>>> cinder.volume.flows.manager.create_volume >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>> >>>> The image is present in dcn02 store but still it downloaded the image >>>> in 0.16 MB/s and then created the volume. 
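Given Alan's pointer about GlanceApiEdge and the internal glance instance, two quick
checks on the dcn02 node might help narrow this down (the container name is an
assumption for a typical TripleO deployment; the ports come from the discussion above):

# which glance endpoint cinder-volume is actually configured to talk to
$ sudo podman exec cinder_volume grep glance_api_servers /etc/cinder/cinder.conf
# is anything listening locally on the glance ports (9292, or 9293 for the
# internal GlanceApiEdge instance) on the edge node?
$ sudo ss -tlnp | grep -E ':(9292|9293)'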
>>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Hi Jhon, >>>>> This seems to be an issue. >>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>> parameter was specified to the respective cluster names but the config >>>>> files were created in the name of ceph.conf and keyring was >>>>> ceph.client.openstack.keyring. >>>>> >>>>> Which created issues in glance as well as the naming convention of the >>>>> files didn't match the cluster names, so i had to manually rename the >>>>> central ceph conf file as such: >>>>> >>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>> [root at dcn02-compute-0 ceph]# ll >>>>> total 16 >>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>> ceph_central.client.openstack.keyring >>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>> [root at dcn02-compute-0 ceph]# >>>>> >>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>>> respective clusters in both dcn01 and dcn02. >>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>> for accessing central ceph cluster. >>>>> >>>>> glance multistore config: >>>>> [dcn02] >>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>> rbd_store_user=openstack >>>>> rbd_store_pool=images >>>>> rbd_thin_provisioning=False >>>>> store_description=dcn02 rbd glance store >>>>> >>>>> [ceph_central] >>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>> rbd_store_user=openstack >>>>> rbd_store_pool=images >>>>> rbd_thin_provisioning=False >>>>> store_description=Default glance store backend. >>>>> >>>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>> wrote: >>>>> >>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>> wrote: >>>>>> > >>>>>> > Hi, >>>>>> > Seems like cinder is not using the local ceph. >>>>>> >>>>>> That explains the issue. It's a misconfiguration. >>>>>> >>>>>> I hope this is not a production system since the mailing list now has >>>>>> the cinder.conf which contains passwords. >>>>>> >>>>>> The section that looks like this: >>>>>> >>>>>> [tripleo_ceph] >>>>>> volume_backend_name=tripleo_ceph >>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>> rbd_user=openstack >>>>>> rbd_pool=volumes >>>>>> rbd_flatten_volume_from_snapshot=False >>>>>> rbd_secret_uuid= >>>>>> report_discard_supported=True >>>>>> >>>>>> Should be updated to refer to the local DCN ceph cluster and not the >>>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>>> rbd_secret_uuid corresponds to that one. >>>>>> >>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>> secret-get-value $FSID`. >>>>>> >>>>>> The documentation describes how to configure the central and DCN sites >>>>>> correctly but an error seems to have occurred while you were following >>>>>> it. 
>>>>>> >>>>>> >>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>> >>>>>> John >>>>>> >>>>>> > >>>>>> > Ceph Output: >>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>> > NAME SIZE PARENT FMT >>>>>> PROT LOCK >>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>>> excl >>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 >>>>>> yes >>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 >>>>>> yes >>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 >>>>>> yes >>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 >>>>>> yes >>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 >>>>>> yes >>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 >>>>>> yes >>>>>> > >>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>> > NAME SIZE PARENT FMT >>>>>> PROT LOCK >>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>> > >>>>>> > Attached the cinder config. >>>>>> > Please let me know how I can solve this issue. >>>>>> > >>>>>> > With regards, >>>>>> > Swogat Pradhan >>>>>> > >>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>>> wrote: >>>>>> >> >>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>> config. >>>>>> >> >>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>> >>>>>> >>> Update: >>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>> >>> The image size is 389 MB. >>>>>> >>> >>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>> >>>>>> >>>> Hi Jhon, >>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>> after importing from the central site. >>>>>> >>>> But launching an instance normally fails as it takes a long time >>>>>> for the volume to get created. >>>>>> >>>> >>>>>> >>>> When launching an instance from volume the instance is getting >>>>>> created properly without any errors. >>>>>> >>>> >>>>>> >>>> I tried to cache images in nova using >>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>> but getting checksum failed error. >>>>>> >>>> >>>>>> >>>> With regards, >>>>>> >>>> Swogat Pradhan >>>>>> >>>> >>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>>>>> wrote: >>>>>> >>>>> >>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>> >>>>> wrote: >>>>>> >>>>> > >>>>>> >>>>> > Update: After restarting the nova services on the controller >>>>>> and running the deploy script on the edge site, I was able to launch the VM >>>>>> from volume. 
>>>>>> >>>>> > >>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>> for the volume to be created, whereas the image has already been imported >>>>>> to the edge glance. >>>>>> >>>>> >>>>>> >>>>> Try following this document and making the same observations in >>>>>> your >>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>> >>>>> >>>>>> >>>>> >>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>> >>>>> >>>>>> >>>>> On a DCN site if you run a command like this: >>>>>> >>>>> >>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>> >>>>> NAME SIZE PARENT >>>>>> >>>>> FMT PROT LOCK >>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>>>> >>>>> $ >>>>>> >>>>> >>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>> which is on >>>>>> >>>>> the same local ceph cluster. >>>>>> >>>>> >>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>> encountering >>>>>> >>>>> the streaming behavior described here: >>>>>> >>>>> >>>>>> >>>>> Ideally all images should reside in the central Glance and be >>>>>> copied >>>>>> >>>>> to DCN sites before instances of those images are booted on DCN >>>>>> sites. >>>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>>> then the >>>>>> >>>>> image will be streamed to the DCN site and then the image will >>>>>> boot as >>>>>> >>>>> an instance. This happens because Glance at the DCN site has >>>>>> access to >>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>> booting of >>>>>> >>>>> the image will take time because it has not been copied in >>>>>> advance, >>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>> >>>>> >>>>>> >>>>> You can also exec into the cinder container at the DCN site and >>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>> >>>>> >>>>>> >>>>> John >>>>>> >>>>> >>>>>> >>>>> > >>>>>> >>>>> > I will try and create a new fresh image and test again then >>>>>> update. >>>>>> >>>>> > >>>>>> >>>>> > With regards, >>>>>> >>>>> > Swogat Pradhan >>>>>> >>>>> > >>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>> >> >>>>>> >>>>> >> Update: >>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>> down. >>>>>> >>>>> >> >>>>>> >>>>> >> >>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>> >>> >>>>>> >>>>> >>> Hi Brendan, >>>>>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>>>>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>>>> timed out so i waited for the volume to be created. >>>>>> >>>>> >>> Once the volume was created i tried launching the instance >>>>>> from the volume and still the instance is stuck in spawning state. 
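Building on the rbd listing John suggests: in the `rbd -p volumes ls -l` output pasted earlier the PARENT column is empty, which is consistent with the streaming/full-copy behaviour described above rather than a COW clone of the local image. Two quick checks, with the edge cluster conf/keyring names given only as examples (use whatever your site actually ships):

    $ openstack image show <image-id> | grep stores
    # the edge store (e.g. dcn02) should already be listed before the volume is created

    $ sudo cephadm shell --config /etc/ceph/dcn02.conf \
        --keyring /etc/ceph/dcn02.client.admin.keyring
    $ rbd -p volumes info volume-<volume-uuid> | grep -E 'parent|size'
    # a COW clone shows a line like "parent: images/<image-id>@snap";
    # no parent line means the image was downloaded and converted instead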
>>>>>> >>>>> >>> >>>>>> >>>>> >>> Here is the nova-compute log: >>>>>> >>>>> >>> >>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>>>>> privsep daemon starting >>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>>>>> privsep process running with uid/gid: 0/0 >>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>>>> privsep process running with capabilities (eff/prm/inh): >>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>>>> privsep daemon running as pid 185437 >>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>> os_brick.initiator.connectors.nvmeof >>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>> >>>>> >>> Exit code: 2 >>>>>> >>>>> >>> Stdout: '' >>>>>> >>>>> >>> Stderr: '': >>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>> running command. >>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>> >>>>> >>> >>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>> template mentioned here ?: >>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>> >>>>> >>> >>>>>> >>>>> >>> The volume is already created and i do not understand why >>>>>> the instance is stuck in spawning state. >>>>>> >>>>> >>> >>>>>> >>>>> >>> With regards, >>>>>> >>>>> >>> Swogat Pradhan >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>> bshephar at redhat.com> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Does your environment use different network interfaces for >>>>>> each of the networks? Or does it have a bond with everything on it? >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>> instances, there is a lot of network traffic between nodes as the >>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>> other services sending normal network traffic, it can be enough to cause >>>>>> issues if everything is running over a single 1Gbe interface. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> I have seen the same situation in fact when using a single >>>>>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>>>>> while you try to spawn the instance to see if you?re dropping packets. In >>>>>> the situation I described, there were dropped packets which resulted in a >>>>>> loss of communication between nova_compute and RMQ, so the node appeared >>>>>> offline. You should also confirm that nova_compute is being disconnected in >>>>>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>>>>> instance. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. >>>>>> So, based on that experience, from my perspective, is certainly sounds like >>>>>> some kind of network issue. 
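To follow up on the dropped-packets theory in practice, the checks below can be run on the hypervisor while the instance is spawning. The bond name and log path are assumptions based on a typical TripleO layout; adjust them to your node:

    $ watch -n1 'ip -s link show bond1'        # the RX/TX "dropped" counters should stay flat
    $ cat /proc/net/bonding/bond1              # confirm 802.3ad actually negotiated on both slaves
    $ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'amqp|rabbit|heartbeat'

Drop counters climbing, or AMQP heartbeat/connection errors appearing at the same moment the hypervisor flips to "down", would point at the same network-level cause Brendan describes.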
>>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Regards, >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Brendan Shephard >>>>>> >>>>> >>>> Senior Software Engineer >>>>>> >>>>> >>>> Red Hat Australia >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Hi, >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> I tried to help someone with a similar issue some time ago >>>>>> in this thread: >>>>>> >>>>> >>>> >>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>>>>> user, not sure if that could apply here. But is it possible that your nova >>>>>> and neutron versions are different between central and edge site? Have you >>>>>> restarted nova and neutron services on the compute nodes after >>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>> Maybe they can help narrow down the issue. >>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>> logs I probably would start "tearing down" rabbitmq. I didn't have to do >>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>> running, this will most likely impact client IO depending on your load. >>>>>> Check out the rabbitmqctl commands. >>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>>>>> replicated across the rabbit nodes. But I don't really know the rabbit >>>>>> internals too well, so maybe someone else can chime in here and give a >>>>>> better advice. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Regards, >>>>>> >>>>> >>>> Eugen >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Hi, >>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> With regards, >>>>>> >>>>> >>>> Swogat Pradhan >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> >>>> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Hi >>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but >>>>>> not due to packet >>>>>> >>>>> >>>> loss. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> with regards, >>>>>> >>>>> >>>> Swogat Pradhan >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> >>>> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Hi, >>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>> checked when >>>>>> >>>>> >>>> launching the instance. >>>>>> >>>>> >>>> I will check that and come back. >>>>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck >>>>>> at spawning >>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure >>>>>> if packet loss >>>>>> >>>>> >>>> causes this. 
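Since the sites are linked by a tunnel, one simple way to separate an MTU/fragmentation problem from plain packet loss is a do-not-fragment ping from the edge compute node towards the central internal API network. 1472 bytes of ICMP payload corresponds to a full 1500-byte frame, so if the tunnel adds encapsulation overhead these pings can fail even though both ends are configured with the default MTU of 1500. The target address is a placeholder:

    $ ping -M do -s 1472 -c 50 <central-internal-api-ip>    # full-size 1500-byte frames
    $ ping -M do -s 1400 -c 50 <central-internal-api-ip>    # smaller probe to bracket the real path MTU

Loss or "message too long" errors only at the larger size would explain why large RPC replies and image traffic suffer while ordinary pings look clean.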
>>>>>> >>>>> >>>> >>>>>> >>>>> >>>> With regards, >>>>>> >>>>> >>>> Swogat pradhan >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>>>>> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>> identical between >>>>>> >>>>> >>>> central and edge site? Do you see packet loss through the >>>>>> tunnel? >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> > Hi Eugen, >>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>> 'cc' as i am not >>>>>> >>>>> >>>> > getting email's from you. >>>>>> >>>>> >>>> > Coming to the issue: >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>>> list_policies -p >>>>>> >>>>> >>>> / >>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>> priority >>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> >>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>>>>> when i am >>>>>> >>>>> >>>> trying >>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>> spawning state and >>>>>> >>>>> >>>> then >>>>>> >>>>> >>>> > gets stuck. >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>>>> sites. >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > With regards, >>>>>> >>>>> >>>> > Swogat Pradhan >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> >>>> > wrote: >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> >> Hi Eugen, >>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>> directly, i am >>>>>> >>>>> >>>> checking >>>>>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>>>>> >>>>> >>>> >> Here is the log for download: >>>>>> https://we.tl/t-L8FEkGZFSq >>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>> occurred. >>>>>> >>>>> >>>> >> >>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>> activities in the >>>>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>>>> >>>>> >>>> >> >>>>>> >>>>> >>>> >> With regards, >>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>> >>>>> >>>> >> >>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> >>>> >> wrote: >>>>>> >>>>> >>>> >> >>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>> >>>>> >>>> >>> Thanks for your response. 
>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>>>> details: >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>> >>>>> >>>> >>> >>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>> >>>>> >>>> Started >>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>> >>>>> >>>> Started >>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>> >>>>> >>>> Started >>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>> >>>>> >>>> Started >>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but >>>>>> the issue is >>>>>> >>>>> >>>> still >>>>>> >>>>> >>>> >>> present. >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>> cluster_status >>>>>> >>>>> >>>> >>> Cluster status of node >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>> ... >>>>>> >>>>> >>>> >>> Basics >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Cluster name: >>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Disk Nodes >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Running Nodes >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Versions >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>>>> RabbitMQ >>>>>> >>>>> >>>> 3.8.3 >>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>>>> RabbitMQ >>>>>> >>>>> >>>> 3.8.3 >>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>>>> RabbitMQ >>>>>> >>>>> >>>> 3.8.3 >>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>> >>>>> >>>> >>> >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>> >>>>> >>>> RabbitMQ >>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Alarms >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> (none) >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Network Partitions >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> (none) >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Listeners >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 
25672, protocol: clustering, purpose: >>>>>> inter-node and CLI >>>>>> >>>>> >>>> tool >>>>>> >>>>> >>>> >>> communication >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>>>> AMQP 0-9-1 >>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>> inter-node and CLI >>>>>> >>>>> >>>> tool >>>>>> >>>>> >>>> >>> communication >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>>>> AMQP 0-9-1 >>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>> inter-node and CLI >>>>>> >>>>> >>>> tool >>>>>> >>>>> >>>> >>> communication >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>>>> AMQP 0-9-1 >>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> , >>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>>> purpose: >>>>>> >>>>> >>>> inter-node and >>>>>> >>>>> >>>> >>> CLI tool communication >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> , >>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>>>>> purpose: AMQP >>>>>> >>>>> >>>> 0-9-1 >>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> , >>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>>>>> HTTP API >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Feature flags >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> *Logs:* >>>>>> >>>>> >>>> >>> *(Attached)* >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> With regards, >>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> 
>>>> >>> wrote: >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>>> Hi, >>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api >>>>>> log. >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>> The reply >>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after >>>>>> 60 seconds >>>>>> >>>>> >>>> due to a >>>>>> >>>>> >>>> >>>> missing queue >>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>> >>>>> >>>> Abandoning...: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> The reply >>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after >>>>>> 60 seconds >>>>>> >>>>> >>>> due to a >>>>>> >>>>> >>>> >>>> missing queue >>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>> >>>>> >>>> Abandoning...: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> The reply >>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after >>>>>> 60 seconds >>>>>> >>>>> >>>> due to a >>>>>> >>>>> >>>> >>>> missing queue >>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>> >>>>> >>>> Abandoning...: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Cache enabled >>>>>> >>>>> >>>> with >>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> The reply >>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after >>>>>> 60 seconds >>>>>> >>>>> >>>> due to a >>>>>> >>>>> >>>> >>>> missing queue >>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>> >>>>> >>>> Abandoning...: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>> With regards, >>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>>> Hi, >>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where >>>>>> i am trying to >>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>>>> (openstack >>>>>> >>>>> >>>> compute >>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart >>>>>> the nova >>>>>> >>>>> >>>> compute >>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>>>>> Running >>>>>> >>>>> >>>> >>>>> instance usage >>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>> 2023-02-26 07:00:00 >>>>>> >>>>> >>>> to >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> [instance: >>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>> successful on node >>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>> nova.virt.libvirt.driver >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> [instance: >>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>> supplied device >>>>>> >>>>> >>>> name: >>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>> names >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> [instance: >>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>>>> volume >>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Cache enabled >>>>>> >>>>> >>>> with >>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Running >>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>>> >>>>> >>>> 'privsep-helper', >>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>> '--config-file', >>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Spawned new >>>>>> >>>>> >>>> privsep >>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>> oslo.privsep.daemon [-] privsep >>>>>> >>>>> >>>> >>>>> daemon starting >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>> oslo.privsep.daemon [-] privsep >>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>> oslo.privsep.daemon [-] privsep >>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>> oslo.privsep.daemon [-] privsep >>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>> >>>>> 
>>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Process >>>>>> >>>>> >>>> >>>>> execution error >>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>>> command. >>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>> nova.virt.libvirt.driver >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> [instance: >>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> With regards, >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Wed Mar 22 21:07:32 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 02:37:32 +0530 Subject: Unable to create volume with image in edge site | Glance-Cinder | Openstack DCN | Wallaby Message-ID: Hi, I am creating a fresh thread for this glance issue. I have setup glance multistore for my infra. glance.yaml for dcn site: (overcloud) [stack at hkg2director ~]$ cat dcn02/glance_dcn02.yaml parameter_defaults: #GlanceShowMultipleLocations: true GlanceEnabledImportMethods: web-download,copy-image GlanceBackend: rbd GlanceStoreDescription: 'dcn02 rbd glance store' GlanceBackendID: dcn02 CephClusterName: dcn02 GlanceMultistoreConfig: ceph: GlanceBackend: rbd GlanceStoreDescription: 'Default glance store backend.' CephClusterName: ceph Now i have created a cirros image and have imported it to dcn store using copy-image method. When I create an empty volume in the DCN site the volume gets created without any issues. But when I create a volume with image (volume source) the volume gets stuck in the creating state forever. I get logs in cinder-volume 3-4 mins after I have hit the create volume button. 
Cinder logs: 2023-03-22 20:34:59.166 108 INFO cinder.volume.flows.manager.create_volume [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } I checked both glance and cinder containers are running in a healthy state. I see no errors or whatsoever. I am not sure how to fix the cinder volume stuck in the creating state in the DCN edge site. With regards, Swogat Pradhan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From swogatpradhan22 at gmail.com Wed Mar 22 21:15:55 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 02:45:55 +0530 Subject: Unable to create volume with image in edge site | Glance-Cinder | Openstack DCN | Wallaby In-Reply-To: References: Message-ID: Cinder volume config: [tripleo_ceph] volume_backend_name=tripleo_ceph volume_driver=cinder.volume.drivers.rbd.RBDDriver rbd_user=openstack rbd_pool=volumes rbd_flatten_volume_from_snapshot=False rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b report_discard_supported=True rbd_ceph_conf=/etc/ceph/dcn02.conf rbd_cluster_name=dcn02 Glance api config: [dcn02] rbd_store_ceph_conf=/etc/ceph/dcn02.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=dcn02 rbd glance store [ceph] rbd_store_ceph_conf=/etc/ceph/ceph.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=Default glance store backend. On Thu, Mar 23, 2023 at 2:37?AM Swogat Pradhan wrote: > Hi, > I am creating a fresh thread for this glance issue. > I have setup glance multistore for my infra. > > glance.yaml for dcn site: > > (overcloud) [stack at hkg2director ~]$ cat dcn02/glance_dcn02.yaml > parameter_defaults: > #GlanceShowMultipleLocations: true > GlanceEnabledImportMethods: web-download,copy-image > GlanceBackend: rbd > GlanceStoreDescription: 'dcn02 rbd glance store' > GlanceBackendID: dcn02 > CephClusterName: dcn02 > GlanceMultistoreConfig: > ceph: > GlanceBackend: rbd > GlanceStoreDescription: 'Default glance store backend.' > CephClusterName: ceph > > Now i have created a cirros image and have imported it to dcn store using > copy-image method. When I create an empty volume in the DCN site the volume > gets created without any issues. > But when I create a volume with image (volume source) the volume gets > stuck in the creating state forever. I get logs in cinder-volume 3-4 mins > after I have hit the create volume button. 
> > Cinder logs: > 2023-03-22 20:34:59.166 108 INFO cinder.volume.flows.manager.create_volume > [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, > 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': > datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > > I checked both glance and cinder containers are running in a healthy state. > I see no errors or whatsoever. I am not sure how to fix the cinder volume > stuck in the creating state in the DCN edge site. > > With regards, > Swogat Pradhan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 23 07:40:16 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 23 Mar 2023 16:40:16 +0900 Subject: [heat][PTG] 2023.2 (Bobcat) PTG Planning In-Reply-To: References: Message-ID: Hello, It seems all of the attendees who have signed up are based around APAC so I allocated the bexer room for 4 UTC ~ 7 UTC slot on Wednesday. I updated the schedule etherpad based on the items added to the planning etherpad but in case you have anything you want to add then please let me know. https://etherpad.opendev.org/p/march2023-ptg-heat Thank you, Takashi On Mon, Mar 13, 2023 at 4:39?PM Takashi Kajinami wrote: > Hello, > > > I've signed up for the upcoming virtual PTG so that we can have some slots > for Heat discussion. 
> In case you are interested in attending the sessions or have any topics > you want to discuss, > please put your name and the proposed topics in the etherpad. > https://etherpad.opendev.org/p/march2023-ptg-heat-planning > > It'd be nice if we can update the planning etherpad this week so that I'll > fix our slots and topics > early next week. > > Thank you, > Takashi Kajinami > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Thu Mar 23 07:44:44 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 14:44:44 +0700 Subject: [nova]host cpu reserve Message-ID: Hello guys. I am trying google for nova host cpu reserve to prevent host overload but I cannot find any resource about it. Could you give me some information? Thanks. Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From yasufum.o at gmail.com Thu Mar 23 07:58:55 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Thu, 23 Mar 2023 16:58:55 +0900 Subject: [tacker][ptg] Bobcat vPTG Planning In-Reply-To: <79abe530-5ce0-1ad1-d3f6-4cb61cc970cf@gmail.com> References: <79abe530-5ce0-1ad1-d3f6-4cb61cc970cf@gmail.com> Message-ID: <1d53b57b-6de0-91db-fcc5-756b4a43b2ab@gmail.com> Hi all, As hiromu proposed a cross-project session for a new feature in keystone middleware [1][2], we've setup a etherpad for the discussion[3]. Please everyone add your name to attendees if you're going to join the session. The time slot will be fixed soon. [1] https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032138.html [2] https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032791.html [3] https://etherpad.opendev.org/p/bobcat-ptg-ext_oauth2_server Thanks, Yasufumi On 2023/03/21 4:38, Yasufumi Ogawa wrote: > Hi team, > > We are going to have the Bobcat vPTG through three days, 28-30 Mar > 6am-8am UTC as agreed at the IRC meeting last week. I've booked rooms > for the sessions and uploaded etherpad [1]. Please feel free to add your > proposal on the etherpad. > > [1] https://etherpad.opendev.org/p/tacker-bobcat-ptg > > Thanks, > Yasufumi From mkopec at redhat.com Thu Mar 23 08:51:19 2023 From: mkopec at redhat.com (Martin Kopec) Date: Thu, 23 Mar 2023 09:51:19 +0100 Subject: [qa][ptg] Virtual Bobcat vPTG Planning In-Reply-To: References: Message-ID: Based on the responses in the pool [2], I booked 2 one hour sessions: * Wed 14-15 UTC @ kilo * Wed 17-18 UTC @ kilo On Fri, 17 Mar 2023 at 14:20, Martin Kopec wrote: > Hello everyone, > > here is [1] our etherpad for the 2023.2 Bobcat PTG. Please, add your > topics there if there is anything you would like to discuss / propose ... > You can also vote for time slots for our sessions so that they fit your > schedule at [2]. > > We will go most likely with 1-hour slot per day, as they usually fit > easier into everyone's schedule. The number of slots will depend on the > number of topics proposed in [1]. > > [1] https://etherpad.opendev.org/p/qa-bobcat-ptg > [2] https://framadate.org/sLZppMVkFw2FcEhO > > Thanks, > -- > Martin Kopec > Senior Software Quality Engineer > Red Hat EMEA > IM: kopecmartin > > > > -- Martin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tkajinam at redhat.com Thu Mar 23 09:15:16 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 23 Mar 2023 18:15:16 +0900 Subject: [oslo][heat][masakari][senlin][venus][all] oslo.db 13.0.0 will remove sqlalchemy-migrate support In-Reply-To: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> References: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> Message-ID: Thank you for the heads up, Stephen. Today I spent some time attempting to remove the dependency on sqlalchemy-migrate from heat. I've pushed current patch sets but so far these seem to be working (according to CI). https://review.opendev.org/q/topic:alembic+project:openstack/heat We'll try to get these merged ASAP so that we can bump oslo.db timely after the new version without sqlalchemy support is released. If you have time to help reviewing these, that would be much appreciated. Thank you, Takashi On Thu, Mar 23, 2023 at 1:43?AM Stephen Finucane wrote: > tl;dr: Projects still relying on sqlalchemy-migrate for migrations need to > start > their switch to alembic immediately. Projects with "legacy" > sqlalchemy-migrated > based migrations need to drop them. > > A quick heads up that oslo.db 13.0.0 will be release in the next month or > so and > will remove sqlalchemy-migrate support and formally add support for > sqlalchemy > 2.x. The removal of sqlalchemy-migrate support should only affect projects > using > oslo.db's sqlalchemy-migrate wrappers, as opposed to using > sqlalchemy-migrate > directly. For any projects that rely on this functionality, a short-term > fix is > to vendor the removed code [1] in your project. However, I must emphasise > that > we're not removing sqlalchemy-migrate integration for the fun of it: it's > not > compatible with sqlalchemy 2.x and is no longer maintained. If your > project uses > sqlalchemy-migrate and you haven't migrated to alembic yet, you need to > start > doing so immediately. If you have migrated to alembic but still have > sqlalchemy- > migrate "legacy" migrations in-tree, you need to look at dropping these > asap. > Anything less will result in broken master when we bump upper-constraints > to > allow sqlalchemy 2.x in Bobcat. I've listed projects in $subject that > appear to > be using the removed modules. > > For more advice on migrating to sqlalchemy 2.x and alembic, please look at > my > previous post on the matter [2]. > > Cheers, > Stephen > > [1] https://review.opendev.org/c/openstack/oslo.db/+/853025 > [2] > https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 23 09:16:11 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 23 Mar 2023 18:16:11 +0900 Subject: [oslo][heat][masakari][senlin][venus][all] oslo.db 13.0.0 will remove sqlalchemy-migrate support In-Reply-To: References: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> Message-ID: On Thu, Mar 23, 2023 at 6:15?PM Takashi Kajinami wrote: > Thank you for the heads up, Stephen. > > Today I spent some time attempting to remove the dependency on > sqlalchemy-migrate from heat. > I've pushed current patch sets but so far these seem to be working > (according to CI). > https://review.opendev.org/q/topic:alembic+project:openstack/heat > > We'll try to get these merged ASAP so that we can bump oslo.db timely > after the new version without > sqlalchemy support is released. 
If you have time to help reviewing these, > that would be much appreciated. > tiny but important correction the new version without "sqlalchemy-migrate" support > > Thank you, > Takashi > > > > On Thu, Mar 23, 2023 at 1:43?AM Stephen Finucane > wrote: > >> tl;dr: Projects still relying on sqlalchemy-migrate for migrations need >> to start >> their switch to alembic immediately. Projects with "legacy" >> sqlalchemy-migrated >> based migrations need to drop them. >> >> A quick heads up that oslo.db 13.0.0 will be release in the next month or >> so and >> will remove sqlalchemy-migrate support and formally add support for >> sqlalchemy >> 2.x. The removal of sqlalchemy-migrate support should only affect >> projects using >> oslo.db's sqlalchemy-migrate wrappers, as opposed to using >> sqlalchemy-migrate >> directly. For any projects that rely on this functionality, a short-term >> fix is >> to vendor the removed code [1] in your project. However, I must emphasise >> that >> we're not removing sqlalchemy-migrate integration for the fun of it: it's >> not >> compatible with sqlalchemy 2.x and is no longer maintained. If your >> project uses >> sqlalchemy-migrate and you haven't migrated to alembic yet, you need to >> start >> doing so immediately. If you have migrated to alembic but still have >> sqlalchemy- >> migrate "legacy" migrations in-tree, you need to look at dropping these >> asap. >> Anything less will result in broken master when we bump upper-constraints >> to >> allow sqlalchemy 2.x in Bobcat. I've listed projects in $subject that >> appear to >> be using the removed modules. >> >> For more advice on migrating to sqlalchemy 2.x and alembic, please look >> at my >> previous post on the matter [2]. >> >> Cheers, >> Stephen >> >> [1] https://review.opendev.org/c/openstack/oslo.db/+/853025 >> [2] >> https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 23 12:09:55 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 23 Mar 2023 12:09:55 +0000 Subject: [nova]host cpu reserve In-Reply-To: References: Message-ID: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> generally you should not you can use it but the preferd way to do this is use cpu_shared_set and cpu_dedicated_set (in old releases you would have used vcpu_pin_set) https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set if you dont need cpu pinning just use cpu_share_set to spcify the cores that can be sued for floatign vms when you use cpu_shared_set and cpu_dedicated_set any cpu not specified are reseved for host use. 
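As a rough illustration of the cpu_shared_set approach described above (the thread numbers are only an example for one 8-core, 16-thread socket; check `lscpu -e` or /sys/devices/system/cpu/cpu*/topology/thread_siblings_list for your real sibling layout):

    [compute]
    # threads 0 and 8 are core 0 and its hyperthread sibling in this example; leaving them
    # out of both sets keeps them reserved for the host OS, with no reserved_host_cpus needed
    cpu_shared_set = 1-7,9-15
    # only set this as well if you also run pinned instances (hw:cpu_policy=dedicated),
    # e.g. cpu_dedicated_set = 4-7,12-15 with cpu_shared_set shrunk so the two sets don't overlap

With a layout like this, any instance without a dedicated CPU policy floats over the shared set, and nothing nova schedules can land on the host-reserved threads.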
https://that.guru/blog/cpu-resources/ and https://that.guru/blog/cpu-resources-redux/ have some useful info but that mostly looking at it form a cpu pinning angel althoguh the secon one covers cpu_shared_set, the issue with usein https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus is that you have to multiple the number of cores that are resverved by the https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio which means if you decide to manage that via placement api by using https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio instead then you need to update your nova.conf to modify the reservationfi you change the allocation ratio. if instead you use cpu_shared_set and cpu_dedicated_set you are specifying exactly which cpus nova can use and the allocation ration nolonger needs to be conisderd. in general you shoudl reserve the first core on each cpu socket for the host os. if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted form the cpu_shared_set and cpu_dedicated_set On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > Hello guys. > I am trying google for nova host cpu reserve to prevent host overload but I > cannot find any resource about it. Could you give me some information? > Thanks. > Nguyen Huu Khoi From swogatpradhan22 at gmail.com Wed Mar 22 21:16:04 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 02:46:04 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Cinder volume config: [tripleo_ceph] volume_backend_name=tripleo_ceph volume_driver=cinder.volume.drivers.rbd.RBDDriver rbd_user=openstack rbd_pool=volumes rbd_flatten_volume_from_snapshot=False rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b report_discard_supported=True rbd_ceph_conf=/etc/ceph/dcn02.conf rbd_cluster_name=dcn02 Glance api config: [dcn02] rbd_store_ceph_conf=/etc/ceph/dcn02.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=dcn02 rbd glance store [ceph] rbd_store_ceph_conf=/etc/ceph/ceph.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=Default glance store backend. On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan wrote: > I still have the same issue, I'm not sure what's left to try. > All the pods are now in a healthy state, I am getting log entries 3 mins > after I hit the create volume button in cinder-volume when I try to create > a volume with an image. > And the volumes are just stuck in creating state for more than 20 mins now. > > Cinder logs: > 2023-03-22 20:32:44.010 108 INFO cinder.rpc > [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected > cinder-volume RPC version 3.17 as minimum service version. 
> 2023-03-22 20:34:59.166 108 INFO cinder.volume.flows.manager.create_volume > [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, > 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': > datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > > With regards, > Swogat Pradhan > > On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: > >> >> >> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan >> wrote: >> >>> Hi Adam, >>> The systems are in same LAN, in this case it seemed like the image was >>> getting pulled from the central site which was caused due to an >>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>> directory, which seems to have been resolved after the changes i made to >>> fix it. >>> >>> Right now the glance api podman is running in unhealthy state and the >>> podman logs don't show any error whatsoever and when issued the command >>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>> site, which is why cinder is throwing an error stating: >>> >>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>> finding address for >>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>> Unable to establish connection to >>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>> NewConnectionError('>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>> ECONNREFUSED',)) >>> >>> Now i need to find out why the port is not listed as the glance service >>> is running, which i am not sure how to find out. >>> >> >> One other thing to investigate is whether your deployment includes this >> patch [1]. If it does, then bear in mind >> the glance-api service running at the edge site will be an "internal" >> (non public facing) instance that uses port 9293 >> instead of 9292. You should familiarize yourself with the release note >> [2]. >> >> [1] >> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >> [2] >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >> >> Alan >> >> >>> With regards, >>> Swogat Pradhan >>> >>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: >>> >>>> >>>> >>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Update: >>>>> Here is the log when creating a volume using cirros image: >>>>> >>>>> 2023-03-22 11:04:38.449 109 INFO >>>>> cinder.volume.flows.manager.create_volume >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>> specification: {'status': 'creating', 'volume_name': >>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>> 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>> } >>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>> >>>> >>>> As Adam Savage would say, well there's your problem ^^ (Image download >>>> 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s >>>> suggests you have a network issue. >>>> >>>> John Fulton previously stated your cinder-volume service at the edge >>>> site is not using the local ceph image store. Assuming you are deploying >>>> GlanceApiEdge service [1], then the cinder-volume service should be >>>> configured to use the local glance service [2]. You should check cinder's >>>> glance_api_servers to confirm it's the edge site's glance service. >>>> >>>> [1] >>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>> [2] >>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>> >>>> Alan >>>> >>>> >>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>> be removed. Use explicitly json instead in version 'xena' >>>>> category=FutureWarning) >>>>> >>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>> be removed. Use explicitly json instead in version 'xena' >>>>> category=FutureWarning) >>>>> >>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>> MB/s >>>>> 2023-03-22 11:11:14.998 109 INFO >>>>> cinder.volume.flows.manager.create_volume >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. 
>>>>> >>>>> The image is present in dcn02 store but still it downloaded the image >>>>> in 0.16 MB/s and then created the volume. >>>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> Hi Jhon, >>>>>> This seems to be an issue. >>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>>> parameter was specified to the respective cluster names but the config >>>>>> files were created in the name of ceph.conf and keyring was >>>>>> ceph.client.openstack.keyring. >>>>>> >>>>>> Which created issues in glance as well as the naming convention of >>>>>> the files didn't match the cluster names, so i had to manually rename the >>>>>> central ceph conf file as such: >>>>>> >>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>> total 16 >>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>> ceph_central.client.openstack.keyring >>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>> [root at dcn02-compute-0 ceph]# >>>>>> >>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>>>> respective clusters in both dcn01 and dcn02. >>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>> for accessing central ceph cluster. >>>>>> >>>>>> glance multistore config: >>>>>> [dcn02] >>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>> rbd_store_user=openstack >>>>>> rbd_store_pool=images >>>>>> rbd_thin_provisioning=False >>>>>> store_description=dcn02 rbd glance store >>>>>> >>>>>> [ceph_central] >>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>> rbd_store_user=openstack >>>>>> rbd_store_pool=images >>>>>> rbd_thin_provisioning=False >>>>>> store_description=Default glance store backend. >>>>>> >>>>>> >>>>>> With regards, >>>>>> Swogat Pradhan >>>>>> >>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>> wrote: >>>>>> >>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>> wrote: >>>>>>> > >>>>>>> > Hi, >>>>>>> > Seems like cinder is not using the local ceph. >>>>>>> >>>>>>> That explains the issue. It's a misconfiguration. >>>>>>> >>>>>>> I hope this is not a production system since the mailing list now has >>>>>>> the cinder.conf which contains passwords. >>>>>>> >>>>>>> The section that looks like this: >>>>>>> >>>>>>> [tripleo_ceph] >>>>>>> volume_backend_name=tripleo_ceph >>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>> rbd_user=openstack >>>>>>> rbd_pool=volumes >>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>> rbd_secret_uuid= >>>>>>> report_discard_supported=True >>>>>>> >>>>>>> Should be updated to refer to the local DCN ceph cluster and not the >>>>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>> >>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. 
This >>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>> secret-get-value $FSID`. >>>>>>> >>>>>>> The documentation describes how to configure the central and DCN >>>>>>> sites >>>>>>> correctly but an error seems to have occurred while you were >>>>>>> following >>>>>>> it. >>>>>>> >>>>>>> >>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>> >>>>>>> John >>>>>>> >>>>>>> > >>>>>>> > Ceph Output: >>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>> > NAME SIZE PARENT FMT >>>>>>> PROT LOCK >>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>>>> excl >>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 >>>>>>> yes >>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 >>>>>>> yes >>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 >>>>>>> yes >>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 >>>>>>> yes >>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 >>>>>>> yes >>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 >>>>>>> yes >>>>>>> > >>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>> > NAME SIZE PARENT FMT >>>>>>> PROT LOCK >>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>> > >>>>>>> > Attached the cinder config. >>>>>>> > Please let me know how I can solve this issue. >>>>>>> > >>>>>>> > With regards, >>>>>>> > Swogat Pradhan >>>>>>> > >>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>>>> wrote: >>>>>>> >> >>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>> config. >>>>>>> >> >>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>> >>>>>>> >>> Update: >>>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>>> >>> The image size is 389 MB. >>>>>>> >>> >>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>> >>>>>>> >>>> Hi Jhon, >>>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>>> after importing from the central site. >>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>> time for the volume to get created. >>>>>>> >>>> >>>>>>> >>>> When launching an instance from volume the instance is getting >>>>>>> created properly without any errors. >>>>>>> >>>> >>>>>>> >>>> I tried to cache images in nova using >>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>> but getting checksum failed error. 
>>>>>>> >>>> >>>>>>> >>>> With regards, >>>>>>> >>>> Swogat Pradhan >>>>>>> >>>> >>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>> johfulto at redhat.com> wrote: >>>>>>> >>>>> >>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>> >>>>> wrote: >>>>>>> >>>>> > >>>>>>> >>>>> > Update: After restarting the nova services on the controller >>>>>>> and running the deploy script on the edge site, I was able to launch the VM >>>>>>> from volume. >>>>>>> >>>>> > >>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>> to the edge glance. >>>>>>> >>>>> >>>>>>> >>>>> Try following this document and making the same observations >>>>>>> in your >>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>> >>>>> >>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>> >>>>> >>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>> >>>>> NAME SIZE PARENT >>>>>>> >>>>> FMT PROT LOCK >>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>>>>> >>>>> $ >>>>>>> >>>>> >>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>> which is on >>>>>>> >>>>> the same local ceph cluster. >>>>>>> >>>>> >>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>> encountering >>>>>>> >>>>> the streaming behavior described here: >>>>>>> >>>>> >>>>>>> >>>>> Ideally all images should reside in the central Glance and be >>>>>>> copied >>>>>>> >>>>> to DCN sites before instances of those images are booted on >>>>>>> DCN sites. >>>>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>>>> then the >>>>>>> >>>>> image will be streamed to the DCN site and then the image will >>>>>>> boot as >>>>>>> >>>>> an instance. This happens because Glance at the DCN site has >>>>>>> access to >>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>> booting of >>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>> advance, >>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>> >>>>> >>>>>>> >>>>> You can also exec into the cinder container at the DCN site and >>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>> >>>>> >>>>>>> >>>>> John >>>>>>> >>>>> >>>>>>> >>>>> > >>>>>>> >>>>> > I will try and create a new fresh image and test again then >>>>>>> update. >>>>>>> >>>>> > >>>>>>> >>>>> > With regards, >>>>>>> >>>>> > Swogat Pradhan >>>>>>> >>>>> > >>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>> >> >>>>>>> >>>>> >> Update: >>>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>>> down. 
>>>>>>> >>>>> >> >>>>>>> >>>>> >> >>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Hi Brendan, >>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>>>>>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>>>>> timed out so i waited for the volume to be created. >>>>>>> >>>>> >>> Once the volume was created i tried launching the instance >>>>>>> from the volume and still the instance is stuck in spawning state. >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon >>>>>>> [-] privsep daemon starting >>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon >>>>>>> [-] privsep process running with uid/gid: 0/0 >>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>> [-] privsep process running with capabilities (eff/prm/inh): >>>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>> [-] privsep daemon running as pid 185437 >>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>> >>>>> >>> Exit code: 2 >>>>>>> >>>>> >>> Stdout: '' >>>>>>> >>>>> >>> Stderr: '': >>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>> running command. >>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>> template mentioned here ?: >>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> The volume is already created and i do not understand why >>>>>>> the instance is stuck in spawning state. >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> With regards, >>>>>>> >>>>> >>> Swogat Pradhan >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>> bshephar at redhat.com> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Does your environment use different network interfaces >>>>>>> for each of the networks? Or does it have a bond with everything on it? >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>> single active/backup bond on 1Gbe nics. 
It?s worth checking the network >>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>> while spawning the instance. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. >>>>>>> So, based on that experience, from my perspective, is certainly sounds like >>>>>>> some kind of network issue. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Regards, >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Brendan Shephard >>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>> >>>>> >>>> Red Hat Australia >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Hi, >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> I tried to help someone with a similar issue some time >>>>>>> ago in this thread: >>>>>>> >>>>> >>>> >>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>>>>>> user, not sure if that could apply here. But is it possible that your nova >>>>>>> and neutron versions are different between central and edge site? Have you >>>>>>> restarted nova and neutron services on the compute nodes after >>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>> Maybe they can help narrow down the issue. >>>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>>> logs I probably would start "tearing down" rabbitmq. I didn't have to do >>>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>>> running, this will most likely impact client IO depending on your load. >>>>>>> Check out the rabbitmqctl commands. >>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>>>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>> a better advice. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Regards, >>>>>>> >>>>> >>>> Eugen >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Hi, >>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> With regards, >>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Hi >>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but >>>>>>> not due to packet >>>>>>> >>>>> >>>> loss. 
>>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> with regards, >>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Hi, >>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>> checked when >>>>>>> >>>>> >>>> launching the instance. >>>>>>> >>>>> >>>> I will check that and come back. >>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>> stuck at spawning >>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure >>>>>>> if packet loss >>>>>>> >>>>> >>>> causes this. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> With regards, >>>>>>> >>>>> >>>> Swogat pradhan >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>>>>>> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>> identical between >>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through the >>>>>>> tunnel? >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>>> 'cc' as i am not >>>>>>> >>>>> >>>> > getting email's from you. >>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>>>> list_policies -p >>>>>>> >>>>> >>>> / >>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>> priority >>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> >>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>> down when i am >>>>>>> >>>>> >>>> trying >>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>> spawning state and >>>>>>> >>>>> >>>> then >>>>>>> >>>>> >>>> > gets stuck. >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>>>>> sites. >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > With regards, >>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> > wrote: >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>> directly, i am >>>>>>> >>>>> >>>> checking >>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>> reply. >>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>> occurred. 
>>>>>>> >>>>> >>>> >> >>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>> activities in the >>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>>>>> >>>>> >>>> >> >>>>>>> >>>>> >>>> >> With regards, >>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>> >>>>> >>>> >> >>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> >> wrote: >>>>>>> >>>>> >>>> >> >>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>>>>> details: >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>> >>>>> >>>> >>> >>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>> >>>>> >>>> Started >>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>> >>>>> >>>> Started >>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>> >>>>> >>>> Started >>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>> >>>>> >>>> Started >>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but >>>>>>> the issue is >>>>>>> >>>>> >>>> still >>>>>>> >>>>> >>>> >>> present. >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>> cluster_status >>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>> >>>>> >>>> >>> Basics >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Versions >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>> >>>>> >>>> 3.8.3 >>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>> >>>>> >>>> 3.8.3 >>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>> >>>>> >>>> 3.8.3 >>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>> >>>>> >>>> RabbitMQ >>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Alarms >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> (none) >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> (none) >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Listeners >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>> inter-node and CLI >>>>>>> >>>>> >>>> tool >>>>>>> >>>>> >>>> >>> communication >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>>>>> AMQP 0-9-1 >>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>> inter-node and CLI >>>>>>> >>>>> >>>> tool >>>>>>> >>>>> >>>> >>> communication >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>>>>> AMQP 0-9-1 >>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 15672, 
protocol: http, purpose: HTTP API >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>> inter-node and CLI >>>>>>> >>>>> >>>> tool >>>>>>> >>>>> >>>> >>> communication >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>>>>> AMQP 0-9-1 >>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> , >>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>>>> purpose: >>>>>>> >>>>> >>>> inter-node and >>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> , >>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>> amqp, purpose: AMQP >>>>>>> >>>>> >>>> 0-9-1 >>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> , >>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>> purpose: HTTP API >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Feature flags >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> With regards, >>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> >>> wrote: >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>>> Hi, >>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api >>>>>>> log. 
>>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>> The reply >>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>> after 60 seconds >>>>>>> >>>>> >>>> due to a >>>>>>> >>>>> >>>> >>>> missing queue >>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>> >>>>> >>>> Abandoning...: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> The reply >>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>> after 60 seconds >>>>>>> >>>>> >>>> due to a >>>>>>> >>>>> >>>> >>>> missing queue >>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>> >>>>> >>>> Abandoning...: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> The reply >>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>> after 60 seconds >>>>>>> >>>>> >>>> due to a >>>>>>> >>>>> >>>> >>>> missing queue >>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>> >>>>> >>>> Abandoning...: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Cache enabled >>>>>>> >>>>> >>>> with >>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> The reply >>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>> after 60 seconds >>>>>>> >>>>> >>>> due to a >>>>>>> >>>>> >>>> >>>> missing queue >>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>> >>>>> >>>> Abandoning...: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>> With regards, >>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>> where i am trying to >>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>>>>> (openstack >>>>>>> >>>>> >>>> compute >>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart >>>>>>> the nova >>>>>>> >>>>> >>>> compute >>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - >>>>>>> -] Running >>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>> 2023-02-26 07:00:00 >>>>>>> >>>>> >>>> to >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> [instance: >>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>> successful on node >>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>> nova.virt.libvirt.driver >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> [instance: >>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>> supplied device >>>>>>> >>>>> >>>> name: >>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>>> names >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>> nova.virt.block_device >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> [instance: >>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>>>>> volume >>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Cache enabled >>>>>>> >>>>> >>>> with >>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Running >>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>> '--config-file', >>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>> '--privsep_sock_path', >>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Spawned new >>>>>>> >>>>> >>>> privsep >>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>> oslo.privsep.daemon [-] privsep >>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>> oslo.privsep.daemon [-] privsep >>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>> oslo.privsep.daemon [-] privsep >>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>> oslo.privsep.daemon [-] privsep >>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Process >>>>>>> >>>>> >>>> >>>>> execution error >>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>>>> command. >>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>> nova.virt.libvirt.driver >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> [instance: >>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From renliang at uniontech.com Thu Mar 23 07:01:05 2023
From: renliang at uniontech.com (任亮)
Date: Thu, 23 Mar 2023 15:01:05 +0800
Subject: [ironic]Questions about the use of the build image tool
Message-ID: 

Hi,
We are using diskimage-builder to make a custom image and there is a problem in extract_image. The error is: chroot: failed to run command 'bin/tar': No such file or directory. I found the /bin/tar file in the working directory /tmp/tmp.00yldwe1xt, but the error is still reported. It's not clear whether this is a problem with our custom image, or whether there are requirements that a custom image has to meet.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.log
Type: application/octet-stream
Size: 21357 bytes
Desc: not available
URL: 

From renliang at uniontech.com Thu Mar 23 08:41:00 2023
From: renliang at uniontech.com (任亮)
Date: Thu, 23 Mar 2023 16:41:00 +0800
Subject: Re: [ironic]Questions about the use of the build image tool
Message-ID: 

I have found the cause of the problem: tar does not exist in the working directory. I wonder whether there are requirements on the base image when building from a base image. Thank you.

From swogatpradhan22 at gmail.com Thu Mar 23 12:20:35 2023
From: swogatpradhan22 at gmail.com (Swogat Pradhan)
Date: Thu, 23 Mar 2023 17:50:35 +0530
Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo
In-Reply-To: 
References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com>
Message-ID: 

Hi,
Is this bind not required for the cinder_scheduler container?
"/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind",
I do not see this particular bind on the cinder scheduler containers on my controller nodes.

With regards,
Swogat Pradhan

On Thu, Mar 23, 2023 at 2:46 AM Swogat Pradhan wrote: > Cinder volume config: > > [tripleo_ceph] > volume_backend_name=tripleo_ceph > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_user=openstack > rbd_pool=volumes > rbd_flatten_volume_from_snapshot=False > rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b > report_discard_supported=True > rbd_ceph_conf=/etc/ceph/dcn02.conf > rbd_cluster_name=dcn02 > > Glance api config: > > [dcn02] > rbd_store_ceph_conf=/etc/ceph/dcn02.conf > rbd_store_user=openstack > rbd_store_pool=images > rbd_thin_provisioning=False > store_description=dcn02 rbd glance store > [ceph] > rbd_store_ceph_conf=/etc/ceph/ceph.conf > rbd_store_user=openstack > rbd_store_pool=images > rbd_thin_provisioning=False > store_description=Default glance store backend. > > On Thu, Mar 23, 2023 at 2:29 AM Swogat Pradhan > wrote: > >> I still have the same issue, I'm not sure what's left to try. >> All the pods are now in a healthy state, I am getting log entries 3 mins >> after I hit the create volume button in cinder-volume when I try to create >> a volume with an image. >> And the volumes are just stuck in creating state for more than 20 mins >> now.
>> >> Cinder logs: >> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >> cinder-volume RPC version 3.17 as minimum service version. >> 2023-03-22 20:34:59.166 108 INFO >> cinder.volume.flows.manager.create_volume >> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >> specification: {'status': 'creating', 'volume_name': >> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >> tzinfo=datetime.timezone.utc), 'locations': [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'metadata': {'store': 'dcn02'}}], 'direct_url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >> 'owner_specified.openstack.object': 'images/cirros', >> 'owner_specified.openstack.sha256': ''}}, 'image_service': >> } >> >> With regards, >> Swogat Pradhan >> >> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: >> >>> >>> >>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Hi Adam, >>>> The systems are in same LAN, in this case it seemed like the image was >>>> getting pulled from the central site which was caused due to an >>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>> directory, which seems to have been resolved after the changes i made to >>>> fix it. >>>> >>>> Right now the glance api podman is running in unhealthy state and the >>>> podman logs don't show any error whatsoever and when issued the command >>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>> site, which is why cinder is throwing an error stating: >>>> >>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>> finding address for >>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>> Unable to establish connection to >>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>> NewConnectionError('>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>> ECONNREFUSED',)) >>>> >>>> Now i need to find out why the port is not listed as the glance service >>>> is running, which i am not sure how to find out. >>>> >>> >>> One other thing to investigate is whether your deployment includes this >>> patch [1]. If it does, then bear in mind >>> the glance-api service running at the edge site will be an "internal" >>> (non public facing) instance that uses port 9293 >>> instead of 9292. You should familiarize yourself with the release note >>> [2]. >>> >>> [1] >>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>> [2] >>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>> >>> Alan >>> >>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: >>>> >>>>> >>>>> >>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> Update: >>>>>> Here is the log when creating a volume using cirros image: >>>>>> >>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>> cinder.volume.flows.manager.create_volume >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>> specification: {'status': 'creating', 'volume_name': >>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> [{'url': >>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>> 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>> } >>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>> >>>>> >>>>> As Adam Savage would say, well there's your problem ^^ (Image download >>>>> 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s >>>>> suggests you have a network issue. >>>>> >>>>> John Fulton previously stated your cinder-volume service at the edge >>>>> site is not using the local ceph image store. Assuming you are deploying >>>>> GlanceApiEdge service [1], then the cinder-volume service should be >>>>> configured to use the local glance service [2]. You should check cinder's >>>>> glance_api_servers to confirm it's the edge site's glance service. >>>>> >>>>> [1] >>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>> [2] >>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>> >>>>> Alan >>>>> >>>>> >>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>> category=FutureWarning) >>>>>> >>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>> category=FutureWarning) >>>>>> >>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>> MB/s >>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>> cinder.volume.flows.manager.create_volume >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>> >>>>>> The image is present in dcn02 store but still it downloaded the image >>>>>> in 0.16 MB/s and then created the volume. >>>>>> >>>>>> With regards, >>>>>> Swogat Pradhan >>>>>> >>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> Hi Jhon, >>>>>>> This seems to be an issue. >>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>>>> parameter was specified to the respective cluster names but the config >>>>>>> files were created in the name of ceph.conf and keyring was >>>>>>> ceph.client.openstack.keyring. >>>>>>> >>>>>>> Which created issues in glance as well as the naming convention of >>>>>>> the files didn't match the cluster names, so i had to manually rename the >>>>>>> central ceph conf file as such: >>>>>>> >>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>> total 16 >>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>> ceph_central.client.openstack.keyring >>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>> ceph.client.openstack.keyring >>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>> >>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>>>>> respective clusters in both dcn01 and dcn02. >>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>> for accessing central ceph cluster. >>>>>>> >>>>>>> glance multistore config: >>>>>>> [dcn02] >>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>> rbd_store_user=openstack >>>>>>> rbd_store_pool=images >>>>>>> rbd_thin_provisioning=False >>>>>>> store_description=dcn02 rbd glance store >>>>>>> >>>>>>> [ceph_central] >>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>> rbd_store_user=openstack >>>>>>> rbd_store_pool=images >>>>>>> rbd_thin_provisioning=False >>>>>>> store_description=Default glance store backend. >>>>>>> >>>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>> wrote: >>>>>>> >>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>> wrote: >>>>>>>> > >>>>>>>> > Hi, >>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>> >>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>> >>>>>>>> I hope this is not a production system since the mailing list now >>>>>>>> has >>>>>>>> the cinder.conf which contains passwords. 
>>>>>>>> >>>>>>>> The section that looks like this: >>>>>>>> >>>>>>>> [tripleo_ceph] >>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>> rbd_user=openstack >>>>>>>> rbd_pool=volumes >>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>> rbd_secret_uuid= >>>>>>>> report_discard_supported=True >>>>>>>> >>>>>>>> Should be updated to refer to the local DCN ceph cluster and not the >>>>>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>> >>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of >>>>>>>> the >>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>> secret-get-value $FSID`. >>>>>>>> >>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>> sites >>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>> following >>>>>>>> it. >>>>>>>> >>>>>>>> >>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>> >>>>>>>> John >>>>>>>> >>>>>>>> > >>>>>>>> > Ceph Output: >>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>> > NAME SIZE PARENT FMT >>>>>>>> PROT LOCK >>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>>>>> excl >>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 >>>>>>>> yes >>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 >>>>>>>> yes >>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 >>>>>>>> yes >>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 >>>>>>>> yes >>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 >>>>>>>> yes >>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 >>>>>>>> yes >>>>>>>> > >>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>> > NAME SIZE PARENT >>>>>>>> FMT PROT LOCK >>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>> > >>>>>>>> > Attached the cinder config. >>>>>>>> > Please let me know how I can solve this issue. >>>>>>>> > >>>>>>>> > With regards, >>>>>>>> > Swogat Pradhan >>>>>>>> > >>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>>>>> wrote: >>>>>>>> >> >>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>> config. 
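As an illustration of the fix described above -- a minimal sketch only, with every value a placeholder to be checked against what your own deployment actually renders -- the corrected edge backend would be expected to look something like:

[tripleo_ceph]
volume_backend_name=tripleo_ceph
volume_driver=cinder.volume.drivers.rbd.RBDDriver
# the local dcn02 cluster's conf on these nodes, not the central one
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=openstack
rbd_pool=volumes
rbd_flatten_volume_from_snapshot=False
# TripleO convention: the FSID of that same dcn02 cluster; the value here is
# simply the fsid that appears in the dcn02 image location in the log further up
rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b
report_discard_supported=True

The conf/FSID/libvirt-secret pairing can be cross-checked on a dcn02 node with:

$ sudo grep fsid /var/lib/tripleo-config/ceph/ceph.conf   (or /etc/ceph/ceph.conf inside the cinder_volume container)
$ sudo podman exec nova_virtsecretd virsh secret-get-value a8d5f1f5-48e7-5ede-89ab-8aca59b6397b

Once that is in place (and glance_api_servers in the same cinder.conf points at the edge glance endpoint rather than the central 172.25.228.253 one -- port 9293 if the internal-instance patch mentioned earlier is deployed), a volume freshly created from a pre-imported image should show an images/<id>@snap entry in the PARENT column of `rbd -p volumes ls -l`, instead of the empty PARENT column in the output above, which is the signature of streaming rather than COW cloning.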
>>>>>>>> >> >>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>> >>>>>>>> >>> Update: >>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>>>> >>> The image size is 389 MB. >>>>>>>> >>> >>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>> >>>>>>>> >>>> Hi Jhon, >>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>>>> after importing from the central site. >>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>> time for the volume to get created. >>>>>>>> >>>> >>>>>>>> >>>> When launching an instance from volume the instance is getting >>>>>>>> created properly without any errors. >>>>>>>> >>>> >>>>>>>> >>>> I tried to cache images in nova using >>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>> but getting checksum failed error. >>>>>>>> >>>> >>>>>>>> >>>> With regards, >>>>>>>> >>>> Swogat Pradhan >>>>>>>> >>>> >>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>> johfulto at redhat.com> wrote: >>>>>>>> >>>>> >>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>> >>>>> wrote: >>>>>>>> >>>>> > >>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>> launch the VM from volume. >>>>>>>> >>>>> > >>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>> to the edge glance. >>>>>>>> >>>>> >>>>>>>> >>>>> Try following this document and making the same observations >>>>>>>> in your >>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>> >>>>> >>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>> >>>>> >>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>> >>>>> FMT PROT LOCK >>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>> excl >>>>>>>> >>>>> $ >>>>>>>> >>>>> >>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>> which is on >>>>>>>> >>>>> the same local ceph cluster. >>>>>>>> >>>>> >>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>> encountering >>>>>>>> >>>>> the streaming behavior described here: >>>>>>>> >>>>> >>>>>>>> >>>>> Ideally all images should reside in the central Glance and be >>>>>>>> copied >>>>>>>> >>>>> to DCN sites before instances of those images are booted on >>>>>>>> DCN sites. >>>>>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>>>>> then the >>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>> will boot as >>>>>>>> >>>>> an instance. 
This happens because Glance at the DCN site has >>>>>>>> access to >>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>> booting of >>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>> advance, >>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>> >>>>> >>>>>>>> >>>>> You can also exec into the cinder container at the DCN site >>>>>>>> and >>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>> >>>>> >>>>>>>> >>>>> John >>>>>>>> >>>>> >>>>>>>> >>>>> > >>>>>>>> >>>>> > I will try and create a new fresh image and test again then >>>>>>>> update. >>>>>>>> >>>>> > >>>>>>>> >>>>> > With regards, >>>>>>>> >>>>> > Swogat Pradhan >>>>>>>> >>>>> > >>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> Update: >>>>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>>>> down. >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>>>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>>>>>> timed out so i waited for the volume to be created. >>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon >>>>>>>> [-] privsep daemon starting >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon >>>>>>>> [-] privsep process running with uid/gid: 0/0 >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>> [-] privsep process running with capabilities (eff/prm/inh): >>>>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>> [-] privsep daemon running as pid 185437 >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>> >>>>> >>> Stdout: '' >>>>>>>> >>>>> >>> Stderr: '': >>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>> running command. 
>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>> template mentioned here ?: >>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> The volume is already created and i do not understand why >>>>>>>> the instance is stuck in spawning state. >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> With regards, >>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>> bshephar at redhat.com> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Does your environment use different network interfaces >>>>>>>> for each of the networks? Or does it have a bond with everything on it? >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>> while spawning the instance. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. >>>>>>>> So, based on that experience, from my perspective, is certainly sounds like >>>>>>>> some kind of network issue. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Regards, >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Hi, >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some time >>>>>>>> ago in this thread: >>>>>>>> >>>>> >>>> >>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>> Maybe they can help narrow down the issue. >>>>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>>>> logs I probably would start "tearing down" rabbitmq. 
I didn't have to do >>>>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>>>> running, this will most likely impact client IO depending on your load. >>>>>>>> Check out the rabbitmqctl commands. >>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>>>>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>> a better advice. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Regards, >>>>>>>> >>>>> >>>> Eugen >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Hi, >>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> With regards, >>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Hi >>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but >>>>>>>> not due to packet >>>>>>>> >>>>> >>>> loss. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> with regards, >>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Hi, >>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>> checked when >>>>>>>> >>>>> >>>> launching the instance. >>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>> stuck at spawning >>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure >>>>>>>> if packet loss >>>>>>>> >>>>> >>>> causes this. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> With regards, >>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>> eblock at nde.ag> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>> identical between >>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>> the tunnel? >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>>>> 'cc' as i am not >>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>>>>> list_policies -p >>>>>>>> >>>>> >>>> / >>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... 
>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>> priority >>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> >>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>> down when i am >>>>>>>> >>>>> >>>> trying >>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>> spawning state and >>>>>>>> >>>>> >>>> then >>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>>>>>> sites. >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > With regards, >>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> > wrote: >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>> directly, i am >>>>>>>> >>>>> >>>> checking >>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>> reply. >>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>> occurred. >>>>>>>> >>>>> >>>> >> >>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>> activities in the >>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>> site.* >>>>>>>> >>>>> >>>> >> >>>>>>>> >>>>> >>>> >> With regards, >>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>> >>>>> >>>> >> >>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> >> wrote: >>>>>>>> >>>>> >>>> >> >>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>>>>>> details: >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>> >>>>> >>>> >>> >>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>> >>>>> >>>> Started >>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>> >>>>> >>>> Started >>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>> >>>>> >>>> Started >>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>> >>>>> >>>> Started >>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>> but the issue is >>>>>>>> >>>>> >>>> still >>>>>>>> >>>>> >>>> >>> present. >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>> cluster_status >>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Versions >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> (none) >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> (none) >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>> inter-node and CLI >>>>>>>> >>>>> >>>> tool >>>>>>>> >>>>> >>>> >>> communication >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>>>>>> AMQP 0-9-1 >>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>> inter-node and CLI >>>>>>>> >>>>> >>>> tool >>>>>>>> >>>>> >>>> >>> communication >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>>>>>> AMQP 0-9-1 >>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at 
overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>> inter-node and CLI >>>>>>>> >>>>> >>>> tool >>>>>>>> >>>>> >>>> >>> communication >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>>>>>> AMQP 0-9-1 >>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> , >>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>>>>> purpose: >>>>>>>> >>>>> >>>> inter-node and >>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> , >>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>> amqp, purpose: AMQP >>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> , >>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>> purpose: HTTP API >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api >>>>>>>> log. 
>>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>> -] The reply >>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>> after 60 seconds >>>>>>>> >>>>> >>>> due to a >>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>> -] The reply >>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>> after 60 seconds >>>>>>>> >>>>> >>>> due to a >>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>> -] The reply >>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>> after 60 seconds >>>>>>>> >>>>> >>>> due to a >>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>>> Cache enabled >>>>>>>> >>>>> >>>> with >>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>> -] The reply >>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>> after 60 seconds >>>>>>>> >>>>> >>>> due to a >>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>> where i am trying to >>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>> down (openstack >>>>>>>> >>>>> >>>> compute >>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>> restart the nova >>>>>>>> >>>>> >>>> compute >>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - >>>>>>>> -] Running >>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>> 2023-02-26 07:00:00 >>>>>>>> >>>>> >>>> to >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] [instance: >>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>> successful on node >>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>> nova.virt.libvirt.driver >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] [instance: >>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>> supplied device >>>>>>>> >>>>> >>>> name: >>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>>>> names >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>> nova.virt.block_device >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] [instance: >>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>>>>>> volume >>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] Cache enabled >>>>>>>> >>>>> >>>> with >>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] Running >>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>> '--config-file', >>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>> '--privsep_sock_path', >>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] Spawned new >>>>>>>> >>>>> >>>> privsep >>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] Process >>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>>>>> command. >>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>> nova.virt.libvirt.driver >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] [instance: >>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>> image >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
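(A rough checklist distilled from the suggestions above -- paths and container names assume a stock containerized TripleO deployment, so adjust them to whatever your nodes actually run -- for narrowing this down while reproducing a spawn:

$ openstack compute service list --service nova-compute
    # does the edge hypervisor only flap to "down" while an instance is spawning?
$ ping -c 100 -M do -s 1472 <controller internal_api IP>
    # packet-loss / path-MTU check through the tunnel, assuming 1500 MTU end to end
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -iE 'amqp|rabbit|heartbeat'
    # on the edge compute: look for RabbitMQ disconnects during the spawn
$ sudo podman exec $(sudo podman ps -qf name=rabbitmq) rabbitmqctl list_queues name messages | grep reply_
    # on a controller: do the reply_* queues named in the nova-conductor errors still exist?

None of these commands change anything; they only help tell apart plain packet loss, an MTU mismatch on the tunnel, and the stale reply queues seen in the conductor log.)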
>>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>>>>>> >>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 23 12:35:02 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 23 Mar 2023 13:35:02 +0100 Subject: [nova]host cpu reserve In-Reply-To: References: Message-ID: Hey, It's a config option for nova-compute: https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus Also related ones are regarding ram and disk. You might also find a good idea to apply a cgroups rule to ensure you _really_ have CPU reserved, like this: https://gist.github.com/noonedeadpunk/a4e691e64da031084c071b554a5b40cd ??, 23 ???. 2023 ?., 08:48 Nguy?n H?u Kh?i : > Hello guys. > I am trying google for nova host cpu reserve to prevent host overload but > I cannot find any resource about it. Could you give me some information? > Thanks. > Nguyen Huu Khoi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 23 12:36:53 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 23 Mar 2023 13:36:53 +0100 Subject: [nova]host cpu reserve In-Reply-To: References: Message-ID: Forget my reply, Sean's proposal is way better and the correct one. ??, 23 ???. 2023 ?., 13:35 Dmitriy Rabotyagov : > Hey, > > It's a config option for nova-compute: > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > Also related ones are regarding ram and disk. > > You might also find a good idea to apply a cgroups rule to ensure you > _really_ have CPU reserved, like this: > https://gist.github.com/noonedeadpunk/a4e691e64da031084c071b554a5b40cd > > > > ??, 23 ???. 2023 ?., 08:48 Nguy?n H?u Kh?i : > >> Hello guys. >> I am trying google for nova host cpu reserve to prevent host overload but >> I cannot find any resource about it. Could you give me some information? >> Thanks. >> Nguyen Huu Khoi >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Thu Mar 23 12:55:52 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 19:55:52 +0700 Subject: [nova]host cpu reserve In-Reply-To: References: Message-ID: Thank both of you much. I am using cpu allocation ratio but I dont understand how host cpu can work if all vm using 100% cpu. Vmware have cpu and ram reserve for host. On Thu, Mar 23, 2023, 7:44 PM Dmitriy Rabotyagov wrote: > Forget my reply, Sean's proposal is way better and the correct one. > > ??, 23 ???. 2023 ?., 13:35 Dmitriy Rabotyagov : > >> Hey, >> >> It's a config option for nova-compute: >> >> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus >> >> Also related ones are regarding ram and disk. >> >> You might also find a good idea to apply a cgroups rule to ensure you >> _really_ have CPU reserved, like this: >> https://gist.github.com/noonedeadpunk/a4e691e64da031084c071b554a5b40cd >> >> >> >> ??, 23 ???. 2023 ?., 08:48 Nguy?n H?u Kh?i : >> >>> Hello guys. 
>>> I am trying google for nova host cpu reserve to prevent host overload >>> but I cannot find any resource about it. Could you give me some information? >>> Thanks. >>> Nguyen Huu Khoi >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 23 13:13:52 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 23 Mar 2023 14:13:52 +0100 Subject: [nova]host cpu reserve In-Reply-To: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Just to double check with you, given that you have cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 physical cores, then it should be defined like: [compute] cpu_shared_set="2-32,34-64,66-96,98-128"? > in general you shoudl reserve the first core on each cpu socket for the host os. > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted > form the cpu_shared_set and cpu_dedicated_set ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > > generally you should not > you can use it but the preferd way to do this is use > cpu_shared_set and cpu_dedicated_set (in old releases you would have used vcpu_pin_set) > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > if you dont need cpu pinning just use cpu_share_set to spcify the cores that can be sued for floatign vms > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified are reseved for host use. > > https://that.guru/blog/cpu-resources/ and https://that.guru/blog/cpu-resources-redux/ > > have some useful info but that mostly looking at it form a cpu pinning angel althoguh the secon one covers cpu_shared_set, > > the issue with usein > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > is that you have to multiple the number of cores that are resverved by the > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > which means if you decide to manage that via placement api by using > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio instead > then you need to update your nova.conf to modify the reservationfi you change the allocation ratio. > > if instead you use cpu_shared_set and cpu_dedicated_set > you are specifying exactly which cpus nova can use and the allocation ration nolonger needs to be conisderd. > > in general you shoudl reserve the first core on each cpu socket for the host os. > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted > form the cpu_shared_set and cpu_dedicated_set > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > Hello guys. > > I am trying google for nova host cpu reserve to prevent host overload but I > > cannot find any resource about it. Could you give me some information? > > Thanks. > > Nguyen Huu Khoi > > From nguyenhuukhoinw at gmail.com Thu Mar 23 13:35:02 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 20:35:02 +0700 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Ok. I will try to understand it. I will let you know when I get it. 
Many thanks for your help. :) On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov wrote: > Just to double check with you, given that you have > cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 > physical cores, then it should be defined like: > > [compute] > cpu_shared_set="2-32,34-64,66-96,98-128"? > > > in general you shoudl reserve the first core on each cpu socket for the > host os. > > if you use hyperthreading then both hyperthread of the first cpu core on > each socket shoudl be omitted > > form the cpu_shared_set and cpu_dedicated_set > > ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > > > > generally you should not > > you can use it but the preferd way to do this is use > > cpu_shared_set and cpu_dedicated_set (in old releases you would have > used vcpu_pin_set) > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > > > if you dont need cpu pinning just use cpu_share_set to spcify the cores > that can be sued for floatign vms > > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified > are reseved for host use. > > > > https://that.guru/blog/cpu-resources/ and > https://that.guru/blog/cpu-resources-redux/ > > > > have some useful info but that mostly looking at it form a cpu pinning > angel althoguh the secon one covers cpu_shared_set, > > > > the issue with usein > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > > > is that you have to multiple the number of cores that are resverved by > the > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > > > which means if you decide to manage that via placement api by using > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio > instead > > then you need to update your nova.conf to modify the reservationfi you > change the allocation ratio. > > > > if instead you use cpu_shared_set and cpu_dedicated_set > > you are specifying exactly which cpus nova can use and the allocation > ration nolonger needs to be conisderd. > > > > in general you shoudl reserve the first core on each cpu socket for the > host os. > > if you use hyperthreading then both hyperthread of the first cpu core on > each socket shoudl be omitted > > form the cpu_shared_set and cpu_dedicated_set > > > > > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > > Hello guys. > > > I am trying google for nova host cpu reserve to prevent host overload > but I > > > cannot find any resource about it. Could you give me some information? > > > Thanks. > > > Nguyen Huu Khoi > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Thu Mar 23 13:50:48 2023 From: senrique at redhat.com (Sofia Enriquez) Date: Thu, 23 Mar 2023 13:50:48 +0000 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) In-Reply-To: References: Message-ID: Hi Sylvain, I hope you're doing well. Apologies for the delay in responding to your previous message. I wanted to suggest a cross-project topic with Cinder that involves adding support for NFS encryption. To complement the work on Cinder[3], I've proposed two patches [1][2] and would appreciate any feedback you may have. 
I believe it would be beneficial to discuss my approach during the PTG, but I'm open to discussing it during a weekly meeting as well. I've looked at the etherpad and am considering the first slot on Wednesday or the last slot on Thursday or Friday. Please let me know your thoughts. Thank you, Sofia [1] https://review.opendev.org/c/openstack/nova/+/854030 [2] https://review.opendev.org/c/openstack/nova/+/870012 [3] https://review.opendev.org/q/topic:bp%252Fnfs-volume-encryption On Wed, Mar 22, 2023 at 9:46?AM Sylvain Bauza wrote: > Hey folks, > > As a reminder, the Nova community will discuss at the vPTG. You can see > the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg > > Our agenda will be from Tuesday to Friday, everyday between 1300UTC and > 1700UTC. Connection details are in the etherpad above, but you can also use > PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo > room for all the discussions) > > You can't stick around for 4 hours x 4 days ? Heh, no worries ! > If you (as an operator or a developer) want to engage with us (and we'd > love this honestly), you have two possibilities : > - either you prefer to listen (and talk) to some topics you've seen in > the agenda, and then add your IRC nick (details how to use IRC are > explained by [1]) on the topics you want. Once we start to discuss about > those topics, I'll ping the courtesy ping list of each topic on > #openstack-nova. Just make sure you're around in the IRC channel. > - or you prefer to engage with us about some pain points or some feature > requests, and then the right time is the Nova Operator Hour that will be on > *Tuesday 1500UTC*. We have a specific etherpad for this session : > https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where you > can preemptively add your thoughts or concerns. > > Anyway, we are eager to meet you all ! > > Oh, last point, given we will be at the vPTG, next week's weekly meeting > on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk > the #openstack-nova channel ;-) > > See you next week ! > -Sylvain > > [1] > https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html > > > -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Thu Mar 23 13:50:59 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 20:50:59 +0700 Subject: [nova]host cpu reserve In-Reply-To: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Could you help me to explain how host cpu handle with cpu ratio? On Thu, Mar 23, 2023, 7:10 PM Sean Mooney wrote: > generally you should not > you can use it but the preferd way to do this is use > cpu_shared_set and cpu_dedicated_set (in old releases you would have used > vcpu_pin_set) > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > if you dont need cpu pinning just use cpu_share_set to spcify the cores > that can be sued for floatign vms > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified > are reseved for host use. 
> > https://that.guru/blog/cpu-resources/ and > https://that.guru/blog/cpu-resources-redux/ > > have some useful info but that mostly looking at it form a cpu pinning > angel althoguh the secon one covers cpu_shared_set, > > the issue with usein > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > is that you have to multiple the number of cores that are resverved by the > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > which means if you decide to manage that via placement api by using > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio > instead > then you need to update your nova.conf to modify the reservationfi you > change the allocation ratio. > > if instead you use cpu_shared_set and cpu_dedicated_set > you are specifying exactly which cpus nova can use and the allocation > ration nolonger needs to be conisderd. > > in general you shoudl reserve the first core on each cpu socket for the > host os. > if you use hyperthreading then both hyperthread of the first cpu core on > each socket shoudl be omitted > form the cpu_shared_set and cpu_dedicated_set > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > Hello guys. > > I am trying google for nova host cpu reserve to prevent host overload > but I > > cannot find any resource about it. Could you give me some information? > > Thanks. > > Nguyen Huu Khoi > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 23 13:51:14 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 23 Mar 2023 14:51:14 +0100 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Just in case, you DO have options to control cpu and ram reservation for the hypervisor. It's just more about that, that it's not the best way to do it, especially if you're overcommitting, as things in real life are more complicated then just defining the amount of reserved CPUs. For example, if you have cpu_allocation_ratio set to 3, then you're getting 3 times more CPUs to signup VMs then you actually have (cores*sockets*threads*cpu_allocation_ratio). With that you really can't set any decent amount of reserved CPUs that will 100% ensure that hypervisor will be able to gain required resources at any given time. So with that approach the only option is to disable cpu overcommit, but even then you might get CPU in socket 1 fully utilized which might have negative side-effects for the hypervisor. And based on that, as Sean has mentioned, you can tell nova to explicitly exclude specific cores from being utilized, which will make them reserved for the hypervisor. ??, 23 ???. 2023??. ? 14:35, Nguy?n H?u Kh?i : > > Ok. I will try to understand it. I will let you know when I get it. > Many thanks for your help. :) > > On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov wrote: >> >> Just to double check with you, given that you have >> cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 >> physical cores, then it should be defined like: >> >> [compute] >> cpu_shared_set="2-32,34-64,66-96,98-128"? >> >> > in general you shoudl reserve the first core on each cpu socket for the host os. >> > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted >> > form the cpu_shared_set and cpu_dedicated_set >> >> ??, 23 ???. 
2023??. ? 13:12, Sean Mooney : >> > >> > generally you should not >> > you can use it but the preferd way to do this is use >> > cpu_shared_set and cpu_dedicated_set (in old releases you would have used vcpu_pin_set) >> > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set >> > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set >> > >> > if you dont need cpu pinning just use cpu_share_set to spcify the cores that can be sued for floatign vms >> > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified are reseved for host use. >> > >> > https://that.guru/blog/cpu-resources/ and https://that.guru/blog/cpu-resources-redux/ >> > >> > have some useful info but that mostly looking at it form a cpu pinning angel althoguh the secon one covers cpu_shared_set, >> > >> > the issue with usein >> > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus >> > >> > is that you have to multiple the number of cores that are resverved by the >> > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio >> > >> > which means if you decide to manage that via placement api by using >> > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio instead >> > then you need to update your nova.conf to modify the reservationfi you change the allocation ratio. >> > >> > if instead you use cpu_shared_set and cpu_dedicated_set >> > you are specifying exactly which cpus nova can use and the allocation ration nolonger needs to be conisderd. >> > >> > in general you shoudl reserve the first core on each cpu socket for the host os. >> > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted >> > form the cpu_shared_set and cpu_dedicated_set >> > >> > >> > >> > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: >> > > Hello guys. >> > > I am trying google for nova host cpu reserve to prevent host overload but I >> > > cannot find any resource about it. Could you give me some information? >> > > Thanks. >> > > Nguyen Huu Khoi >> > >> > From nguyenhuukhoinw at gmail.com Thu Mar 23 13:57:42 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 20:57:42 +0700 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Hello Dmitriy Rabotyagov and Sean Mooney, very thank you for your sharing. Nguyen Huu Khoi On Thu, Mar 23, 2023 at 8:50?PM Nguy?n H?u Kh?i wrote: > Could you help me to explain how host cpu handle with cpu ratio? > > On Thu, Mar 23, 2023, 7:10 PM Sean Mooney wrote: > >> generally you should not >> you can use it but the preferd way to do this is use >> cpu_shared_set and cpu_dedicated_set (in old releases you would have used >> vcpu_pin_set) >> >> https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set >> >> https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set >> >> if you dont need cpu pinning just use cpu_share_set to spcify the cores >> that can be sued for floatign vms >> when you use cpu_shared_set and cpu_dedicated_set any cpu not specified >> are reseved for host use. 
>> >> https://that.guru/blog/cpu-resources/ and >> https://that.guru/blog/cpu-resources-redux/ >> >> have some useful info but that mostly looking at it form a cpu pinning >> angel althoguh the secon one covers cpu_shared_set, >> >> the issue with usein >> >> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus >> >> is that you have to multiple the number of cores that are resverved by >> the >> >> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio >> >> which means if you decide to manage that via placement api by using >> >> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio >> instead >> then you need to update your nova.conf to modify the reservationfi you >> change the allocation ratio. >> >> if instead you use cpu_shared_set and cpu_dedicated_set >> you are specifying exactly which cpus nova can use and the allocation >> ration nolonger needs to be conisderd. >> >> in general you shoudl reserve the first core on each cpu socket for the >> host os. >> if you use hyperthreading then both hyperthread of the first cpu core on >> each socket shoudl be omitted >> form the cpu_shared_set and cpu_dedicated_set >> >> >> >> On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: >> > Hello guys. >> > I am trying google for nova host cpu reserve to prevent host overload >> but I >> > cannot find any resource about it. Could you give me some information? >> > Thanks. >> > Nguyen Huu Khoi >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Thu Mar 23 14:26:18 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 21:26:18 +0700 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Hi. Too many new things for me. It is interesting. I will read more. Thank you Dmitriy Rabotyagov Nice to meet you! Nguyen Huu Khoi On Thu, Mar 23, 2023 at 8:58?PM Dmitriy Rabotyagov wrote: > Just in case, you DO have options to control cpu and ram reservation > for the hypervisor. It's just more about that, that it's not the best > way to do it, especially if you're overcommitting, as things in real > life are more complicated then just defining the amount of reserved > CPUs. > > For example, if you have cpu_allocation_ratio set to 3, then you're > getting 3 times more CPUs to signup VMs then you actually have > (cores*sockets*threads*cpu_allocation_ratio). With that you really > can't set any decent amount of reserved CPUs that will 100% ensure > that hypervisor will be able to gain required resources at any given > time. So with that approach the only option is to disable cpu > overcommit, but even then you might get CPU in socket 1 fully utilized > which might have negative side-effects for the hypervisor. > > And based on that, as Sean has mentioned, you can tell nova to > explicitly exclude specific cores from being utilized, which will make > them reserved for the hypervisor. > > ??, 23 ???. 2023??. ? 14:35, Nguy?n H?u Kh?i : > > > > Ok. I will try to understand it. I will let you know when I get it. > > Many thanks for your help. 
:) > > > > On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov < > noonedeadpunk at gmail.com> wrote: > >> > >> Just to double check with you, given that you have > >> cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 > >> physical cores, then it should be defined like: > >> > >> [compute] > >> cpu_shared_set="2-32,34-64,66-96,98-128"? > >> > >> > in general you shoudl reserve the first core on each cpu socket for > the host os. > >> > if you use hyperthreading then both hyperthread of the first cpu core > on each socket shoudl be omitted > >> > form the cpu_shared_set and cpu_dedicated_set > >> > >> ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > >> > > >> > generally you should not > >> > you can use it but the preferd way to do this is use > >> > cpu_shared_set and cpu_dedicated_set (in old releases you would have > used vcpu_pin_set) > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > >> > > >> > if you dont need cpu pinning just use cpu_share_set to spcify the > cores that can be sued for floatign vms > >> > when you use cpu_shared_set and cpu_dedicated_set any cpu not > specified are reseved for host use. > >> > > >> > https://that.guru/blog/cpu-resources/ and > https://that.guru/blog/cpu-resources-redux/ > >> > > >> > have some useful info but that mostly looking at it form a cpu > pinning angel althoguh the secon one covers cpu_shared_set, > >> > > >> > the issue with usein > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > >> > > >> > is that you have to multiple the number of cores that are resverved > by the > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > >> > > >> > which means if you decide to manage that via placement api by using > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio > instead > >> > then you need to update your nova.conf to modify the reservationfi > you change the allocation ratio. > >> > > >> > if instead you use cpu_shared_set and cpu_dedicated_set > >> > you are specifying exactly which cpus nova can use and the allocation > ration nolonger needs to be conisderd. > >> > > >> > in general you shoudl reserve the first core on each cpu socket for > the host os. > >> > if you use hyperthreading then both hyperthread of the first cpu core > on each socket shoudl be omitted > >> > form the cpu_shared_set and cpu_dedicated_set > >> > > >> > > >> > > >> > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > >> > > Hello guys. > >> > > I am trying google for nova host cpu reserve to prevent host > overload but I > >> > > cannot find any resource about it. Could you give me some > information? > >> > > Thanks. > >> > > Nguyen Huu Khoi > >> > > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 23 14:29:49 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 23 Mar 2023 14:29:49 +0000 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) In-Reply-To: References: Message-ID: On Thu, 2023-03-23 at 13:50 +0000, Sofia Enriquez wrote: > Hi Sylvain, > > I hope you're doing well. Apologies for the delay in responding to your > previous message. 
I wanted to suggest a cross-project topic with Cinder > that involves adding support for NFS encryption. > > To complement the work on Cinder[3], I've proposed two patches [1][2] and > would appreciate any feedback you may have. I believe it would be > beneficial to discuss my approach during the PTG, but I'm open to > discussing it during a weekly meeting as well. i have seen those patches come in over the last while but since you did not create a nova bluepirnt or spec i was assuming that you were waiting for the ptg to talk about them. i have not atcully reviewed them but we can defeintly discuss it next week if the changes are small we can proably proceed with a specless bluepirnt but if they have any api impact or upgrade consideration or move operatoion considertioant we will need an actul spec even if its short. > > I've looked at the etherpad and am considering the first slot on Wednesday > or the last slot on Thursday or Friday. > > Please let me know your thoughts. > > Thank you, > Sofia > > [1] https://review.opendev.org/c/openstack/nova/+/854030 > [2] https://review.opendev.org/c/openstack/nova/+/870012 > [3] https://review.opendev.org/q/topic:bp%252Fnfs-volume-encryption > > On Wed, Mar 22, 2023 at 9:46?AM Sylvain Bauza wrote: > > > Hey folks, > > > > As a reminder, the Nova community will discuss at the vPTG. You can see > > the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg > > > > Our agenda will be from Tuesday to Friday, everyday between 1300UTC and > > 1700UTC. Connection details are in the etherpad above, but you can also use > > PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo > > room for all the discussions) > > > > You can't stick around for 4 hours x 4 days ? Heh, no worries ! > > If you (as an operator or a developer) want to engage with us (and we'd > > love this honestly), you have two possibilities : > > - either you prefer to listen (and talk) to some topics you've seen in > > the agenda, and then add your IRC nick (details how to use IRC are > > explained by [1]) on the topics you want. Once we start to discuss about > > those topics, I'll ping the courtesy ping list of each topic on > > #openstack-nova. Just make sure you're around in the IRC channel. > > - or you prefer to engage with us about some pain points or some feature > > requests, and then the right time is the Nova Operator Hour that will be on > > *Tuesday 1500UTC*. We have a specific etherpad for this session : > > https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where you > > can preemptively add your thoughts or concerns. > > > > Anyway, we are eager to meet you all ! > > > > Oh, last point, given we will be at the vPTG, next week's weekly meeting > > on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk > > the #openstack-nova channel ;-) > > > > See you next week ! > > -Sylvain > > > > [1] > > https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html > > > > > > > From sbauza at redhat.com Thu Mar 23 14:32:44 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 23 Mar 2023 15:32:44 +0100 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) In-Reply-To: References: Message-ID: Le jeu. 23 mars 2023 ? 14:51, Sofia Enriquez a ?crit : > > Hi Sylvain, > > I hope you're doing well. Apologies for the delay in responding to your > previous message. I wanted to suggest a cross-project topic with Cinder > that involves adding support for NFS encryption. 
> > To complement the work on Cinder[3], I've proposed two patches [1][2] and > would appreciate any feedback you may have. I believe it would be > beneficial to discuss my approach during the PTG, but I'm open to > discussing it during a weekly meeting as well. > > Hey Sofia, I'm indeed well, thank you. I was actually considering to ping Rajat over IRC since we weren't having any cinder-related topics yet in the etherpad but I was almost sure that we would have some last-minute thoughts during the week. I'm quite OK with discussing your item into some cross-project session. I've looked at the etherpad and am considering the first slot on Wednesday > or the last slot on Thursday or Friday. > Cool, I'll arrange some common timeslot between teams with Rajat once he's done with the OpenInfra Live presentation, like me :-) -Sylvain > Please let me know your thoughts. > > Thank you, > Sofia > > [1] https://review.opendev.org/c/openstack/nova/+/854030 > [2] https://review.opendev.org/c/openstack/nova/+/870012 > [3] https://review.opendev.org/q/topic:bp%252Fnfs-volume-encryption > > On Wed, Mar 22, 2023 at 9:46?AM Sylvain Bauza wrote: > >> Hey folks, >> >> As a reminder, the Nova community will discuss at the vPTG. You can see >> the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg >> >> Our agenda will be from Tuesday to Friday, everyday between 1300UTC and >> 1700UTC. Connection details are in the etherpad above, but you can also use >> PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo >> room for all the discussions) >> >> You can't stick around for 4 hours x 4 days ? Heh, no worries ! >> If you (as an operator or a developer) want to engage with us (and we'd >> love this honestly), you have two possibilities : >> - either you prefer to listen (and talk) to some topics you've seen in >> the agenda, and then add your IRC nick (details how to use IRC are >> explained by [1]) on the topics you want. Once we start to discuss about >> those topics, I'll ping the courtesy ping list of each topic on >> #openstack-nova. Just make sure you're around in the IRC channel. >> - or you prefer to engage with us about some pain points or some feature >> requests, and then the right time is the Nova Operator Hour that will be on >> *Tuesday 1500UTC*. We have a specific etherpad for this session : >> https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where >> you can preemptively add your thoughts or concerns. >> >> Anyway, we are eager to meet you all ! >> >> Oh, last point, given we will be at the vPTG, next week's weekly meeting >> on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk >> the #openstack-nova channel ;-) >> >> See you next week ! >> -Sylvain >> >> [1] >> https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html >> >> >> > > -- > > Sof?a Enriquez > > she/her > > Software Engineer > > Red Hat PnT > > IRC: @enriquetaso > @RedHat Red Hat > Red Hat > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 23 14:47:44 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 23 Mar 2023 14:47:44 +0000 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: <1cc331bf24541e551d6ad87d407a4c9a90a23665.camel@redhat.com> On Thu, 2023-03-23 at 14:51 +0100, Dmitriy Rabotyagov wrote: > Just in case, you DO have options to control cpu and ram reservation > for the hypervisor. 
It's just more about that, that it's not the best > way to do it, especially if you're overcommitting, as things in real > life are more complicated then just defining the amount of reserved > CPUs. > > For example, if you have cpu_allocation_ratio set to 3, then you're > getting 3 times more CPUs to signup VMs then you actually have > (cores*sockets*threads*cpu_allocation_ratio). With that you really > can't set any decent amount of reserved CPUs that will 100% ensure > that hypervisor will be able to gain required resources at any given > time. So with that approach the only option is to disable cpu > overcommit, but even then you might get CPU in socket 1 fully utilized > which might have negative side-effects for the hypervisor. > > And based on that, as Sean has mentioned, you can tell nova to > explicitly exclude specific cores from being utilized, which will make > them reserved for the hypervisor. Yep, exactly. Without getting into all the details: the host reserved CPU option was added in the really early days, and then vcpu_pin_set was added to address the fact that the existing option didn't really work the way people wanted. It was later used for CPU pinning, and we realised we wanted to have 2 separate pools of CPUs: cpu_shared_set for the shared cores used by floating VMs (anything without hw:cpu_policy=dedicated) and cpu_dedicated_set for explicitly pinned VMs. In general, using cpu_shared_set and cpu_dedicated_set is a much more intuitive way to reserve cores, since you get to select exactly which cores can be used for nova VMs. That also allows you to use systemd or other tools like taskset to affinitize nova-cpu or libvirtd or sshd to run on cores that won't have VMs, which prevents the VMs from starving those host processes of CPU resources. > > ??, 23 ???. 2023??. ? 14:35, Nguy?n H?u Kh?i : > > > Ok. I will try to understand it. I will let you know when I get it. > > Many thanks for your help. :) > > > On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov wrote: > > > > > > Just to double check with you, given that you have > > > cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 > > > physical cores, then it should be defined like: > > > > > > [compute] > > > cpu_shared_set="2-32,34-64,66-96,98-128"? > > > > > > > in general you shoudl reserve the first core on each cpu socket for the host os. > > > > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted > > > > form the cpu_shared_set and cpu_dedicated_set > > > > > > ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > > > > > > > > generally you should not > > > > you can use it but the preferd way to do this is use > > > > cpu_shared_set and cpu_dedicated_set (in old releases you would have used vcpu_pin_set) > > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > > > > > > > if you dont need cpu pinning just use cpu_share_set to spcify the cores that can be sued for floatign vms > > > > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified are reseved for host use.
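To make that concrete, here is a minimal nova.conf sketch for a compute node. The CPU numbers are only an assumption for illustration (2 sockets x 32 cores with HT, where logical CPUs 0/64 are the two threads of the first core on socket 0 and 32/96 the first core on socket 1); check the real sibling pairs with "lscpu -e" or "virsh capabilities" before reusing the ranges.

[compute]
# anything listed in neither option below is never handed to guests, so it is
# effectively reserved for the host OS, libvirtd, nova-compute, OVS, sshd, ...
cpu_shared_set = 1-31,33-63,65-95,97-127
# only needed if you also want pinned (hw:cpu_policy=dedicated) guests; the
# cores listed here have to be carved out of cpu_shared_set first, since the
# two sets must not overlap
#cpu_dedicated_set = 16-31,80-95

With this in place the VCPU inventory reported to placement is derived from the size of cpu_shared_set (with cpu_allocation_ratio applied on top), so the host reservation no longer has to be multiplied by the allocation ratio by hand. Restart nova-compute after changing either option.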
> > > > > > > > https://that.guru/blog/cpu-resources/ and https://that.guru/blog/cpu-resources-redux/ > > > > > > > > have some useful info but that mostly looking at it form a cpu pinning angel althoguh the secon one covers cpu_shared_set, > > > > > > > > the issue with usein > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > > > > > > > is that you have to multiple the number of cores that are resverved by the > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > > > > > > > which means if you decide to manage that via placement api by using > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio instead > > > > then you need to update your nova.conf to modify the reservationfi you change the allocation ratio. > > > > > > > > if instead you use cpu_shared_set and cpu_dedicated_set > > > > you are specifying exactly which cpus nova can use and the allocation ration nolonger needs to be conisderd. > > > > > > > > in general you shoudl reserve the first core on each cpu socket for the host os. > > > > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted > > > > form the cpu_shared_set and cpu_dedicated_set > > > > > > > > > > > > > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > > > > Hello guys. > > > > > I am trying google for nova host cpu reserve to prevent host overload but I > > > > > cannot find any resource about it. Could you give me some information? > > > > > Thanks. > > > > > Nguyen Huu Khoi > > > > > > > > > From sbauza at redhat.com Thu Mar 23 16:54:13 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 23 Mar 2023 17:54:13 +0100 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) In-Reply-To: References: Message-ID: Le jeu. 23 mars 2023 ? 15:32, Sylvain Bauza a ?crit : > > > Le jeu. 23 mars 2023 ? 14:51, Sofia Enriquez a > ?crit : > >> >> Hi Sylvain, >> >> I hope you're doing well. Apologies for the delay in responding to your >> previous message. I wanted to suggest a cross-project topic with Cinder >> that involves adding support for NFS encryption. >> >> To complement the work on Cinder[3], I've proposed two patches [1][2] and >> would appreciate any feedback you may have. I believe it would be >> beneficial to discuss my approach during the PTG, but I'm open to >> discussing it during a weekly meeting as well. >> >> > Hey Sofia, I'm indeed well, thank you. I was actually considering to ping > Rajat over IRC since we weren't having any cinder-related topics yet in the > etherpad but I was almost sure that we would have some last-minute thoughts > during the week. > I'm quite OK with discussing your item into some cross-project session. > > I've looked at the etherpad and am considering the first slot on Wednesday >> or the last slot on Thursday or Friday. >> > > Cool, I'll arrange some common timeslot between teams with Rajat once he's > done with the OpenInfra Live presentation, like me :-) > -Sylvain > > Just a quick wrap-up : Rajat and I agreed on a cross-project session between Cinder and Nova on Thursday Mar30 1600UTC in the nova (diablo) room. -Sylvain >> Please let me know your thoughts. 
>> >> Thank you, >> Sofia >> >> [1] https://review.opendev.org/c/openstack/nova/+/854030 >> [2] https://review.opendev.org/c/openstack/nova/+/870012 >> [3] https://review.opendev.org/q/topic:bp%252Fnfs-volume-encryption >> >> On Wed, Mar 22, 2023 at 9:46?AM Sylvain Bauza wrote: >> >>> Hey folks, >>> >>> As a reminder, the Nova community will discuss at the vPTG. You can see >>> the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg >>> >>> Our agenda will be from Tuesday to Friday, everyday between 1300UTC and >>> 1700UTC. Connection details are in the etherpad above, but you can also use >>> PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo >>> room for all the discussions) >>> >>> You can't stick around for 4 hours x 4 days ? Heh, no worries ! >>> If you (as an operator or a developer) want to engage with us (and we'd >>> love this honestly), you have two possibilities : >>> - either you prefer to listen (and talk) to some topics you've seen in >>> the agenda, and then add your IRC nick (details how to use IRC are >>> explained by [1]) on the topics you want. Once we start to discuss about >>> those topics, I'll ping the courtesy ping list of each topic on >>> #openstack-nova. Just make sure you're around in the IRC channel. >>> - or you prefer to engage with us about some pain points or some >>> feature requests, and then the right time is the Nova Operator Hour that >>> will be on *Tuesday 1500UTC*. We have a specific etherpad for this session >>> : https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where >>> you can preemptively add your thoughts or concerns. >>> >>> Anyway, we are eager to meet you all ! >>> >>> Oh, last point, given we will be at the vPTG, next week's weekly meeting >>> on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk >>> the #openstack-nova channel ;-) >>> >>> See you next week ! >>> -Sylvain >>> >>> [1] >>> https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html >>> >>> >>> >> >> -- >> >> Sof?a Enriquez >> >> she/her >> >> Software Engineer >> >> Red Hat PnT >> >> IRC: @enriquetaso >> @RedHat Red Hat >> Red Hat >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Thu Mar 23 18:38:08 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 23 Mar 2023 13:38:08 -0500 Subject: Fwd: [PTG] Environmental Sustainability WG goes to the vPTG next week! In-Reply-To: References: Message-ID: [Cross posting from the foundation ML] Hello Everyone! I have finally begun setting up the etherpad for our time during the PTG. I have the Austin room reserved on Tuesday from 16-18 UTC. I hope to see you all there and please add to the etherpad if there is a related topic you think we should discuss! Here is the etherpad: https://etherpad.opendev.org/p/march2023-ptg-env-sus -Kendall Nelson -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Mar 23 19:07:25 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 23 Mar 2023 12:07:25 -0700 Subject: [all][tc][goal][policy] RBAC goal discussion in 2023.2 PTG Message-ID: <1870fde64e3.dd20d8991092719.4320346159123288024@ghanshyammann.com> Hello Everyone, I have booked the Tuesday 17-18 UTC slot bexar room for the RBAC goal discussion. You can add the topics/queries to be discussed in vPTG in the below etherpad. 
- https://etherpad.opendev.org/p/rbac-2023.2-ptg -gmann From knikolla at bu.edu Thu Mar 23 19:49:03 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Thu, 23 Mar 2023 19:49:03 +0000 Subject: [tc][ptl][ptg] TC + Community Leaders Interaction for 2023.2 vPTG Message-ID: Hello everyone, On Monday, March 27, 2023 16.00UTC to 18.00UTC, the TC is organizing the Technical Committee & Community Leaders Interaction. While this meeting is open to all, we would like to invite participation especially from PTL, TC, and SIG Chairs with the goal of gathering feedback and promoting collaboration. This meeting has been quite successful in the last 2 PTG and I'm hoping we can continue this tradition. If you have an item you'd like to propose for discussion please add it to the purposefully quite empty etherpad[0], where you can also find information and details on how to join. 0. https://etherpad.opendev.org/p/tc-leaders-interaction-2023-2 Hope to see you there, Kristi Nikolla -------------- next part -------------- An HTML attachment was scrubbed... URL: From knikolla at bu.edu Thu Mar 23 21:00:08 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Thu, 23 Mar 2023 21:00:08 +0000 Subject: [tc] No TC weekly meeting next week + meeting time change Message-ID: <823CC989-9363-4A9C-8FF6-38D860E6F806@bu.edu> Hi all, Due to the vPTG being held next week, the TC will not hold its regular weekly meeting that was scheduled for Wednesday, March 29, 2023. Please note, that after the PTG, the new meeting time will be Tuesdays 18.00 UTC. More information and an ICS file can be found here https://meetings.opendev.org/#Technical_Committee_Meeting Thank you, Kristi Nikolla -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Fri Mar 24 01:21:03 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Thu, 23 Mar 2023 22:21:03 -0300 Subject: [manila] Bobcat vPTG slots and topics In-Reply-To: References: Message-ID: Hello everyone! Just a quick update on this: Time slots are assigned to the topics, please check it out in the official PTG etherpad [3]. Please let me know if you would like to have a session being moved around and I can work on accommodating it if feasible. We will also have a cross-project discussion with Nova on Wednesday, 15 UTC to talk about preventing shares deletion while it's attached to an instance. I have also scheduled an operator hour, on Thursday at 16 UTC, in the same room we will be meeting for the other sessions (Austin). We would like to gather some feedback from operators and hear more from you on what we can improve. Please join us! [3] https://etherpad.opendev.org/p/manila-bobcat-ptg Looking forward to the great discussions! carloss Em qui., 16 de mar. de 2023 ?s 11:26, Carlos Silva escreveu: > Hello, Zorillas! > > PTG is right around the corner and I would like to remind you to please > add the topics you would like to bring up during our sessions to the > planning etherpad [1] until next Tuesday (Mar 21st). > > I have already allocated some slots for our sessions: > > - Monday: 16:00 to 17:00 UTC > - Wednesday: 14:00 to 16:00 UTC > - Thursday: 14:00 to 16:00 UTC > - Friday: 14:00 to 17:00 UTC > > > We will be meeting in the Austin room, you can access the meeting room > through the PTG page [2]. > > If you have a preference of date/time for your topic to be discussed, > please let me know and I will try to accommodate it. > > Looking forward to meeting you! 
> > [1] https://etherpad.opendev.org/p/manila-bobcat-ptg-planning > [2] https://ptg.opendev.org/ptg.html > > Thanks, > carloss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Fri Mar 24 01:25:31 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Thu, 23 Mar 2023 22:25:31 -0300 Subject: [manila] Cancelling March 30th IRC weekly meeting Message-ID: Hello Zorillas! As mentioned in today's IRC meeting, we will be having several meetings at the PTG next week, so we will not have our usual IRC meeting on Thursday March 30th 15 UTC. See you at the PTG! Regards, carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbaker at redhat.com Fri Mar 24 02:16:23 2023 From: sbaker at redhat.com (Steve Baker) Date: Fri, 24 Mar 2023 15:16:23 +1300 Subject: =?UTF-8?B?UmU6IOWbnuWkje+8miBbaXJvbmljXVF1ZXN0aW9ucyBhYm91dCB0aGUg?= =?UTF-8?Q?use_of_the_build_image_tool?= In-Reply-To: References: Message-ID: We are likely assuming the source image is compliant enough with the LSB[1] which references the Filesystem Hierarchy Standard[2] that specifies a /bin directory which includes the tar command[3]. Any improvement in LSB compliance would be beneficial for the UOS distribution. [1] https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/normativerefs.html#STD.FHS [2] https://refspecs.linuxbase.org/fhs [3] https://refspecs.linuxbase.org/FHS_3.0/fhs/ch03s04.html On 23/03/23 21:41, ?? wrote: > > I have found the cause of the problem, which is because tar does not > exist in the working directory. > I wonder if there are requirements on the base image when building > from the base image. > > ?Thank you. > > ------------------------------------------------------------------------ > ???????? > > > > ----------???????---------- > ????2023-03-23 ?? 15:01??? > > > Hi > We are using diskimage-builder to make a custom image, There is a > problem in extract_image, The question is chroot: failed to run > command 'bin/tar': No such file or directory, I found the /bin/tar > file in the working directory /tmp/ tmp.00yldwe1xt. But errors are > still being reported. It's not clear if this is a custom image > problem, There are also requirements for custom images. > > ------------------------------------------------------------------------ > ???????? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Fri Mar 24 07:47:08 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 24 Mar 2023 14:47:08 +0700 Subject: [nova]host cpu reserve In-Reply-To: <1cc331bf24541e551d6ad87d407a4c9a90a23665.camel@redhat.com> References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> <1cc331bf24541e551d6ad87d407a4c9a90a23665.camel@redhat.com> Message-ID: Hello/ After chasing links and your examples, I found this example is good for beginners like me, I want to show that for previous people. https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/configuring_the_compute_service_for_instance_creation/index#proc_configuring-compute-nodes-for-cpu-pinning_cpu-pinning Thank you much. Nguyen Huu Khoi On Thu, Mar 23, 2023 at 9:54?PM Sean Mooney wrote: > On Thu, 2023-03-23 at 14:51 +0100, Dmitriy Rabotyagov wrote: > > Just in case, you DO have options to control cpu and ram reservation > > for the hypervisor. 
It's just more about that, that it's not the best > > way to do it, especially if you're overcommitting, as things in real > > life are more complicated then just defining the amount of reserved > > CPUs. > > > > For example, if you have cpu_allocation_ratio set to 3, then you're > > getting 3 times more CPUs to signup VMs then you actually have > > (cores*sockets*threads*cpu_allocation_ratio). With that you really > > can't set any decent amount of reserved CPUs that will 100% ensure > > that hypervisor will be able to gain required resources at any given > > time. So with that approach the only option is to disable cpu > > overcommit, but even then you might get CPU in socket 1 fully utilized > > which might have negative side-effects for the hypervisor. > > > > And based on that, as Sean has mentioned, you can tell nova to > > explicitly exclude specific cores from being utilized, which will make > > them reserved for the hypervisor. > yep exactly. > without geting into all the details the host reserved cpu option was added > in the really early days and then vcpu_pin_set was > added to adress the fact that the existing option didnt really work the > way peopel wanted. > it was later used for cpu pinning and we realise we wanted to have 2 > sepreate pools of cpus > > cpu_shared_set for shared core useed by floating vms (anything with out > hw:cpu_policy=dedicated) and > cpu_dedicated_set for explictlly pinned vms. > > in general using cpu_shared_set and cpu_dedicated_set is a much more > intitive way to resver cores since you get to select exaction which > cores can be used for nova vms. > > that allows you do the use systemd or other tools like taskset to > affiites nova-cpu or libvirtd or sshd to run on core that wont have vms > that prevents the vms form staving those host process of cpu resouces. > > > > ??, 23 ???. 2023??. ? 14:35, Nguy?n H?u Kh?i >: > > > > > > Ok. I will try to understand it. I will let you know when I get it. > > > Many thanks for your help. :) > > > > > > On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov < > noonedeadpunk at gmail.com> wrote: > > > > > > > > Just to double check with you, given that you have > > > > cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 > > > > physical cores, then it should be defined like: > > > > > > > > [compute] > > > > cpu_shared_set="2-32,34-64,66-96,98-128"? > > > > > > > > > in general you shoudl reserve the first core on each cpu socket > for the host os. > > > > > if you use hyperthreading then both hyperthread of the first cpu > core on each socket shoudl be omitted > > > > > form the cpu_shared_set and cpu_dedicated_set > > > > > > > > ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > > > > > > > > > > generally you should not > > > > > you can use it but the preferd way to do this is use > > > > > cpu_shared_set and cpu_dedicated_set (in old releases you would > have used vcpu_pin_set) > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > > > > > > > > > if you dont need cpu pinning just use cpu_share_set to spcify the > cores that can be sued for floatign vms > > > > > when you use cpu_shared_set and cpu_dedicated_set any cpu not > specified are reseved for host use. 
> > > > > > > > > > https://that.guru/blog/cpu-resources/ and > https://that.guru/blog/cpu-resources-redux/ > > > > > > > > > > have some useful info but that mostly looking at it form a cpu > pinning angel althoguh the secon one covers cpu_shared_set, > > > > > > > > > > the issue with usein > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > > > > > > > > > is that you have to multiple the number of cores that are > resverved by the > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > > > > > > > > > which means if you decide to manage that via placement api by using > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio > instead > > > > > then you need to update your nova.conf to modify the reservationfi > you change the allocation ratio. > > > > > > > > > > if instead you use cpu_shared_set and cpu_dedicated_set > > > > > you are specifying exactly which cpus nova can use and the > allocation ration nolonger needs to be conisderd. > > > > > > > > > > in general you shoudl reserve the first core on each cpu socket > for the host os. > > > > > if you use hyperthreading then both hyperthread of the first cpu > core on each socket shoudl be omitted > > > > > form the cpu_shared_set and cpu_dedicated_set > > > > > > > > > > > > > > > > > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > > > > > Hello guys. > > > > > > I am trying google for nova host cpu reserve to prevent host > overload but I > > > > > > cannot find any resource about it. Could you give me some > information? > > > > > > Thanks. > > > > > > Nguyen Huu Khoi > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Fri Mar 24 09:46:57 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 24 Mar 2023 10:46:57 +0100 Subject: [neutron] Neutron driver's meeting cancelled Message-ID: Hello Neutrinos: Today's drivers meeting is cancelled. The only topic in the agenda [1] was agreed to be discussed during the PTG. Join us next week in the PTG sessions! Here is the Neutron agenda [2] and the PTG website [3]. We'll be on the Juno channel. Have a nice weekend. [1]https://wiki.openstack.org/wiki/Meetings/NeutronDrivers [2]https://etherpad.opendev.org/p/neutron-bobcat-ptg [3]https://ptg.opendev.org/ptg.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From artem.goncharov at gmail.com Fri Mar 24 10:35:58 2023 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Fri, 24 Mar 2023 11:35:58 +0100 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked Message-ID: Hi all, A bit late, but still - I have booked a 3 hours slot during PTG on Friday 14:00-17:00 UTC. This will follow publiccloud room discussion so I think some people and outcomes will follow directly into our room. Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli Feel free to feel in topics you want to discuss Cheers, Artem -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre at stackhpc.com Fri Mar 24 12:08:22 2023 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 24 Mar 2023 13:08:22 +0100 Subject: [blazar][ptg] Bobcat PTG scheduling In-Reply-To: References: Message-ID: Due to a conflict with another PTG session, we have decided to start the Blazar session one hour later. The new time is 1500 UTC to 1700 UTC. On Fri, 10 Mar 2023 at 18:07, Pierre Riteau wrote: > Hello, > > The Bobcat PTG will happen online during the week starting March 27. > > As the Blazar project has done in the past, I suggest we meet on Thursday, > but starting 1400 UTC rather than the usual 1500 of our biweekly meeting. I > have booked two hours in the Bexar room. If you want to join, please let me > know if this works for you. > > To summarise, the Blazar project will meet on Thursday March 30 from 1400 > UTC to 1600 UTC. > > We will prepare discussion topics on Etherpad. > > Cheers, > Pierre Riteau (priteau) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Mar 24 15:17:55 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 24 Mar 2023 15:17:55 +0000 Subject: [security-sig] vPTG sessions 16:00 UTC Tuesday and Wednesday Message-ID: <20230324151755.ke35cniiaiyx5ekm@yuggoth.org> I've booked two hours on the vPTG schedule, Tuesday and Wednesday 16:00-17:00 UTC, in the hopes that interested parties will be able to make at least one of those if not both. We'll use OpenDev's Meetpad service: https://meetpad.opendev.org/march2023-ptg-os-security I tried to avoid booking conflicts with Barbican and Keystone since those are the two projects our participants traditionally have obligatory conflicts from (also worked around the TC, Release and Diversity WG sessions). I know folks from Ironic wanted to talk about VMT topics, but our times overlap with some of theirs so we can either try to talk about that in one of the non-overlapping Ironic sessions or they can join us during ours, whichever works better. I've started adding some proposed discussion topics to the corresponding pad, but anyone can feel free to throw ideas in there, or just bring them up once we're on the call... I'm not one to stand on ceremony: https://etherpad.opendev.org/p/march2023-ptg-os-security Hopefully some people will be able to make it, but if you want other times on the schedule too then let me know and I'll try to work something out. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From christian.rohmann at inovex.de Fri Mar 24 15:28:47 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Fri, 24 Mar 2023 16:28:47 +0100 Subject: [nova][cinder] Providing ephemeral storage to instances - Cinder or Nova Message-ID: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de> Hello OpenStack-discuss, I am currently looking into how one can provide fast ephemeral storage (backed by local NVME drives) to instances. There seem to be two approaches and I would love to double-check my thoughts and assumptions. 1) *Via Nova* instance storage and the configurable "ephemeral" volume for a flavor a) We currently use Ceph RBD als image_type (https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type), so instance images are stored in Ceph, not locally on disk. 
I believe this setting will also cause ephemeral volumes (destination_local) to be placed on an RBD and not /var/lib/nova/instances? Or is there a setting to set a different backend for local block devices providing "ephemeral" storage? So RBD for the root disk and a local LVM VG for ephemeral? b) Will an ephemeral volume also be migrated when the instance is shut off, as with live-migration? Or will there be a new volume created on the target host? I am asking because I want to avoid syncing 500G or 1T when it's only "ephemeral" and the instance will not expect any data on it on the next boot. c) Is the size of the ephemeral storage for flavors a fixed size or just the upper bound for users? So if I limit this to 1T, will such a flavor always provision a block device with this size? I suppose using LVM this will be thin provisioned anyways? 2) *Via Cinder*, running cinder-volume on each compute node to provide a volume type "ephemeral", using e.g. the LVM driver a) While not really "ephemeral" and bound to the instance lifecycle, this would allow users to provision ephemeral volumes just as they need them. I suppose I could use backend specific quotas (https://docs.openstack.org/cinder/latest/cli/cli-cinder-quotas.html#view-block-storage-quotas) to limit the number and size of such volumes? b) Do I need to use the instance locality filter (https://docs.openstack.org/cinder/latest/contributor/api/cinder.scheduler.filters.instance_locality_filter.html) then? c) Since a volume will always be bound to a certain host, I suppose this will cause side-effects to instance scheduling? With the volume remaining after an instance has been destroyed (defeating the purpose of it being "ephemeral") I suppose any other instance attaching this volume will be scheduled on this very machine? Is there any way around this? Maybe a driver setting to have such volumes "self-destroy" if they are not attached anymore? d) Same question as with Nova: What happens when an instance is live-migrated? Maybe others also have this use case and you can share your solution(s)? Thanks and with regards Christian From abishop at redhat.com Thu Mar 23 12:36:35 2023 From: abishop at redhat.com (Alan Bishop) Date: Thu, 23 Mar 2023 05:36:35 -0700 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan wrote: > Hi, > Is this bind not required for cinder_scheduler container? > "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", > I do not see this particular bind on the cinder scheduler containers on my > controller nodes. > That is correct, because the scheduler does not access the ceph cluster.
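If you want to double-check that on a controller, something like the following will list which cinder containers actually get the ceph config bind mount. This is only a sketch: the container names below are the usual TripleO ones, and in an HA deployment the volume service may instead run as a pacemaker-managed container with a different name.

$ sudo podman ps --format '{{.Names}}' | grep cinder
$ sudo podman inspect cinder_volume --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' | grep ceph
$ sudo podman inspect cinder_scheduler --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' | grep ceph

The first inspect should show /var/lib/tripleo-config/ceph mounted at /var/lib/kolla/config_files/src-ceph, while the second is expected to return nothing.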
Alan > With regards, > Swogat Pradhan > > On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan > wrote: > >> Cinder volume config: >> >> [tripleo_ceph] >> volume_backend_name=tripleo_ceph >> volume_driver=cinder.volume.drivers.rbd.RBDDriver >> rbd_user=openstack >> rbd_pool=volumes >> rbd_flatten_volume_from_snapshot=False >> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >> report_discard_supported=True >> rbd_ceph_conf=/etc/ceph/dcn02.conf >> rbd_cluster_name=dcn02 >> >> Glance api config: >> >> [dcn02] >> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=dcn02 rbd glance store >> [ceph] >> rbd_store_ceph_conf=/etc/ceph/ceph.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=Default glance store backend. >> >> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan >> wrote: >> >>> I still have the same issue, I'm not sure what's left to try. >>> All the pods are now in a healthy state, I am getting log entries 3 mins >>> after I hit the create volume button in cinder-volume when I try to create >>> a volume with an image. >>> And the volumes are just stuck in creating state for more than 20 mins >>> now. >>> >>> Cinder logs: >>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>> cinder-volume RPC version 3.17 as minimum service version. >>> 2023-03-22 20:34:59.166 108 INFO >>> cinder.volume.flows.manager.create_volume >>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>> specification: {'status': 'creating', 'volume_name': >>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>> 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>> 'owner_specified.openstack.object': 'images/cirros', >>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>> } >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: >>> >>>> >>>> >>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Hi Adam, >>>>> The systems are in same LAN, in this case it seemed like the image was >>>>> getting pulled from the central site which was caused due to an >>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>> directory, which seems to have been resolved after the changes i made to >>>>> fix it. >>>>> >>>>> Right now the glance api podman is running in unhealthy state and the >>>>> podman logs don't show any error whatsoever and when issued the command >>>>> netstat -nultp i do not see any entry for glance port i.e. 9292 in the dcn >>>>> site, which is why cinder is throwing an error stating: >>>>> >>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>> finding address for >>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>> Unable to establish connection to >>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>> NewConnectionError('>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>> ECONNREFUSED',)) >>>>> >>>>> Now i need to find out why the port is not listed as the glance >>>>> service is running, which i am not sure how to find out. >>>>> >>>> >>>> One other thing to investigate is whether your deployment includes this >>>> patch [1]. If it does, then bear in mind >>>> the glance-api service running at the edge site will be an "internal" >>>> (non public facing) instance that uses port 9293 >>>> instead of 9292. You should familiarize yourself with the release note >>>> [2]. 
>>>> >>>> [1] >>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>> [2] >>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>> >>>> Alan >>>> >>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> Update: >>>>>>> Here is the log when creating a volume using cirros image: >>>>>>> >>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>> cinder.volume.flows.manager.create_volume >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> [{'url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>> } >>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>> >>>>>> >>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>> download 15.58 MB at 0.16 MB/s). 
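If you want to quantify that from the affected edge node, one rough check (a sketch only -- it assumes the openstack CLI is available there and ends up talking to the same glance endpoint cinder is configured to use; the image ID is the one from the log above) is to time a download and compare it against the ~16 MB image size:

$ time openstack image save --file /tmp/cirros-test.img 736d8779-07cd-4510-bab2-adcb653cc538

If that is similarly slow, the bottleneck is the network path or endpoint selection between this node and glance rather than anything cinder-specific.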
Downloading the image takes too long, and >>>>>> 0.16 MB/s suggests you have a network issue. >>>>>> >>>>>> John Fulton previously stated your cinder-volume service at the edge >>>>>> site is not using the local ceph image store. Assuming you are deploying >>>>>> GlanceApiEdge service [1], then the cinder-volume service should be >>>>>> configured to use the local glance service [2]. You should check cinder's >>>>>> glance_api_servers to confirm it's the edge site's glance service. >>>>>> >>>>>> [1] >>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>> [2] >>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>> >>>>>> Alan >>>>>> >>>>>> >>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>> category=FutureWarning) >>>>>>> >>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>> category=FutureWarning) >>>>>>> >>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>> MB/s >>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>> cinder.volume.flows.manager.create_volume >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>> >>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Jhon, >>>>>>>> This seems to be an issue. >>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>>>>> parameter was specified to the respective cluster names but the config >>>>>>>> files were created in the name of ceph.conf and keyring was >>>>>>>> ceph.client.openstack.keyring. >>>>>>>> >>>>>>>> Which created issues in glance as well as the naming convention of >>>>>>>> the files didn't match the cluster names, so i had to manually rename the >>>>>>>> central ceph conf file as such: >>>>>>>> >>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>> total 16 >>>>>>>> -rw-------. 
1 root root 257 Mar 13 13:56 >>>>>>>> ceph_central.client.openstack.keyring >>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>> ceph.client.openstack.keyring >>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>> >>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>>>>>> respective clusters in both dcn01 and dcn02. >>>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>>> for accessing central ceph cluster. >>>>>>>> >>>>>>>> glance multistore config: >>>>>>>> [dcn02] >>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>> rbd_store_user=openstack >>>>>>>> rbd_store_pool=images >>>>>>>> rbd_thin_provisioning=False >>>>>>>> store_description=dcn02 rbd glance store >>>>>>>> >>>>>>>> [ceph_central] >>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>> rbd_store_user=openstack >>>>>>>> rbd_store_pool=images >>>>>>>> rbd_thin_provisioning=False >>>>>>>> store_description=Default glance store backend. >>>>>>>> >>>>>>>> >>>>>>>> With regards, >>>>>>>> Swogat Pradhan >>>>>>>> >>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>> wrote: >>>>>>>>> > >>>>>>>>> > Hi, >>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>> >>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>> >>>>>>>>> I hope this is not a production system since the mailing list now >>>>>>>>> has >>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>> >>>>>>>>> The section that looks like this: >>>>>>>>> >>>>>>>>> [tripleo_ceph] >>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>> rbd_user=openstack >>>>>>>>> rbd_pool=volumes >>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>> rbd_secret_uuid= >>>>>>>>> report_discard_supported=True >>>>>>>>> >>>>>>>>> Should be updated to refer to the local DCN ceph cluster and not >>>>>>>>> the >>>>>>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>> >>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of >>>>>>>>> the >>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>> secret-get-value $FSID`. >>>>>>>>> >>>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>>> sites >>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>> following >>>>>>>>> it. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>> >>>>>>>>> John >>>>>>>>> >>>>>>>>> > >>>>>>>>> > Ceph Output: >>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>> > NAME SIZE PARENT FMT >>>>>>>>> PROT LOCK >>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>>>>>> excl >>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>> 2 yes >>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>> 2 yes >>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>> 2 yes >>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>> 2 yes >>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>> 2 yes >>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>> 2 yes >>>>>>>>> > >>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>> > NAME SIZE PARENT >>>>>>>>> FMT PROT LOCK >>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>> > >>>>>>>>> > Attached the cinder config. >>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>> > >>>>>>>>> > With regards, >>>>>>>>> > Swogat Pradhan >>>>>>>>> > >>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>>>>>> wrote: >>>>>>>>> >> >>>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>> config. >>>>>>>>> >> >>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>> >>>>>>>>> >>> Update: >>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>> >>> The image size is 389 MB. >>>>>>>>> >>> >>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>> >>>>>>>>> >>>> Hi Jhon, >>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>>>>> after importing from the central site. >>>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>>> time for the volume to get created. >>>>>>>>> >>>> >>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>> getting created properly without any errors. >>>>>>>>> >>>> >>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>> but getting checksum failed error. 
>>>>>>>>> >>>> >>>>>>>>> >>>> With regards, >>>>>>>>> >>>> Swogat Pradhan >>>>>>>>> >>>> >>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>> >>>>> >>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>> >>>>> wrote: >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>> launch the VM from volume. >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>> to the edge glance. >>>>>>>>> >>>>> >>>>>>>>> >>>>> Try following this document and making the same observations >>>>>>>>> in your >>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>> >>>>> >>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>> >>>>> >>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>> excl >>>>>>>>> >>>>> $ >>>>>>>>> >>>>> >>>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>>> which is on >>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>> >>>>> >>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>> encountering >>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>> >>>>> >>>>>>>>> >>>>> Ideally all images should reside in the central Glance and >>>>>>>>> be copied >>>>>>>>> >>>>> to DCN sites before instances of those images are booted on >>>>>>>>> DCN sites. >>>>>>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>>>>>> then the >>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>> will boot as >>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site has >>>>>>>>> access to >>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>> booting of >>>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>>> advance, >>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>> >>>>> >>>>>>>>> >>>>> You can also exec into the cinder container at the DCN site >>>>>>>>> and >>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>> >>>>> >>>>>>>>> >>>>> John >>>>>>>>> >>>>> >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>> then update. >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > With regards, >>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>> >> >>>>>>>>> >>>>> >> Update: >>>>>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>>>>> down. 
>>>>>>>>> >>>>> >> >>>>>>>>> >>>>> >> >>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon >>>>>>>>> [-] privsep daemon starting >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon >>>>>>>>> [-] privsep process running with uid/gid: 0/0 >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>>> [-] privsep process running with capabilities (eff/prm/inh): >>>>>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>>> [-] privsep daemon running as pid 185437 >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>> running command. >>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>> template mentioned here ?: >>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>> why the instance is stuck in spawning state. >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> With regards, >>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Does your environment use different network interfaces >>>>>>>>> for each of the networks? Or does it have a bond with everything on it? >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>> issues if everything is running over a single 1Gbe interface. 
>>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>> while spawning the instance. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. >>>>>>>>> So, based on that experience, from my perspective, is certainly sounds like >>>>>>>>> some kind of network issue. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Regards, >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>>> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Hi, >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some time >>>>>>>>> ago in this thread: >>>>>>>>> >>>>> >>>> >>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>>>>> logs I probably would start "tearing down" rabbitmq. I didn't have to do >>>>>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>>>>> running, this will most likely impact client IO depending on your load. >>>>>>>>> Check out the rabbitmqctl commands. >>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. >>>>>>>>> rebuild. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>> a better advice. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Regards, >>>>>>>>> >>>>> >>>> Eugen >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Hi, >>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> With regards, >>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Hi >>>>>>>>> >>>>> >>>> I don't see any major packet loss. 
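If it helps to double-check that, interface-level counters on the bond and its slave NICs are a more direct measure than ping alone (interface names below are placeholders for whatever your bond and physical NICs are called):

# Before and after a spawn attempt, compare error/drop counters
$ ip -s link show bond1
$ ethtool -S eno1 | grep -iE 'drop|err'    # repeat for each slave NIC

# Throughput sample while the image copy / spawn is running (needs the sysstat package)
$ sar -n DEV 1 30
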
>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but >>>>>>>>> not due to packet >>>>>>>>> >>>>> >>>> loss. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> with regards, >>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Hi, >>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>> checked when >>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>> stuck at spawning >>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>> sure if packet loss >>>>>>>>> >>>>> >>>> causes this. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> With regards, >>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>> identical between >>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>>> the tunnel? >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>>>>> 'cc' as i am not >>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>>>>>> list_policies -p >>>>>>>>> >>>>> >>>> / >>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>>> priority >>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> >>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>>> down when i am >>>>>>>>> >>>>> >>>> trying >>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>> spawning state and >>>>>>>>> >>>>> >>>> then >>>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>> edge sites. >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>> directly, i am >>>>>>>>> >>>>> >>>> checking >>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>> reply. >>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>> occurred. 
>>>>>>>>> >>>>> >>>> >> >>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>> activities in the >>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>> site.* >>>>>>>>> >>>>> >>>> >> >>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>> >>>>> >>>> >> >>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>> >>>>> >>>> >> >>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>> the details: >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>> >>>>> >>>> Started >>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>> >>>>> >>>> Started >>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>> >>>>> >>>> Started >>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>> >>>>> >>>> Started >>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>>> but the issue is >>>>>>>>> >>>>> >>>> still >>>>>>>>> >>>>> >>>> >>> present. >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>> cluster_status >>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>> inter-node and CLI >>>>>>>>> >>>>> >>>> tool >>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>> inter-node and CLI >>>>>>>>> >>>>> >>>> tool >>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>> >>>>> >>>> >>> and AMQP 
1.0 >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>> inter-node and CLI >>>>>>>>> >>>>> >>>> tool >>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> , >>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>>>>>> purpose: >>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> , >>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>> amqp, purpose: AMQP >>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> , >>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>> purpose: HTTP API >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api >>>>>>>>> log. 
>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>> -] The reply >>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>> after 60 seconds >>>>>>>>> >>>>> >>>> due to a >>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] The reply >>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>> after 60 seconds >>>>>>>>> >>>>> >>>> due to a >>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] The reply >>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>> after 60 seconds >>>>>>>>> >>>>> >>>> due to a >>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Cache enabled >>>>>>>>> >>>>> >>>> with >>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] The reply >>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>> after 60 seconds >>>>>>>>> >>>>> >>>> due to a >>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>> where i am trying to >>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>> down (openstack >>>>>>>>> >>>>> >>>> compute >>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>> restart the nova >>>>>>>>> >>>>> >>>> compute >>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. 
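Regarding the missing reply_* queues in the conductor log above, one way to see whether those queues exist at all and have consumers is to query rabbitmq directly on a controller (the bundle container name follows the usual pacemaker bundle convention and may differ; check `sudo podman ps | grep rabbitmq` first):

$ sudo podman exec rabbitmq-bundle-podman-0 rabbitmqctl list_queues name messages consumers | grep reply_

A reply_* queue that is missing, or present with zero consumers, matches the "failed to send after 60 seconds due to a missing queue" errors quoted above.
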
>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>> nova.compute.manager >>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - >>>>>>>>> -] Running >>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>> >>>>> >>>> to >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] [instance: >>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>> successful on node >>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>> nova.virt.libvirt.driver >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] [instance: >>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>> supplied device >>>>>>>>> >>>>> >>>> name: >>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>>>>> names >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>> nova.virt.block_device >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] [instance: >>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>> with volume >>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Cache enabled >>>>>>>>> >>>>> >>>> with >>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Running >>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>> '--config-file', >>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>> '--privsep_context', >>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>> '--privsep_sock_path', >>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Spawned new >>>>>>>>> >>>>> >>>> privsep >>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Process >>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>>>>>> command. >>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>> nova.virt.libvirt.driver >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] [instance: >>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>> image >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
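For what it's worth, the _get_host_uuid / blkid warning above shows up in plenty of otherwise healthy container-based deployments and is generally not what blocks the spawn. A more telling check may be to follow the request id across services at the moment "Creating image" appears (paths are the usual TripleO log locations):

# On the edge compute node, note the req- id of the stuck build
$ sudo grep 'Creating image' /var/log/containers/nova/nova-compute.log | tail -1

# On a central controller, follow the same request through conductor and scheduler
$ sudo grep 'req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45' /var/log/containers/nova/nova-conductor.log
$ sudo grep 'req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45' /var/log/containers/nova/nova-scheduler.log

If the conductor side only shows the reply_* queue errors for that request, the spawn is stalling on the RPC reply path rather than on image or volume handling.
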
>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>>>>>>> >>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 23 12:42:36 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 18:12:36 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Jhon, Thank you for clarifying that. Right now the cinder volume is stuck in *creating *state when adding image as volume source. But when creating an empty volume the volumes are getting created successfully without any errors. We are getting volume creation request in cinder-volume.log as such: 2023-03-23 12:34:40.152 108 INFO cinder.volume.flows.manager.create_volume [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 
'image_service': } But there is nothing else after that and the volume doesn't even timeout, it just gets stuck in creating state. Can you advise what might be the issue here? All the containers are in a healthy state now. With regards, Swogat Pradhan On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: > > > On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan > wrote: > >> Hi, >> Is this bind not required for cinder_scheduler container? >> >> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >> I do not see this particular bind on the cinder scheduler containers on >> my controller nodes. >> > > That is correct, because the scheduler does not access the ceph cluster. > > Alan > > >> With regards, >> Swogat Pradhan >> >> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan >> wrote: >> >>> Cinder volume config: >>> >>> [tripleo_ceph] >>> volume_backend_name=tripleo_ceph >>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>> rbd_user=openstack >>> rbd_pool=volumes >>> rbd_flatten_volume_from_snapshot=False >>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>> report_discard_supported=True >>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>> rbd_cluster_name=dcn02 >>> >>> Glance api config: >>> >>> [dcn02] >>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=dcn02 rbd glance store >>> [ceph] >>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=Default glance store backend. >>> >>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> I still have the same issue, I'm not sure what's left to try. >>>> All the pods are now in a healthy state, I am getting log entries 3 >>>> mins after I hit the create volume button in cinder-volume when I try to >>>> create a volume with an image. >>>> And the volumes are just stuck in creating state for more than 20 mins >>>> now. >>>> >>>> Cinder logs: >>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>> cinder-volume RPC version 3.17 as minimum service version. 
>>>> 2023-03-22 20:34:59.166 108 INFO >>>> cinder.volume.flows.manager.create_volume >>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>> specification: {'status': 'creating', 'volume_name': >>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> [{'url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'metadata': {'store': 'ceph'}}, {'url': >>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'metadata': {'store': 'ceph'}}, {'url': >>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>> 'owner_specified.openstack.object': 'images/cirros', >>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>> } >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: >>>> >>>>> >>>>> >>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> Hi Adam, >>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>> was getting pulled from the central site which was caused due to an >>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>> directory, which seems to have been resolved after the changes i made to >>>>>> fix it. >>>>>> >>>>>> Right now the glance api podman is running in unhealthy state and the >>>>>> podman logs don't show any error whatsoever and when issued the command >>>>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>>>> site, which is why cinder is throwing an error stating: >>>>>> >>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>> finding address for >>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>> Unable to establish connection to >>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>> NewConnectionError('>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>> ECONNREFUSED',)) >>>>>> >>>>>> Now i need to find out why the port is not listed as the glance >>>>>> service is running, which i am not sure how to find out. >>>>>> >>>>> >>>>> One other thing to investigate is whether your deployment includes >>>>> this patch [1]. If it does, then bear in mind >>>>> the glance-api service running at the edge site will be an "internal" >>>>> (non public facing) instance that uses port 9293 >>>>> instead of 9292. You should familiarize yourself with the release note >>>>> [2]. >>>>> >>>>> [1] >>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>> [2] >>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>> >>>>> Alan >>>>> >>>>> >>>>>> With regards, >>>>>> Swogat Pradhan >>>>>> >>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>>>>> Update: >>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>> >>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> [{'url': >>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, 
tzinfo=datetime.timezone.utc), >>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>> } >>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>> >>>>>>> >>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>> >>>>>>> John Fulton previously stated your cinder-volume service at the edge >>>>>>> site is not using the local ceph image store. Assuming you are deploying >>>>>>> GlanceApiEdge service [1], then the cinder-volume service should be >>>>>>> configured to use the local glance service [2]. You should check cinder's >>>>>>> glance_api_servers to confirm it's the edge site's glance service. >>>>>>> >>>>>>> [1] >>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>> [2] >>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>> >>>>>>> Alan >>>>>>> >>>>>>> >>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>> category=FutureWarning) >>>>>>>> >>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>>>> category=FutureWarning) >>>>>>>> >>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>> MB/s >>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>> >>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>> >>>>>>>> With regards, >>>>>>>> Swogat Pradhan >>>>>>>> >>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Jhon, >>>>>>>>> This seems to be an issue. >>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>>>>>> parameter was specified to the respective cluster names but the config >>>>>>>>> files were created in the name of ceph.conf and keyring was >>>>>>>>> ceph.client.openstack.keyring. >>>>>>>>> >>>>>>>>> Which created issues in glance as well as the naming convention of >>>>>>>>> the files didn't match the cluster names, so i had to manually rename the >>>>>>>>> central ceph conf file as such: >>>>>>>>> >>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>> total 16 >>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>> ceph.client.openstack.keyring >>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>> >>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>>>> for accessing central ceph cluster. >>>>>>>>> >>>>>>>>> glance multistore config: >>>>>>>>> [dcn02] >>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>> rbd_store_user=openstack >>>>>>>>> rbd_store_pool=images >>>>>>>>> rbd_thin_provisioning=False >>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>> >>>>>>>>> [ceph_central] >>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>> rbd_store_user=openstack >>>>>>>>> rbd_store_pool=images >>>>>>>>> rbd_thin_provisioning=False >>>>>>>>> store_description=Default glance store backend. >>>>>>>>> >>>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Swogat Pradhan >>>>>>>>> >>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>> wrote: >>>>>>>>>> > >>>>>>>>>> > Hi, >>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>> >>>>>>>>>> That explains the issue. 
It's a misconfiguration. >>>>>>>>>> >>>>>>>>>> I hope this is not a production system since the mailing list now >>>>>>>>>> has >>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>> >>>>>>>>>> The section that looks like this: >>>>>>>>>> >>>>>>>>>> [tripleo_ceph] >>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>> rbd_user=openstack >>>>>>>>>> rbd_pool=volumes >>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>> rbd_secret_uuid= >>>>>>>>>> report_discard_supported=True >>>>>>>>>> >>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and not >>>>>>>>>> the >>>>>>>>>> central one. Use the ceph conf file for that cluster and ensure >>>>>>>>>> the >>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>> >>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of >>>>>>>>>> the >>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>> This >>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>> secret-get-value $FSID`. >>>>>>>>>> >>>>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>>>> sites >>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>> following >>>>>>>>>> it. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>> >>>>>>>>>> John >>>>>>>>>> >>>>>>>>>> > >>>>>>>>>> > Ceph Output: >>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>> FMT PROT LOCK >>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>> 2 excl >>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>> 2 yes >>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>> 2 yes >>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>> 2 yes >>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>> 2 yes >>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>> 2 yes >>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>> 2 yes >>>>>>>>>> > >>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>> FMT PROT LOCK >>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>> 2 >>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>> 2 >>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>> > >>>>>>>>>> > Attached the cinder config. >>>>>>>>>> > Please let me know how I can solve this issue. 
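One way to cross-check the FSID convention John describes above, as a rough sketch only (the conf file name under /var/lib/tripleo-config/ceph/ and the nova_virtsecretd container name are taken from this thread and may differ in other deployments):

  # On the dcn02 compute/HCI node, read the local cluster FSID:
  sudo grep fsid /var/lib/tripleo-config/ceph/ceph.conf
  # Confirm libvirt holds a cephx secret keyed by that FSID (<local-fsid> is a placeholder):
  sudo podman exec nova_virtsecretd virsh secret-get-value <local-fsid>
  # The same value should appear as rbd_secret_uuid in the [tripleo_ceph] section shown above.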
>>>>>>>>>> > >>>>>>>>>> > With regards, >>>>>>>>>> > Swogat Pradhan >>>>>>>>>> > >>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>> >> >>>>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>> config. >>>>>>>>>> >> >>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>> >>>>>>>>>> >>> Update: >>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>> >>> >>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>> >>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>>>>>> after importing from the central site. >>>>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>>>> time for the volume to get created. >>>>>>>>>> >>>> >>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>> getting created properly without any errors. >>>>>>>>>> >>>> >>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>> but getting checksum failed error. >>>>>>>>>> >>>> >>>>>>>>>> >>>> With regards, >>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>> >>>> >>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>> >>>>> wrote: >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>> launch the VM from volume. >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>>> to the edge glance. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>> observations in your >>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> >>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>> excl >>>>>>>>>> >>>>> $ >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>>>> which is on >>>>>>>>>> >>>>> the same local ceph cluster. 
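For a single existing volume, the same check can be done with rbd info, following the cephadm shell pattern John shows above (sketch only; the dcn02 file names and the volume UUID are placeholders):

  $ sudo cephadm shell --config /etc/ceph/dcn02.conf --keyring /etc/ceph/dcn02.client.admin.keyring
  $ rbd --cluster dcn02 info volumes/volume-<uuid> | grep -A1 parent
  # With rbd_flatten_volume_from_snapshot=False (as in the config in this thread), a COW clone
  # prints a parent in the local images pool; no parent line means the volume was filled by a
  # full image download/convert instead of a clone from the local glance store.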
>>>>>>>>>> >>>>> >>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>> encountering >>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Ideally all images should reside in the central Glance and >>>>>>>>>> be copied >>>>>>>>>> >>>>> to DCN sites before instances of those images are booted on >>>>>>>>>> DCN sites. >>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>> booted, then the >>>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>>> will boot as >>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>> has access to >>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>> booting of >>>>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>>>> advance, >>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN site >>>>>>>>>> and >>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> John >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>> then update. >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > With regards, >>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>> >> >>>>>>>>>> >>>>> >> Update: >>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>>>>>> down. >>>>>>>>>> >>>>> >> >>>>>>>>>> >>>>> >> >>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>> (lacp=active). >>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon >>>>>>>>>> [-] privsep daemon starting >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon >>>>>>>>>> [-] privsep process running with uid/gid: 0/0 >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>>>> [-] privsep process running with capabilities (eff/prm/inh): >>>>>>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>>>> [-] privsep daemon running as pid 185437 >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>> in _get_host_uuid: Unexpected error while running command. 
>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>> running command. >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>> template mentioned here ?: >>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>>> why the instance is stuck in spawning state. >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Does your environment use different network interfaces >>>>>>>>>> for each of the networks? Or does it have a bond with everything on it? >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>> while spawning the instance. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>>>> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some time >>>>>>>>>> ago in this thread: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>>> that user, not sure if that could apply here. 
But is it possible that your >>>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>>>>>> logs I probably would start "tearing down" rabbitmq. I didn't have to do >>>>>>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>>>>>> running, this will most likely impact client IO depending on your load. >>>>>>>>>> Check out the rabbitmqctl commands. >>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. >>>>>>>>>> rebuild. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>> a better advice. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>> but not due to packet >>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>> checked when >>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>>> stuck at spawning >>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>> sure if packet loss >>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>> identical between >>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>>>> the tunnel? 
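A rough way to answer both questions from either site (the interface names and the remote internal API address below are placeholders, not values from this thread):

  # Compare MTUs of the relevant NICs/bonds/VLANs on central and edge nodes:
  ip link show | grep mtu
  # Probe the tunnel with don't-fragment set; 1472 assumes a 1500 MTU path
  # (1500 minus 28 bytes of IP+ICMP headers):
  ping -c 5 -M do -s 1472 <remote-internal-api-ip>
  # Watch for drops/errors on the uplink while an instance is spawning:
  ip -s link show <bond-or-nic>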
>>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>>>>>> 'cc' as i am not >>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>> >>>>> >>>> / >>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>>>> priority >>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> >>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>>>> down when i am >>>>>>>>>> >>>>> >>>> trying >>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>> spawning state and >>>>>>>>>> >>>>> >>>> then >>>>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>> edge sites. >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>>> directly, i am >>>>>>>>>> >>>>> >>>> checking >>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>>> reply. >>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>> occurred. >>>>>>>>>> >>>>> >>>> >> >>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>>> activities in the >>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>> site.* >>>>>>>>>> >>>>> >>>> >> >>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >> >>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>> >>>>> >>>> >> >>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>> >>>>> >>>> >>> Thanks for your response. 
>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>>> the details: >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>> >>>>> >>>> Started >>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>> >>>>> >>>> Started >>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>> >>>>> >>>> Started >>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>> >>>>> >>>> Started >>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>>>> but the issue is >>>>>>>>>> >>>>> >>>> still >>>>>>>>>> >>>>> >>>> >>> present. >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>> cluster_status >>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>> >>>>> >>>> >>> 
>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>> inter-node and CLI >>>>>>>>>> >>>>> >>>> tool >>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>> API >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>> inter-node and CLI >>>>>>>>>> >>>>> >>>> tool >>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>> API >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>> inter-node and CLI >>>>>>>>>> >>>>> >>>> tool >>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>> API >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> , >>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>> clustering, purpose: >>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> , >>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>>> amqp, purpose: AMQP >>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> , >>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>> purpose: HTTP API >>>>>>>>>> >>>>> >>>> 
>>> >>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova >>>>>>>>>> api log. >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>>> -] The reply >>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>>> after 60 seconds >>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). 
>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] The reply >>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>>> after 60 seconds >>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] The reply >>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>>> after 60 seconds >>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING >>>>>>>>>> nova.cache_utils >>>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Cache enabled >>>>>>>>>> >>>>> >>>> with >>>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] The reply >>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>>> after 60 seconds >>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>>> where i am trying to >>>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>>> down (openstack >>>>>>>>>> >>>>> >>>> compute >>>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>>> restart the nova >>>>>>>>>> >>>>> >>>> compute >>>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>>> nova.compute.manager >>>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - >>>>>>>>>> - -] Running >>>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>>> >>>>> >>>> to >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO >>>>>>>>>> nova.compute.claims >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] [instance: >>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>>> successful on node >>>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] [instance: >>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>>> supplied device >>>>>>>>>> >>>>> >>>> name: >>>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>>>>>> names >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>>> nova.virt.block_device >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] [instance: >>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>>> with volume >>>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING >>>>>>>>>> nova.cache_utils >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Cache enabled >>>>>>>>>> >>>>> >>>> with >>>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO >>>>>>>>>> oslo.privsep.daemon >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Running >>>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>>> '--config-file', >>>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>>> '--privsep_context', >>>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>>> '--privsep_sock_path', >>>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO >>>>>>>>>> oslo.privsep.daemon >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Spawned new >>>>>>>>>> >>>>> >>>> privsep >>>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Process >>>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while >>>>>>>>>> running command. >>>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] [instance: >>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>>> image >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
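One way to narrow this down from a controller: the reply_... queues named in the conductor errors above are created by the RPC caller (here most likely the edge nova-compute), so if they keep going missing it usually means that node's AMQP connection over the tunnel is dropping, which would also explain the compute service flapping to "down" during spawn. A sketch for watching this while reproducing the problem (the edge compute IP is a placeholder):

  # Inside one of the rabbitmq bundle containers:
  rabbitmqctl list_queues name messages consumers | grep reply_
  rabbitmqctl list_connections peer_host state | grep <edge-compute-internal-ip>
  # If the compute's connection and its reply_ queue vanish while the VM is spawning,
  # the problem is connectivity between the edge node and RabbitMQ rather than RabbitMQ itself.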
>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>>>>>>>> >>>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 23 12:43:31 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 18:13:31 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Alan, My bad i didn't see it was you who replied. Thanks for clarifying my doubt. On Thu, Mar 23, 2023 at 6:12?PM Swogat Pradhan wrote: > Hi Jhon, > Thank you for clarifying that. > Right now the cinder volume is stuck in *creating *state when adding > image as volume source. > But when creating an empty volume the volumes are getting created > successfully without any errors. > > We are getting volume creation request in cinder-volume.log as such: > 2023-03-23 12:34:40.152 108 INFO cinder.volume.flows.manager.create_volume > [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, > 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': > datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', > 'stores': 
'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > > But there is nothing else after that and the volume doesn't even timeout, > it just gets stuck in creating state. > Can you advise what might be the issue here? > All the containers are in a healthy state now. > > With regards, > Swogat Pradhan > > > On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: > >> >> >> On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan >> wrote: >> >>> Hi, >>> Is this bind not required for cinder_scheduler container? >>> >>> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >>> I do not see this particular bind on the cinder scheduler containers on >>> my controller nodes. >>> >> >> That is correct, because the scheduler does not access the ceph cluster. >> >> Alan >> >> >>> With regards, >>> Swogat Pradhan >>> >>> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Cinder volume config: >>>> >>>> [tripleo_ceph] >>>> volume_backend_name=tripleo_ceph >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>> rbd_user=openstack >>>> rbd_pool=volumes >>>> rbd_flatten_volume_from_snapshot=False >>>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>>> report_discard_supported=True >>>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>>> rbd_cluster_name=dcn02 >>>> >>>> Glance api config: >>>> >>>> [dcn02] >>>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=dcn02 rbd glance store >>>> [ceph] >>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=Default glance store backend. >>>> >>>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> I still have the same issue, I'm not sure what's left to try. >>>>> All the pods are now in a healthy state, I am getting log entries 3 >>>>> mins after I hit the create volume button in cinder-volume when I try to >>>>> create a volume with an image. >>>>> And the volumes are just stuck in creating state for more than 20 mins >>>>> now. >>>>> >>>>> Cinder logs: >>>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>>> cinder-volume RPC version 3.17 as minimum service version. 
>>>>> 2023-03-22 20:34:59.166 108 INFO >>>>> cinder.volume.flows.manager.create_volume >>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>>> specification: {'status': 'creating', 'volume_name': >>>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>> } >>>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> Hi Adam, >>>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>>> was getting pulled from the central site which was caused due to an >>>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>>> directory, which seems to have been resolved after the changes i made to >>>>>>> fix it. >>>>>>> >>>>>>> Right now the glance api podman is running in unhealthy state and >>>>>>> the podman logs don't show any error whatsoever and when issued the command >>>>>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>>>>> site, which is why cinder is throwing an error stating: >>>>>>> >>>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>>> finding address for >>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>> Unable to establish connection to >>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>>> NewConnectionError('>>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>>> ECONNREFUSED',)) >>>>>>> >>>>>>> Now i need to find out why the port is not listed as the glance >>>>>>> service is running, which i am not sure how to find out. >>>>>>> >>>>>> >>>>>> One other thing to investigate is whether your deployment includes >>>>>> this patch [1]. If it does, then bear in mind >>>>>> the glance-api service running at the edge site will be an "internal" >>>>>> (non public facing) instance that uses port 9293 >>>>>> instead of 9292. You should familiarize yourself with the release >>>>>> note [2]. >>>>>> >>>>>> [1] >>>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>>> [2] >>>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>>> >>>>>> Alan >>>>>> >>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Update: >>>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>>> >>>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> [{'url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': 
>>>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>>> } >>>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>>> >>>>>>>> >>>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>>> >>>>>>>> John Fulton previously stated your cinder-volume service at the >>>>>>>> edge site is not using the local ceph image store. Assuming you are >>>>>>>> deploying GlanceApiEdge service [1], then the cinder-volume service should >>>>>>>> be configured to use the local glance service [2]. You should check >>>>>>>> cinder's glance_api_servers to confirm it's the edge site's glance service. >>>>>>>> >>>>>>>> [1] >>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>>> [2] >>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>>> >>>>>>>> Alan >>>>>>>> >>>>>>>> >>>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>> category=FutureWarning) >>>>>>>>> >>>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>>>>> category=FutureWarning) >>>>>>>>> >>>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>>> MB/s >>>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>>> >>>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Swogat Pradhan >>>>>>>>> >>>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Jhon, >>>>>>>>>> This seems to be an issue. >>>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the >>>>>>>>>> --cluster parameter was specified to the respective cluster names but the >>>>>>>>>> config files were created in the name of ceph.conf and keyring was >>>>>>>>>> ceph.client.openstack.keyring. >>>>>>>>>> >>>>>>>>>> Which created issues in glance as well as the naming convention >>>>>>>>>> of the files didn't match the cluster names, so i had to manually rename >>>>>>>>>> the central ceph conf file as such: >>>>>>>>>> >>>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>>> total 16 >>>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>>> ceph.client.openstack.keyring >>>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>>> >>>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>>>>> for accessing central ceph cluster. >>>>>>>>>> >>>>>>>>>> glance multistore config: >>>>>>>>>> [dcn02] >>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>> rbd_store_user=openstack >>>>>>>>>> rbd_store_pool=images >>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>>> >>>>>>>>>> [ceph_central] >>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>>> rbd_store_user=openstack >>>>>>>>>> rbd_store_pool=images >>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>> store_description=Default glance store backend. 
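A quick way to confirm that each glance store really points at the cluster it is meant to is to compare the fsid in the two conf files the stores reference against the fsid reported by each cluster. This is only a sketch; the glance_api container name and the conf paths are TripleO defaults and may differ in your deployment:

$ sudo podman exec glance_api grep fsid /etc/ceph/ceph.conf /etc/ceph/ceph_central.conf
$ sudo cephadm shell -- ceph fsid    # run on a dcn02 ceph node
$ sudo cephadm shell -- ceph fsid    # run on a central ceph node

The fsid taken from ceph.conf should match the dcn02 cluster and the one from ceph_central.conf should match the central cluster; if they are swapped or identical, images tagged for one store are actually landing in the other cluster.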
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> With regards, >>>>>>>>>> Swogat Pradhan >>>>>>>>>> >>>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>>> wrote: >>>>>>>>>>> > >>>>>>>>>>> > Hi, >>>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>>> >>>>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>>>> >>>>>>>>>>> I hope this is not a production system since the mailing list >>>>>>>>>>> now has >>>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>>> >>>>>>>>>>> The section that looks like this: >>>>>>>>>>> >>>>>>>>>>> [tripleo_ceph] >>>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>> rbd_user=openstack >>>>>>>>>>> rbd_pool=volumes >>>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>>> rbd_secret_uuid= >>>>>>>>>>> report_discard_supported=True >>>>>>>>>>> >>>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and not >>>>>>>>>>> the >>>>>>>>>>> central one. Use the ceph conf file for that cluster and ensure >>>>>>>>>>> the >>>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>>> >>>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID >>>>>>>>>>> of the >>>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so >>>>>>>>>>> that >>>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>>> This >>>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>>> secret-get-value $FSID`. >>>>>>>>>>> >>>>>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>>>>> sites >>>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>>> following >>>>>>>>>>> it. 
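A minimal cross-check of the pieces described above, run on a DCN compute/HCI node, is to compare the local cluster's fsid with what cinder and libvirt were given. The container names and file locations below are assumptions based on TripleO defaults, and <local-fsid> is a placeholder:

$ sudo grep fsid /var/lib/tripleo-config/ceph/*.conf
$ sudo podman exec cinder_volume grep -A 10 '^\[tripleo_ceph\]' /etc/cinder/cinder.conf | grep rbd_secret_uuid
$ sudo podman exec nova_virtsecretd virsh secret-list
$ sudo podman exec nova_virtsecretd virsh secret-get-value <local-fsid>

The first two commands should agree on the local cluster's fsid, and the last one should return the cephx key libvirt will use; an error there usually means the secret for that cluster was never defined on the node.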
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>>> >>>>>>>>>>> John >>>>>>>>>>> >>>>>>>>>>> > >>>>>>>>>>> > Ceph Output: >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>>> 2 excl >>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>>> 2 >>>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>>> 2 >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>>> > >>>>>>>>>>> > Attached the cinder config. >>>>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>>>> > >>>>>>>>>>> > With regards, >>>>>>>>>>> > Swogat Pradhan >>>>>>>>>>> > >>>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>> >> >>>>>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>>> config. >>>>>>>>>>> >> >>>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>> >>>>>>>>>>> >>> Update: >>>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it >>>>>>>>>>> takes around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>>> >>> >>>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images >>>>>>>>>>> created after importing from the central site. >>>>>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>>>>> time for the volume to get created. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>>> getting created properly without any errors. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>> but getting checksum failed error. 
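Before going further it may also be worth confirming from the API side that the image really is registered in the dcn02 store. A rough check, assuming a client recent enough to expose the glance multistore fields:

$ openstack image show <image-id> -f json | grep -E '"stores"|"checksum"|"status"'

If dcn02 is missing from the stores list the image was never actually copied to the edge store, which would explain both the slow volume creation and the checksum failure seen during pre-caching.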
>>>>>>>>>>> >>>> >>>>>>>>>>> >>>> With regards, >>>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>>> >>>>> wrote: >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>>> launch the VM from volume. >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>>>> to the edge glance. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>>> observations in your >>>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>>> excl >>>>>>>>>>> >>>>> $ >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>>>>> which is on >>>>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>>> encountering >>>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Ideally all images should reside in the central Glance and >>>>>>>>>>> be copied >>>>>>>>>>> >>>>> to DCN sites before instances of those images are booted >>>>>>>>>>> on DCN sites. >>>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>>> booted, then the >>>>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>>>> will boot as >>>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>>> has access to >>>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>>> booting of >>>>>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>>>>> advance, >>>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN >>>>>>>>>>> site and >>>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> John >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>>> then update. 
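To make John's parent check concrete for dcn02, something like the following (only a sketch, reusing the cephadm shell pattern above and assuming the dcn02 conf and keyring names) should show whether a freshly created volume is a COW clone of the local image or a full copy:

$ sudo cephadm shell --config /etc/ceph/dcn02.conf --keyring /etc/ceph/dcn02.client.admin.keyring
$ rbd --cluster dcn02 -p volumes ls -l
$ rbd --cluster dcn02 info volumes/volume-<uuid> | grep parent

A clone shows a parent of images/<image-id>@snap on the same cluster; no parent line at all means the image was downloaded and converted, which is the slow path.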
>>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > With regards, >>>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> Update: >>>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is >>>>>>>>>>> showing down. >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>>> (lacp=active). >>>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon starting >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with capabilities >>>>>>>>>>> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>> running command. >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO >>>>>>>>>>> nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 >>>>>>>>>>> b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>>> template mentioned here ?: >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>>>> why the instance is stuck in spawning state. 
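Since the spawn also hangs when booting directly from an image, it is worth confirming that nova-compute on the edge node is set to build ephemeral disks in the local ceph cluster at all. A rough check, with the nova_compute container name and option names being the usual TripleO/libvirt ones (adjust if your templates differ):

$ sudo podman exec nova_compute grep -E 'images_type|images_rbd_pool|images_rbd_ceph_conf' /etc/nova/nova.conf

If images_type is not rbd, or images_rbd_ceph_conf points at the central cluster's conf, the "Creating image" step has to fetch the image over the network instead of cloning it locally, which would fit the long spawning times described here.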
>>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Does your environment use different network >>>>>>>>>>> interfaces for each of the networks? Or does it have a bond with everything >>>>>>>>>>> on it? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>>> while spawning the instance. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some >>>>>>>>>>> time ago in this thread: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>>> >>>>> >>>> If there isn't any additional information in the >>>>>>>>>>> debug logs I probably would start "tearing down" rabbitmq. I didn't have to >>>>>>>>>>> do that in a production system yet so be careful. I can think of two routes: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit >>>>>>>>>>> is running, this will most likely impact client IO depending on your load. >>>>>>>>>>> Check out the rabbitmqctl commands. >>>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. 
>>>>>>>>>>> rebuild. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>>> a better advice. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>>> but not due to packet >>>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>>> checked when >>>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>>>> stuck at spawning >>>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>>> sure if packet loss >>>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>>> identical between >>>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>>>>> the tunnel? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' >>>>>>>>>>> or 'cc' as i am not >>>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>>> >>>>> >>>> / >>>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>>>>> priority >>>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>>>>> down when i am >>>>>>>>>>> >>>>> >>>> trying >>>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>>> spawning state and >>>>>>>>>>> >>>>> >>>> then >>>>>>>>>>> >>>>> >>>> > gets stuck. 
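A few quick checks for the MTU and packet-loss theory, run on an edge compute node while an instance is spawning; the interface name and the remote address are placeholders:

$ ip link show <bond-or-tunnel-interface> | grep mtu
$ ping -c 5 -M do -s 1472 <central-internal-api-address>    # 1472 payload + 28 bytes of headers = 1500
$ ip -s link show <bond-or-tunnel-interface>    # repeat during the spawn and compare the dropped/error counters

If the ping fails with "message too long" the path over the tunnel carries less than a 1500 byte MTU and the overlay or tenant MTU needs lowering; steadily climbing drop counters during the spawn would point back at the bond configuration discussed above.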
>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>>> edge sites. >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>>>> directly, i am >>>>>>>>>>> >>>>> >>>> checking >>>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>>>> reply. >>>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>>> occurred. >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>>>> activities in the >>>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>>> site.* >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>>>> the details: >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>>>>> but the issue is >>>>>>>>>>> >>>>> >>>> still >>>>>>>>>>> >>>>> >>>> >>> present. >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>>> cluster_status >>>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at 
overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>>> clustering, purpose: >>>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>>>> amqp, purpose: AMQP >>>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>>> purpose: HTTP API >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova >>>>>>>>>>> api log. 
>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING >>>>>>>>>>> nova.cache_utils >>>>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Cache enabled >>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>>>> where i am trying to >>>>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>>>> down (openstack >>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>>>> restart the nova >>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. 
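For the MessageUndeliverable errors above it can help to look from the RabbitMQ side and see whether the reply queue named in the log exists and has a consumer. A sketch, using one of the reply queue ids from the conductor log; the container name is an assumption based on the pacemaker bundle and may differ:

$ sudo podman ps --format '{{.Names}}' | grep rabbitmq
$ sudo podman exec <rabbitmq-container> rabbitmqctl list_queues name messages consumers | grep reply_349bcb075f8c49329435a0f884b33066

If the queue is gone or shows no consumer while nova-compute on the edge node still believes it is connected, the reply path died because the caller's AMQP connection over the tunnel dropped, which fits the pattern described earlier in this thread.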
>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>>>> nova.compute.manager >>>>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - >>>>>>>>>>> - -] Running >>>>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>>>> >>>>> >>>> to >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO >>>>>>>>>>> nova.compute.claims >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>>>> successful on node >>>>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>>>> supplied device >>>>>>>>>>> >>>>> >>>> name: >>>>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied >>>>>>>>>>> dev names >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>>>> nova.virt.block_device >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>>>> with volume >>>>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING >>>>>>>>>>> nova.cache_utils >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Cache enabled >>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO >>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Running >>>>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>>>> '--config-file', >>>>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>>>> '--privsep_context', >>>>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>>>> '--privsep_sock_path', >>>>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO >>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Spawned new >>>>>>>>>>> >>>>> >>>> privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Process >>>>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while >>>>>>>>>>> running command. >>>>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>>>> image >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
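When a node drops out like this it is usually revealing to watch the service state and the AMQP connection from the edge node at the exact moment the instance enters spawning; roughly, with the log path being the usual TripleO location:

$ watch -n 5 "openstack compute service list --service nova-compute"
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -iE 'amqp|rabbit|heartbeat'

If the service flips to down at the same moment the compute log reports missed heartbeats or a broken connection to RabbitMQ, the spawn is only the trigger and the real problem is the control-plane link between the edge site and the controllers.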
>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 23 16:01:16 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 21:31:16 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi, Can someone please help me identify the issue here? Latest cinder-volume logs from dcn02: (ATTACHED) The volume is stuck in creating state. With regards, Swogat Pradhan On Thu, Mar 23, 2023 at 6:12?PM Swogat Pradhan wrote: > Hi Jhon, > Thank you for clarifying that. > Right now the cinder volume is stuck in *creating *state when adding > image as volume source. > But when creating an empty volume the volumes are getting created > successfully without any errors. > > We are getting volume creation request in cinder-volume.log as such: > 2023-03-23 12:34:40.152 108 INFO cinder.volume.flows.manager.create_volume > [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, > 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': > datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > > But there is nothing else after that and the volume doesn't even timeout, > it just gets stuck in creating state. > Can you advise what might be the issue here? > All the containers are in a healthy state now. > > With regards, > Swogat Pradhan > > > On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: > >> >> >> On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan >> wrote: >> >>> Hi, >>> Is this bind not required for cinder_scheduler container? >>> >>> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >>> I do not see this particular bind on the cinder scheduler containers on >>> my controller nodes. >>> >> >> That is correct, because the scheduler does not access the ceph cluster. >> >> Alan >> >> >>> With regards, >>> Swogat Pradhan >>> >>> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Cinder volume config: >>>> >>>> [tripleo_ceph] >>>> volume_backend_name=tripleo_ceph >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>> rbd_user=openstack >>>> rbd_pool=volumes >>>> rbd_flatten_volume_from_snapshot=False >>>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>>> report_discard_supported=True >>>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>>> rbd_cluster_name=dcn02 >>>> >>>> Glance api config: >>>> >>>> [dcn02] >>>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=dcn02 rbd glance store >>>> [ceph] >>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=Default glance store backend. >>>> >>>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> I still have the same issue, I'm not sure what's left to try. >>>>> All the pods are now in a healthy state, I am getting log entries 3 >>>>> mins after I hit the create volume button in cinder-volume when I try to >>>>> create a volume with an image. >>>>> And the volumes are just stuck in creating state for more than 20 mins >>>>> now. >>>>> >>>>> Cinder logs: >>>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>>> cinder-volume RPC version 3.17 as minimum service version. 
>>>>> 2023-03-22 20:34:59.166 108 INFO >>>>> cinder.volume.flows.manager.create_volume >>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>>> specification: {'status': 'creating', 'volume_name': >>>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>> } >>>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> Hi Adam, >>>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>>> was getting pulled from the central site which was caused due to an >>>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>>> directory, which seems to have been resolved after the changes i made to >>>>>>> fix it. >>>>>>> >>>>>>> Right now the glance api podman is running in unhealthy state and >>>>>>> the podman logs don't show any error whatsoever and when issued the command >>>>>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>>>>> site, which is why cinder is throwing an error stating: >>>>>>> >>>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>>> finding address for >>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>> Unable to establish connection to >>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>>> NewConnectionError('>>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>>> ECONNREFUSED',)) >>>>>>> >>>>>>> Now i need to find out why the port is not listed as the glance >>>>>>> service is running, which i am not sure how to find out. >>>>>>> >>>>>> >>>>>> One other thing to investigate is whether your deployment includes >>>>>> this patch [1]. If it does, then bear in mind >>>>>> the glance-api service running at the edge site will be an "internal" >>>>>> (non public facing) instance that uses port 9293 >>>>>> instead of 9292. You should familiarize yourself with the release >>>>>> note [2]. >>>>>> >>>>>> [1] >>>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>>> [2] >>>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>>> >>>>>> Alan >>>>>> >>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Update: >>>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>>> >>>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> [{'url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': 
>>>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>>> } >>>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>>> >>>>>>>> >>>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>>> >>>>>>>> John Fulton previously stated your cinder-volume service at the >>>>>>>> edge site is not using the local ceph image store. Assuming you are >>>>>>>> deploying GlanceApiEdge service [1], then the cinder-volume service should >>>>>>>> be configured to use the local glance service [2]. You should check >>>>>>>> cinder's glance_api_servers to confirm it's the edge site's glance service. >>>>>>>> >>>>>>>> [1] >>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>>> [2] >>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>>> >>>>>>>> Alan >>>>>>>> >>>>>>>> >>>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>> category=FutureWarning) >>>>>>>>> >>>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>>>>> category=FutureWarning) >>>>>>>>> >>>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>>> MB/s >>>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>>> >>>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Swogat Pradhan >>>>>>>>> >>>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Jhon, >>>>>>>>>> This seems to be an issue. >>>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the >>>>>>>>>> --cluster parameter was specified to the respective cluster names but the >>>>>>>>>> config files were created in the name of ceph.conf and keyring was >>>>>>>>>> ceph.client.openstack.keyring. >>>>>>>>>> >>>>>>>>>> Which created issues in glance as well as the naming convention >>>>>>>>>> of the files didn't match the cluster names, so i had to manually rename >>>>>>>>>> the central ceph conf file as such: >>>>>>>>>> >>>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>>> total 16 >>>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>>> ceph.client.openstack.keyring >>>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>>> >>>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>>>>> for accessing central ceph cluster. >>>>>>>>>> >>>>>>>>>> glance multistore config: >>>>>>>>>> [dcn02] >>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>> rbd_store_user=openstack >>>>>>>>>> rbd_store_pool=images >>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>>> >>>>>>>>>> [ceph_central] >>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>>> rbd_store_user=openstack >>>>>>>>>> rbd_store_pool=images >>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>> store_description=Default glance store backend. 
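A quick way to confirm which cluster each conf file points at (paths taken from the directory listing above) is to compare their fsid values; ceph.conf should carry the dcn02 FSID and ceph_central.conf the central one:

$ sudo grep fsid /var/lib/tripleo-config/ceph/ceph.conf /var/lib/tripleo-config/ceph/ceph_central.conf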
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> With regards, >>>>>>>>>> Swogat Pradhan >>>>>>>>>> >>>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>>> wrote: >>>>>>>>>>> > >>>>>>>>>>> > Hi, >>>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>>> >>>>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>>>> >>>>>>>>>>> I hope this is not a production system since the mailing list >>>>>>>>>>> now has >>>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>>> >>>>>>>>>>> The section that looks like this: >>>>>>>>>>> >>>>>>>>>>> [tripleo_ceph] >>>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>> rbd_user=openstack >>>>>>>>>>> rbd_pool=volumes >>>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>>> rbd_secret_uuid= >>>>>>>>>>> report_discard_supported=True >>>>>>>>>>> >>>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and not >>>>>>>>>>> the >>>>>>>>>>> central one. Use the ceph conf file for that cluster and ensure >>>>>>>>>>> the >>>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>>> >>>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID >>>>>>>>>>> of the >>>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so >>>>>>>>>>> that >>>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>>> This >>>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>>> secret-get-value $FSID`. >>>>>>>>>>> >>>>>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>>>>> sites >>>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>>> following >>>>>>>>>>> it. 
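As a concrete check on a DCN node (container names below are the TripleO defaults, adjust if yours differ), the conf file named by rbd_ceph_conf, the rbd_secret_uuid in cinder.conf and the secrets libvirt holds should all agree on the edge cluster's FSID:

$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_secret_uuid' /etc/cinder/cinder.conf
$ sudo podman exec cinder_volume grep fsid /etc/ceph/ceph.conf
$ sudo podman exec nova_virtsecretd virsh secret-list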
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>>> >>>>>>>>>>> John >>>>>>>>>>> >>>>>>>>>>> > >>>>>>>>>>> > Ceph Output: >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>>> 2 excl >>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>>> 2 >>>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>>> 2 >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>>> > >>>>>>>>>>> > Attached the cinder config. >>>>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>>>> > >>>>>>>>>>> > With regards, >>>>>>>>>>> > Swogat Pradhan >>>>>>>>>>> > >>>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>> >> >>>>>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>>> config. >>>>>>>>>>> >> >>>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>> >>>>>>>>>>> >>> Update: >>>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it >>>>>>>>>>> takes around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>>> >>> >>>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images >>>>>>>>>>> created after importing from the central site. >>>>>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>>>>> time for the volume to get created. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>>> getting created properly without any errors. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>> but getting checksum failed error. 
>>>>>>>>>>> >>>> >>>>>>>>>>> >>>> With regards, >>>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>>> >>>>> wrote: >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>>> launch the VM from volume. >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>>>> to the edge glance. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>>> observations in your >>>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>>> excl >>>>>>>>>>> >>>>> $ >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>>>>> which is on >>>>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>>> encountering >>>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Ideally all images should reside in the central Glance and >>>>>>>>>>> be copied >>>>>>>>>>> >>>>> to DCN sites before instances of those images are booted >>>>>>>>>>> on DCN sites. >>>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>>> booted, then the >>>>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>>>> will boot as >>>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>>> has access to >>>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>>> booting of >>>>>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>>>>> advance, >>>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN >>>>>>>>>>> site and >>>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> John >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>>> then update. 
>>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > With regards, >>>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> Update: >>>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is >>>>>>>>>>> showing down. >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>>> (lacp=active). >>>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon starting >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with capabilities >>>>>>>>>>> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>> running command. >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO >>>>>>>>>>> nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 >>>>>>>>>>> b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>>> template mentioned here ?: >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>>>> why the instance is stuck in spawning state. 
>>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Does your environment use different network >>>>>>>>>>> interfaces for each of the networks? Or does it have a bond with everything >>>>>>>>>>> on it? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>>> while spawning the instance. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some >>>>>>>>>>> time ago in this thread: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>>> >>>>> >>>> If there isn't any additional information in the >>>>>>>>>>> debug logs I probably would start "tearing down" rabbitmq. I didn't have to >>>>>>>>>>> do that in a production system yet so be careful. I can think of two routes: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit >>>>>>>>>>> is running, this will most likely impact client IO depending on your load. >>>>>>>>>>> Check out the rabbitmqctl commands. >>>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. 
>>>>>>>>>>> rebuild. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>>> a better advice. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>>> but not due to packet >>>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>>> checked when >>>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>>>> stuck at spawning >>>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>>> sure if packet loss >>>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>>> identical between >>>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>>>>> the tunnel? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' >>>>>>>>>>> or 'cc' as i am not >>>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>>> >>>>> >>>> / >>>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>>>>> priority >>>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>>>>> down when i am >>>>>>>>>>> >>>>> >>>> trying >>>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>>> spawning state and >>>>>>>>>>> >>>>> >>>> then >>>>>>>>>>> >>>>> >>>> > gets stuck. 
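One way to check the MTU and packet-loss questions raised above from the edge compute node while an instance is spawning (the interface name and controller IP below are placeholders) is to watch the error/drop counters on the bond and to confirm a full 1500-byte path through the tunnel, i.e. a 1472-byte payload plus the 28-byte ICMP/IP header:

$ ip -s link show bond1
$ ping -M do -s 1472 -c 20 <central-controller-internal-api-ip>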
>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>>> edge sites. >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>>>> directly, i am >>>>>>>>>>> >>>>> >>>> checking >>>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>>>> reply. >>>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>>> occurred. >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>>>> activities in the >>>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>>> site.* >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>>>> the details: >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>>>>> but the issue is >>>>>>>>>>> >>>>> >>>> still >>>>>>>>>>> >>>>> >>>> >>> present. >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>>> cluster_status >>>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at 
overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>>> clustering, purpose: >>>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>>>> amqp, purpose: AMQP >>>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>>> purpose: HTTP API >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova >>>>>>>>>>> api log. 
>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING >>>>>>>>>>> nova.cache_utils >>>>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Cache enabled >>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>>>> where i am trying to >>>>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>>>> down (openstack >>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>>>> restart the nova >>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. 
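Given the "missing queue (reply_...)" errors above, one thing worth checking on each controller, run the same way as the rabbitmqctl list_policies command earlier (i.e. from inside the rabbitmq container), is whether those reply queues actually exist and have a consumer:

$ rabbitmqctl list_queues name messages consumers | grep reply_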
>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>>>> nova.compute.manager >>>>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - >>>>>>>>>>> - -] Running >>>>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>>>> >>>>> >>>> to >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO >>>>>>>>>>> nova.compute.claims >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>>>> successful on node >>>>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>>>> supplied device >>>>>>>>>>> >>>>> >>>> name: >>>>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied >>>>>>>>>>> dev names >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>>>> nova.virt.block_device >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>>>> with volume >>>>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING >>>>>>>>>>> nova.cache_utils >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Cache enabled >>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO >>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Running >>>>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>>>> '--config-file', >>>>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>>>> '--privsep_context', >>>>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>>>> '--privsep_sock_path', >>>>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO >>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Spawned new >>>>>>>>>>> >>>>> >>>> privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Process >>>>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while >>>>>>>>>>> running command. >>>>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>>>> image >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
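A low-effort way to narrow this down is to watch the compute service state and the compute log at the same time while reproducing the launch (the log path is the TripleO default, adjust if needed):

$ openstack compute service list --service nova-compute
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'error|disconnect|amqp'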
>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- 2023-03-23 12:36:23.852 108 INFO cinder.volume.flows.manager.create_volume [req-e196679a-cf81-447d-9dc9-0b1b397b0849 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume d714560e-aca9-4fac-8d2d-f8be86d58c2e: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-d714560e-aca9-4fac-8d2d-f8be86d58c2e', 'volume_size': 1, 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } 2023-03-23 15:49:45.182 108 INFO cinder.volume.flows.manager.create_volume [req-c7be83db-2ddc-413b-a176-cf36adc9435e b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume 72a23a26-1b04-49fc-86d1-453d9b021b68: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-72a23a26-1b04-49fc-86d1-453d9b021b68', 'volume_size': 50, 'image_id': '2735662a-ad29-497b-b7c8-edb235594769', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/2735662a-ad29-497b-b7c8-edb235594769/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': 
{'store': 'ceph'}}, {'url': 'rbd://cec7cdfd-3667-57f1-afaf-5dfca9b0e975/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'dcn01'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '2735662a-ad29-497b-b7c8-edb235594769', 'created_at': datetime.datetime(2023, 3, 23, 15, 28, 41, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 23, 15, 34, 44, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://cec7cdfd-3667-57f1-afaf-5dfca9b0e975/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'dcn01'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'tags': [], 'file': '/v2/images/2735662a-ad29-497b-b7c8-edb235594769/file', 'stores': 'ceph,dcn01,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } From abishop at redhat.com Thu Mar 23 17:05:30 2023 From: abishop at redhat.com (Alan Bishop) Date: Thu, 23 Mar 2023 10:05:30 -0700 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Thu, Mar 23, 2023 at 9:01?AM Swogat Pradhan wrote: > Hi, > Can someone please help me identify the issue here? > Latest cinder-volume logs from dcn02: > (ATTACHED) > It's really not possible to analyze what's happening with just one or two log entries. Do you have debug logs enabled? One thing I noticed is the glance image's disk_format is qcow2. You should use "raw" images with ceph RBD. Alan > > The volume is stuck in creating state. > > With regards, > Swogat Pradhan > > On Thu, Mar 23, 2023 at 6:12?PM Swogat Pradhan > wrote: > >> Hi Jhon, >> Thank you for clarifying that. >> Right now the cinder volume is stuck in *creating *state when adding >> image as volume source. >> But when creating an empty volume the volumes are getting created >> successfully without any errors. 
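For reference, converting the cirros image to raw before uploading (the file names below are just examples) avoids the qcow2 download/convert step that showed up in the earlier cinder logs:

$ qemu-img convert -f qcow2 -O raw cirros-0.5.2-x86_64-disk.img cirros-0.5.2-x86_64-disk.raw
$ openstack image create --disk-format raw --container-format bare --file cirros-0.5.2-x86_64-disk.raw cirros-raw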
>> >> We are getting volume creation request in cinder-volume.log as such: >> 2023-03-23 12:34:40.152 108 INFO >> cinder.volume.flows.manager.create_volume >> [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >> 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with >> specification: {'status': 'creating', 'volume_name': >> 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, >> 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': >> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >> 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': >> datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), >> 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, >> tzinfo=datetime.timezone.utc), 'locations': [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'metadata': {'store': 'dcn02'}}], 'direct_url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', >> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >> 'owner_specified.openstack.object': 'images/cirros', >> 'owner_specified.openstack.sha256': ''}}, 'image_service': >> } >> >> But there is nothing else after that and the volume doesn't even timeout, >> it just gets stuck in creating state. >> Can you advise what might be the issue here? >> All the containers are in a healthy state now. >> >> With regards, >> Swogat Pradhan >> >> >> On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: >> >>> >>> >>> On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Hi, >>>> Is this bind not required for cinder_scheduler container? >>>> >>>> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >>>> I do not see this particular bind on the cinder scheduler containers on >>>> my controller nodes. >>>> >>> >>> That is correct, because the scheduler does not access the ceph cluster. 
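A quick way to see which cinder containers actually get the ceph bind (container names are the TripleO defaults; in this layout cinder_volume runs on the DCN nodes and cinder_scheduler on the controllers, so run each command where that container lives):

$ sudo podman inspect cinder_volume | grep src-ceph
$ sudo podman inspect cinder_scheduler | grep src-ceph   # expected to match nothing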
>>> >>> Alan >>> >>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Cinder volume config: >>>>> >>>>> [tripleo_ceph] >>>>> volume_backend_name=tripleo_ceph >>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>> rbd_user=openstack >>>>> rbd_pool=volumes >>>>> rbd_flatten_volume_from_snapshot=False >>>>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>>>> report_discard_supported=True >>>>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>>>> rbd_cluster_name=dcn02 >>>>> >>>>> Glance api config: >>>>> >>>>> [dcn02] >>>>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>>>> rbd_store_user=openstack >>>>> rbd_store_pool=images >>>>> rbd_thin_provisioning=False >>>>> store_description=dcn02 rbd glance store >>>>> [ceph] >>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>> rbd_store_user=openstack >>>>> rbd_store_pool=images >>>>> rbd_thin_provisioning=False >>>>> store_description=Default glance store backend. >>>>> >>>>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> I still have the same issue, I'm not sure what's left to try. >>>>>> All the pods are now in a healthy state, I am getting log entries 3 >>>>>> mins after I hit the create volume button in cinder-volume when I try to >>>>>> create a volume with an image. >>>>>> And the volumes are just stuck in creating state for more than 20 >>>>>> mins now. >>>>>> >>>>>> Cinder logs: >>>>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>>>> cinder-volume RPC version 3.17 as minimum service version. >>>>>> 2023-03-22 20:34:59.166 108 INFO >>>>>> cinder.volume.flows.manager.create_volume >>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>>>> specification: {'status': 'creating', 'volume_name': >>>>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> [{'url': >>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>> 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>> } >>>>>> >>>>>> With regards, >>>>>> Swogat Pradhan >>>>>> >>>>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Adam, >>>>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>>>> was getting pulled from the central site which was caused due to an >>>>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>>>> directory, which seems to have been resolved after the changes i made to >>>>>>>> fix it. >>>>>>>> >>>>>>>> Right now the glance api podman is running in unhealthy state and >>>>>>>> the podman logs don't show any error whatsoever and when issued the command >>>>>>>> netstat -nultp i do not see any entry for glance port i.e. 9292 in the dcn >>>>>>>> site, which is why cinder is throwing an error stating: >>>>>>>> >>>>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>>>> finding address for >>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>>> Unable to establish connection to >>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>>>> NewConnectionError('>>>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>>>> ECONNREFUSED',)) >>>>>>>> >>>>>>>> Now i need to find out why the port is not listed as the glance >>>>>>>> service is running, which i am not sure how to find out. >>>>>>>> >>>>>>> >>>>>>> One other thing to investigate is whether your deployment includes >>>>>>> this patch [1]. If it does, then bear in mind >>>>>>> the glance-api service running at the edge site will be an >>>>>>> "internal" (non public facing) instance that uses port 9293 >>>>>>> instead of 9292. You should familiarize yourself with the release >>>>>>> note [2]. 
>>>>>>> >>>>>>> [1] >>>>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>>>> [2] >>>>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>>>> >>>>>>> Alan >>>>>>> >>>>>>> >>>>>>>> With regards, >>>>>>>> Swogat Pradhan >>>>>>>> >>>>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Update: >>>>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>>>> >>>>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> [{'url': >>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>>>> } >>>>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 
4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>>>> >>>>>>>>> >>>>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>>>> >>>>>>>>> John Fulton previously stated your cinder-volume service at the >>>>>>>>> edge site is not using the local ceph image store. Assuming you are >>>>>>>>> deploying GlanceApiEdge service [1], then the cinder-volume service should >>>>>>>>> be configured to use the local glance service [2]. You should check >>>>>>>>> cinder's glance_api_servers to confirm it's the edge site's glance service. >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>>>> [2] >>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>>>> >>>>>>>>> Alan >>>>>>>>> >>>>>>>>> >>>>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>>> category=FutureWarning) >>>>>>>>>> >>>>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>>> category=FutureWarning) >>>>>>>>>> >>>>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>>>> MB/s >>>>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>>>> >>>>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>>>> >>>>>>>>>> With regards, >>>>>>>>>> Swogat Pradhan >>>>>>>>>> >>>>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Jhon, >>>>>>>>>>> This seems to be an issue. >>>>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the >>>>>>>>>>> --cluster parameter was specified to the respective cluster names but the >>>>>>>>>>> config files were created in the name of ceph.conf and keyring was >>>>>>>>>>> ceph.client.openstack.keyring. 
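Related to Alan's point above about glance_api_servers and the internal (port 9293) edge glance, a rough way to check what cinder-volume is pointed at and whether the edge glance is listening — the IP below is the one from the ECONNREFUSED error earlier in the thread and is only an example:

$ sudo podman exec cinder_volume grep -r glance_api_servers /etc/cinder/
$ sudo ss -lntp | grep -E ':(9292|9293)'    # run on the edge node that runs glance-api
$ curl -s -o /dev/null -w '%{http_code}\n' http://172.25.228.253:9292/
$ curl -s -o /dev/null -w '%{http_code}\n' http://172.25.228.253:9293/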
>>>>>>>>>>> >>>>>>>>>>> Which created issues in glance as well as the naming convention >>>>>>>>>>> of the files didn't match the cluster names, so i had to manually rename >>>>>>>>>>> the central ceph conf file as such: >>>>>>>>>>> >>>>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>>>> total 16 >>>>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>>>> ceph.client.openstack.keyring >>>>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>>>> >>>>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are >>>>>>>>>>> the files used to access dcn02 ceph cluster and ceph_central* files are >>>>>>>>>>> used in for accessing central ceph cluster. >>>>>>>>>>> >>>>>>>>>>> glance multistore config: >>>>>>>>>>> [dcn02] >>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>> rbd_store_user=openstack >>>>>>>>>>> rbd_store_pool=images >>>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>>>> >>>>>>>>>>> [ceph_central] >>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>>>> rbd_store_user=openstack >>>>>>>>>>> rbd_store_pool=images >>>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>>> store_description=Default glance store backend. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> With regards, >>>>>>>>>>> Swogat Pradhan >>>>>>>>>>> >>>>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>>>> wrote: >>>>>>>>>>>> > >>>>>>>>>>>> > Hi, >>>>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>>>> >>>>>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>>>>> >>>>>>>>>>>> I hope this is not a production system since the mailing list >>>>>>>>>>>> now has >>>>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>>>> >>>>>>>>>>>> The section that looks like this: >>>>>>>>>>>> >>>>>>>>>>>> [tripleo_ceph] >>>>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>>> rbd_user=openstack >>>>>>>>>>>> rbd_pool=volumes >>>>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>>>> rbd_secret_uuid= >>>>>>>>>>>> report_discard_supported=True >>>>>>>>>>>> >>>>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and >>>>>>>>>>>> not the >>>>>>>>>>>> central one. Use the ceph conf file for that cluster and ensure >>>>>>>>>>>> the >>>>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>>>> >>>>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID >>>>>>>>>>>> of the >>>>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so >>>>>>>>>>>> that >>>>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>>>> This >>>>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>>>> secret-get-value $FSID`. 
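Putting that together, an edge backend pointed at the local dcn02 cluster would end up looking roughly like this (a sketch assembled from the values shared elsewhere in this thread; substitute the FSID of your own dcn02 cluster):

[tripleo_ceph]
volume_backend_name=tripleo_ceph
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf=/etc/ceph/dcn02.conf
rbd_cluster_name=dcn02
rbd_user=openstack
rbd_pool=volumes
rbd_flatten_volume_from_snapshot=False
report_discard_supported=True
# must match the fsid in /etc/ceph/dcn02.conf
rbd_secret_uuid=<dcn02 fsid>

$ grep fsid /etc/ceph/dcn02.conf
$ sudo podman exec nova_virtsecretd virsh secret-get-value <dcn02 fsid>   # libvirt should return the cephx key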
>>>>>>>>>>>> >>>>>>>>>>>> The documentation describes how to configure the central and >>>>>>>>>>>> DCN sites >>>>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>>>> following >>>>>>>>>>>> it. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>>>> >>>>>>>>>>>> John >>>>>>>>>>>> >>>>>>>>>>>> > >>>>>>>>>>>> > Ceph Output: >>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>>>> 2 excl >>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > >>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>>>> > >>>>>>>>>>>> > Attached the cinder config. >>>>>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>>>>> > >>>>>>>>>>>> > With regards, >>>>>>>>>>>> > Swogat Pradhan >>>>>>>>>>>> > >>>>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>> >> >>>>>>>>>>>> >> in my last message under the line "On a DCN site if you run >>>>>>>>>>>> a command like this:" I suggested some steps you could try to confirm the >>>>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>>>> config. >>>>>>>>>>>> >> >>>>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>> >>>>>>>>>>>> >>> Update: >>>>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it >>>>>>>>>>>> takes around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>>>> >>> >>>>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images >>>>>>>>>>>> created after importing from the central site. >>>>>>>>>>>> >>>> But launching an instance normally fails as it takes a >>>>>>>>>>>> long time for the volume to get created. >>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>>>> getting created properly without any errors. 
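Note that in the `rbd -p volumes ls -l` output above the PARENT column is empty, which suggests those volumes were filled by streaming/conversion rather than cloned from the local image. The same can be checked per volume from the same cephadm shell used for the listings above (the admin keyring path here is an assumption and may differ in your setup):

$ sudo cephadm shell --config /etc/ceph/dcn02.conf --keyring /etc/ceph/dcn02.client.admin.keyring
$ rbd -p volumes info volume-c644086f-d3cf-406d-b0f1-7691bde5981d   # a COW clone shows a "parent: images/<id>@snap" line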
>>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>>> but getting checksum failed error. >>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> With regards, >>>>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>>>> >>>>> wrote: >>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>>>> launch the VM from volume. >>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>>>>> to the edge glance. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>>>> observations in your >>>>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> >>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf >>>>>>>>>>>> --keyring >>>>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>>>> excl >>>>>>>>>>>> >>>>> $ >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> Then, you should see the parent of the volume is the >>>>>>>>>>>> image which is on >>>>>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>>>> encountering >>>>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> Ideally all images should reside in the central Glance >>>>>>>>>>>> and be copied >>>>>>>>>>>> >>>>> to DCN sites before instances of those images are booted >>>>>>>>>>>> on DCN sites. >>>>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>>>> booted, then the >>>>>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>>>>> will boot as >>>>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>>>> has access to >>>>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>>>> booting of >>>>>>>>>>>> >>>>> the image will take time because it has not been copied >>>>>>>>>>>> in advance, >>>>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN >>>>>>>>>>>> site and >>>>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> John >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>>>> then update. 
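For the last check John mentions (exec into the cinder container and confirm it uses the local cluster), something along these lines works on the edge node running cinder-volume (container name and paths per the tripleo defaults used in this thread):

$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_cluster_name|rbd_secret_uuid' /etc/cinder/cinder.conf
$ sudo podman exec cinder_volume grep fsid /etc/ceph/dcn02.conf   # should match rbd_secret_uuid above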
>>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > With regards, >>>>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>>>> >> >>>>>>>>>>>> >>>>> >> Update: >>>>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is >>>>>>>>>>>> showing down. >>>>>>>>>>>> >>>>> >> >>>>>>>>>>>> >>>>> >> >>>>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>>>> (lacp=active). >>>>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon starting >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with capabilities >>>>>>>>>>>> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>>> running command. >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO >>>>>>>>>>>> nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 >>>>>>>>>>>> b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>>>> template mentioned here ?: >>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>>>>> why the instance is stuck in spawning state. 
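While an instance is stuck in spawning, these are worth watching side by side on the edge compute node (log path assumes the usual tripleo layout; the instance UUID is a placeholder):

$ openstack compute service list --service nova-compute
$ openstack server show <instance uuid> -c status -c "OS-EXT-STS:task_state"
$ sudo tail -f /var/log/containers/nova/nova-compute.log

If the compute service flips to "down" while the server stays in spawning, that lines up with the RabbitMQ connectivity angle discussed elsewhere in this thread.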
>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Does your environment use different network >>>>>>>>>>>> interfaces for each of the networks? Or does it have a bond with everything >>>>>>>>>>>> on it? >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>>>> while spawning the instance. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block < >>>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some >>>>>>>>>>>> time ago in this thread: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>>>> >>>>> >>>> If there isn't any additional information in the >>>>>>>>>>>> debug logs I probably would start "tearing down" rabbitmq. I didn't have to >>>>>>>>>>>> do that in a production system yet so be careful. I can think of two routes: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit >>>>>>>>>>>> is running, this will most likely impact client IO depending on your load. >>>>>>>>>>>> Check out the rabbitmqctl commands. 
>>>>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. >>>>>>>>>>>> rebuild. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>>>> a better advice. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan >>>>>>>>>>> >: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>>>> but not due to packet >>>>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>>>> checked when >>>>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>>>>> stuck at spawning >>>>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>>>> sure if packet loss >>>>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>>>> identical between >>>>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss >>>>>>>>>>>> through the tunnel? >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan >>>>>>>>>>> >: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' >>>>>>>>>>>> or 'cc' as i am not >>>>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>>>> >>>>> >>>> / >>>>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... 
>>>>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to >>>>>>>>>>>> definition priority >>>>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only >>>>>>>>>>>> goes down when i am >>>>>>>>>>>> >>>>> >>>> trying >>>>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>>>> spawning state and >>>>>>>>>>>> >>>>> >>>> then >>>>>>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>>>> edge sites. >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>>>>> directly, i am >>>>>>>>>>>> >>>>> >>>> checking >>>>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>>>>> reply. >>>>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>>>> occurred. >>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>>>>> activities in the >>>>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>>>> site.* >>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>>>>> the details: >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest >>>>>>>>>>>> ]: >>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple >>>>>>>>>>>> times but the issue is >>>>>>>>>>>> >>>>> >>>> still >>>>>>>>>>>> >>>>> >>>> >>> present. 
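If it does come to the rabbitmqctl route Eugen describes, the reply queues named in the nova-conductor errors can be listed from any controller before removing anything (the container name filter is approximate):

$ sudo podman exec $(sudo podman ps -q -f name=rabbitmq) \
      rabbitmqctl list_queues name messages consumers | grep '^reply'

This shows whether the reply_* queues referenced in the conductor errors quoted further down actually exist and have consumers.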
>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>>>> cluster_status >>>>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>>>>>>>>>> RabbitMQ >>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>>>>>>>>>> RabbitMQ >>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>>>>>>>>>> RabbitMQ >>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: 
http, purpose: HTTP >>>>>>>>>>>> API >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>>> API >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>>> API >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>>>> clustering, purpose: >>>>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>>>>> amqp, purpose: AMQP >>>>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>>>> purpose: HTTP API >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>>> Hi, 
>>>>>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova >>>>>>>>>>>> api log. >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>>> - -] The reply >>>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>>>>> after 60 seconds >>>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] The reply >>>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>>>>> after 60 seconds >>>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] The reply >>>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>>>>> after 60 seconds >>>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING >>>>>>>>>>>> nova.cache_utils >>>>>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Cache enabled >>>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] The reply >>>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>>>>> after 60 seconds >>>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>>>>> where i am trying to >>>>>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>>>>> down (openstack >>>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>>>>> restart the nova >>>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. 
>>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>>>>> nova.compute.manager >>>>>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - >>>>>>>>>>>> - - -] Running >>>>>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>>>>> >>>>> >>>> to >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO >>>>>>>>>>>> nova.compute.claims >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: >>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>>>>> successful on node >>>>>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: >>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>>>>> supplied device >>>>>>>>>>>> >>>>> >>>> name: >>>>>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied >>>>>>>>>>>> dev names >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>>>>> nova.virt.block_device >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: >>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>>>>> with volume >>>>>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at >>>>>>>>>>>> /dev/vda >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING >>>>>>>>>>>> nova.cache_utils >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Cache enabled >>>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO >>>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Running >>>>>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>>>>> '--config-file', >>>>>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>>>>> '--privsep_context', >>>>>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>>>>> '--privsep_sock_path', >>>>>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO >>>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Spawned new >>>>>>>>>>>> >>>>> >>>> privsep >>>>>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>>> >>>>> >>>> >>>>> process running with capabilities >>>>>>>>>>>> (eff/prm/inh): >>>>>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Process >>>>>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while >>>>>>>>>>>> running command. >>>>>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: >>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>>>>> image >>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
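Two quick checks that tie into the MTU/packet-loss questions in this thread, run on the edge compute node while reproducing (the VIP address is a placeholder for the central internal_api endpoint; log path assumes the usual tripleo layout):

$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -iE 'amqp|heartbeat|errno'
$ ping -M do -s 1472 <central internal_api VIP>   # 1472 + 28 bytes of headers = 1500, so this fails if the tunnel MTU is lower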
> With regards,
>
> Swogat Pradhan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From adetoyeanointing15 at gmail.com  Fri Mar 24 00:06:29 2023
From: adetoyeanointing15 at gmail.com (Anointing Adetoye)
Date: Fri, 24 Mar 2023 01:06:29 +0100
Subject: OUTREACHY INITIAL APPLICANT
Message-ID: 

Good day,

I have been unable to make relevant contributions because I don't know how to start contributing or how to converse on the channel. Also, I am a Golang developer who is just picking up C, and I would really like to use this opportunity to build my knowledge in C.

I would appreciate being pointed to how to make a contribution, so that I can use the few weeks remaining to make relevant contributions to the project.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lsofia.enriquez at gmail.com  Fri Mar 24 10:42:53 2023
From: lsofia.enriquez at gmail.com (Sofia Enriquez)
Date: Fri, 24 Mar 2023 10:42:53 +0000
Subject: OUTREACHY INITIAL APPLICANT
In-Reply-To: 
References: 
Message-ID: 

Hi Anointing,

I hope this email finds you well. There are projects for Cinder, Manila, and Glance. Could you please let me know which project you are interested in?

I wanted to address the concern you mentioned about being unable to make relevant contributions. I understand that you are struggling with knowing how to start contributing or chatting on the channel. I recommend reviewing the "how to contribute" section on the Outreachy portal, where you will find a detailed explanation that may be helpful.

Additionally, I appreciate your interest in using this opportunity to build your knowledge in C, but I wanted to clarify that if you are referring to the Extending Automated Validation of API-ref project, it explicitly requires Python development. However, if you are interested in other projects, please feel free to contact the mentor as soon as possible so they can provide further assistance.

Thank you for your time and consideration. I look forward to hearing back from you soon.

Best regards,
Sofia

On Fri, 24 Mar 2023 at 00:06, Anointing Adetoye (adetoyeanointing15 at gmail.com) wrote:
> Good day,
> I have been unable to make relevant contributions because I don't know how
> to start contributing or how to converse on the channel. Also, I am a Golang
> developer who is just picking up C, and I would really like to use this
> opportunity to build my knowledge in C.
>
> I would appreciate being pointed to how to make a contribution, so that I can
> use the few weeks remaining to make relevant contributions to the project.

-- 
Sofia Enriquez
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From hiromu.asahina.az at hco.ntt.co.jp Fri Mar 24 15:38:21 2023 From: hiromu.asahina.az at hco.ntt.co.jp (Hiromu Asahina) Date: Sat, 25 Mar 2023 00:38:21 +0900 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Message-ID: As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic members, please kindly reply convenient timeslots. [1] https://ptg.opendev.org/ptg.html Thanks, Hiromu Asahina On 2023/03/22 20:01, Hiromu Asahina wrote: > Thanks! > > I look forward to your reply. > > On 2023/03/22 1:29, Julia Kreger wrote: >> No worries! >> >> I think that time works for me. I'm not sure it will work for >> everyone, but >> I can proxy information back to the whole of the ironic project as we >> also >> have the question of this functionality listed for our Operator Hour in >> order to help ironic gauge interest. >> >> -Julia >> >> On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp> wrote: >> >>> I apologize that I couldn't reply before the Ironic meeting on Monday. >>> >>> I need one slot to discuss this topic. >>> >>> I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, >>> 27)[1,2] works for them. Does this work for Ironic? I understand not all >>> Ironic members will join this discussion, so I hope we can arrange a >>> convenient date for you two at least and, hopefully, for those >>> interested in this topic. >>> >>> [1] >>> >>> https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z >>> [2] https://ptg.opendev.org/ptg.html >>> >>> Thanks, >>> Hiromu Asahina >>> >>> On 2023/03/17 23:29, Julia Kreger wrote: >>>> I'm not sure how many Ironic contributors would be the ones to attend a >>>> discussion, in part because this is disjointed from the items they need >>> to >>>> focus on. It is much more of a "big picture" item for those of us >>>> who are >>>> leaders in the project. >>>> >>>> I think it would help to understand how much time you expect the >>> discussion >>>> to take to determine a path forward and how we can collaborate. Ironic >>> has >>>> a huge number of topics we want to discuss during the PTG, and I >>>> suspect >>>> our team meeting on Monday next week should yield more >>>> interest/awareness >>>> as well as an amount of time for each topic which will aid us in >>> scheduling. >>>> >>>> If you can let us know how long, then I think we can figure out when >>>> the >>>> best day/time will be. >>>> >>>> Thanks! >>>> >>>> -Julia >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < >>>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>>> >>>>> Thank you for your reply. >>>>> >>>>> I'd like to decide the time slot for this topic. >>>>> I just checked PTG schedule [1]. >>>>> >>>>> We have the following time slots. Which one is convenient to gether? 
>>>>> (I didn't get reply but I listed Barbican, as its cores are almost the >>>>> same as Keystone) >>>>> >>>>> Mon, 27: >>>>> >>>>> - 14 (keystone) >>>>> - 15 (keystone) >>>>> >>>>> Tue, 28 >>>>> >>>>> - 13 (barbican) >>>>> - 14 (keystone, ironic) >>>>> - 15 (keysonte, ironic) >>>>> - 16 (ironic) >>>>> >>>>> Wed, 29 >>>>> >>>>> - 13 (ironic) >>>>> - 14 (keystone, ironic) >>>>> - 15 (keystone, ironic) >>>>> - 21 (ironic) >>>>> >>>>> Thanks, >>>>> >>>>> [1] https://ptg.opendev.org/ptg.html >>>>> >>>>> Hiromu Asahina >>>>> >>>>> >>>>> On 2023/02/11 1:41, Jay Faulkner wrote: >>>>>> I think it's safe to say the Ironic community would be very >>>>>> invested in >>>>>> such an effort. Let's make sure the time chosen for vPTG with this is >>>>> such >>>>>> that Ironic contributors can attend as well. >>>>>> >>>>>> Thanks, >>>>>> Jay Faulkner >>>>>> Ironic PTL >>>>>> >>>>>> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < >>>>>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>>>>> >>>>>>> Hello Everyone, >>>>>>> >>>>>>> Recently, Tacker and Keystone have been working together on a new >>>>> Keystone >>>>>>> Middleware that can work with external authentication >>>>>>> services, such as Keycloak. The code has already been submitted [1], >>> but >>>>>>> we want to make this middleware a generic plugin that works >>>>>>> with as many OpenStack services as possible. To that end, we would >>> like >>>>> to >>>>>>> hear from other projects with similar use cases >>>>>>> (especially Ironic and Barbican, which run as standalone >>>>>>> services). We >>>>>>> will make a time slot to discuss this topic at the next vPTG. >>>>>>> Please contact me if you are interested and available to >>>>>>> participate. >>>>>>> >>>>>>> [1] >>> https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >>>>>>> >>>>>>> -- >>>>>>> Hiromu Asahina >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> ?-------------------------------------? >>>>> ????? NTT Network Innovation Center >>>>> ??????? Hiromu Asahina >>>>> ???? ------------------------------------- >>>>> ????? 3-9-11, Midori-cho, Musashino-shi >>>>> ??????? Tokyo 180-8585, Japan >>>>> Phone: +81-422-59-7008 >>>>> Email: hiromu.asahina.az at hco.ntt.co.jp >>>>> ?-------------------------------------? >>>>> >>>>> >>>> >>> >>> -- >>> ?-------------------------------------? >>> ???? NTT Network Innovation Center >>> ?????? Hiromu Asahina >>> ??? ------------------------------------- >>> ???? 3-9-11, Midori-cho, Musashino-shi >>> ?????? Tokyo 180-8585, Japan >>> Phone: +81-422-59-7008 >>> Email: hiromu.asahina.az at hco.ntt.co.jp >>> ?-------------------------------------? >>> >>> >> > -- ?-------------------------------------? NTT Network Innovation Center Hiromu Asahina ------------------------------------- 3-9-11, Midori-cho, Musashino-shi Tokyo 180-8585, Japan ? Phone: +81-422-59-7008 ? Email: hiromu.asahina.az at hco.ntt.co.jp ?-------------------------------------? 
From james.slagle at gmail.com Fri Mar 24 16:48:14 2023 From: james.slagle at gmail.com (James Slagle) Date: Fri, 24 Mar 2023 12:48:14 -0400 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> Message-ID: On Wed, Mar 22, 2023 at 1:09?PM Ghanshyam Mann wrote: > > Hi James, TripleO team, > > Is there anyone volunteering to be PTL for train and wallaby maintenance? Please note we need PTL > as it is deprecated (wallaby is maintained), and we have tripleo in leaderless projects > - https://etherpad.opendev.org/p/2023.2-leaderless It doesn't look like we have any other volunteers, so I'm willing to do it. At the last PTG, we discussed and it was agreed that we would switch TripleO to the distributed project leadership model. However, given the drastic change in our focus, I personally think it makes more sense to continue with the PTL model for train/wallaby stable maintenance. I would ask any project members to reply here with +1/-1 to indicate agreement. [1] https://governance.openstack.org/tc/resolutions/20200803-distributed-project-leadership.html -- -- James Slagle -- From dwilde at redhat.com Fri Mar 24 16:54:50 2023 From: dwilde at redhat.com (Dave Wilde) Date: Fri, 24 Mar 2023 11:54:50 -0500 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Message-ID: I?m happy to book an additional time slot(s) specifically for this discussion if something other than what we currently have works better for everyone. Please let me know. /Dave On Mar 24, 2023 at 10:49 AM -0500, Hiromu Asahina , wrote: > As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this > discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic > members, please kindly reply convenient timeslots. > > [1] https://ptg.opendev.org/ptg.html > > Thanks, > > Hiromu Asahina > > On 2023/03/22 20:01, Hiromu Asahina wrote: > > Thanks! > > > > I look forward to your reply. > > > > On 2023/03/22 1:29, Julia Kreger wrote: > > > No worries! > > > > > > I think that time works for me. I'm not sure it will work for > > > everyone, but > > > I can proxy information back to the whole of the ironic project as we > > > also > > > have the question of this functionality listed for our Operator Hour in > > > order to help ironic gauge interest. > > > > > > -Julia > > > > > > On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > I apologize that I couldn't reply before the Ironic meeting on Monday. > > > > > > > > I need one slot to discuss this topic. > > > > > > > > I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, > > > > 27)[1,2] works for them. Does this work for Ironic? 
I understand not all > > > > Ironic members will join this discussion, so I hope we can arrange a > > > > convenient date for you two at least and, hopefully, for those > > > > interested in this topic. > > > > > > > > [1] > > > > > > > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z > > > > [2] https://ptg.opendev.org/ptg.html > > > > > > > > Thanks, > > > > Hiromu Asahina > > > > > > > > On 2023/03/17 23:29, Julia Kreger wrote: > > > > > I'm not sure how many Ironic contributors would be the ones to attend a > > > > > discussion, in part because this is disjointed from the items they need > > > > to > > > > > focus on. It is much more of a "big picture" item for those of us > > > > > who are > > > > > leaders in the project. > > > > > > > > > > I think it would help to understand how much time you expect the > > > > discussion > > > > > to take to determine a path forward and how we can collaborate. Ironic > > > > has > > > > > a huge number of topics we want to discuss during the PTG, and I > > > > > suspect > > > > > our team meeting on Monday next week should yield more > > > > > interest/awareness > > > > > as well as an amount of time for each topic which will aid us in > > > > scheduling. > > > > > > > > > > If you can let us know how long, then I think we can figure out when > > > > > the > > > > > best day/time will be. > > > > > > > > > > Thanks! > > > > > > > > > > -Julia > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > Thank you for your reply. > > > > > > > > > > > > I'd like to decide the time slot for this topic. > > > > > > I just checked PTG schedule [1]. > > > > > > > > > > > > We have the following time slots. Which one is convenient to gether? > > > > > > (I didn't get reply but I listed Barbican, as its cores are almost the > > > > > > same as Keystone) > > > > > > > > > > > > Mon, 27: > > > > > > > > > > > > - 14 (keystone) > > > > > > - 15 (keystone) > > > > > > > > > > > > Tue, 28 > > > > > > > > > > > > - 13 (barbican) > > > > > > - 14 (keystone, ironic) > > > > > > - 15 (keysonte, ironic) > > > > > > - 16 (ironic) > > > > > > > > > > > > Wed, 29 > > > > > > > > > > > > - 13 (ironic) > > > > > > - 14 (keystone, ironic) > > > > > > - 15 (keystone, ironic) > > > > > > - 21 (ironic) > > > > > > > > > > > > Thanks, > > > > > > > > > > > > [1] https://ptg.opendev.org/ptg.html > > > > > > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > > > > > On 2023/02/11 1:41, Jay Faulkner wrote: > > > > > > > I think it's safe to say the Ironic community would be very > > > > > > > invested in > > > > > > > such an effort. Let's make sure the time chosen for vPTG with this is > > > > > > such > > > > > > > that Ironic contributors can attend as well. > > > > > > > > > > > > > > Thanks, > > > > > > > Jay Faulkner > > > > > > > Ironic PTL > > > > > > > > > > > > > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > > > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > > > > > Hello Everyone, > > > > > > > > > > > > > > > > Recently, Tacker and Keystone have been working together on a new > > > > > > Keystone > > > > > > > > Middleware that can work with external authentication > > > > > > > > services, such as Keycloak. 
The code has already been submitted [1], > > > > but > > > > > > > > we want to make this middleware a generic plugin that works > > > > > > > > with as many OpenStack services as possible. To that end, we would > > > > like > > > > > > to > > > > > > > > hear from other projects with similar use cases > > > > > > > > (especially Ironic and Barbican, which run as standalone > > > > > > > > services). We > > > > > > > > will make a time slot to discuss this topic at the next vPTG. > > > > > > > > Please contact me if you are interested and available to > > > > > > > > participate. > > > > > > > > > > > > > > > > [1] > > > > https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > > > > > > > > > > > > > > > > -- > > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > ?-------------------------------------? > > > > > > ????? NTT Network Innovation Center > > > > > > ??????? Hiromu Asahina > > > > > > ???? ------------------------------------- > > > > > > ????? 3-9-11, Midori-cho, Musashino-shi > > > > > > ??????? Tokyo 180-8585, Japan > > > > > > Phone: +81-422-59-7008 > > > > > > Email: hiromu.asahina.az at hco.ntt.co.jp > > > > > > ?-------------------------------------? > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > ?-------------------------------------? > > > > ???? NTT Network Innovation Center > > > > ?????? Hiromu Asahina > > > > ??? ------------------------------------- > > > > ???? 3-9-11, Midori-cho, Musashino-shi > > > > ?????? Tokyo 180-8585, Japan > > > > Phone: +81-422-59-7008 > > > > Email: hiromu.asahina.az at hco.ntt.co.jp > > > > ?-------------------------------------? > > > > > > > > > > > > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > ? Phone: +81-422-59-7008 > ? Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Mar 24 17:00:17 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 24 Mar 2023 10:00:17 -0700 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> Message-ID: <18714905bb1.d1bc0fd014378.3061357046190419249@ghanshyammann.com> ---- On Fri, 24 Mar 2023 09:48:14 -0700 James Slagle wrote --- > On Wed, Mar 22, 2023 at 1:09?PM Ghanshyam Mann gmann at ghanshyammann.com> wrote: > > > > Hi James, TripleO team, > > > > Is there anyone volunteering to be PTL for train and wallaby maintenance? Please note we need PTL > > as it is deprecated (wallaby is maintained), and we have tripleo in leaderless projects > > - https://etherpad.opendev.org/p/2023.2-leaderless > > It doesn't look like we have any other volunteers, so I'm willing to > do it. At the last PTG, we discussed and it was agreed that we would > switch TripleO to the distributed project leadership model. 
However, > given the drastic change in our focus, I personally think it makes > more sense to continue with the PTL model for train/wallaby stable > maintenance. I would ask any project members to reply here with +1/-1 > to indicate agreement. Thanks, James, for volunteering. I think if you were thinking of the DPL model, then it will work better than PTL here. 1. You might get more people helping you with a distributed amount of work 2. we do not need to have PTL nomination/appointment work in every cycle until you want to maintain train/wallaby. If it is ok, let's move it to the DPL model, which satisfies the governance requirement. -gmann > > [1] https://governance.openstack.org/tc/resolutions/20200803-distributed-project-leadership.html > > -- > -- James Slagle > -- > > From kennelson11 at gmail.com Fri Mar 24 17:40:24 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Fri, 24 Mar 2023 12:40:24 -0500 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: References: Message-ID: Super annoying request, but can we do earlier in the week? The sessions for sdk have 100% overlap with the TC which I was planning on attending :/ And I am very very sorry if I missed sharing an opinion on when would be good to meet. -Kendall On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov wrote: > Hi all, > > A bit late, but still - I have booked a 3 hours slot during PTG on Friday > 14:00-17:00 UTC. This will follow publiccloud room discussion so I think > some people and outcomes will follow directly into our room. > > Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli > > Feel free to feel in topics you want to discuss > > Cheers, > Artem > -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.slagle at gmail.com Fri Mar 24 17:51:10 2023 From: james.slagle at gmail.com (James Slagle) Date: Fri, 24 Mar 2023 13:51:10 -0400 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: <18714905bb1.d1bc0fd014378.3061357046190419249@ghanshyammann.com> References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> <18714905bb1.d1bc0fd014378.3061357046190419249@ghanshyammann.com> Message-ID: On Fri, Mar 24, 2023 at 1:00?PM Ghanshyam Mann wrote: > > ---- On Fri, 24 Mar 2023 09:48:14 -0700 James Slagle wrote --- > > On Wed, Mar 22, 2023 at 1:09?PM Ghanshyam Mann gmann at ghanshyammann.com> wrote: > > > > > > Hi James, TripleO team, > > > > > > Is there anyone volunteering to be PTL for train and wallaby maintenance? Please note we need PTL > > > as it is deprecated (wallaby is maintained), and we have tripleo in leaderless projects > > > - https://etherpad.opendev.org/p/2023.2-leaderless > > > > It doesn't look like we have any other volunteers, so I'm willing to > > do it. At the last PTG, we discussed and it was agreed that we would > > switch TripleO to the distributed project leadership model. However, > > given the drastic change in our focus, I personally think it makes > > more sense to continue with the PTL model for train/wallaby stable > > maintenance. 
I would ask any project members to reply here with +1/-1 > > to indicate agreement. > > Thanks, James, for volunteering. I think if you were thinking of the DPL model, then it will > work better than PTL here. 1. You might get more people helping you with a distributed amount > of work 2. we do not need to have PTL nomination/appointment work in every cycle until you > want to maintain train/wallaby. > > If it is ok, let's move it to the DPL model, which satisfies the governance requirement. That WFM, and I think we can move forward with DPL since that is what the team previously agreed upon. I'll work on some governance patches. -- -- James Slagle -- From artem.goncharov at gmail.com Fri Mar 24 18:12:52 2023 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Fri, 24 Mar 2023 19:12:52 +0100 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: References: Message-ID: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com> Well, there was actually no pool, since I was not even sure anybody is that interested, but glad to hear. What about Wed somewhere from 13:00 to 17:00? There is however overlap with Nova (pretty much like on any other day) Ideas? I just want to avoid overlap with public cloud, but maybe even 1h is enough. So far there are not much topics anyway. > On 24. Mar 2023, at 18:40, Kendall Nelson wrote: > > Super annoying request, but can we do earlier in the week? The sessions for sdk have 100% overlap with the TC which I was planning on attending :/ > > And I am very very sorry if I missed sharing an opinion on when would be good to meet. > > -Kendall > > On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov > wrote: >> Hi all, >> >> A bit late, but still - I have booked a 3 hours slot during PTG on Friday 14:00-17:00 UTC. This will follow publiccloud room discussion so I think some people and outcomes will follow directly into our room. >> >> Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli >> >> Feel free to feel in topics you want to discuss >> >> Cheers, >> Artem -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Mar 24 18:29:28 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 24 Mar 2023 18:29:28 +0000 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: References: <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> <18714905bb1.d1bc0fd014378.3061357046190419249@ghanshyammann.com> Message-ID: <20230324182927.iwr3usifxvxhogen@yuggoth.org> On 2023-03-24 13:51:10 -0400 (-0400), James Slagle wrote: [...] > That WFM, and I think we can move forward with DPL since that is what > the team previously agreed upon. I'll work on some governance patches. If it helps at all, you can probably even list the same person for all the liaison positions, the main difference from PTL being that liaisons don't have terms that expire. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: 

From gmann at ghanshyammann.com  Fri Mar 24 18:37:09 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Fri, 24 Mar 2023 11:37:09 -0700
Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked
In-Reply-To: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com>
References: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com>
Message-ID: <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com>

Just to clarify: the TC slots on Friday are from 15-19 UTC, so the SDK 14-15 UTC slot does not overlap with the TC.

- https://etherpad.opendev.org/p/tc-2023-2-ptg#L18

-gmann

 ---- On Fri, 24 Mar 2023 11:12:52 -0700  Artem Goncharov  wrote ---
 > Well, there was actually no poll, since I was not even sure anybody was that interested, but glad to hear it.
 > What about Wednesday, somewhere from 13:00 to 17:00? There is, however, an overlap with Nova (pretty much like on any other day).
 > Ideas? I just want to avoid overlapping with public cloud, but maybe even 1h is enough. So far there are not many topics anyway.
 >
 > > On 24. Mar 2023, at 18:40, Kendall Nelson kennelson11 at gmail.com> wrote:
 > > Super annoying request, but can we do earlier in the week? The sessions for sdk have 100% overlap with the TC, which I was planning on attending :/
 > >
 > > And I am very, very sorry if I missed sharing an opinion on when would be good to meet.
 > > -Kendall
 > > On Fri, Mar 24, 2023 at 5:37 AM Artem Goncharov artem.goncharov at gmail.com> wrote:
 > > > Hi all,
 > > > A bit late, but still - I have booked a 3-hour slot during the PTG on Friday 14:00-17:00 UTC. This will follow the publiccloud room discussion, so I think some people and outcomes will flow directly into our room.
 > > > Etherpad is here: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli
 > > > Feel free to fill in topics you want to discuss
 > > > Cheers, Artem
 >

From smooney at redhat.com  Fri Mar 24 18:50:56 2023
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 24 Mar 2023 18:50:56 +0000
Subject: [nova][cinder] Providing ephemeral storage to instances - Cinder or Nova
In-Reply-To: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de>
References: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de>
Message-ID: 

I responded inline, but just a warning: this is a use case we have heard about before.
There is no simple option, I'm afraid, and there are many sharp edges
and several little-known features/limitations that your question puts you right in the middle of.

On Fri, 2023-03-24 at 16:28 +0100, Christian Rohmann wrote:
> Hello OpenStack-discuss,
>
> I am currently looking into how one can provide fast ephemeral storage
> (backed by local NVME drives) to instances.
>
> There seem to be two approaches and I would love to double-check my
> thoughts and assumptions.
>
> 1) *Via Nova* instance storage and the configurable "ephemeral" volume
> for a flavor
>
> a) We currently use Ceph RBD as image_type
> (https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type),
> so instance images are stored in Ceph, not locally on disk. I believe
> this setting will also cause ephemeral volumes (destination_local) to be
> placed on a RBD and not /var/lib/nova/instances?

It should be in Ceph, yes. We do not support having the root/swap/ephemeral
disks use different storage locations.

> Or is there a setting to set a different backend for local block devices
> providing "ephemeral" storage? So RBD for the root disk and a local LVM
> VG for ephemeral?

No, that would be a new feature, and not a trivial one, as you would have to make
sure it works for live migration and cold migration.

> b) Will an ephemeral volume also be migrated when the instance is
> shutoff as with live-migration?

It should be. It is not included in snapshots, so it is not preserved
when shelving. That means cross-cell cold migration will not preserve the disk.

For a normal cold migration it should be scp'd or rsynced along with the root disk
if you are using the raw/qcow/flat images type, if I remember correctly.
With RBD or other shared storage like NFS it really should be preserved.

One other thing to note: Ironic, and only Ironic, supports the
preserve_ephemeral option in the rebuild API.

libvirt will wipe the ephemeral disk if you rebuild or evacuate.

> Or will there be a new volume created on the target host? I am asking
> because I want to avoid syncing 500G or 1T when it's only "ephemeral"
> and the instance will not expect any data on it on the next boot.

I would personally consider it a bug if it was not transferred.
That does not mean that could not change in the future.
This is a very virt-driver-specific behaviour, by the way, and not one that is particularly well documented.
The ephemeral disk should mostly exist for the lifetime of an instance, not the lifetime of a VM.

For example, it should not get recreated via a simple reboot or live migration,
and it should not get recreated for cold migration or resize,
but it will get wiped for shelve_offload, cross-cell resize and evacuate.

> c) Is the size of the ephemeral storage for flavors a fixed size or just
> the upper bound for users? So if I limit this to 1T, will such a flavor
> always provision a block device with this size?

flavor.ephemeral_gb is an upper bound, and end users can divide that between multiple ephemeral disks
on the same instance. So if it is 100G, you can ask for two 50G ephemeral disks.

You specify the topology of the ephemeral disks using the block_device_mapping_v2 parameter on the server
create. This has been automated in recent versions of the openstack client,

so you can do

openstack server create --ephemeral size=50,format=ext4 --ephemeral size=50,format=vfat ...

https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-ephemeral

This is limited by
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.max_local_block_devices

> I suppose using LVM this will be thin provisioned anyways?

To use the LVM backend with libvirt you set
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_volume_group
to identify which LVM VG to use.

https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.sparse_logical_volumes might enable thin provisioning, or it might
work without it, but see the note:

"""
Warning

This option is deprecated for removal since 18.0.0. Its value may be silently ignored in the future.

Reason

Sparse logical volumes is a feature that is not tested hence not supported. LVM logical volumes are preallocated by default. If you want thin
provisioning, use Cinder thin-provisioned volumes.
"""

The Nova LVM support has been in maintenance mode for many years.

I am not opposed to improving it, just calling out that it has bugs and no one has really
worked on addressing them in 4 or 5 years, which is sad because it outperforms raw for local
storage performance, and if thin provisioning still works it should outperform qcow too for a similar use case.

You are well into undefined-behaviour land at this point:
we do not test it, so we assume until told otherwise that it is broken.

> 2) *Via Cinder*, running cinder-volume on each compute node to provide a
> volume type "ephemeral", using e.g. the LVM driver
>
> a) While not really "ephemeral" and bound to the instance lifecycle,
> this would allow users to provision ephemeral volumes just as they need them.
> I suppose I could use backend-specific quotas
> (https://docs.openstack.org/cinder/latest/cli/cli-cinder-quotas.html#view-block-storage-quotas)
> to limit the number or size of such volumes?
>
> b) Do I need to use the instance locality filter
> (https://docs.openstack.org/cinder/latest/contributor/api/cinder.scheduler.filters.instance_locality_filter.html)
> then?

That is an option, but not an ideal one, since it still means connecting to the volume via iSCSI or NVMe-oF even if it is effectively via localhost,
so you still have the network-layer overhead.

When I last brought up this topic in a different context, the alternative to Cinder and Nova was to add an LVM Cyborg driver
so that it could partition local NVMe devices and expose them to a guest, but I never wrote that and I don't think anyone else has.
If you had a slightly different use case, such as providing an entire NVMe or SATA device to a guest, then Cyborg would be how you would do
that. Nova PCI passthrough is not an option, as it is not multi-tenant safe: it is exclusively for stateless devices, not disks, so we do not
have a way to erase the data when done. Cyborg, with its driver model, can fulfil the multi-tenancy requirement.

We have previously rejected adding this capability into Nova, so I don't expect us to add it any time in the near to medium term.
We are trying to keep Nova device management to stateless devices only. That said, we added Intel PMEM/NVDIMM support to Nova and did handle both
optional data transfer and multi-tenancy, but that was a non-trivial amount of work.
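To make the two local-storage approaches discussed above a bit more concrete, here is a minimal, untested sketch of the knobs involved. All names (volume groups, backend names, volume types, instance UUIDs) are illustrative assumptions, and the Cinder part assumes a client new enough to pass scheduler hints:

    # Approach 1: Nova libvirt LVM image backend (nova.conf on the compute node),
    # assuming a pre-created volume group "nova-local". Note that this moves the
    # root/swap/ephemeral disks to LVM as a whole; mixing RBD root + LVM ephemeral
    # is not supported, as noted above.
    [libvirt]
    images_type = lvm
    images_volume_group = nova-local

    # Approach 2: cinder-volume on each compute node with the LVM driver
    # (cinder.conf on that node), assuming a local VG "cinder-local":
    [DEFAULT]
    enabled_backends = lvm-local
    scheduler_default_filters = AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter

    [lvm-local]
    volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
    volume_group = cinder-local
    volume_backend_name = lvm-local

    # Map a volume type to that backend and ask the scheduler to place the
    # volume on the same host as an existing instance:
    openstack volume type create lvm-local
    openstack volume type set --property volume_backend_name=lvm-local lvm-local
    openstack volume create --size 500 --type lvm-local \
        --hint local_to_instance=<instance-uuid> scratch-vol

Even with the locality hint, the data path in approach 2 still goes through iSCSI/NVMe-oF on localhost, which is exactly the overhead mentioned above.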
its an invasive change but might be more natural then teh resouce tabel approch. you coudl reuse more fo the code and inherit much fo the exiting fucntionality btu makeing sure you dont break anything in the process woudl take a lot of testing. > Thanks and with regards > > > Christian > > From rdhasman at redhat.com Fri Mar 24 18:57:41 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Sat, 25 Mar 2023 00:27:41 +0530 Subject: [cinder][PTG] Cinder 2023.2 (Bobcat) Virtual PTG Message-ID: Hello Argonauts, We will be conducting cinder virtual PTG for 2023.2 (Bobcat) cycle from 28th March to 31st March, 2023. I've prepared the PTG etherpad[1] with topics day wise. I haven't kept the schedule time bound since some discussions take less and some take more time and in both cases reaching a conclusion is important. There are some events that need to be done on their respective time are as follows: *1) Operator Hour: *We encourage operators to join and tell us about the pain points of cinder so we can improve upon it. Date: Wednesday, 29 March, 2023 Time: 1400-1500 UTC Link: https://bluejeans.com/556681290 *2) Glance Cross Project* Date: Thursday, 30 March, 2023 Time: 1430-1500 UTC Link: https://bluejeans.com/556681290 *3) Nova Cross Project* Date: Thursday, 30 March, 2023 Time: 1600-1700 UTC Link: https://zoom.us/j/96494117185?pwd=NGhya0NpeWppMEc1OUNKdlFPbDNYdz09 (Diablo room) The general information about the PTG is as follows: Date: 28th March to 31st March, 2023 Time: 1300-1700 UTC everyday Link to PTG: https://bluejeans.com/556681290 You can also follow the schedule at https://ptg.opendev.org/ptg.html Note that we have allocated 4 hours each day but it also depends on the number and duration of topics as to how long the PTG will run. If you want to be reminded about a particular topic, please add your IRC nick in the *Courtesy ping list *mentioned after every topic. If you still have topics, please add them to the Planning etherpad[2] and I will see if we can accommodate that into our current schedule. Lastly, we will be cancelling our cinder upstream meeting (Wednesday, 29 March, 1400-1500 UTC) since it overlaps with the PTG. If you still have topics please bring it to the PTG. See you all at the PTG! [1] https://etherpad.opendev.org/p/bobcat-ptg-cinder [2] https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning Thanks Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From ianyrchoi at gmail.com Fri Mar 24 19:49:05 2023 From: ianyrchoi at gmail.com (Ian Y. Choi) Date: Sat, 25 Mar 2023 04:49:05 +0900 Subject: [i18n][PTG] I18n SIG 2023.2 (Bobcat) PTG Planning Message-ID: Hi, PTG is coming next week. I have blocked the following schedule - anyone can check via https://ptg.opendev.org/ptg.html and please let me know if more discussion is needed during PTG. Those are topics what I am thinking currently but it is open - feel free to suggest any topic to be discussed: - Weblate migration - Translation artifacts release management - Review of i18n statistics & ATC/AC - Discussion on translation target - Overall OpenStack/OpenInfra I18n process/progress review - Check-in on each language team Discussions will be shared via Etherpad: https://etherpad.opendev.org/p/march2023-ptg-i18n . Lastly, please don't forget to register if you have forgotten: https://openinfra-ptg.eventbrite.com :) Looking forward to a productive PTG with I18n. 
Thank you, /Ian From kozhukalov at gmail.com Fri Mar 24 20:26:34 2023 From: kozhukalov at gmail.com (Vladimir Kozhukalov) Date: Fri, 24 Mar 2023 23:26:34 +0300 Subject: [openstack-helm] PTG March 27-31 2023 Message-ID: Dear openstack-helmers, As you know PTG is going to happen next week and I booked slots at the end of the week on Thursday 03/30 and on Friday 03/31 from 14:00 UTC till 17:00 UTC [1] I believe this should be enough. If you feel other time slots are gonna work better for any reason there are still some free slots that can be booked. Please also pay some attention to the etherpad where I listed some of the points for our discussions. [2] Please feel free to add other points that you think worth it to be discussed. [1] https://ptg.opendev.org/ptg.html [2] https://etherpad.opendev.org/p/march2023-ptg-openstack-helm -- Best regards, Kozhukalov Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Fri Mar 24 21:03:02 2023 From: amy at demarco.com (Amy Marrich) Date: Fri, 24 Mar 2023 16:03:02 -0500 Subject: [Diversity] Diversity and Inclusion at the PTG Message-ID: I have blocked off three hours on Monday for the D&I WG to discuss the upcoming Diversity Survey(14:00 UTC) and then ongoing changes to the Code of Conduct(15:00- 16:00 UTC) and then the Summit if there is time. The agenda can be found here[0]. All projects are encouraged to attend these sessions as the WG is at the Foundation level. Thanks, Amy (spotz) 0 - https://etherpad.opendev.org/p/march2023-ptg-diversity From kennelson11 at gmail.com Sat Mar 25 00:39:51 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Fri, 24 Mar 2023 19:39:51 -0500 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com> References: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com> <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com> Message-ID: Heh, okay well not complete overlap, but there is still a 3 hour overlap as sdk things are currently scheduled go from 14 - 17 UTC. Either way, I would rather not try to squeeze it down on Friday, when we can just move it to Wednesday. -Kendall On Fri, Mar 24, 2023 at 1:37?PM Ghanshyam Mann wrote: > Just to clarify the TC slots on Friday, is from 15 - 19 UTC and sdk 14-15 > UTC slot does not overlap with TC. > > - https://etherpad.opendev.org/p/tc-2023-2-ptg#L18 > > -gmann > > ---- On Fri, 24 Mar 2023 11:12:52 -0700 Artem Goncharov wrote --- > > Well, there was actually no pool, since I was not even sure anybody is > that interested, but glad to hear. > > What about Wed somewhere from 13:00 to 17:00? There is however overlap > with Nova (pretty much like on any other day) > > Ideas? I just want to avoid overlap with public cloud, but maybe even > 1h is enough. So far there are not much topics anyway. > > > > > > On 24. Mar 2023, at 18:40, Kendall Nelson kennelson11 at gmail.com> wrote: > > Super annoying request, but can we do earlier in the week? The sessions > for sdk have 100% overlap with the TC which I was planning on attending :/ > > > > And I am very very sorry if I missed sharing an opinion on when would > be good to meet. > > -Kendall > > On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov > artem.goncharov at gmail.com> wrote: > > Hi all, > > A bit late, but still - I have booked a 3 hours slot during PTG on > Friday 14:00-17:00 UTC. 
This will follow publiccloud room discussion so I > think some people and outcomes will follow directly into our room. > > Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli > > Feel free to feel in topics you want to discuss > > Cheers,Artem > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 26 11:11:17 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 26 Mar 2023 18:11:17 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem Message-ID: Hello guys. I playing with Nova AZ and Masakari https://docs.openstack.org/nova/latest/admin/availability-zones.html Masakari will move server by nova scheduler. Openstack Docs describe that: If the server was not created in a specific zone then it is free to be moved to other zones, i.e. the AvailabilityZoneFilter is a no-op. I see that everyone usually creates instances with "Any Availability Zone" on Horzion and also we don't specify AZ when creating instances by cli. By this way, when we use Masakari or we miragrated instances( or evacuate) so our instance will be moved to other zones. Can we attach AZ to server create requests API based on Any Availability Zone to limit instances moved to other zones? Thank you. Regards Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Sun Mar 26 12:24:16 2023 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Sun, 26 Mar 2023 09:24:16 -0300 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: Hello Nguy?n H?u Kh?i, You might want to take a look at: https://review.opendev.org/c/openstack/nova/+/864760. We created a patch to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an AZ that has cross zone attache equals to false. On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i wrote: > Hello guys. > I playing with Nova AZ and Masakari > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > Masakari will move server by nova scheduler. > > Openstack Docs describe that: > > If the server was not created in a specific zone then it is free to be > moved to other zones, i.e. the AvailabilityZoneFilter > is > a no-op. > > I see that everyone usually creates instances with "Any Availability Zone" > on Horzion and also we don't specify AZ when creating instances by cli. > > By this way, when we use Masakari or we miragrated instances( or evacuate) > so our instance will be moved to other zones. > > Can we attach AZ to server create requests API based on Any > Availability Zone to limit instances moved to other zones? > > Thank you. Regards > > Nguyen Huu Khoi > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 26 12:52:00 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 26 Mar 2023 19:52:00 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: Hello. Many thanks for your information. It's very helpful for me. :) Nguyen Huu Khoi On Sun, Mar 26, 2023 at 7:24?PM Rafael Weing?rtner < rafaelweingartner at gmail.com> wrote: > Hello Nguy?n H?u Kh?i, > You might want to take a look at: > https://review.opendev.org/c/openstack/nova/+/864760. 
We created a patch > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an > AZ that has cross zone attache equals to false. > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i > wrote: > >> Hello guys. >> I playing with Nova AZ and Masakari >> >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> >> Masakari will move server by nova scheduler. >> >> Openstack Docs describe that: >> >> If the server was not created in a specific zone then it is free to be >> moved to other zones, i.e. the AvailabilityZoneFilter >> is >> a no-op. >> >> I see that everyone usually creates instances with "Any Availability >> Zone" on Horzion and also we don't specify AZ when creating instances by >> cli. >> >> By this way, when we use Masakari or we miragrated instances( or >> evacuate) so our instance will be moved to other zones. >> >> Can we attach AZ to server create requests API based on Any >> Availability Zone to limit instances moved to other zones? >> >> Thank you. Regards >> >> Nguyen Huu Khoi >> > > > -- > Rafael Weing?rtner > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 26 13:04:58 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 26 Mar 2023 20:04:58 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: I don't know why this is not merged to github. It is a problem with the system. Nguyen Huu Khoi On Sun, Mar 26, 2023 at 7:52?PM Nguy?n H?u Kh?i wrote: > Hello. > Many thanks for your information. It's very helpful for me. :) > Nguyen Huu Khoi > > > On Sun, Mar 26, 2023 at 7:24?PM Rafael Weing?rtner < > rafaelweingartner at gmail.com> wrote: > >> Hello Nguy?n H?u Kh?i, >> You might want to take a look at: >> https://review.opendev.org/c/openstack/nova/+/864760. We created a patch >> to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an >> AZ that has cross zone attache equals to false. >> >> On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >> nguyenhuukhoinw at gmail.com> wrote: >> >>> Hello guys. >>> I playing with Nova AZ and Masakari >>> >>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>> >>> Masakari will move server by nova scheduler. >>> >>> Openstack Docs describe that: >>> >>> If the server was not created in a specific zone then it is free to be >>> moved to other zones, i.e. the AvailabilityZoneFilter >>> is >>> a no-op. >>> >>> I see that everyone usually creates instances with "Any Availability >>> Zone" on Horzion and also we don't specify AZ when creating instances by >>> cli. >>> >>> By this way, when we use Masakari or we miragrated instances( or >>> evacuate) so our instance will be moved to other zones. >>> >>> Can we attach AZ to server create requests API based on Any >>> Availability Zone to limit instances moved to other zones? >>> >>> Thank you. Regards >>> >>> Nguyen Huu Khoi >>> >> >> >> -- >> Rafael Weing?rtner >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Sun Mar 26 13:50:08 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 26 Mar 2023 13:50:08 +0000 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: <20230326135007.4yttykkpeidqcijl@yuggoth.org> On 2023-03-26 20:04:58 +0700 (+0700), Nguy?n H?u Kh?i wrote: > I don't know why this is not merged to github. [...] 
The change is only a few months old, and Nova (like many teams) receives more patches than they have time to review. It's probably worth trying to get the attention of some reviewers in the #openstack-nova IRC channel if this mailing list thread hasn't already. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From nguyenhuukhoinw at gmail.com Sun Mar 26 14:08:48 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 26 Mar 2023 21:08:48 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: <20230326135007.4yttykkpeidqcijl@yuggoth.org> References: <20230326135007.4yttykkpeidqcijl@yuggoth.org> Message-ID: Ok. I got it. Thank you very much On Sun, Mar 26, 2023, 8:56 PM Jeremy Stanley wrote: > On 2023-03-26 20:04:58 +0700 (+0700), Nguy?n H?u Kh?i wrote: > > I don't know why this is not merged to github. > [...] > > The change is only a few months old, and Nova (like many teams) > receives more patches than they have time to review. It's probably > worth trying to get the attention of some reviewers in the > #openstack-nova IRC channel if this mailing list thread hasn't > already. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hanguangyu2 at gmail.com Sun Mar 26 18:50:22 2023 From: hanguangyu2 at gmail.com (=?UTF-8?B?6Z+p5YWJ5a6H?=) Date: Mon, 27 Mar 2023 02:50:22 +0800 Subject: [nova] Can OpenStack support snapshot rollback (not creating a new instance)? Message-ID: Hello, I use Ceph as the storage backend for Nova, Glance, and Cinder. If I create a snapshot for a instance, It create a new image in glance. And I can use the image to create a new instance. This feels to me more like creating an image based on the current state of the VM rather than creating a VM snapshot. I want to ask: 1?Can I create and revert a VM snapshot like I would in virtual machine software? 2?When a VM uses multiple disks/volumes, does OpenStack support taking a snapshot of all disks/volumes of the VM as a whole? 3?Can OpenStack snapshot and save the memory state of a VM? If it is not currently supported, are there any simple customization implementation ideas that can be recommended? Thank you for any help and suggestions. Best wishes. Han From nguyenhuukhoinw at gmail.com Sun Mar 26 23:46:27 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 27 Mar 2023 06:46:27 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <20230326135007.4yttykkpeidqcijl@yuggoth.org> Message-ID: Hello. I want to update you in this case, I think we need adjust for cross zone attache = true also. Nguyen Huu Khoi On Sun, Mar 26, 2023 at 9:08?PM Nguy?n H?u Kh?i wrote: > Ok. I got it. Thank you very much > > On Sun, Mar 26, 2023, 8:56 PM Jeremy Stanley wrote: > >> On 2023-03-26 20:04:58 +0700 (+0700), Nguy?n H?u Kh?i wrote: >> > I don't know why this is not merged to github. >> [...] >> >> The change is only a few months old, and Nova (like many teams) >> receives more patches than they have time to review. It's probably >> worth trying to get the attention of some reviewers in the >> #openstack-nova IRC channel if this mailing list thread hasn't >> already. >> -- >> Jeremy Stanley >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sbauza at redhat.com Mon Mar 27 08:19:20 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 27 Mar 2023 10:19:20 +0200 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < rafaelweingartner at gmail.com> a ?crit : > Hello Nguy?n H?u Kh?i, > You might want to take a look at: > https://review.opendev.org/c/openstack/nova/+/864760. We created a patch > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an > AZ that has cross zone attache equals to false. > > Well, I'll provide some comments in the change, but I'm afraid we can't just modify the request spec like you would want. Anyway, if you want to discuss about it in the vPTG, just add it in the etherpad and add your IRC nick so we could try to find a time where we could be discussing it : https://etherpad.opendev.org/p/nova-bobcat-ptg Also, this kind of behaviour modification is more a new feature than a bugfix, so fwiw you should create a launchpad blueprint so we could better see it. -Sylvain > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i > wrote: > >> Hello guys. >> I playing with Nova AZ and Masakari >> >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> >> Masakari will move server by nova scheduler. >> >> Openstack Docs describe that: >> >> If the server was not created in a specific zone then it is free to be >> moved to other zones, i.e. the AvailabilityZoneFilter >> is >> a no-op. >> >> I see that everyone usually creates instances with "Any Availability >> Zone" on Horzion and also we don't specify AZ when creating instances by >> cli. >> >> By this way, when we use Masakari or we miragrated instances( or >> evacuate) so our instance will be moved to other zones. >> >> Can we attach AZ to server create requests API based on Any >> Availability Zone to limit instances moved to other zones? >> >> Thank you. Regards >> >> Nguyen Huu Khoi >> > > > -- > Rafael Weing?rtner > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.rohmann at inovex.de Mon Mar 27 08:47:58 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Mon, 27 Mar 2023 10:47:58 +0200 Subject: [nova][cinder] Providing ephemeral storage to instances - Cinder or Nova In-Reply-To: References: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de> Message-ID: <30eb0918-b5d2-6ae8-bf61-0b509d8c4e33@inovex.de> Thanks for your extensive reply Sean! I also replied inline and would love to continue the conversation with you and other with this use case to find the best / most suitable approach. On 24/03/2023 19:50, Sean Mooney wrote: > i responed in line but just a waring this is a usecase we ahve heard before. > there is no simple option im afraid and there are many many sharp edges > and severl littel know features/limitatiosn that your question puts you right in the > middel of. > > On Fri, 2023-03-24 at 16:28 +0100, Christian Rohmann wrote: >> 1) *Via Nova* instance storage and the configurable "ephemeral" volume >> for a flavor >> >> a) We currently use Ceph RBD als image_type >> (https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type), >> so instance images are stored in Ceph, not locally on disk. I believe >> this setting will also cause ephemeral volumes (destination_local) to be >> placed on a RBD and not /var/lib/nova/instances? 
> it should be in ceph, yes. we do not support having the root/swap/ephemeral
> disks use different storage locations
>> Or is there a setting to set a different backend for local block devices
>> providing "ephemeral" storage? So RBD for the root disk and a local LVM
>> VG for ephemeral?
> no, that would be a new feature, and not a trivial one, as you would have to
> make sure it works for live migration and cold migration.

While having the root disk on resilient storage, using local storage for
swap / ephemeral actually seems quite obvious.
Do you happen to know if there ever was a spec / push to implement this?

>> b) Will an ephemeral volume also be migrated when the instance is
>> shut off, as with live migration?
> it should be. it is not included in snapshots, so it is not preserved
> when shelving. that means cross-cell cold migration will not preserve the disk.
>
> but for a normal cold migration it should be scp'd or rsynced with the root disk
> if you are using the raw/qcow/flat images type, if i remember correctly.
> with RBD or other shared storage like NFS it really should be preserved.
>
> one other thing to note is that ironic, and only ironic, supports the
> preserve_ephemeral option in the rebuild API.
>
> libvirt will wipe the ephemeral disk if you rebuild or evacuate.

Could I somehow configure a flavor to "require" a rebuild / evacuate, or
to disable live migration for it?

>> Or will there be a new volume created on the target host? I am asking
>> because I want to avoid syncing 500G or 1T when it's only "ephemeral"
>> and the instance will not expect any data on it on the next boot.
> i would personally consider it a bug if it was not transferred.
> that does not mean that could not change in the future.
> this is a very virt-driver-specific behaviour, by the way, and not one that is
> particularly well documented.
> the ephemeral disk should mostly exist for the lifetime of an instance, not the
> lifetime of a vm.
>
> for example, it should not get recreated via a simple reboot or live migration,
> and it should not get recreated for cold migration or resize,
> but it will get wiped for shelve_offload, cross-cell resize and evacuate.

So even for cold migration it would be preserved then? So my only option
would be to shelve such instances when trying to "move" instances off a
certain hypervisor while NOT syncing ephemeral storage?

>> c) Is the size of the ephemeral storage for flavors a fixed size or just
>> the upper bound for users? So if I limit this to 1T, will such a flavor
>> always provision a block device with this size?
> flavor.ephemeral_gb is an upper bound and end users can divide that between
> multiple ephemeral disks on the same instance. so if it's 100G you can ask
> for 2 50G ephemeral disks.
>
> you specify the topology of the ephemeral disks using the
> block_device_mapping_v2 parameter on the server create.
> this has been automated in recent versions of the openstack client
>
> so you can do
>
> openstack server create --ephemeral size=50,format=ext4 --ephemeral size=50,format=vfat ...
>
> https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-ephemeral
> this is limited by
> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.max_local_block_devices
>
>> I suppose using LVM this will be thin provisioned anyways?
> to use the lvm backend with libvirt you set
> https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_volume_group
> to identify which lvm VG to use.
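As a concrete illustration of the backend just described, a nova-compute
configured for the LVM image backend would carry roughly the following
options (the volume group name here is only a placeholder, not a value
taken from this thread):

    [libvirt]
    # store root/swap/ephemeral disks as LVM logical volumes
    # instead of files under /var/lib/nova/instances
    images_type = lvm
    # pre-existing volume group the logical volumes are allocated from
    images_volume_group = nova-local-vg

Both options are the ones referenced above; as noted further down in the
thread, this backend sees little testing, so treat the snippet as a sketch
rather than a recommendation.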
> > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.sparse_logical_volumes might enable thin provsion or it might > work without it but see the note > > """ > Warning > > This option is deprecated for removal since 18.0.0. Its value may be silently ignored in the future. > > Reason > > Sparse logical volumes is a feature that is not tested hence not supported. LVM logical volumes are preallocated by default. If you want thin > provisioning, use Cinder thin-provisioned volumes. > """ > > the nova lvm supprot has been in maintance mode for many years. > > im not opposed to improving it just calling out that it has bugs and noone has really > worked on adressing them in 4 or 5 years which is sad becasue it out performnce raw for local > storage perfroamce and if thin provisioning still work it shoudl outperform qcow too for a simialr usecase. > > you are well into undefined behavior land however at this point > > we do not test it so we assume untile told otherwise that its broken. Thanks for the heads up. I looked at LVM for cinder and there LVM volumes are thin provisioned, so I figured this might be the case for Nova as well. >> 2) *Via Cinder*, running cinder-volume on each compute node to provide a >> volume type "ephemeral", using e.g. the LVM driver >> >> a) While not really "ephemeral" and bound to the instance lifecycle, >> this would allow users to provision ephemeral volume just as they need them. >> I suppose I could use backend specific quotas >> (https://docs.openstack.org/cinder/latest/cli/cli-cinder-quotas.html#view-block-storage-quotas) >> to >> limit the number of size of such volumes? >> >> b) Do I need to use the instance locality filter >> (https://docs.openstack.org/cinder/latest/contributor/api/cinder.scheduler.filters.instance_locality_filter.html) >> then? > That is an option but not ideally since it stilll means conencting to the volume via iscsi or nvmeof even if its effectlvy via localhost > so you still have the the network layer overhead. Thanks for the hint - one can easily be confused when reading "LVM" ... I actually thought there was a way to have "host-only" style volumes which are simply local block devices with no iscsi / NVME in between which are used by Nova then. These kind of volumes could maybe be built into cinder as a "taget_protocol: local" together with the instance_locality_filter - but apparently now the only way is through iSCSI or NVME. > when i alas brought up this topic in a diffent context the alternitive to cinder and nova was to add a lvm cyborg driver > so that it could parttion local nvme devices and expose that to a guest. but i never wrote that and i dotn think anyone else has. > if you had a slightly diffent usecase such as providing an entire nvme or sata device to a guest the cyborge would be how you would do > that. nova pci passhtough is not an option as it is not multi tenant safe. its expclsively for stateless device not disk so we do not > have a way to rease the data when done. cyborg with htere driver modle can fullfile the multi tenancy requirement. > we have previously rejected adding this capabliyt into nova so i dont expect us to add it any tiem in teh near to medium term. This sounds like a "3rd" approach: Using Cyborg to provide local storage (via LVM). >> c)? Since a volume will always be bound to a certain host, I suppose >> this will cause side-effects to instance scheduling? 
>> With the volume remaining after an instance has been destroyed (beating >> the purpose of it being "ephemeral") I suppose any other instance >> attaching this volume will >> be scheduling on this very machine? >> > no nova would have no knowage about the volume locality out of the box >> Is there any way around this? Maybe >> a driver setting to have such volumes "self-destroy" if they are not >> attached anymore? > we hate those kind of config options nova would not know that its bound to the host at the schduler level and > we would nto really want to add orcstration logic like that for "something its oke to delete our tenatns data" > by default today if you cold/live migrated the vm would move but the voluem vould not and you would end up accessing it remotely. > > you woudl have to then do a volume migration sepreately in cinder i think. >> d) Same question as with Nova: What happens when an instance is >> live-migrated? >> > i think i anser this above? Yes, these questions where all due to my misconception that cinder-volume backend "LVM" did not have any networking layer and was host-local. >> >> Maybe others also have this use case and you can share your solution(s)? > adding a cyborg driver for lvm storage and integrateing that with nova would like be the simpelt option > > you coudl extend nova but as i said we have rejected that in the past. > that said the generic resouce table we added for pemem was made generic so that future resocues like local block > device could be tracked there without db changes. > > supproting differnt image_type backend for root,swap and ephmeral would be possibel. > its an invasive change but might be more natural then teh resouce tabel approch. > you coudl reuse more fo the code and inherit much fo the exiting fucntionality btu makeing sure you dont break > anything in the process woudl take a lot of testing. Thanks for the sum up! Regards Christian From artem.goncharov at gmail.com Mon Mar 27 09:18:52 2023 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Mon, 27 Mar 2023 11:18:52 +0200 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: References: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com> <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com> Message-ID: <1C528238-6437-46B7-8F3D-F7A72D82DEC3@gmail.com> Okay, I have not received any other feedback, so I went and booked 2 slots Wed 15:00-17:00 and left also 1h slot on Fri 14:00 just for ?safety?. Looking forward seeing you there. Artem > On 25. Mar 2023, at 01:39, Kendall Nelson wrote: > > Heh, okay well not complete overlap, but there is still a 3 hour overlap as sdk things are currently scheduled go from 14 - 17 UTC. > > Either way, I would rather not try to squeeze it down on Friday, when we can just move it to Wednesday. > > -Kendall > > On Fri, Mar 24, 2023 at 1:37?PM Ghanshyam Mann > wrote: >> Just to clarify the TC slots on Friday, is from 15 - 19 UTC and sdk 14-15 UTC slot does not overlap with TC. >> >> - https://etherpad.opendev.org/p/tc-2023-2-ptg#L18 >> >> -gmann >> >> ---- On Fri, 24 Mar 2023 11:12:52 -0700 Artem Goncharov wrote --- >> > Well, there was actually no pool, since I was not even sure anybody is that interested, but glad to hear. >> > What about Wed somewhere from 13:00 to 17:00? There is however overlap with Nova (pretty much like on any other day) >> > Ideas? I just want to avoid overlap with public cloud, but maybe even 1h is enough. So far there are not much topics anyway. 
>> > >> > >> > On 24. Mar 2023, at 18:40, Kendall Nelson kennelson11 at gmail.com > wrote: >> > Super annoying request, but can we do earlier in the week? The sessions for sdk have 100% overlap with the TC which I was planning on attending :/ >> > >> > And I am very very sorry if I missed sharing an opinion on when would be good to meet. >> > -Kendall >> > On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov artem.goncharov at gmail.com > wrote: >> > Hi all, >> > A bit late, but still - I have booked a 3 hours slot during PTG on Friday 14:00-17:00 UTC. This will follow publiccloud room discussion so I think some people and outcomes will follow directly into our room. >> > Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli >> > Feel free to feel in topics you want to discuss >> > Cheers,Artem >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adivya1.singh at gmail.com Mon Mar 27 11:36:30 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Mon, 27 Mar 2023 17:06:30 +0530 Subject: (Open Stack )Image Upload in Open Stack in a Bulk Message-ID: Hi Team, Any hints, if i want to upload images in a bulk in a Open Stack , because it takes some time for the image to copy if we go one by one, or even of we go with script Also if there is a scenario where glance mount point fails and we can create the same Share path and Copy the Image from the source , Will the OpenStack glance Service will start detecting those images upload in a share Regards Adivya Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Mar 27 11:51:11 2023 From: smooney at redhat.com (Sean Mooney) Date: Mon, 27 Mar 2023 12:51:11 +0100 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < > rafaelweingartner at gmail.com> a ?crit : > > > Hello Nguy?n H?u Kh?i, > > You might want to take a look at: > > https://review.opendev.org/c/openstack/nova/+/864760. We created a patch > > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an > > AZ that has cross zone attache equals to false. > > > > > Well, I'll provide some comments in the change, but I'm afraid we can't > just modify the request spec like you would want. > > Anyway, if you want to discuss about it in the vPTG, just add it in the > etherpad and add your IRC nick so we could try to find a time where we > could be discussing it : https://etherpad.opendev.org/p/nova-bobcat-ptg > Also, this kind of behaviour modification is more a new feature than a > bugfix, so fwiw you should create a launchpad blueprint so we could better > see it. i tought i left review feedback on that too that the approch was not correct. i guess i did not in the end. modifying the request spec as sylvain menthioned is not correct. i disucssed this topic on irc a few weeks back with mohomad for vxhost. what can be done is as follows. we can add a current_az field to the Destination object https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 The conductor can read the instance.AZ and populate it in that new field. We can then add a new weigher to prefer hosts that are in the same az. 
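For illustration, a minimal sketch of such a weigher could look like the
following. It is untested, and "preferred_az" stands in for the new field
proposed in this thread (it does not exist on the Destination object
today); the rest uses the standard scheduler weigher plumbing:

    # sketch only: prefer hosts in the AZ the instance already lives in,
    # but only when the user did not explicitly request an AZ
    from nova.scheduler import weights


    class PreferCurrentAZWeigher(weights.BaseHostWeigher):

        def _weigh_object(self, host_state, request_spec):
            # an explicitly requested AZ is already enforced elsewhere
            if ('availability_zone' in request_spec
                    and request_spec.availability_zone):
                return 0.0
            dest = (request_spec.requested_destination
                    if 'requested_destination' in request_spec else None)
            # "preferred_az" is the proposed (hypothetical) field
            preferred = getattr(dest, 'preferred_az', None) if dest else None
            if not preferred:
                return 0.0
            # a host's AZ is exposed through its aggregate metadata
            host_azs = {agg.metadata.get('availability_zone')
                        for agg in host_state.aggregates}
            return 1.0 if preferred in host_azs else 0.0

Whether this stays a weigher (soft preference) or becomes a placement
prefilter (hard constraint) is the operator choice discussed next.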
This will provide soft AZ affinity for the vm and preserve the fact that if a vm is created without sepcifying An AZ the expectaiton at the api level woudl be that it can migrate to any AZ. To provide hard AZ affintiy we could also add prefileter that would use the same data but instead include it in the placement query so that only the current AZ is considered. This would have to be disabled by default. That woudl allow operators to choose the desired behavior. curret behavior (disable weigher and dont enabel prefilter) new default, prefer current AZ (weigher enabeld prefilter disabled) hard affintiy(prefilter enabled.) there are other ways to approch this but updating the request spec is not one of them. we have to maintain the fact the enduser did not request an AZ. > > -Sylvain > > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i > > wrote: > > > > > Hello guys. > > > I playing with Nova AZ and Masakari > > > > > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > > > > Masakari will move server by nova scheduler. > > > > > > Openstack Docs describe that: > > > > > > If the server was not created in a specific zone then it is free to be > > > moved to other zones, i.e. the AvailabilityZoneFilter > > > is > > > a no-op. > > > > > > I see that everyone usually creates instances with "Any Availability > > > Zone" on Horzion and also we don't specify AZ when creating instances by > > > cli. > > > > > > By this way, when we use Masakari or we miragrated instances( or > > > evacuate) so our instance will be moved to other zones. > > > > > > Can we attach AZ to server create requests API based on Any > > > Availability Zone to limit instances moved to other zones? > > > > > > Thank you. Regards > > > > > > Nguyen Huu Khoi > > > > > > > > > -- > > Rafael Weing?rtner > > From sbauza at redhat.com Mon Mar 27 12:06:56 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 27 Mar 2023 14:06:56 +0200 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Le lun. 27 mars 2023 ? 13:51, Sean Mooney a ?crit : > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < > > rafaelweingartner at gmail.com> a ?crit : > > > > > Hello Nguy?n H?u Kh?i, > > > You might want to take a look at: > > > https://review.opendev.org/c/openstack/nova/+/864760. We created a > patch > > > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in > an > > > AZ that has cross zone attache equals to false. > > > > > > > > Well, I'll provide some comments in the change, but I'm afraid we can't > > just modify the request spec like you would want. > > > > Anyway, if you want to discuss about it in the vPTG, just add it in the > > etherpad and add your IRC nick so we could try to find a time where we > > could be discussing it : https://etherpad.opendev.org/p/nova-bobcat-ptg > > Also, this kind of behaviour modification is more a new feature than a > > bugfix, so fwiw you should create a launchpad blueprint so we could > better > > see it. > > i tought i left review feedback on that too that the approch was not > correct. > i guess i did not in the end. > > modifying the request spec as sylvain menthioned is not correct. > i disucssed this topic on irc a few weeks back with mohomad for vxhost. > what can be done is as follows. 
> > we can add a current_az field to the Destination object > > https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 > The conductor can read the instance.AZ and populate it in that new field. > We can then add a new weigher to prefer hosts that are in the same az. > > I tend to disagree this approach as people would think that the Destination.az field would be related to the current AZ for an instance, while we only look at the original AZ. That being said, we could have a weigher that would look at whether the host is in the same AZ than the instance.host. This will provide soft AZ affinity for the vm and preserve the fact that if > a vm is created without sepcifying > An AZ the expectaiton at the api level woudl be that it can migrate to any > AZ. > > To provide hard AZ affintiy we could also add prefileter that would use > the same data but instead include it in the > placement query so that only the current AZ is considered. This would have > to be disabled by default. > > Sure, we could create a new prefilter so we could then deprecate the AZFilter if we want. > That woudl allow operators to choose the desired behavior. > curret behavior (disable weigher and dont enabel prefilter) > new default, prefer current AZ (weigher enabeld prefilter disabled) > hard affintiy(prefilter enabled.) > > there are other ways to approch this but updating the request spec is not > one of them. > we have to maintain the fact the enduser did not request an AZ. > > Anyway, if folks want to discuss about AZs, this week is the good time :-) > > > > -Sylvain > > > > > > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < > nguyenhuukhoinw at gmail.com> > > > wrote: > > > > > > > Hello guys. > > > > I playing with Nova AZ and Masakari > > > > > > > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > > > > > > Masakari will move server by nova scheduler. > > > > > > > > Openstack Docs describe that: > > > > > > > > If the server was not created in a specific zone then it is free to > be > > > > moved to other zones, i.e. the AvailabilityZoneFilter > > > > < > https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter> > is > > > > a no-op. > > > > > > > > I see that everyone usually creates instances with "Any Availability > > > > Zone" on Horzion and also we don't specify AZ when creating > instances by > > > > cli. > > > > > > > > By this way, when we use Masakari or we miragrated instances( or > > > > evacuate) so our instance will be moved to other zones. > > > > > > > > Can we attach AZ to server create requests API based on Any > > > > Availability Zone to limit instances moved to other zones? > > > > > > > > Thank you. Regards > > > > > > > > Nguyen Huu Khoi > > > > > > > > > > > > > -- > > > Rafael Weing?rtner > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Mar 27 12:20:04 2023 From: smooney at redhat.com (Sean Mooney) Date: Mon, 27 Mar 2023 13:20:04 +0100 Subject: [nova][cinder] Providing ephemeral storage to instances - Cinder or Nova In-Reply-To: <30eb0918-b5d2-6ae8-bf61-0b509d8c4e33@inovex.de> References: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de> <30eb0918-b5d2-6ae8-bf61-0b509d8c4e33@inovex.de> Message-ID: On Mon, 2023-03-27 at 10:47 +0200, Christian Rohmann wrote: > Thanks for your extensive reply Sean! 
> > I also replied inline and would love to continue the conversation with > you and other with this use case > to find the best / most suitable approach. > > > On 24/03/2023 19:50, Sean Mooney wrote: > > i responed in line but just a waring this is a usecase we ahve heard before. > > there is no simple option im afraid and there are many many sharp edges > > and severl littel know features/limitatiosn that your question puts you right in the > > middel of. > > > > On Fri, 2023-03-24 at 16:28 +0100, Christian Rohmann wrote: > > > 1) *Via Nova* instance storage and the configurable "ephemeral" volume > > > for a flavor > > > > > > a) We currently use Ceph RBD als image_type > > > (https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type), > > > so instance images are stored in Ceph, not locally on disk. I believe > > > this setting will also cause ephemeral volumes (destination_local) to be > > > placed on a RBD and not /var/lib/nova/instances? > > it should be in ceph yes we do not support havign the root/swap/ephemral > > disk use diffent storage locatiosn > > > Or is there a setting to set a different backend for local block devices > > > providing "ephemeral" storage? So RBD for the root disk and a local LVM > > > VG for ephemeral? > > no that would be a new feature and not a trivial one as yo uwould have to make > > sure it works for live migration and cold migration. > > While having the root disk on resilient storage, using local storage > swap / ephemeral actually seems quite obvious. > Do you happen to know if there ever was a spec / push to implement this? as far as i am aware no. but if we were to have one i would do it as an api option basically the inverse of https://specs.openstack.org/openstack/nova-specs/specs/xena/approved/allow-migrate-pmem-data.html For PMEM instance we defautl to not copying the possibel multiple TB of PMEM over the network on cold migrate. later we added that option as an api paramter. Swap is not coppied for cold migration today but is for live for obvious reasons. like the ?copy_pmem_devices?: ?true? option i woudl be fine wiht adding ?copy_ephmeral_devices?: ?true|false?. We woudl proably need to default to copying the data but we coudl discuss that in the spec. > > > > > b) Will an ephemeral volume also be migrated when the instance is > > > shutoff as with live-migration? > > its hsoudl be. its not included in snapshots so its not presergved > > when shelving. that means corss cell cold migration will not preserve the disk. > > > > but for a normal cold migration it shoudl be scp'd or rsynced with the root disk > > if you are using the raw/qcow/flat images type if i remember correctly. > > with RBD or other shared storage like nfs it really sould be preserved. > > > > one other thing to note is ironic and only ironic support the > > preserve_ephemeral option in the rebuild api. > > > > libvirt will wipte the ephmeral disk if you rebuild or evacuate. > > Could I somehow configure a flavor to "require" a rebuild / evacuate or? > to disable live migration for it? rebuild is not a move operation so that wont help you move the instance and evacuate is admin only and required you to ensure teh instance is not runnign before its used. disabling live migration is something that you can do via custom policy but its admin only by default as well. > > > > > Or will there be an new volume created on the target host? 
I am asking > > > because I want to avoid syncing 500G or 1T when it's only "ephemeral" > > > and the instance will not expect any data on it on the next boot. > > i would perssonally consider it a bug if it was not transfered. > > that does not mean that could not change in the future. > > this is a very virt driver specific behaivor by the way and nto one that is partically well docuemnted. > > the ephemeral shoudl mostly exist for the lifetime of an instance. not the lifetime of a vm > > > > for exmple it should nto get recreate vai a simple reboot or live migration > > it should not get created for cold migration or rezise. > > but it will get wipted for shelve_offload, cross cell resize and evacuate. > So even for cold migration it would be preserved then? So my only option > would be to shelve such instances when trying to > "move" instances off a certain hypervisor while NOT syncing ephemeral > storage? yes shelve woudl be your only option today. > > > > > c) Is the size of the ephemeral storage for flavors a fixed size or just > > > the upper bound for users? So if I limit this to 1T, will such a flavor > > > always provision a block device with his size? > > flavor.ephemeral_gb is an upper bound and end users can devide that between multipel ephermal disks > > on the same instance. so if its 100G you can ask for 2 50G epmeeral disks > > > > you specify the toplogy of the epmermeral disk using the block_device_mapping_v2 parmater on the server > > create. > > this has been automated in recent version of the openstack client > > > > so you can do > > > > openstack server creeate --ephemeral size=50,format=ext4 --ephemeral size=50,format=vfat ... > > > > https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-ephemeral > > this is limted by > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.max_local_block_devices > > > > > I suppose using LVM this will be thin provisioned anyways? > > to use the lvm backend with libvirt you set > > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_volume_group > > to identify which lvm VG to use. > > > > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.sparse_logical_volumes might enable thin provsion or it might > > work without it but see the note > > > > """ > > Warning > > > > This option is deprecated for removal since 18.0.0. Its value may be silently ignored in the future. > > > > Reason > > > > Sparse logical volumes is a feature that is not tested hence not supported. LVM logical volumes are preallocated by default. If you want thin > > provisioning, use Cinder thin-provisioned volumes. > > """ > > > > the nova lvm supprot has been in maintance mode for many years. > > > > im not opposed to improving it just calling out that it has bugs and noone has really > > worked on adressing them in 4 or 5 years which is sad becasue it out performnce raw for local > > storage perfroamce and if thin provisioning still work it shoudl outperform qcow too for a simialr usecase. > > > > you are well into undefined behavior land however at this point > > > > we do not test it so we assume untile told otherwise that its broken. > > Thanks for the heads up. I looked at LVM for cinder and there LVM > volumes are thin provisioned, > so I figured this might be the case for Nova as well. > > > > > 2) *Via Cinder*, running cinder-volume on each compute node to provide a > > > volume type "ephemeral", using e.g. 
the LVM driver > > > > > > a) While not really "ephemeral" and bound to the instance lifecycle, > > > this would allow users to provision ephemeral volume just as they need them. > > > I suppose I could use backend specific quotas > > > (https://docs.openstack.org/cinder/latest/cli/cli-cinder-quotas.html#view-block-storage-quotas) > > > to > > > limit the number of size of such volumes? > > > > > > b) Do I need to use the instance locality filter > > > (https://docs.openstack.org/cinder/latest/contributor/api/cinder.scheduler.filters.instance_locality_filter.html) > > > then? > > That is an option but not ideally since it stilll means conencting to the volume via iscsi or nvmeof even if its effectlvy via localhost > > so you still have the the network layer overhead. > > Thanks for the hint - one can easily be confused when reading "LVM" ... > I actually thought there was a way to have "host-only" style volumes which > are simply local block devices with no iscsi / NVME in between which are > used by Nova then. > > These kind of volumes could maybe be built into cinder as a > "taget_protocol: local" together with the instance_locality_filter - > but apparently now the only way is through iSCSI or NVME. > > > > when i alas brought up this topic in a diffent context the alternitive to cinder and nova was to add a lvm cyborg driver > > so that it could parttion local nvme devices and expose that to a guest. but i never wrote that and i dotn think anyone else has. > > if you had a slightly diffent usecase such as providing an entire nvme or sata device to a guest the cyborge would be how you would do > > that. nova pci passhtough is not an option as it is not multi tenant safe. its expclsively for stateless device not disk so we do not > > have a way to rease the data when done. cyborg with htere driver modle can fullfile the multi tenancy requirement. > > we have previously rejected adding this capabliyt into nova so i dont expect us to add it any tiem in teh near to medium term. > > This sounds like a "3rd" approach: Using Cyborg to provide local storage > (via LVM). yes cyborg woudl be a third approch. i was going to enable this in a new project i was calling Arbiterd but that proposal was rejected in the last ptg so i currenlty have no planns to enabel local block device managment. > > > > > c)? Since a volume will always be bound to a certain host, I suppose > > > this will cause side-effects to instance scheduling? > > > With the volume remaining after an instance has been destroyed (beating > > > the purpose of it being "ephemeral") I suppose any other instance > > > attaching this volume will > > > be scheduling on this very machine? > > > > > no nova would have no knowage about the volume locality out of the box > > > Is there any way around this? Maybe > > > a driver setting to have such volumes "self-destroy" if they are not > > > attached anymore? > > we hate those kind of config options nova would not know that its bound to the host at the schduler level and > > we would nto really want to add orcstration logic like that for "something its oke to delete our tenatns data" > > by default today if you cold/live migrated the vm would move but the voluem vould not and you would end up accessing it remotely. > > > > you woudl have to then do a volume migration sepreately in cinder i think. > > > d) Same question as with Nova: What happens when an instance is > > > live-migrated? > > > > > i think i anser this above? 
> > Yes, these questions where all due to my misconception that > cinder-volume backend "LVM" did not have any networking layer > and was host-local. > > > > > > > > Maybe others also have this use case and you can share your solution(s)? > > adding a cyborg driver for lvm storage and integrateing that with nova would like be the simpelt option > > > > you coudl extend nova but as i said we have rejected that in the past. > > that said the generic resouce table we added for pemem was made generic so that future resocues like local block > > device could be tracked there without db changes. > > > > supproting differnt image_type backend for root,swap and ephmeral would be possibel. > > its an invasive change but might be more natural then teh resouce tabel approch. > > you coudl reuse more fo the code and inherit much fo the exiting fucntionality btu makeing sure you dont break > > anything in the process woudl take a lot of testing. > > Thanks for the sum up! i think your two best options are add teh parmater to the migrat/resize apis to skip copying the ephmeral disks. and second propose a replacement for https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type these should be seperate specs. that woudl work like the how we supprot generic mdevs using dynimc config sections i.e.? [libvirt] storage_profiles=swap:swap_storage,ephmeral:ephmeral_storage,root:root_storage [swap_stroage} driver=raw driver_data:/mnt/nvme-swap/nova/ [ephmeral_stroage} driver=lvm driver_data:vg_ephmeral [root_storage] driver=rbd driver_data:vms we woudl have to work this out in a spec but if nova was every to support something like this in the futrue i think we would to model it somethign along those lines. im not sure how popular this woudl be however so we would need to get input form teh wirder nova team. i do see value in being ablt ot have differnt storage profiles for root_gb, ephmeral_gb and swap_gb in the falvor. but the last time somethign like this was discussed was the creation of a cinder images_type backend to allow for automatic BootFormVolume. i actully think that would be a nice feature too but its complex and because both were disucssed aroudn the same tiem neithre got done. > > > > > Regards > > > Christian > From smooney at redhat.com Mon Mar 27 12:28:31 2023 From: smooney at redhat.com (Sean Mooney) Date: Mon, 27 Mar 2023 13:28:31 +0100 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a ?crit : > > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < > > > rafaelweingartner at gmail.com> a ?crit : > > > > > > > Hello Nguy?n H?u Kh?i, > > > > You might want to take a look at: > > > > https://review.opendev.org/c/openstack/nova/+/864760. We created a > > patch > > > > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in > > an > > > > AZ that has cross zone attache equals to false. > > > > > > > > > > > Well, I'll provide some comments in the change, but I'm afraid we can't > > > just modify the request spec like you would want. 
> > > > > > Anyway, if you want to discuss about it in the vPTG, just add it in the > > > etherpad and add your IRC nick so we could try to find a time where we > > > could be discussing it : https://etherpad.opendev.org/p/nova-bobcat-ptg > > > Also, this kind of behaviour modification is more a new feature than a > > > bugfix, so fwiw you should create a launchpad blueprint so we could > > better > > > see it. > > > > i tought i left review feedback on that too that the approch was not > > correct. > > i guess i did not in the end. > > > > modifying the request spec as sylvain menthioned is not correct. > > i disucssed this topic on irc a few weeks back with mohomad for vxhost. > > what can be done is as follows. > > > > we can add a current_az field to the Destination object > > > > https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 > > The conductor can read the instance.AZ and populate it in that new field. > > We can then add a new weigher to prefer hosts that are in the same az. > > > > > > I tend to disagree this approach as people would think that the > Destination.az field would be related to the current AZ for an instance, > while we only look at the original AZ. > That being said, we could have a weigher that would look at whether the > host is in the same AZ than the instance.host. you miss understood what i wrote i suggested addint Destination.current_az to store teh curernt AZ of the instance before scheduling. so my proposal is if RequestSpec.AZ is not set and Destination.current_az is set then the new weigher would prefer hosts that are in the same az as Destination.current_az we coudl also call Destination.current_az Destination.prefered_az > > > This will provide soft AZ affinity for the vm and preserve the fact that if > > a vm is created without sepcifying > > An AZ the expectaiton at the api level woudl be that it can migrate to any > > AZ. > > > > To provide hard AZ affintiy we could also add prefileter that would use > > the same data but instead include it in the > > placement query so that only the current AZ is considered. This would have > > to be disabled by default. > > > > > Sure, we could create a new prefilter so we could then deprecate the > AZFilter if we want. we already have an AZ prefilter and the AZFilter is deprecate for removal i ment to delete it in zed but did not have time to do it in zed of Antielope i deprecated the AZ| filter in https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 xena when i enabeld the az prefilter by default. i will try an delete teh AZ filter before m1 if others dont. > > > > That woudl allow operators to choose the desired behavior. > > curret behavior (disable weigher and dont enabel prefilter) > > new default, prefer current AZ (weigher enabeld prefilter disabled) > > hard affintiy(prefilter enabled.) > > > > there are other ways to approch this but updating the request spec is not > > one of them. > > we have to maintain the fact the enduser did not request an AZ. > > > > > Anyway, if folks want to discuss about AZs, this week is the good time :-) > > > > > > > > -Sylvain > > > > > > > > > > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < > > nguyenhuukhoinw at gmail.com> > > > > wrote: > > > > > > > > > Hello guys. > > > > > I playing with Nova AZ and Masakari > > > > > > > > > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > > > > > > > > Masakari will move server by nova scheduler. 
> > > > > > > > > > Openstack Docs describe that: > > > > > > > > > > If the server was not created in a specific zone then it is free to > > be > > > > > moved to other zones, i.e. the AvailabilityZoneFilter > > > > > < > > https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter> > > is > > > > > a no-op. > > > > > > > > > > I see that everyone usually creates instances with "Any Availability > > > > > Zone" on Horzion and also we don't specify AZ when creating > > instances by > > > > > cli. > > > > > > > > > > By this way, when we use Masakari or we miragrated instances( or > > > > > evacuate) so our instance will be moved to other zones. > > > > > > > > > > Can we attach AZ to server create requests API based on Any > > > > > Availability Zone to limit instances moved to other zones? > > > > > > > > > > Thank you. Regards > > > > > > > > > > Nguyen Huu Khoi > > > > > > > > > > > > > > > > > -- > > > > Rafael Weing?rtner > > > > > > > > From sbauza at redhat.com Mon Mar 27 12:43:16 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 27 Mar 2023 14:43:16 +0200 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Le lun. 27 mars 2023 ? 14:28, Sean Mooney a ?crit : > On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: > > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a ?crit : > > > > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: > > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < > > > > rafaelweingartner at gmail.com> a ?crit : > > > > > > > > > Hello Nguy?n H?u Kh?i, > > > > > You might want to take a look at: > > > > > https://review.opendev.org/c/openstack/nova/+/864760. We created a > > > patch > > > > > to avoid migrating VMs to any AZ, once the VM has been > bootstrapped in > > > an > > > > > AZ that has cross zone attache equals to false. > > > > > > > > > > > > > > Well, I'll provide some comments in the change, but I'm afraid we > can't > > > > just modify the request spec like you would want. > > > > > > > > Anyway, if you want to discuss about it in the vPTG, just add it in > the > > > > etherpad and add your IRC nick so we could try to find a time where > we > > > > could be discussing it : > https://etherpad.opendev.org/p/nova-bobcat-ptg > > > > Also, this kind of behaviour modification is more a new feature than > a > > > > bugfix, so fwiw you should create a launchpad blueprint so we could > > > better > > > > see it. > > > > > > i tought i left review feedback on that too that the approch was not > > > correct. > > > i guess i did not in the end. > > > > > > modifying the request spec as sylvain menthioned is not correct. > > > i disucssed this topic on irc a few weeks back with mohomad for vxhost. > > > what can be done is as follows. > > > > > > we can add a current_az field to the Destination object > > > > > > > https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 > > > The conductor can read the instance.AZ and populate it in that new > field. > > > We can then add a new weigher to prefer hosts that are in the same az. > > > > > > > > > > I tend to disagree this approach as people would think that the > > Destination.az field would be related to the current AZ for an instance, > > while we only look at the original AZ. > > That being said, we could have a weigher that would look at whether the > > host is in the same AZ than the instance.host. 
> you miss understood what i wrote > > i suggested addint Destination.current_az to store teh curernt AZ of the > instance before scheduling. > > so my proposal is if RequestSpec.AZ is not set and Destination.current_az > is set then the new > weigher would prefer hosts that are in the same az as > Destination.current_az > > we coudl also call Destination.current_az Destination.prefered_az > > I meant, I think we don't need to provide a new field, we can already know about what host an existing instance uses if we want (using [1]) Anyway, let's stop to discuss about it here, we should rather review that for a Launchpad blueprint or more a spec. -Sylvain [1] https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 > > > > > > This will provide soft AZ affinity for the vm and preserve the fact that > if > > > a vm is created without sepcifying > > > An AZ the expectaiton at the api level woudl be that it can migrate to > any > > > AZ. > > > > > > To provide hard AZ affintiy we could also add prefileter that would use > > > the same data but instead include it in the > > > placement query so that only the current AZ is considered. This would > have > > > to be disabled by default. > > > > > > > > Sure, we could create a new prefilter so we could then deprecate the > > AZFilter if we want. > we already have an AZ prefilter and the AZFilter is deprecate for removal > i ment to delete it in zed but did not have time to do it in zed of > Antielope > i deprecated the AZ| filter in > https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 > xena when i enabeld the az prefilter by default. > > Ah whoops, indeed I forgot the fact we already have the prefilter, so the hard support for AZ is already existing. > i will try an delete teh AZ filter before m1 if others dont. > OK. > > > > > > > That woudl allow operators to choose the desired behavior. > > > curret behavior (disable weigher and dont enabel prefilter) > > > new default, prefer current AZ (weigher enabeld prefilter disabled) > > > hard affintiy(prefilter enabled.) > > > > > > there are other ways to approch this but updating the request spec is > not > > > one of them. > > > we have to maintain the fact the enduser did not request an AZ. > > > > > > > > Anyway, if folks want to discuss about AZs, this week is the good time > :-) > > > > > > > > > > > > -Sylvain > > > > > > > > > > > > > > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < > > > nguyenhuukhoinw at gmail.com> > > > > > wrote: > > > > > > > > > > > Hello guys. > > > > > > I playing with Nova AZ and Masakari > > > > > > > > > > > > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > > > > > > > > > > Masakari will move server by nova scheduler. > > > > > > > > > > > > Openstack Docs describe that: > > > > > > > > > > > > If the server was not created in a specific zone then it is free > to > > > be > > > > > > moved to other zones, i.e. the AvailabilityZoneFilter > > > > > > < > > > > https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter > > > > > is > > > > > > a no-op. > > > > > > > > > > > > I see that everyone usually creates instances with "Any > Availability > > > > > > Zone" on Horzion and also we don't specify AZ when creating > > > instances by > > > > > > cli. > > > > > > > > > > > > By this way, when we use Masakari or we miragrated instances( or > > > > > > evacuate) so our instance will be moved to other zones. 
> > > > > > > > > > > > Can we attach AZ to server create requests API based on Any > > > > > > Availability Zone to limit instances moved to other zones? > > > > > > > > > > > > Thank you. Regards > > > > > > > > > > > > Nguyen Huu Khoi > > > > > > > > > > > > > > > > > > > > > -- > > > > > Rafael Weing?rtner > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Mon Mar 27 13:25:41 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Mon, 27 Mar 2023 13:25:41 +0000 Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) In-Reply-To: References: Message-ID: Hi, (First of all, I'm writing this as stable maintainer, someone who was there when the 'Extended Maintenance' process was formulated in the first place) As far as I understand, neutron's stable/train gate is still fully operational. I also know that backporting every bug fix to stable branches is time and resource consuming, and the team does not have / want to spend time on this anymore. Between EOL'ing and backporting every single bug fix, there are another levels of engagement. What I want to say is: what if stable/train of neutron is kept open as long as the gate is functional, to give people the possibility for cooperation, give the opportunity to test backports, bug fixes on upstream CI for stable/train. There are two extremity in opinions about how far back we should maintain things: 1) we should keep only open the most recent stable release to free up resources, and minimize maintenance cost 2) we should keep everything open, even the very old stable branches, where even the gate jobs are not functional anymore, to give space for collaboration in fixing important bugs (like security bugs) I think the right way is somewhere in the middle: as long as the gate is functional we can keep a branch open, for *collaboration*. I understand if most active neutron team members do not propose backports to stable/train anymore. Some way, this is acceptable according to Extended Maintenance process: it is not "fully maintained", rather there is still the possibility to do *some* maintenance. (Note, that I'm mostly talking about neutron. Stadium projects, that have broken gates (even on master branch), I support the EOL'ing) What do you think about the above suggestion? Thanks, El?d irc: elodilles ________________________________ From: Rodolfo Alonso Hernandez Sent: Thursday, March 16, 2023 5:15 PM To: openstack-discuss Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) Hello: I'm sending this mail in advance to propose transitioning Neutron and all related projects to EOL. I'll propose this topic too during the next Neutron meeting. The announcement is the first step [1] to transition a stable branch to EOL. The patch to mark these branches as EOL will be pushed in two weeks. If you have any inconvenience, please let me know in this mail chain or in IRC (ralonsoh, #openstack-neutron channel). You can also contact any Neutron core reviewer in the IRC channel. Regards. [1]https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nguyenhuukhoinw at gmail.com Mon Mar 27 13:37:28 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 27 Mar 2023 20:37:28 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Hello guys. I just suggest to openstack nova works better. My story because 1. The server was created in a specific zone with the POST /servers request containing the availability_zone parameter. It will be nice when we attach randow zone when we create instances then It will only move to the same zone when migrating or masakari ha. Currently we can force it to zone by default zone shedule in nova.conf. Sorry because I am new to Openstack and I am just an operator. I try to verify some real cases. Nguyen Huu Khoi On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza wrote: > > > Le lun. 27 mars 2023 ? 14:28, Sean Mooney a ?crit : > >> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a ?crit >> : >> > >> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >> > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < >> > > > rafaelweingartner at gmail.com> a ?crit : >> > > > >> > > > > Hello Nguy?n H?u Kh?i, >> > > > > You might want to take a look at: >> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We created >> a >> > > patch >> > > > > to avoid migrating VMs to any AZ, once the VM has been >> bootstrapped in >> > > an >> > > > > AZ that has cross zone attache equals to false. >> > > > > >> > > > > >> > > > Well, I'll provide some comments in the change, but I'm afraid we >> can't >> > > > just modify the request spec like you would want. >> > > > >> > > > Anyway, if you want to discuss about it in the vPTG, just add it in >> the >> > > > etherpad and add your IRC nick so we could try to find a time where >> we >> > > > could be discussing it : >> https://etherpad.opendev.org/p/nova-bobcat-ptg >> > > > Also, this kind of behaviour modification is more a new feature >> than a >> > > > bugfix, so fwiw you should create a launchpad blueprint so we could >> > > better >> > > > see it. >> > > >> > > i tought i left review feedback on that too that the approch was not >> > > correct. >> > > i guess i did not in the end. >> > > >> > > modifying the request spec as sylvain menthioned is not correct. >> > > i disucssed this topic on irc a few weeks back with mohomad for >> vxhost. >> > > what can be done is as follows. >> > > >> > > we can add a current_az field to the Destination object >> > > >> > > >> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >> > > The conductor can read the instance.AZ and populate it in that new >> field. >> > > We can then add a new weigher to prefer hosts that are in the same az. >> > > >> > > >> > >> > I tend to disagree this approach as people would think that the >> > Destination.az field would be related to the current AZ for an instance, >> > while we only look at the original AZ. >> > That being said, we could have a weigher that would look at whether the >> > host is in the same AZ than the instance.host. >> you miss understood what i wrote >> >> i suggested addint Destination.current_az to store teh curernt AZ of the >> instance before scheduling. 
>> >> so my proposal is if RequestSpec.AZ is not set and Destination.current_az >> is set then the new >> weigher would prefer hosts that are in the same az as >> Destination.current_az >> >> we coudl also call Destination.current_az Destination.prefered_az >> >> > I meant, I think we don't need to provide a new field, we can already know > about what host an existing instance uses if we want (using [1]) > Anyway, let's stop to discuss about it here, we should rather review that > for a Launchpad blueprint or more a spec. > > -Sylvain > > [1] > https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 > >> > >> > >> > This will provide soft AZ affinity for the vm and preserve the fact >> that if >> > > a vm is created without sepcifying >> > > An AZ the expectaiton at the api level woudl be that it can migrate >> to any >> > > AZ. >> > > >> > > To provide hard AZ affintiy we could also add prefileter that would >> use >> > > the same data but instead include it in the >> > > placement query so that only the current AZ is considered. This would >> have >> > > to be disabled by default. >> > > >> > > >> > Sure, we could create a new prefilter so we could then deprecate the >> > AZFilter if we want. >> we already have an AZ prefilter and the AZFilter is deprecate for removal >> i ment to delete it in zed but did not have time to do it in zed of >> Antielope >> i deprecated the AZ| filter in >> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >> xena when i enabeld the az prefilter by default. >> >> > Ah whoops, indeed I forgot the fact we already have the prefilter, so the > hard support for AZ is already existing. > > >> i will try an delete teh AZ filter before m1 if others dont. >> > > OK. > > >> > >> > >> > > That woudl allow operators to choose the desired behavior. >> > > curret behavior (disable weigher and dont enabel prefilter) >> > > new default, prefer current AZ (weigher enabeld prefilter disabled) >> > > hard affintiy(prefilter enabled.) >> > > >> > > there are other ways to approch this but updating the request spec is >> not >> > > one of them. >> > > we have to maintain the fact the enduser did not request an AZ. >> > > >> > > >> > Anyway, if folks want to discuss about AZs, this week is the good time >> :-) >> > >> > >> > > > >> > > > -Sylvain >> > > > >> > > > >> > > > >> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >> > > nguyenhuukhoinw at gmail.com> >> > > > > wrote: >> > > > > >> > > > > > Hello guys. >> > > > > > I playing with Nova AZ and Masakari >> > > > > > >> > > > > > >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> > > > > > >> > > > > > Masakari will move server by nova scheduler. >> > > > > > >> > > > > > Openstack Docs describe that: >> > > > > > >> > > > > > If the server was not created in a specific zone then it is >> free to >> > > be >> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >> > > > > > < >> > > >> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >> > >> > > is >> > > > > > a no-op. >> > > > > > >> > > > > > I see that everyone usually creates instances with "Any >> Availability >> > > > > > Zone" on Horzion and also we don't specify AZ when creating >> > > instances by >> > > > > > cli. >> > > > > > >> > > > > > By this way, when we use Masakari or we miragrated instances( or >> > > > > > evacuate) so our instance will be moved to other zones. 
>> > > > > > >> > > > > > Can we attach AZ to server create requests API based on Any >> > > > > > Availability Zone to limit instances moved to other zones? >> > > > > > >> > > > > > Thank you. Regards >> > > > > > >> > > > > > Nguyen Huu Khoi >> > > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > Rafael Weing?rtner >> > > > > >> > > >> > > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Mon Mar 27 13:44:18 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 27 Mar 2023 06:44:18 -0700 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Message-ID: On Fri, Mar 24, 2023 at 9:55?AM Dave Wilde wrote: > I?m happy to book an additional time slot(s) specifically for this > discussion if something other than what we currently have works better for > everyone. Please let me know. > > /Dave > On Mar 24, 2023 at 10:49 AM -0500, Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp>, wrote: > > As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this > discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic > members, please kindly reply convenient timeslots. > > Unfortunately, I took the last few days off and I'm only seeing this now. My morning is booked up aside from the original time slot which was discussed. Maybe there is a time later in the week which could work? > > [1] https://ptg.opendev.org/ptg.html > > Thanks, > > Hiromu Asahina > > On 2023/03/22 20:01, Hiromu Asahina wrote: > > Thanks! > > I look forward to your reply. > > On 2023/03/22 1:29, Julia Kreger wrote: > > No worries! > > I think that time works for me. I'm not sure it will work for > everyone, but > I can proxy information back to the whole of the ironic project as we > also > have the question of this functionality listed for our Operator Hour in > order to help ironic gauge interest. > > -Julia > > On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > I apologize that I couldn't reply before the Ironic meeting on Monday. > > I need one slot to discuss this topic. > > I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, > 27)[1,2] works for them. Does this work for Ironic? I understand not all > Ironic members will join this discussion, so I hope we can arrange a > convenient date for you two at least and, hopefully, for those > interested in this topic. > > [1] > > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z > [2] https://ptg.opendev.org/ptg.html > > Thanks, > Hiromu Asahina > > On 2023/03/17 23:29, Julia Kreger wrote: > > I'm not sure how many Ironic contributors would be the ones to attend a > discussion, in part because this is disjointed from the items they need > > to > > focus on. It is much more of a "big picture" item for those of us > who are > leaders in the project. > > I think it would help to understand how much time you expect the > > discussion > > to take to determine a path forward and how we can collaborate. 
Ironic > > has > > a huge number of topics we want to discuss during the PTG, and I > suspect > our team meeting on Monday next week should yield more > interest/awareness > as well as an amount of time for each topic which will aid us in > > scheduling. > > > If you can let us know how long, then I think we can figure out when > the > best day/time will be. > > Thanks! > > -Julia > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > Thank you for your reply. > > I'd like to decide the time slot for this topic. > I just checked PTG schedule [1]. > > We have the following time slots. Which one is convenient to gether? > (I didn't get reply but I listed Barbican, as its cores are almost the > same as Keystone) > > Mon, 27: > > - 14 (keystone) > - 15 (keystone) > > Tue, 28 > > - 13 (barbican) > - 14 (keystone, ironic) > - 15 (keysonte, ironic) > - 16 (ironic) > > Wed, 29 > > - 13 (ironic) > - 14 (keystone, ironic) > - 15 (keystone, ironic) > - 21 (ironic) > > Thanks, > > [1] https://ptg.opendev.org/ptg.html > > Hiromu Asahina > > > On 2023/02/11 1:41, Jay Faulkner wrote: > > I think it's safe to say the Ironic community would be very > invested in > such an effort. Let's make sure the time chosen for vPTG with this is > > such > > that Ironic contributors can attend as well. > > Thanks, > Jay Faulkner > Ironic PTL > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > Hello Everyone, > > Recently, Tacker and Keystone have been working together on a new > > Keystone > > Middleware that can work with external authentication > services, such as Keycloak. The code has already been submitted [1], > > but > > we want to make this middleware a generic plugin that works > with as many OpenStack services as possible. To that end, we would > > like > > to > > hear from other projects with similar use cases > (especially Ironic and Barbican, which run as standalone > services). We > will make a time slot to discuss this topic at the next vPTG. > Please contact me if you are interested and available to > participate. > > [1] > > https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > > > -- > Hiromu Asahina > > > > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kennelson11 at gmail.com Mon Mar 27 14:00:40 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 27 Mar 2023 09:00:40 -0500 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: <1C528238-6437-46B7-8F3D-F7A72D82DEC3@gmail.com> References: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com> <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com> <1C528238-6437-46B7-8F3D-F7A72D82DEC3@gmail.com> Message-ID: Perfect. Thank you! -Kendall On Mon, Mar 27, 2023 at 4:19?AM Artem Goncharov wrote: > Okay, I have not received any other feedback, so I went and booked 2 slots > Wed 15:00-17:00 and left also 1h slot on Fri 14:00 just for ?safety?. > > Looking forward seeing you there. > > Artem > > On 25. Mar 2023, at 01:39, Kendall Nelson wrote: > > Heh, okay well not complete overlap, but there is still a 3 hour overlap > as sdk things are currently scheduled go from 14 - 17 UTC. > > Either way, I would rather not try to squeeze it down on Friday, when we > can just move it to Wednesday. > > -Kendall > > On Fri, Mar 24, 2023 at 1:37?PM Ghanshyam Mann > wrote: > >> Just to clarify the TC slots on Friday, is from 15 - 19 UTC and sdk 14-15 >> UTC slot does not overlap with TC. >> >> - https://etherpad.opendev.org/p/tc-2023-2-ptg#L18 >> >> -gmann >> >> ---- On Fri, 24 Mar 2023 11:12:52 -0700 Artem Goncharov wrote --- >> > Well, there was actually no pool, since I was not even sure anybody is >> that interested, but glad to hear. >> > What about Wed somewhere from 13:00 to 17:00? There is however overlap >> with Nova (pretty much like on any other day) >> > Ideas? I just want to avoid overlap with public cloud, but maybe even >> 1h is enough. So far there are not much topics anyway. >> > >> > >> > On 24. Mar 2023, at 18:40, Kendall Nelson kennelson11 at gmail.com> >> wrote: >> > Super annoying request, but can we do earlier in the week? The >> sessions for sdk have 100% overlap with the TC which I was planning on >> attending :/ >> > >> > And I am very very sorry if I missed sharing an opinion on when would >> be good to meet. >> > -Kendall >> > On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov >> artem.goncharov at gmail.com> wrote: >> > Hi all, >> > A bit late, but still - I have booked a 3 hours slot during PTG on >> Friday 14:00-17:00 UTC. This will follow publiccloud room discussion so I >> think some people and outcomes will follow directly into our room. >> > Etherpad is there: >> https://etherpad.opendev.org/p/march2023-ptg-sdk-cli >> > Feel free to feel in topics you want to discuss >> > Cheers,Artem >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwilde at redhat.com Mon Mar 27 14:07:36 2023 From: dwilde at redhat.com (Dave Wilde) Date: Mon, 27 Mar 2023 09:07:36 -0500 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Message-ID: Hi Julia, No worries! I see that several of our sessions are overlapping, perhaps we could combine the 15:00 UTC session tomorrow to discuss this topic? 
/Dave On Mar 27, 2023 at 8:44 AM -0500, Julia Kreger , wrote: > > > > On Fri, Mar 24, 2023 at 9:55?AM Dave Wilde wrote: > > > I?m happy to book an additional time slot(s) specifically for this discussion if something other than what we currently have works better for everyone. Please let me know. > > > > > > /Dave > > > On Mar 24, 2023 at 10:49 AM -0500, Hiromu Asahina , wrote: > > > > As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this > > > > discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic > > > > members, please kindly reply convenient timeslots. > > > > Unfortunately, I took the last few days off and I'm only seeing this now. My morning is booked up aside from the original time slot which was discussed. > > > > Maybe there is a time later in the week which could work? > > > > > > > > > > > > [1] https://ptg.opendev.org/ptg.html > > > > > > > > Thanks, > > > > > > > > Hiromu Asahina > > > > > > > > On 2023/03/22 20:01, Hiromu Asahina wrote: > > > > > Thanks! > > > > > > > > > > I look forward to your reply. > > > > > > > > > > On 2023/03/22 1:29, Julia Kreger wrote: > > > > > > No worries! > > > > > > > > > > > > I think that time works for me. I'm not sure it will work for > > > > > > everyone, but > > > > > > I can proxy information back to the whole of the ironic project as we > > > > > > also > > > > > > have the question of this functionality listed for our Operator Hour in > > > > > > order to help ironic gauge interest. > > > > > > > > > > > > -Julia > > > > > > > > > > > > On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < > > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > > > I apologize that I couldn't reply before the Ironic meeting on Monday. > > > > > > > > > > > > > > I need one slot to discuss this topic. > > > > > > > > > > > > > > I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, > > > > > > > 27)[1,2] works for them. Does this work for Ironic? I understand not all > > > > > > > Ironic members will join this discussion, so I hope we can arrange a > > > > > > > convenient date for you two at least and, hopefully, for those > > > > > > > interested in this topic. > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z > > > > > > > [2] https://ptg.opendev.org/ptg.html > > > > > > > > > > > > > > Thanks, > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > On 2023/03/17 23:29, Julia Kreger wrote: > > > > > > > > I'm not sure how many Ironic contributors would be the ones to attend a > > > > > > > > discussion, in part because this is disjointed from the items they need > > > > > > > to > > > > > > > > focus on. It is much more of a "big picture" item for those of us > > > > > > > > who are > > > > > > > > leaders in the project. > > > > > > > > > > > > > > > > I think it would help to understand how much time you expect the > > > > > > > discussion > > > > > > > > to take to determine a path forward and how we can collaborate. Ironic > > > > > > > has > > > > > > > > a huge number of topics we want to discuss during the PTG, and I > > > > > > > > suspect > > > > > > > > our team meeting on Monday next week should yield more > > > > > > > > interest/awareness > > > > > > > > as well as an amount of time for each topic which will aid us in > > > > > > > scheduling. 
> > > > > > > > > > > > > > > > If you can let us know how long, then I think we can figure out when > > > > > > > > the > > > > > > > > best day/time will be. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > -Julia > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > > > > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > > > > > > > Thank you for your reply. > > > > > > > > > > > > > > > > > > I'd like to decide the time slot for this topic. > > > > > > > > > I just checked PTG schedule [1]. > > > > > > > > > > > > > > > > > > We have the following time slots. Which one is convenient to gether? > > > > > > > > > (I didn't get reply but I listed Barbican, as its cores are almost the > > > > > > > > > same as Keystone) > > > > > > > > > > > > > > > > > > Mon, 27: > > > > > > > > > > > > > > > > > > - 14 (keystone) > > > > > > > > > - 15 (keystone) > > > > > > > > > > > > > > > > > > Tue, 28 > > > > > > > > > > > > > > > > > > - 13 (barbican) > > > > > > > > > - 14 (keystone, ironic) > > > > > > > > > - 15 (keysonte, ironic) > > > > > > > > > - 16 (ironic) > > > > > > > > > > > > > > > > > > Wed, 29 > > > > > > > > > > > > > > > > > > - 13 (ironic) > > > > > > > > > - 14 (keystone, ironic) > > > > > > > > > - 15 (keystone, ironic) > > > > > > > > > - 21 (ironic) > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > [1] https://ptg.opendev.org/ptg.html > > > > > > > > > > > > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > > > > > > > > > > > > > > On 2023/02/11 1:41, Jay Faulkner wrote: > > > > > > > > > > I think it's safe to say the Ironic community would be very > > > > > > > > > > invested in > > > > > > > > > > such an effort. Let's make sure the time chosen for vPTG with this is > > > > > > > > > such > > > > > > > > > > that Ironic contributors can attend as well. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Jay Faulkner > > > > > > > > > > Ironic PTL > > > > > > > > > > > > > > > > > > > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > > > > > > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > > > > > > > > > > > Hello Everyone, > > > > > > > > > > > > > > > > > > > > > > Recently, Tacker and Keystone have been working together on a new > > > > > > > > > Keystone > > > > > > > > > > > Middleware that can work with external authentication > > > > > > > > > > > services, such as Keycloak. The code has already been submitted [1], > > > > > > > but > > > > > > > > > > > we want to make this middleware a generic plugin that works > > > > > > > > > > > with as many OpenStack services as possible. To that end, we would > > > > > > > like > > > > > > > > > to > > > > > > > > > > > hear from other projects with similar use cases > > > > > > > > > > > (especially Ironic and Barbican, which run as standalone > > > > > > > > > > > services). We > > > > > > > > > > > will make a time slot to discuss this topic at the next vPTG. > > > > > > > > > > > Please contact me if you are interested and available to > > > > > > > > > > > participate. 
> > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > ?-------------------------------------? > > > > > > > > > ????? NTT Network Innovation Center > > > > > > > > > ??????? Hiromu Asahina > > > > > > > > > ???? ------------------------------------- > > > > > > > > > ????? 3-9-11, Midori-cho, Musashino-shi > > > > > > > > > ??????? Tokyo 180-8585, Japan > > > > > > > > > Phone: +81-422-59-7008 > > > > > > > > > Email: hiromu.asahina.az at hco.ntt.co.jp > > > > > > > > > ?-------------------------------------? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > ?-------------------------------------? > > > > > > > ???? NTT Network Innovation Center > > > > > > > ?????? Hiromu Asahina > > > > > > > ??? ------------------------------------- > > > > > > > ???? 3-9-11, Midori-cho, Musashino-shi > > > > > > > ?????? Tokyo 180-8585, Japan > > > > > > > Phone: +81-422-59-7008 > > > > > > > Email: hiromu.asahina.az at hco.ntt.co.jp > > > > > > > ?-------------------------------------? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > ?-------------------------------------? > > > > NTT Network Innovation Center > > > > Hiromu Asahina > > > > ------------------------------------- > > > > 3-9-11, Midori-cho, Musashino-shi > > > > Tokyo 180-8585, Japan > > > > ? Phone: +81-422-59-7008 > > > > ? Email: hiromu.asahina.az at hco.ntt.co.jp > > > > ?-------------------------------------? > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Mon Mar 27 14:40:00 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 27 Mar 2023 16:40:00 +0200 Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) In-Reply-To: References: Message-ID: Hello El?d: As you said, we are no longer sending patches for Train. In the last four months, we have sent only two patches changing the code (apart from other testing patches). I proposed the EOL of Train because of this and the extra cost involved in maintaining older versions, regardless of this CI status. In any case, I'll propose this topic in the PTG tomorrow, considering leaving only the Neutron Train branch as EM and closing the rest of the projects. Regards. On Mon, Mar 27, 2023 at 3:25?PM El?d Ill?s wrote: > Hi, > > (First of all, I'm writing this as stable maintainer, someone who > was there when the 'Extended Maintenance' process was formulated > in the first place) > > As far as I understand, neutron's stable/train gate is still fully > operational. I also know that backporting every bug fix to stable > branches is time and resource consuming, and the team does not have / > want to spend time on this anymore. Between EOL'ing and backporting > every single bug fix, there are another levels of engagement. > > What I want to say is: what if stable/train of neutron is kept open as > long as the gate is functional, to give people the possibility for > cooperation, give the opportunity to test backports, bug fixes on > upstream CI for stable/train. 
> > There are two extremity in opinions about how far back we should > maintain things: > 1) we should keep only open the most recent stable release to free up > resources, and minimize maintenance cost > 2) we should keep everything open, even the very old stable branches, > where even the gate jobs are not functional anymore, to give space > for collaboration in fixing important bugs (like security bugs) > > I think the right way is somewhere in the middle: as long as the gate > is functional we can keep a branch open, for *collaboration*. > I understand if most active neutron team members do not propose > backports to stable/train anymore. Some way, this is acceptable > according to Extended Maintenance process: it is not "fully maintained", > rather there is still the possibility to do *some* maintenance. > > (Note, that I'm mostly talking about neutron. Stadium projects, that > have broken gates (even on master branch), I support the EOL'ing) > > What do you think about the above suggestion? > > Thanks, > > El?d > irc: elodilles > ------------------------------ > *From:* Rodolfo Alonso Hernandez > *Sent:* Thursday, March 16, 2023 5:15 PM > *To:* openstack-discuss > *Subject:* [neutron][release] Proposing transition to EOL Train (all > Neutron related projects) > > Hello: > > I'm sending this mail in advance to propose transitioning Neutron and all > related projects to EOL. I'll propose this topic too during the next > Neutron meeting. > > The announcement is the first step [1] to transition a stable branch to > EOL. > > The patch to mark these branches as EOL will be pushed in two weeks. If > you have any inconvenience, please let me know in this mail chain or in IRC > (ralonsoh, #openstack-neutron channel). You can also contact any Neutron > core reviewer in the IRC channel. > > Regards. > > [1] > https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Mon Mar 27 15:28:20 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 27 Mar 2023 17:28:20 +0200 Subject: [nova] Hold your rechecks Message-ID: Hey, Due to the recent merge of https://review.opendev.org/c/openstack/requirements/+/872065/10/upper-constraints.txt#298 we now use mypy==1.1.1 which includes a breaking behavioural change against our code : https://07de6a0c9e6ec0c6835f-ccccbfab26b1456f69293167016566bc.ssl.cf2.rackcdn.com/875621/10/gate/openstack-tox-pep8/e50f9f0/job-output.txt Thanks to Eric (kudos to him, he was quickier than me), we have a fix https://review.opendev.org/c/openstack/nova/+/878693 Please accordingly hold your rechecks until that fix is merged. -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Mon Mar 27 16:07:20 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 27 Mar 2023 16:07:20 +0000 Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) In-Reply-To: References: Message-ID: <20230327160720.hmf53vm5czvzntbh@yuggoth.org> On 2023-03-27 13:25:41 +0000 (+0000), El?d Ill?s wrote: [...] > I also know that backporting every bug fix to stable branches is > time and resource consuming, and the team does not have / want to > spend time on this anymore. [...] Note that this was actually the point of Extended Maintenance. Team members aren't expected to backport fixes to EM phase branches. 
They exist so interested members of the community can propose, review, and otherwise collaborate on backports even if the core review team for the project is no longer interested in paying attention to them.

> what if stable/train of neutron is kept open as long as the gate
> is functional, to give people the possibility for cooperation,
> give the opportunity to test backports, bug fixes on upstream CI
> for stable/train.
[...]

And this can be accomplished by removing jobs which will no longer work without significant effort; we included provisions for exactly that in the original EM resolution:

"[...] these older branches might, at some point, just be running pep8 and unit tests but those are required at a minimum."

https://governance.openstack.org/tc/resolutions/20180301-stable-branch-eol.html#testing

So dropping "integration" (e.g. DevStack/Tempest) and "functional testing" jobs from EM branches is fine, even expected. If the unit testing and static analysis jobs required by the PTI don't pass any longer, then the branch and all branches older than it have to switch to unmaintained or end of life.
--
Jeremy Stanley
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 963 bytes
Desc: not available
URL:

From gilles.mocellin at nuagelibre.org Mon Mar 27 18:01:00 2023
From: gilles.mocellin at nuagelibre.org (Gilles Mocellin)
Date: Mon, 27 Mar 2023 20:01:00 +0200
Subject: [nova] Can OpenStack support snapshot rollback (not creating a new instance)?
In-Reply-To:
References:
Message-ID: <2275345.ElGaqSPkdT@guitare>

Le dimanche 26 mars 2023, 20:50:22 CEST ??? a écrit :
> Hello,

Hello,

> I use Ceph as the storage backend for Nova, Glance, and Cinder.
>
> If I create a snapshot for an instance, it creates a new image in
> Glance, and I can use that image to create a new instance.
>
> This feels to me more like creating an image based on the current
> state of the VM rather than creating a VM snapshot.

The term "snapshot" is not ideal, as it misleads users coming from every virtualization platform (VMware, Hyper-V...).

> I want to ask:
> 1) Can I create and revert a VM snapshot like I would in virtual
> machine software?

In fact, if you don't use boot-from-volume instances, you can get something similar to snapshots by rebuilding your VM with the image created by the snapshot.

> 2) When a VM uses multiple disks/volumes, does OpenStack support taking
> a snapshot of all disks/volumes of the VM as a whole?

No, but you can take snapshots of the individual volumes. They won't be coherent, as in a single transaction.

> 3) Can OpenStack snapshot and save the memory state of a VM?

I think not. The memory is saved when the instance is suspended, but even if you then snapshot the instance, that memory state is not captured in the image.

> If it is not currently supported, are there any simple customization
> implementation ideas that can be recommended?

You really need to think differently.
OpenStack is a cloud platform, made to consume Infrastructure as a Service, with infrastructure-as-code tools (like Terraform).

You should have disposable instances that can be destroyed and rebuilt easily whenever you want.
Use clusters for your middleware (MariaDB, Redis...), use a load balancer (Octavia, or your own HAProxy) in front of your several web frontends / backends...
Keep your data on additional volumes, and also do backups in object storage (Swift/S3, handled by Ceph).
Make restoration easy and test it.
Deploy different environments of your projects in different OpenStack projects, to test changes.

That way of thinking will make it easier for you when you begin to think about containers and Kubernetes.

> Thank you for any help and suggestions.
> Best wishes.
>
> Han

From smooney at redhat.com Mon Mar 27 19:03:12 2023
From: smooney at redhat.com (Sean Mooney)
Date: Mon, 27 Mar 2023 20:03:12 +0100
Subject: [nova] Can OpenStack support snapshot rollback (not creating a new instance)?
In-Reply-To: <2275345.ElGaqSPkdT@guitare>
References: <2275345.ElGaqSPkdT@guitare>
Message-ID:

On Mon, 2023-03-27 at 20:01 +0200, Gilles Mocellin wrote:
> Le dimanche 26 mars 2023, 20:50:22 CEST ??? a écrit :
> > Hello,
>
> Hello,
>
> > I use Ceph as the storage backend for Nova, Glance, and Cinder.
> >
> > If I create a snapshot for an instance, it creates a new image in
> > Glance, and I can use that image to create a new instance.
> >
> > This feels to me more like creating an image based on the current
> > state of the VM rather than creating a VM snapshot.
>
> The term "snapshot" is not ideal, as it misleads users coming from every
> virtualization platform (VMware, Hyper-V...).

Nova snapshots are snapshots of the root disk, not of the disk and memory.

> > I want to ask:
> > 1) Can I create and revert a VM snapshot like I would in virtual
> > machine software?
>
> In fact, if you don't use boot-from-volume instances, you can get
> something similar to snapshots by rebuilding your VM with the image created by
> the snapshot.

If it's just the root disk state, you can do that now for boot-from-volume or non-boot-from-volume instances, regardless of the storage backend you use. Nova snapshots are only of the VM's root disk; Cinder snapshots are only of the volume content. Neither supports capturing the RAM state. One of the main reasons rebuild exists is to allow rolling back the VM's root disk state, and Cinder volume snapshots have the same use case. It's better to think of it as backup and restore than VirtualBox-style snapshots.

> > 2) When a VM uses multiple disks/volumes, does OpenStack support taking
> > a snapshot of all disks/volumes of the VM as a whole?
>
> No, but you can take snapshots of the individual volumes. They won't be
> coherent, as in a single transaction.

Well, yes and no. If you have the QEMU guest agent you can quiesce all writes to all file systems during the snapshot. Cinder also supports volume groups, I believe, and I think they allow you to take a consistent snapshot of all volumes in a group at once. You can't, as far as I am aware, take a consistent snapshot of all volumes and the root disk at the same time, however.

> > 3) Can OpenStack snapshot and save the memory state of a VM?
>
> I think not. The memory is saved when the instance is suspended, but even if
> you then snapshot the instance, that memory state is not captured in the image.

No, but we almost can. For the libvirt driver we implement suspend as managedsave, which dumps the guest RAM to disk like VirtualBox does for its snapshots, but we don't have a way to then snapshot that and use it to restore later. To get what you are asking for would actually be an extension to shelve or snapshot that would require the guest to be stopped while the disk and memory snapshot is done. It would require us to associate two images with the snapshot, the RAM and disk images. There would be security considerations with saving the RAM like this too. It's a feature that might be doable, but I don't know if it could be done for other hypervisors like PowerVM, Hyper-V or VMware. It obviously would not work with Ironic.
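For the root-disk rollback and per-volume snapshot workflow described above, a minimal sketch with the openstack CLI (the instance and volume names are placeholders, and this captures disks only, never RAM):

  # capture the current root disk as a Glance image (a Nova "snapshot")
  openstack server image create --name my-vm-rollback-point my-vm

  # later, roll the root disk back by rebuilding from that image;
  # RAM state and attached volumes are untouched
  openstack server rebuild --image my-vm-rollback-point my-vm

  # attached volumes are snapshotted separately through Cinder
  # (--force is needed while the volume is attached to a running server)
  openstack volume snapshot create --volume my-data-vol --force my-data-vol-snap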
> > If it is not currently supported, are there any simple customization
> > implementation ideas that can be recommended?

If we were to do this I would see it as an extension to shelve, I think. I think this is not really in line with the normal cloud usage model and definitely feels more like classic virtualization. In general I'm not sure it would be an acceptable change to the Nova API, but it would be a new feature.

> You really need to think differently.
> OpenStack is a cloud platform, made to consume Infrastructure as a Service,
> with infrastructure-as-code tools (like Terraform).
>
> You should have disposable instances that can be destroyed and rebuilt
> easily whenever you want.
> Use clusters for your middleware (MariaDB, Redis...), use a load balancer
> (Octavia, or your own HAProxy) in front of your several web frontends /
> backends...
> Keep your data on additional volumes, and also do backups in object storage
> (Swift/S3, handled by Ceph).
> Make restoration easy and test it.
>
> Deploy different environments of your projects in different OpenStack projects,
> to test changes.
>
> That way of thinking will make it easier for you when you begin to think
> about containers and Kubernetes.

This is often referred to as the pets vs. cattle view. We support backup-and-restore type functionality in Nova and Cinder and that is unlikely to go away, so this request is not entirely out of scope, but it would require a lot of work and testing to enable. On the Nova side it would require extensions to the rebuild, shelve/unshelve and backup/create-image APIs. It would be a pretty large change to implement and I'm not sure it would reach a quorum of agreement to accept. However, this week is the upstream vPTG. If you want to ask for feedback synchronously you could add it as a topic to the Nova agenda https://etherpad.opendev.org/p/nova-bobcat-ptg, the operator pain point agenda https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova, or continue async on the mailing list or via a Nova spec.

> > Thank you for any help and suggestions.
> > Best wishes.
> >
> > Han

From jay at gr-oss.io Mon Mar 27 19:10:28 2023
From: jay at gr-oss.io (Jay Faulkner)
Date: Mon, 27 Mar 2023 12:10:28 -0700
Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service
In-Reply-To:
References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp>
Message-ID:

So, looking over the Ironic PTG schedule, I appear to have booked the Firmware Upgrade interface in two places -- tomorrow and Wednesday 2200 UTC. This is fortuitous: I can move the firmware upgrade conversation entirely into 2200 UTC, and give the time we had set aside to this topic. Dave, Julia and I consulted on IRC, and decided to take this action. We'll be adding an item to Ironic's PTG for tomorrow, Tuesday March 28 at 1500 UTC - 1525 UTC to discuss KeystoneMiddleware OAUTH support. I will perform the following changes to the Ironic schedule to accommodate:
- Remove firmware upgrades from Ironic Tues 1630-1700 UTC, move all discussion of it to Weds 2200 UTC - 2300 UTC (should be plenty of time).
- Move everything from Service Steps and later (after the first break) forward 30 minutes - Add new item for KeystoneMiddleware/OAUTH discussion into Ironic's schedule at Wednesday, 1500 UTC - 1525 UTC (30 minutes with room for a break) Ironic will host the discussion in the Folsom room, and Dave will ensure interested keystone contributors are redirected to our room for this period. - Jay Faulkner Ironic PTL On Mon, Mar 27, 2023 at 7:07?AM Dave Wilde wrote: > Hi Julia, > > No worries! > > I see that several of our sessions are overlapping, perhaps we could > combine the 15:00 UTC session tomorrow to discuss this topic? > > /Dave > On Mar 27, 2023 at 8:44 AM -0500, Julia Kreger < > juliaashleykreger at gmail.com>, wrote: > > > > On Fri, Mar 24, 2023 at 9:55?AM Dave Wilde wrote: > >> I?m happy to book an additional time slot(s) specifically for this >> discussion if something other than what we currently have works better for >> everyone. Please let me know. >> >> /Dave >> On Mar 24, 2023 at 10:49 AM -0500, Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp>, wrote: >> >> As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this >> discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic >> members, please kindly reply convenient timeslots. >> >> > Unfortunately, I took the last few days off and I'm only seeing this now. > My morning is booked up aside from the original time slot which was > discussed. > > Maybe there is a time later in the week which could work? > > > >> >> [1] https://ptg.opendev.org/ptg.html >> >> Thanks, >> >> Hiromu Asahina >> >> On 2023/03/22 20:01, Hiromu Asahina wrote: >> >> Thanks! >> >> I look forward to your reply. >> >> On 2023/03/22 1:29, Julia Kreger wrote: >> >> No worries! >> >> I think that time works for me. I'm not sure it will work for >> everyone, but >> I can proxy information back to the whole of the ironic project as we >> also >> have the question of this functionality listed for our Operator Hour in >> order to help ironic gauge interest. >> >> -Julia >> >> On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp> wrote: >> >> I apologize that I couldn't reply before the Ironic meeting on Monday. >> >> I need one slot to discuss this topic. >> >> I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, >> 27)[1,2] works for them. Does this work for Ironic? I understand not all >> Ironic members will join this discussion, so I hope we can arrange a >> convenient date for you two at least and, hopefully, for those >> interested in this topic. >> >> [1] >> >> >> https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z >> [2] https://ptg.opendev.org/ptg.html >> >> Thanks, >> Hiromu Asahina >> >> On 2023/03/17 23:29, Julia Kreger wrote: >> >> I'm not sure how many Ironic contributors would be the ones to attend a >> discussion, in part because this is disjointed from the items they need >> >> to >> >> focus on. It is much more of a "big picture" item for those of us >> who are >> leaders in the project. >> >> I think it would help to understand how much time you expect the >> >> discussion >> >> to take to determine a path forward and how we can collaborate. Ironic >> >> has >> >> a huge number of topics we want to discuss during the PTG, and I >> suspect >> our team meeting on Monday next week should yield more >> interest/awareness >> as well as an amount of time for each topic which will aid us in >> >> scheduling. 
>> >> >> If you can let us know how long, then I think we can figure out when >> the >> best day/time will be. >> >> Thanks! >> >> -Julia >> >> >> >> >> >> On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp> wrote: >> >> Thank you for your reply. >> >> I'd like to decide the time slot for this topic. >> I just checked PTG schedule [1]. >> >> We have the following time slots. Which one is convenient to gether? >> (I didn't get reply but I listed Barbican, as its cores are almost the >> same as Keystone) >> >> Mon, 27: >> >> - 14 (keystone) >> - 15 (keystone) >> >> Tue, 28 >> >> - 13 (barbican) >> - 14 (keystone, ironic) >> - 15 (keysonte, ironic) >> - 16 (ironic) >> >> Wed, 29 >> >> - 13 (ironic) >> - 14 (keystone, ironic) >> - 15 (keystone, ironic) >> - 21 (ironic) >> >> Thanks, >> >> [1] https://ptg.opendev.org/ptg.html >> >> Hiromu Asahina >> >> >> On 2023/02/11 1:41, Jay Faulkner wrote: >> >> I think it's safe to say the Ironic community would be very >> invested in >> such an effort. Let's make sure the time chosen for vPTG with this is >> >> such >> >> that Ironic contributors can attend as well. >> >> Thanks, >> Jay Faulkner >> Ironic PTL >> >> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp> wrote: >> >> Hello Everyone, >> >> Recently, Tacker and Keystone have been working together on a new >> >> Keystone >> >> Middleware that can work with external authentication >> services, such as Keycloak. The code has already been submitted [1], >> >> but >> >> we want to make this middleware a generic plugin that works >> with as many OpenStack services as possible. To that end, we would >> >> like >> >> to >> >> hear from other projects with similar use cases >> (especially Ironic and Barbican, which run as standalone >> services). We >> will make a time slot to discuss this topic at the next vPTG. >> Please contact me if you are interested and available to >> participate. >> >> [1] >> >> https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >> >> >> -- >> Hiromu Asahina >> >> >> >> >> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> >> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> >> >> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Mon Mar 27 19:16:35 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Mon, 27 Mar 2023 12:16:35 -0700 Subject: [ironic] Slight PTG Schedule change Message-ID: Take notice: tomorrow's topic schedule has been changed slightly. We have moved Service Steps and DPU Orchestration conversations up 30 minutes. 
As mentioned in the other thread ( https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032983.html), I accidentally booked Firmware Upgrades to two times. This was fortuitous because we needed to add a cross-team item with Keystone. As always, the up to date schedule and notes are here: https://etherpad.opendev.org/p/ironic-bobcat-ptg Thanks, Jay Faulkner Ironic PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 28 00:49:57 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 28 Mar 2023 06:19:57 +0530 Subject: Nova undefine secret | openstack | wallaby Message-ID: Hi, For some reason, i had to redeploy ceph for my hci nodes and then found that the deployment command is giving out the following error: 2023-03-28 01:49:46.709605 | | WARNING | ERROR: Can't run container nova_libvirt_init_secret stderr: error: Failed to set attributes from /etc/nova/secret.xml error: internal error: a secret with UUID bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with client.openstack secret 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | error={"changed": false, "msg": "Failed containers: nova_libvirt_init_secret"} Can you please tell me how I can undefine the existing secret? With regards, Swogat Pradhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 28 00:54:49 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 28 Mar 2023 06:24:49 +0530 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: Update podman logs: [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 ------------------------------------------------ Initializing virsh secrets for: dcn01:openstack -------- Initializing the virsh secret for 'dcn01' cluster (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client The /etc/nova/secret.xml file already exists error: Failed to set attributes from /etc/nova/secret.xml error: internal error: a secret with UUID bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with client.openstack secret On Tue, Mar 28, 2023 at 6:19?AM Swogat Pradhan wrote: > Hi, > For some reason, i had to redeploy ceph for my hci nodes and then found > that the deployment command is giving out the following error: > 2023-03-28 01:49:46.709605 | | > WARNING | ERROR: Can't run container nova_libvirt_init_secret > stderr: error: Failed to set attributes from /etc/nova/secret.xml > error: internal error: a secret with UUID > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > client.openstack secret > 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | > FATAL | Create containers managed by Podman for > /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | > error={"changed": false, "msg": "Failed containers: > nova_libvirt_init_secret"} > > Can you please tell me how I can undefine the existing secret? > > With regards, > Swogat Pradhan > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jake.yip at ardc.edu.au Tue Mar 28 01:21:06 2023 From: jake.yip at ardc.edu.au (Jake Yip) Date: Tue, 28 Mar 2023 12:21:06 +1100 Subject: [Magnum] vPTG Message-ID: <92954613-d892-ba47-0fbc-51d3adc864b5@ardc.edu.au> Dear all, Magnum vPTG will be held at Wed 0900 UTC in the Havana Room. Please see etherpad https://etherpad.opendev.org/p/march2023-ptg-magnum for updates Regards, Jake (sorry for duplicates) From adivya1.singh at gmail.com Tue Mar 28 02:50:27 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Tue, 28 Mar 2023 08:20:27 +0530 Subject: (Open Stack-glance )Image Upload in Open Stack in a Bulk In-Reply-To: References: Message-ID: Hi Team, Any thoughts on this ? Regards Adivya Singh On Mon, Mar 27, 2023 at 5:06?PM Adivya Singh wrote: > Hi Team, > > Any hints, if i want to upload images in a bulk in a Open Stack , because > it takes some time for the image to copy if we go one by one, or even of we > go with script > > > Also if there is a scenario where glance mount point fails and we can > create the same Share path and Copy the Image from the source , Will the > OpenStack glance Service will start detecting those images upload in a share > > Regards > Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt at oliver.net.au Tue Mar 28 03:30:56 2023 From: matt at oliver.net.au (Matthew Oliver) Date: Tue, 28 Mar 2023 14:30:56 +1100 Subject: [swift][ptg] Ops Feedback Session - 29th March at 13:00 UTC Message-ID: As we've done in PTGs past, we're getting devs and ops together to talk about Swift: what's working, what isn't, and what would be most helpful to improve. We're meeting in Ocata (https://www.openstack.org/ptg/rooms/ocata) at 13:00UTC -- if you run a Swift cluster, we hope to see you there! Even if you can't make it, We'd appreciate it if you can offer some feedback on the feedback etherpad (https://etherpad.opendev.org/p/swift-bobcat-ops-feedback ). This has always been a highlight at every PTG for us swift devs. Have your say and help make Swift even better! Matt -------------- next part -------------- An HTML attachment was scrubbed... 
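On the Glance bulk-upload question above: there is no server-side bulk API, but a client-side loop parallelises well. A rough sketch, assuming the images are local *.qcow2 files and the openstack CLI is already authenticated (adjust the formats and the -P concurrency to taste):

  # upload every local qcow2 file as a Glance image, four uploads in parallel
  ls *.qcow2 | xargs -P 4 -I {} sh -c '
    openstack image create \
      --disk-format qcow2 --container-format bare \
      --file "$1" "$(basename "$1" .qcow2)"
  ' _ {}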
URL: From abishop at redhat.com Tue Mar 28 03:56:08 2023 From: abishop at redhat.com (Alan Bishop) Date: Mon, 27 Mar 2023 20:56:08 -0700 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: On Mon, Mar 27, 2023 at 5:56?PM Swogat Pradhan wrote: > Update podman logs: > [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 > ------------------------------------------------ > Initializing virsh secrets for: dcn01:openstack > -------- > Initializing the virsh secret for 'dcn01' cluster > (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client > The /etc/nova/secret.xml file already exists > error: Failed to set attributes from /etc/nova/secret.xml > error: internal error: a secret with UUID > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > client.openstack secret > > > On Tue, Mar 28, 2023 at 6:19?AM Swogat Pradhan > wrote: > >> Hi, >> For some reason, i had to redeploy ceph for my hci nodes and then found >> that the deployment command is giving out the following error: >> 2023-03-28 01:49:46.709605 | | >> WARNING | ERROR: Can't run container nova_libvirt_init_secret >> stderr: error: Failed to set attributes from /etc/nova/secret.xml >> error: internal error: a secret with UUID >> bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with >> client.openstack secret >> 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | >> FATAL | Create containers managed by Podman for >> /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | >> error={"changed": false, "msg": "Failed containers: >> nova_libvirt_init_secret"} >> >> Can you please tell me how I can undefine the existing secret? >> > Use "podman exec -ti bash" to open a shell within the nova_libvirt container, then you can use virsh commands to examine and delete any extraneous secrets. This command might be all that you need: [root at dcn01-hci-1 ~]# podman exec -ti 3e5e6c1a7864 virsh secret-undefine bd136bb0-fd78-5429-ab80-80b8c571d821 You should also delete the /etc/nova/secret.xml file, and let it be recreated when you re-run the deployment command. Alan >> With regards, >> Swogat Pradhan >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 28 05:28:12 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 28 Mar 2023 10:58:12 +0530 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: Hi Alan, Thank you for your response. We cannot run that particular command as the container itself doesn't run. That container is only used to set the secret and stays in exited state if i am correct. 
[root at dcn01-hci-1 ~]# podman exec -ti 3e5e6c1a7864 virsh secret-undefine bd136bb0-fd78-5429-ab80-80b8c571d821 Error: can only create exec sessions on running containers: container state improper With regards, Swogat Pradhan On Tue, Mar 28, 2023 at 9:26?AM Alan Bishop wrote: > > > On Mon, Mar 27, 2023 at 5:56?PM Swogat Pradhan > wrote: > >> Update podman logs: >> [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 >> ------------------------------------------------ >> Initializing virsh secrets for: dcn01:openstack >> -------- >> Initializing the virsh secret for 'dcn01' cluster >> (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client >> The /etc/nova/secret.xml file already exists >> error: Failed to set attributes from /etc/nova/secret.xml >> error: internal error: a secret with UUID >> bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with >> client.openstack secret >> >> >> On Tue, Mar 28, 2023 at 6:19?AM Swogat Pradhan >> wrote: >> >>> Hi, >>> For some reason, i had to redeploy ceph for my hci nodes and then found >>> that the deployment command is giving out the following error: >>> 2023-03-28 01:49:46.709605 | | >>> WARNING | ERROR: Can't run container nova_libvirt_init_secret >>> stderr: error: Failed to set attributes from /etc/nova/secret.xml >>> error: internal error: a secret with UUID >>> bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with >>> client.openstack secret >>> 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | >>> FATAL | Create containers managed by Podman for >>> /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | >>> error={"changed": false, "msg": "Failed containers: >>> nova_libvirt_init_secret"} >>> >>> Can you please tell me how I can undefine the existing secret? >>> >> > Use "podman exec -ti bash" to open a shell within > the nova_libvirt container, then you can use virsh commands to examine and > delete any extraneous secrets. This command might be all that you need: > > [root at dcn01-hci-1 ~]# podman exec -ti 3e5e6c1a7864 virsh secret-undefine > bd136bb0-fd78-5429-ab80-80b8c571d821 > > You should also delete the /etc/nova/secret.xml file, and let it be > recreated when you re-run the deployment command. > > Alan > > >>> With regards, >>> Swogat Pradhan >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Tue Mar 28 05:59:05 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 28 Mar 2023 07:59:05 +0200 Subject: (Open Stack-glance )Image Upload in Open Stack in a Bulk In-Reply-To: References: Message-ID: There's no server-side support for bulk upload of images in glance API. But I see no reason why client-side tooling would work for that. Simplest thing would be using xargs in some bash one-liner. If talking about python and sdk, should be also quite trivial to implement that leveraging multiprocessing or joblib libraries. > On Mon, Mar 27, 2023 at 5:06?PM Adivya Singh > wrote: > >> Hi Team, >> >> Any hints, if i want to upload images in a bulk in a Open Stack , >> because it takes some time for the image to copy if we go one by one, or >> even of we go with script >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
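Pulling the answers in this thread together, one possible cleanup sequence on the affected HCI node (a sketch only: <secret-uuid> is the UUID reported in the error, it assumes the long-running nova_libvirt container is up even though the one-shot init container has exited, and the secret.xml path may be bind-mounted from elsewhere on the host):

  # list and remove the stale Ceph client secret known to libvirt
  sudo podman exec -it nova_libvirt virsh secret-list
  sudo podman exec -it nova_libvirt virsh secret-undefine <secret-uuid>

  # per the advice above, also remove the pre-existing secret.xml so it is
  # recreated, then re-run the deployment command
  sudo rm -f /etc/nova/secret.xml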
URL: From noonedeadpunk at gmail.com Tue Mar 28 06:04:24 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 28 Mar 2023 08:04:24 +0200 Subject: (Open Stack-glance )Image Upload in Open Stack in a Bulk In-Reply-To: References: Message-ID: Sorry, made a confusing typo in my reply, what I meant was that some client-side script will work just nicely for this purpose if it's written in a way to execute multiple processes simultaneously. ??, 28 ???. 2023 ?., 07:59 Dmitriy Rabotyagov : > There's no server-side support for bulk upload of images in glance API. > But I see no reason why client-side tooling would work for that. Simplest > thing would be using xargs in some bash one-liner. > > If talking about python and sdk, should be also quite trivial to implement > that leveraging multiprocessing or joblib libraries. > > > >> On Mon, Mar 27, 2023 at 5:06?PM Adivya Singh >> wrote: >> >>> Hi Team, >>> >>> Any hints, if i want to upload images in a bulk in a Open Stack , >>> because it takes some time for the image to copy if we go one by one, or >>> even of we go with script >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Tue Mar 28 06:34:09 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Tue, 28 Mar 2023 06:34:09 +0000 Subject: [all][stable][ptl] Propose to EOL Rocky series In-Reply-To: References: Message-ID: Hi, this thread was bit of forgotten, sorry for that. A bit more than two weeks ago we had a discussion on #openstack-release about this [1]. So, to summerize, there are the issues: - stable/rocky's gate is mostly broken - more than one third of the repositories have transitioned their stable/rocky branch to EOL (including multiple core component) - old, unmaintained CI jobs, testing environments, hinders refactoring of Zuul jobs and other configurations On the other hand, as Thomas mentioned, there is the need for some to be able to cooperate (as an example: recent security issue [2], mentioned in previous mail or in our IRC discussion) on a common place, namely in gerrit. This was originally the intention with Extended Maintenance. We just haven't thought about eternity :) It seems that teams feel that if a branch is 'open' and in 'Extended Maintenance' then it still means it is 'fully supported', thus cannot let the gate failing AND don't want to merge patches without gate tests, that's one reason why teams rather EOL their branches. We might need to think more about what is the best way forward. [1] https://meetings.opendev.org/irclogs/%23openstack-release/%23openstack-release.2023-03-08.log.html#t2023-03-08T13:54:34 [2] https://security.openstack.org/ossa/OSSA-2023-002.html El?d irc: elodilles ________________________________ From: Thomas Goirand Sent: Tuesday, February 14, 2023 7:31 PM To: El?d Ill?s Cc: openstack-discuss at lists.openstack.org Subject: Re: [all][stable][ptl] Propose to EOL Rocky series On 2/10/23 18:26, El?d Ill?s wrote: > Hi, > > thanks for all the feedbacks from teams so far! > > @Zigo: Extended Maintenance process was created just for the same > situation: to give space to interested parties to cooperate and keep > things maintained even when stable releases are over their 'supported' > lifetime. So it's good to see that there is interest in it! 
> Unfortunately, with very old branches we've reached the state where > gates can't be maintained and without a functional gate it's not safe to > merge patches (yes, even security fixes) and they are just using > resources (CI & maintainers' time). When gate is broken in such extent, > then i think the community have to accept that it is not possible to > merge patches confidently and needs to EOL that release. That's where I don't agree. There are ways, outside of the OpenStack gate, to test things, in such ways that merging patches there can be a thing. > Another aspect is that code cannot be cleaned up until those old > branches are still present (CI jobs, project configurations, etc) which > gives pain for developers. Just disable gating completely then. > So, however some vendors would appreciate probably to keep things open > forever, for the community this is not beneficial and doable I think. I don't agree. We need a place to share patches between distros. The official Git feels like the natural place to do so, even without any type of gating. BTW, my Nova patches for CVE-2022-47951 in Rocky, Stein & Train are currently wrong and need another approach. I was thinking about simply disabling .vmdk altogether (rather than having a complicated code to check for the VMDK subtype). I wonder what other distros did. Where do I disucss this? Cheers, Thomas Goirand (zigo) -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Tue Mar 28 09:16:16 2023 From: amotoki at gmail.com (Akihiro Motoki) Date: Tue, 28 Mar 2023 18:16:16 +0900 Subject: [neutron] Bug deputy report (week of Mar 20) Message-ID: Hi, Here is the bug deputy report last week. Hopefully, someone familiar with DNS integration can check the first bug. # Needs assignee * FQDN inside guest VM is not the same as dns_assignment on network port https://bugs.launchpad.net/neutron/+bug/2012391 It would be nice if someone familiar with DNS integration can look into it. * [CI] "neutron-ovs-grenade-multinode-skip-level" and "neutron-ovn-grenade-multinode-skip-level" failing always https://bugs.launchpad.net/neutron/+bug/2012731 # New but assigned * [OVN] Define the OVS port in the LSP to allow OVN to set the QoS rules https://bugs.launchpad.net/neutron/+bug/2012613 Assigned to ralonsoh # In Progress * [ovn] N/S traffic for VMs without FIPs not working https://bugs.launchpad.net/neutron/+bug/2012712 Assigned to ltomasbo [1] was proposed as a fix of bug 2012712 which is caused by [2]. In parallel, [3] was proposed to revert [2]. Reverting [2] first sounds reasonable to avoid the regression but it is better to clarify the priority and the relationship in the review comments. ltomasbo is involved in both, so I think there is no confusion though. 
[1] https://review.opendev.org/c/openstack/neutron/+/878450 [2] https://review.opendev.org/c/openstack/neutron/+/875644 [3] https://review.opendev.org/c/openstack/neutron/+/878441 * neutron-ovn-agent fails on do_commit aborted due to error: 'Chassis_Private' https://bugs.launchpad.net/neutron/+bug/2012385 * Intermittent failures of test_agent_metadata_port_ip_update_event https://bugs.launchpad.net/neutron/+bug/2012754 * [sqlalchemy-20] sqlalchemy.exc.InvalidRequestError: No 'on clause' argument may be passed when joining to a relationship path as a target https://bugs.launchpad.net/neutron/+bug/2012643 * [sqlalchemy-20] Strings are not accepted for attribute names in loader options https://bugs.launchpad.net/neutron/+bug/2012662 * [sqlalchemy-20] Unexpected keyword argument "when" in "sqlalchemy.case" method https://bugs.launchpad.net/neutron/+bug/2012705 # RFE * [rfe] Add one api support CRUD allowed_address_pairs https://bugs.launchpad.net/neutron/+bug/2012332 Thanks, Akihiro Motoki (amotoki) From elod.illes at est.tech Tue Mar 28 09:56:23 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Tue, 28 Mar 2023 09:56:23 +0000 Subject: [networking-mlnx] release job failure - missing openstackci as maintainer in pypi Message-ID: Hi networking-mlnx maintainers, latest networking-mlnx releases caused release job failures [1][2][3], with the following error: The user 'openstackci' isn't allowed to upload to project 'networking-mlnx'. (note, that networking-mlnx is not under openstack namespace (x/networking-mlnx)) [1] https://lists.openstack.org/pipermail/release-job-failures/2023-March/001654.html [2] https://lists.openstack.org/pipermail/release-job-failures/2023-March/001655.html [3] https://lists.openstack.org/pipermail/release-job-failures/2023-March/001656.html Thanks, El?d irc: elodilles @ #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From smooney at redhat.com Tue Mar 28 11:30:29 2023 From: smooney at redhat.com (Sean Mooney) Date: Tue, 28 Mar 2023 12:30:29 +0100 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: On Tue, 2023-03-28 at 06:24 +0530, Swogat Pradhan wrote: > Update podman logs: > [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 > ------------------------------------------------ > Initializing virsh secrets for: dcn01:openstack > -------- > Initializing the virsh secret for 'dcn01' cluster > (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client > The /etc/nova/secret.xml file already exists > error: Failed to set attributes from /etc/nova/secret.xml > error: internal error: a secret with UUID > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > client.openstack secret

you just do "virsh secret-undefine <uuid>"

> > > On Tue, Mar 28, 2023 at 6:19 AM Swogat Pradhan > wrote: > > > Hi, > > For some reason, i had to redeploy ceph for my hci nodes and then found > > that the deployment command is giving out the following error: > > 2023-03-28 01:49:46.709605 | | > > WARNING | ERROR: Can't run container nova_libvirt_init_secret > > stderr: error: Failed to set attributes from /etc/nova/secret.xml > > error: internal error: a secret with UUID > > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > > client.openstack secret > > 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | > > FATAL | Create containers managed by Podman for > > /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | > > error={"changed": false, "msg": "Failed containers: > > nova_libvirt_init_secret"} > > > > Can you please tell me how I can undefine the existing secret? > > > > With regards, > > Swogat Pradhan > > 
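For example, a minimal sketch of that clean-up on the affected HCI node, assuming the stale secret lives in the nova_virtsecretd container (container name and use of sudo may differ per deployment; the UUID is the one from the error output above):

$ sudo podman exec -it nova_virtsecretd virsh secret-list
$ sudo podman exec -it nova_virtsecretd virsh secret-undefine bd136bb0-fd78-5429-ab80-80b8c571d821

Once the stale secret is removed, re-running the deploy should allow nova_libvirt_init_secret to define the secret again from /etc/nova/secret.xml.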
>>> Right now the cinder volume is stuck in *creating *state when adding >>> image as volume source. >>> But when creating an empty volume the volumes are getting created >>> successfully without any errors. >>> >>> We are getting volume creation request in cinder-volume.log as such: >>> 2023-03-23 12:34:40.152 108 INFO >>> cinder.volume.flows.manager.create_volume >>> [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>> 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with >>> specification: {'status': 'creating', 'volume_name': >>> 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, >>> 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': >>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>> 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': >>> datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), >>> 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, >>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', >>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>> 'owner_specified.openstack.object': 'images/cirros', >>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>> } >>> >>> But there is nothing else after that and the volume doesn't even >>> timeout, it just gets stuck in creating state. >>> Can you advise what might be the issue here? >>> All the containers are in a healthy state now. >>> >>> With regards, >>> Swogat Pradhan >>> >>> >>> On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: >>> >>>> >>>> >>>> On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Hi, >>>>> Is this bind not required for cinder_scheduler container? >>>>> >>>>> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >>>>> I do not see this particular bind on the cinder scheduler containers >>>>> on my controller nodes. >>>>> >>>> >>>> That is correct, because the scheduler does not access the ceph >>>> cluster. 
>>>> >>>> Alan >>>> >>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> Cinder volume config: >>>>>> >>>>>> [tripleo_ceph] >>>>>> volume_backend_name=tripleo_ceph >>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>> rbd_user=openstack >>>>>> rbd_pool=volumes >>>>>> rbd_flatten_volume_from_snapshot=False >>>>>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>>>>> report_discard_supported=True >>>>>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>>>>> rbd_cluster_name=dcn02 >>>>>> >>>>>> Glance api config: >>>>>> >>>>>> [dcn02] >>>>>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>>>>> rbd_store_user=openstack >>>>>> rbd_store_pool=images >>>>>> rbd_thin_provisioning=False >>>>>> store_description=dcn02 rbd glance store >>>>>> [ceph] >>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>> rbd_store_user=openstack >>>>>> rbd_store_pool=images >>>>>> rbd_thin_provisioning=False >>>>>> store_description=Default glance store backend. >>>>>> >>>>>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> I still have the same issue, I'm not sure what's left to try. >>>>>>> All the pods are now in a healthy state, I am getting log entries 3 >>>>>>> mins after I hit the create volume button in cinder-volume when I try to >>>>>>> create a volume with an image. >>>>>>> And the volumes are just stuck in creating state for more than 20 >>>>>>> mins now. >>>>>>> >>>>>>> Cinder logs: >>>>>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>>>>> cinder-volume RPC version 3.17 as minimum service version. 
>>>>>>> 2023-03-22 20:34:59.166 108 INFO >>>>>>> cinder.volume.flows.manager.create_volume >>>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>>>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> [{'url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>>>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>> } >>>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Adam, >>>>>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>>>>> was getting pulled from the central site which was caused due to an >>>>>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>>>>> directory, which seems to have been resolved after the changes i made to >>>>>>>>> fix it. >>>>>>>>> >>>>>>>>> Right now the glance api podman is running in unhealthy state and >>>>>>>>> the podman logs don't show any error whatsoever and when issued the command >>>>>>>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>>>>>>> site, which is why cinder is throwing an error stating: >>>>>>>>> >>>>>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>>>>> finding address for >>>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>>>> Unable to establish connection to >>>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>>>>> NewConnectionError('>>>>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>>>>> ECONNREFUSED',)) >>>>>>>>> >>>>>>>>> Now i need to find out why the port is not listed as the glance >>>>>>>>> service is running, which i am not sure how to find out. >>>>>>>>> >>>>>>>> >>>>>>>> One other thing to investigate is whether your deployment includes >>>>>>>> this patch [1]. If it does, then bear in mind >>>>>>>> the glance-api service running at the edge site will be an >>>>>>>> "internal" (non public facing) instance that uses port 9293 >>>>>>>> instead of 9292. You should familiarize yourself with the release >>>>>>>> note [2]. >>>>>>>> >>>>>>>> [1] >>>>>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>>>>> [2] >>>>>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>>>>> >>>>>>>> Alan >>>>>>>> >>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Swogat Pradhan >>>>>>>>> >>>>>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Update: >>>>>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>>>>> >>>>>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> [{'url': >>>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>>>>> 
'553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>>>>> } >>>>>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>>>>> >>>>>>>>>> John Fulton previously stated your cinder-volume service at the >>>>>>>>>> edge site is not using the local ceph image store. Assuming you are >>>>>>>>>> deploying GlanceApiEdge service [1], then the cinder-volume service should >>>>>>>>>> be configured to use the local glance service [2]. You should check >>>>>>>>>> cinder's glance_api_servers to confirm it's the edge site's glance service. >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>>>>> [2] >>>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>>>>> >>>>>>>>>> Alan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>>>> category=FutureWarning) >>>>>>>>>>> >>>>>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>>>>>>> category=FutureWarning) >>>>>>>>>>> >>>>>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>>>>> MB/s >>>>>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>>>>> >>>>>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>>>>> >>>>>>>>>>> With regards, >>>>>>>>>>> Swogat Pradhan >>>>>>>>>>> >>>>>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Jhon, >>>>>>>>>>>> This seems to be an issue. >>>>>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the >>>>>>>>>>>> --cluster parameter was specified to the respective cluster names but the >>>>>>>>>>>> config files were created in the name of ceph.conf and keyring was >>>>>>>>>>>> ceph.client.openstack.keyring. >>>>>>>>>>>> >>>>>>>>>>>> Which created issues in glance as well as the naming convention >>>>>>>>>>>> of the files didn't match the cluster names, so i had to manually rename >>>>>>>>>>>> the central ceph conf file as such: >>>>>>>>>>>> >>>>>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>>>>> total 16 >>>>>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>>>>> ceph.client.openstack.keyring >>>>>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>>>>> >>>>>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are >>>>>>>>>>>> the files used to access dcn02 ceph cluster and ceph_central* files are >>>>>>>>>>>> used in for accessing central ceph cluster. >>>>>>>>>>>> >>>>>>>>>>>> glance multistore config: >>>>>>>>>>>> [dcn02] >>>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>>> rbd_store_user=openstack >>>>>>>>>>>> rbd_store_pool=images >>>>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>>>>> >>>>>>>>>>>> [ceph_central] >>>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>>>>> rbd_store_user=openstack >>>>>>>>>>>> rbd_store_pool=images >>>>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>>>> store_description=Default glance store backend. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> With regards, >>>>>>>>>>>> Swogat Pradhan >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton < >>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> > >>>>>>>>>>>>> > Hi, >>>>>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>>>>> >>>>>>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>>>>>> >>>>>>>>>>>>> I hope this is not a production system since the mailing list >>>>>>>>>>>>> now has >>>>>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>>>>> >>>>>>>>>>>>> The section that looks like this: >>>>>>>>>>>>> >>>>>>>>>>>>> [tripleo_ceph] >>>>>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>>>> rbd_user=openstack >>>>>>>>>>>>> rbd_pool=volumes >>>>>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>>>>> rbd_secret_uuid= >>>>>>>>>>>>> report_discard_supported=True >>>>>>>>>>>>> >>>>>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and >>>>>>>>>>>>> not the >>>>>>>>>>>>> central one. Use the ceph conf file for that cluster and >>>>>>>>>>>>> ensure the >>>>>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>>>>> >>>>>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID >>>>>>>>>>>>> of the >>>>>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so >>>>>>>>>>>>> that >>>>>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>>>>> This >>>>>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>>>>> secret-get-value $FSID`. >>>>>>>>>>>>> >>>>>>>>>>>>> The documentation describes how to configure the central and >>>>>>>>>>>>> DCN sites >>>>>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>>>>> following >>>>>>>>>>>>> it. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>>>>> >>>>>>>>>>>>> John >>>>>>>>>>>>> >>>>>>>>>>>>> > >>>>>>>>>>>>> > Ceph Output: >>>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>>>>> 2 excl >>>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > >>>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>>>>> > NAME SIZE >>>>>>>>>>>>> PARENT FMT PROT LOCK >>>>>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>>>>> > >>>>>>>>>>>>> > Attached the cinder config. >>>>>>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>>>>>> > >>>>>>>>>>>>> > With regards, >>>>>>>>>>>>> > Swogat Pradhan >>>>>>>>>>>>> > >>>>>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> in my last message under the line "On a DCN site if you run >>>>>>>>>>>>> a command like this:" I suggested some steps you could try to confirm the >>>>>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>>>>> config. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>> >>> >>>>>>>>>>>>> >>> Update: >>>>>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it >>>>>>>>>>>>> takes around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>>>>> >>> >>>>>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images >>>>>>>>>>>>> created after importing from the central site. >>>>>>>>>>>>> >>>> But launching an instance normally fails as it takes a >>>>>>>>>>>>> long time for the volume to get created. >>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>>>>> getting created properly without any errors. 
>>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>>>> but getting checksum failed error. >>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> With regards, >>>>>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>>>>> >>>>> wrote: >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>>>>> launch the VM from volume. >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > Right now the instance creation is failing as the >>>>>>>>>>>>> block device creation is stuck in creating state, it is taking more than 10 >>>>>>>>>>>>> mins for the volume to be created, whereas the image has already been >>>>>>>>>>>>> imported to the edge glance. >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>>>>> observations in your >>>>>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf >>>>>>>>>>>>> --keyring >>>>>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>>>>> excl >>>>>>>>>>>>> >>>>> $ >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> Then, you should see the parent of the volume is the >>>>>>>>>>>>> image which is on >>>>>>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>>>>> encountering >>>>>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> Ideally all images should reside in the central Glance >>>>>>>>>>>>> and be copied >>>>>>>>>>>>> >>>>> to DCN sites before instances of those images are booted >>>>>>>>>>>>> on DCN sites. >>>>>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>>>>> booted, then the >>>>>>>>>>>>> >>>>> image will be streamed to the DCN site and then the >>>>>>>>>>>>> image will boot as >>>>>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>>>>> has access to >>>>>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>>>>> booting of >>>>>>>>>>>>> >>>>> the image will take time because it has not been copied >>>>>>>>>>>>> in advance, >>>>>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN >>>>>>>>>>>>> site and >>>>>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. 
>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> John >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>>>>> then update. >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > With regards, >>>>>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>> >>>>> >> >>>>>>>>>>>>> >>>>> >> Update: >>>>>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is >>>>>>>>>>>>> showing down. >>>>>>>>>>>>> >>>>> >> >>>>>>>>>>>>> >>>>> >> >>>>>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>>>>> (lacp=active). >>>>>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO >>>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon starting >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO >>>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with capabilities >>>>>>>>>>>>> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>>>> running command. >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO >>>>>>>>>>>>> nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 >>>>>>>>>>>>> b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>>> default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>>>>> template mentioned here ?: >>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> The volume is already created and i do not >>>>>>>>>>>>> understand why the instance is stuck in spawning state. 
>>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Does your environment use different network >>>>>>>>>>>>> interfaces for each of the networks? Or does it have a bond with everything >>>>>>>>>>>>> on it? >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>>>>> while spawning the instance. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block < >>>>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some >>>>>>>>>>>>> time ago in this thread: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it >>>>>>>>>>>>> for that user, not sure if that could apply here. But is it possible that >>>>>>>>>>>>> your nova and neutron versions are different between central and edge site? >>>>>>>>>>>>> Have you restarted nova and neutron services on the compute nodes after >>>>>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>>>>> >>>>> >>>> If there isn't any additional information in the >>>>>>>>>>>>> debug logs I probably would start "tearing down" rabbitmq. I didn't have to >>>>>>>>>>>>> do that in a production system yet so be careful. I can think of two routes: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit >>>>>>>>>>>>> is running, this will most likely impact client IO depending on your load. >>>>>>>>>>>>> Check out the rabbitmqctl commands. 
>>>>>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. >>>>>>>>>>>>> rebuild. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" >>>>>>>>>>>>> while being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>>>>> a better advice. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan >>>>>>>>>>>> >: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>>>>> but not due to packet >>>>>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>>>>> checked when >>>>>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance >>>>>>>>>>>>> gets stuck at spawning >>>>>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>>>>> sure if packet loss >>>>>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>>>>> identical between >>>>>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss >>>>>>>>>>>>> through the tunnel? >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan >>>>>>>>>>>> >: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' >>>>>>>>>>>>> or 'cc' as i am not >>>>>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>>>>> >>>>> >>>> / >>>>>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... 
>>>>>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to >>>>>>>>>>>>> definition priority >>>>>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only >>>>>>>>>>>>> goes down when i am >>>>>>>>>>>>> >>>>> >>>> trying >>>>>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>>>>> spawning state and >>>>>>>>>>>>> >>>>> >>>> then >>>>>>>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>>>>> edge sites. >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to >>>>>>>>>>>>> me directly, i am >>>>>>>>>>>>> >>>>> >>>> checking >>>>>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find >>>>>>>>>>>>> your reply. >>>>>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>>>>> occurred. >>>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform >>>>>>>>>>>>> other activities in the >>>>>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>>>>> site.* >>>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here >>>>>>>>>>>>> are the details: >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest >>>>>>>>>>>>> ]: >>>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple >>>>>>>>>>>>> times but the issue is >>>>>>>>>>>>> >>>>> >>>> still >>>>>>>>>>>>> >>>>> >>>> >>> present. 
>>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>>>>> cluster_status >>>>>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>>>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>>>>>>>>>>> RabbitMQ >>>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>>>>>>>>>>> RabbitMQ >>>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>>>>>>>>>>> RabbitMQ >>>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> : >>>>>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at 
overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: >>>>>>>>>>>>> HTTP API >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: >>>>>>>>>>>>> HTTP API >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: >>>>>>>>>>>>> HTTP API >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>>>>> clustering, purpose: >>>>>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, >>>>>>>>>>>>> protocol: amqp, purpose: AMQP >>>>>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>>>>> purpose: HTTP API >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> 
> On Sun, Feb 26, 2023 at 2:34 PM Swogat Pradhan <swogatpradhan22 at gmail.com> wrote:
>
> Hi,
> Please find the nova conductor as well as nova api log.
>
> nova-conuctor:
>
> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 16152921c1eb45c2b1f562087140168b
> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to 83dbe5f567a940b698acfe986f6194fa
> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to f3bfd7f65bd542b18d84cea3033abb43: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to d4b9180f91a94f9a82c3c9c4b7595566: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 897911a234a445d8a0d8af02ece40f6f: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with backend dogpile.cache.null.
> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 8f723ceb10c3472db9a9f324861df2bb: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
>
> With regards,
> Swogat Pradhan
>
> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan <swogatpradhan22 at gmail.com> wrote:
>
>> Hi,
>> I currently have 3 compute nodes on edge site1 where i am trying to launch vm's.
>> When the VM is in spawning state the node goes down (openstack compute service list), the node comes backup when i restart the nova compute service but then the launch of the vm fails.
>>
>> nova-compute.log
>>
>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running instance usage audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to 2023-02-26 08:00:00. 0 instances.
>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node dcn01-hci-0.bdxworld.com
>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: /dev/vda. Libvirt can't honour user-supplied dev names
>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda
>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with backend dogpile.cache.null.
>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Running privsep helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/nova/nova.conf', '--config-file', '/etc/nova/nova-compute.conf', '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpin40tah6/privsep.sock']
>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep daemon via rootwrap
>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep daemon starting
>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep daemon running as pid 2647
>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command.
>> Command: blkid overlay -s UUID -o value
>> Exit code: 2
>> Stdout: ''
>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image
>>
>> Is there a way to solve this issue?
>>
>> With regards,
>> Swogat Pradhan
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From AnnieLiu at zhaoxin.com Tue Mar 28 09:21:16 2023
From: AnnieLiu at zhaoxin.com (Annie Liu(BJ-RD))
Date: Tue, 28 Mar 2023 09:21:16 +0000
Subject: [Freezer] Restore action report ERROR can't delete temporary Image
Message-ID:

Hi All,

My Cinder backend is Ceph, and so is Glance. For backup and restore, I chose Cinder for the mode and local for the storage. When I restore a Cinder volume with Freezer, an ERROR is reported about a failure to delete a temporary Image, which prevents the whole restore flow from completing.

2023-03-23 16:53:50.555 361 ERROR freezer.main [-] HTTP 409 Conflict: Image 8af4be3f-2e7a-4965-8b34-74e610a89a3e could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance.: HTTPConflict: HTTP 409 Conflict: Image 8af4be3f-2e7a-4965-8b34-74e610a89a3e could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance.

According to the Cinder source code, a volume created from an Image is a child of that Image: since it is created by a clone operation, it does not copy the volume data at first. In other words, the Image is its parent and still holds the real data, and that is why the delete fails.

My question is: is this a general issue for other storage solutions? Is there any opportunity to fix it, for example by doing a flatten immediately after creating the volume from the Image?

Thanks.

Best Regards,
Annie Liu

CONFIDENTIAL NOTE: This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From kennelson11 at gmail.com Tue Mar 28 14:48:01 2023
From: kennelson11 at gmail.com (Kendall Nelson)
Date: Tue, 28 Mar 2023 09:48:01 -0500
Subject: [Magnum] vPTG
In-Reply-To:
References:
Message-ID:

I can't say I will be awake at that time, but I look forward to reading the notes/summary! I may use some of the conversations in my forum proposal wrt k8s certification.

-Kendall

On Tue, Mar 28, 2023 at 9:19 AM Jake Yip wrote:

> Dear all,
>
> The Magnum vPTG will be held on Wednesday at 0900 UTC in the Havana Room.
>
> Please see the etherpad https://etherpad.opendev.org/p/march2023-ptg-magnum
> for updates.
>
> Regards,
> Jake
>
> --
> Jake Yip
> DevOps Engineer, ARDC Nectar Research Cloud

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From stig.openstack at telfer.org Tue Mar 28 16:35:56 2023
From: stig.openstack at telfer.org (Stig Telfer)
Date: Tue, 28 Mar 2023 17:35:56 +0100
Subject: [scientific-sig] PTG sessions for Scientific SIG
Message-ID:

Hi all -

The Scientific SIG has two sessions on Wednesday (tomorrow) at the PTG, at 1400 UTC and 2100 UTC.
Everyone is welcome. There is an etherpad for the sessions here: https://etherpad.opendev.org/p/march2023-ptg-scientific-sig

We'll begin with an introduction to the SIG and some discussion of problems, challenges and new features specific to research computing use cases. We'd also like participants to contribute lightning talks about anything relevant that is of interest to them.

Cheers,
Stig
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ralonsoh at redhat.com Tue Mar 28 16:47:31 2023
From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez)
Date: Tue, 28 Mar 2023 18:47:31 +0200
Subject: [neutron] Deprecate networking-odl project
Message-ID:

Hello all:

Over the last few releases, support for the "networking-odl" project has decreased, and currently there is no active developer or maintainer in the community. This project depends on https://www.opendaylight.org/; the latest version released is Sulfur (16), while the version still used in the CI is Sodium (11) [1].

I would first like to make a call for developers to update this project. But if this is not possible, I will then start the procedure to deprecate it [2] (**not to retire it**).

Regards.

[1] https://github.com/openstack/networking-odl/blob/db5c79b3ee5054feb8a17df130e4ce3a95ec64c2/.zuul.d/jobs.yaml#L172
[2] https://docs.openstack.org/project-team-guide/repository.html#deprecating-a-repository
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From adivya1.singh at gmail.com Tue Mar 28 18:02:47 2023
From: adivya1.singh at gmail.com (Adivya Singh)
Date: Tue, 28 Mar 2023 23:32:47 +0530
Subject: (Openstack-Nova)
Message-ID:

Hi Team,

I see these errors in my syslog, related to my nova-compute service getting hung while communicating with the rabbitmq service:

"A recoverable connection/channel error occurred, trying to reconnect: [Errno 24] Too many open files"

Is this an OS-related error, or is there something I can change to get rid of this error?

Regards,
Adivya Singh
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From michael at knox.net.nz Tue Mar 28 19:55:07 2023
From: michael at knox.net.nz (Michael Knox)
Date: Tue, 28 Mar 2023 15:55:07 -0400
Subject: (Openstack-Nova)
In-Reply-To:
References:
Message-ID:

Hi,

This will be on the OS you have RabbitMQ running on. You will need to increase the ulimit: "ulimit -n" will show the current limit for the installed OS and configuration, and you will need more than what's there. There could also be other configuration issues; a normal default of 1024 isn't low for most uses, but you will need to consider that as part of the increase.

Cheers

On Tue, Mar 28, 2023 at 2:16 PM Adivya Singh wrote:

> Hi Team,
>
> I see these errors in my syslog, related to my nova-compute service getting
> hung while communicating with the rabbitmq service:
>
> "A recoverable connection/channel error occurred, trying to reconnect:
> [Errno 24] Too many open files"
>
> Is this an OS-related error, or is there something I can change to get rid
> of this error?
>
> Regards,
> Adivya Singh
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jamesleong123098 at gmail.com Tue Mar 28 20:30:46 2023
From: jamesleong123098 at gmail.com (James Leong)
Date: Tue, 28 Mar 2023 15:30:46 -0500
Subject: [Horizon][policies][keystone] allow _member_ role to add user
Message-ID:

Hi all,

I am using kolla-ansible for OpenStack deployment in the yoga version.
Would it be possible to allow a user with a "_member_" role to add a user to the respective project? In OpenStack, an admin role allows users to add, delete, and edit a user profile. I would like to have the same privilege added to the "_member_" role. I have tried to modify the file "user.py" in the keystone container at "keystone/common/policies" and restarted the container. However, the button on my horizon dashboard did not appear. How could I integrate the code in order to allow the create button to appear on my dashboard with the "_member_" role? Is there a way to do that? THanks for your help James -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Tue Mar 28 21:05:04 2023 From: melwittt at gmail.com (melanie witt) Date: Tue, 28 Mar 2023 14:05:04 -0700 Subject: [nova] openstack-tox-pep8 job broken, hold your rechecks Message-ID: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> Hey all, Sending this to the ML to try to help others who may not have heard about it yet. I didn't know about myself until I saw the job fail on a random nova patch and I looked around to figure out why. It looks like openstack-tox-pep8 is failing 100% in nova due to a newer version of the mypy library being pulled in since a recent upper-constraints bump: https://review.opendev.org/c/openstack/requirements/+/872065 And the job isn't going to pass until the following fix merges: https://review.opendev.org/c/openstack/nova/+/878693 Finally, to help prevent a breakage for this in the future, a cross-nova-pep8 job has been approved: https://review.opendev.org/c/openstack/requirements/+/878748 and will be on its way to the gate once the aforementioned fix merges. I'll post a reply to this email when it's OK to do rechecks again. Cheers, -melwitt From adivya1.singh at gmail.com Wed Mar 29 04:55:45 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Wed, 29 Mar 2023 10:25:45 +0530 Subject: (Openstack-Designate) rndc key not getting generated in /etc/designate Message-ID: Hi Team, My DNS Server located outside the Open Stack, and i am using below variables in my user_variables.yaml File. But When i ' m running os-desigante-install.yml Playbook, rndc key are not generating in /etc/designate Folder and the playbook fail at the below Task {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'file'\n\nThe error appears to be in '/etc/ansible/roles/os_designate/tasks/designate_post_install.yml': line 89, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create Designate rndc key file\n ^ here\n"} - name: Create Designate rndc key file template: src: rndc.key.j2 dest: "{{ item.file }}" owner: "{{ item.owner | default('root') }}" group: "{{ item.group | default('root') }}" mode: "{{ item.mode | default('0600') }}" with_items: "{{ designate_rndc_keys }}" when: designate_rndc_keys is defined and the post-install.yml File looks like this Any idea on this, Where i am missing ## rndc keys for authenticating with bind9 # define this to create as many key files as are required # designate_rndc_keys # - name: "rndc-key" # file: /etc/designate/rndc.key # algorithm: "hmac-md5" # secret: "" -------------- next part -------------- An HTML attachment was scrubbed... 
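The Ansible failure above ("'dict object' has no attribute 'file'") suggests that designate_rndc_keys is defined in user_variables.yml, but its entries do not carry the "file" attribute that the "Create Designate rndc key file" task expects (or the variable is not a list of mappings). A minimal sketch of what the override could look like, based only on the commented example shipped with the os_designate role; the algorithm and secret shown here are placeholders and must match the key configured on the external BIND9 servers (for example one generated there with rndc-confgen):

  designate_rndc_keys:
    - name: "rndc-key"
      file: /etc/designate/rndc.key
      algorithm: "hmac-md5"            # placeholder - use the algorithm of your BIND key
      secret: "BASE64-SECRET-HERE"     # placeholder - the shared secret from the BIND side

With a list shaped like this, the template task can resolve item.file and should write /etc/designate/rndc.key on the Designate hosts.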
URL: From nguyenhuukhoinw at gmail.com Wed Mar 29 06:06:13 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 29 Mar 2023 13:06:13 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Hello. I have one question. Follow this https://docs.openstack.org/nova/latest/admin/availability-zones.html If the server was not created in a specific zone then it is free to be moved to other zones. but when I use openstack server show [server id] I still see the "OS-EXT-AZ:availability_zone" value belonging to my instance. Could you tell the difference which causes "if the server was not created in a specific zone then it is free to be moved to other zones." Nguyen Huu Khoi On Mon, Mar 27, 2023 at 8:37?PM Nguy?n H?u Kh?i wrote: > Hello guys. > > I just suggest to openstack nova works better. My story because > > > 1. > > The server was created in a specific zone with the POST /servers request > containing the availability_zone parameter. > > It will be nice when we attach randow zone when we create instances then > It will only move to the same zone when migrating or masakari ha. > > Currently we can force it to zone by default zone shedule in nova.conf. > > Sorry because I am new to Openstack and I am just an operator. I try to > verify some real cases. > > > > Nguyen Huu Khoi > > > On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza wrote: > >> >> >> Le lun. 27 mars 2023 ? 14:28, Sean Mooney a ?crit : >> >>> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >>> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a >>> ?crit : >>> > >>> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >>> > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < >>> > > > rafaelweingartner at gmail.com> a ?crit : >>> > > > >>> > > > > Hello Nguy?n H?u Kh?i, >>> > > > > You might want to take a look at: >>> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We >>> created a >>> > > patch >>> > > > > to avoid migrating VMs to any AZ, once the VM has been >>> bootstrapped in >>> > > an >>> > > > > AZ that has cross zone attache equals to false. >>> > > > > >>> > > > > >>> > > > Well, I'll provide some comments in the change, but I'm afraid we >>> can't >>> > > > just modify the request spec like you would want. >>> > > > >>> > > > Anyway, if you want to discuss about it in the vPTG, just add it >>> in the >>> > > > etherpad and add your IRC nick so we could try to find a time >>> where we >>> > > > could be discussing it : >>> https://etherpad.opendev.org/p/nova-bobcat-ptg >>> > > > Also, this kind of behaviour modification is more a new feature >>> than a >>> > > > bugfix, so fwiw you should create a launchpad blueprint so we could >>> > > better >>> > > > see it. >>> > > >>> > > i tought i left review feedback on that too that the approch was not >>> > > correct. >>> > > i guess i did not in the end. >>> > > >>> > > modifying the request spec as sylvain menthioned is not correct. >>> > > i disucssed this topic on irc a few weeks back with mohomad for >>> vxhost. >>> > > what can be done is as follows. >>> > > >>> > > we can add a current_az field to the Destination object >>> > > >>> > > >>> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >>> > > The conductor can read the instance.AZ and populate it in that new >>> field. >>> > > We can then add a new weigher to prefer hosts that are in the same >>> az. 
>>> > > >>> > > >>> > >>> > I tend to disagree this approach as people would think that the >>> > Destination.az field would be related to the current AZ for an >>> instance, >>> > while we only look at the original AZ. >>> > That being said, we could have a weigher that would look at whether the >>> > host is in the same AZ than the instance.host. >>> you miss understood what i wrote >>> >>> i suggested addint Destination.current_az to store teh curernt AZ of the >>> instance before scheduling. >>> >>> so my proposal is if RequestSpec.AZ is not set and >>> Destination.current_az is set then the new >>> weigher would prefer hosts that are in the same az as >>> Destination.current_az >>> >>> we coudl also call Destination.current_az Destination.prefered_az >>> >>> >> I meant, I think we don't need to provide a new field, we can already >> know about what host an existing instance uses if we want (using [1]) >> Anyway, let's stop to discuss about it here, we should rather review that >> for a Launchpad blueprint or more a spec. >> >> -Sylvain >> >> [1] >> https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 >> >>> > >>> > >>> > This will provide soft AZ affinity for the vm and preserve the fact >>> that if >>> > > a vm is created without sepcifying >>> > > An AZ the expectaiton at the api level woudl be that it can migrate >>> to any >>> > > AZ. >>> > > >>> > > To provide hard AZ affintiy we could also add prefileter that would >>> use >>> > > the same data but instead include it in the >>> > > placement query so that only the current AZ is considered. This >>> would have >>> > > to be disabled by default. >>> > > >>> > > >>> > Sure, we could create a new prefilter so we could then deprecate the >>> > AZFilter if we want. >>> we already have an AZ prefilter and the AZFilter is deprecate for removal >>> i ment to delete it in zed but did not have time to do it in zed of >>> Antielope >>> i deprecated the AZ| filter in >>> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >>> xena when i enabeld the az prefilter by default. >>> >>> >> Ah whoops, indeed I forgot the fact we already have the prefilter, so the >> hard support for AZ is already existing. >> >> >>> i will try an delete teh AZ filter before m1 if others dont. >>> >> >> OK. >> >> >>> > >>> > >>> > > That woudl allow operators to choose the desired behavior. >>> > > curret behavior (disable weigher and dont enabel prefilter) >>> > > new default, prefer current AZ (weigher enabeld prefilter disabled) >>> > > hard affintiy(prefilter enabled.) >>> > > >>> > > there are other ways to approch this but updating the request spec >>> is not >>> > > one of them. >>> > > we have to maintain the fact the enduser did not request an AZ. >>> > > >>> > > >>> > Anyway, if folks want to discuss about AZs, this week is the good time >>> :-) >>> > >>> > >>> > > > >>> > > > -Sylvain >>> > > > >>> > > > >>> > > > >>> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >>> > > nguyenhuukhoinw at gmail.com> >>> > > > > wrote: >>> > > > > >>> > > > > > Hello guys. >>> > > > > > I playing with Nova AZ and Masakari >>> > > > > > >>> > > > > > >>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>> > > > > > >>> > > > > > Masakari will move server by nova scheduler. 
>>> > > > > > >>> > > > > > Openstack Docs describe that: >>> > > > > > >>> > > > > > If the server was not created in a specific zone then it is >>> free to >>> > > be >>> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >>> > > > > > < >>> > > >>> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >>> > >>> > > is >>> > > > > > a no-op. >>> > > > > > >>> > > > > > I see that everyone usually creates instances with "Any >>> Availability >>> > > > > > Zone" on Horzion and also we don't specify AZ when creating >>> > > instances by >>> > > > > > cli. >>> > > > > > >>> > > > > > By this way, when we use Masakari or we miragrated instances( >>> or >>> > > > > > evacuate) so our instance will be moved to other zones. >>> > > > > > >>> > > > > > Can we attach AZ to server create requests API based on Any >>> > > > > > Availability Zone to limit instances moved to other zones? >>> > > > > > >>> > > > > > Thank you. Regards >>> > > > > > >>> > > > > > Nguyen Huu Khoi >>> > > > > > >>> > > > > >>> > > > > >>> > > > > -- >>> > > > > Rafael Weing?rtner >>> > > > > >>> > > >>> > > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Mar 29 08:28:13 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 29 Mar 2023 10:28:13 +0200 Subject: [neutron][ptg] Today's agenda Message-ID: Hello all: This is a quick summary of the agenda for today's meeting (starting at 13UTC): * Status and questions about https://review.opendev.org/q/topic:port-hints * IPv6 Prefix Delegation in OVN * Neutron agents status (https://bugs.launchpad.net/neutron/+bug/2011422) * DHCP IPv6 issues with metadata service ( https://bugs.launchpad.net/neutron/+bug/1953165) * Operator hour Please check the full agenda, topics and past logs in https://etherpad.opendev.org/p/neutron-bobcat-ptg. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 29 09:49:42 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 11:49:42 +0200 Subject: [nova] openstack-tox-pep8 job broken, hold your rechecks In-Reply-To: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> References: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> Message-ID: Heh, you missed my email ;-) https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032979.html No worries tho :) Le mar. 28 mars 2023 ? 23:11, melanie witt a ?crit : > Hey all, > > Sending this to the ML to try to help others who may not have heard > about it yet. I didn't know about myself until I saw the job fail on a > random nova patch and I looked around to figure out why. > > It looks like openstack-tox-pep8 is failing 100% in nova due to a newer > version of the mypy library being pulled in since a recent > upper-constraints bump: > > https://review.opendev.org/c/openstack/requirements/+/872065 > > And the job isn't going to pass until the following fix merges: > > https://review.opendev.org/c/openstack/nova/+/878693 > > Finally, to help prevent a breakage for this in the future, a > cross-nova-pep8 job has been approved: > > https://review.opendev.org/c/openstack/requirements/+/878748 > > and will be on its way to the gate once the aforementioned fix merges. > > I'll post a reply to this email when it's OK to do rechecks again. > > Cheers, > -melwitt > > -------------- next part -------------- An HTML attachment was scrubbed... 
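For what it's worth, the two things can be distinguished from the CLI: OS-EXT-AZ:availability_zone always reports the zone of the host the instance is currently running on, while the zone that was (or was not) requested at boot time only lives in the stored request spec. A rough sketch, assuming a zone named "az1" exists and default_schedule_zone is not set (image, flavor, network and server names are made up):

  $ openstack availability zone list
  $ openstack server create --image cirros --flavor m1.tiny --network private \
      --availability-zone az1 vm-pinned
  $ openstack server create --image cirros --flavor m1.tiny --network private vm-free
  $ openstack server show vm-pinned -c OS-EXT-AZ:availability_zone
  $ openstack server show vm-free -c OS-EXT-AZ:availability_zone

Both "server show" commands print a zone name, but only vm-pinned recorded az1 in its request spec, so only vm-pinned stays constrained to az1 when it is migrated or evacuated; vm-free can land on a host in any zone.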
URL: From sbauza at redhat.com Wed Mar 29 10:05:13 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 12:05:13 +0200 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Le mer. 29 mars 2023 ? 08:06, Nguy?n H?u Kh?i a ?crit : > Hello. > I have one question. > Follow this > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > If the server was not created in a specific zone then it is free to be > moved to other zones. but when I use > > openstack server show [server id] > > I still see the "OS-EXT-AZ:availability_zone" value belonging to my > instance. > > Correct, this is normal. If the operators creates some AZs, then the enduser should see where the instance in which AZ. > Could you tell the difference which causes "if the server was not created > in a specific zone then it is free to be moved to other zones." > > To be clear, an operator can create Availability Zones. Those AZs can then be seen by an enduser using the os-availability-zones API [1]. Then, either the enduser wants to use a specific AZ for their next instance creation (and if so, he/she adds --availability-zone parameter to their instance creation client) or they don't want and then they don't provide this parameter. If they provide this parameter, then the server will be created only in one host in the specific AZ and then when moving the instance later, it will continue to move to any host within the same AZ. If they *don't* provide this parameter, then depending on the default_schedule_zone config option, either the instance will eventually use a specific AZ (and then it's like if the enduser was asking for this AZ), or none of AZ is requested and then the instance can be created and moved between any hosts within *all* AZs. That being said, as I said earlier, the enduser can still verify the AZ from where the instance is by the server show parameter you told. We also have a documentation explaining about Availability Zones, maybe this would help you more to understand about AZs : https://docs.openstack.org/nova/latest/admin/availability-zones.html [1] https://docs.openstack.org/api-ref/compute/#availability-zones-os-availability-zone (tbc, the enduser won't see the hosts, but they can see the list of existing AZs) > Nguyen Huu Khoi > > > On Mon, Mar 27, 2023 at 8:37?PM Nguy?n H?u Kh?i > wrote: > >> Hello guys. >> >> I just suggest to openstack nova works better. My story because >> >> >> 1. >> >> The server was created in a specific zone with the POST /servers request >> containing the availability_zone parameter. >> >> It will be nice when we attach randow zone when we create instances then >> It will only move to the same zone when migrating or masakari ha. >> >> Currently we can force it to zone by default zone shedule in nova.conf. >> >> Sorry because I am new to Openstack and I am just an operator. I try to >> verify some real cases. >> >> >> >> Nguyen Huu Khoi >> >> >> On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza wrote: >> >>> >>> >>> Le lun. 27 mars 2023 ? 14:28, Sean Mooney a ?crit : >>> >>>> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >>>> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a >>>> ?crit : >>>> > >>>> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >>>> > > > Le dim. 26 mars 2023 ? 
14:30, Rafael Weing?rtner < >>>> > > > rafaelweingartner at gmail.com> a ?crit : >>>> > > > >>>> > > > > Hello Nguy?n H?u Kh?i, >>>> > > > > You might want to take a look at: >>>> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We >>>> created a >>>> > > patch >>>> > > > > to avoid migrating VMs to any AZ, once the VM has been >>>> bootstrapped in >>>> > > an >>>> > > > > AZ that has cross zone attache equals to false. >>>> > > > > >>>> > > > > >>>> > > > Well, I'll provide some comments in the change, but I'm afraid we >>>> can't >>>> > > > just modify the request spec like you would want. >>>> > > > >>>> > > > Anyway, if you want to discuss about it in the vPTG, just add it >>>> in the >>>> > > > etherpad and add your IRC nick so we could try to find a time >>>> where we >>>> > > > could be discussing it : >>>> https://etherpad.opendev.org/p/nova-bobcat-ptg >>>> > > > Also, this kind of behaviour modification is more a new feature >>>> than a >>>> > > > bugfix, so fwiw you should create a launchpad blueprint so we >>>> could >>>> > > better >>>> > > > see it. >>>> > > >>>> > > i tought i left review feedback on that too that the approch was not >>>> > > correct. >>>> > > i guess i did not in the end. >>>> > > >>>> > > modifying the request spec as sylvain menthioned is not correct. >>>> > > i disucssed this topic on irc a few weeks back with mohomad for >>>> vxhost. >>>> > > what can be done is as follows. >>>> > > >>>> > > we can add a current_az field to the Destination object >>>> > > >>>> > > >>>> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >>>> > > The conductor can read the instance.AZ and populate it in that new >>>> field. >>>> > > We can then add a new weigher to prefer hosts that are in the same >>>> az. >>>> > > >>>> > > >>>> > >>>> > I tend to disagree this approach as people would think that the >>>> > Destination.az field would be related to the current AZ for an >>>> instance, >>>> > while we only look at the original AZ. >>>> > That being said, we could have a weigher that would look at whether >>>> the >>>> > host is in the same AZ than the instance.host. >>>> you miss understood what i wrote >>>> >>>> i suggested addint Destination.current_az to store teh curernt AZ of >>>> the instance before scheduling. >>>> >>>> so my proposal is if RequestSpec.AZ is not set and >>>> Destination.current_az is set then the new >>>> weigher would prefer hosts that are in the same az as >>>> Destination.current_az >>>> >>>> we coudl also call Destination.current_az Destination.prefered_az >>>> >>>> >>> I meant, I think we don't need to provide a new field, we can already >>> know about what host an existing instance uses if we want (using [1]) >>> Anyway, let's stop to discuss about it here, we should rather review >>> that for a Launchpad blueprint or more a spec. >>> >>> -Sylvain >>> >>> [1] >>> https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 >>> >>>> > >>>> > >>>> > This will provide soft AZ affinity for the vm and preserve the fact >>>> that if >>>> > > a vm is created without sepcifying >>>> > > An AZ the expectaiton at the api level woudl be that it can migrate >>>> to any >>>> > > AZ. >>>> > > >>>> > > To provide hard AZ affintiy we could also add prefileter that would >>>> use >>>> > > the same data but instead include it in the >>>> > > placement query so that only the current AZ is considered. This >>>> would have >>>> > > to be disabled by default. 
>>>> > > >>>> > > >>>> > Sure, we could create a new prefilter so we could then deprecate the >>>> > AZFilter if we want. >>>> we already have an AZ prefilter and the AZFilter is deprecate for >>>> removal >>>> i ment to delete it in zed but did not have time to do it in zed of >>>> Antielope >>>> i deprecated the AZ| filter in >>>> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >>>> xena when i enabeld the az prefilter by default. >>>> >>>> >>> Ah whoops, indeed I forgot the fact we already have the prefilter, so >>> the hard support for AZ is already existing. >>> >>> >>>> i will try an delete teh AZ filter before m1 if others dont. >>>> >>> >>> OK. >>> >>> >>>> > >>>> > >>>> > > That woudl allow operators to choose the desired behavior. >>>> > > curret behavior (disable weigher and dont enabel prefilter) >>>> > > new default, prefer current AZ (weigher enabeld prefilter disabled) >>>> > > hard affintiy(prefilter enabled.) >>>> > > >>>> > > there are other ways to approch this but updating the request spec >>>> is not >>>> > > one of them. >>>> > > we have to maintain the fact the enduser did not request an AZ. >>>> > > >>>> > > >>>> > Anyway, if folks want to discuss about AZs, this week is the good >>>> time :-) >>>> > >>>> > >>>> > > > >>>> > > > -Sylvain >>>> > > > >>>> > > > >>>> > > > >>>> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >>>> > > nguyenhuukhoinw at gmail.com> >>>> > > > > wrote: >>>> > > > > >>>> > > > > > Hello guys. >>>> > > > > > I playing with Nova AZ and Masakari >>>> > > > > > >>>> > > > > > >>>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>>> > > > > > >>>> > > > > > Masakari will move server by nova scheduler. >>>> > > > > > >>>> > > > > > Openstack Docs describe that: >>>> > > > > > >>>> > > > > > If the server was not created in a specific zone then it is >>>> free to >>>> > > be >>>> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >>>> > > > > > < >>>> > > >>>> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >>>> > >>>> > > is >>>> > > > > > a no-op. >>>> > > > > > >>>> > > > > > I see that everyone usually creates instances with "Any >>>> Availability >>>> > > > > > Zone" on Horzion and also we don't specify AZ when creating >>>> > > instances by >>>> > > > > > cli. >>>> > > > > > >>>> > > > > > By this way, when we use Masakari or we miragrated instances( >>>> or >>>> > > > > > evacuate) so our instance will be moved to other zones. >>>> > > > > > >>>> > > > > > Can we attach AZ to server create requests API based on Any >>>> > > > > > Availability Zone to limit instances moved to other zones? >>>> > > > > > >>>> > > > > > Thank you. Regards >>>> > > > > > >>>> > > > > > Nguyen Huu Khoi >>>> > > > > > >>>> > > > > >>>> > > > > >>>> > > > > -- >>>> > > > > Rafael Weing?rtner >>>> > > > > >>>> > > >>>> > > >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 29 10:32:26 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 12:32:26 +0200 Subject: [nova][ptg] Today's agenda Message-ID: (just shamelessly stealing the idea from Neutron's team) Hey foks, Yesterday was a packed day but we didn't really progressed on a lot of topics. Today I'm gonna propose a list of topics in order to improve our sessions's visibility and in order to provide some timeboxing. 13:00 UTC - 14:45 UTC : * How to make sure people can help to review ? 
* Should we ask for some implementation before accepting a spec ? * CI stability is a nightmare, let's fight over this * Bobcat is a non-SLURP release * Let's clean up our upgrade documentation * Nova community outreach * Clean-up our bug list by abandoning very old LP bug reports ? * Summit/PTG : what could we be doing for the physical PTG ? (Will be 4 weeks before milestone-2) 15:00 UTC - 15:45 UTC : * Nova/Manila cross-project session : Prevent share deletion while it's attached to an instance 16:00 UTC - 17:00 UTC : * When your instance is stuck due to hard affinity policies, what could we do ? * Users reported exhaustion of primary keys ('id') in some large tables like system_metadata. How could we achieve a data migration from sa.Integer to sa.BigInteger ? Details can be found in https://etherpad.opendev.org/p/nova-bobcat-ptg#L192 I assume the packed agenda, particularly the first two hours. As a reminder, people are welcome to add their IRC nicks in each of the courtesy ping list of the related topic. Hope this gives you a taste of joining the Nova PTG today. -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Wed Mar 29 12:38:05 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 29 Mar 2023 19:38:05 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Yes. Thanks, but the things I would like to know: after instances are created, how do we know if it was launched with specified AZ or without it? I mean the way to distinguish between specified instances and non specified instances? Nguyen Huu Khoi On Wed, Mar 29, 2023 at 5:05?PM Sylvain Bauza wrote: > > > Le mer. 29 mars 2023 ? 08:06, Nguy?n H?u Kh?i > a ?crit : > >> Hello. >> I have one question. >> Follow this >> >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> >> If the server was not created in a specific zone then it is free to be >> moved to other zones. but when I use >> >> openstack server show [server id] >> >> I still see the "OS-EXT-AZ:availability_zone" value belonging to my >> instance. >> >> > Correct, this is normal. If the operators creates some AZs, then the > enduser should see where the instance in which AZ. > > >> Could you tell the difference which causes "if the server was not >> created in a specific zone then it is free to be moved to other zones." >> >> > To be clear, an operator can create Availability Zones. Those AZs can then > be seen by an enduser using the os-availability-zones API [1]. Then, either > the enduser wants to use a specific AZ for their next instance creation > (and if so, he/she adds --availability-zone parameter to their instance > creation client) or they don't want and then they don't provide this > parameter. > > If they provide this parameter, then the server will be created only in > one host in the specific AZ and then when moving the instance later, it > will continue to move to any host within the same AZ. > If they *don't* provide this parameter, then depending on the > default_schedule_zone config option, either the instance will eventually > use a specific AZ (and then it's like if the enduser was asking for this > AZ), or none of AZ is requested and then the instance can be created and > moved between any hosts within *all* AZs. 
> > That being said, as I said earlier, the enduser can still verify the AZ > from where the instance is by the server show parameter you told. > > We also have a documentation explaining about Availability Zones, maybe > this would help you more to understand about AZs : > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > [1] > https://docs.openstack.org/api-ref/compute/#availability-zones-os-availability-zone > (tbc, the enduser won't see the hosts, but they can see the list of > existing AZs) > > > >> Nguyen Huu Khoi >> >> >> On Mon, Mar 27, 2023 at 8:37?PM Nguy?n H?u Kh?i < >> nguyenhuukhoinw at gmail.com> wrote: >> >>> Hello guys. >>> >>> I just suggest to openstack nova works better. My story because >>> >>> >>> 1. >>> >>> The server was created in a specific zone with the POST /servers request >>> containing the availability_zone parameter. >>> >>> It will be nice when we attach randow zone when we create instances then >>> It will only move to the same zone when migrating or masakari ha. >>> >>> Currently we can force it to zone by default zone shedule in nova.conf. >>> >>> Sorry because I am new to Openstack and I am just an operator. I try to >>> verify some real cases. >>> >>> >>> >>> Nguyen Huu Khoi >>> >>> >>> On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza wrote: >>> >>>> >>>> >>>> Le lun. 27 mars 2023 ? 14:28, Sean Mooney a >>>> ?crit : >>>> >>>>> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >>>>> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a >>>>> ?crit : >>>>> > >>>>> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >>>>> > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < >>>>> > > > rafaelweingartner at gmail.com> a ?crit : >>>>> > > > >>>>> > > > > Hello Nguy?n H?u Kh?i, >>>>> > > > > You might want to take a look at: >>>>> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We >>>>> created a >>>>> > > patch >>>>> > > > > to avoid migrating VMs to any AZ, once the VM has been >>>>> bootstrapped in >>>>> > > an >>>>> > > > > AZ that has cross zone attache equals to false. >>>>> > > > > >>>>> > > > > >>>>> > > > Well, I'll provide some comments in the change, but I'm afraid >>>>> we can't >>>>> > > > just modify the request spec like you would want. >>>>> > > > >>>>> > > > Anyway, if you want to discuss about it in the vPTG, just add it >>>>> in the >>>>> > > > etherpad and add your IRC nick so we could try to find a time >>>>> where we >>>>> > > > could be discussing it : >>>>> https://etherpad.opendev.org/p/nova-bobcat-ptg >>>>> > > > Also, this kind of behaviour modification is more a new feature >>>>> than a >>>>> > > > bugfix, so fwiw you should create a launchpad blueprint so we >>>>> could >>>>> > > better >>>>> > > > see it. >>>>> > > >>>>> > > i tought i left review feedback on that too that the approch was >>>>> not >>>>> > > correct. >>>>> > > i guess i did not in the end. >>>>> > > >>>>> > > modifying the request spec as sylvain menthioned is not correct. >>>>> > > i disucssed this topic on irc a few weeks back with mohomad for >>>>> vxhost. >>>>> > > what can be done is as follows. >>>>> > > >>>>> > > we can add a current_az field to the Destination object >>>>> > > >>>>> > > >>>>> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >>>>> > > The conductor can read the instance.AZ and populate it in that new >>>>> field. >>>>> > > We can then add a new weigher to prefer hosts that are in the same >>>>> az. 
>>>>> > > >>>>> > > >>>>> > >>>>> > I tend to disagree this approach as people would think that the >>>>> > Destination.az field would be related to the current AZ for an >>>>> instance, >>>>> > while we only look at the original AZ. >>>>> > That being said, we could have a weigher that would look at whether >>>>> the >>>>> > host is in the same AZ than the instance.host. >>>>> you miss understood what i wrote >>>>> >>>>> i suggested addint Destination.current_az to store teh curernt AZ of >>>>> the instance before scheduling. >>>>> >>>>> so my proposal is if RequestSpec.AZ is not set and >>>>> Destination.current_az is set then the new >>>>> weigher would prefer hosts that are in the same az as >>>>> Destination.current_az >>>>> >>>>> we coudl also call Destination.current_az Destination.prefered_az >>>>> >>>>> >>>> I meant, I think we don't need to provide a new field, we can already >>>> know about what host an existing instance uses if we want (using [1]) >>>> Anyway, let's stop to discuss about it here, we should rather review >>>> that for a Launchpad blueprint or more a spec. >>>> >>>> -Sylvain >>>> >>>> [1] >>>> https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 >>>> >>>>> > >>>>> > >>>>> > This will provide soft AZ affinity for the vm and preserve the fact >>>>> that if >>>>> > > a vm is created without sepcifying >>>>> > > An AZ the expectaiton at the api level woudl be that it can >>>>> migrate to any >>>>> > > AZ. >>>>> > > >>>>> > > To provide hard AZ affintiy we could also add prefileter that >>>>> would use >>>>> > > the same data but instead include it in the >>>>> > > placement query so that only the current AZ is considered. This >>>>> would have >>>>> > > to be disabled by default. >>>>> > > >>>>> > > >>>>> > Sure, we could create a new prefilter so we could then deprecate the >>>>> > AZFilter if we want. >>>>> we already have an AZ prefilter and the AZFilter is deprecate for >>>>> removal >>>>> i ment to delete it in zed but did not have time to do it in zed of >>>>> Antielope >>>>> i deprecated the AZ| filter in >>>>> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >>>>> xena when i enabeld the az prefilter by default. >>>>> >>>>> >>>> Ah whoops, indeed I forgot the fact we already have the prefilter, so >>>> the hard support for AZ is already existing. >>>> >>>> >>>>> i will try an delete teh AZ filter before m1 if others dont. >>>>> >>>> >>>> OK. >>>> >>>> >>>>> > >>>>> > >>>>> > > That woudl allow operators to choose the desired behavior. >>>>> > > curret behavior (disable weigher and dont enabel prefilter) >>>>> > > new default, prefer current AZ (weigher enabeld prefilter disabled) >>>>> > > hard affintiy(prefilter enabled.) >>>>> > > >>>>> > > there are other ways to approch this but updating the request spec >>>>> is not >>>>> > > one of them. >>>>> > > we have to maintain the fact the enduser did not request an AZ. >>>>> > > >>>>> > > >>>>> > Anyway, if folks want to discuss about AZs, this week is the good >>>>> time :-) >>>>> > >>>>> > >>>>> > > > >>>>> > > > -Sylvain >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >>>>> > > nguyenhuukhoinw at gmail.com> >>>>> > > > > wrote: >>>>> > > > > >>>>> > > > > > Hello guys. 
>>>>> > > > > > I playing with Nova AZ and Masakari >>>>> > > > > > >>>>> > > > > > >>>>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>>>> > > > > > >>>>> > > > > > Masakari will move server by nova scheduler. >>>>> > > > > > >>>>> > > > > > Openstack Docs describe that: >>>>> > > > > > >>>>> > > > > > If the server was not created in a specific zone then it is >>>>> free to >>>>> > > be >>>>> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >>>>> > > > > > < >>>>> > > >>>>> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >>>>> > >>>>> > > is >>>>> > > > > > a no-op. >>>>> > > > > > >>>>> > > > > > I see that everyone usually creates instances with "Any >>>>> Availability >>>>> > > > > > Zone" on Horzion and also we don't specify AZ when creating >>>>> > > instances by >>>>> > > > > > cli. >>>>> > > > > > >>>>> > > > > > By this way, when we use Masakari or we miragrated >>>>> instances( or >>>>> > > > > > evacuate) so our instance will be moved to other zones. >>>>> > > > > > >>>>> > > > > > Can we attach AZ to server create requests API based on Any >>>>> > > > > > Availability Zone to limit instances moved to other zones? >>>>> > > > > > >>>>> > > > > > Thank you. Regards >>>>> > > > > > >>>>> > > > > > Nguyen Huu Khoi >>>>> > > > > > >>>>> > > > > >>>>> > > > > >>>>> > > > > -- >>>>> > > > > Rafael Weing?rtner >>>>> > > > > >>>>> > > >>>>> > > >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Wed Mar 29 13:24:09 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Wed, 29 Mar 2023 13:24:09 +0000 Subject: [PTL][release][stable][EM] Extended Maintenance - Xena Message-ID: Hi teams, As 2023.1 Antelope was released last week and we are in a less busy period, now is the good time to call your attention to the following: In less than a month Xena is planned to transition to Extended Maintenance phase [1] (planned date: 2023-04-20). I have generated the list of the current *open* and *unreleased* changes in stable/xena for every repositories [2] (where there are such patches). These lists could help the teams who are planning to do a *final* release on Xena before moving stable/xena branches to Extended Maintenance. Feel free to edit and extend these lists to track your team's progress! Note that the *latest* Xena *release* tagging patches ('xena-em' tag) have been generated too in advance [3], please mark with a -1 if your team plans to do a final release, or +1 if the team is ready for the transition. The schedule from now on is as follows: * patches with +1 from PTL / release liaison will be merged, thus those repositories will transition to Extended Maintenance * at the planned deadline (April 20th) the Release Team will merge all of the transition patches (even the ones without any response!) * after the transition, stable/xena will be still open for bug fixes, but there won't be official releases anymore. *NOTE*: teams, please focus on wrapping up your libraries first if there is any concern about the changes, in order to avoid broken (final!!) releases! [1] https://releases.openstack.org/ [2] https://etherpad.opendev.org/p/xena-final-release-before-em [3] https://review.opendev.org/q/topic:xena-em Thanks, El?d irc: elodilles @ #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... 
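For teams trying to decide whether a final Xena release is worth cutting, a quick way (just a sketch, not the official release tooling) to see what is merged but not yet released on a given repository is to compare the branch against its latest tag:

  $ git fetch origin stable/xena
  $ git log --oneline $(git describe --tags --abbrev=0 origin/stable/xena)..origin/stable/xena

Open, unmerged backports can be listed in Gerrit with a query such as https://review.opendev.org/q/status:open+branch:stable/xena+project:openstack/<project> (the project name is a placeholder).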
URL: From nguyenhuukhoinw at gmail.com Wed Mar 29 13:26:26 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 29 Mar 2023 20:26:26 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: "If they *don't* provide this parameter, then depending on the default_schedule_zone config option, either the instance will eventually use a specific AZ (and then it's like if the enduser was asking for this AZ), or none of AZ is requested and then the instance can be created and moved between any hosts within *all* AZs." I ask aftet that, although without az when launch instances but they still have az. But i still mv to diffent host in diffent az when mirgrating or spawn which masakari. i am not clear, I tested. On Wed, Mar 29, 2023, 7:38 PM Nguy?n H?u Kh?i wrote: > Yes. Thanks, but the things I would like to know: after instances are > created, how do we know if it was launched with specified AZ or without it? > I mean the way to distinguish between specified instances and non specified > instances? > > Nguyen Huu Khoi > > > On Wed, Mar 29, 2023 at 5:05?PM Sylvain Bauza wrote: > >> >> >> Le mer. 29 mars 2023 ? 08:06, Nguy?n H?u Kh?i >> a ?crit : >> >>> Hello. >>> I have one question. >>> Follow this >>> >>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>> >>> If the server was not created in a specific zone then it is free to be >>> moved to other zones. but when I use >>> >>> openstack server show [server id] >>> >>> I still see the "OS-EXT-AZ:availability_zone" value belonging to my >>> instance. >>> >>> >> Correct, this is normal. If the operators creates some AZs, then the >> enduser should see where the instance in which AZ. >> >> >>> Could you tell the difference which causes "if the server was not >>> created in a specific zone then it is free to be moved to other zones." >>> >>> >> To be clear, an operator can create Availability Zones. Those AZs can >> then be seen by an enduser using the os-availability-zones API [1]. Then, >> either the enduser wants to use a specific AZ for their next instance >> creation (and if so, he/she adds --availability-zone parameter to their >> instance creation client) or they don't want and then they don't provide >> this parameter. >> >> If they provide this parameter, then the server will be created only in >> one host in the specific AZ and then when moving the instance later, it >> will continue to move to any host within the same AZ. >> If they *don't* provide this parameter, then depending on the >> default_schedule_zone config option, either the instance will eventually >> use a specific AZ (and then it's like if the enduser was asking for this >> AZ), or none of AZ is requested and then the instance can be created and >> moved between any hosts within *all* AZs. >> >> That being said, as I said earlier, the enduser can still verify the AZ >> from where the instance is by the server show parameter you told. 
>> >> We also have a documentation explaining about Availability Zones, maybe >> this would help you more to understand about AZs : >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> >> >> [1] >> https://docs.openstack.org/api-ref/compute/#availability-zones-os-availability-zone >> (tbc, the enduser won't see the hosts, but they can see the list of >> existing AZs) >> >> >> >>> Nguyen Huu Khoi >>> >>> >>> On Mon, Mar 27, 2023 at 8:37?PM Nguy?n H?u Kh?i < >>> nguyenhuukhoinw at gmail.com> wrote: >>> >>>> Hello guys. >>>> >>>> I just suggest to openstack nova works better. My story because >>>> >>>> >>>> 1. >>>> >>>> The server was created in a specific zone with the POST /servers request >>>> containing the availability_zone parameter. >>>> >>>> It will be nice when we attach randow zone when we create instances >>>> then It will only move to the same zone when migrating or masakari ha. >>>> >>>> Currently we can force it to zone by default zone shedule in nova.conf. >>>> >>>> Sorry because I am new to Openstack and I am just an operator. I try to >>>> verify some real cases. >>>> >>>> >>>> >>>> Nguyen Huu Khoi >>>> >>>> >>>> On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza >>>> wrote: >>>> >>>>> >>>>> >>>>> Le lun. 27 mars 2023 ? 14:28, Sean Mooney a >>>>> ?crit : >>>>> >>>>>> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >>>>>> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a >>>>>> ?crit : >>>>>> > >>>>>> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >>>>>> > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < >>>>>> > > > rafaelweingartner at gmail.com> a ?crit : >>>>>> > > > >>>>>> > > > > Hello Nguy?n H?u Kh?i, >>>>>> > > > > You might want to take a look at: >>>>>> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We >>>>>> created a >>>>>> > > patch >>>>>> > > > > to avoid migrating VMs to any AZ, once the VM has been >>>>>> bootstrapped in >>>>>> > > an >>>>>> > > > > AZ that has cross zone attache equals to false. >>>>>> > > > > >>>>>> > > > > >>>>>> > > > Well, I'll provide some comments in the change, but I'm afraid >>>>>> we can't >>>>>> > > > just modify the request spec like you would want. >>>>>> > > > >>>>>> > > > Anyway, if you want to discuss about it in the vPTG, just add >>>>>> it in the >>>>>> > > > etherpad and add your IRC nick so we could try to find a time >>>>>> where we >>>>>> > > > could be discussing it : >>>>>> https://etherpad.opendev.org/p/nova-bobcat-ptg >>>>>> > > > Also, this kind of behaviour modification is more a new feature >>>>>> than a >>>>>> > > > bugfix, so fwiw you should create a launchpad blueprint so we >>>>>> could >>>>>> > > better >>>>>> > > > see it. >>>>>> > > >>>>>> > > i tought i left review feedback on that too that the approch was >>>>>> not >>>>>> > > correct. >>>>>> > > i guess i did not in the end. >>>>>> > > >>>>>> > > modifying the request spec as sylvain menthioned is not correct. >>>>>> > > i disucssed this topic on irc a few weeks back with mohomad for >>>>>> vxhost. >>>>>> > > what can be done is as follows. >>>>>> > > >>>>>> > > we can add a current_az field to the Destination object >>>>>> > > >>>>>> > > >>>>>> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >>>>>> > > The conductor can read the instance.AZ and populate it in that >>>>>> new field. >>>>>> > > We can then add a new weigher to prefer hosts that are in the >>>>>> same az. 
>>>>>> > > >>>>>> > > >>>>>> > >>>>>> > I tend to disagree this approach as people would think that the >>>>>> > Destination.az field would be related to the current AZ for an >>>>>> instance, >>>>>> > while we only look at the original AZ. >>>>>> > That being said, we could have a weigher that would look at whether >>>>>> the >>>>>> > host is in the same AZ than the instance.host. >>>>>> you miss understood what i wrote >>>>>> >>>>>> i suggested addint Destination.current_az to store teh curernt AZ of >>>>>> the instance before scheduling. >>>>>> >>>>>> so my proposal is if RequestSpec.AZ is not set and >>>>>> Destination.current_az is set then the new >>>>>> weigher would prefer hosts that are in the same az as >>>>>> Destination.current_az >>>>>> >>>>>> we coudl also call Destination.current_az Destination.prefered_az >>>>>> >>>>>> >>>>> I meant, I think we don't need to provide a new field, we can already >>>>> know about what host an existing instance uses if we want (using [1]) >>>>> Anyway, let's stop to discuss about it here, we should rather review >>>>> that for a Launchpad blueprint or more a spec. >>>>> >>>>> -Sylvain >>>>> >>>>> [1] >>>>> https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 >>>>> >>>>>> > >>>>>> > >>>>>> > This will provide soft AZ affinity for the vm and preserve the fact >>>>>> that if >>>>>> > > a vm is created without sepcifying >>>>>> > > An AZ the expectaiton at the api level woudl be that it can >>>>>> migrate to any >>>>>> > > AZ. >>>>>> > > >>>>>> > > To provide hard AZ affintiy we could also add prefileter that >>>>>> would use >>>>>> > > the same data but instead include it in the >>>>>> > > placement query so that only the current AZ is considered. This >>>>>> would have >>>>>> > > to be disabled by default. >>>>>> > > >>>>>> > > >>>>>> > Sure, we could create a new prefilter so we could then deprecate the >>>>>> > AZFilter if we want. >>>>>> we already have an AZ prefilter and the AZFilter is deprecate for >>>>>> removal >>>>>> i ment to delete it in zed but did not have time to do it in zed of >>>>>> Antielope >>>>>> i deprecated the AZ| filter in >>>>>> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >>>>>> xena when i enabeld the az prefilter by default. >>>>>> >>>>>> >>>>> Ah whoops, indeed I forgot the fact we already have the prefilter, so >>>>> the hard support for AZ is already existing. >>>>> >>>>> >>>>>> i will try an delete teh AZ filter before m1 if others dont. >>>>>> >>>>> >>>>> OK. >>>>> >>>>> >>>>>> > >>>>>> > >>>>>> > > That woudl allow operators to choose the desired behavior. >>>>>> > > curret behavior (disable weigher and dont enabel prefilter) >>>>>> > > new default, prefer current AZ (weigher enabeld prefilter >>>>>> disabled) >>>>>> > > hard affintiy(prefilter enabled.) >>>>>> > > >>>>>> > > there are other ways to approch this but updating the request >>>>>> spec is not >>>>>> > > one of them. >>>>>> > > we have to maintain the fact the enduser did not request an AZ. >>>>>> > > >>>>>> > > >>>>>> > Anyway, if folks want to discuss about AZs, this week is the good >>>>>> time :-) >>>>>> > >>>>>> > >>>>>> > > > >>>>>> > > > -Sylvain >>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >>>>>> > > nguyenhuukhoinw at gmail.com> >>>>>> > > > > wrote: >>>>>> > > > > >>>>>> > > > > > Hello guys. 
>>>>>> > > > > > I playing with Nova AZ and Masakari >>>>>> > > > > > >>>>>> > > > > > >>>>>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>>>>> > > > > > >>>>>> > > > > > Masakari will move server by nova scheduler. >>>>>> > > > > > >>>>>> > > > > > Openstack Docs describe that: >>>>>> > > > > > >>>>>> > > > > > If the server was not created in a specific zone then it is >>>>>> free to >>>>>> > > be >>>>>> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >>>>>> > > > > > < >>>>>> > > >>>>>> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >>>>>> > >>>>>> > > is >>>>>> > > > > > a no-op. >>>>>> > > > > > >>>>>> > > > > > I see that everyone usually creates instances with "Any >>>>>> Availability >>>>>> > > > > > Zone" on Horzion and also we don't specify AZ when creating >>>>>> > > instances by >>>>>> > > > > > cli. >>>>>> > > > > > >>>>>> > > > > > By this way, when we use Masakari or we miragrated >>>>>> instances( or >>>>>> > > > > > evacuate) so our instance will be moved to other zones. >>>>>> > > > > > >>>>>> > > > > > Can we attach AZ to server create requests API based on Any >>>>>> > > > > > Availability Zone to limit instances moved to other zones? >>>>>> > > > > > >>>>>> > > > > > Thank you. Regards >>>>>> > > > > > >>>>>> > > > > > Nguyen Huu Khoi >>>>>> > > > > > >>>>>> > > > > >>>>>> > > > > >>>>>> > > > > -- >>>>>> > > > > Rafael Weing?rtner >>>>>> > > > > >>>>>> > > >>>>>> > > >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 29 14:49:32 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 16:49:32 +0200 Subject: [nova] openstack-tox-pep8 job broken, hold your rechecks In-Reply-To: References: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> Message-ID: Fix is merged, everyone can recheck (with a written reason ;) ) Le mer. 29 mars 2023 ? 11:49, Sylvain Bauza a ?crit : > Heh, you missed my email ;-) > > > https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032979.html > > No worries tho :) > > Le mar. 28 mars 2023 ? 23:11, melanie witt a ?crit : > >> Hey all, >> >> Sending this to the ML to try to help others who may not have heard >> about it yet. I didn't know about myself until I saw the job fail on a >> random nova patch and I looked around to figure out why. >> >> It looks like openstack-tox-pep8 is failing 100% in nova due to a newer >> version of the mypy library being pulled in since a recent >> upper-constraints bump: >> >> https://review.opendev.org/c/openstack/requirements/+/872065 >> >> And the job isn't going to pass until the following fix merges: >> >> https://review.opendev.org/c/openstack/nova/+/878693 >> >> Finally, to help prevent a breakage for this in the future, a >> cross-nova-pep8 job has been approved: >> >> https://review.opendev.org/c/openstack/requirements/+/878748 >> >> and will be on its way to the gate once the aforementioned fix merges. >> >> I'll post a reply to this email when it's OK to do rechecks again. >> >> Cheers, >> -melwitt >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 29 14:51:10 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 16:51:10 +0200 Subject: [nova] Hold your rechecks In-Reply-To: References: Message-ID: Le lun. 27 mars 2023 ? 
17:28, Sylvain Bauza a ?crit : > Hey, > > Due to the recent merge of > https://review.opendev.org/c/openstack/requirements/+/872065/10/upper-constraints.txt#298 > we now use mypy==1.1.1 which includes a breaking behavioural change against > our code : > > https://07de6a0c9e6ec0c6835f-ccccbfab26b1456f69293167016566bc.ssl.cf2.rackcdn.com/875621/10/gate/openstack-tox-pep8/e50f9f0/job-output.txt > > Thanks to Eric (kudos to him, he was quickier than me), we have a fix > https://review.opendev.org/c/openstack/nova/+/878693 > > Please accordingly hold your rechecks until that fix is merged. > > Aaaaand this is done (after a few fights against CI failures). You can now recheck with a reason like : "recheck mypy upgrade issue is fixed by Ie50c8d364ad9c339355cc138b560ec4df14fe307 " -Sylvain > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Mar 29 15:15:33 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 29 Mar 2023 08:15:33 -0700 Subject: [ptl][tc] OpenStack packages PyPi additional external maintainers audit & cleanup In-Reply-To: <18709ff76be.10ad4bda1984477.2001967889741209449@ghanshyammann.com> References: <185d18a20aa.1206b91ad115363.5205111285046207324@ghanshyammann.com> <18709ff76be.10ad4bda1984477.2001967889741209449@ghanshyammann.com> Message-ID: <1872df04490.f676662973065.8836276543901283423@ghanshyammann.com> Hi Everyone, Posting top of the email. I am listing the projects that have not updated the status in etherpad; if you have any progress, please write in etherpad. If not request you to plan the same while in vPTG? - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L43 * adjutant * barbican * cloudkitty * cyborg * designate * ec2-api * freezer * heat * kuryr * mistral * monasca * murano * octavia * OpenStackSDK * oslo * rally * Release Management * requirements * sahara * senlin * skyline * solum * storlets * swift * tacker * Telemetry * trove * vitrage * watcher * winstackers * zaqar * zun -gmann ---- On Wed, 22 Mar 2023 08:45:49 -0700 Ghanshyam Mann wrote --- > ---- On Fri, 20 Jan 2023 15:36:08 -0800 Ghanshyam Mann wrote --- > > Hi PTLs, > > > > As you might know or have seen for your project package on PyPi, OpenStack deliverables on PyPi have > > additional maintainers, For example, https://pypi.org/project/murano/, https://pypi.org/project/glance/ > > > > We should keep only 'openstackci' as a maintainer in PyPi so that releases of OpenStack deliverables > > can be managed in a single place. Otherwise, we might face the two sets of maintainers' places and > > packages might get released in PyPi by additional maintainers without the OpenStack project team > > knowing about it. One such case is in Horizon repo 'xstatic-font-awesome' where a new maintainer is > > added by an existing additional maintainer and this package was released without the Horizon team > > knowing about the changes and release. > > - https://github.com/openstack/xstatic-font-awesome/pull/2 > > > > To avoid the 'xstatic-font-awesome' case for other packages, TC discussed it in their weekly meetings[1] > > and agreed to audit all the OpenStack packages and then clean up the additional maintainers in PyPi > > (keep only 'openstackci' as maintainers). > > > > To help in this task, TC requests project PTL to perform the audit for their project's repo and add comments > > in the below etherpad. 
> > > > - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup > > Hello Everyone, > > To update, there is an extra step for project PTLs in this task: > > * Step 1.1: Project PTL/team needs to communicate to the additional maintainers about removing themselves > and transferring ownership to 'openstackci' > - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L23 > > Initially, TC thought we could do a cleanup with the help of openstackci admin for all repo. But, to avoid any issue > or misunderstanding/panic among additional maintainers on removal, it is better that projects communicate with > additional maintainers and ask them to remove themself. JayF sent the email format to communicate to additional > maintainers[1]. Please use that and let TC know if any queries/issues you are facing. > > [1] https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032780.html > > -gmann > > > > > Thanks to knikolla to automate the listing of the OpenStack packages with additional maintainers in PyPi which > > you can find the result in output.txt at the bottom of this link. I have added the project list of who needs to check > > their repo in etherpad. > > > > - https://gist.github.com/knikolla/7303a65a5ddaa2be553fc6e54619a7a1 > > > > Please complete the audit for your project before March 15 so that TC can discuss the next step in vPTG. > > > > [1] https://meetings.opendev.org/meetings/tc/2023/tc.2023-01-11-16.00.log.html#l-41 > > > > > > -gmann > > > > > > From melwittt at gmail.com Wed Mar 29 16:02:34 2023 From: melwittt at gmail.com (melanie witt) Date: Wed, 29 Mar 2023 09:02:34 -0700 Subject: [nova] openstack-tox-pep8 job broken, hold your rechecks In-Reply-To: References: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> Message-ID: <3f221938-ef0d-b957-d4a9-be4bac28e402@gmail.com> On 03/29/23 02:49, Sylvain Bauza wrote: > Heh, you missed my email ;-) > > https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032979.html > > No worries tho :) Ugh, sorry. I searched for "[nova]" and skimmed for the words "broken" or "gate" or "CI" and managed to miss it accordingly. Sorry about that. > Le?mar. 28 mars 2023 ??23:11, melanie witt > a ?crit?: > > Hey all, > > Sending this to the ML to try to help others who may not have heard > about it yet. I didn't know about myself until I saw the job fail on a > random nova patch and I looked around to figure out why. > > It looks like openstack-tox-pep8 is failing 100% in nova due to a newer > version of the mypy library being pulled in since a recent > upper-constraints bump: > > https://review.opendev.org/c/openstack/requirements/+/872065 > > > And the job isn't going to pass until the following fix merges: > > https://review.opendev.org/c/openstack/nova/+/878693 > > > Finally, to help prevent a breakage for this in the future, a > cross-nova-pep8 job has been approved: > > https://review.opendev.org/c/openstack/requirements/+/878748 > > > and will be on its way to the gate once the aforementioned fix merges. > > I'll post a reply to this email when it's OK to do rechecks again. > > Cheers, > -melwitt > From ihrachys at redhat.com Wed Mar 29 16:45:26 2023 From: ihrachys at redhat.com (Ihar Hrachyshka) Date: Wed, 29 Mar 2023 12:45:26 -0400 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: <3840757.STTH5IQzZg@p1> Message-ID: To close the loop, We had a very productive discussion of the topic during vPTG today. 
Some of it is captured here: https://etherpad.opendev.org/p/neutron-bobcat-ptg#L207 and below. Here is the brief plus next steps. In regards to api-ref definitions for stateless SG: - it is agreed that it should explain the semantics and not only mechanics of API fields; - it is agreed that it should explain behavior of basic network services; - it is agreed that basic network services that are expected to work by default are things like ARP, DHCP; while metadata service is not; - this will mimic what OVS implementation of stateless SG already does; - it is agreed that these basic services that are expected to work will work transparently, meaning no SG rules will be visible for them; - this will mimic OVS implementation too. Next steps: - update api-ref stateless SG description to capture decisions above; - update my neutron patch series to exclude metadata enablement; - adjust tempest scenarios for stateless SG to not create explicit SG rules for DHCPv6 stateless (there are already patches for that); - clean up Launchpad bugs as per decisions above. I will take care of the above in next days. Thanks everyone, Ihar On Wed, Mar 22, 2023 at 12:55?PM Ihar Hrachyshka wrote: > > On Tue, Mar 21, 2023 at 12:07?PM Rodolfo Alonso Hernandez > wrote: > > > > Hello: > > > > I agree with having a single API meaning for all backends. We currently support stateless SGs in iptables and ML2/OVN and both backends provide the same behaviour: a rule won't create an opposite direction counterpart by default, the user needs to define it explicitly. > > Thanks for this, I didn't realize that iptables may be considered prior art. > > > > > The discussion here could be the default behaviour for standard services: > > * DHCP service is currently supported in iptables, native OVS and OVN. This should be supported even without any rule allowed (as is now). Of course, we need to explicitly document that. > > * DHCPv6 [1]: unlike Slawek, I'm in favor of allowing this traffic by default, as part of the DHCP protocol traffic allowance. > > Agreed DHCPv6 rules are closer to "base" and that the argument for RA > / NA flows is stronger because of the parallel to DHCPv4 operation. > > > * Metadata service: this is not a network protocol and we should not consider it. Actually this service is working now (with stateful SGs) because of the default SG egress rules we add. So I'm not in favor of [2] > > At this point I am more ambivalent to the decision of whether to > include metadata into the list of "base" services, as long as we > define the list (behavior) in api-ref. But to address the point, since > Slawek leans to creating SG rules in Neutron API to handle ICMP > traffic necessary for RA / NA (which seems to have a merit and > internal logic) anyway, we could as well at this point create another > "default" rule for metadata replies. > > But - I will repeat - as long as a decision on what the list of "base" > services enabled for any SG by default is, I can live with metadata > out of the list. It may not be as convenient to users (which is my > concern), but that's probably a matter of taste in API design. > > BTW Rodolfo, thanks for allocating a time slot for this discussion at > vPTG. I hope we get to the bottom of it then. See you all next Wed > @13:00. (As per https://etherpad.opendev.org/p/neutron-bobcat-ptg) > > Ihar > > > > > Regards. 
> > > > [1]https://review.opendev.org/c/openstack/neutron/+/877049 > > [2]https://review.opendev.org/c/openstack/neutron/+/876659 > > > > On Mon, Mar 20, 2023 at 10:19?PM Ihar Hrachyshka wrote: > >> > >> On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski wrote: > >> > > >> > Hi, > >> > > >> > > >> > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > >> > > >> > > Hi all, > >> > > >> > > > >> > > >> > > (I've tagged the thread with [ovn] because this question was raised in > >> > > >> > > the context of OVN, but it really is about the intent of neutron > >> > > >> > > stateless SG API.) > >> > > >> > > > >> > > >> > > Neutron API supports 'stateless' field for security groups: > >> > > >> > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > >> > > >> > > > >> > > >> > > The API reference doesn't explain the intent of the API, merely > >> > > >> > > walking through the field mechanics, as in > >> > > >> > > > >> > > >> > > "The stateful security group extension (stateful-security-group) adds > >> > > >> > > the stateful field to security groups, allowing users to configure > >> > > >> > > stateful or stateless security groups for ports. The existing security > >> > > >> > > groups will all be considered as stateful. Update of the stateful > >> > > >> > > attribute is allowed when there is no port associated with the > >> > > >> > > security group." > >> > > >> > > > >> > > >> > > The meaning of the API is left for users to deduce. It's customary > >> > > >> > > understood as something like > >> > > >> > > > >> > > >> > > "allowing to bypass connection tracking in the firewall, potentially > >> > > >> > > providing performance and simplicity benefits" (while imposing > >> > > >> > > additional complexity onto rule definitions - the user now has to > >> > > >> > > explicitly define rules for both directions of a duplex connection.) > >> > > >> > > [This is not an official definition, nor it's quoted from a respected > >> > > >> > > source, please don't criticize it. I don't think this is an important > >> > > >> > > point here.] > >> > > >> > > > >> > > >> > > Either way, the definition doesn't explain what should happen with > >> > > >> > > basic network services that a user of Neutron SG API is used to rely > >> > > >> > > on. Specifically, what happens for a port related to a stateless SG > >> > > >> > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > >> > > >> > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > >> > > >> > > procedure to configure its IPv6 stack. > >> > > >> > > > >> > > >> > > As part of our testing of stateless SG implementation for OVN backend, > >> > > >> > > we've noticed that VMs fail to configure via metadata, or use SLAAC to > >> > > >> > > configure IPv6. > >> > > >> > > > >> > > >> > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 > >> > > >> > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 > >> > > >> > > > >> > > >> > > We've noticed that adding explicit SG rules to allow 'returning' > >> > > >> > > communication for 169.254.169.254:80 and RA / NA fixes the problem. > >> > > >> > > > >> > > >> > > I figured that these services are "base" / "basic" and should be > >> > > >> > > provided to ports regardless of the stateful-ness of SG. 
I proposed > >> > > >> > > patches for this here: > >> > > >> > > > >> > > >> > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 > >> > > >> > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 > >> > > >> > > > >> > > >> > > Discussion in the patch that adjusts the existing stateless SG test > >> > > >> > > scenarios to not create explicit SG rules for metadata and ICMP > >> > > >> > > replies suggests that it's not a given / common understanding that > >> > > >> > > these "base" services should work by default for stateless SGs. > >> > > >> > > > >> > > >> > > See discussion in comments here: > >> > > >> > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 > >> > > >> > > > >> > > >> > > While this discussion is happening in the context of OVN, I think it > >> > > >> > > should be resolved in a broader context. Specifically, a decision > >> > > >> > > should be made about what Neutron API "means" by stateless SGs, and > >> > > >> > > how "base" services are supposed to behave. Then backends can act > >> > > >> > > accordingly. > >> > > >> > > > >> > > >> > > There's also an open question of how this should be implemented. > >> > > >> > > Whether Neutron would like to create explicit SG rules visible in API > >> > > >> > > that would allow for the returning traffic and that could be deleted > >> > > >> > > as needed, or whether backends should do it implicitly. We already > >> > > >> > > have "default" egress rules, so there's a precedent here. On the other > >> > > >> > > hand, the egress rules are broad (allowing everything) and there's > >> > > >> > > more rationale to delete them and replace them with tighter filters. > >> > > >> > > In my OVN series, I implement ACLs directly in OVN database, without > >> > > >> > > creating SG rules in Neutron API. > >> > > >> > > > >> > > >> > > So, questions for the community to clarify: > >> > > >> > > - whether Neutron API should define behavior of stateless SGs in general, > >> > > >> > > - if so, whether Neutron API should also define behavior of stateless > >> > > >> > > SGs in terms of "base" services like metadata and DHCP, > >> > > >> > > - if so, whether backends should implement the necessary filters > >> > > >> > > themselves, or Neutron will create default SG rules itself. > >> > > >> > > >> > I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user. > >> > > >> > We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it. > >> > > >> > If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there? 
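As a concrete illustration of that question: with a stateless group today, the "return" half of such traffic has to be opened explicitly by the user, for example (names made up; the --stateless option needs a reasonably recent openstackclient):

    openstack security group create --stateless nostate
    openstack security group rule create --ingress --protocol tcp --remote-ip 169.254.169.254/32 nostate
    openstack security group rule create --ingress --ethertype IPv6 --protocol ipv6-icmp nostate

The first rule admits metadata replies (the SG API cannot match on source TCP port, so it is broader than strictly needed), the second admits RA/NA, assuming the backend does not already allow them transparently.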
> >> > > >> > >> Thanks for clarifying the rationale on picking SG rules and not > >> per-backend implementation. > >> > >> What would be your answer to the two other questions in the list > >> above, specifically, "whether Neutron API should define behavior of > >> stateless SGs in general" and "whether Neutron API should define > >> behavior of stateless SGs in relation to metadata / RA / NA". Once we > >> have agreement on these points, we can discuss the exact mechanism - > >> whether to implement in backend or in API. But these two questions are > >> first order in my view. > >> > >> (To give an idea of my thinking, I believe API definition should not > >> only define fields and their mechanics but also semantics, so > >> > >> - yes, api-ref should define the meaning ("behavior") of stateless SG > >> in general, and > >> - yes, api-ref should also define the meaning ("behavior") of > >> stateless SG in relation to "standard" services like ipv6 addressing > >> or metadata. > >> > >> As to the last question - whether it's up to ml2 backend to implement > >> the behavior, or up to the core SG database plugin - I don't have a > >> strong opinion. I lean to "backend" solution just because it allows > >> for more granular definition because SG rules may not express some > >> filter rules, e.g. source port for metadata replies (an unfortunate > >> limitation of SG API that we inherited from AWS?). But perhaps others > >> prefer paying the price for having neutron ml2 plugin enforcing the > >> behavior consistently across all backends. > >> > >> > > >> > > > >> > > >> > > I hope I laid the problem out clearly, let me know if anything needs > >> > > >> > > clarification or explanation. > >> > > >> > > >> > Yes :) At least for me. > >> > > >> > > >> > > > >> > > >> > > Yours, > >> > > >> > > Ihar > >> > > >> > > > >> > > >> > > > >> > > >> > > > >> > > >> > > >> > > >> > -- > >> > > >> > Slawek Kaplonski > >> > > >> > Principal Software Engineer > >> > > >> > Red Hat > >> > >> From posta at dnzydn.com Wed Mar 29 17:13:25 2023 From: posta at dnzydn.com (Deniz AYDIN) Date: Wed, 29 Mar 2023 20:13:25 +0300 Subject: [neutron] BGP for self-service network Message-ID: Hi, I am looking for options for removing Layer-2 in the underlay as much as possible. The features explained in the document, BGP floating IPs over L2 segmented network , solve the problem for floating IPS where layer 2 is only needed between servers and rack switches. Is there any specific reason that this feature is limited to floating IPS? As long as we have unique BGP next-hops defined for every DVR, it can also be used for advertising self-service networks /32 routes. Thanks for the help Deniz -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Wed Mar 29 18:05:35 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Wed, 29 Mar 2023 20:05:35 +0200 Subject: (Openstack-Nova) In-Reply-To: References: Message-ID: Hey there, If you're using an OpenStack-Ansible as a deployment tool, you can make an override like that in your user_varaibles.yml and run openstack-ansible os-nova-install.yml --limit nova_compute afterwards: nova_compute_init_overrides: Service: LimitNOFILE: 4096 ??, 28 ???. 2023??. ? 21:59, Michael Knox : > > Hi, > > This will be the OS you have rabbit running on. You will need to increase the ulimit. "ulimit -n" will provide the current limit for the installed OS and configuration. So you will need more than what's there. 
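For reference, outside of a deployment tool the same increase is normally applied with a systemd drop-in on whichever unit is actually hitting the limit; a minimal sketch, with an example unit name and value that are not taken from this thread:

    # /etc/systemd/system/nova-compute.service.d/limits.conf
    [Service]
    LimitNOFILE=65536

    # then: systemctl daemon-reload && systemctl restart nova-compute
    # verify: grep 'open files' /proc/$(pgrep -of nova-compute)/limits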
There could also be other configuration issues, a normal default of 1024 isn't low for most uses, but you will need to consider that as part of the increase. > > Cheers > > > > On Tue, Mar 28, 2023 at 2:16?PM Adivya Singh wrote: >> >> Hi Team >> >> I see these error in my syslog related to my nova compute service getting hung while communicating to rabbit-mq service >> >> "A recoverable connection/channel error occurred, trying to reconnect: [Errno 24] Too many open files" >> >> Is this a OS related error, or some thing i can change to get rid of this error >> >> Regards >> Adivya Singh From juliaashleykreger at gmail.com Wed Mar 29 18:14:02 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Wed, 29 Mar 2023 11:14:02 -0700 Subject: [ironic] bug tracking move to launchpad Message-ID: Greetings Ironic! During the PTG yesterday, we discussed the fact we had not moved back to Launchpad for bug tracking. Mainly because we are all very busy. The point was raised, why don't we just re-enable launchpad bug tracking, and move our primary usage back to that. Consensus on the call resulted with this proposal, and the consensus to send this email to the mailing list. As such, if nobody objects, I'm going to go turn the bug tracker for Ironic in launchpad back on next Monday. If you object, scream now and/or write a migration script and/or propose another solution. Thanks! -Julia -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Wed Mar 29 18:14:15 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Wed, 29 Mar 2023 20:14:15 +0200 Subject: (Openstack-Designate) rndc key not getting generated in /etc/designate In-Reply-To: References: Message-ID: Hi there, Looking through the code, I really don't see any obvious issue or bug in there. So based on that it sounds that this error might be raised only if you have defined `designate_rndc_keys` variable somewhere (like user_variables) but did not provided `file` key in it, which made this task fail. As `file`, `name`, `secret` and `algorithm` keys are all required ones in this variable. Would be great if you could double-check definition of the variable in your user_variables. ??, 29 ???. 2023??. ? 07:00, Adivya Singh : > > Hi Team, > > My DNS Server located outside the Open Stack, and i am using below variables in my user_variables.yaml File. > > But When i ' m running os-desigante-install.yml Playbook, rndc key are not generating in /etc/designate Folder > > and the playbook fail at the below Task > > {"msg": "The task includes an option with an undefined variable. 
The error was: 'dict object' has no attribute 'file'\n\nThe error appears to be in '/etc/ansible/roles/os_designate/tasks/designate_post_install.yml': line 89, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create Designate rndc key file\n ^ here\n"} > > - name: Create Designate rndc key file > template: > src: rndc.key.j2 > dest: "{{ item.file }}" > owner: "{{ item.owner | default('root') }}" > group: "{{ item.group | default('root') }}" > mode: "{{ item.mode | default('0600') }}" > with_items: "{{ designate_rndc_keys }}" > when: designate_rndc_keys is defined > > and the post-install.yml File looks like this > > Any idea on this, Where i am missing > > > > > > > > ## rndc keys for authenticating with bind9 > # define this to create as many key files as are required > # designate_rndc_keys > # - name: "rndc-key" > # file: /etc/designate/rndc.key > # algorithm: "hmac-md5" > # secret: "" From jay at gr-oss.io Wed Mar 29 18:39:47 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Wed, 29 Mar 2023 11:39:47 -0700 Subject: [ironic] ARM Support in CI: Call for vendors / contributors / interested parties Message-ID: Hi stackers, Ironic has published an experimental Ironic Python Agent image for ARM64 ( https://tarballs.opendev.org/openstack/ironic-python-agent-builder/dib/files/) and discussed promoting this image to supported via CI testing. However, we have a problem: there are no Ironic developers with easy access to ARM hardware at the moment, and no Ironic developers with free time to commit to improving our support of ARM hardware. So we're putting out a call for help: - If you're a hardware vendor and want your ARM hardware supported? Please come talk to the Ironic community about setting up third-party-CI. - Are you an operator or contributor from a company invested in ARM bare metal? Please come join the Ironic community to help us build this support. Thanks, Jay Faulkner Ironic PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.bryant at canonical.com Wed Mar 29 19:13:00 2023 From: corey.bryant at canonical.com (Corey Bryant) Date: Wed, 29 Mar 2023 15:13:00 -0400 Subject: OpenStack 2023.1 Antelope for Ubuntu 22.04 LTS Message-ID: The Ubuntu OpenStack team at Canonical is pleased to announce the general availability of OpenStack 2023.1 Antelope on Ubuntu 22.04 LTS (Jammy Jellyfish). Details of the Antelope release can be found at: https://www.openstack.org/software/antelope The Ubuntu Cloud Archive for OpenStack 2023.1 Antelope can be enabled on Ubuntu 22.04 by running the following command: sudo add-apt-repository cloud-archive:antelope The Ubuntu Cloud Archive for 2023.1 Antelope includes updates for: aodh, barbican, ceilometer, cinder, designate, designate-dashboard, dpdk (22.11.1), glance, gnocchi, heat, heat-dashboard, horizon, ironic, ironic-ui, keystone, magnum, magnum-ui, manila, manila-ui, masakari, mistral, murano, murano-dashboard, networking-arista, networking-bagpipe, networking-baremetal, networking-bgpvpn, networking-hyperv, networking-l2gw, networking-mlnx, networking-odl, networking-sfc, neutron, neutron-dynamic-routing, neutron-fwaas, neutron-taas, neutron-vpnaas, nova, octavia, octavia-dashboard, openstack-trove, openvswitch (3.1.0), ovn (23.03.0), ovn-octavia-provider, placement, sahara, sahara-dashboard, senlin, swift, trove-dashboard, vitrage, watcher, watcher-dashboard, zaqar, and zaqar-ui. 
For a full list of packages and versions, please refer to: https://openstack-ci-reports.ubuntu.com/reports/cloud-archive/antelope_versions.html == Reporting bugs == If you have any issues please report bugs using the ?ubuntu-bug? tool to ensure that bugs get logged in the right place in Launchpad: sudo ubuntu-bug nova-conductor Thank you to everyone who contributed to OpenStack 2023.1 Antelope! Corey (on behalf of the Ubuntu OpenStack Engineering team) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.yip at ardc.edu.au Wed Mar 29 23:10:27 2023 From: jake.yip at ardc.edu.au (Jake Yip) Date: Thu, 30 Mar 2023 10:10:27 +1100 Subject: [Magnum] vPTG summary In-Reply-To: <92954613-d892-ba47-0fbc-51d3adc864b5@ardc.edu.au> References: <92954613-d892-ba47-0fbc-51d3adc864b5@ardc.edu.au> Message-ID: Hi all, We had a good attendance of Magnum developers and operators from different clouds providers - Nectar, StackHPC, Catalyst Cloud NZ, Vexxhost, Cleura. As the Magnum team spans EU and AU/NZ, we have decided to hold the PTG on Wed 0900 UTC so that most of us can make it. One of the main topics we discussed was the progress of ClusterAPI driver in Magnum. This work is ongoing and we hope to have it in this cycle or next. Thanks to the hardworking folks at StackHPC (Matt, johnthetubaguy, Tyler) and Vexxhost (mnaser) driving this initiative. We also discussed the issues with testing in check/gate. Testing for Magnum is quite resource intensive, as it needs to spin up a cluster. This needs more work so we can land patches with more confidence. There will also be more deprecations/removals in this cycle to keep up with Kubernetes. One of the things we agreed on was the removal of PodSecurityPolicy so that we can continue supporting K8S >= v1.25. This would be flagged in an upgrade note containing the upstream instructions[1] on how to migrate to PodSecurity Admission Controller. We briefly touched on the many reports of Magnum not working in (W/X/Y) versions of Kubernetes. It is unfortunate situation; Kubernetes move very quickly and the Kubernetes versions (v1.21 ~ v1.23) we have developed for in Yoga is already EOL. In addition, there are a few incompatible changes that happened from v1.21 to v1.25 that makes backporting newer K8S support to W/X/Y/Z challenging. We will ease this hump as much as possible by (1) careful backports, (2) better testing and (3) better documentation. It is still a big barrier to new users, and we hope to leave this behind with ClusterAPI (my new hope!). I hope I've summarised the vPTG satisfactorily. Feel free to check our etherpad[2] for more details. Last but not least, Matt Pryor and Mohammed Naser will be giving a talk "Magnum Episode IV - A New Hope: OpenStack, Kubernetes and ClusterAPI" at the Vancouver summit. Please give them your support! [1] https://kubernetes.io/docs/tasks/configure-pod-container/migrate-from-psp/ [2] https://etherpad.opendev.org/p/march2023-ptg-magnum Regards, Jake Yip On behalf of Magnum Team -- Jake Yip Technical Lead, Nectar Research Cloud From tkajinam at redhat.com Thu Mar 30 02:20:43 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 11:20:43 +0900 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints Message-ID: Hello, I have had some local discussions with gmann, but I'd really like to move this discussion forward to fix the broken stable/xena gate in heat so I will start this thread, hoping the thread can provide more context behind my proposal. 
Historically stable branches of heat have been frequently affected by any change in requirements of tempest. This is mainly because in our CI we install our own in-tree integration tests[1] into tempest venv where tempest and heat-tempest-plugin are installed. Because in-tree integration tests are tied to that specific stable branch, this has been often causing conflicts in requirements (master constraint vs stable/X constraint). [1] https://github.com/openstack/heat/tree/master/heat_integrationtests In the past we changed our test installation[2] to use stable constraint to avoid this conflicts, but this approach does no longer work since stable/xena because 1. stable/xena u-c no longer includes tempest 2. latest tempest CAN'T be installed with stable/xena u-c because current tempest requires fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. [2] https://review.opendev.org/c/openstack/heat/+/803890 https://review.opendev.org/c/openstack/heat/+/848215 I've proposed the change to pin tempest[3] in stable/xena u-c so that people can install tempest with stable/xena u-c. [3] https://review.opendev.org/c/openstack/requirements/+/878228 I understand the reason tempest was removed from u-c was that we should use the latest tempest to test recent stable releases.I agree we can keep tempest excluded for stable/yoga and onwards because tempest is installable with their u-c, but stable/xena u-c is no longer compatible with master. Adding pin to xena u-c does not mainly affect the policy to test stable branches with latest tempest because for that we anyway need to use more recent u-c. I'm still trying to find out the workaround within heat but IMO adding tempest pin to stable/xena u-c is harmless but beneficial in case anyone is trying to use tempest with stable/xena u-c. Thank you, Takashi Kajinami -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 30 02:46:51 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 11:46:51 +0900 Subject: [heat][magnum][tacker] Future of SoftwareDeployment support Message-ID: Hello, We discussed this briefly in the past thread where we discussed maintenance of os-*-agent repos, and also talked about this topic during Heat PTG, but I'd like to formalize the discussion to get a clear agreement. Heat has been supporting SoftwareDeployment resources to configure software in instances using some agents such as os-collect-config[1]. [1] https://docs.openstack.org/heat/latest/template_guide/software_deployment.html#software-deployment-resources This feature was initially developed to be used by TripleO (IIUC), but TripleO is retired now and we are losing the first motivation to maintain the feature. # Even TripleO replaced most of its usage of softwaredeployment by config-download lately. Because the heat project team has drunk dramatically recently, we'd like to put more focus on core features. For that aim we are now wondering if we can deprecate and remove this feature, and would like to hear from anyone who has any concerns about this. Quickly looking through the repos, it seems currently Magnum and Tacker are using SoftwareDeployment, and it'd be nice especially if we can understand their current requirements. 1. Magnum It seems SoftwareDeployment is used by k8s_fedora_atomic_v1 driver but I'm not too sure whether this driver is still supported, because Fedora Atomic was EOLed a while ago, right ? 2. 
Tacker SoftwareDeployment can be found in only test code in the tacker repo. We have some references kept in heat-translator which look related to TOSCA templates. Thank you, Takashi Kajinami -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 30 04:10:20 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 13:10:20 +0900 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy Message-ID: Hello, Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate jobs have become very flaky. Further investigation revealed that the issue is related to something in libvirt from Ubuntu Jammy and that prevents detaching devices from instances[1]. The same problem appears in different jobs[2] and we workaround the problem by disabling some affected jobs. In heat we also disabled some flaky tests but because of this we no longer run basic scenario tests which deploys instance/volume/network in a single stack, which means we lost the quite basic test coverage. My question is, is there anyone in the Nova team working on "fixing" this problem ? We might be able to implement some workaround (like checking status of the instances before attempting to delete it) but this should be fixed in libvirt side IMO, as this looks like a "regression" in Ubuntu Jammy. Probably we should report a bug against the libvirt package in Ubuntu but I'd like to hear some thoughts from the nova team because they are more directly affected by this problem. I'm now trying to set up a centos stream 9 job in Heat repo to see whether this can be reproduced if we use centos stream 9. I've been running that specific scenario test in centos stream 9 jobs in puppet repos but I've never seen this issue, so I suspect the issue is really specific to libvirt in Jammy. [1] https://bugs.launchpad.net/nova/+bug/1998274 [2] https://bugs.launchpad.net/nova/+bug/1998148 Thank you, Takashi -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 30 06:29:47 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 30 Mar 2023 08:29:47 +0200 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: I know, I'm whining a lot about usage of u-c for such projects, but I'm just gonna say that u-c is also might be used for tempest installation itself. So if you're trying to install specific tempest version from the requirements file with providing u-c as constraints while having tempest in u-c - this will break due to pip being unable to resolve that. And installing tempest without constraints also tends to break. I've used a workaround to filter out u-c to drop tempest from them until xena, so moving this back and force is a bit annoying for the end users. I know nobody agrees with me here, but I do see u-c as an instruction for end users on how to build their venvs (because these constraints are tested!) to install openstack projects (can build analogy to poetry here) and not CI thing only. Eventually, we see more troubles with time not in tempest itself, but in tempest plugins, when a new test being added to the plugin, that requires new API but not verifying API microversion or feature availability. These kind of failures we experience quite regularly, couple time during any given cycle, which made us also pin tempest plugin versions in requirements with for every release. 
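To make that constraints juggling concrete, the filter-out workaround looks roughly like this (illustrative shell, not quoted from anyone's tooling):

    curl -o u-c.txt https://releases.openstack.org/constraints/upper/xena
    sed -i '/^tempest===/d' u-c.txt        # drop any tempest pin from the constraints
    pip install -c u-c.txt tempest         # or a specific, known-compatible tempest release

whereas the proposal in this thread is to put a suitable tempest pin back into the published stable/xena upper-constraints.txt so that consumers do not need the extra step.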
Also I have a feeling that a lot of times we're treating tempest as a CI-only thing, which is also weird and not true for me, since it's valuable tool for operators and being leveraged by rally or refstack to ensure state of production environments. ??, 30 ???. 2023 ?., 04:24 Takashi Kajinami : > Hello, > > > I have had some local discussions with gmann, but I'd really like to move > this discussion forward > to fix the broken stable/xena gate in heat so I will start this thread, > hoping the thread can provide > more context behind my proposal. > > Historically stable branches of heat have been frequently affected by any > change in requirements > of tempest. This is mainly because in our CI we install our own in-tree > integration tests[1] into > tempest venv where tempest and heat-tempest-plugin are installed. Because > in-tree integration tests > are tied to that specific stable branch, this has been often causing > conflicts in requirements > (master constraint vs stable/X constraint). > > [1] > https://github.com/openstack/heat/tree/master/heat_integrationtests > > In the past we changed our test installation[2] to use stable constraint > to avoid this conflicts, > but this approach does no longer work since stable/xena because > > 1. stable/xena u-c no longer includes tempest > > 2. latest tempest CAN'T be installed with stable/xena u-c because current > tempest requires > fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. > > [2] > https://review.opendev.org/c/openstack/heat/+/803890 > https://review.opendev.org/c/openstack/heat/+/848215 > > I've proposed the change to pin tempest[3] in stable/xena u-c so that > people can install tempest > with stable/xena u-c. > [3] https://review.opendev.org/c/openstack/requirements/+/878228 > > I understand the reason tempest was removed from u-c was that we should > use the latest tempest > to test recent stable releases.I agree we can keep tempest excluded for > stable/yoga and onwards > because tempest is installable with their u-c, but stable/xena u-c is no > longer compatible with master. > Adding pin to xena u-c does not mainly affect the policy to test stable > branches with latest tempest > because for that we anyway need to use more recent u-c. > > I'm still trying to find out the workaround within heat but IMO adding > tempest pin to stable/xena u-c > is harmless but beneficial in case anyone is trying to use tempest with > stable/xena u-c. > > Thank you, > Takashi Kajinami > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 30 06:54:32 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 15:54:32 +0900 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: On Thu, Mar 30, 2023 at 3:35?PM Dmitriy Rabotyagov wrote: > I know, I'm whining a lot about usage of u-c for such projects, but I'm > just gonna say that u-c is also might be used for tempest installation > itself. So if you're trying to install specific tempest version from the > requirements file with providing u-c as constraints while having tempest in > u-c - this will break due to pip being unable to resolve that. > To support installing a specific tempest with older stable u-c we probably can try adding upper version instead of requiring a specific version ( like <= 33.0.0 instead of === 33.0.0 ), though I guess this might not be accepted by pip. > And installing tempest without constraints also tends to break. 
> I didn't really get this point. Do you mind elaborating on this ? > > I've used a workaround to filter out u-c to drop tempest from them until > xena, so moving this back and force is a bit annoying for the end users. > > I know nobody agrees with me here, but I do see u-c as an instruction for > end users on how to build their venvs (because these constraints are > tested!) to install openstack projects (can build analogy to poetry here) > and not CI thing only. > > Eventually, we see more troubles with time not in tempest itself, but in > tempest plugins, when a new test being added to the plugin, that requires > new API but not verifying API microversion or feature availability. These > kind of failures we experience quite regularly, couple time during any > given cycle, which made us also pin tempest plugin versions in requirements > with for every release. > > Also I have a feeling that a lot of times we're treating tempest as a > CI-only thing, which is also weird and not true for me, since it's valuable > tool for operators and being leveraged by rally or refstack to ensure state > of production environments. > > I tend to agree with these points and these would be the problem caused mainly by the fact tempest is branchless, IMHO. > > ??, 30 ???. 2023 ?., 04:24 Takashi Kajinami : > >> Hello, >> >> >> I have had some local discussions with gmann, but I'd really like to move >> this discussion forward >> to fix the broken stable/xena gate in heat so I will start this thread, >> hoping the thread can provide >> more context behind my proposal. >> >> Historically stable branches of heat have been frequently affected by any >> change in requirements >> of tempest. This is mainly because in our CI we install our own in-tree >> integration tests[1] into >> tempest venv where tempest and heat-tempest-plugin are installed. Because >> in-tree integration tests >> are tied to that specific stable branch, this has been often causing >> conflicts in requirements >> (master constraint vs stable/X constraint). >> >> [1] >> https://github.com/openstack/heat/tree/master/heat_integrationtests >> >> In the past we changed our test installation[2] to use stable constraint >> to avoid this conflicts, >> but this approach does no longer work since stable/xena because >> >> 1. stable/xena u-c no longer includes tempest >> >> 2. latest tempest CAN'T be installed with stable/xena u-c because current >> tempest requires >> fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. >> >> [2] >> https://review.opendev.org/c/openstack/heat/+/803890 >> https://review.opendev.org/c/openstack/heat/+/848215 >> >> I've proposed the change to pin tempest[3] in stable/xena u-c so that >> people can install tempest >> with stable/xena u-c. >> [3] https://review.opendev.org/c/openstack/requirements/+/878228 >> >> I understand the reason tempest was removed from u-c was that we should >> use the latest tempest >> to test recent stable releases.I agree we can keep tempest excluded for >> stable/yoga and onwards >> because tempest is installable with their u-c, but stable/xena u-c is no >> longer compatible with master. >> Adding pin to xena u-c does not mainly affect the policy to test stable >> branches with latest tempest >> because for that we anyway need to use more recent u-c. >> >> I'm still trying to find out the workaround within heat but IMO adding >> tempest pin to stable/xena u-c >> is harmless but beneficial in case anyone is trying to use tempest with >> stable/xena u-c. 
>> >> Thank you, >> Takashi Kajinami >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 30 07:06:48 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 30 Mar 2023 09:06:48 +0200 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: > And installing tempest without constraints also tends to break Basically what I meant is, if user decides not to use u-c for installing tempest due to the conflict, as tempest is part of u-c, then this also tended to fail due to having too fresh libraries, so this older tempest (or even master) is no longer compatible. But yeah, master tempest will be fixed soonish to match these newer dependencies, still you can easily get unlucky. So what I was saying - it's better to use u-c for installing tempest itself. ??, 30 ???. 2023 ?., 08:54 Takashi Kajinami : > > > On Thu, Mar 30, 2023 at 3:35?PM Dmitriy Rabotyagov < > noonedeadpunk at gmail.com> wrote: > >> I know, I'm whining a lot about usage of u-c for such projects, but I'm >> just gonna say that u-c is also might be used for tempest installation >> itself. So if you're trying to install specific tempest version from the >> requirements file with providing u-c as constraints while having tempest in >> u-c - this will break due to pip being unable to resolve that. >> > > To support installing a specific tempest with older stable u-c we probably > can try adding upper version > instead of requiring a specific version ( like <= 33.0.0 instead of === > 33.0.0 ), though I guess this might > not be accepted by pip. > > > >> And installing tempest without constraints also tends to break. >> > > I didn't really get this point. Do you mind elaborating on this ? > > >> >> I've used a workaround to filter out u-c to drop tempest from them until >> xena, so moving this back and force is a bit annoying for the end users. >> >> I know nobody agrees with me here, but I do see u-c as an instruction for >> end users on how to build their venvs (because these constraints are >> tested!) to install openstack projects (can build analogy to poetry here) >> and not CI thing only. >> >> Eventually, we see more troubles with time not in tempest itself, but in >> tempest plugins, when a new test being added to the plugin, that requires >> new API but not verifying API microversion or feature availability. These >> kind of failures we experience quite regularly, couple time during any >> given cycle, which made us also pin tempest plugin versions in requirements >> with for every release. >> >> Also I have a feeling that a lot of times we're treating tempest as a >> CI-only thing, which is also weird and not true for me, since it's valuable >> tool for operators and being leveraged by rally or refstack to ensure state >> of production environments. >> >> I tend to agree with these points and these would be the problem caused > mainly by the fact > tempest is branchless, IMHO. > > >> >> ??, 30 ???. 2023 ?., 04:24 Takashi Kajinami : >> >>> Hello, >>> >>> >>> I have had some local discussions with gmann, but I'd really like to >>> move this discussion forward >>> to fix the broken stable/xena gate in heat so I will start this thread, >>> hoping the thread can provide >>> more context behind my proposal. >>> >>> Historically stable branches of heat have been frequently affected by >>> any change in requirements >>> of tempest. 
This is mainly because in our CI we install our own in-tree >>> integration tests[1] into >>> tempest venv where tempest and heat-tempest-plugin are installed. >>> Because in-tree integration tests >>> are tied to that specific stable branch, this has been often causing >>> conflicts in requirements >>> (master constraint vs stable/X constraint). >>> >>> [1] >>> https://github.com/openstack/heat/tree/master/heat_integrationtests >>> >>> In the past we changed our test installation[2] to use stable constraint >>> to avoid this conflicts, >>> but this approach does no longer work since stable/xena because >>> >>> 1. stable/xena u-c no longer includes tempest >>> >>> 2. latest tempest CAN'T be installed with stable/xena u-c because >>> current tempest requires >>> fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. >>> >>> [2] >>> https://review.opendev.org/c/openstack/heat/+/803890 >>> https://review.opendev.org/c/openstack/heat/+/848215 >>> >>> I've proposed the change to pin tempest[3] in stable/xena u-c so that >>> people can install tempest >>> with stable/xena u-c. >>> [3] https://review.opendev.org/c/openstack/requirements/+/878228 >>> >>> I understand the reason tempest was removed from u-c was that we should >>> use the latest tempest >>> to test recent stable releases.I agree we can keep tempest excluded for >>> stable/yoga and onwards >>> because tempest is installable with their u-c, but stable/xena u-c is no >>> longer compatible with master. >>> Adding pin to xena u-c does not mainly affect the policy to test stable >>> branches with latest tempest >>> because for that we anyway need to use more recent u-c. >>> >>> I'm still trying to find out the workaround within heat but IMO adding >>> tempest pin to stable/xena u-c >>> is harmless but beneficial in case anyone is trying to use tempest with >>> stable/xena u-c. >>> >>> Thank you, >>> Takashi Kajinami >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From lpetrut at cloudbasesolutions.com Thu Mar 30 07:11:38 2023 From: lpetrut at cloudbasesolutions.com (Lucian Petrut) Date: Thu, 30 Mar 2023 07:11:38 +0000 Subject: [ptl] Need PTL volunteer for OpenStack Winstackers In-Reply-To: <1870a6b7a1d.114e70a2d994244.3514791188773000084@ghanshyammann.com> References: <1870a6b7a1d.114e70a2d994244.3514791188773000084@ghanshyammann.com> Message-ID: Hi, Thanks for reaching out. As mentioned here [1], Cloudbase Solutions can no longer lead the Winstackers project. Since there weren?t any other interested parties, I think there?s no other option but to retire the project. [1] https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031044.html Regards, Lucian Petrut On 22 Mar 2023, at 19:43, Ghanshyam Mann wrote: Hi Lukas, I am reaching out to you as you were PTL for OpenStack Winstackers project in the last cycle. There is no PTL candidate for the next cycle (2023.2), and it is on the leaderless project list. Please check if you or anyone you know would like to lead this project. - https://etherpad.opendev.org/p/2023.2-leaderless Also, if anyone else would like to help leading this project, this is time to let TC knows. -gmann -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skaplons at redhat.com Thu Mar 30 08:07:29 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 30 Mar 2023 10:07:29 +0200 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: Message-ID: <5996164.lOV4Wx5bFT@p1> Hi, Dnia ?roda, 29 marca 2023 18:45:26 CEST Ihar Hrachyshka pisze: > To close the loop, > > We had a very productive discussion of the topic during vPTG today. > Some of it is captured here: > https://etherpad.opendev.org/p/neutron-bobcat-ptg#L207 and below. Here > is the brief plus next steps. > > In regards to api-ref definitions for stateless SG: > - it is agreed that it should explain the semantics and not only > mechanics of API fields; > - it is agreed that it should explain behavior of basic network services; > - it is agreed that basic network services that are expected to work > by default are things like ARP, DHCP; while metadata service is not; - > this will mimic what OVS implementation of stateless SG already does; > - it is agreed that these basic services that are expected to work > will work transparently, meaning no SG rules will be visible for them; > - this will mimic OVS implementation too. > > Next steps: > - update api-ref stateless SG description to capture decisions above; > - update my neutron patch series to exclude metadata enablement; > - adjust tempest scenarios for stateless SG to not create explicit SG > rules for DHCPv6 stateless (there are already patches for that); > - clean up Launchpad bugs as per decisions above. > > I will take care of the above in next days. Thx Ihar for summary of the yesterday's discussion and for taking care of it. > > Thanks everyone, > Ihar > > On Wed, Mar 22, 2023 at 12:55?PM Ihar Hrachyshka wrote: > > > > On Tue, Mar 21, 2023 at 12:07?PM Rodolfo Alonso Hernandez > > wrote: > > > > > > Hello: > > > > > > I agree with having a single API meaning for all backends. We currently support stateless SGs in iptables and ML2/OVN and both backends provide the same behaviour: a rule won't create an opposite direction counterpart by default, the user needs to define it explicitly. > > > > Thanks for this, I didn't realize that iptables may be considered prior art. > > > > > > > > The discussion here could be the default behaviour for standard services: > > > * DHCP service is currently supported in iptables, native OVS and OVN. This should be supported even without any rule allowed (as is now). Of course, we need to explicitly document that. > > > * DHCPv6 [1]: unlike Slawek, I'm in favor of allowing this traffic by default, as part of the DHCP protocol traffic allowance. > > > > Agreed DHCPv6 rules are closer to "base" and that the argument for RA > > / NA flows is stronger because of the parallel to DHCPv4 operation. > > > > > * Metadata service: this is not a network protocol and we should not consider it. Actually this service is working now (with stateful SGs) because of the default SG egress rules we add. So I'm not in favor of [2] > > > > At this point I am more ambivalent to the decision of whether to > > include metadata into the list of "base" services, as long as we > > define the list (behavior) in api-ref. But to address the point, since > > Slawek leans to creating SG rules in Neutron API to handle ICMP > > traffic necessary for RA / NA (which seems to have a merit and > > internal logic) anyway, we could as well at this point create another > > "default" rule for metadata replies. 
> > > > But - I will repeat - as long as a decision on what the list of "base" > > services enabled for any SG by default is, I can live with metadata > > out of the list. It may not be as convenient to users (which is my > > concern), but that's probably a matter of taste in API design. > > > > BTW Rodolfo, thanks for allocating a time slot for this discussion at > > vPTG. I hope we get to the bottom of it then. See you all next Wed > > @13:00. (As per https://etherpad.opendev.org/p/neutron-bobcat-ptg) > > > > Ihar > > > > > > > > Regards. > > > > > > [1]https://review.opendev.org/c/openstack/neutron/+/877049 > > > [2]https://review.opendev.org/c/openstack/neutron/+/876659 > > > > > > On Mon, Mar 20, 2023 at 10:19?PM Ihar Hrachyshka wrote: > > >> > > >> On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski wrote: > > >> > > > >> > Hi, > > >> > > > >> > > > >> > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > > >> > > > >> > > Hi all, > > >> > > > >> > > > > >> > > > >> > > (I've tagged the thread with [ovn] because this question was raised in > > >> > > > >> > > the context of OVN, but it really is about the intent of neutron > > >> > > > >> > > stateless SG API.) > > >> > > > >> > > > > >> > > > >> > > Neutron API supports 'stateless' field for security groups: > > >> > > > >> > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > > >> > > > >> > > > > >> > > > >> > > The API reference doesn't explain the intent of the API, merely > > >> > > > >> > > walking through the field mechanics, as in > > >> > > > >> > > > > >> > > > >> > > "The stateful security group extension (stateful-security-group) adds > > >> > > > >> > > the stateful field to security groups, allowing users to configure > > >> > > > >> > > stateful or stateless security groups for ports. The existing security > > >> > > > >> > > groups will all be considered as stateful. Update of the stateful > > >> > > > >> > > attribute is allowed when there is no port associated with the > > >> > > > >> > > security group." > > >> > > > >> > > > > >> > > > >> > > The meaning of the API is left for users to deduce. It's customary > > >> > > > >> > > understood as something like > > >> > > > >> > > > > >> > > > >> > > "allowing to bypass connection tracking in the firewall, potentially > > >> > > > >> > > providing performance and simplicity benefits" (while imposing > > >> > > > >> > > additional complexity onto rule definitions - the user now has to > > >> > > > >> > > explicitly define rules for both directions of a duplex connection.) > > >> > > > >> > > [This is not an official definition, nor it's quoted from a respected > > >> > > > >> > > source, please don't criticize it. I don't think this is an important > > >> > > > >> > > point here.] > > >> > > > >> > > > > >> > > > >> > > Either way, the definition doesn't explain what should happen with > > >> > > > >> > > basic network services that a user of Neutron SG API is used to rely > > >> > > > >> > > on. Specifically, what happens for a port related to a stateless SG > > >> > > > >> > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > > >> > > > >> > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > > >> > > > >> > > procedure to configure its IPv6 stack. 
> > >> > > > >> > > > > >> > > > >> > > As part of our testing of stateless SG implementation for OVN backend, > > >> > > > >> > > we've noticed that VMs fail to configure via metadata, or use SLAAC to > > >> > > > >> > > configure IPv6. > > >> > > > >> > > > > >> > > > >> > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 > > >> > > > >> > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 > > >> > > > >> > > > > >> > > > >> > > We've noticed that adding explicit SG rules to allow 'returning' > > >> > > > >> > > communication for 169.254.169.254:80 and RA / NA fixes the problem. > > >> > > > >> > > > > >> > > > >> > > I figured that these services are "base" / "basic" and should be > > >> > > > >> > > provided to ports regardless of the stateful-ness of SG. I proposed > > >> > > > >> > > patches for this here: > > >> > > > >> > > > > >> > > > >> > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 > > >> > > > >> > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 > > >> > > > >> > > > > >> > > > >> > > Discussion in the patch that adjusts the existing stateless SG test > > >> > > > >> > > scenarios to not create explicit SG rules for metadata and ICMP > > >> > > > >> > > replies suggests that it's not a given / common understanding that > > >> > > > >> > > these "base" services should work by default for stateless SGs. > > >> > > > >> > > > > >> > > > >> > > See discussion in comments here: > > >> > > > >> > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 > > >> > > > >> > > > > >> > > > >> > > While this discussion is happening in the context of OVN, I think it > > >> > > > >> > > should be resolved in a broader context. Specifically, a decision > > >> > > > >> > > should be made about what Neutron API "means" by stateless SGs, and > > >> > > > >> > > how "base" services are supposed to behave. Then backends can act > > >> > > > >> > > accordingly. > > >> > > > >> > > > > >> > > > >> > > There's also an open question of how this should be implemented. > > >> > > > >> > > Whether Neutron would like to create explicit SG rules visible in API > > >> > > > >> > > that would allow for the returning traffic and that could be deleted > > >> > > > >> > > as needed, or whether backends should do it implicitly. We already > > >> > > > >> > > have "default" egress rules, so there's a precedent here. On the other > > >> > > > >> > > hand, the egress rules are broad (allowing everything) and there's > > >> > > > >> > > more rationale to delete them and replace them with tighter filters. > > >> > > > >> > > In my OVN series, I implement ACLs directly in OVN database, without > > >> > > > >> > > creating SG rules in Neutron API. > > >> > > > >> > > > > >> > > > >> > > So, questions for the community to clarify: > > >> > > > >> > > - whether Neutron API should define behavior of stateless SGs in general, > > >> > > > >> > > - if so, whether Neutron API should also define behavior of stateless > > >> > > > >> > > SGs in terms of "base" services like metadata and DHCP, > > >> > > > >> > > - if so, whether backends should implement the necessary filters > > >> > > > >> > > themselves, or Neutron will create default SG rules itself. > > >> > > > >> > > > >> > I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user. 
> > >> > > > >> > We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it. > > >> > > > >> > If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there? > > >> > > > >> > > >> Thanks for clarifying the rationale on picking SG rules and not > > >> per-backend implementation. > > >> > > >> What would be your answer to the two other questions in the list > > >> above, specifically, "whether Neutron API should define behavior of > > >> stateless SGs in general" and "whether Neutron API should define > > >> behavior of stateless SGs in relation to metadata / RA / NA". Once we > > >> have agreement on these points, we can discuss the exact mechanism - > > >> whether to implement in backend or in API. But these two questions are > > >> first order in my view. > > >> > > >> (To give an idea of my thinking, I believe API definition should not > > >> only define fields and their mechanics but also semantics, so > > >> > > >> - yes, api-ref should define the meaning ("behavior") of stateless SG > > >> in general, and > > >> - yes, api-ref should also define the meaning ("behavior") of > > >> stateless SG in relation to "standard" services like ipv6 addressing > > >> or metadata. > > >> > > >> As to the last question - whether it's up to ml2 backend to implement > > >> the behavior, or up to the core SG database plugin - I don't have a > > >> strong opinion. I lean to "backend" solution just because it allows > > >> for more granular definition because SG rules may not express some > > >> filter rules, e.g. source port for metadata replies (an unfortunate > > >> limitation of SG API that we inherited from AWS?). But perhaps others > > >> prefer paying the price for having neutron ml2 plugin enforcing the > > >> behavior consistently across all backends. > > >> > > >> > > > >> > > > > >> > > > >> > > I hope I laid the problem out clearly, let me know if anything needs > > >> > > > >> > > clarification or explanation. > > >> > > > >> > > > >> > Yes :) At least for me. > > >> > > > >> > > > >> > > > > >> > > > >> > > Yours, > > >> > > > >> > > Ihar > > >> > > > >> > > > > >> > > > >> > > > > >> > > > >> > > > > >> > > > >> > > > >> > > > >> > -- > > >> > > > >> > Slawek Kaplonski > > >> > > > >> > Principal Software Engineer > > >> > > > >> > Red Hat > > >> > > >> > > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. 
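To make the stateless SG discussion above concrete: the "explicit SG rules to allow 'returning' communication" that were used as a workaround (and that the tempest scenarios added) look roughly like the following. This is a sketch only; the security group name stateless-sg is hypothetical, and as noted in the thread the metadata rule cannot be narrowed to source port 80 through the SG API:

$ openstack security group rule create --ingress --ethertype IPv4 \
    --protocol tcp --remote-ip 169.254.169.254/32 stateless-sg
$ openstack security group rule create --ingress --ethertype IPv6 \
    --protocol ipv6-icmp stateless-sg

Per the vPTG outcome summarised above, DHCP-like traffic is expected to work transparently without such rules, while metadata is not part of the "base" services, so a rule like the first one remains the user's responsibility for stateless SGs.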
URL: From suzhengwei at inspur.com Thu Mar 30 08:53:47 2023 From: suzhengwei at inspur.com (=?utf-8?B?U2FtIFN1ICjoi4/mraPkvJ8p?=) Date: Thu, 30 Mar 2023 08:53:47 +0000 Subject: =?utf-8?B?562U5aSNOiBbb3Nsb11baGVhdF1bbWFzYWthcmldW3Nlbmxpbl1bdmVudXNd?= =?utf-8?B?W2FsbF0gb3Nsby5kYiAxMy4wLjAgd2lsbCByZW1vdmUgc3FsYWxjaGVteS1t?= =?utf-8?Q?igrate_support?= In-Reply-To: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> References: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> Message-ID: <23e450dc390b452c8b8129774b94d90e@inspur.com> Hi, Stephen, I have tried to remove the dependency on sqlalchemy-migrate from Masakari. But obviously it is not easy to me. Would you please to take this work? Any help would be very appreciated. -----????----- ???: Stephen Finucane [mailto:stephenfin at redhat.com] ????: 2023?3?23? 0:38 ???: openstack-discuss at lists.openstack.org ??: [oslo][heat][masakari][senlin][venus][all] oslo.db 13.0.0 will remove sqlalchemy-migrate support tl;dr: Projects still relying on sqlalchemy-migrate for migrations need to start their switch to alembic immediately. Projects with "legacy" sqlalchemy-migrated based migrations need to drop them. A quick heads up that oslo.db 13.0.0 will be release in the next month or so and will remove sqlalchemy-migrate support and formally add support for sqlalchemy 2.x. The removal of sqlalchemy-migrate support should only affect projects using oslo.db's sqlalchemy-migrate wrappers, as opposed to using sqlalchemy-migrate directly. For any projects that rely on this functionality, a short-term fix is to vendor the removed code [1] in your project. However, I must emphasise that we're not removing sqlalchemy-migrate integration for the fun of it: it's not compatible with sqlalchemy 2.x and is no longer maintained. If your project uses sqlalchemy-migrate and you haven't migrated to alembic yet, you need to start doing so immediately. If you have migrated to alembic but still have sqlalchemy- migrate "legacy" migrations in-tree, you need to look at dropping these asap. Anything less will result in broken master when we bump upper-constraints to allow sqlalchemy 2.x in Bobcat. I've listed projects in $subject that appear to be using the removed modules. For more advice on migrating to sqlalchemy 2.x and alembic, please look at my previous post on the matter [2]. Cheers, Stephen [1] https://review.opendev.org/c/openstack/oslo.db/+/853025 [2] https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3606 bytes Desc: not available URL: From sbauza at redhat.com Thu Mar 30 09:27:45 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 30 Mar 2023 11:27:45 +0200 Subject: [nova][ptg] Today's agenda (Thursday) Message-ID: Heya again, Yesterday was a very productive day. Thanks folks. Today, we'll have mostly cross-project discussions but we'll also try to discuss about 3 topics : - 13:00 - 14:30 UTC : Nova-Neutron cross-project sessions* in the neutron room (tbd)* - 14:30 - 14:45 UTC : Transition Xena to EM, any concerns ? 
- 14:45 - 15:00 UTC : break - 15:00 - 15:30 UTC : Glance/Cinder/Nova cross-project session about secure glance Direct URLs *in the glance room (newton)* - 15:30 - 16:30 UTC : Cinder/Nova cross-project session* in the nova room (diablo)* - 16:30 - 16:50 UTC : Discuss the next steps with compute hostname robustification - 16:50 - 16:00 UTC : (tentatively) Instance.name is not persisted and just sub'd by every service calling the object Thanks and enjoy this day. -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 30 09:35:34 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 30 Mar 2023 15:05:34 +0530 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: It is actually not that simple, as everything is containerised. To get past this issue i deleted two files by the name of on the braemetal nodes from the directory /etc/libvirt/secrets/ This issue is now resolved. On Tue, Mar 28, 2023 at 5:00?PM Sean Mooney wrote: > On Tue, 2023-03-28 at 06:24 +0530, Swogat Pradhan wrote: > > Update podman logs: > > [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 > > ------------------------------------------------ > > Initializing virsh secrets for: dcn01:openstack > > -------- > > Initializing the virsh secret for 'dcn01' cluster > > (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client > > The /etc/nova/secret.xml file already exists > > error: Failed to set attributes from /etc/nova/secret.xml > > error: internal error: a secret with UUID > > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > > client.openstack secret > > you jsut do "virsh secret-undefine " > > > > > > > On Tue, Mar 28, 2023 at 6:19?AM Swogat Pradhan < > swogatpradhan22 at gmail.com> > > wrote: > > > > > Hi, > > > For some reason, i had to redeploy ceph for my hci nodes and then found > > > that the deployment command is giving out the following error: > > > 2023-03-28 01:49:46.709605 | | > > > WARNING | ERROR: Can't run container nova_libvirt_init_secret > > > stderr: error: Failed to set attributes from /etc/nova/secret.xml > > > error: internal error: a secret with UUID > > > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > > > client.openstack secret > > > 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | > > > FATAL | Create containers managed by Podman for > > > /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | > > > error={"changed": false, "msg": "Failed containers: > > > nova_libvirt_init_secret"} > > > > > > Can you please tell me how I can undefine the existing secret? > > > > > > With regards, > > > Swogat Pradhan > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Thu Mar 30 09:44:02 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 30 Mar 2023 11:44:02 +0200 Subject: [nova][ptg] Today's agenda (Thursday) In-Reply-To: References: Message-ID: Just a short modification : we will use the Cinder room for both the Glance/Nova and Cinder/Nova discussions, starting at 1500UTC. Etherpad is accordingly modified https://etherpad.opendev.org/p/nova-bobcat-ptg#L55 Le jeu. 30 mars 2023 ? 11:27, Sylvain Bauza a ?crit : > Heya again, > > Yesterday was a very productive day. Thanks folks. 
> Today, we'll have mostly cross-project discussions but we'll also try to > discuss about 3 topics : > > > - 13:00 - 14:30 UTC : Nova-Neutron cross-project sessions* in the > neutron room (tbd)* > > > - 14:30 - 14:45 UTC : Transition Xena to EM, any concerns ? > > > - 14:45 - 15:00 UTC : break > > > - 15:00 - 15:30 UTC : Glance/Cinder/Nova cross-project session about > secure glance Direct URLs *in the glance room (newton)* > > > - 15:30 - 16:30 UTC : Cinder/Nova cross-project session* in the nova > room (diablo)* > > > - 16:30 - 16:50 UTC : Discuss the next steps with compute hostname > robustification > - 16:50 - 16:00 UTC : (tentatively) Instance.name is not persisted and > just sub'd by every service calling the object > > Thanks and enjoy this day. > > -Sylvain > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Thu Mar 30 10:10:16 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 30 Mar 2023 12:10:16 +0200 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy In-Reply-To: References: Message-ID: Le jeu. 30 mars 2023 ? 06:16, Takashi Kajinami a ?crit : > Hello, > > > Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate > jobs have > become very flaky. Further investigation revealed that the issue is > related to something > in libvirt from Ubuntu Jammy and that prevents detaching devices from > instances[1]. > > The same problem appears in different jobs[2] and we workaround the > problem by disabling > some affected jobs. In heat we also disabled some flaky tests but because > of this we no longer > run basic scenario tests which deploys instance/volume/network in a single > stack, which means > we lost the quite basic test coverage. > > My question is, is there anyone in the Nova team working on "fixing" this > problem ? > We might be able to implement some workaround (like checking status of the > instances before > attempting to delete it) but this should be fixed in libvirt side IMO, as > this looks like a "regression" > in Ubuntu Jammy. > Probably we should report a bug against the libvirt package in Ubuntu but > I'd like to hear some > thoughts from the nova team because they are more directly affected by > this problem. > > FWIW, we discussed about it yesterday on our vPTG : https://etherpad.opendev.org/p/nova-bobcat-ptg#L289 Most of the problems come from the volume detach thing. We also merged some Tempest changes for not trying to cleanup some volumes if the test was OK (thanks Dan for this). We also added more verifications to ask SSH to wait for a bit of time before calling the instance. Eventually, as you see in the etherpad, we didn't found any solutions but we'll try to add some canary job for testing multiple times volume attachs/detachs. We'll also continue to discuss on the CI failures during every Nova weekly meetings (Tuesdays at 1600UTC on #openstack-nova) and I'll want to ask a cross-project session for the Vancouver pPTG for Tempest/Cinder/Nova and others. I leave other SMEs to reply on your other points, like for c9s. > I'm now trying to set up a centos stream 9 job in Heat repo to see whether > this can be reproduced > if we use centos stream 9. I've been running that specific scenario test > in centos stream 9 jobs > in puppet repos but I've never seen this issue, so I suspect the issue is > really specific to libvirt > in Jammy. 
> Well, maybe I'm wrong, but no, we also have a centos9stream issue for volume detachs : https://bugs.launchpad.net/nova/+bug/1960346 > [1] https://bugs.launchpad.net/nova/+bug/1998274 > [2] https://bugs.launchpad.net/nova/+bug/1998148 > > Thank you, > Takashi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skidoo at tlen.pl Thu Mar 30 10:10:23 2023 From: skidoo at tlen.pl (Luk) Date: Thu, 30 Mar 2023 12:10:23 +0200 Subject: Migration from linuxbridge to ovs Message-ID: <1253710667.20230330121023@tlen.pl> Hello, Can You share some thoughts/ideas or some clues regarding migration from linux bridge to ovs ? Does this migration is posible without interrupting traffic from VMs ? We have now linuxbridge with l3-ha, and we noticed that for example when doing live migration of VM from linuxbridge baked compute to openvswitch compute is created bridge... inside openvswitch, instead adding qvo device to br-int: Bridge brq91dc40ac-ea datapath_type: system Port qvo84e2bd98-e9 Interface qvo84e2bd98-e9 Port brq91dc40ac-ea Interface brq91dc40ac-ea type: internal After removing the brq91dc40ac-ea from ovs, and hard reboot, the qvo interface is added properly to br-int: Bridge br-int Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure datapath_type: system Port qvo84e2bd98-e9 tag: 1 Interface qvo84e2bd98-e9 Also, before hard reboot, there is no flow for br-int or any other openvswitch bridge regarding this VM/ip. Does anyone have same problems ? Have tried to migrate from lb to ovs ? Openstack version: ussuri OS: ubuntu 20 Regards Lukasz From ralonsoh at redhat.com Thu Mar 30 10:27:39 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 30 Mar 2023 12:27:39 +0200 Subject: Migration from linuxbridge to ovs In-Reply-To: <1253710667.20230330121023@tlen.pl> References: <1253710667.20230330121023@tlen.pl> Message-ID: Hi Lukasz: This is happening because you are using the "iptables_hybrid" firewall driver in the OVS agent. That creates a namespace where a set of iptables is defined (firewall rules) and a linux bridge, that is connected to OVS using a veth pair [1]. If you need the native plug implementation, then use the native firewall (or don't use any). That will create a TAP port directly connected to the integration bridge. Regards. [1]https://www.rdoproject.org/networking/networking-in-too-much-detail/ On Thu, Mar 30, 2023 at 12:11?PM Luk wrote: > Hello, > > Can You share some thoughts/ideas or some clues regarding migration from > linux bridge to ovs ? Does this migration is posible without interrupting > traffic from VMs ? > > We have now linuxbridge with l3-ha, and we noticed that for example when > doing live migration of VM from linuxbridge baked compute to openvswitch > compute is created > bridge... inside openvswitch, instead adding qvo device to br-int: > > Bridge brq91dc40ac-ea > datapath_type: system > Port qvo84e2bd98-e9 > Interface qvo84e2bd98-e9 > Port brq91dc40ac-ea > Interface brq91dc40ac-ea > type: internal > > After removing the brq91dc40ac-ea from ovs, and hard reboot, the qvo > interface is added properly to br-int: > > Bridge br-int > Controller "tcp:127.0.0.1:6633" > is_connected: true > fail_mode: secure > datapath_type: system > Port qvo84e2bd98-e9 > tag: 1 > Interface qvo84e2bd98-e9 > > Also, before hard reboot, there is no flow for br-int or any other > openvswitch bridge regarding this VM/ip. > > Does anyone have same problems ? Have tried to migrate from lb to ovs ? 
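For reference on the behaviour described above (see Rodolfo's reply below): it comes from the OVS agent's firewall driver, configured in the [securitygroup] section of openvswitch_agent.ini. A minimal sketch of the native-firewall setting, assuming the stock ML2/OVS layout; note that switching drivers for already-plugged ports generally requires re-plugging the VM (for example via a hard reboot):

[securitygroup]
# native OVS firewall plugs the tap directly into br-int;
# "iptables_hybrid" keeps the qbr bridge and qvb/qvo veth pair instead
firewall_driver = openvswitch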
> > Openstack version: ussuri > OS: ubuntu 20 > > Regards > Lukasz > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 30 10:54:03 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 19:54:03 +0900 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy In-Reply-To: References: Message-ID: Thank you, Sylvain, for all these inputs ! On Thu, Mar 30, 2023 at 7:10?PM Sylvain Bauza wrote: > > > Le jeu. 30 mars 2023 ? 06:16, Takashi Kajinami a > ?crit : > >> Hello, >> >> >> Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate >> jobs have >> become very flaky. Further investigation revealed that the issue is >> related to something >> in libvirt from Ubuntu Jammy and that prevents detaching devices from >> instances[1]. >> >> The same problem appears in different jobs[2] and we workaround the >> problem by disabling >> some affected jobs. In heat we also disabled some flaky tests but because >> of this we no longer >> run basic scenario tests which deploys instance/volume/network in a >> single stack, which means >> we lost the quite basic test coverage. >> >> My question is, is there anyone in the Nova team working on "fixing" this >> problem ? >> We might be able to implement some workaround (like checking status of >> the instances before >> attempting to delete it) but this should be fixed in libvirt side IMO, as >> this looks like a "regression" >> in Ubuntu Jammy. >> Probably we should report a bug against the libvirt package in Ubuntu but >> I'd like to hear some >> thoughts from the nova team because they are more directly affected by >> this problem. >> >> > > FWIW, we discussed about it yesterday on our vPTG : > https://etherpad.opendev.org/p/nova-bobcat-ptg#L289 > > Most of the problems come from the volume detach thing. We also merged > some Tempest changes for not trying to cleanup some volumes if the test was > OK (thanks Dan for this). We also added more verifications to ask SSH to > wait for a bit of time before calling the instance. > Eventually, as you see in the etherpad, we didn't found any solutions but > we'll try to add some canary job for testing multiple times volume > attachs/detachs. > > We'll also continue to discuss on the CI failures during every Nova weekly > meetings (Tuesdays at 1600UTC on #openstack-nova) and I'll want to ask a > cross-project session for the Vancouver pPTG for Tempest/Cinder/Nova and > others. > I leave other SMEs to reply on your other points, like for c9s. > It's good to hear that the issue is still getting attention. I'll catch up the discussion by reading the etherpad and will try to attend follow-up discussions if possible, especially if I can attend Vancouver vPTG. I know some changes have been proposed to check ssh-ability to workaround the problem (though the comment in the vPTG session indicates that does not fully solve the problem) but it's still annoying because we don't really block resource deletions based on instance status (especially its internal status) so we eventually need some solutions here to avoid this problem, IMHO. > >> I'm now trying to set up a centos stream 9 job in Heat repo to see >> whether this can be reproduced >> if we use centos stream 9. I've been running that specific scenario test >> in centos stream 9 jobs >> in puppet repos but I've never seen this issue, so I suspect the issue is >> really specific to libvirt >> in Jammy. 
>> > > > Well, maybe I'm wrong, but no, we also have a centos9stream issue for > volume detachs : > https://bugs.launchpad.net/nova/+bug/1960346 > > I just managed to launch a c9s job in heat but it seems the issue is reproducible in c9s as well[1]. I'll rerun the job a few more times to see how frequent the issue appears in c9s compared to ubuntu. We do not run many tests in puppet jobs so that might be the reason I've never hit it in puppet jobs. [1] https://review.opendev.org/c/openstack/heat/+/879014 > > >> [1] https://bugs.launchpad.net/nova/+bug/1998274 >> [2] https://bugs.launchpad.net/nova/+bug/1998148 >> >> Thank you, >> Takashi >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 30 11:16:09 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 30 Mar 2023 12:16:09 +0100 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy In-Reply-To: References: Message-ID: <8e7154b88655b49c6fb2af3053fd9f7307d246cc.camel@redhat.com> On Thu, 2023-03-30 at 12:10 +0200, Sylvain Bauza wrote: > Le jeu. 30 mars 2023 ? 06:16, Takashi Kajinami a > ?crit : > > > Hello, > > > > > > Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate > > jobs have > > become very flaky. Further investigation revealed that the issue is > > related to something > > in libvirt from Ubuntu Jammy and that prevents detaching devices from > > instances[1]. for what its worth this is not a probelm that is new in jammy it also affect the libvirt/qemu verion in focal and i centos 9 stream. this detach issue was intoduced in qemu as a sideeffect of fixign a security issue. we mostly mitigated the impact on Focal with some tempest changes but not entirly > > > > The same problem appears in different jobs[2] and we workaround the > > problem by disabling > > some affected jobs. In heat we also disabled some flaky tests but because > > of this we no longer > > run basic scenario tests which deploys instance/volume/network in a single > > stack, which means > > we lost the quite basic test coverage. > > > > My question is, is there anyone in the Nova team working on "fixing" this > > problem ? yes and no we cannot fix this in nova as it not a nova issue its a issue with qemu/libvirt and possible cirros. one possible "fix" is to stop using cirros so i did a few things last night first i tried using the ubuntu-minimal-cloud-image this is strip down image that is smaller and uses less memory while it could boot with the normal cirros flavor with 128mb of ram it OOMd cloud-init fortunetly it was after ssh was set up so i could log in but its too close to the memory limit to use. second attempt was to revive my alpine disk image builder serise https://review.opendev.org/c/openstack/diskimage-builder/+/755410 that now works to generate really light weight image (its using about 30mb of ram while idel) i am going to try creating a job that will use that instead of cirros for now im just goign to use a pre playbook to build the image in the job and make destack use that instead. > > We might be able to implement some workaround (like checking status of the > > instances before > > attempting to delete it) but this should be fixed in libvirt side IMO, as > > this looks like a "regression" > > in Ubuntu Jammy. This is not new in Jammy and it should affect RHEL9 i am very very surpsied this is not causeing us a lot of internal pain for our downstream ci as it was breaking centos 9 before it started affecting ubuntu. 
we have seen downstream detach issues but the sshablae changes in tempest mostly helped so this is not just a ubuntu issue its affecting all distros includeing rhel. this is the upstream libvirt bug for the current probelm https://gitlab.com/libvirt/libvirt/-/issues/309 https://bugzilla.redhat.com/show_bug.cgi?id=2087047 is the downstream tracker for the libvirt team to actully fix this i have left a comment there to see if i can move that along. > > Probably we should report a bug against the libvirt package in Ubuntu but > > I'd like to hear some > > thoughts from the nova team because they are more directly affected by > > this problem. > > > > > > FWIW, we discussed about it yesterday on our vPTG : > https://etherpad.opendev.org/p/nova-bobcat-ptg#L289 > > Most of the problems come from the volume detach thing. We also merged some > Tempest changes for not trying to cleanup some volumes if the test was OK > (thanks Dan for this). We also added more verifications to ask SSH to wait > for a bit of time before calling the instance. > Eventually, as you see in the etherpad, we didn't found any solutions but > we'll try to add some canary job for testing multiple times volume > attachs/detachs. > > We'll also continue to discuss on the CI failures during every Nova weekly > meetings (Tuesdays at 1600UTC on #openstack-nova) and I'll want to ask a > cross-project session for the Vancouver pPTG for Tempest/Cinder/Nova and > others. > I leave other SMEs to reply on your other points, like for c9s. c9s hit this before ubuntu did it will not help > > > > I'm now trying to set up a centos stream 9 job in Heat repo to see whether > > this can be reproduced > > if we use centos stream 9. I've been running that specific scenario test > > in centos stream 9 jobs > > in puppet repos but I've never seen this issue, so I suspect the issue is > > really specific to libvirt > > in Jammy. > > > > > Well, maybe I'm wrong, but no, we also have a centos9stream issue for > volume detachs : > https://bugs.launchpad.net/nova/+bug/1960346 > > > > > [1] https://bugs.launchpad.net/nova/+bug/1998274 > > [2] https://bugs.launchpad.net/nova/+bug/1998148 > > > > Thank you, > > Takashi > > From smooney at redhat.com Thu Mar 30 11:18:31 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 30 Mar 2023 12:18:31 +0100 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy In-Reply-To: References: Message-ID: <25dab368b2c68cc18ae83a52927c94561f46a77d.camel@redhat.com> On Thu, 2023-03-30 at 19:54 +0900, Takashi Kajinami wrote: > Thank you, Sylvain, for all these inputs ! > > On Thu, Mar 30, 2023 at 7:10?PM Sylvain Bauza wrote: > > > > > > > Le jeu. 30 mars 2023 ? 06:16, Takashi Kajinami a > > ?crit : > > > > > Hello, > > > > > > > > > Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate > > > jobs have > > > become very flaky. Further investigation revealed that the issue is > > > related to something > > > in libvirt from Ubuntu Jammy and that prevents detaching devices from > > > instances[1]. > > > > > > The same problem appears in different jobs[2] and we workaround the > > > problem by disabling > > > some affected jobs. In heat we also disabled some flaky tests but because > > > of this we no longer > > > run basic scenario tests which deploys instance/volume/network in a > > > single stack, which means > > > we lost the quite basic test coverage. > > > > > > My question is, is there anyone in the Nova team working on "fixing" this > > > problem ? 
> > > We might be able to implement some workaround (like checking status of > > > the instances before > > > attempting to delete it) but this should be fixed in libvirt side IMO, as > > > this looks like a "regression" > > > in Ubuntu Jammy. > > > Probably we should report a bug against the libvirt package in Ubuntu but > > > I'd like to hear some > > > thoughts from the nova team because they are more directly affected by > > > this problem. > > > > > > > > > > FWIW, we discussed about it yesterday on our vPTG : > > https://etherpad.opendev.org/p/nova-bobcat-ptg#L289 > > > > Most of the problems come from the volume detach thing. We also merged > > some Tempest changes for not trying to cleanup some volumes if the test was > > OK (thanks Dan for this). We also added more verifications to ask SSH to > > wait for a bit of time before calling the instance. > > Eventually, as you see in the etherpad, we didn't found any solutions but > > we'll try to add some canary job for testing multiple times volume > > attachs/detachs. > > > > > We'll also continue to discuss on the CI failures during every Nova weekly > > meetings (Tuesdays at 1600UTC on #openstack-nova) and I'll want to ask a > > cross-project session for the Vancouver pPTG for Tempest/Cinder/Nova and > > others. > > I leave other SMEs to reply on your other points, like for c9s. > > > > It's good to hear that the issue is still getting attention. I'll catch up > the discussion by reading the etherpad > and will try to attend follow-up discussions if possible, especially if I > can attend Vancouver vPTG. > > I know some changes have been proposed to check ssh-ability to workaround > the problem (though > the comment in the vPTG session indicates that does not fully solve the > problem) but it's still annoying > because we don't really block resource deletions based on instance status > (especially its internal status) > so we eventually need some solutions here to avoid this problem, IMHO. > > > > > > > I'm now trying to set up a centos stream 9 job in Heat repo to see > > > whether this can be reproduced > > > if we use centos stream 9. I've been running that specific scenario test > > > in centos stream 9 jobs > > > in puppet repos but I've never seen this issue, so I suspect the issue is > > > really specific to libvirt > > > in Jammy. > > > > > > > > > Well, maybe I'm wrong, but no, we also have a centos9stream issue for > > volume detachs : > > https://bugs.launchpad.net/nova/+bug/1960346 > > > > > I just managed to launch a c9s job in heat but it seems the issue is > reproducible in c9s as well[1]. ya i replied in paralle in my other reply i noted that we saw this issue first in c9s then in ubuntu and we also see this in our internal downstram ci. changing the distro we use for the devstack jobs wont help unless we downgrade libvirt and qemu to before the orginal change in lbvirt was done. which would break other things. > I'll rerun the job a few more times to see how frequent the issue appears > in c9s compared to > ubuntu. > We do not run many tests in puppet jobs so that might be the reason I've > never hit it in > puppet jobs. 
> > [1] https://review.opendev.org/c/openstack/heat/+/879014 > > > > > > > > > [1] https://bugs.launchpad.net/nova/+bug/1998274 > > > [2] https://bugs.launchpad.net/nova/+bug/1998148 > > > > > > Thank you, > > > Takashi > > > > > From christian.rohmann at inovex.de Thu Mar 30 12:24:56 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Thu, 30 Mar 2023 14:24:56 +0200 Subject: Migration from linuxbridge to ovs In-Reply-To: <1253710667.20230330121023@tlen.pl> References: <1253710667.20230330121023@tlen.pl> Message-ID: <4aacfba4-0e04-9197-70b8-178005ea6e96@inovex.de> On 30/03/2023 12:10, Luk wrote: > Can You share some thoughts/ideas or some clues regarding migration from linux bridge to ovs ? Does this migration is posible without interrupting traffic from VMs ? I asked a similar questions back in August - https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030070.html, maybe there are some insights there. We did not replace the SDN in place, but as actively looking into setting up a new cloud. Not that we do not believe in the idea of being able to replace the SDN, but we intend to change much much more and migrating through many big changes is too inefficient compared to replacing the cloud with a new one. Regards Christian From ralonsoh at redhat.com Thu Mar 30 12:44:35 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 30 Mar 2023 14:44:35 +0200 Subject: [neutron][ptg] Today's agenda Message-ID: Hello Neutrinos: Same as yesterday, we have a packed agenda. Today we'll have the Nova-Neutron meetings, starting at 13UTC. Quick summary: * delete_on_termination for Neutron ports * Blueprint: "Add support for Napatech LinkVirt SmartNICs" * https://bugs.launchpad.net/neutron/+bug/1986003 * ovn-bgp-agent roadmap * neutron-dynamic-routing: Make static scheduler finally the default? See you in a few minutes. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Mar 30 14:17:39 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 30 Mar 2023 14:17:39 +0000 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> On 2023-03-30 11:20:43 +0900 (+0900), Takashi Kajinami wrote: [...] > latest tempest CAN'T be installed with stable/xena u-c because > current tempest requires fasteners>=0.16.0 which conflicts with > 0.14.1 in stable/xena u-c. [...] Won't this situation sort itself out in a few weeks when the Tempest master branch officially ceases support for stable/xena? But more generally, Tempest isn't expected to be coinstallable with stable branches of projects, it's supposed to be installed into an isolated venv or possibly even onto an entirely separate VM. Why not move the problem tests into the heat-tempest-plugin repository instead, since it should be properly coinstallable with Tempest? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From wodel.youchi at gmail.com Thu Mar 30 20:19:28 2023 From: wodel.youchi at gmail.com (wodel youchi) Date: Thu, 30 Mar 2023 21:19:28 +0100 Subject: [kolla-ansible] deploy cinder and glance with multi-backend Message-ID: Hi, I need to deploy cinder and glance with multi-backend. 
My experience for now is simple, my deployment is an HCI built on top of ceph storage, and I am using it to store both cinder volumes and glance images. I have some questions if you can help: - can multi-backend be deployed at first run with kolla-ansible, or do I need to do at least two runs? - reading the doc about multi-backend, I saw that the first backend is specified, do I need to remove the creation of the first backend from glabals.yml and put it in a configuration file cinder.conf and glance.conf? In other words, do I have to create all the backends from the config files, or can I still create the first one using globals.yml and the rest from the config files? I hope my question is clear. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Thu Mar 30 20:49:43 2023 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 30 Mar 2023 16:49:43 -0400 Subject: [kolla] Image building question Message-ID: Folks, I am playing with kolla image building to understand how it works. I am using the following command to build images and wanted to check with you folks if that is the correct way to do it. $ kolla-build -b ubuntu -t source keystone nova neutron glance Does the above command compile code from source or just download images from remote repositories and re-compile them? because in command output I've not noticed anything related to the compiling process going on. Here is the output of all images produced by kolla-build command. Do I need anything else or is this enough to deploy kolla? root at docker-reg:~# docker images REPOSITORY TAG IMAGE ID CREATED SIZE kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes ago 595MB kolla/cron 15.1.0 342877f26a8a 30 minutes ago 250MB kolla/memcached 15.1.0 0d19a4902644 31 minutes ago 250MB kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes ago 314MB kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes ago 314MB kolla/keepalived 15.1.0 82133b09fbf0 31 minutes ago 260MB kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes ago 262MB e66b228c2a07 31 minutes ago 248MB kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes ago 309MB kolla/fluentd 15.1.0 adfd19027862 33 minutes ago 519MB kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes ago 255MB kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes ago 257MB kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes ago 263MB kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes ago 248MB kolla/glance-api 15.1.0 a2241f68f23a 38 minutes ago 1.04GB kolla/glance-base 15.1.0 7286772a03a4 About an hour ago 1.03GB kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an hour ago 1.05GB kolla/neutron-server 15.1.0 69c844a2e3a9 About an hour ago 1.05GB kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an hour ago 1.05GB 486da9a6562e About an hour ago 1.05GB kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an hour ago 1.04GB kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an hour ago 1.04GB kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an hour ago 1.04GB kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an hour ago 1.04GB kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an hour ago 1.04GB kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an hour ago 1.04GB kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an hour ago 1.04GB kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an hour ago 1.04GB kolla/neutron-base 15.1.0 f85b6a2e2725 About an hour ago 1.04GB kolla/nova-libvirt 15.1.0 0f3ecefe4752 About 
an hour ago 987MB kolla/nova-compute 15.1.0 241b7e7fafbe About an hour ago 1.47GB kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an hour ago 1.15GB kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an hour ago 1.22GB kolla/nova-compute-ironic 15.1.0 716612107532 About an hour ago 1.12GB kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an hour ago 1.11GB kolla/nova-api 15.1.0 2aef02667ff8 About an hour ago 1.11GB kolla/nova-conductor 15.1.0 6f1da3400901 About an hour ago 1.11GB kolla/nova-scheduler 15.1.0 628326776b1d About an hour ago 1.11GB kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an hour ago 1.11GB kolla/nova-base 15.1.0 e47420013283 About an hour ago 1.11GB kolla/keystone 15.1.0 e5530d829d5f 2 hours ago 947MB kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours ago 953MB kolla/keystone-fernet 15.1.0 8a4fa24853a8 2 hours ago 951MB kolla/keystone-base 15.1.0 b6f9562364a9 2 hours ago 945MB kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours ago 915MB kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours ago 915MB kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours ago 893MB kolla/base 15.1.0 f68e9ef3dd30 2 hours ago 248MB registry 2 8db46f9d7550 19 hours ago 24.2MB ubuntu 22.04 08d22c0ceb15 3 weeks ago 77.8MB -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Mar 30 21:24:33 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 30 Mar 2023 14:24:33 -0700 Subject: [ptl] Need PTL volunteer for OpenStack Winstackers In-Reply-To: References: <1870a6b7a1d.114e70a2d994244.3514791188773000084@ghanshyammann.com> Message-ID: <1873468752c.bf93124c180931.3029567703559224707@ghanshyammann.com> Thanks, Lucian, for the updates and email link. As the next step, we will discuss it in TC and take the next action. -gmann ---- On Thu, 30 Mar 2023 00:11:38 -0700 Lucian Petrut wrote --- > Hi, > Thanks for reaching out. As mentioned here [1], Cloudbase Solutions can no longer lead the Winstackers project. Since there weren?t any other interested parties, I think there?s no other option but to retire the project. > [1]?https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031044.html > Regards,Lucian Petrut > > On 22 Mar 2023, at 19:43, Ghanshyam Mann gmann at ghanshyammann.com> wrote: > Hi Lukas, > > I am reaching out to you as you were PTL for OpenStack Winstackers project in the last cycle. > > There is no PTL candidate for the next cycle (2023.2), and it is on the leaderless project list. Please > check if you or anyone you know would like to lead this project. > > - https://etherpad.opendev.org/p/2023.2-leaderless > > Also, if anyone else would like to help leading this project, this is time to let TC knows. > > -gmann > > From gmann at ghanshyammann.com Fri Mar 31 02:12:46 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 30 Mar 2023 19:12:46 -0700 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> References: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> Message-ID: <1873570517d.ffe11d53184242.8810335954828690882@ghanshyammann.com> ---- On Thu, 30 Mar 2023 07:17:39 -0700 Jeremy Stanley wrote --- > On 2023-03-30 11:20:43 +0900 (+0900), Takashi Kajinami wrote: > [...] > > latest tempest CAN'T be installed with stable/xena u-c because > > current tempest requires fasteners>=0.16.0 which conflicts with > > 0.14.1 in stable/xena u-c. > [...] 
> > Won't this situation sort itself out in a few weeks when the Tempest > master branch officially ceases support for stable/xena? > > But more generally, Tempest isn't expected to be coinstallable with > stable branches of projects, it's supposed to be installed into an > isolated venv or possibly even onto an entirely separate VM. Why not > move the problem tests into the heat-tempest-plugin repository > instead, since it should be properly coinstallable with Tempest? This is not related to stable/xena or heat tests. Grenade job running on immediately supported branch from EM branch where the base is EM branch using old tempest and stable constraints and target use master tempest and constraints. When you run tempest on target, it causes an issue as constraints var are not set properly for the target. -gmann > -- > Jeremy Stanley > From gmann at ghanshyammann.com Fri Mar 31 02:26:25 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 30 Mar 2023 19:26:25 -0700 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: <187357cd13c.116421986184361.6976437712376488900@ghanshyammann.com> ---- On Wed, 29 Mar 2023 19:20:43 -0700 Takashi Kajinami wrote --- > Hello, > > I have had some local discussions with gmann, but I'd really like to move this discussion forwardto fix the broken stable/xena gate in heat so I will start this thread, hoping the thread can providemore context behind my proposal. > Historically stable branches of heat have been frequently affected by any change in requirementsof tempest. This is mainly because in our CI we install our own in-tree integration tests[1] intotempest venv where tempest and heat-tempest-plugin are installed. Because in-tree integration testsare tied to that specific stable branch, this has been often causing conflicts in requirements(master constraint vs stable/X constraint). Let me explain the issue here. It is not because of using tempest from upper constraints or branchless things, it is because we are not setting the tempest venv constraints correctly for the target tempest run in the grenade job. We fixed the tempest venv constraints setting for the tempest test run on the base branch[1] but forgot to do the same for the target branch test run. As we do not have any grenade job except heat which is running tempest on the target branch in the grenade job, we could not face this issue and heat testing unhide it. I then reproduce it on a normal grenade job by running the tempest on target and the same issue[2][3]. The issue is when base and target branches have different Tempest and constraints to use (for example, stable/wallaby uses old tempest and stable/wallaby constraints, but stable/xena use tempest master and master constraints); in such cases, we need to set proper constraints defined in devstack and then run tempest. It will happen in the grenade job running on the immediately supported branch of the latest EM. I have pushed the grenade fix[4] and testing it by applying the same in heat[5]. If it work then I will push heat change form master itself and backported till stable/xena, so we fix it for all future EM/stable branches. 
[1] https://review.opendev.org/q/topic:bug%252F2003993 [2] https://review.opendev.org/c/openstack/grenade/+/878247/1 [3] https://zuul.opendev.org/t/openstack/build/1b503d359717459c9c77010608068e27/log/controller/logs/grenade.sh_log.txt#17184 [4] https://review.opendev.org/c/openstack/grenade/+/879113 [5] https://review.opendev.org/c/openstack/heat/+/872055 -gmann > > [1]https://github.com/openstack/heat/tree/master/heat_integrationtests > In the past we changed our test installation[2] to use stable constraint to avoid this conflicts,but this approach does no longer work since stable/xena because > 1. stable/xena u-c no longer includes tempest > 2. latest tempest CAN'T be installed with stable/xena u-c because current tempest requires??? fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. > [2]https://review.opendev.org/c/openstack/heat/+/803890https://review.opendev.org/c/openstack/heat/+/848215 > I've proposed the change to pin tempest[3] in stable/xena u-c so that people can install tempestwith stable/xena u-c. > [3] https://review.opendev.org/c/openstack/requirements/+/878228 > I understand the reason tempest was removed from u-c was that we should use the latest tempestto test recent stable releases.I agree we can keep tempest excluded for stable/yoga and onwardsbecause tempest is installable with their u-c, but stable/xena u-c is no longer compatible with master.Adding pin to xena u-c does not mainly affect the policy to test stable branches with latest tempestbecause for that we anyway need to use more recent u-c. > I'm still trying to find out the workaround within heat but IMO adding tempest pin to stable/xena u-cis harmless but beneficial in case anyone is trying to use tempest with stable/xena u-c. > > Thank you, > Takashi Kajinami > From satish.txt at gmail.com Fri Mar 31 03:05:48 2023 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 30 Mar 2023 23:05:48 -0400 Subject: [kolla] horizon image build failed Message-ID: Folks, All other images build successfully but when i am trying to build horizon which failed with following error: $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-Font-Awesome>=4.7.0.0 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (4.7.0.0) INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 INFO:kolla.common.utils.horizon: Downloading XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) INFO:kolla.common.utils.horizon: ??????????????????????????????????????? 
51.5/51.5 kB 114.3 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 INFO:kolla.common.utils.horizon: Downloading XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) INFO:kolla.common.utils.horizon: 167.9/167.9 kB 12.4 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 INFO:kolla.common.utils.horizon: Downloading XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) INFO:kolla.common.utils.horizon: 58.0/58.0 kB 66.7 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Collecting XStatic-Moment-Timezone>=0.5.22.0 INFO:kolla.common.utils.horizon: Downloading XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) INFO:kolla.common.utils.horizon: 99.7/99.7 kB 45.1 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished with status 'error' INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error INFO:kolla.common.utils.horizon: INFO:kolla.common.utils.horizon: × python setup.py egg_info did not run successfully. INFO:kolla.common.utils.horizon: │ exit code: 1 INFO:kolla.common.utils.horizon: ╰─> [6 lines of output] INFO:kolla.common.utils.horizon: Traceback (most recent call last): INFO:kolla.common.utils.horizon: File "<string>", line 2, in <module> INFO:kolla.common.utils.horizon: File "<pip-setuptools-caller>", line 34, in <module> INFO:kolla.common.utils.horizon: File "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", line 2, in <module> INFO:kolla.common.utils.horizon: from xstatic.pkg import moment_timezone as xs INFO:kolla.common.utils.horizon: ImportError: cannot import name 'moment_timezone' from 'xstatic.pkg' (unknown location) INFO:kolla.common.utils.horizon: [end of output] INFO:kolla.common.utils.horizon: INFO:kolla.common.utils.horizon: note: This error originates from a subprocess, and is likely not a problem with pip. INFO:kolla.common.utils.horizon: INFO:kolla.common.utils.horizon:error: metadata-generation-failed INFO:kolla.common.utils.horizon:× Encountered error while generating package metadata. INFO:kolla.common.utils.horizon:╰─> See above for output. INFO:kolla.common.utils.horizon:note: This is an issue with the package mentioned above, not pip. INFO:kolla.common.utils.horizon:hint: See above for details.
INFO:kolla.common.utils.horizon: INFO:kolla.common.utils.horizon:Removing intermediate container e6cd437ba529 ERROR:kolla.common.utils.horizon:Error'd with the following message ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s horizon-source/* horizon && sed -i /^horizon=/d /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /horizon && mkdir -p /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* /etc/openstack-dashboard/ && cp /horizon/openstack_dashboard/local/local_settings.py.example /etc/openstack-dashboard/local_settings && cp /horizon/manage.py /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /plugins/*; fi && for locale in /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) done && chmod 644 /usr/local/bin/kolla_extend_start' returned a non-zero code: 1 INFO:kolla.common.utils:========================= INFO:kolla.common.utils:Successfully built images INFO:kolla.common.utils:========================= INFO:kolla.common.utils:base INFO:kolla.common.utils:openstack-base INFO:kolla.common.utils:=========================== INFO:kolla.common.utils:Images that failed to build INFO:kolla.common.utils:=========================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Fri Mar 31 03:27:17 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Fri, 31 Mar 2023 12:27:17 +0900 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: <187357cd13c.116421986184361.6976437712376488900@ghanshyammann.com> References: <187357cd13c.116421986184361.6976437712376488900@ghanshyammann.com> Message-ID: On Fri, Mar 31, 2023 at 11:26?AM Ghanshyam Mann wrote: > ---- On Wed, 29 Mar 2023 19:20:43 -0700 Takashi Kajinami wrote --- > > Hello, > > > > I have had some local discussions with gmann, but I'd really like to > move this discussion forwardto fix the broken stable/xena gate in heat so I > will start this thread, hoping the thread can providemore context behind my > proposal. > > Historically stable branches of heat have been frequently affected by > any change in requirementsof tempest. This is mainly because in our CI we > install our own in-tree integration tests[1] intotempest venv where tempest > and heat-tempest-plugin are installed. Because in-tree integration testsare > tied to that specific stable branch, this has been often causing conflicts > in requirements(master constraint vs stable/X constraint). > > Let me explain the issue here. It is not because of using tempest from > upper constraints or branchless things, > it is because we are not setting the tempest venv constraints correctly > for the target tempest run in the grenade job. > > We fixed the tempest venv constraints setting for the tempest test run on > the base branch[1] but forgot to do the same > for the target branch test run. As we do not have any grenade job except > heat which is running tempest on the target branch > in the grenade job, we could not face this issue and heat testing unhide > it. I then reproduce it on a normal grenade job by > running the tempest on target and the same issue[2][3]. 
> > The issue is when base and target branches have different Tempest and > constraints to use (for example, stable/wallaby uses old tempest > and stable/wallaby constraints, but stable/xena use tempest master and > master constraints); in such cases, we need to set proper constraints > defined in devstack and then run tempest. It will happen in the grenade > job running on the immediately supported branch of the latest EM. > This is the core problem in heat, which is conflicting what has been done in heat testing. During tests after upgrade we run not only tempest + heat-tempest-plugin tests but also in-tree heat integration tests which test more actual resources. However in-tree integration tests are dependent on a specific stable/requirement. So when we run tests in stable/xena then we need stable/xena constraints installed in venv, which means we need to install tempest which is compatible with stable/xena uc, rather than master tempest. For this sake, I'm asking for adding requirements so that we can install tempest with stable/xena u-c. (Currently we do not explicitly install tempest but it is installed as a dependency of heat-tempest-plugin. I tried to set an explicit tempest version but it has been failing for some reason.) https://review.opendev.org/c/openstack/heat/+/878610 If running tempest tests after upgrade is not commonly done then we probably can replace tempest by more simple ones as is done for the core services such as keystone, or at least we can get rid of integration tests. Though we still likely face an issue with our normal integration tests which run the same set of tests. > I have pushed the grenade fix[4] and testing it by applying the same in > heat[5]. If it work then I will push heat change > form master itself and backported till stable/xena, so we fix it for all > future EM/stable branches. > > [1] https://review.opendev.org/q/topic:bug%252F2003993 > [2] https://review.opendev.org/c/openstack/grenade/+/878247/1 > [3] > https://zuul.opendev.org/t/openstack/build/1b503d359717459c9c77010608068e27/log/controller/logs/grenade.sh_log.txt#17184 > [4] https://review.opendev.org/c/openstack/grenade/+/879113 > [5] https://review.opendev.org/c/openstack/heat/+/872055 > > > -gmann > > > > > [1]https://github.com/openstack/heat/tree/master/heat_integrationtests > > In the past we changed our test installation[2] to use stable > constraint to avoid this conflicts,but this approach does no longer work > since stable/xena because > > 1. stable/xena u-c no longer includes tempest > > 2. latest tempest CAN'T be installed with stable/xena u-c because > current tempest requires fasteners>=0.16.0 which conflicts with 0.14.1 > in stable/xena u-c. > > [2] > https://review.opendev.org/c/openstack/heat/+/803890https://review.opendev.org/c/openstack/heat/+/848215 > > I've proposed the change to pin tempest[3] in stable/xena u-c so that > people can install tempestwith stable/xena u-c. > > [3] https://review.opendev.org/c/openstack/requirements/+/878228 > > I understand the reason tempest was removed from u-c was that we should > use the latest tempestto test recent stable releases.I agree we can keep > tempest excluded for stable/yoga and onwardsbecause tempest is installable > with their u-c, but stable/xena u-c is no longer compatible with > master.Adding pin to xena u-c does not mainly affect the policy to test > stable branches with latest tempestbecause for that we anyway need to use > more recent u-c. 
> > I'm still trying to find out the workaround within heat but IMO adding > tempest pin to stable/xena u-cis harmless but beneficial in case anyone is > trying to use tempest with stable/xena u-c. > > > > Thank you, > > Takashi Kajinami > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Fri Mar 31 03:38:10 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Fri, 31 Mar 2023 12:38:10 +0900 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: <187357cd13c.116421986184361.6976437712376488900@ghanshyammann.com> Message-ID: On Fri, Mar 31, 2023 at 12:27?PM Takashi Kajinami wrote: > > > On Fri, Mar 31, 2023 at 11:26?AM Ghanshyam Mann > wrote: > >> ---- On Wed, 29 Mar 2023 19:20:43 -0700 Takashi Kajinami wrote --- >> > Hello, >> > >> > I have had some local discussions with gmann, but I'd really like to >> move this discussion forwardto fix the broken stable/xena gate in heat so I >> will start this thread, hoping the thread can providemore context behind my >> proposal. >> > Historically stable branches of heat have been frequently affected by >> any change in requirementsof tempest. This is mainly because in our CI we >> install our own in-tree integration tests[1] intotempest venv where tempest >> and heat-tempest-plugin are installed. Because in-tree integration testsare >> tied to that specific stable branch, this has been often causing conflicts >> in requirements(master constraint vs stable/X constraint). >> >> Let me explain the issue here. It is not because of using tempest from >> upper constraints or branchless things, >> it is because we are not setting the tempest venv constraints correctly >> for the target tempest run in the grenade job. >> >> We fixed the tempest venv constraints setting for the tempest test run on >> the base branch[1] but forgot to do the same >> for the target branch test run. As we do not have any grenade job except >> heat which is running tempest on the target branch >> in the grenade job, we could not face this issue and heat testing unhide >> it. I then reproduce it on a normal grenade job by >> running the tempest on target and the same issue[2][3]. >> >> The issue is when base and target branches have different Tempest and >> constraints to use (for example, stable/wallaby uses old tempest >> and stable/wallaby constraints, but stable/xena use tempest master and >> master constraints); in such cases, we need to set proper constraints >> defined in devstack and then run tempest. It will happen in the grenade >> job running on the immediately supported branch of the latest EM. >> > This is the core problem in heat, which is conflicting what has been done > in heat testing. > During tests after upgrade we run not only tempest + heat-tempest-plugin > tests but also in-tree heat integration tests > which test more actual resources. However in-tree integration tests are > dependent on a specific stable/requirement. > So when we run tests in stable/xena then we need stable/xena constraints > installed in venv, which means we need to > install tempest which is compatible with stable/xena uc, rather than > master tempest. For this sake, I'm asking for > adding requirements so that we can install tempest with stable/xena u-c. > (Currently we do not explicitly install tempest > but it is installed as a dependency of heat-tempest-plugin. I tried to set > an explicit tempest version but it has been failing > for some reason.) 
> > https://review.opendev.org/c/openstack/heat/+/878610 > > If running tempest tests after upgrade is not commonly done then we > probably can replace tempest by more simple > ones as is done for the core services such as keystone, or at least we can > get rid of integration tests. Though we still > likely face an issue with our normal integration tests which run the same > set of tests. > Hmm. Looking at the result of integration tests in https://review.opendev.org/c/openstack/heat/+/872055 , it seems heat integration tests in stable/xena works with master constraints. So we probably can use master constraints for now but in the future when any backport incompatibility affects integration tests then we have to find out the way to switch to stable constraints at that time. > > >> I have pushed the grenade fix[4] and testing it by applying the same in >> heat[5]. If it work then I will push heat change >> form master itself and backported till stable/xena, so we fix it for all >> future EM/stable branches. >> >> [1] https://review.opendev.org/q/topic:bug%252F2003993 >> [2] https://review.opendev.org/c/openstack/grenade/+/878247/1 >> [3] >> https://zuul.opendev.org/t/openstack/build/1b503d359717459c9c77010608068e27/log/controller/logs/grenade.sh_log.txt#17184 >> [4] https://review.opendev.org/c/openstack/grenade/+/879113 >> [5] https://review.opendev.org/c/openstack/heat/+/872055 >> >> >> -gmann >> >> > >> > [1]https://github.com/openstack/heat/tree/master/heat_integrationtests >> > In the past we changed our test installation[2] to use stable >> constraint to avoid this conflicts,but this approach does no longer work >> since stable/xena because >> > 1. stable/xena u-c no longer includes tempest >> > 2. latest tempest CAN'T be installed with stable/xena u-c because >> current tempest requires fasteners>=0.16.0 which conflicts with 0.14.1 >> in stable/xena u-c. >> > [2] >> https://review.opendev.org/c/openstack/heat/+/803890https://review.opendev.org/c/openstack/heat/+/848215 >> > I've proposed the change to pin tempest[3] in stable/xena u-c so that >> people can install tempestwith stable/xena u-c. >> > [3] https://review.opendev.org/c/openstack/requirements/+/878228 >> > I understand the reason tempest was removed from u-c was that we >> should use the latest tempestto test recent stable releases.I agree we can >> keep tempest excluded for stable/yoga and onwardsbecause tempest is >> installable with their u-c, but stable/xena u-c is no longer compatible >> with master.Adding pin to xena u-c does not mainly affect the policy to >> test stable branches with latest tempestbecause for that we anyway need to >> use more recent u-c. >> > I'm still trying to find out the workaround within heat but IMO adding >> tempest pin to stable/xena u-cis harmless but beneficial in case anyone is >> trying to use tempest with stable/xena u-c. >> > >> > Thank you, >> > Takashi Kajinami >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 31 05:51:42 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 31 Mar 2023 07:51:42 +0200 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: Hi Satish, Have you raised a bug in Launchpad (bugs.launchpad.net/kolla) for this? You have also not mentioned what distribution and Kolla release are you using, so please do that in the bug report. 
Looking at the output probably it?s stable/yoga and Debian - being fixed in https://review.opendev.org/c/openstack/kolla/+/873913 Michal > On 31 Mar 2023, at 05:05, Satish Patel wrote: > > Folks, > > All other images build successfully but when i am trying to build horizon which failed with following error: > > $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon > > > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 > INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) > INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 > INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) > INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-Font-Awesome>=4.7.0.0 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (4.7.0.0) > INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 > INFO:kolla.common.utils.horizon: Downloading XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) > INFO:kolla.common.utils.horizon: ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) > INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 > INFO:kolla.common.utils.horizon: Downloading XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) > INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 > INFO:kolla.common.utils.horizon: Downloading XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) > INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-Moment-Timezone>=0.5.22.0 > INFO:kolla.common.utils.horizon: Downloading XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) > INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished with status 'error' > INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run successfully. > INFO:kolla.common.utils.horizon: ? 
exit code: 1 > INFO:kolla.common.utils.horizon: ??> [6 lines of output] > INFO:kolla.common.utils.horizon: Traceback (most recent call last): > INFO:kolla.common.utils.horizon: File "", line 2, in > INFO:kolla.common.utils.horizon: File "", line 34, in > INFO:kolla.common.utils.horizon: File "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", line 2, in > INFO:kolla.common.utils.horizon: from xstatic.pkg import moment_timezone as xs > INFO:kolla.common.utils.horizon: ImportError: cannot import name 'moment_timezone' from 'xstatic.pkg' (unknown location) > INFO:kolla.common.utils.horizon: [end of output] > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: note: This error originates from a subprocess, and is likely not a problem with pip. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:error: metadata-generation-failed > INFO:kolla.common.utils.horizon:? Encountered error while generating package metadata. > INFO:kolla.common.utils.horizon:??> See above for output. > INFO:kolla.common.utils.horizon:note: This is an issue with the package mentioned above, not pip. > INFO:kolla.common.utils.horizon:hint: See above for details. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:Removing intermediate container e6cd437ba529 > ERROR:kolla.common.utils.horizon:Error'd with the following message > ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s horizon-source/* horizon && sed -i /^horizon=/d /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /horizon && mkdir -p /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* /etc/openstack-dashboard/ && cp /horizon/openstack_dashboard/local/local_settings.py.example /etc/openstack-dashboard/local_settings && cp /horizon/manage.py /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /plugins/*; fi && for locale in /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) done && chmod 644 /usr/local/bin/kolla_extend_start' returned a non-zero code: 1 > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:Successfully built images > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:base > INFO:kolla.common.utils:openstack-base > INFO:kolla.common.utils:=========================== > INFO:kolla.common.utils:Images that failed to build > INFO:kolla.common.utils:=========================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From manchandavishal143 at gmail.com Fri Mar 31 05:58:45 2023 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Fri, 31 Mar 2023 11:28:45 +0530 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: JFYI, there is also a bug opened for this issue in the horizon [1]. But no progress as of today. 
Thanks & regards, Vishal Manchanda [1] https://bugs.launchpad.net/horizon/+bug/2007574 On Fri, Mar 31, 2023 at 8:37?AM Satish Patel wrote: > Folks, > > All other images build successfully but when i am trying to build horizon > which failed with following error: > > $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed > horizon > > > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: > XStatic-Font-Awesome>=4.7.0.0 in > /var/lib/kolla/venv/lib/python3.10/site-packages (from > vitrage-dashboard==3.6.1.dev2) (4.7.0.0) > INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) > INFO:kolla.common.utils.horizon: > ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: > XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages > (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) > INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 > INFO:kolla.common.utils.horizon: Downloading > XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 > INFO:kolla.common.utils.horizon: Downloading > XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) > INFO:kolla.common.utils.horizon: > ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting > XStatic-Moment-Timezone>=0.5.22.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) > INFO:kolla.common.utils.horizon: > ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished > with status 'error' > INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run > successfully. > INFO:kolla.common.utils.horizon: ? 
exit code: 1 > INFO:kolla.common.utils.horizon: ??> [6 lines of output] > INFO:kolla.common.utils.horizon: Traceback (most recent call last): > INFO:kolla.common.utils.horizon: File "", line 2, in > > INFO:kolla.common.utils.horizon: File "", > line 34, in > INFO:kolla.common.utils.horizon: File > "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", > line 2, in > INFO:kolla.common.utils.horizon: from xstatic.pkg import > moment_timezone as xs > INFO:kolla.common.utils.horizon: ImportError: cannot import name > 'moment_timezone' from 'xstatic.pkg' (unknown location) > INFO:kolla.common.utils.horizon: [end of output] > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: note: This error originates from a > subprocess, and is likely not a problem with pip. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:error: metadata-generation-failed > INFO:kolla.common.utils.horizon:? Encountered error while generating > package metadata. > INFO:kolla.common.utils.horizon:??> See above for output. > INFO:kolla.common.utils.horizon:note: This is an issue with the package > mentioned above, not pip. > INFO:kolla.common.utils.horizon:hint: See above for details. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:Removing intermediate container > e6cd437ba529 > ERROR:kolla.common.utils.horizon:Error'd with the following message > ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s > horizon-source/* horizon && sed -i /^horizon=/d > /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib > python3 -m pip --no-cache-dir install --upgrade -c > /requirements/upper-constraints.txt /horizon && mkdir -p > /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* > /etc/openstack-dashboard/ && cp > /horizon/openstack_dashboard/local/local_settings.py.example > /etc/openstack-dashboard/local_settings && cp /horizon/manage.py > /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then > SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir > install --upgrade -c /requirements/upper-constraints.txt /plugins/*; > fi && for locale in > /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do > (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) > done && chmod 644 /usr/local/bin/kolla_extend_start' returned a > non-zero code: 1 > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:Successfully built images > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:base > INFO:kolla.common.utils:openstack-base > INFO:kolla.common.utils:=========================== > INFO:kolla.common.utils:Images that failed to build > INFO:kolla.common.utils:=========================== > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 31 06:47:42 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 31 Mar 2023 08:47:42 +0200 Subject: [kolla] Image building question In-Reply-To: References: Message-ID: <8A0037B4-3C63-4EA5-ADC5-282B9246E578@gmail.com> Hello Satish, Only OpenStack is installed from source, all the dependencies (e.g. MariaDB, Apache, etc) are installed from distribution repositories. The set of images you need depends on what you enable in Kolla-Ansible (unless you use a different mechanism for deploying those images). 
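As a concrete follow-up to this point, kolla-ansible can fetch exactly the images required by whatever is enabled in globals.yml, so there is no need to guess the image list by hand. A minimal sketch, assuming the usual multinode inventory layout and a local registry configured through the docker_registry and openstack_tag settings (the names and values shown are illustrative of a typical setup, not taken from this thread):

    # Pull only the images needed for the services enabled in /etc/kolla/globals.yml
    # (e.g. docker_registry: "docker-reg:4000", openstack_tag: "zed")
    kolla-ansible -i ./multinode pull
    # Then deploy as usual
    kolla-ansible -i ./multinode deploy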
Michal > On 30 Mar 2023, at 22:49, Satish Patel wrote: > > Folks, > > I am playing with kolla image building to understand how it works. I am using the following command to build images and wanted to check with you folks if that is the correct way to do it. > > $ kolla-build -b ubuntu -t source keystone nova neutron glance > > Does the above command compile code from source or just download images from remote repositories and re-compile them? because in command output I've not noticed anything related to the compiling process going on. > > Here is the output of all images produced by kolla-build command. Do I need anything else or is this enough to deploy kolla? > > root at docker-reg:~# docker images > REPOSITORY TAG IMAGE ID CREATED SIZE > kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes ago 595MB > kolla/cron 15.1.0 342877f26a8a 30 minutes ago 250MB > kolla/memcached 15.1.0 0d19a4902644 31 minutes ago 250MB > kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes ago 314MB > kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes ago 314MB > kolla/keepalived 15.1.0 82133b09fbf0 31 minutes ago 260MB > kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes ago 262MB > e66b228c2a07 31 minutes ago 248MB > kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes ago 309MB > kolla/fluentd 15.1.0 adfd19027862 33 minutes ago 519MB > kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes ago 255MB > kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes ago 257MB > kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes ago 263MB > kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes ago 248MB > kolla/glance-api 15.1.0 a2241f68f23a 38 minutes ago 1.04GB > kolla/glance-base 15.1.0 7286772a03a4 About an hour ago 1.03GB > kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an hour ago 1.05GB > kolla/neutron-server 15.1.0 69c844a2e3a9 About an hour ago 1.05GB > kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an hour ago 1.05GB > 486da9a6562e About an hour ago 1.05GB > kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an hour ago 1.04GB > kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an hour ago 1.04GB > kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an hour ago 1.04GB > kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an hour ago 1.04GB > kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an hour ago 1.04GB > kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an hour ago 1.04GB > kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an hour ago 1.04GB > kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an hour ago 1.04GB > kolla/neutron-base 15.1.0 f85b6a2e2725 About an hour ago 1.04GB > kolla/nova-libvirt 15.1.0 0f3ecefe4752 About an hour ago 987MB > kolla/nova-compute 15.1.0 241b7e7fafbe About an hour ago 1.47GB > kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an hour ago 1.15GB > kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an hour ago 1.22GB > kolla/nova-compute-ironic 15.1.0 716612107532 About an hour ago 1.12GB > kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an hour ago 1.11GB > kolla/nova-api 15.1.0 2aef02667ff8 About an hour ago 1.11GB > kolla/nova-conductor 15.1.0 6f1da3400901 About an hour ago 1.11GB > kolla/nova-scheduler 15.1.0 628326776b1d About an hour ago 1.11GB > kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an hour ago 1.11GB > kolla/nova-base 15.1.0 e47420013283 About an hour ago 1.11GB > kolla/keystone 15.1.0 e5530d829d5f 2 hours ago 947MB > kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours ago 953MB > kolla/keystone-fernet 15.1.0 
8a4fa24853a8 2 hours ago 951MB > kolla/keystone-base 15.1.0 b6f9562364a9 2 hours ago 945MB > kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours ago 915MB > kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours ago 915MB > kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours ago 893MB > kolla/base 15.1.0 f68e9ef3dd30 2 hours ago 248MB > registry 2 8db46f9d7550 19 hours ago 24.2MB > ubuntu 22.04 08d22c0ceb15 3 weeks ago 77.8MB > > From gthiemonge at redhat.com Fri Mar 31 08:19:49 2023 From: gthiemonge at redhat.com (Gregory Thiemonge) Date: Fri, 31 Mar 2023 10:19:49 +0200 Subject: [Octavia] moving back to Launchpad Message-ID: Hi Folks, During the Antelope PTG, we discussed the move from Storyboard to Launchpad, and in the Bobcat PTG session we decided to switch at the beginning of the B cycle (now). The Octavia Launchpad [0] is now re-enabled, and we are marking all the old entries as Invalid. We don't plan to have an automated migration script; we will manually duplicate the most recent bugs reported in Storyboard (those not yet started). I'm also proposing patches to update the links to the bug tracker in the Octavia projects. Greg [0] https://launchpad.net/octavia -------------- next part -------------- An HTML attachment was scrubbed... URL: From tweining at redhat.com Fri Mar 31 09:50:11 2023 From: tweining at redhat.com (Tom Weininger) Date: Fri, 31 Mar 2023 11:50:11 +0200 Subject: [Octavia] moving back to Launchpad In-Reply-To: References: Message-ID: <7d01bf75-13d0-0008-8535-904378768d46@redhat.com> Thank you Greg for working on this and for coordinating the migration. I'm happy that Octavia is doing this migration now. Best regards, Tom On 31.03.23 10:19, Gregory Thiemonge wrote: > Hi Folks, > > During the Antelope PTG, we discussed the move from Storyboard to Launchpad, and in the > Bobcat PTG session, we decided to switch at the beginning of the B cycle (now). > > The Octavia Launchpad [0] is now re-enabled, we are marking all the old entries as Invalid. > We don't plan to have an automated migration script, we will duplicate manually the most > recent bugs reported in Storyboard (not started bugs). > > I'm also proposing patches to update the links to the bug tracker in the Octavia projects. > > Greg > > [0] https://launchpad.net/octavia From smooney at redhat.com Fri Mar 31 11:01:21 2023 From: smooney at redhat.com (Sean Mooney) Date: Fri, 31 Mar 2023 12:01:21 +0100 Subject: [kolla] Image building question In-Reply-To: References: Message-ID: On Thu, 2023-03-30 at 16:49 -0400, Satish Patel wrote: > Folks, > > I am playing with kolla image building to understand how it works. I am > using the following command to build images and wanted to check with you > folks if that is the correct way to do it. > > $ kolla-build -b ubuntu -t source keystone nova neutron glance > > Does the above command compile code from source or just download images > from remote repositories and re-compile them? > OpenStack is mainly Python, so in general there is no compile step. But to answer your question: that command builds the images using either the source tarballs or the OpenStack packages.
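Before the detailed walk-through of the template rendering that follows, it can help to let kolla-build render the Dockerfiles without building anything and inspect the result yourself. This is only an illustrative sketch: the work directory is arbitrary, and the --template-only/--work-dir options should be checked against kolla-build --help on the release you are using.

    # Render the Dockerfile templates for one image without building it
    kolla-build -b ubuntu -t source --template-only --work-dir /tmp/kolla-work keystone
    # Inspect the rendered (no longer templated) Dockerfiles
    find /tmp/kolla-work -name Dockerfile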
The default source locations are rendered into a file which you can override; they come from the data stored in https://github.com/openstack/kolla/blob/master/kolla/common/sources.py and the other build config defaults are generated from this code: https://github.com/openstack/kolla/blob/master/kolla/common/config.py When you invoke kolla-build it executes https://github.com/openstack/kolla/blob/master/kolla/cmd/build.py but the main build workflow is here: https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/kolla/image/build.py#L95 The tl;dr is that the build workflow starts by creating a build directory and locating the Dockerfile templates, in other words the content of the https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker directory. Each project has a directory in the docker directory, and each container that project has gets a directory in the project directory, so the aodh project has an aodh folder: https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh The convention is to have a -base container which handles the dependency installation and then one additional container for each binary daemon the project has, e.g. aodh-api. The name of the folder in the project directory is used as the name of the container. If we look at the content of the Dockerfiles we will see that they are not actually Dockerfiles, https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh/aodh-api/Dockerfile.j2 - they are Jinja2 templates that produce Dockerfiles. Kolla, as far as I am aware, has dropped support for binary images and alternative distros, but looking at an older release we can see how this worked: https://github.com/openstack/kolla/blob/stable/wallaby/docker/nova/nova-base/Dockerfile.j2#L13-L52 Each Dockerfile template uses Jinja2 to generate a set of concrete Dockerfiles from the template, making decisions based on the parameters passed in. So when you invoke kolla-build -b ubuntu -t source keystone nova neutron glance, what actually happens is that the -t flag is set as the install_type parameter in the Jinja2 environment when the Dockerfile is rendered. After all the Dockerfiles are rendered into normal Dockerfiles, Kolla just invokes the build. In the case of a source build, that involves pre-fetching the source tarball from https://tarballs.opendev.org and putting it in the build directory so that it can be included in the container. Kolla also used to support a git repo as an alternative source format. I have glossed over a lot of the details of how this actually works, but that is the essence of what the command is doing: creating a build dir, downloading the source, rendering the Dockerfile templates to Dockerfiles, invoking docker build on those, and then tagging them with the container name and build tag. https://docs.openstack.org/kolla/latest/admin/image-building.html covers this from a high level. > because in command output > I've not noticed anything related to the compiling process going on. > > Here is the output of all images produced by kolla-build command. Do I need > anything else or is this enough to deploy kolla?
You can deploy Kolla with what you have, yes, although since the Kolla images are automatically built by CI, kolla-ansible can simply use the ones from Docker Hub or quay.io instead, so you do not need to build them yourself. If you do build them yourself, there is basically one other step that you should take: if this is a multi-node deployment, push the images to an internally hosted Docker registry, although based on the hostname in the prompt below it looks like you have already done that. > > root at docker-reg:~# docker images > REPOSITORY TAG IMAGE ID CREATED > SIZE > kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes > ago 595MB > kolla/cron 15.1.0 342877f26a8a 30 minutes > ago 250MB > kolla/memcached 15.1.0 0d19a4902644 31 minutes > ago 250MB > kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes > ago 314MB > kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes > ago 314MB > kolla/keepalived 15.1.0 82133b09fbf0 31 minutes > ago 260MB > kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes > ago 262MB > e66b228c2a07 31 minutes > ago 248MB > kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes > ago 309MB > kolla/fluentd 15.1.0 adfd19027862 33 minutes > ago 519MB > kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes > ago 255MB > kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes > ago 257MB > kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes > ago 263MB > kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes > ago 248MB > kolla/glance-api 15.1.0 a2241f68f23a 38 minutes > ago 1.04GB > kolla/glance-base 15.1.0 7286772a03a4 About an > hour ago 1.03GB > kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an > hour ago 1.05GB > kolla/neutron-server 15.1.0 69c844a2e3a9 About an > hour ago 1.05GB > kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an > hour ago 1.05GB > 486da9a6562e About an > hour ago 1.05GB > kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an > hour ago 1.04GB > kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an > hour ago 1.04GB > kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an > hour ago 1.04GB > kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an > hour ago 1.04GB > kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an > hour ago 1.04GB > kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an > hour ago 1.04GB > kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an > hour ago 1.04GB > kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an > hour ago 1.04GB > kolla/neutron-base 15.1.0 f85b6a2e2725 About an > hour ago 1.04GB > kolla/nova-libvirt 15.1.0 0f3ecefe4752 About an > hour ago 987MB > kolla/nova-compute 15.1.0 241b7e7fafbe About an > hour ago 1.47GB > kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an > hour ago 1.15GB > kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an > hour ago 1.22GB > kolla/nova-compute-ironic 15.1.0 716612107532 About an > hour ago 1.12GB > kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an > hour ago 1.11GB > kolla/nova-api 15.1.0 2aef02667ff8 About an > hour ago 1.11GB > kolla/nova-conductor 15.1.0 6f1da3400901 About an > hour ago 1.11GB > kolla/nova-scheduler 15.1.0 628326776b1d About an > hour ago 1.11GB > kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an > hour ago 1.11GB > kolla/nova-base 15.1.0 e47420013283 About an > hour ago 1.11GB > kolla/keystone 15.1.0 e5530d829d5f 2 hours ago > 947MB > kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours ago > 953MB > kolla/keystone-fernet 15.1.0 8a4fa24853a8 2 hours ago > 951MB > kolla/keystone-base 15.1.0 b6f9562364a9 2 hours ago > 945MB >
kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours ago > 915MB > kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours ago > 915MB > kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours ago > 893MB > kolla/base 15.1.0 f68e9ef3dd30 2 hours ago > 248MB > registry 2 8db46f9d7550 19 hours ago > 24.2MB > ubuntu 22.04 08d22c0ceb15 3 weeks ago > 77.8MB From skidoo at tlen.pl Fri Mar 31 11:11:55 2023 From: skidoo at tlen.pl (Luk) Date: Fri, 31 Mar 2023 13:11:55 +0200 Subject: Migration from linuxbridge to ovs In-Reply-To: <4aacfba4-0e04-9197-70b8-178005ea6e96@inovex.de> References: <1253710667.20230330121023@tlen.pl> <4aacfba4-0e04-9197-70b8-178005ea6e96@inovex.de> Message-ID: <781242298.20230331131155@tlen.pl> Cze??, > On 30/03/2023 12:10, Luk wrote: >> Can You share some thoughts/ideas or some clues regarding migration from linux bridge to ovs ? Does this migration is posible without interrupting traffic from VMs ? > I asked a similar questions back in August - https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030070.html, maybe there are some insights there. Thank You, this thread is quite good in this case :) > We did not replace the SDN in place, but as actively looking into setting up a new cloud. Not that we do not believe in the idea of being able to replace the SDN, > but we intend to change much much more and migrating through many big changes is too inefficient compared to replacing the cloud with a new one. It looks the best way... Anyway - there is chance to make live migration between lb and openvswitch, but need to add flows by hand and add proper tag into br-int - and this 'solution' works only for external/provider network. As S?awek pointed out - in case of vxlan connection there is no opportunity to connect neturon ovs controller with linuxbridge compute nodes. -- Pozdrowienia, Lukasz From ralonsoh at redhat.com Fri Mar 31 11:12:37 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 31 Mar 2023 13:12:37 +0200 Subject: [neutron][ptg] Today's agenda Message-ID: Hello Neutrinos: Today is the last day of the PTG. You can check the agenda in [1]. This is a quick summary: * Open hour for core reviewers: join us! Neutron needs you. * FIPS jobs: status and distro support. If you have a last minute topic, today is the best moment to show up and present it (it doesn't matter if it is not on the agenda). Regards. [1]https://etherpad.opendev.org/p/neutron-bobcat-ptg -------------- next part -------------- An HTML attachment was scrubbed... URL: From pdeore at redhat.com Fri Mar 31 11:25:16 2023 From: pdeore at redhat.com (Pranali Deore) Date: Fri, 31 Mar 2023 16:55:16 +0530 Subject: [Glance] Bobcat PTG Updates Message-ID: Hello Everyone, We have concluded Glance vPTG yesterday and I will share the summary early next week. You can find tentative milestone wise priorities for glance in PTG etherpad[1]. Join us in the weekly meeting coming Thursday if you have any doubts/suggestions. Thanks & Regards, Pranali [1]: https://etherpad.opendev.org/p/glance-bobcat-ptg#L143 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fungi at yuggoth.org Fri Mar 31 11:25:49 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 31 Mar 2023 11:25:49 +0000 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: <1873570517d.ffe11d53184242.8810335954828690882@ghanshyammann.com> References: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> <1873570517d.ffe11d53184242.8810335954828690882@ghanshyammann.com> Message-ID: <20230331112549.we4jjivoxgeew2qt@yuggoth.org> On 2023-03-30 19:12:46 -0700 (-0700), Ghanshyam Mann wrote: [...] > This is not related to stable/xena or heat tests. Grenade job > running on immediately supported branch from EM branch where the > base is EM branch using old tempest and stable constraints and > target use master tempest and constraints. When you run tempest on > target, it causes an issue as constraints var are not set properly > for the target. We're still running grenade jobs that test upgrades from stable/wallaby to stable/xena? I thought by policy we dropped those when stable/wallaby entered extended maintenance, and that grenade only intended to support upgrading from maintained stable branches. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sbauza at redhat.com Fri Mar 31 12:00:10 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 31 Mar 2023 14:00:10 +0200 Subject: [nova][ptg] Today's agenda (Friday) Message-ID: Hey folks, Today is our last vPTG day. The agenda is so : 13:00 UTC - 14:45 UTC : * Users reported exhaustion of primary keys ('id') in some large tables like system_metadata. How could we achieve a data migration from sa.Integer to sa.BigInteger ? * Should cold-migrate have a specific new policy like os_compute_api:os-migrate-server:migrate-specify-host to accept non-admins to migrate if they don't provide a host value ? * Discuss the next steps with compute hostname robustification * Disable compute services after being discovered * Limited lower constraint job in nova and placement 14:45 UTC - 15:00 UTC : break 15:00 UTC - 17:00 UTC : * Openstack server show command output (cross-project discussion with openstackSDK contributors) * Should we support dynamically disabling post copy live migration based on vm_state is paused ? * Evacuation with multiple allocations * Bobcat proposed planning (cont.) * trim support for virtio-blk feature or bug? * Summit/PTG : what could we be doing for the physical PTG ? (if we have time) As a reminder, you can look at the topics here : https://etherpad.opendev.org/p/nova-bobcat-ptg#L426 and you can add your IRC nick in the courtesy ping list if you want to be around. Hope you had a good week and see you in an hour by now ! -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From kozhukalov at gmail.com Fri Mar 31 12:33:52 2023 From: kozhukalov at gmail.com (Vladimir Kozhukalov) Date: Fri, 31 Mar 2023 15:33:52 +0300 Subject: [openstack-helm][ptg] agenda changes for (Fri) Mar/31/2023 Message-ID: Dear helmers, Since we don't have any new topics in the etherpad [1] to discuss apart from those we discussed yesterday I think we can unbook our meeting room. Thanks for attending yesterday. I'll send the summary of our discussions later. [1] https://etherpad.opendev.org/p/march2023-ptg-openstack-helm -- Best regards, Kozhukalov Vladimir -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kennelson11 at gmail.com Fri Mar 31 12:41:33 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Fri, 31 Mar 2023 07:41:33 -0500 Subject: Reminder! In Person PTG 2023 Team Signup Deadline Message-ID: Hello Everyone, This is the last call to sign your team up for the *in-person* Project Teams Gathering (PTG) happening in Vancouver at the OpenInfra Summit! If you haven't already done so and your team is interested in participating, please complete the survey[1] by April 2nd, 2023 at 7:00 UTC. Registration for the PTG is included as a part of registration for the OpenInfra Summit in Vancouver. Prices increase May 5th so register soon! Thanks! -Kendall (diablo_rojo) [1] Team Survey: https://openinfrafoundation.formstack.com/forms/june2023_ptg_survey [2] Summit Registration: https://vancouver2023.openinfra.dev/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Mar 31 13:25:25 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 31 Mar 2023 06:25:25 -0700 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: <20230331112549.we4jjivoxgeew2qt@yuggoth.org> References: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> <1873570517d.ffe11d53184242.8810335954828690882@ghanshyammann.com> <20230331112549.we4jjivoxgeew2qt@yuggoth.org> Message-ID: <18737d827d2.112ab3364239474.9216356833253892791@ghanshyammann.com> ---- On Fri, 31 Mar 2023 04:25:49 -0700 Jeremy Stanley wrote --- > On 2023-03-30 19:12:46 -0700 (-0700), Ghanshyam Mann wrote: > [...] > > This is not related to stable/xena or heat tests. Grenade job > > running on immediately supported branch from EM branch where the > > base is EM branch using old tempest and stable constraints and > > target use master tempest and constraints. When you run tempest on > > target, it causes an issue as constraints var are not set properly > > for the target. > > We're still running grenade jobs that test upgrades from > stable/wallaby to stable/xena? I thought by policy we dropped those > when stable/wallaby entered extended maintenance, and that grenade > only intended to support upgrading from maintained stable branches. We do not need to run as you mentioned, but we try to keep it running as long as it can pass. This is my last attempt to fix on EM upgrade and in the next failure, I need to stop fixing and better to stop the grenade there. -gmann > -- > Jeremy Stanley > From satish.txt at gmail.com Fri Mar 31 13:53:31 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 31 Mar 2023 09:53:31 -0400 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: Thank Michal, I have posted commands in my original post which have distribution Ubuntu and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon ) I can definitely open a new bug but it looks like vishal already on it. Are there any workarounds or interim solutions? I am new to the kolla-image building process so I'm not sure where I should change the setup tool version to move on. Very curious how the CI-CD pipeline passed this bug? On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka wrote: > Hi Satish, > > Have you raised a bug in Launchpad (bugs.launchpad.net/kolla) for this? > > You have also not mentioned what distribution and Kolla release are you > using, so please do that in the bug report. 
> Looking at the output probably it?s stable/yoga and Debian - being fixed > in https://review.opendev.org/c/openstack/kolla/+/873913 > > Michal > > On 31 Mar 2023, at 05:05, Satish Patel wrote: > > Folks, > > All other images build successfully but when i am trying to build horizon > which failed with following error: > > $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed > horizon > > > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: > XStatic-Font-Awesome>=4.7.0.0 in > /var/lib/kolla/venv/lib/python3.10/site-packages (from > vitrage-dashboard==3.6.1.dev2) (4.7.0.0) > INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) > INFO:kolla.common.utils.horizon: > ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: > XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages > (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) > INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 > INFO:kolla.common.utils.horizon: Downloading > XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 > INFO:kolla.common.utils.horizon: Downloading > XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) > INFO:kolla.common.utils.horizon: > ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting > XStatic-Moment-Timezone>=0.5.22.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) > INFO:kolla.common.utils.horizon: > ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished > with status 'error' > INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run > successfully. > INFO:kolla.common.utils.horizon: ? 
exit code: 1 > INFO:kolla.common.utils.horizon: ??> [6 lines of output] > INFO:kolla.common.utils.horizon: Traceback (most recent call last): > INFO:kolla.common.utils.horizon: File "", line 2, in > > INFO:kolla.common.utils.horizon: File "", > line 34, in > INFO:kolla.common.utils.horizon: File > "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", > line 2, in > INFO:kolla.common.utils.horizon: from xstatic.pkg import > moment_timezone as xs > INFO:kolla.common.utils.horizon: ImportError: cannot import name > 'moment_timezone' from 'xstatic.pkg' (unknown location) > INFO:kolla.common.utils.horizon: [end of output] > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: note: This error originates from a > subprocess, and is likely not a problem with pip. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:error: metadata-generation-failed > INFO:kolla.common.utils.horizon:? Encountered error while generating > package metadata. > INFO:kolla.common.utils.horizon:??> See above for output. > INFO:kolla.common.utils.horizon:note: This is an issue with the package > mentioned above, not pip. > INFO:kolla.common.utils.horizon:hint: See above for details. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:Removing intermediate container > e6cd437ba529 > ERROR:kolla.common.utils.horizon:Error'd with the following message > ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s > horizon-source/* horizon && sed -i /^horizon=/d > /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib > python3 -m pip --no-cache-dir install --upgrade -c > /requirements/upper-constraints.txt /horizon && mkdir -p > /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* > /etc/openstack-dashboard/ && cp > /horizon/openstack_dashboard/local/local_settings.py.example > /etc/openstack-dashboard/local_settings && cp /horizon/manage.py > /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then > SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir > install --upgrade -c /requirements/upper-constraints.txt /plugins/*; > fi && for locale in > /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do > (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) > done && chmod 644 /usr/local/bin/kolla_extend_start' returned a > non-zero code: 1 > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:Successfully built images > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:base > INFO:kolla.common.utils:openstack-base > INFO:kolla.common.utils:=========================== > INFO:kolla.common.utils:Images that failed to build > INFO:kolla.common.utils:=========================== > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 31 13:59:04 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 31 Mar 2023 15:59:04 +0200 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: Hi Satish, Vishal mentioned a bug that I raised in Horizon, but we have been pinning to earlier setuptools in Kolla builds just because of that (and that?s the workaround). Are you using kolla from PyPI or the latest stable/zed checkout from Git? We recommend the latter. 
Michal > On 31 Mar 2023, at 15:53, Satish Patel wrote: > > Thank Michal, > > I have posted commands in my original post which have distribution Ubuntu and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon ) > > I can definitely open a new bug but it looks like vishal already on it. Are there any workarounds or interim solutions? I am new to the kolla-image building process so I'm not sure where I should change the setup tool version to move on. > > Very curious how the CI-CD pipeline passed this bug? > > > On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka > wrote: >> Hi Satish, >> >> Have you raised a bug in Launchpad (bugs.launchpad.net/kolla ) for this? >> >> You have also not mentioned what distribution and Kolla release are you using, so please do that in the bug report. >> Looking at the output probably it?s stable/yoga and Debian - being fixed in https://review.opendev.org/c/openstack/kolla/+/873913 >> >> Michal >> >>> On 31 Mar 2023, at 05:05, Satish Patel > wrote: >>> >>> Folks, >>> >>> All other images build successfully but when i am trying to build horizon which failed with following error: >>> >>> $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon >>> >>> >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) >>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) >>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-Font-Awesome>=4.7.0.0 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (4.7.0.0) >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) >>> INFO:kolla.common.utils.horizon: ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) >>> INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) >>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) >>> INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Moment-Timezone>=0.5.22.0 >>> INFO:kolla.common.utils.horizon: Downloading XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) >>> INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 
99.7/99.7 kB 45.1 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started >>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished with status 'error' >>> INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run successfully. >>> INFO:kolla.common.utils.horizon: ? exit code: 1 >>> INFO:kolla.common.utils.horizon: ??> [6 lines of output] >>> INFO:kolla.common.utils.horizon: Traceback (most recent call last): >>> INFO:kolla.common.utils.horizon: File "", line 2, in >>> INFO:kolla.common.utils.horizon: File "", line 34, in >>> INFO:kolla.common.utils.horizon: File "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", line 2, in >>> INFO:kolla.common.utils.horizon: from xstatic.pkg import moment_timezone as xs >>> INFO:kolla.common.utils.horizon: ImportError: cannot import name 'moment_timezone' from 'xstatic.pkg' (unknown location) >>> INFO:kolla.common.utils.horizon: [end of output] >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon: note: This error originates from a subprocess, and is likely not a problem with pip. >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon:error: metadata-generation-failed >>> INFO:kolla.common.utils.horizon:? Encountered error while generating package metadata. >>> INFO:kolla.common.utils.horizon:??> See above for output. >>> INFO:kolla.common.utils.horizon:note: This is an issue with the package mentioned above, not pip. >>> INFO:kolla.common.utils.horizon:hint: See above for details. >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon:Removing intermediate container e6cd437ba529 >>> ERROR:kolla.common.utils.horizon:Error'd with the following message >>> ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s horizon-source/* horizon && sed -i /^horizon=/d /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /horizon && mkdir -p /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* /etc/openstack-dashboard/ && cp /horizon/openstack_dashboard/local/local_settings.py.example /etc/openstack-dashboard/local_settings && cp /horizon/manage.py /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /plugins/*; fi && for locale in /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) done && chmod 644 /usr/local/bin/kolla_extend_start' returned a non-zero code: 1 >>> INFO:kolla.common.utils:========================= >>> INFO:kolla.common.utils:Successfully built images >>> INFO:kolla.common.utils:========================= >>> INFO:kolla.common.utils:base >>> INFO:kolla.common.utils:openstack-base >>> INFO:kolla.common.utils:=========================== >>> INFO:kolla.common.utils:Images that failed to build >>> INFO:kolla.common.utils:=========================== >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From satish.txt at gmail.com Fri Mar 31 14:25:03 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 31 Mar 2023 10:25:03 -0400 Subject: [kolla] Image building question In-Reply-To: References: Message-ID: Thank you Sean, What a wonderful explanation of the process. Yes I can download images from the public domain and push them to a local repository but in some cases I would like to add my own tools like monitoring agents, utilities etc for debugging so i decided to build my own images. I believe https://tarballs.opendev.org is the right place to source software correctly? If I want to add some tools or packages inside images then I should use Dockerfile.j2 to add and compile images. correct? ~S On Fri, Mar 31, 2023 at 7:01?AM Sean Mooney wrote: > On Thu, 2023-03-30 at 16:49 -0400, Satish Patel wrote: > > Folks, > > > > I am playing with kolla image building to understand how it works. I am > > using the following command to build images and wanted to check with you > > folks if that is the correct way to do it. > > > > $ kolla-build -b ubuntu -t source keystone nova neutron glance > > > > Does the above command compile code from source or just download images > > from remote repositories and re-compile them? > > > openstack is mainly python so in general ther is no complie step. > but to answer your question that builds the image using the source tarballs > or the openstakc packages. > > the defaults soruce locations are rendered into a file which you can > override > from the data stored in > https://github.com/openstack/kolla/blob/master/kolla/common/sources.py > the other build config defaults are generated form this code > https://github.com/openstack/kolla/blob/master/kolla/common/config.py > > when you invoke kolla-build its executing > https://github.com/openstack/kolla/blob/master/kolla/cmd/build.py > but the main build workflow is here > https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/kolla/image/build.py#L95 > > the tl;dr is the build worklow starts by creating build director and > locating the docker file templats. > in otherwords the content of the > https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker > directory > > each project has a direcoty in the docker directory and then each > contaienr that project has has a directory in the project directory > > so the aodh project has a aodh folder > https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh > the convention is to have a -base contaienr which handels the > depency installation and then one addtional contaienr for each binary deamon > the project has i.e. aodh-api > > the name of the folder in teh project dir is used as the name of the > contaienr > > if we look at the content of the docker files we will see that they are > not actuly dockerfiles > > https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh/aodh-api/Dockerfile.j2 > > they are jinja2 templates that produce docker files > > kolla as far as i am aware has drop support for binary images and > alternitiv distos > > but looking at an older release we can se ehow this worked > > https://github.com/openstack/kolla/blob/stable/wallaby/docker/nova/nova-base/Dockerfile.j2#L13-L52 > > each docker file template would use the jinja2 to generate a set of > concreate docker files form the template > and make dession based on the parmater passed in. 
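A quick way to see the rendering step just described, without running a full build, is to ask kolla-build to generate the Dockerfiles only. This is a sketch that assumes the --template-only and --work-dir options are available in your kolla release (check kolla-build --help) and uses a throwaway work directory:

  # render the Dockerfile.j2 templates into plain Dockerfiles, without building images
  kolla-build -b ubuntu -t source --template-only --work-dir /tmp/kolla-work keystone
  # inspect what jinja2 produced for each image
  find /tmp/kolla-work -name Dockerfile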
> > so when you are invokeing > kolla-build -b ubuntu -t source keystone nova neutron glance > > what actully happening is that the -t flag is being set as teh > install_type parmater in the the jinja2 environemtn when > the docker file is rendered. > > after all the docer files are rendered into normal docker files kolla just > invokes the build. > > in the case of a source build that inovles pre fetching the source tar > from https://tarballs.opendev.org > and puting it in the build directory so that it can be included into the > contianer. > > kolla also used to supprot git repo as a alternitve source fromat > > i have glossed over a lot of the details of how this actully work but that > is the essence of what that command is doing > creating a build dir, downloading the source, rendering the dockerfile > templates to docker files, invokeing docker build on those > and then taging them with the contaienr nameand build tag > > > https://docs.openstack.org/kolla/latest/admin/image-building.html > covers this form a high level > > > because in command output > > I've not noticed anything related to the compiling process going on. > > > > Here is the output of all images produced by kolla-build command. Do I > need > > anything else or is this enough to deploy kolla? > you can deploy coll with what you have yes although since the kolla files > are automaticaly > built by ci kolla-ansible can just use the ones form the docker hub or > quay instead you do not need to build them yourself > > if you do build them your self then there is basically one other stpe that > you shoudl take if this si a multi node deployment > you should push the iamges to an interally host docker registry although > based on the hostname in the prompt below > it looks like you ahve alredy done that. 
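As a sketch of that last step, using the docker-reg:4000 registry name that appears elsewhere in this thread and the 15.1.0 tag from the listing below (both values are taken from this thread; adjust for your environment):

  # retag an already-built image for the local registry and push it
  docker tag kolla/nova-compute:15.1.0 docker-reg:4000/kolla/nova-compute:15.1.0
  docker push docker-reg:4000/kolla/nova-compute:15.1.0
  # or, if your kolla release supports it, push as part of the build
  kolla-build -b ubuntu -t source --registry docker-reg:4000 --push nova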
> > > > root at docker-reg:~# docker images > > REPOSITORY TAG IMAGE ID CREATED > > SIZE > > kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes > > ago 595MB > > kolla/cron 15.1.0 342877f26a8a 30 minutes > > ago 250MB > > kolla/memcached 15.1.0 0d19a4902644 31 minutes > > ago 250MB > > kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes > > ago 314MB > > kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes > > ago 314MB > > kolla/keepalived 15.1.0 82133b09fbf0 31 minutes > > ago 260MB > > kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes > > ago 262MB > > e66b228c2a07 31 minutes > > ago 248MB > > kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes > > ago 309MB > > kolla/fluentd 15.1.0 adfd19027862 33 minutes > > ago 519MB > > kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes > > ago 255MB > > kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes > > ago 257MB > > kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes > > ago 263MB > > kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes > > ago 248MB > > kolla/glance-api 15.1.0 a2241f68f23a 38 minutes > > ago 1.04GB > > kolla/glance-base 15.1.0 7286772a03a4 About an > > hour ago 1.03GB > > kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an > > hour ago 1.05GB > > kolla/neutron-server 15.1.0 69c844a2e3a9 About an > > hour ago 1.05GB > > kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an > > hour ago 1.05GB > > 486da9a6562e About an > > hour ago 1.05GB > > kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an > > hour ago 1.04GB > > kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an > > hour ago 1.04GB > > kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an > > hour ago 1.04GB > > kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an > > hour ago 1.04GB > > kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an > > hour ago 1.04GB > > kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an > > hour ago 1.04GB > > kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an > > hour ago 1.04GB > > kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an > > hour ago 1.04GB > > kolla/neutron-base 15.1.0 f85b6a2e2725 About an > > hour ago 1.04GB > > kolla/nova-libvirt 15.1.0 0f3ecefe4752 About an > > hour ago 987MB > > kolla/nova-compute 15.1.0 241b7e7fafbe About an > > hour ago 1.47GB > > kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an > > hour ago 1.15GB > > kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an > > hour ago 1.22GB > > kolla/nova-compute-ironic 15.1.0 716612107532 About an > > hour ago 1.12GB > > kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an > > hour ago 1.11GB > > kolla/nova-api 15.1.0 2aef02667ff8 About an > > hour ago 1.11GB > > kolla/nova-conductor 15.1.0 6f1da3400901 About an > > hour ago 1.11GB > > kolla/nova-scheduler 15.1.0 628326776b1d About an > > hour ago 1.11GB > > kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an > > hour ago 1.11GB > > kolla/nova-base 15.1.0 e47420013283 About an > > hour ago 1.11GB > > kolla/keystone 15.1.0 e5530d829d5f 2 hours > ago > > 947MB > > kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours > ago > > 953MB > > kolla/keystone-fernet 15.1.0 8a4fa24853a8 2 hours > ago > > 951MB > > kolla/keystone-base 15.1.0 b6f9562364a9 2 hours > ago > > 945MB > > kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours > ago > > 915MB > > kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours > ago > > 915MB > > kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours > ago > > 893MB > > kolla/base 15.1.0 f68e9ef3dd30 2 hours > ago > > 248MB > > registry 2 8db46f9d7550 
19 hours > ago > > 24.2MB > > ubuntu 22.04 08d22c0ceb15 3 weeks > ago > > 77.8MB > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 31 14:27:50 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 31 Mar 2023 10:27:50 -0400 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: Hi Micha?, This is my sandbox environment so I did "pip install kolla" and started building images. How do I check out specific stable/zed or tag releases to build images? ~S On Fri, Mar 31, 2023 at 9:59?AM Micha? Nasiadka wrote: > Hi Satish, > > Vishal mentioned a bug that I raised in Horizon, but we have been pinning > to earlier setuptools in Kolla builds just because of that (and that?s the > workaround). > Are you using kolla from PyPI or the latest stable/zed checkout from Git? > We recommend the latter. > > Michal > > On 31 Mar 2023, at 15:53, Satish Patel wrote: > > Thank Michal, > > I have posted commands in my original post which have distribution Ubuntu > and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t > source --tag zed horizon ) > > I can definitely open a new bug but it looks like vishal already on it. > Are there any workarounds or interim solutions? I am new to the kolla-image > building process so I'm not sure where I should change the setup tool > version to move on. > > Very curious how the CI-CD pipeline passed this bug? > > > On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka > wrote: > >> Hi Satish, >> >> Have you raised a bug in Launchpad (bugs.launchpad.net/kolla) for this? >> >> You have also not mentioned what distribution and Kolla release are you >> using, so please do that in the bug report. >> Looking at the output probably it?s stable/yoga and Debian - being fixed >> in https://review.opendev.org/c/openstack/kolla/+/873913 >> >> Michal >> >> On 31 Mar 2023, at 05:05, Satish Patel wrote: >> >> Folks, >> >> All other images build successfully but when i am trying to build horizon >> which failed with following error: >> >> $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed >> horizon >> >> >> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) >> INFO:kolla.common.utils.horizon: >> ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) >> INFO:kolla.common.utils.horizon: >> ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Requirement already satisfied: >> XStatic-Font-Awesome>=4.7.0.0 in >> /var/lib/kolla/venv/lib/python3.10/site-packages (from >> vitrage-dashboard==3.6.1.dev2) (4.7.0.0) >> INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) >> INFO:kolla.common.utils.horizon: >> ??????????????????????????????????????? 
51.5/51.5 kB 114.3 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Requirement already satisfied: >> XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages >> (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) >> INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) >> INFO:kolla.common.utils.horizon: >> ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) >> INFO:kolla.common.utils.horizon: >> ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Collecting >> XStatic-Moment-Timezone>=0.5.22.0 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) >> INFO:kolla.common.utils.horizon: >> ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started >> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished >> with status 'error' >> INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error >> INFO:kolla.common.utils.horizon: >> INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run >> successfully. >> INFO:kolla.common.utils.horizon: ? exit code: 1 >> INFO:kolla.common.utils.horizon: ??> [6 lines of output] >> INFO:kolla.common.utils.horizon: Traceback (most recent call last): >> INFO:kolla.common.utils.horizon: File "", line 2, in >> >> INFO:kolla.common.utils.horizon: File "", >> line 34, in >> INFO:kolla.common.utils.horizon: File >> "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", >> line 2, in >> INFO:kolla.common.utils.horizon: from xstatic.pkg import >> moment_timezone as xs >> INFO:kolla.common.utils.horizon: ImportError: cannot import name >> 'moment_timezone' from 'xstatic.pkg' (unknown location) >> INFO:kolla.common.utils.horizon: [end of output] >> INFO:kolla.common.utils.horizon: >> INFO:kolla.common.utils.horizon: note: This error originates from a >> subprocess, and is likely not a problem with pip. >> INFO:kolla.common.utils.horizon: >> INFO:kolla.common.utils.horizon:error: metadata-generation-failed >> INFO:kolla.common.utils.horizon:? Encountered error while generating >> package metadata. >> INFO:kolla.common.utils.horizon:??> See above for output. >> INFO:kolla.common.utils.horizon:note: This is an issue with the package >> mentioned above, not pip. >> INFO:kolla.common.utils.horizon:hint: See above for details. 
>> INFO:kolla.common.utils.horizon: >> INFO:kolla.common.utils.horizon:Removing intermediate container >> e6cd437ba529 >> ERROR:kolla.common.utils.horizon:Error'd with the following message >> ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s >> horizon-source/* horizon && sed -i /^horizon=/d >> /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib >> python3 -m pip --no-cache-dir install --upgrade -c >> /requirements/upper-constraints.txt /horizon && mkdir -p >> /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* >> /etc/openstack-dashboard/ && cp >> /horizon/openstack_dashboard/local/local_settings.py.example >> /etc/openstack-dashboard/local_settings && cp /horizon/manage.py >> /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then >> SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir >> install --upgrade -c /requirements/upper-constraints.txt /plugins/*; >> fi && for locale in >> /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do >> (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) >> done && chmod 644 /usr/local/bin/kolla_extend_start' returned a >> non-zero code: 1 >> INFO:kolla.common.utils:========================= >> INFO:kolla.common.utils:Successfully built images >> INFO:kolla.common.utils:========================= >> INFO:kolla.common.utils:base >> INFO:kolla.common.utils:openstack-base >> INFO:kolla.common.utils:=========================== >> INFO:kolla.common.utils:Images that failed to build >> INFO:kolla.common.utils:=========================== >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 31 14:33:54 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 31 Mar 2023 16:33:54 +0200 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: <92A192DF-ABFC-4ED3-A65A-15A2E3869B2B@gmail.com> Hi Satish, git clone https://opendev.org/openstack/kolla -b stable/zed cd kolla pip3 install . I think we should amend the docs a bit to make it easier - thanks for pointing out. Michal > On 31 Mar 2023, at 16:27, Satish Patel wrote: > > Hi Micha?, > > This is my sandbox environment so I did "pip install kolla" and started building images. How do I check out specific stable/zed or tag releases to build images? > > ~S > > On Fri, Mar 31, 2023 at 9:59?AM Micha? Nasiadka > wrote: >> Hi Satish, >> >> Vishal mentioned a bug that I raised in Horizon, but we have been pinning to earlier setuptools in Kolla builds just because of that (and that?s the workaround). >> Are you using kolla from PyPI or the latest stable/zed checkout from Git? We recommend the latter. >> >> Michal >> >>> On 31 Mar 2023, at 15:53, Satish Patel > wrote: >>> >>> Thank Michal, >>> >>> I have posted commands in my original post which have distribution Ubuntu and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon ) >>> >>> I can definitely open a new bug but it looks like vishal already on it. Are there any workarounds or interim solutions? I am new to the kolla-image building process so I'm not sure where I should change the setup tool version to move on. >>> >>> Very curious how the CI-CD pipeline passed this bug? >>> >>> >>> On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka > wrote: >>>> Hi Satish, >>>> >>>> Have you raised a bug in Launchpad (bugs.launchpad.net/kolla ) for this? 
>>>> >>>> You have also not mentioned what distribution and Kolla release are you using, so please do that in the bug report. >>>> Looking at the output probably it?s stable/yoga and Debian - being fixed in https://review.opendev.org/c/openstack/kolla/+/873913 >>>> >>>> Michal >>>> >>>>> On 31 Mar 2023, at 05:05, Satish Patel > wrote: >>>>> >>>>> Folks, >>>>> >>>>> All other images build successfully but when i am trying to build horizon which failed with following error: >>>>> >>>>> $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon >>>>> >>>>> >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) >>>>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) >>>>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-Font-Awesome>=4.7.0.0 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (4.7.0.0) >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) >>>>> INFO:kolla.common.utils.horizon: ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) >>>>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) >>>>> INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-Moment-Timezone>=0.5.22.0 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) >>>>> INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started >>>>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished with status 'error' >>>>> INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error >>>>> INFO:kolla.common.utils.horizon: >>>>> INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run successfully. >>>>> INFO:kolla.common.utils.horizon: ? 
exit code: 1 >>>>> INFO:kolla.common.utils.horizon: ??> [6 lines of output] >>>>> INFO:kolla.common.utils.horizon: Traceback (most recent call last): >>>>> INFO:kolla.common.utils.horizon: File "", line 2, in >>>>> INFO:kolla.common.utils.horizon: File "", line 34, in >>>>> INFO:kolla.common.utils.horizon: File "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", line 2, in >>>>> INFO:kolla.common.utils.horizon: from xstatic.pkg import moment_timezone as xs >>>>> INFO:kolla.common.utils.horizon: ImportError: cannot import name 'moment_timezone' from 'xstatic.pkg' (unknown location) >>>>> INFO:kolla.common.utils.horizon: [end of output] >>>>> INFO:kolla.common.utils.horizon: >>>>> INFO:kolla.common.utils.horizon: note: This error originates from a subprocess, and is likely not a problem with pip. >>>>> INFO:kolla.common.utils.horizon: >>>>> INFO:kolla.common.utils.horizon:error: metadata-generation-failed >>>>> INFO:kolla.common.utils.horizon:? Encountered error while generating package metadata. >>>>> INFO:kolla.common.utils.horizon:??> See above for output. >>>>> INFO:kolla.common.utils.horizon:note: This is an issue with the package mentioned above, not pip. >>>>> INFO:kolla.common.utils.horizon:hint: See above for details. >>>>> INFO:kolla.common.utils.horizon: >>>>> INFO:kolla.common.utils.horizon:Removing intermediate container e6cd437ba529 >>>>> ERROR:kolla.common.utils.horizon:Error'd with the following message >>>>> ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s horizon-source/* horizon && sed -i /^horizon=/d /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /horizon && mkdir -p /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* /etc/openstack-dashboard/ && cp /horizon/openstack_dashboard/local/local_settings.py.example /etc/openstack-dashboard/local_settings && cp /horizon/manage.py /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /plugins/*; fi && for locale in /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) done && chmod 644 /usr/local/bin/kolla_extend_start' returned a non-zero code: 1 >>>>> INFO:kolla.common.utils:========================= >>>>> INFO:kolla.common.utils:Successfully built images >>>>> INFO:kolla.common.utils:========================= >>>>> INFO:kolla.common.utils:base >>>>> INFO:kolla.common.utils:openstack-base >>>>> INFO:kolla.common.utils:=========================== >>>>> INFO:kolla.common.utils:Images that failed to build >>>>> INFO:kolla.common.utils:=========================== >>>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 31 14:45:20 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 31 Mar 2023 10:45:20 -0400 Subject: [kolla] horizon image build failed In-Reply-To: <92A192DF-ABFC-4ED3-A65A-15A2E3869B2B@gmail.com> References: <92A192DF-ABFC-4ED3-A65A-15A2E3869B2B@gmail.com> Message-ID: Awesome, thanks! This is easy, and +++1 to amend that in docs :) On Fri, Mar 31, 2023 at 10:34?AM Micha? Nasiadka wrote: > Hi Satish, > > git clone https://opendev.org/openstack/kolla -b stable/zed > cd kolla > pip3 install . 
> > I think we should amend the docs a bit to make it easier - thanks for > pointing out. > > Michal > > On 31 Mar 2023, at 16:27, Satish Patel wrote: > > Hi Micha?, > > This is my sandbox environment so I did "pip install kolla" and started > building images. How do I check out specific stable/zed or tag releases to > build images? > > ~S > > On Fri, Mar 31, 2023 at 9:59?AM Micha? Nasiadka > wrote: > >> Hi Satish, >> >> Vishal mentioned a bug that I raised in Horizon, but we have been pinning >> to earlier setuptools in Kolla builds just because of that (and that?s the >> workaround). >> Are you using kolla from PyPI or the latest stable/zed checkout from Git? >> We recommend the latter. >> >> Michal >> >> On 31 Mar 2023, at 15:53, Satish Patel wrote: >> >> Thank Michal, >> >> I have posted commands in my original post which have distribution Ubuntu >> and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t >> source --tag zed horizon ) >> >> I can definitely open a new bug but it looks like vishal already on it. >> Are there any workarounds or interim solutions? I am new to the kolla-image >> building process so I'm not sure where I should change the setup tool >> version to move on. >> >> Very curious how the CI-CD pipeline passed this bug? >> >> >> On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka >> wrote: >> >>> Hi Satish, >>> >>> Have you raised a bug in Launchpad (bugs.launchpad.net/kolla) for this? >>> >>> You have also not mentioned what distribution and Kolla release are you >>> using, so please do that in the bug report. >>> Looking at the output probably it?s stable/yoga and Debian - being fixed >>> in https://review.opendev.org/c/openstack/kolla/+/873913 >>> >>> Michal >>> >>> On 31 Mar 2023, at 05:05, Satish Patel wrote: >>> >>> Folks, >>> >>> All other images build successfully but when i am trying to build >>> horizon which failed with following error: >>> >>> $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed >>> horizon >>> >>> >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) >>> INFO:kolla.common.utils.horizon: >>> ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) >>> INFO:kolla.common.utils.horizon: >>> ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Requirement already satisfied: >>> XStatic-Font-Awesome>=4.7.0.0 in >>> /var/lib/kolla/venv/lib/python3.10/site-packages (from >>> vitrage-dashboard==3.6.1.dev2) (4.7.0.0) >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) >>> INFO:kolla.common.utils.horizon: >>> ??????????????????????????????????????? 
51.5/51.5 kB 114.3 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Requirement already satisfied: >>> XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages >>> (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) >>> INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) >>> INFO:kolla.common.utils.horizon: >>> ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) >>> INFO:kolla.common.utils.horizon: >>> ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting >>> XStatic-Moment-Timezone>=0.5.22.0 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) >>> INFO:kolla.common.utils.horizon: >>> ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started >>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): >>> finished with status 'error' >>> INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run >>> successfully. >>> INFO:kolla.common.utils.horizon: ? exit code: 1 >>> INFO:kolla.common.utils.horizon: ??> [6 lines of output] >>> INFO:kolla.common.utils.horizon: Traceback (most recent call last): >>> INFO:kolla.common.utils.horizon: File "", line 2, in >>> >>> INFO:kolla.common.utils.horizon: File "", >>> line 34, in >>> INFO:kolla.common.utils.horizon: File >>> "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", >>> line 2, in >>> INFO:kolla.common.utils.horizon: from xstatic.pkg import >>> moment_timezone as xs >>> INFO:kolla.common.utils.horizon: ImportError: cannot import name >>> 'moment_timezone' from 'xstatic.pkg' (unknown location) >>> INFO:kolla.common.utils.horizon: [end of output] >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon: note: This error originates from a >>> subprocess, and is likely not a problem with pip. >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon:error: metadata-generation-failed >>> INFO:kolla.common.utils.horizon:? Encountered error while generating >>> package metadata. >>> INFO:kolla.common.utils.horizon:??> See above for output. >>> INFO:kolla.common.utils.horizon:note: This is an issue with the package >>> mentioned above, not pip. >>> INFO:kolla.common.utils.horizon:hint: See above for details. 
>>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon:Removing intermediate container >>> e6cd437ba529 >>> ERROR:kolla.common.utils.horizon:Error'd with the following message >>> ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s >>> horizon-source/* horizon && sed -i /^horizon=/d >>> /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib >>> python3 -m pip --no-cache-dir install --upgrade -c >>> /requirements/upper-constraints.txt /horizon && mkdir -p >>> /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* >>> /etc/openstack-dashboard/ && cp >>> /horizon/openstack_dashboard/local/local_settings.py.example >>> /etc/openstack-dashboard/local_settings && cp /horizon/manage.py >>> /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then >>> SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir >>> install --upgrade -c /requirements/upper-constraints.txt /plugins/*; >>> fi && for locale in >>> /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do >>> (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) >>> done && chmod 644 /usr/local/bin/kolla_extend_start' returned a >>> non-zero code: 1 >>> INFO:kolla.common.utils:========================= >>> INFO:kolla.common.utils:Successfully built images >>> INFO:kolla.common.utils:========================= >>> INFO:kolla.common.utils:base >>> INFO:kolla.common.utils:openstack-base >>> INFO:kolla.common.utils:=========================== >>> INFO:kolla.common.utils:Images that failed to build >>> INFO:kolla.common.utils:=========================== >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.rosser at rd.bbc.co.uk Fri Mar 31 14:49:13 2023 From: jonathan.rosser at rd.bbc.co.uk (Jonathan Rosser) Date: Fri, 31 Mar 2023 15:49:13 +0100 Subject: [ironic] ARM Support in CI: Call for vendors / contributors / interested parties In-Reply-To: References: Message-ID: <6faf5514-2ac8-9e8b-c543-0f8125b4001b@rd.bbc.co.uk> I have Ironic working with Supermicro MegaDC / Ampere CPU in a R12SPD-A system board using the ipmi driver. Jon. On 29/03/2023 19:39, Jay Faulkner wrote: > Hi stackers, > > Ironic has published an experimental Ironic Python Agent image for > ARM64 > (https://tarballs.opendev.org/openstack/ironic-python-agent-builder/dib/files/) > and discussed promoting this image to supported via CI testing. > However, we have a problem: there are no Ironic developers with easy > access to ARM hardware at the moment, and no Ironic developers with > free time to commit to improving our support of ARM hardware. > > So we're putting out a call for help: > - If you're a hardware vendor and want your ARM hardware supported? > Please come talk to the Ironic community about setting up third-party-CI. > - Are you an operator or contributor from a company invested in ARM > bare metal? Please come join the Ironic community to help us build > this support. > > Thanks, > Jay Faulkner > Ironic PTL > > From elod.illes at est.tech Fri Mar 31 15:08:26 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Fri, 31 Mar 2023 15:08:26 +0000 Subject: [release] Release countdown for week R-26, Apr 03 - 07 Message-ID: Hi, Welcome back to the release countdown emails! These will be sent at major points in the 2023.2 Bobcat development cycle, which should conclude with a final release on October 4th, 2023. 
Development Focus ----------------- At this stage in the release cycle, focus should be on planning the 2023.2 Bobcat development cycle and approving 2023.2 Bobcat specs. General Information ------------------- 2023.2 Bobcat is a 28 weeks long development cycle. In case you haven't seen it yet, please take a look over the schedule for this release: https://releases.openstack.org/bobcat/schedule.html By default, the team PTL is responsible for handling the release cycle and approving release requests. This task can (and probably should) be delegated to release liaisons. Now is a good time to review release liaison information for your team and make sure it is up to date: https://opendev.org/openstack/releases/src/branch/master/data/release_liaisons.yaml By default, all your team deliverables from the 2023.1 Antelope release are continued in 2023.2 Bobcat with a similar release model. Upcoming Deadlines & Dates -------------------------- Bobcat-1 milestone: May 11th, 2023 OpenInfra Summit + PTG Vancouver 2023 - June 13-15, 2023 El?d Ill?s irc: elodilles @ #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlandy at redhat.com Fri Mar 31 15:59:27 2023 From: rlandy at redhat.com (Ronelle Landy) Date: Fri, 31 Mar 2023 11:59:27 -0400 Subject: [tripleo] Removal of TripleO Master Integration and Component Lines Message-ID: Hello All, Removal of TripleO Master Integration and Component Lines Per the decision to not maintain TripleO after the Zed release [1], the master/main integration and component lines are being removed in the following patches: https://review.rdoproject.org/r/c/config/+/48073 https://review.rdoproject.org/r/c/config/+/48074 https://review.rdoproject.org/r/c/rdo-jobs/+/48075 The last promoted release of master through TripleO is: https://trunk.rdoproject.org/centos9-master/current-tripleo/delorean.repo (hash: ddce25bad764dde7e0515094b4d40471), which was promoted on 03/28/2023. Check/gate testing for the master branch is in process of being removed as well. [1] https://review.opendev.org/c/openstack/governance/+/878799 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Fri Mar 31 16:01:24 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Fri, 31 Mar 2023 09:01:24 -0700 Subject: [ironic] ARM Support in CI: Call for vendors / contributors / interested parties In-Reply-To: <6faf5514-2ac8-9e8b-c543-0f8125b4001b@rd.bbc.co.uk> References: <6faf5514-2ac8-9e8b-c543-0f8125b4001b@rd.bbc.co.uk> Message-ID: Thanks for responding, Jonathan! Did you have to make any downstream changes to Ironic for this to work? Are you using our published ARM64 image or using their own? Thanks, Jay Faulkner Ironic PTL On Fri, Mar 31, 2023 at 7:56?AM Jonathan Rosser < jonathan.rosser at rd.bbc.co.uk> wrote: > I have Ironic working with Supermicro MegaDC / Ampere CPU in a R12SPD-A > system board using the ipmi driver. > > Jon. > > On 29/03/2023 19:39, Jay Faulkner wrote: > > Hi stackers, > > > > Ironic has published an experimental Ironic Python Agent image for > > ARM64 > > ( > https://tarballs.opendev.org/openstack/ironic-python-agent-builder/dib/files/) > > > and discussed promoting this image to supported via CI testing. > > However, we have a problem: there are no Ironic developers with easy > > access to ARM hardware at the moment, and no Ironic developers with > > free time to commit to improving our support of ARM hardware. 
> > > > So we're putting out a call for help: > > - If you're a hardware vendor and want your ARM hardware supported? > > Please come talk to the Ironic community about setting up third-party-CI. > > - Are you an operator or contributor from a company invested in ARM > > bare metal? Please come join the Ironic community to help us build > > this support. > > > > Thanks, > > Jay Faulkner > > Ironic PTL > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knikolla at bu.edu Fri Mar 31 17:27:34 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Fri, 31 Mar 2023 17:27:34 +0000 Subject: [tc] No TC weekly meeting next week Message-ID: Hi all, There will be no TC meeting on Tuesday, April 4th. The next TC meeting will be held on Tuesday, April 11 at 18.00 UTC. More information and an ICS file can be found here https://meetings.opendev.org/#Technical_Committee_Meeting Thank you, Kristi Nikolla -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Fri Mar 31 17:31:01 2023 From: smooney at redhat.com (Sean Mooney) Date: Fri, 31 Mar 2023 18:31:01 +0100 Subject: [kolla] Image building question In-Reply-To: References: Message-ID: <93a78d3cdbc7fb8ca66545a981ca145c6ce21d7a.camel@redhat.com> On Fri, 2023-03-31 at 10:25 -0400, Satish Patel wrote: > Thank you Sean, > > What a wonderful explanation of the process. Yes I can download images from > the public domain and push them to a local repository but in some cases I > would like to add my own tools like monitoring agents, utilities etc > for debugging so i decided to build my own images. > > I believe https://tarballs.opendev.org is the right place to source > software correctly? that is the offcial location where all opendev/openstack projects are released and its the location distros use to build there packages. > > If I want to add some tools or packages inside images then I should use > Dockerfile.j2 to add and compile images. correct? yes so one of the great things about kolla images is tiem was taken to write down the image api when the project was first started https://docs.openstack.org/kolla/yoga/admin/kolla_api.html over time the common usecaes were then docuemnted in the admin image-building guide https://docs.openstack.org/kolla/yoga/admin/image-building.html#dockerfile-customisation all of the templated imnages have convension/contract that they provide delement that operator can use to add customisations. 
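To make that concrete for the "add my own monitoring agents and debug utilities" case, a minimal template override might look like the sketch below. The file name and the apt packages are illustrative assumptions, and the nova_base_footer block is the one discussed next (if a given image switches to a service user before its footer, a USER root line may be needed first):

  {# template-overrides.j2 -- file name and package list are illustrative #}
  {% extends parent_template %}

  {% block nova_base_footer %}
  RUN apt-get update \
      && apt-get install -y --no-install-recommends tcpdump strace \
      && apt-get clean && rm -rf /var/lib/apt/lists/*
  {% endblock %}

The override file is then passed to the build, e.g.:

  kolla-build -b ubuntu -t source --template-override template-overrides.j2 nova

The same pattern works for any of the *_footer blocks, and the *_packages_append variables mentioned below cover the simpler case of only adding distro packages.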
For example, the nova_base_footer block can be used to add additional content to the nova-base image https://github.com/openstack/kolla/blob/master/docker/nova/nova-base/Dockerfile.j2#L82

To customise the images you provide what is known as a template override file. The contrib folder has a number of examples https://github.com/openstack/kolla/blob/master/contrib/template-override/ovs-dpdk.j2 https://github.com/openstack/kolla/blob/master/contrib/neutron-plugins/template_override-networking-mlnx.j2 https://github.com/openstack/kolla/blob/master/contrib/neutron-plugins/template_override-vmware-nsx.j2

The way they work is that you start with {% extends parent_template %} and then create a block that matches the name of the one you want to replace:

{% extends parent_template %}

{% block nova_base_footer %}
RUN /bin/true
{% endblock %}

Whatever content you put in the block will be injected directly into the rendered docker file. https://docs.openstack.org/kolla/yoga/admin/image-building.html#plugin-functionality shows how to use that for neutron. In addition to replacing blocks you can set the content of special variables like horizon_packages_append or horizon_packages_remove https://docs.openstack.org/kolla/yoga/admin/image-building.html#packages-customisation which allow you to add and remove packages in a simple way. There is also a set of macros that you can use; you include them with {% import "macros.j2" as macros with context %} and they are defined here https://github.com/openstack/kolla/blob/master/docker/macros.j2

The capabilities are covered pretty well in the kolla docs if you know that the docs exist; you just need to know where to look. Hopefully that helps.

The ovs-dpdk template override docs can be found here https://docs.openstack.org/kolla/yoga/admin/template-override/ovs-dpdk.html It is a little different from the others since there I used the template override mechanism to allow compiling ovs with dpdk from source. Kolla normally does not support that, but it serves as a demonstration of how operators can do that if they really need to, i.e. compile a replacement for a binary component like mariadb. It also shows how to use git as the source location instead of tarballs if that is your preference.
> > ~S > > On Fri, Mar 31, 2023 at 7:01?AM Sean Mooney wrote: > > > On Thu, 2023-03-30 at 16:49 -0400, Satish Patel wrote: > > > Folks, > > > > > > I am playing with kolla image building to understand how it works. I am > > > using the following command to build images and wanted to check with you > > > folks if that is the correct way to do it. > > > > > > $ kolla-build -b ubuntu -t source keystone nova neutron glance > > > > > > Does the above command compile code from source or just download images > > > from remote repositories and re-compile them? > > > > > openstack is mainly python so in general ther is no complie step. > > but to answer your question that builds the image using the source tarballs > > or the openstakc packages. 
> > > > the defaults soruce locations are rendered into a file which you can > > override > > from the data stored in > > https://github.com/openstack/kolla/blob/master/kolla/common/sources.py > > the other build config defaults are generated form this code > > https://github.com/openstack/kolla/blob/master/kolla/common/config.py > > > > when you invoke kolla-build its executing > > https://github.com/openstack/kolla/blob/master/kolla/cmd/build.py > > but the main build workflow is here > > https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/kolla/image/build.py#L95 > > > > the tl;dr is the build worklow starts by creating build director and > > locating the docker file templats. > > in otherwords the content of the > > https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker > > directory > > > > each project has a direcoty in the docker directory and then each > > contaienr that project has has a directory in the project directory > > > > so the aodh project has a aodh folder > > https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh > > the convention is to have a -base contaienr which handels the > > depency installation and then one addtional contaienr for each binary deamon > > the project has i.e. aodh-api > > > > the name of the folder in teh project dir is used as the name of the > > contaienr > > > > if we look at the content of the docker files we will see that they are > > not actuly dockerfiles > > > > https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh/aodh-api/Dockerfile.j2 > > > > they are jinja2 templates that produce docker files > > > > kolla as far as i am aware has drop support for binary images and > > alternitiv distos > > > > but looking at an older release we can se ehow this worked > > > > https://github.com/openstack/kolla/blob/stable/wallaby/docker/nova/nova-base/Dockerfile.j2#L13-L52 > > > > each docker file template would use the jinja2 to generate a set of > > concreate docker files form the template > > and make dession based on the parmater passed in. > > > > so when you are invokeing > > kolla-build -b ubuntu -t source keystone nova neutron glance > > > > what actully happening is that the -t flag is being set as teh > > install_type parmater in the the jinja2 environemtn when > > the docker file is rendered. > > > > after all the docer files are rendered into normal docker files kolla just > > invokes the build. > > > > in the case of a source build that inovles pre fetching the source tar > > from https://tarballs.opendev.org > > and puting it in the build directory so that it can be included into the > > contianer. > > > > kolla also used to supprot git repo as a alternitve source fromat > > > > i have glossed over a lot of the details of how this actully work but that > > is the essence of what that command is doing > > creating a build dir, downloading the source, rendering the dockerfile > > templates to docker files, invokeing docker build on those > > and then taging them with the contaienr nameand build tag > > > > > > https://docs.openstack.org/kolla/latest/admin/image-building.html > > covers this form a high level > > > > > because in command output > > > I've not noticed anything related to the compiling process going on. > > > > > > Here is the output of all images produced by kolla-build command. Do I > > need > > > anything else or is this enough to deploy kolla? 
> > you can deploy coll with what you have yes although since the kolla files > > are automaticaly > > built by ci kolla-ansible can just use the ones form the docker hub or > > quay instead you do not need to build them yourself > > > > if you do build them your self then there is basically one other stpe that > > you shoudl take if this si a multi node deployment > > you should push the iamges to an interally host docker registry although > > based on the hostname in the prompt below > > it looks like you ahve alredy done that. > > > > > > root at docker-reg:~# docker images > > > REPOSITORY TAG IMAGE ID CREATED > > > SIZE > > > kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes > > > ago 595MB > > > kolla/cron 15.1.0 342877f26a8a 30 minutes > > > ago 250MB > > > kolla/memcached 15.1.0 0d19a4902644 31 minutes > > > ago 250MB > > > kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes > > > ago 314MB > > > kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes > > > ago 314MB > > > kolla/keepalived 15.1.0 82133b09fbf0 31 minutes > > > ago 260MB > > > kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes > > > ago 262MB > > > e66b228c2a07 31 minutes > > > ago 248MB > > > kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes > > > ago 309MB > > > kolla/fluentd 15.1.0 adfd19027862 33 minutes > > > ago 519MB > > > kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes > > > ago 255MB > > > kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes > > > ago 257MB > > > kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes > > > ago 263MB > > > kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes > > > ago 248MB > > > kolla/glance-api 15.1.0 a2241f68f23a 38 minutes > > > ago 1.04GB > > > kolla/glance-base 15.1.0 7286772a03a4 About an > > > hour ago 1.03GB > > > kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an > > > hour ago 1.05GB > > > kolla/neutron-server 15.1.0 69c844a2e3a9 About an > > > hour ago 1.05GB > > > kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an > > > hour ago 1.05GB > > > 486da9a6562e About an > > > hour ago 1.05GB > > > kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an > > > hour ago 1.04GB > > > kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an > > > hour ago 1.04GB > > > kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an > > > hour ago 1.04GB > > > kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an > > > hour ago 1.04GB > > > kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an > > > hour ago 1.04GB > > > kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an > > > hour ago 1.04GB > > > kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an > > > hour ago 1.04GB > > > kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an > > > hour ago 1.04GB > > > kolla/neutron-base 15.1.0 f85b6a2e2725 About an > > > hour ago 1.04GB > > > kolla/nova-libvirt 15.1.0 0f3ecefe4752 About an > > > hour ago 987MB > > > kolla/nova-compute 15.1.0 241b7e7fafbe About an > > > hour ago 1.47GB > > > kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an > > > hour ago 1.15GB > > > kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an > > > hour ago 1.22GB > > > kolla/nova-compute-ironic 15.1.0 716612107532 About an > > > hour ago 1.12GB > > > kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an > > > hour ago 1.11GB > > > kolla/nova-api 15.1.0 2aef02667ff8 About an > > > hour ago 1.11GB > > > kolla/nova-conductor 15.1.0 6f1da3400901 About an > > > hour ago 1.11GB > > > kolla/nova-scheduler 15.1.0 628326776b1d About an > > > hour ago 1.11GB > > > 
kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an > > > hour ago 1.11GB > > > kolla/nova-base 15.1.0 e47420013283 About an > > > hour ago 1.11GB > > > kolla/keystone 15.1.0 e5530d829d5f 2 hours > > ago > > > 947MB > > > kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours > > ago > > > 953MB > > > kolla/keystone-fernet 15.1.0 8a4fa24853a8 2 hours > > ago > > > 951MB > > > kolla/keystone-base 15.1.0 b6f9562364a9 2 hours > > ago > > > 945MB > > > kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours > > ago > > > 915MB > > > kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours > > ago > > > 915MB > > > kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours > > ago > > > 893MB > > > kolla/base 15.1.0 f68e9ef3dd30 2 hours > > ago > > > 248MB > > > registry 2 8db46f9d7550 19 hours > > ago > > > 24.2MB > > > ubuntu 22.04 08d22c0ceb15 3 weeks > > ago > > > 77.8MB > > > > From fungi at yuggoth.org Fri Mar 31 18:11:31 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 31 Mar 2023 18:11:31 +0000 Subject: [tripleo] Removal of TripleO Master Integration and Component Lines In-Reply-To: References: Message-ID: <20230331181131.jbdeq3nrylsrag4s@yuggoth.org> On 2023-03-31 11:59:27 -0400 (-0400), Ronelle Landy wrote: [...] > Check/gate testing for the master branch is in process of being removed as > well. > > [1] https://review.opendev.org/c/openstack/governance/+/878799 I notice that the tripleo-ci repository only has a master branch. Will its contents going away cause problems with upstream testing for stable branches of other TripleO deliverables? Are there other single-branch TripleO deliverable repositories for which removal of content would impact stable branches? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: