From gmann at ghanshyammann.com Wed Mar 1 02:02:36 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Tue, 28 Feb 2023 18:02:36 -0800
Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API
In-Reply-To: 
References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com>
Message-ID: <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com>

 ---- On Thu, 23 Feb 2023 01:33:12 -0800 Rajat Dhasmana wrote ---
 > Hi,
 >
 > It looks like there is confusion between 3 things:
 > 1) Multiattach volume type
 > 2) multiattach flag on the volume
 > 3) The policy volume:multiattach
 >
 > I will try to briefly describe all 3 so there is clarity on the issue.
 > 1) Multiattach volume type: This is a volume type created with an extra spec multiattach="<is> True". This allows multiattach volumes to be created by using this type. Previously we used to allow a parameter --allow-multiattach while creating the volume. This was deprecated in Queens and removed in Train in favor of the volume type way of creating the multiattach volume[1].
 > 2) Multiattach flag of a volume: This is a parameter of the volume that specifies whether a volume is multiattach or not.
 > 3) volume:multiattach policy: The policy verifies that the user creating a multiattach volume is a member or admin (and not a reader).
 >
 > Coming to the issue, I verified that what you're observing is correct. We removed the support for providing the "multiattach" flag from cinderclient and openstackclient, but there still exists code on the API side that allows you to provide "multiattach": "True" in the JSON body of a curl command to create a multiattach volume. I will work on fixing the issue on the API side.

I think removing it from the clients is a good way to stop exposing this old, not-recommended way to users, but the API is a separate thing, and removing the API request parameter 'multiattach' from it can break existing users who rely on it. Tempest tests are one good example of such a use case. To maintain backward compatibility/interoperability it should be removed by bumping the microversion, so that it continues to work for older microversions. This way we will not break existing users and will still provide the new way for users to adopt. Similarly, in Nova we have a lot of deprecated APIs and we need to keep them for older microversions.

-gmann

 > In the meantime, can you report an issue on launchpad for the same?
> https://bugs.launchpad.net/cinder/+filebug
>
> Snippet of curl command:
> $ curl -g -i -X POST http://127.0.0.1/volume/v3/a5df9e29f521464f9158ff7a30b7e51f/volumes -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-cinderclient" -H "X-Auth-Token: gAAAAABj9zDtZO1mTld-BC-Yd8FRHDunc4-Xyg1jsgLembA-Ke7cr8aA4kCHHYYB4EPvhq1xL02FBYuXahhYBl_nKWjVbOTpd7R3kS4Libf-Kd9ackaYpWq4Mq4g7-2ORi7FcVg2IOdj3wUkDWegu9lI5PI-brNsAGUh8R1fW_y5bpDYWtfEFdw" -d '{"volume": {"size": 1, "consistencygroup_id": null, "snapshot_id": null, "name": null, "description": null, "volume_type": null, "availability_zone": null, "metadata": {}, "imageRef": null, "source_volid": null, "backup_id": null, "multiattach": "True"}}'
> HTTP/1.1 202 Accepted
> Date: Thu, 23 Feb 2023 09:25:23 GMT
> Server: Apache/2.4.41 (Ubuntu)
> Content-Type: application/json
> x-compute-request-id: req-131b4a2d-f9d4-4d9d-b99c-c52012056dec
> Content-Length: 798
> OpenStack-API-Version: volume 3.0
> Vary: OpenStack-API-Version
> x-openstack-request-id: req-131b4a2d-f9d4-4d9d-b99c-c52012056dec
> Connection: close
>
> [1] https://github.com/openstack/python-cinderclient/commit/3c1b417959689c85a2f54505057ca995fedca075
>
> Thanks,
> Rajat Dhasmana
>
> On Thu, Feb 23, 2023 at 3:08 AM Albert Braden ozzzo at yahoo.com> wrote:
> We didn't create a multi-attach volume type, and when we try to create a multi-attach volume via CLI we aren't able to. It appears that our customer was able to circumvent the restriction by using the API via TF. Is this a bug?
>
> On Wednesday, February 22, 2023, 02:32:57 PM EST, Danny Webb danny.webb at thehutgroup.com> wrote:
>
> Creating a volume is not the same as creating a volume type. A tenant can consume a volume type that allows multi-attach with no issue, as you see in that policy.
>
> From: Albert Braden ozzzo at yahoo.com>
> Sent: 22 February 2023 17:12
> To: Openstack-discuss openstack-discuss at lists.openstack.org>
> Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API
> CAUTION: This email originates from outside THG
>
> According to this document [1] multiattach volumes can only be set up if explicitly allowed by creating a "multiattach" volume type.
>
> "Starting from the Queens release the ability to attach a volume to multiple hosts/servers requires that the volume is of a special type that includes an extra-spec capability setting of multiattach='<is> True'." Creating a new volume type is an admin-only operation by default.
>
> One of our customers appears to have used Terraform to create a volume with the multiattach flag set and it worked, and that volume has multiple attachments. When I look here [2] it appears that the default is:
>
> #"volume:multiattach": "rule:xena_system_admin_or_project_member"
>
> So it looks like, by default, any project member can create a multiattach volume. What am I missing?
>
> [1]: https://docs.openstack.org/cinder/latest/admin/volume-multiattach.html
> [2]: https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/policy.yaml.html#policy-file
>
> Danny Webb
> Principal OpenStack Engineer
> Danny.Webb at thehutgroup.com
> www.thg.com
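For reference, the supported volume-type-based flow that Rajat and the cinder documentation describe looks roughly like this (a sketch with placeholder names, assuming the default policies where volume type creation is admin-only):

  # admin: create a volume type carrying the multiattach capability
  $ openstack volume type create multiattach
  $ openstack volume type set --property multiattach="<is> True" multiattach

  # project member: create a volume of that type and attach it to more than one server
  $ openstack volume create --type multiattach --size 1 shared-vol
  $ openstack server add volume server-a shared-vol
  $ openstack server add volume server-b shared-vol

Whether the member steps are allowed is what the volume:multiattach policy discussed above controls; creating or changing the volume type itself stays an admin-only operation by default.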
From knikolla at bu.edu Wed Mar 1 02:53:49 2023
From: knikolla at bu.edu (Nikolla, Kristi)
Date: Wed, 1 Mar 2023 02:53:49 +0000
Subject: [all][tc] Technical Committee next weekly meeting on 2023 Mar 1 at 1600 UTC
In-Reply-To: <186949eba68.bb0de83e1432598.5051006967090034367@ghanshyammann.com>
References: <186949eba68.bb0de83e1432598.5051006967090034367@ghanshyammann.com>
Message-ID: <9F242D12-58E5-497F-AEF7-AA380AD0A921@bu.edu>

Hi all,

Please find below the agenda for tomorrow's TC meeting that will be held over Zoom on 2023 Mar 1, at 1600 UTC. Link to connect can be found on https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting

Agenda
- Roll call
- Follow up on past action items
- Welcome new and returning TC members
- Gate health check
- Discussion on uwsgi alternative and if we should define wsgi standard server in PTI
  - https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032345.html
  - https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032369.html
- Discussion of "Add guidelines about naming versions of the OpenStack projects"
  - https://review.opendev.org/c/openstack/governance/+/874484
- TC 2023.1 tracker status checks
  - https://etherpad.opendev.org/p/tc-2023.1-tracker
- Deprecation process for TripleO
  - https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032083.html
- Cleanup of PyPI maintainer list for OpenStack Projects
  - Etherpad for audit and cleanup of additional PyPI maintainers
    - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup
  - ML discussion
    - https://lists.openstack.org/pipermail/openstack-discuss/2023-January/031848.html
- Recurring tasks check
  - Bare 'recheck' state
    - https://etherpad.opendev.org/p/recheck-weekly-summary
- Open Reviews
  - https://review.opendev.org/q/projects:openstack/governance+is:open

No noted absences.

> On Feb 27, 2023, at 3:44 PM, Ghanshyam Mann wrote:
>
> Hello Everyone,
>
> The technical Committee's next weekly meeting is scheduled for 2023 Mar 1, at 1600 UTC.
>
> If you would like to add topics for discussion, please add them to the below wiki page by
> Tuesday, Feb 28 at 2100 UTC.
>
> https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
>
> -gmann
>

From adivya1.singh at gmail.com Wed Mar 1 05:41:19 2023
From: adivya1.singh at gmail.com (Adivya Singh)
Date: Wed, 1 Mar 2023 11:11:19 +0530
Subject: (OpenStack-Upgrade)
Message-ID: 

Hi Team,

I am planning to upgrade my current environment. The upgrade procedure is available on the OpenStack site and forums. But I am looking for a rollback plan, other than keeping a local backup copy of the Galera database.

Regards,
Adivya Singh

From alsotoes at gmail.com Wed Mar 1 07:16:46 2023
From: alsotoes at gmail.com (Alvaro Soto)
Date: Wed, 1 Mar 2023 01:16:46 -0600
Subject: (OpenStack-Upgrade)
In-Reply-To: 
References: 
Message-ID: 

That will depend on how you installed your environment: OSA, TripleO, etc. Can you provide more information?

---
Alvaro Soto.

Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.
----------------------------------------------------------
Great people talk about ideas, ordinary people talk about things, small people talk... about other people.

On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote:

> Hi Team,
>
> I am planning to upgrade my Current Environment, The Upgrade procedure is
> available in OpenStack Site and Forums.
> > But i am looking fwd to roll back Plan , Other then have a Local backup > copy of galera Database > > Regards > Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Arne.Wiebalck at cern.ch Wed Mar 1 07:53:36 2023 From: Arne.Wiebalck at cern.ch (Arne Wiebalck) Date: Wed, 1 Mar 2023 07:53:36 +0000 Subject: [tc][all] OpenStack Technical Committee new Chair In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> Message-ID: Ghanshyam, Thanks for all your work as the TC chair during the past two years! I think you did an amazing job driving all the background activities and required decisions to maintain and improve the OpenStack ecosystem ... and the weekly updates helped big time to keep in the community in the loop! Cheers, Arne ________________________________________ From: Ghanshyam Mann Sent: Tuesday, 28 February 2023 22:58 To: openstack-discuss Subject: [tc][all] OpenStack Technical Committee new Chair Hello Everyone, I would like to inform the community and congratulate/welcome Kristi as the new Chair of Technical Committee. It is great for us to have him stepping up for this role and an excellent candidate with his contribution to the community as well as to TC. Thanks for having me as a Chair for the past 2 years. I will continue as TC and my other activities/role in the community. Also thanks for reading my weekly updates which were lengthy sometimes or maybe many times :) -gmann From eblock at nde.ag Wed Mar 1 08:11:42 2023 From: eblock at nde.ag (Eugen Block) Date: Wed, 01 Mar 2023 08:11:42 +0000 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: Message-ID: <20230301081142.Horde.hEzM_pv6c33ED_YOh17hbIc@webmail.nde.ag> I'm not familiar with TripleO so I'm not sure how much of help I can be here, maybe someone else with can chime in. I would look for network and rabbit issues. Are the control nodes heavily loaded? Do you see the compute services from the edge site up all the time? If you run a 'watch -n 20 openstack compute service list', do they "flap" all the time or only if you launch instances? Maybe rabbitmq needs some tweaking? Can you show your policies? rabbitmqctl list_policies -p What network connection do they have, is the network saturated? Is it different on the edge site compared to the central site? Zitat von Swogat Pradhan : > Hi Eugen, > For some reason i am not getting your email to me directly, i am checking > the email digest and there i am able to find your reply. > Here is the log for download: https://we.tl/t-L8FEkGZFSq > Yes, these logs are from the time when the issue occurred. > > *Note: i am able to create vm's and perform other activities in the central > site, only facing this issue in the edge site.* > > With regards, > Swogat Pradhan > > On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan > wrote: > >> Hi Eugen, >> Thanks for your response. 
>> I have actually a 4 controller setup so here are the details: >> >> *PCS Status:* >> * Container bundle set: rabbitmq-bundle [ >> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-no-ceph-3 >> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-2 >> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-1 >> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-0 >> >> I have tried restarting the bundle multiple times but the issue is still >> present. >> >> *Cluster status:* >> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> Cluster status of node >> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >> Basics >> >> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >> Disk Nodes >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >> Running Nodes >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >> Versions >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 on >> Erlang 22.3.4.1 >> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 on >> Erlang 22.3.4.1 >> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 on >> Erlang 22.3.4.1 >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 on Erlang 22.3.4.1 >> >> Alarms >> >> (none) >> >> Network Partitions >> >> (none) >> >> Listeners >> >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI tool 
communication >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> and AMQP 1.0 >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >> Feature flags >> >> Flag: drop_unroutable_metric, state: enabled >> Flag: empty_basic_get_metric, state: enabled >> Flag: implicit_default_bindings, state: enabled >> Flag: quorum_queue, state: enabled >> Flag: virtual_host_metadata, state: enabled >> >> *Logs:* >> *(Attached)* >> >> With regards, >> Swogat Pradhan >> >> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan >> wrote: >> >>> Hi, >>> Please find the nova conductor as well as nova api log. >>> >>> nova-conuctor: >>> >>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 16152921c1eb45c2b1f562087140168b >>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> 83dbe5f567a940b698acfe986f6194fa >>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> f3bfd7f65bd542b18d84cea3033abb43: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a >>> missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> d4b9180f91a94f9a82c3c9c4b7595566: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 897911a234a445d8a0d8af02ece40f6f: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>> backend dogpile.cache.null. 
>>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 8f723ceb10c3472db9a9f324861df2bb: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan >>> wrote: >>> >>>> Hi, >>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> launch vm's. >>>> When the VM is in spawning state the node goes down (openstack compute >>>> service list), the node comes backup when i restart the nova compute >>>> service but then the launch of the vm fails. >>>> >>>> nova-compute.log >>>> >>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> instance usage >>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to >>>> 2023-02-26 08:00:00. 0 instances. >>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> dcn01-hci-0.bdxworld.com >>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: >>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>>> backend dogpile.cache.null. 
>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> privsep helper: >>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', >>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep >>>> daemon via rootwrap >>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep daemon >>>> starting >>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> process running with uid/gid: 0/0 >>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> process running with capabilities (eff/prm/inh): >>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep daemon >>>> running as pid 2647 >>>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> execution error >>>> in _get_host_uuid: Unexpected error while running command. >>>> Command: blkid overlay -s UUID -o value >>>> Exit code: 2 >>>> Stdout: '' >>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> Unexpected error while running command. >>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>> Is there a way to solve this issue? >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>> From smooney at redhat.com Wed Mar 1 08:15:16 2023 From: smooney at redhat.com (Sean Mooney) Date: Wed, 01 Mar 2023 08:15:16 +0000 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > BTW, this link ( > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) said > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? no its not wrong but for dpu smart nics you have to make a choice when you deploy either they can be used in dpu mode in which case remote_managed shoudl be set to true and you can only use them via neutron ports with vnic-type=remote_managed as descried in that doc https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port or if you disable dpu mode in the nic frimware then you shoudl remvoe remote_managed form the pci device list and then it can be used liek a normal vf either for neutron sriov ports vnic-type=direct or via flavor based pci passthough. 
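To make the difference concrete, a minimal sketch of the two configurations (illustrative values only: the option is [pci] passthrough_whitelist on older releases and was later renamed to device_spec, and the vendor/product IDs and physnet name below are placeholders to replace with your own):

  # /etc/nova/nova.conf on the compute node, DPU mode enabled:
  [pci]
  passthrough_whitelist = { "vendor_id": "15b3", "product_id": "101e", "remote_managed": "true" }
  # such VFs are only consumable through ports like:
  # $ openstack port create --network net1 --vnic-type remote-managed dpu-port

  # /etc/nova/nova.conf with DPU mode disabled in the NIC firmware:
  [pci]
  passthrough_whitelist = { "vendor_id": "15b3", "product_id": "101e", "physical_network": "physnet2" }
  # usable via the sriov nic agent path:
  # $ openstack port create --network net1 --vnic-type direct sriov-port
  # or via flavor-based passthrough with a matching [pci] alias.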
the issue you were having is you configured the pci device list to contain "remote_managed: true", which means the vf can only be consumed by a neutron port with vnic-type=remote_managed. when you have "remote_managed: false" or unset, you can use it via vnic-type=direct. i forgot that slight detail that vnic-type=remote_managed is required for "remote_managed: true".

in either case you found the correct doc
https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html

neutron sriov port configuration is documented here
https://docs.openstack.org/neutron/latest/admin/config-sriov.html

and nova flavor based pci passthrough is documented here
https://docs.openstack.org/nova/latest/admin/pci-passthrough.html

all three serve slightly different uses. both neutron procedures are exclusively for network interfaces.
https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html requires the use of ovn deployed on the dpu to configure the VF control plane.
https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses the sriov nic agent to manage the VF with ip tools.
https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is intended for pci passthrough of stateless accelerators like qat devices. while the nova flavor approach can be used with nics, it is not how it is generally meant to be used, and when used to pass through a nic the expectation is that it is not related to a neutron network.

From skaplons at redhat.com Wed Mar 1 08:18:17 2023
From: skaplons at redhat.com (Slawek Kaplonski)
Date: Wed, 01 Mar 2023 08:18:17 +0100
Subject: [tc][all] OpenStack Technical Committee new Chair
In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com>
References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com>
Message-ID: <4789451.GXAFRqVoOG@p1>

Hi,

On Tuesday, 28 February 2023 22:58:34 CET Ghanshyam Mann wrote:
> Hello Everyone,
>
> I would like to inform the community and congratulate/welcome Kristi as the new
> Chair of Technical Committee. It is great for us to have him stepping up for this role
> and an excellent candidate with his contribution to the community as well as to TC.
>
> Thanks for having me as a Chair for the past 2 years. I will continue as TC and my
> other activities/role in the community. Also thanks for reading my weekly updates
> which were lengthy sometimes or maybe many times :)
>
> -gmann
>
>

Thanks gmann for all your work in those past 2 years as TC Chair - you did a great job there.
Congrats Kristi for being our new Chair. Good luck in your new role :)

--
Slawek Kaplonski
Principal Software Engineer
Red Hat

From thierry at openstack.org Wed Mar 1 08:44:10 2023
From: thierry at openstack.org (Thierry Carrez)
Date: Wed, 1 Mar 2023 09:44:10 +0100
Subject: [tc][all] OpenStack Technical Committee new Chair
In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com>
References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com>
Message-ID: <3f9a86c8-15d0-ea84-0751-13749fc97d7b@openstack.org>

Congrats to Kristi! And welcome to the TC chair emeritus group, Ghanshyam!

Ghanshyam Mann wrote:
> Hello Everyone,
>
> I would like to inform the community and congratulate/welcome Kristi as the new
> Chair of Technical Committee.
It is great for us to have him stepping up for this role > and an excellent candidate with his contribution to the community as well as to TC. > > Thanks for having me as a Chair for the past 2 years. I will continue as TC and my > other activities/role in the community. Also thanks for reading my weekly updates > which were lengthy sometimes or maybe many times :) > > -gmann -- Thierry Carrez (ttx) From yasufum.o at gmail.com Wed Mar 1 09:41:04 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Wed, 1 Mar 2023 18:41:04 +0900 Subject: [tc][heat][tacker] Moving governance of tosca-parser(and heat-translator ?) to Tacker In-Reply-To: <1869435593c.10a5026ca1424633.8160143839607463616@ghanshyammann.com> References: <1867ac70656.c5de609e1065667.3634775558652795921@ghanshyammann.com> <1869435593c.10a5026ca1424633.8160143839607463616@ghanshyammann.com> Message-ID: On 2023/02/28 3:49, Ghanshyam Mann wrote: > ---- On Sun, 26 Feb 2023 19:54:45 -0800 Takashi Kajinami wrote --- > > > > > > On Mon, Feb 27, 2023 at 11:38?AM Yasufumi Ogawa yasufum.o at gmail.com> wrote: > > Hi, > > > > On 2023/02/27 10:51, Takashi Kajinami wrote: > > > On Thu, Feb 23, 2023 at 5:18?AM Ghanshyam Mann gmann at ghanshyammann.com> > > > wrote: > > > > > >>? ?---- On Sun, 19 Feb 2023 18:44:14 -0800? Takashi Kajinami? wrote --- > > >>? ?> Hello, > > >>? ?> > > >>? ?> Currently tosca-parser is part of heat's governance, but the core > > >> reviewers of this repositorydoes not contain any active heat cores while we > > >> see multiple Tacker cores in this group.Considering the fact the project is > > >> mainly maintained by Tacker cores, I'm wondering if we canmigrate this > > >> repository to Tacker's governance. Most of the current heat cores are not > > >> quitefamiliar with the codes in this repository, and if Tacker team is not > > >> interested in maintainingthis repository then I'd propose retiring this. > > As you mentioned, tacker still using tosca-parser and heat-translator. > > > > >> > > >> I think it makes sense and I remember its usage/maintenance by the Tacker > > >> team since starting. > > >> But let's wait for the Tacker team opinion and accordingly you can propose > > >> the governance patch. > > Although I've not joined to tacker team since starting, it might not be > > true because there was no cores of tosca-parser and heat-translator in > > tacker team. We've started to help maintenance the projects because no > > other active contributer. > > > > >> > > >>? ?> > > >>? ?> Similarly, we have heat-translator project which has both heat cores > > >> and tacker cores as itscore reviewers. IIUC this is tightly related to the > > >> work in tosca-parser, I'm wondering it makesmore sense to move this project > > >> to Tacker, because the requirement is mostly made fromTacker side rather > > >> than Heat side. > > >> > > >> I am not sure about this and from the name, it seems like more of a heat > > >> thing but it is not got beyond the Tosca template > > >> conversion. Are there no users of it outside of the Tacker service? or any > > >> request to support more template conversions than > > >> Tosca? > > >> > > > > > > Current hea-translator supports only the TOSCA template[1]. > > > The heat-translator project can be a generic template converter by its > > > nature but we haven't seen any interest > > > in implementing support for different template formats. 
> > > > > > [1] > > > https://github.com/openstack/heat-translator/blob/master/translator/osc/v1/translate.py#L49 > > > > > > > > > > > >> If no other user or use case then I think one option can be to merge it > > >> into Tosca-parser itself and retire heat-translator. > > >> > > >> Opinion? > > Hmm, as a core of tosca-parser, I'm not sure it's a good idea because it > > is just a parser TOSCA and independent from heat-translator. In > > addition, there is no experts of Heat or HOT in current tacker team > > actually, so it might be difficult to maintain heat-translator without > > any help from heat team. > > > > The hea-translator project was initially created to implement a translator from TOSCA parser to HOT[1].Later tosca-parser was split out[2] but we have never increased scope of tosca-parser. So it has beenno more than the TOSCA template translator. > > > > [1] https://blueprints.launchpad.net/heat/+spec/heat-translator-tosca[2] https://review.opendev.org/c/openstack/project-config/+/211204 > > We (Heat team) can provide help with any problems with heat, but we own no actual use case of template translation.Maintaining the heat-translator repository with tacker, which currently provides actual use cases would make more sense.This also gives the benefit that Tacker team can decide when stable branches of heat-translator should be retiredalong with the other Tacker repos. > > > > By the way, may I ask what will be happened if the governance is move on > > to tacker? Is there any extra tasks for maintenance? > > > > TC would have better (and more precise) explanation but my understanding is that?- creating a release > > ?- maintaining stable branches > > ?- maintaining gate healthwould be the required tasks along with moderating dev discussion in mailing list/PTG/etc. > > I think you covered all and the Core team (Tacker members) might be already doing a few of the tasks. From the > governance perspective, tacker PTL will be the point of contact for this repo in the case repo becomes inactive or so > but it will be the project team's decision to merge/split things whatever way makes maintenance easy. I understand. I've shared the proposal again in the previous meeting and no objection raised. So, we'd agree to move the governance as Tacker team. Thanks, Yasufumi > > -gmann > > > > ?Thanks, > > Yasufumi > > > > >> > > > > > > That also sounds good to me. > > > > > > > > >> Also, correcting the email subject tag as [tc]. > > >> > > >> -gmann > > >> > > >>? ?> > > >>? ?> [1] > > >> https://review.opendev.org/admin/groups/1f7855baf3cf14fedf72e443eef18d844bcd43fa,members[2] > > >> https://review.opendev.org/admin/groups/66028971dcbb58add6f0e7c17ac72643c4826956,members > > >>? ?> Thank you,Takashi > > >>? ?> > > >> > > >> > > > > > > > From swogatpradhan22 at gmail.com Wed Mar 1 09:54:16 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 1 Mar 2023 15:24:16 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: Message-ID: Hi Eugen, Request you to please add my email either on 'to' or 'cc' as i am not getting email's from you. Coming to the issue: [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p / Listing policies for vhost "/" ... 
vhost name pattern apply-to definition priority / ha-all ^(?!amq\.).* queues {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 I have the edge site compute nodes up, it only goes down when i am trying to launch an instance and the instance comes to a spawning state and then gets stuck. I have a tunnel setup between the central and the edge sites. With regards, Swogat Pradhan On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan wrote: > Hi Eugen, > For some reason i am not getting your email to me directly, i am checking > the email digest and there i am able to find your reply. > Here is the log for download: https://we.tl/t-L8FEkGZFSq > Yes, these logs are from the time when the issue occurred. > > *Note: i am able to create vm's and perform other activities in the > central site, only facing this issue in the edge site.* > > With regards, > Swogat Pradhan > > On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan > wrote: > >> Hi Eugen, >> Thanks for your response. >> I have actually a 4 controller setup so here are the details: >> >> *PCS Status:* >> * Container bundle set: rabbitmq-bundle [ >> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-no-ceph-3 >> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-2 >> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-1 >> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Started >> overcloud-controller-0 >> >> I have tried restarting the bundle multiple times but the issue is still >> present. >> >> *Cluster status:* >> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> Cluster status of node >> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>> Basics >> >> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >> Disk Nodes >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >> Running Nodes >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >> Versions >> >> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 >> on Erlang 22.3.4.1 >> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 >> on Erlang 22.3.4.1 >> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 >> on Erlang 22.3.4.1 >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 on Erlang 22.3.4.1 >> >> Alarms >> >> (none) >> >> Network Partitions >> >> (none) >> >> Listeners >> >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >> communication >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0 >> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >> [::], port: 15672, protocol: http, purpose: HTTP API >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI tool communication >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> and AMQP 1.0 >> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >> Feature flags >> >> Flag: drop_unroutable_metric, state: enabled >> Flag: empty_basic_get_metric, state: enabled >> Flag: implicit_default_bindings, state: enabled >> Flag: quorum_queue, state: enabled >> Flag: virtual_host_metadata, state: enabled >> >> *Logs:* >> *(Attached)* >> >> With regards, >> Swogat Pradhan >> >> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan >> wrote: >> >>> Hi, >>> Please find the nova conductor as well as nova api log. 
>>> >>> nova-conuctor: >>> >>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 16152921c1eb45c2b1f562087140168b >>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> 83dbe5f567a940b698acfe986f6194fa >>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> f3bfd7f65bd542b18d84cea3033abb43: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a >>> missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> d4b9180f91a94f9a82c3c9c4b7595566: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 897911a234a445d8a0d8af02ece40f6f: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>> backend dogpile.cache.null. >>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> 8f723ceb10c3472db9a9f324861df2bb: >>> oslo_messaging.exceptions.MessageUndeliverable >>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a >>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>> oslo_messaging.exceptions.MessageUndeliverable >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Hi, >>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> launch vm's. 
>>>> When the VM is in spawning state the node goes down (openstack compute >>>> service list), the node comes backup when i restart the nova compute >>>> service but then the launch of the vm fails. >>>> >>>> nova-compute.log >>>> >>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running instance usage >>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to >>>> 2023-02-26 08:00:00. 0 instances. >>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> dcn01-hci-0.bdxworld.com >>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: >>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>>> backend dogpile.cache.null. >>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running privsep helper: >>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', >>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep >>>> daemon via rootwrap >>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>> daemon starting >>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> process running with uid/gid: 0/0 >>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> process running with capabilities (eff/prm/inh): >>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> daemon running as pid 2647 >>>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>> in _get_host_uuid: Unexpected error while running command. >>>> Command: blkid overlay -s UUID -o value >>>> Exit code: 2 >>>> Stdout: '' >>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> Unexpected error while running command. 
>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>> Is there a way to solve this issue? >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Wed Mar 1 09:59:53 2023 From: eblock at nde.ag (Eugen Block) Date: Wed, 01 Mar 2023 09:59:53 +0000 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: Message-ID: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> One more thing coming to mind is MTU size. Are they identical between central and edge site? Do you see packet loss through the tunnel? Zitat von Swogat Pradhan : > Hi Eugen, > Request you to please add my email either on 'to' or 'cc' as i am not > getting email's from you. > Coming to the issue: > > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p / > Listing policies for vhost "/" ... > vhost name pattern apply-to definition priority > / ha-all ^(?!amq\.).* queues > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > > I have the edge site compute nodes up, it only goes down when i am trying > to launch an instance and the instance comes to a spawning state and then > gets stuck. > > I have a tunnel setup between the central and the edge sites. > > With regards, > Swogat Pradhan > > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan > wrote: > >> Hi Eugen, >> For some reason i am not getting your email to me directly, i am checking >> the email digest and there i am able to find your reply. >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> Yes, these logs are from the time when the issue occurred. >> >> *Note: i am able to create vm's and perform other activities in the >> central site, only facing this issue in the edge site.* >> >> With regards, >> Swogat Pradhan >> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan >> wrote: >> >>> Hi Eugen, >>> Thanks for your response. >>> I have actually a 4 controller setup so here are the details: >>> >>> *PCS Status:* >>> * Container bundle set: rabbitmq-bundle [ >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started >>> overcloud-controller-no-ceph-3 >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started >>> overcloud-controller-2 >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started >>> overcloud-controller-1 >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Started >>> overcloud-controller-0 >>> >>> I have tried restarting the bundle multiple times but the issue is still >>> present. >>> >>> *Cluster status:* >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>> Cluster status of node >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>> Basics >>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>> Disk Nodes >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> Running Nodes >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> Versions >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 >>> on Erlang 22.3.4.1 >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 >>> on Erlang 22.3.4.1 >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 >>> on Erlang 22.3.4.1 >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 on Erlang 22.3.4.1 >>> >>> Alarms >>> >>> (none) >>> >>> Network Partitions >>> >>> (none) >>> >>> Listeners >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >>> communication >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> and AMQP 1.0 >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >>> communication >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> and AMQP 1.0 >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool >>> communication >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> and AMQP 1.0 >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >>> interface: [::], port: 25672, protocol: clustering, purpose: inter-node and >>> CLI tool communication >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> and AMQP 1.0 >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Feature flags >>> >>> Flag: drop_unroutable_metric, state: enabled >>> Flag: empty_basic_get_metric, state: enabled >>> Flag: implicit_default_bindings, state: enabled >>> Flag: quorum_queue, state: enabled >>> Flag: virtual_host_metadata, state: enabled >>> >>> *Logs:* >>> *(Attached)* >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan >>> wrote: >>> >>>> Hi, >>>> Please find the nova 
conductor as well as nova api log. >>>> >>>> nova-conuctor: >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> 16152921c1eb45c2b1f562087140168b >>>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> 83dbe5f567a940b698acfe986f6194fa >>>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>>> backend dogpile.cache.null. >>>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
Abandoning...: >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Hi, >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>>> launch vm's. >>>>> When the VM is in spawning state the node goes down (openstack compute >>>>> service list), the node comes backup when i restart the nova compute >>>>> service but then the launch of the vm fails. >>>>> >>>>> nova-compute.log >>>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>>> instance usage >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to >>>>> 2023-02-26 08:00:00. 0 instances. >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>>> dcn01-hci-0.bdxworld.com >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with >>>>> backend dogpile.cache.null. 
>>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>>> privsep helper: >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep >>>>> daemon via rootwrap >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>>> daemon starting >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>>> process running with uid/gid: 0/0 >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> process running with capabilities (eff/prm/inh): >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> daemon running as pid 2647 >>>>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>>> execution error >>>>> in _get_host_uuid: Unexpected error while running command. >>>>> Command: blkid overlay -s UUID -o value >>>>> Exit code: 2 >>>>> Stdout: '' >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>>> Unexpected error while running command. >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>> >>>>> Is there a way to solve this issue? >>>>> >>>>> >>>>> With regards, >>>>> >>>>> Swogat Pradhan >>>>> >>>> From swogatpradhan22 at gmail.com Wed Mar 1 10:04:50 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 1 Mar 2023 15:34:50 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> Message-ID: Hi, Yes the MTU is the same as the default '1500'. Generally I haven't seen any packet loss, but never checked when launching the instance. I will check that and come back. But everytime i launch an instance the instance gets stuck at spawning state and there the hypervisor becomes down, so not sure if packet loss causes this. With regards, Swogat pradhan On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > One more thing coming to mind is MTU size. Are they identical between > central and edge site? Do you see packet loss through the tunnel? > > Zitat von Swogat Pradhan : > > > Hi Eugen, > > Request you to please add my email either on 'to' or 'cc' as i am not > > getting email's from you. > > Coming to the issue: > > > > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p / > > Listing policies for vhost "/" ... 
> > vhost name pattern apply-to definition priority > > / ha-all ^(?!amq\.).* queues > > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} > 0 > > > > I have the edge site compute nodes up, it only goes down when i am trying > > to launch an instance and the instance comes to a spawning state and then > > gets stuck. > > > > I have a tunnel setup between the central and the edge sites. > > > > With regards, > > Swogat Pradhan > > > > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > > wrote: > > > >> Hi Eugen, > >> For some reason i am not getting your email to me directly, i am > checking > >> the email digest and there i am able to find your reply. > >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >> Yes, these logs are from the time when the issue occurred. > >> > >> *Note: i am able to create vm's and perform other activities in the > >> central site, only facing this issue in the edge site.* > >> > >> With regards, > >> Swogat Pradhan > >> > >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >> wrote: > >> > >>> Hi Eugen, > >>> Thanks for your response. > >>> I have actually a 4 controller setup so here are the details: > >>> > >>> *PCS Status:* > >>> * Container bundle set: rabbitmq-bundle [ > >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-no-ceph-3 > >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-2 > >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-1 > >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-0 > >>> > >>> I have tried restarting the bundle multiple times but the issue is > still > >>> present. > >>> > >>> *Cluster status:* > >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>> Cluster status of node > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
> >>> Basics > >>> > >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>> > >>> Disk Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Running Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Versions > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > RabbitMQ > >>> 3.8.3 on Erlang 22.3.4.1 > >>> > >>> Alarms > >>> > >>> (none) > >>> > >>> Network Partitions > >>> > >>> (none) > >>> > >>> Listeners > >>> > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, > >>> interface: [::], port: 25672, protocol: clustering, purpose: > inter-node and > >>> CLI tool communication > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, > >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, > >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API > >>> > >>> Feature flags > >>> > >>> Flag: drop_unroutable_metric, state: enabled > >>> Flag: empty_basic_get_metric, state: enabled > >>> Flag: implicit_default_bindings, state: enabled > >>> Flag: quorum_queue, state: enabled > >>> Flag: virtual_host_metadata, state: 
enabled > >>> > >>> *Logs:* > >>> *(Attached)* > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>> wrote: > >>> > >>>> Hi, > >>>> Please find the nova conductor as well as nova api log. > >>>> > >>>> nova-conuctor: > >>>> > >>>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 16152921c1eb45c2b1f562087140168b > >>>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply > >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due > to a > >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due > to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due > to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > b240e3e89d99489284cd731e75f2a5db > >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with > >>>> backend dogpile.cache.null. 
> >>>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due > to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> wrote: > >>>> > >>>>> Hi, > >>>>> I currently have 3 compute nodes on edge site1 where i am trying to > >>>>> launch vm's. > >>>>> When the VM is in spawning state the node goes down (openstack > compute > >>>>> service list), the node comes backup when i restart the nova compute > >>>>> service but then the launch of the vm fails. > >>>>> > >>>>> nova-compute.log > >>>>> > >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running > >>>>> instance usage > >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to > >>>>> 2023-02-26 08:00:00. 0 instances. > >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node > >>>>> dcn01-hci-0.bdxworld.com > >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: > >>>>> /dev/vda. Libvirt can't honour user-supplied dev names > >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>>> backend dogpile.cache.null. 
> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running > >>>>> privsep helper: > >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > 'privsep-helper', > >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new > privsep > >>>>> daemon via rootwrap > >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon starting > >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with uid/gid: 0/0 > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with capabilities (eff/prm/inh): > >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon running as pid 2647 > >>>>> 2023-02-26 08:49:55.956 7 WARNING > os_brick.initiator.connectors.nvmeof > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process > >>>>> execution error > >>>>> in _get_host_uuid: Unexpected error while running command. > >>>>> Command: blkid overlay -s UUID -o value > >>>>> Exit code: 2 > >>>>> Stdout: '' > >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > >>>>> Unexpected error while running command. > >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>>> > >>>>> Is there a way to solve this issue? > >>>>> > >>>>> > >>>>> With regards, > >>>>> > >>>>> Swogat Pradhan > >>>>> > >>>> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Mar 1 10:34:55 2023 From: smooney at redhat.com (Sean Mooney) Date: Wed, 01 Mar 2023 10:34:55 +0000 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: > Thanks a lot !!! > > As you say, I follow > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > And I want to use DPU mode. Not "disable DPU mode". > So I think I should follow the link above exactlly, so I use > vnic-type=remote_anaged. > In my opnion, after I run first three command (which is "openstack network > create ...", "openstack subnet create", "openstack port create ..."), the > VF rep port and OVN and OVS rules are all ready. not at that point nothign will have been done on ovn/ovs that will only happen after the port is bound to a vm and host. > What I should do in "openstack server create ..." is to JUST add PCI device > into VM, do NOT call neutron-server in nova-compute of compute node ( like > call port_binding or something). this is incorrect. 
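For illustration, one way to see the point about binding is to watch the port's binding fields: they stay empty until the port is attached to a server, and only once nova binds the port to the scheduled host does OVN get the chassis/DPU information it needs to program the flows. The sketch below is illustrative only and reuses the names (pf0vf0, the flavor, the image, the network) from Simon's commands elsewhere in this thread; only standard openstackclient calls are used.

```
# Before the port is attached to a server it has no binding:
openstack port show pf0vf0 -c status -c binding_host_id
# -> status DOWN, binding_host_id empty: nothing is programmed on the DPU yet.

# Attach the pre-created remote-managed port at boot; the binding (and the
# OVN/OVS programming on the DPU side) happens as part of this step, after
# the scheduler has picked a host.
PORT_ID=$(openstack port show pf0vf0 -f value -c id)
openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
    --nic port-id="$PORT_ID" --security-group default provider-instance

# After the instance is scheduled and the port is bound:
openstack port show pf0vf0 -c status -c binding_host_id -c binding_profile
# -> binding_host_id is now set, and the port should go ACTIVE once OVN has
#    plugged the VF representor on the DPU.
```

The same sequence applies whether the port is attached at boot (as above) or later with `openstack server add port`.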
> > But as the log and steps said in the emails above, nova-compute call > port_binding to neutron-server while running the command "openstack server > create ...". > > So I still have questions is: > 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT call > neutron-server in nova-compute of compute node ( like call port_binding or > something)" . no this is not how its designed. until you attach the logical port to a vm (either at runtime or as part of vm create) the logical port is not assocated with any host or phsical dpu/vf. so its not possibel to instanciate the openflow rules in ovs form the logical switch model in the ovn north db as no chassie info has been populated and we do not have the dpu serial info in the port binding details. > 2) If it's right, how to deal with this? Which is how to JUST add PCI > device into VM, do NOT call neutron-server? By command or by configure? Is > there come document ? no this happens automaticaly when nova does the port binding which cannot happen until after teh vm is schduled to a host. > > ---- > Simon Jones > > > Sean Mooney ?2023?3?1??? 16:15??? > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > > BTW, this link ( > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > > said > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? > > > > no its not wrong but for dpu smart nics you have to make a choice when you > > deploy > > either they can be used in dpu mode in which case remote_managed shoudl be > > set to true > > and you can only use them via neutron ports with vnic-type=remote_managed > > as descried in that doc > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > > > > or if you disable dpu mode in the nic frimware then you shoudl remvoe > > remote_managed form the pci device list and > > then it can be used liek a normal vf either for neutron sriov ports > > vnic-type=direct or via flavor based pci passthough. > > > > the issue you were havign is you configured the pci device list to contain > > "remote_managed: ture" which means > > the vf can only be consumed by a neutron port with > > vnic-type=remote_managed, when you have "remote_managed: false" or unset > > you can use it via vnic-type=direct i forgot that slight detail that > > vnic-type=remote_managed is required for "remote_managed: ture". > > > > > > in either case you foudn the correct doc > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > neutorn sriov port configuration is documented here > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > > and nova flavor based pci passthough is documeted here > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > all three server slightly differnt uses. both neutron proceedures are > > exclusivly fo network interfaces. > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > requires the use of ovn deployed on the dpu > > to configure the VF contolplane. > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses > > the sriov nic agent > > to manage the VF with ip tools. > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is > > intended for pci passthough > > of stateless acclerorators like qat devices. 
while the nova flavor approch > > cna be used with nics it not how its generally > > ment to be used and when used to passthough a nic expectation is that its > > not related to a neuton network. > > > > From senrique at redhat.com Wed Mar 1 11:00:49 2023 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 1 Mar 2023 11:00:49 +0000 Subject: [cinder] Bug Report | 01-03-2023 Message-ID: Hello Argonauts, Medium - Volume multiattach exposed to non-admin users via API . - *Status*: Fix proposed to master . - [yadro] tatlin_client is_port_assigned method broken . - *Status*: Unassigned. Low - image_utils: code hardening around decompression . - *Status*: Unassigned and tagged as low-hanging-fruit . Cheers, -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From adivya1.singh at gmail.com Wed Mar 1 11:02:26 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Wed, 1 Mar 2023 16:32:26 +0530 Subject: (OpenStack-Upgrade) In-Reply-To: References: Message-ID: hi Alvaro, i have installed using Openstack-ansible, The upgrade procedure is consistent but what is the roll back procedure , i m looking for Regards Adivya Singh On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > That will depend on how did you installed your environment: OSA, TripleO, > etc. > > Can you provide more information? > > --- > Alvaro Soto. > > Note: My work hours may not be your work hours. Please do not feel the > need to respond during a time that is not convenient for you. > ---------------------------------------------------------- > Great people talk about ideas, > ordinary people talk about things, > small people talk... about other people. > > On Tue, Feb 28, 2023, 11:46 PM Adivya Singh > wrote: > >> Hi Team, >> >> I am planning to upgrade my Current Environment, The Upgrade procedure is >> available in OpenStack Site and Forums. >> >> But i am looking fwd to roll back Plan , Other then have a Local backup >> copy of galera Database >> >> Regards >> Adivya Singh >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lokendrarathour at gmail.com Wed Mar 1 12:00:57 2023 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Wed, 1 Mar 2023 17:30:57 +0530 Subject: Openstack Baremetal Instance creation using Alma Message-ID: Hi Team, Was trying to check whether we can launch a bare-metal Instance using Alma Image. Problem statement: I have some applications that would need Alma 8 OS as the base, we are planning to launch that application on OpenStack baremetal Instance, but we are not able to create OpenStack Baremetal Instance using Alma Image. OpenStack Version: TripleO Wallaby Alam image Qcow2: https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-9.1-20221118.x86_64.qcow2 This image works fine when launched as a VM instance, but not able to launch the same as Baremetal Instance. Best Regards, Lokendra -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Arne.Wiebalck at cern.ch Wed Mar 1 12:25:57 2023 From: Arne.Wiebalck at cern.ch (Arne Wiebalck) Date: Wed, 1 Mar 2023 12:25:57 +0000 Subject: Openstack Baremetal Instance creation using Alma In-Reply-To: References: Message-ID: Lokendra, We ran into issues recently where a missing module in the initrd prevented nodes with specific h/w configurations from booting into 8 and 9 (after successful instantiation). Can you share some more details on what exactly fails, i.e. does the deployment itself fail, or does the instance not boot after a successful deployment, or ... Providing corresponding log snippets (paste.openstack.org) may also help to pinpoint issue. Cheers, Arne ________________________________________ From: Lokendra Rathour Sent: Wednesday, 1 March 2023 13:00 To: openstack-discuss Subject: Openstack Baremetal Instance creation using Alma Hi Team, Was trying to check whether we can launch a bare-metal Instance using Alma Image. Problem statement: I have some applications that would need Alma 8 OS as the base, we are planning to launch that application on OpenStack baremetal Instance, but we are not able to create OpenStack Baremetal Instance using Alma Image. OpenStack Version: TripleO Wallaby Alam image Qcow2: https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-9.1-20221118.x86_64.qcow2 This image works fine when launched as a VM instance, but not able to launch the same as Baremetal Instance. Best Regards, Lokendra -- ~ Lokendra skype: lokendrarathour [https://ci3.googleusercontent.com/mail-sig/AIorK4zyd6LpJOGqagxmzUlY59eMQx0-FN0t8HtjdtGE7VLZSKIxBUz3bI7z-MBqbgDVg1-XbtvHgN_ATJ10N6bonyO-JSGTtl5s_mNSbDoXBg] From noonedeadpunk at gmail.com Wed Mar 1 12:47:01 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Wed, 1 Mar 2023 13:47:01 +0100 Subject: (OpenStack-Upgrade) In-Reply-To: References: Message-ID: Hey, Regarding rollaback of upgrade in OSA we indeed don't have any good established/documented process for that. At the same time it should be completely possible with some "BUT". It also depends on what exactly you want to rollback - roles, openstack services or both. As OSA roles can actually install any openstack service version. We keep all virtualenvs from the previous version, so during upgrade we build just new virtualenvs and reconfigure systemd units to point there. So fastest way likely would be to just edit systemd unit files and point them to old venv version and reload systemd daemon and service and restore DB from backup of course. You can also define _venv_tag (ie `glance_venv_tag`) to the old OSA version you was running and execute openstack-ansible os--install.yml --tags systemd-service,uwsgi - that in most cases will be enough to just edit systemd units for the service and start old version of it. BUT running without tags will result in having new packages in old venv which is smth you totally want to avoid. To prevent that you can also define _git_install_branch and requirements_git_install_branch in /etc/openstack_deploy/group_vars (it's important to use group vars if you want to rollback only one service) and take value from https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml (ofc pick your old version!) 
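For illustration, a minimal sketch of the per-service pinning described above, taking glance as the example service and 26.0.1 as the previous OSA tag (substitute whatever you were actually running; the `os--install.yml` above is presumably the per-service playbook with the service name dropped by the archive, i.e. `os-glance-install.yml`):

```
# Sketch only -- variable names come from the OSA roles discussed above,
# the tag/branch values are placeholders to be taken from the old release.
mkdir -p /etc/openstack_deploy/group_vars
cat > /etc/openstack_deploy/group_vars/glance_all.yml <<'EOF'
# Re-use the venv that was built for the previous OSA release.
glance_venv_tag: "26.0.1"
# Pin the source checkouts to what that release shipped; the values come from
# playbooks/defaults/repo_packages/openstack_services.yml of the old OSA tag.
glance_git_install_branch: "<SHA or branch from the old openstack_services.yml>"
requirements_git_install_branch: "<SHA or branch from the old openstack_services.yml>"
EOF

cd /opt/openstack-ansible/playbooks
# Re-point the systemd units/uwsgi config at the old venv without rebuilding it.
openstack-ansible os-glance-install.yml --tags systemd-service,uwsgi
```

Running the playbook without the tag filter would pull new packages into the old venv, which is exactly the situation warned about above.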
For a full rollback and not in-place workarounds, I think it should be like that * checkout to previous osa version * re-execute scripts/bootstrap-ansible.sh * you should still take current versions of mariadb and rabbitmq and define them in user_variables (galera_major_version, galera_minor_version, rabbitmq_package_version, rabbitmq_erlang_version_spec) - it's close to never ends well downgrading these. * Restore DB backup * Re-run setup-openstack.yml It's quite a rough summary of how I do see this process, but to be frank I never had to execute full downgrade - I was limited mostly by downgrading 1 service tops after the upgrade. Hope that helps! ??, 1 ???. 2023??. ? 12:06, Adivya Singh : > > hi Alvaro, > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > but what is the roll back procedure , i m looking for > > Regards > Adivya Singh > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: >> >> That will depend on how did you installed your environment: OSA, TripleO, etc. >> >> Can you provide more information? >> >> --- >> Alvaro Soto. >> >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. >> ---------------------------------------------------------- >> Great people talk about ideas, >> ordinary people talk about things, >> small people talk... about other people. >> >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: >>> >>> Hi Team, >>> >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. >>> >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database >>> >>> Regards >>> Adivya Singh From jungleboyj at gmail.com Wed Mar 1 13:37:00 2023 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 1 Mar 2023 07:37:00 -0600 Subject: [tc][all] OpenStack Technical Committee new Chair In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> Message-ID: <4eae1d95-f0ff-1d6c-efc8-7a1b9156401c@gmail.com> Gmann, You did a great job as the TC Chair!? Thank you for all of the great leadership you provided! Congratualtions Kristi! Jay On 2/28/2023 3:58 PM, Ghanshyam Mann wrote: > Hello Everyone, > > I would like to inform the community and congratulate/welcome Kristi as the new > Chair of Technical Committee. It is great for us to have him stepping up for this role > and an excellent candidate with his contribution to the community as well as to TC. > > Thanks for having me as a Chair for the past 2 years. I will continue as TC and my > other activities/role in the community. Also thanks for reading my weekly updates > which were lengthy sometimes or maybe many times :) > > -gmann > > From knikolla at bu.edu Wed Mar 1 14:53:12 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Wed, 1 Mar 2023 14:53:12 +0000 Subject: [tc][all] OpenStack Technical Committee new Chair In-Reply-To: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> References: <1869a08d282.e04553df1540988.464486862000665312@ghanshyammann.com> Message-ID: Thank you, and thank you for your amazing work as chair for the past two years! From: Ghanshyam Mann Date: Tuesday, February 28, 2023 at 4:59 PM To: openstack-discuss Subject: [tc][all] OpenStack Technical Committee new Chair Hello Everyone, I would like to inform the community and congratulate/welcome Kristi as the new Chair of Technical Committee. 
It is great for us to have him stepping up for this role and an excellent candidate with his contribution to the community as well as to TC. Thanks for having me as a Chair for the past 2 years. I will continue as TC and my other activities/role in the community. Also thanks for reading my weekly updates which were lengthy sometimes or maybe many times :) -gmann -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Wed Mar 1 15:04:17 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Wed, 1 Mar 2023 09:04:17 -0600 Subject: June 2023 PTG Team Signup Kickoff Message-ID: Hello Everyone, As you may have seen, we are hosting an abbreviated PTG in conjunction with the Vancouver OpenInfra Summit[0]! To sign your team up, you must complete the survey[1] by April 2nd at 7:00 UTC. We NEED accurate contact information for the moderator of your team?s sessions. This is because the survey information will be used to organize the schedule signups which will be done via the PTGBot. If you are not on IRC, please get setup[2] on the OFTC network and join #openinfra-events. You are also encouraged to familiarize yourself with the PTGBot documentation[3] as well. If you have any questions, please reach out! Information about signing up for timeslots will be sent to moderators shortly after the team signup deadline. Registration is open[4] and prices will increase May 5th! Continue to visit openinfra.dev/ptg for updates. -Kendall (diablo_rojo) [0] OpenInfra Summit Site: [1] Team Survey: https://openinfra.dev/summit/vancouver-2023 https://openinfrafoundation.formstack.com/forms/june2023_ptg_survey [2] Setup IRC: https://docs.openstack.org/contributors/common/irc.html [3] PTGBot README: https://opendev.org/openstack/ptgbot/src/branch/master/README.rst [4] OpenInfra Summit Registration: https://vancouver2023.openinfra.dev/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 01:34:55 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 09:34:55 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: You got the point what I want to say ! Let me explain more: 1. The hole story is I want to deploy openstack Yoga, and the compute node use DPU (BF2, BlueFiled2). So I follow this link: https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html 2. After deploy exactly as the link said, I use these command to create a VM (which also called instance). ``` openstack network create selfservice openstack subnet create --network selfservice --subnet-range 192.168.1.0/24 selfservice-v4 openstack port create --network selfservice --vnic-type remote-managed \ --binding-profile '{"pci_vendor_info":"", "pci_slot":"", "physical_network":"", "card_serial_number": "AB0123XX0042", "pf_mac_address": "08:c0:eb:8e:bd:f4", "vf_num":1, "vnic_type": "remote-managed"}' \ pf0vf0 openstack flavor create --id 0 --vcpus 1 --ram 64 --disk 1 cirros-os-dpu-test-1 --property "pci_passthrough:alias"="a1:2" All command above pass. openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ --security-group default provider-instance This command got ERROR. The ERROR is shown in the first email. ``` 3. So I have few questions: ``` question 1: Why got ERROR? Why "No valid host was found"? 
question 2: When I run command "openstack port create ...", I could specify which VF-rep port (virtual function's representor port) plug into br-int in DPU's OVS. As normal operate, I should start VM and plug THE VF into VM. But in "openstack port create ...", how to specify THE VF ? ``` 4. For question 1, I debug it as the first mail said. And I will check the second email to solve it. 5. For question 2, I have no idea, as these is no document to refer this question. What should I do ? ---- Simon Jones Sean Mooney ?2023?3?1??? 01:18??? > On Tue, 2023-02-28 at 19:43 +0800, Simon Jones wrote: > > Hi all, > > > > I'm working on openstack Yoga's PCI passthrough feature, follow this > link: > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > I configure exactly as the link said, but when I create server use this > > command, I found ERROR: > > ``` > > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ > > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ > > --security-group default --key-name mykey provider-instance > > > > > > > fault | {'code': 500, 'created': > > '2023-02-23T06:13:43Z', 'message': 'No valid host was found. There are > not > > enough hosts available.', 'details': 'Traceback (most recent call > last):\n > > File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line > > 1548, in schedule_and_build_instances\n host_lists = > > self._schedule_instances(context, request_specs[0],\n File > > "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 908, in > > _schedule_instances\n host_lists = > > self.query_client.select_destinations(\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/client/query.py", line 41, > > in select_destinations\n return > > self.scheduler_rpcapi.select_destinations(context, spec_obj,\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/rpcapi.py", line 160, in > > select_destinations\n return cctxt.call(ctxt, \'select_destinations\', > > **msg_args)\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 189, > in > > call\n result = self.transport._send(\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, > in > > _send\n return self._driver.send(target, ctxt, message,\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 689, in send\n return self._send(target, ctxt, message, > > wait_for_reply, timeout,\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 681, in _send\n raise > > result\nnova.exception_Remote.NoValidHost_Remote: No valid host was > found. > > There are not enough hosts available.\nTraceback (most recent call > > last):\n\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 241, > in > > inner\n return func(*args, **kwargs)\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 223, in > > select_destinations\n selections = self._select_destinations(\n\n > File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 250, in > > _select_destinations\n selections = self._schedule(\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 416, in > > _schedule\n self._ensure_sufficient_hosts(\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 455, in > > _ensure_sufficient_hosts\n raise > > exception.NoValidHost(reason=reason)\n\nnova.exception.NoValidHost: No > > valid host was found. 
There are not enough hosts available.\n\n'} | > > > > // this is what I configured:NovaInstance > > > > gyw at c1:~$ openstack flavor show cirros-os-dpu-test-1 > > +----------------------------+------------------------------+ > > > Field | Value | > > +----------------------------+------------------------------+ > > > OS-FLV-DISABLED:disabled | False | > > > OS-FLV-EXT-DATA:ephemeral | 0 | > > > access_project_ids | None | > > > description | None | > > > disk | 1 | > > > id | 0 | > > > name | cirros-os-dpu-test-1 | > > > os-flavor-access:is_public | True | > > > properties | pci_passthrough:alias='a1:1' | > > > ram | 64 | > > > rxtx_factor | 1.0 | > > > swap | | > > > vcpus | 1 | > > +----------------------------+------------------------------+ > > > > // in controller node /etc/nova/nova.conf > > > > [filter_scheduler] > > enabled_filters = PciPassthroughFilter > > available_filters = nova.scheduler.filters.all_filters > > > > [pci] > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > > "physical_network": null, "remote_managed": "true"} > > alias = { "vendor_id":"15b3", "product_id":"101e", > "device_type":"type-VF", > > "name":"a1" } > > > > // in compute node /etc/nova/nova.conf > > > > [pci] > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > > "physical_network": null, "remote_managed": "true"} > > alias = { "vendor_id":"15b3", "product_id":"101e", > "device_type":"type-VF", > > "name":"a1" } > > "remote_managed": "true" is only valid for neutron sriov port > not flavor based pci passhtough. > > so you need to use vnci_type=driect asusming you are trying to use > > https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html > > which is not the same as generic pci passthough. > > if you just want to use geneic pci passthive via a flavor remove > "remote_managed": "true" > > > > > ``` > > > > The detail ERROR I found is: > > - The reason why "There are not enough hosts available" is, > > nova-scheduler's log shows "There are 0 hosts available but 1 instances > > requested to build", which means no hosts support PCI passthough feature. 
> > > > This is nova-schduler's log > > ``` > > 2023-02-28 06:11:58.329 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting to schedule > > for instances: ['8ddfbe2c-f929-4b62-8b73-67902df8fb60'] > select_destinations > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:141 > > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] compute_status_filter > > request filter added forbidden trait COMPUTE_STATUS_DISABLED > > compute_status_filter > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:254 > > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'compute_status_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.331 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'accelerators_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.332 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'remote_managed_ports_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.485 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock > > "567eb2f1-7173-4eee-b9e7-66932ed70fea" acquired by > > > "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" > > :: waited 0.000s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > > 2023-02-28 06:11:58.488 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock > > "567eb2f1-7173-4eee-b9e7-66932ed70fea" "released" by > > > "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" > > :: held 0.003s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > > 2023-02-28 06:11:58.494 1942637 DEBUG oslo_db.sqlalchemy.engines > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] MySQL server mode set > > to > > > STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION > > _check_effective_sql_mode > > /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/engines.py:314 > > 2023-02-28 06:11:58.520 1942637 INFO nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Host mapping not > found > > for host c1c2. Not tracking instance info for this host. 
> > 2023-02-28 06:11:58.520 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', > 'c1c2')" > > acquired by > > "nova.scheduler.host_manager.HostState.update.._locked_update" :: > > waited 0.000s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > > 2023-02-28 06:11:58.521 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > from > > compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"arch": > > "x86_64", "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": > > {"cells": 1, "sockets": 1, "cores": 6, "threads": 2}, "features": > > ["sse4.2", "mds-no", "stibp", "pdpe1gb", "xsaveopt", "ht", "intel-pt", > > "mtrr", "abm", "tm", "lm", "umip", "mca", "pku", "ds_cpl", "rdrand", > "adx", > > "rdseed", "lahf_lm", "xgetbv1", "nx", "invpcid", "rdtscp", "tsc", > "xsavec", > > "pcid", "arch-capabilities", "pclmuldq", "spec-ctrl", "fsgsbase", "avx2", > > "md-clear", "vmx", "syscall", "mmx", "ds", "ssse3", "avx", "dtes64", > > "fxsr", "msr", "acpi", "vpclmulqdq", "smap", "erms", "pge", "cmov", > > "sha-ni", "fsrm", "x2apic", "xsaves", "cx8", "pse", "pse36", > "clflushopt", > > "vaes", "pni", "ssbd", "movdiri", "movbe", "clwb", "xtpr", "de", > "invtsc", > > "fpu", "tsc-deadline", "pae", "clflush", "ibrs-all", "waitpkg", "sse", > > "sse2", "bmi1", "3dnowprefetch", "cx16", "popcnt", "rdctl-no", "fma", > > "tsc_adjust", "xsave", "ss", "skip-l1dfl-vmentry", "sse4.1", "rdpid", > > "monitor", "vme", "tm2", "pat", "pschange-mc-no", "movdir64b", "gfni", > > "mce", "smep", "sep", "apic", "arat", "f16c", "bmi2", "aes", "pbe", > "est", > > > "pdcm"]}',created_at=2023-02-14T03:19:40Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=415,free_disk_gb=456,free_ram_mb=31378,host='c1c2',host_ip=192.168.28.21,hypervisor_hostname='c1c2',hypervisor_type='QEMU',hypervisor_version=4002001,id=8,local_gb=456,local_gb_used=0,mapped=0,memory_mb=31890,memory_mb_used=512,metrics='[]',numa_topology='{" > > nova_object.name": "NUMATopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.2", "nova_object.data": {"cells": [{" > > nova_object.name": "NUMACell", "nova_object.namespace": "nova", > > "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": [0, > > 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, > 8, > > 9, 10, 11], "memory": 31890, "cpu_usage": 0, "memory_usage": 0, > > "pinned_cpus": [], "siblings": [[0, 1], [10, 11], [2, 3], [6, 7], [4, 5], > > [8, 9]], "mempages": [{"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 4, "total": 8163962, "used": 0, > "reserved": > > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "reserved", "total"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, > "reserved": > > 0}, 
"nova_object.changes": ["size_kb", "used", "reserved", "total"]}], > > "network_metadata": {"nova_object.name": "NetworkMetadata", > > "nova_object.namespace": "nova", "nova_object.version": "1.0", > > "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["cpuset", "memory_usage", "cpu_usage", "id", > > "pinned_cpus", "pcpuset", "socket", "network_metadata", "siblings", > > "mempages", "memory"]}]}, "nova_object.changes": > > > ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.5,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-02-28T06:01:33Z,uuid=c360cc82-f0fd-4662-bccd-e1f02b27af51,vcpus=12,vcpus_used=0) > > _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:167 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > aggregates: [] _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:170 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > service dict: {'id': 17, 'uuid': '6d0921a6-427d-4a82-a7d2-41dfa003125a', > > 'host': 'c1c2', 'binary': 'nova-compute', 'topic': 'compute', > > 'report_count': 121959, 'disabled': False, 'disabled_reason': None, > > 'last_seen_up': datetime.datetime(2023, 2, 28, 6, 11, 49, > > tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 61, > > 'created_at': datetime.datetime(2023, 2, 14, 3, 19, 40, > > tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 2, > 28, > > 6, 11, 49, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': > > False} _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:173 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > instances: [] _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:176 > > 2023-02-28 06:11:58.525 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', > 'c1c2')" > > "released" by > > "nova.scheduler.host_manager.HostState.update.._locked_update" :: > > held 0.004s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > > 2023-02-28 06:11:58.525 1942637 DEBUG nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting with 1 > host(s) > > get_filtered_objects /usr/lib/python3/dist-packages/nova/filters.py:70 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- before ---- > > _filter_pools 
/usr/lib/python3/dist-packages/nova/pci/stats.py:542 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:543 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- after ---- > > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:545 > > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:546 > > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Not enough PCI > devices > > left to satisfy request _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:556 > > 2023-02-28 06:11:58.527 1942637 DEBUG > > nova.scheduler.filters.pci_passthrough_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] (c1c2, c1c2) ram: > > 31378MB disk: 424960MB io_ops: 0 instances: 0 doesn't have the required > PCI > > devices > > (InstancePCIRequests(instance_uuid=,requests=[InstancePCIRequest])) > > host_passes > > > /usr/lib/python3/dist-packages/nova/scheduler/filters/pci_passthrough_filter.py:52 > > 2023-02-28 06:11:58.528 1942637 INFO nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filter > > PciPassthroughFilter returned 0 hosts > > 2023-02-28 06:11:58.528 1942637 DEBUG nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed all > > hosts for the request with instance ID > > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: > > [('PciPassthroughFilter', None)] get_filtered_objects > > /usr/lib/python3/dist-packages/nova/filters.py:114 > > 2023-02-28 06:11:58.528 1942637 INFO nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed all > > hosts for the request with instance ID > > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: > > ['PciPassthroughFilter: (start: 1, end: 0)'] > > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtered [] > > _get_sorted_hosts > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:610 > > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] There are 0 hosts > > available but 1 instances requested to build. 
_ensure_sufficient_hosts > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:450 > > ``` > > > > Then I searched the database and found that the PCI configuration of the compute node was not uploaded: > > ``` > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE > > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 > > (HTTP 404) > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE > > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 > > (HTTP 404) > > gyw at c1:~$ openstack resource class show PCI_DEVICE > > +-------+------------+ > > > Field | Value | > > +-------+------------+ > > > name | PCI_DEVICE | > > +-------+------------+ > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 MEMORY_MB > > +------------------+-------+ > > > Field | Value | > > +------------------+-------+ > > > allocation_ratio | 1.5 | > > > min_unit | 1 | > > > max_unit | 31890 | > > > reserved | 512 | > > > step_size | 1 | > > > total | 31890 | > > > used | 0 | > > +------------------+-------+ > > Is the 31890 here the value reported by the compute node resource tracker? > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU > > ^Cgyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU > > +------------------+-------+ > > > Field | Value | > > +------------------+-------+ > > > allocation_ratio | 16.0 | > > > min_unit | 1 | > > > max_unit | 12 | > > > reserved | 0 | > > > step_size | 1 | > > > total | 12 | > > > used | 0 | > > +------------------+-------+ > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 SRIOV_NET_VF > > No inventory of class SRIOV_NET_VF for > c360cc82-f0fd-4662-bccd-e1f02b27af51 > > (HTTP 404) > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 DISK_GB > > +------------------+-------+ > > > Field | Value | > > +------------------+-------+ > > > allocation_ratio | 1.0 | > > > min_unit | 1 | > > > max_unit | 456 | > > > reserved | 0 | > > > step_size | 1 | > > > total | 456 | > > > used | 0 | > > +------------------+-------+ > > gyw at c1:~$ openstack resource provider inventory show > > c360cc82-f0fd-4662-bccd-e1f02b27af51 IPV4_ADDRESS > > No inventory of class IPV4_ADDRESS for > c360cc82-f0fd-4662-bccd-e1f02b27af51 > > (HTTP 404) > > > > MariaDB [nova]> select * from compute_nodes; > > >
+---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ > > > created_at | updated_at | deleted_at | id | > > service_id | vcpus | memory_mb | local_gb | vcpus_used | memory_mb_used | > > local_gb_used | hypervisor_type | hypervisor_version | cpu_info > > > > > > > > > > > > > > > > > > > > > > > > > > > > | disk_available_least | free_ram_mb | free_disk_gb | > > current_workload | running_vms | hypervisor_hostname | deleted | host_ip > > | supported_instances > > > > > > > > > > > > > > > > > > | pci_stats > > > > > > > metrics | extra_resources | stats | numa_topology > > > > > > > > > > > > > > > > > > > > > > > > > > > > | host | ram_allocation_ratio | > cpu_allocation_ratio > > > uuid | disk_allocation_ratio | mapped | > > > +---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ > > > 2023-01-04 01:55:44 | 2023-01-04 03:02:28 | 2023-02-13 08:34:08 | 1 | > > NULL | 4 | 3931 | 60 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pat", "cmov", > > "ibrs-all", "pge", "sse4.2", "sse", "mmx", "ibrs", "avx2", "syscall", > > "fpu", "mtrr", "xsaves", "mce", "invpcid", "tsc_adjust", "ssbd", "pku", > > "ibpb", "xsave", "xsaveopt", "pae", "lm", "pdcm", "bmi1", "avx512vnni", > > "stibp", "x2apic", "avx512dq", "pcid", "nx", "bmi2", "erms", > > "3dnowprefetch", "de", "avx512bw", "arch-capabilities", "pni", "fma", > > "rdctl-no", "sse4.1", "rdseed", "arat", "avx512vl", "avx512f", > "pclmuldq", > > "msr", "fxsr", "sse2", "amd-stibp", "hypervisor", "tsx-ctrl", > "clflushopt", > > "cx16", "clwb", "xgetbv1", "xsavec", "adx", "rdtscp", "mds-no", "cx8", > > "aes", "tsc-deadline", "pse36", "fsgsbase", "umip", "spec-ctrl", > "lahf_lm", > > "md-clear", "avx512cd", "amd-ssbd", "vmx", "apic", "f16c", "pse", "tsc", > > "movbe", "smep", "ss", "pschange-mc-no", "ssse3", "popcnt", "avx", "vme", > > "smap", "pdpe1gb", "mca", "skip-l1dfl-vmentry", "abm", "sep", "clflush", > > "rdrand"]} | 49 | 3419 | 60 | > > 0 | 0 | gyw | 1 | 192.168.2.99 | > > [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", > > "hvm"], ["x86_64", "kvm", "hvm"]] > > > > > > > > > > > > > > > > | {"nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > 
"nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, > 2, > > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], > > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 1006396, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "reserved", "size_kb", "total"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "reserved", "size_kb", "total"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "reserved", "size_kb", "total"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": null}, > > "nova_object.changes": ["cpuset", "pinned_cpus", "mempages", > > "network_metadata", "cpu_usage", "pcpuset", "memory", "id", "socket", > > "siblings", "memory_usage"]}]}, "nova_object.changes": ["cells"]} | gyw > > | 1.5 | 16 | > > b1bf35bd-a9ad-4f0c-9033-776a5c6d1c9b | 1 | 1 | > > > 2023-01-04 03:12:17 | 2023-01-31 06:36:36 | 2023-02-23 08:50:29 | 2 | > > NULL | 4 | 3931 | 60 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pclmuldq", > > "fsgsbase", "f16c", "fxsr", "ibpb", "adx", "movbe", "aes", "x2apic", > "abm", > > "mtrr", "arat", "sse4.2", "bmi1", "stibp", "sse4.1", "pae", "vme", "msr", > > "skip-l1dfl-vmentry", "fma", "pcid", "avx2", "de", "ibrs-all", "ssse3", > > "apic", "umip", "xsavec", "3dnowprefetch", "amd-ssbd", "sse", "nx", > "fpu", > > "pse", "smap", "smep", "lahf_lm", "pni", "spec-ctrl", "xsave", "xsaves", > > "rdtscp", "vmx", "avx512f", "cmov", "invpcid", "hypervisor", "erms", > > "rdctl-no", "cx16", "cx8", "tsc", "pge", "pdcm", "rdrand", "avx", > > "amd-stibp", "avx512vl", "xsaveopt", "mds-no", "popcnt", "clflushopt", > > "sse2", "xgetbv1", "rdseed", "pdpe1gb", "pschange-mc-no", "clwb", > > "avx512vnni", "mca", "tsx-ctrl", "tsc_adjust", "syscall", "pse36", "mmx", > > "avx512cd", "avx512bw", "pku", "tsc-deadline", "arch-capabilities", > > "avx512dq", "ssbd", "clflush", "mce", "ss", "pat", "bmi2", "lm", "ibrs", > > "sep", "md-clear"]} | 49 | 3419 | 60 | > > 0 | 0 | c1c1 | 2 | > 192.168.2.99 > > | [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", > > "hvm"], ["x86_64", "kvm", "hvm"]] > > > > > > > > > > > > > > > > | {"nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > 
"nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, > 2, > > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], > > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 1006393, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "total", "size_kb", "reserved"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "total", "size_kb", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "total", "size_kb", "reserved"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, > > "nova_object.changes": ["memory_usage", "socket", "cpuset", "siblings", > > "id", "mempages", "pinned_cpus", "memory", "pcpuset", "network_metadata", > > "cpu_usage"]}]}, "nova_object.changes": ["cells"]} | c1c1 | > > 1.5 | 16 | 1eac1c8d-d96a-4eeb-9868-5a341a80c6df > | > > 1 | 0 | > > > 2023-02-07 08:25:27 | 2023-02-07 08:25:27 | 2023-02-13 08:34:22 | 3 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["sha-ni", > > "intel-pt", "pat", "monitor", "movbe", "nx", "msr", "avx2", "md-clear", > > "popcnt", "rdseed", "pse36", "mds-no", "ds", "sse", "fsrm", "rdctl-no", > > "pse", "dtes64", "ds_cpl", "xgetbv1", "lahf_lm", "smep", "waitpkg", > "smap", > > "fsgsbase", "sep", "tsc_adjust", "cmov", "ibrs-all", "mtrr", "cx16", > > "f16c", "arch-capabilities", "pclmuldq", "clflush", "erms", "umip", > > "xsaves", "xsavec", "ssse3", "acpi", "tsc", "movdir64b", "vpclmulqdq", > > "skip-l1dfl-vmentry", "xsave", "arat", "mmx", "rdpid", "sse2", "ssbd", > > "pdpe1gb", "spec-ctrl", "adx", "pcid", "de", "pku", "est", "pae", > > "tsc-deadline", "pdcm", "clwb", "vme", "rdtscp", "fxsr", "3dnowprefetch", > > "invpcid", "x2apic", "tm", "lm", "fma", "bmi1", "sse4.1", "abm", > > "xsaveopt", "pschange-mc-no", "syscall", "clflushopt", "pbe", "avx", > "cx8", > > "vmx", "gfni", "fpu", "mce", "tm2", "movdiri", "invtsc", "apic", "bmi2", > > "mca", "pge", "rdrand", "xtpr", "sse4.2", "stibp", "ht", "ss", "pni", > > "vaes", "aes"]} | 416 | 31378 | 456 | > > 0 | 0 | c-MS-7D42 | 3 | 192.168.2.99 > | > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", > > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", > > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", > > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", 
"hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["total", > > "reserved", "used", "size_kb"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["total", "reserved", "used", "size_kb"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["total", > > "reserved", "used", "size_kb"]}], "network_metadata": {"nova_object.name > ": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["network_metadata", "cpuset", "mempages", "id", > > "socket", "cpu_usage", "memory", "pinned_cpus", "pcpuset", "siblings", > > "memory_usage"]}]}, "nova_object.changes": ["cells"]} | c-MS-7D42 | > > 1.5 | 16 | > f115a1c2-fda3-42c6-945a-8b54fef40daf > > > 1 | 0 | > > > 2023-02-07 09:53:12 | 2023-02-13 08:38:04 | 2023-02-13 08:39:33 | 4 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdctl-no", > > "acpi", "umip", "invpcid", "bmi1", "clflushopt", "pclmuldq", "movdir64b", > > "ssbd", "apic", "rdpid", "ht", "fsrm", "pni", "pse", "xsaves", "cx16", > > "nx", "f16c", "arat", "popcnt", "mtrr", "vpclmulqdq", "intel-pt", > > "spec-ctrl", "syscall", "3dnowprefetch", "ds", "mce", "bmi2", "tm2", > > "md-clear", "fpu", "monitor", "pae", "erms", "dtes64", "tsc", "fsgsbase", > > "xgetbv1", "est", "mds-no", "tm", "x2apic", "xsavec", "cx8", "stibp", > > "clflush", "ssse3", "pge", "movdiri", "pdpe1gb", "vaes", "gfni", "mmx", > > "clwb", "waitpkg", "xsaveopt", "pse36", "aes", "pschange-mc-no", "sse2", > > "abm", "ss", "pcid", "sep", "rdseed", 
"mca", "skip-l1dfl-vmentry", "pat", > > "smap", "sse", "lahf_lm", "avx", "cmov", "sse4.1", "sse4.2", "ibrs-all", > > "smep", "vme", "tsc_adjust", "arch-capabilities", "fma", "movbe", "adx", > > "avx2", "xtpr", "pku", "pbe", "rdrand", "tsc-deadline", "pdcm", "ds_cpl", > > "de", "invtsc", "xsave", "msr", "fxsr", "lm", "vmx", "sha-ni", > "rdtscp"]} | > > 416 | 31378 | 456 | 0 | > > 0 | c-MS-7D42 | 4 | 192.168.28.21 | [["alpha", > > "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], > > ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], > > ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", > > "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "total", "used", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["size_kb", "total", "used", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "total", "used", "reserved"]}], "network_metadata": {"nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["siblings", "cpuset", "mempages", "socket", > > "pcpuset", "memory", "memory_usage", "id", "network_metadata", > "cpu_usage", > > "pinned_cpus"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | > 10ea8254-ad84-4db9-9acd-5c783cb8600e > > > 1 | 0 | > > > 2023-02-13 08:41:21 | 2023-02-13 08:41:22 | 2023-02-13 09:56:50 | 5 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, 
"sockets": 1, "cores": 6, "threads": 2}, "features": ["bmi2", "ht", > > "pae", "pku", "monitor", "avx2", "sha-ni", "acpi", "ssbd", "syscall", > > "mca", "mmx", "mds-no", "erms", "fsrm", "arat", "xsaves", "movbe", > > "movdir64b", "fpu", "clflush", "nx", "mce", "pse", "cx8", "aes", "avx", > > "xsavec", "invpcid", "est", "xgetbv1", "fxsr", "rdrand", "vaes", "cmov", > > "intel-pt", "smep", "dtes64", "f16c", "adx", "sse2", "stibp", "rdseed", > > "xsave", "skip-l1dfl-vmentry", "sse4.1", "rdpid", "ds", "umip", "pni", > > "rdctl-no", "clwb", "md-clear", "pschange-mc-no", "msr", "popcnt", > > "sse4.2", "pge", "tm2", "pat", "xtpr", "fma", "gfni", "sep", "ibrs-all", > > "tsc", "ds_cpl", "tm", "clflushopt", "pcid", "de", "rdtscp", "vme", > "cx16", > > "lahf_lm", "ss", "pdcm", "x2apic", "pbe", "movdiri", "tsc-deadline", > > "invtsc", "apic", "fsgsbase", "mtrr", "vpclmulqdq", "ssse3", > > "3dnowprefetch", "abm", "xsaveopt", "tsc_adjust", "pse36", "pclmuldq", > > "bmi1", "smap", "arch-capabilities", "lm", "vmx", "sse", "pdpe1gb", > > "spec-ctrl", "waitpkg"]} | 416 | 31378 | > > 456 | 0 | 0 | c-MS-7D42 | 5 | > > 192.168.28.21 | [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], > > ["aarch64", "qemu", "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", > > "hvm"], ["i686", "kvm", "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", > > "hvm"], ["microblaze", "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], > > ["mips", "qemu", "hvm"], ["mipsel", "qemu", "hvm"], ["mips64", "qemu", > > "hvm"], ["mips64el", "qemu", "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", > > "qemu", "hvm"], ["ppc64le", "qemu", "hvm"], ["s390x", "qemu", "hvm"], > > ["sh4", "qemu", "hvm"], ["sh4eb", "qemu", "hvm"], ["sparc", "qemu", > "hvm"], > > ["sparc64", "qemu", "hvm"], ["unicore32", "qemu", "hvm"], ["x86_64", > > "qemu", "hvm"], ["x86_64", "kvm", "hvm"], ["xtensa", "qemu", "hvm"], > > ["xtensaeb", "qemu", "hvm"]] | {"nova_object.name": "PciDevicePoolList", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"objects": []}, "nova_object.changes": ["objects"]} > | > > [] | NULL | {"failed_builds": "0"} | {"nova_object.name > ": > > "NUMATopology", "nova_object.namespace": "nova", "nova_object.version": > > "1.2", "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "size_kb", "total", "reserved"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "size_kb", "total", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "size_kb", "total", "reserved"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", 
"nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["tunneled", "physnets"]}, "socket": 0}, > > "nova_object.changes": ["pinned_cpus", "cpuset", "memory_usage", "id", > > "cpu_usage", "network_metadata", "siblings", "mempages", "socket", > > "memory", "pcpuset"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | > > 8efa100f-ab14-45fd-8c39-644b49772883 | 1 | 0 | > > > 2023-02-13 09:57:30 | 2023-02-13 09:57:31 | 2023-02-13 13:52:57 | 6 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdpid", > > "intel-pt", "fxsr", "pclmuldq", "xsaveopt", "pae", "xsave", "movdiri", > > "syscall", "ibrs-all", "mmx", "tsc_adjust", "abm", "ssbd", "sse", "mce", > > "clwb", "vmx", "dtes64", "ssse3", "fsrm", "est", "bmi1", "mtrr", "avx2", > > "pse36", "pat", "gfni", "mds-no", "clflushopt", "cmov", "fma", "sep", > > "mca", "ss", "umip", "popcnt", "skip-l1dfl-vmentry", "ht", "sha-ni", > > "pdcm", "pdpe1gb", "rdrand", "pge", "lahf_lm", "aes", "xsavec", "pni", > > "smep", "md-clear", "waitpkg", "tm", "xgetbv1", "stibp", "apic", "vaes", > > "fpu", "ds_cpl", "ds", "sse4.2", "3dnowprefetch", "smap", "x2apic", > > "vpclmulqdq", "acpi", "avx", "de", "pbe", "sse2", "xsaves", "monitor", > > "clflush", "tm2", "pschange-mc-no", "bmi2", "movbe", "pku", "pcid", > "xtpr", > > "erms", "movdir64b", "cx8", "nx", "rdctl-no", "invpcid", "spec-ctrl", > > "tsc", "adx", "invtsc", "f16c", "rdtscp", "vme", "pse", "lm", "cx16", > > "fsgsbase", "rdseed", "msr", "sse4.1", "arch-capabilities", "arat", > > "tsc-deadline"]} | 416 | 31378 | 456 | > > 0 | 0 | c-MS-7D42 | 6 | > 192.168.28.21 > > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", > "qemu", > > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", > > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", > > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, 
"total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "total", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["size_kb", "used", "total", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "total", "reserved"]}], "network_metadata": {"nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["memory_usage", "id", "mempages", "pinned_cpus", > > "network_metadata", "pcpuset", "cpuset", "siblings", "socket", > "cpu_usage", > > "memory"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | 8f5b58c5-d5d7-452c-9ec7-cff24baf6c94 | > > 1 | 0 | > > > 2023-02-14 01:35:43 | 2023-02-14 01:35:43 | 2023-02-14 03:16:51 | 7 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["pcid", "pse36", > > "movdir64b", "apic", "nx", "vpclmulqdq", "mtrr", "popcnt", "pdcm", > > "fsgsbase", "lahf_lm", "sse2", "pae", "aes", "movdiri", "xsaves", "erms", > > "invtsc", "waitpkg", "pbe", "ht", "pni", "avx2", "rdpid", "fxsr", "tm2", > > "pku", "x2apic", "fma", "pge", "rdseed", "pdpe1gb", "mmx", "sse4.1", > > "sha-ni", "xtpr", "tsc_adjust", "cx16", "xsave", "cx8", "mce", > "md-clear", > > "gfni", "clwb", "msr", "abm", "f16c", "ss", "xsaveopt", "ds_cpl", "pse", > > "syscall", "cmov", "3dnowprefetch", "ssse3", "pclmuldq", > > "arch-capabilities", "ibrs-all", "arat", "ds", "pat", "invpcid", "vaes", > > "xsavec", "mds-no", "tm", "smep", "acpi", "fsrm", "movbe", "fpu", > "sse4.2", > > "umip", "rdtscp", "tsc-deadline", "skip-l1dfl-vmentry", "est", > "rdctl-no", > > "clflush", "spec-ctrl", "tsc", "lm", "avx", "vmx", "clflushopt", > "rdrand", > > "dtes64", "smap", "ssbd", "sse", "xgetbv1", "stibp", "mca", "adx", "vme", > > "bmi1", "pschange-mc-no", "intel-pt", "de", "monitor", "bmi2", "sep"]} | > > 416 | 31378 | 456 | 0 | > > 0 | c-MS-7D42 | 7 | 192.168.28.21 | [["alpha", > "qemu", > > "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], ["cris", > > "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["lm32", > > "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", "hvm"], > > ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], ["mipsel", > > "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", "qemu", "hvm"], > > ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", "qemu", > > "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], ["sh4eb", > "qemu", > > "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > ["unicore32", > > "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", "kvm", "hvm"], > > ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > 
{"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["reserved", > > "total", "used", "size_kb"]}, {"nova_object.name": "NUMAPag -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 01:41:42 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 09:41:42 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: Sorry, it should be But in "openstack server create ...", how to specify THE VF ? ---- Simon Jones Simon Jones ?2023?3?1??? 09:34??? > You got the point what I want to say ! Let me explain more: > > 1. The hole story is I want to deploy openstack Yoga, and the compute node > use DPU (BF2, BlueFiled2). So I follow this link: > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > 2. After deploy exactly as the link said, I use these command to create a > VM (which also called instance). > ``` > openstack network create selfservice > openstack subnet create --network selfservice --subnet-range > 192.168.1.0/24 selfservice-v4 > openstack port create --network selfservice --vnic-type remote-managed \ > --binding-profile '{"pci_vendor_info":"", "pci_slot":"", > "physical_network":"", "card_serial_number": "AB0123XX0042", > "pf_mac_address": "08:c0:eb:8e:bd:f4", "vf_num":1, "vnic_type": > "remote-managed"}' \ > pf0vf0 > openstack flavor create --id 0 --vcpus 1 --ram 64 --disk 1 > cirros-os-dpu-test-1 --property "pci_passthrough:alias"="a1:2" > > All command above pass. > > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ > --security-group default provider-instance > > This command got ERROR. The ERROR is shown in the first email. > ``` > 3. So I have few questions: > ``` > question 1: Why got ERROR? Why "No valid host was found"? > question 2: When I run command "openstack port create ...", I could > specify which VF-rep port (virtual function's representor port) plug into > br-int in DPU's OVS. As normal operate, I should start VM and plug THE VF > into VM. But in "openstack port create ...", how to specify THE VF ? > ``` > 4. For question 1, I debug it as the first mail said. And I will check the > second email to solve it. > 5. For question 2, I have no idea, as these is no document to refer this > question. What should I do ? > > > ---- > Simon Jones > > > Sean Mooney ?2023?3?1??? 01:18??? 
> >> On Tue, 2023-02-28 at 19:43 +0800, Simon Jones wrote: >> > Hi all, >> > >> > I'm working on openstack Yoga's PCI passthrough feature, follow this >> link: >> > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >> > >> > I configure exactly as the link said, but when I create server use this >> > command, I found ERROR: >> > ``` >> > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ >> > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ >> > --security-group default --key-name mykey provider-instance >> > >> > >> > > fault | {'code': 500, 'created': >> > '2023-02-23T06:13:43Z', 'message': 'No valid host was found. There are >> not >> > enough hosts available.', 'details': 'Traceback (most recent call >> last):\n >> > File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line >> > 1548, in schedule_and_build_instances\n host_lists = >> > self._schedule_instances(context, request_specs[0],\n File >> > "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 908, in >> > _schedule_instances\n host_lists = >> > self.query_client.select_destinations(\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/client/query.py", line >> 41, >> > in select_destinations\n return >> > self.scheduler_rpcapi.select_destinations(context, spec_obj,\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/rpcapi.py", line 160, in >> > select_destinations\n return cctxt.call(ctxt, >> \'select_destinations\', >> > **msg_args)\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line >> 189, in >> > call\n result = self.transport._send(\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, >> in >> > _send\n return self._driver.send(target, ctxt, message,\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 689, in send\n return self._send(target, ctxt, message, >> > wait_for_reply, timeout,\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 681, in _send\n raise >> > result\nnova.exception_Remote.NoValidHost_Remote: No valid host was >> found. >> > There are not enough hosts available.\nTraceback (most recent call >> > last):\n\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line >> 241, in >> > inner\n return func(*args, **kwargs)\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 223, in >> > select_destinations\n selections = self._select_destinations(\n\n >> File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 250, in >> > _select_destinations\n selections = self._schedule(\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 416, in >> > _schedule\n self._ensure_sufficient_hosts(\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 455, in >> > _ensure_sufficient_hosts\n raise >> > exception.NoValidHost(reason=reason)\n\nnova.exception.NoValidHost: No >> > valid host was found. 
There are not enough hosts available.\n\n'} | >> > >> > // this is what I configured: NovaInstance >> > >> > gyw at c1:~$ openstack flavor show cirros-os-dpu-test-1 >> > +----------------------------+------------------------------+ >> > > Field | Value | >> > +----------------------------+------------------------------+ >> > > OS-FLV-DISABLED:disabled | False | >> > > OS-FLV-EXT-DATA:ephemeral | 0 | >> > > access_project_ids | None | >> > > description | None | >> > > disk | 1 | >> > > id | 0 | >> > > name | cirros-os-dpu-test-1 | >> > > os-flavor-access:is_public | True | >> > > properties | pci_passthrough:alias='a1:1' | >> > > ram | 64 | >> > > rxtx_factor | 1.0 | >> > > swap | | >> > > vcpus | 1 | >> > +----------------------------+------------------------------+ >> > >> > // in controller node /etc/nova/nova.conf >> > >> > [filter_scheduler] >> > enabled_filters = PciPassthroughFilter >> > available_filters = nova.scheduler.filters.all_filters >> > >> > [pci] >> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", >> > "physical_network": null, "remote_managed": "true"} >> > alias = { "vendor_id":"15b3", "product_id":"101e", >> "device_type":"type-VF", >> > "name":"a1" } >> > >> > // in compute node /etc/nova/nova.conf >> > >> > [pci] >> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", >> > "physical_network": null, "remote_managed": "true"} >> > alias = { "vendor_id":"15b3", "product_id":"101e", >> "device_type":"type-VF", >> > "name":"a1" } >> >> "remote_managed": "true" is only valid for neutron sriov ports, >> not flavor-based pci passthrough. >> >> so you need to use vnic_type=direct assuming you are trying to use >> >> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html >> >> which is not the same as generic pci passthrough. >> >> if you just want to use generic pci passthrough via a flavor, remove >> "remote_managed": "true" >> >> > >> > ``` >> > >> > The detailed ERROR I found is: >> > - The reason why "There are not enough hosts available" is that >> > nova-scheduler's log shows "There are 0 hosts available but 1 instances >> > requested to build", which means no hosts support the PCI passthrough feature.
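To make the suggestion above concrete, here is a minimal sketch (not taken from the thread itself) of the [pci] section that plain flavor-based passthrough would need on both the controller and the compute node: it is simply the whitelist/alias pair already quoted above with the "remote_managed" tag removed, reusing the same vendor/product IDs.

```
# illustrative sketch only -- the whitelist/alias from this thread with just the
# "remote_managed" tag dropped, so the VFs can satisfy the pci_passthrough:alias
# request carried by the flavor
[pci]
passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null}
alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
```

With that in place on both nodes (and nova-compute restarted so it re-reports its devices), the PciPassthroughFilter should have a non-empty pool to match the "a1" alias against.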
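On the other path (the smartnic/DPU workflow the remote-managed port above was created for), the specific VF is, as far as I can tell, never named on the command line: nova picks a free remote_managed VF from the whitelist when the port is bound, so the instance is booted against the pre-created port rather than a flavor alias. A rough sketch, assuming the pf0vf0 port from earlier in the thread:

```
# hypothetical sketch: boot against the pre-created remote-managed port by name
# instead of --nic net-id=...; the pci_passthrough:alias property carried by
# cirros-os-dpu-test-1 is not needed for this path
openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
  --port pf0vf0 \
  --security-group default provider-instance
```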
>> > >> > This is nova-schduler's log >> > ``` >> > 2023-02-28 06:11:58.329 1942637 DEBUG nova.scheduler.manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting to schedule >> > for instances: ['8ddfbe2c-f929-4b62-8b73-67902df8fb60'] >> select_destinations >> > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:141 >> > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] >> compute_status_filter >> > request filter added forbidden trait COMPUTE_STATUS_DISABLED >> > compute_status_filter >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:254 >> > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'compute_status_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.331 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'accelerators_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.332 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'remote_managed_ports_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.485 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock >> > "567eb2f1-7173-4eee-b9e7-66932ed70fea" acquired by >> > >> "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" >> > :: waited 0.000s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> > 2023-02-28 06:11:58.488 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock >> > "567eb2f1-7173-4eee-b9e7-66932ed70fea" "released" by >> > >> "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" >> > :: held 0.003s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> > 2023-02-28 06:11:58.494 1942637 DEBUG oslo_db.sqlalchemy.engines >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] MySQL server mode >> set >> > to >> > >> STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION >> > _check_effective_sql_mode >> > /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/engines.py:314 >> > 2023-02-28 06:11:58.520 1942637 INFO nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Host mapping not >> found >> > for host c1c2. 
Not tracking instance info for this host. >> > 2023-02-28 06:11:58.520 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', >> 'c1c2')" >> > acquired by >> > "nova.scheduler.host_manager.HostState.update.._locked_update" >> :: >> > waited 0.000s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> > 2023-02-28 06:11:58.521 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> from >> > compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"arch": >> > "x86_64", "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", >> "topology": >> > {"cells": 1, "sockets": 1, "cores": 6, "threads": 2}, "features": >> > ["sse4.2", "mds-no", "stibp", "pdpe1gb", "xsaveopt", "ht", "intel-pt", >> > "mtrr", "abm", "tm", "lm", "umip", "mca", "pku", "ds_cpl", "rdrand", >> "adx", >> > "rdseed", "lahf_lm", "xgetbv1", "nx", "invpcid", "rdtscp", "tsc", >> "xsavec", >> > "pcid", "arch-capabilities", "pclmuldq", "spec-ctrl", "fsgsbase", >> "avx2", >> > "md-clear", "vmx", "syscall", "mmx", "ds", "ssse3", "avx", "dtes64", >> > "fxsr", "msr", "acpi", "vpclmulqdq", "smap", "erms", "pge", "cmov", >> > "sha-ni", "fsrm", "x2apic", "xsaves", "cx8", "pse", "pse36", >> "clflushopt", >> > "vaes", "pni", "ssbd", "movdiri", "movbe", "clwb", "xtpr", "de", >> "invtsc", >> > "fpu", "tsc-deadline", "pae", "clflush", "ibrs-all", "waitpkg", "sse", >> > "sse2", "bmi1", "3dnowprefetch", "cx16", "popcnt", "rdctl-no", "fma", >> > "tsc_adjust", "xsave", "ss", "skip-l1dfl-vmentry", "sse4.1", "rdpid", >> > "monitor", "vme", "tm2", "pat", "pschange-mc-no", "movdir64b", "gfni", >> > "mce", "smep", "sep", "apic", "arat", "f16c", "bmi2", "aes", "pbe", >> "est", >> > >> "pdcm"]}',created_at=2023-02-14T03:19:40Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=415,free_disk_gb=456,free_ram_mb=31378,host='c1c2',host_ip=192.168.28.21,hypervisor_hostname='c1c2',hypervisor_type='QEMU',hypervisor_version=4002001,id=8,local_gb=456,local_gb_used=0,mapped=0,memory_mb=31890,memory_mb_used=512,metrics='[]',numa_topology='{" >> > nova_object.name": "NUMATopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.2", "nova_object.data": {"cells": [{" >> > nova_object.name": "NUMACell", "nova_object.namespace": "nova", >> > "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": >> [0, >> > 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, >> 8, >> > 9, 10, 11], "memory": 31890, "cpu_usage": 0, "memory_usage": 0, >> > "pinned_cpus": [], "siblings": [[0, 1], [10, 11], [2, 3], [6, 7], [4, >> 5], >> > [8, 9]], "mempages": [{"nova_object.name": "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 4, "total": 8163962, "used": 0, >> "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "used", "reserved", "total"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", 
"nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, >> "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}], >> > "network_metadata": {"nova_object.name": "NetworkMetadata", >> > "nova_object.namespace": "nova", "nova_object.version": "1.0", >> > "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["cpuset", "memory_usage", "cpu_usage", "id", >> > "pinned_cpus", "pcpuset", "socket", "network_metadata", "siblings", >> > "mempages", "memory"]}]}, "nova_object.changes": >> > >> ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.5,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-02-28T06:01:33Z,uuid=c360cc82-f0fd-4662-bccd-e1f02b27af51,vcpus=12,vcpus_used=0) >> > _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:167 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > aggregates: [] _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:170 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > service dict: {'id': 17, 'uuid': '6d0921a6-427d-4a82-a7d2-41dfa003125a', >> > 'host': 'c1c2', 'binary': 'nova-compute', 'topic': 'compute', >> > 'report_count': 121959, 'disabled': False, 'disabled_reason': None, >> > 'last_seen_up': datetime.datetime(2023, 2, 28, 6, 11, 49, >> > tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 61, >> > 'created_at': datetime.datetime(2023, 2, 14, 3, 19, 40, >> > tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 2, >> 28, >> > 6, 11, 49, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': >> > False} _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:173 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > instances: [] _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:176 >> > 2023-02-28 06:11:58.525 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', >> 'c1c2')" >> > "released" by >> > "nova.scheduler.host_manager.HostState.update.._locked_update" >> :: >> > held 0.004s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> > 2023-02-28 06:11:58.525 1942637 DEBUG nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting with 1 >> host(s) >> > get_filtered_objects /usr/lib/python3/dist-packages/nova/filters.py:70 >> > 2023-02-28 06:11:58.526 1942637 DEBUG 
nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- before ---- >> > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:542 >> > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:543 >> > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- after ---- >> > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:545 >> > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:546 >> > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Not enough PCI >> devices >> > left to satisfy request _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:556 >> > 2023-02-28 06:11:58.527 1942637 DEBUG >> > nova.scheduler.filters.pci_passthrough_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] (c1c2, c1c2) ram: >> > 31378MB disk: 424960MB io_ops: 0 instances: 0 doesn't have the required >> PCI >> > devices >> > (InstancePCIRequests(instance_uuid=,requests=[InstancePCIRequest])) >> > host_passes >> > >> /usr/lib/python3/dist-packages/nova/scheduler/filters/pci_passthrough_filter.py:52 >> > 2023-02-28 06:11:58.528 1942637 INFO nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filter >> > PciPassthroughFilter returned 0 hosts >> > 2023-02-28 06:11:58.528 1942637 DEBUG nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed >> all >> > hosts for the request with instance ID >> > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: >> > [('PciPassthroughFilter', None)] get_filtered_objects >> > /usr/lib/python3/dist-packages/nova/filters.py:114 >> > 2023-02-28 06:11:58.528 1942637 INFO nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed >> all >> > hosts for the request with instance ID >> > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. 
Filter results: ['PciPassthroughFilter: (start: 1, end: 0)']
>> > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager [req-13b1baee-e02d-40fc-926d-d497e70ca0dc ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtered [] _get_sorted_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:610
>> > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager [req-13b1baee-e02d-40fc-926d-d497e70ca0dc ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] There are 0 hosts available but 1 instances requested to build. _ensure_sufficient_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:450
>> > ```
>> >
>> > Then I searched the database and found that the compute node's PCI configuration was never reported:
>> > ```
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
>> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
>> > gyw at c1:~$ openstack resource class show PCI_DEVICE
>> > +-------+------------+
>> > | Field | Value      |
>> > +-------+------------+
>> > | name  | PCI_DEVICE |
>> > +-------+------------+
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 MEMORY_MB
>> > +------------------+-------+
>> > | Field            | Value |
>> > +------------------+-------+
>> > | allocation_ratio | 1.5   |
>> > | min_unit         | 1     |
>> > | max_unit         | 31890 |
>> > | reserved         | 512   |
>> > | step_size        | 1     |
>> > | total            | 31890 |
>> > | used             | 0     |
>> > +------------------+-------+
>> > (Is this 31890 the value reported by the compute node resource tracker?)
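# Illustrative checks (not from the original session above; values other than
# the compute_node id are placeholders). In Yoga, Nova does not report PCI
# devices to Placement at all, so the PCI_DEVICE 404 above is expected even on
# a working setup: the PciPassthroughFilter works from Nova's own PCI tracking
# (the pci_devices table and the pci_stats column of compute_nodes). A more
# telling check is whether the PCI tracker recorded any devices for this host
# (id=8 in the scheduler log above):
mysql -e "SELECT address, vendor_id, product_id, dev_type, status \
          FROM nova.pci_devices WHERE compute_node_id = 8 AND deleted = 0;"
# If that returns no rows, the [pci] whitelist on the compute node is not
# matching the DPU VFs. Example only (vendor/product IDs are placeholders;
# remote_managed is the tag discussed in this thread):
# [pci]
# passthrough_whitelist = { "vendor_id": "15b3", "product_id": "101e", "remote_managed": "true" }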
>> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU >> > ?^Cgyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU >> > +------------------+-------+ >> > > Field | Value | >> > +------------------+-------+ >> > > allocation_ratio | 16.0 | >> > > min_unit | 1 | >> > > max_unit | 12 | >> > > reserved | 0 | >> > > step_size | 1 | >> > > total | 12 | >> > > used | 0 | >> > +------------------+-------+ >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 SRIOV_NET_VF >> > No inventory of class SRIOV_NET_VF for >> c360cc82-f0fd-4662-bccd-e1f02b27af51 >> > (HTTP 404) >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 DISK_GB >> > +------------------+-------+ >> > > Field | Value | >> > +------------------+-------+ >> > > allocation_ratio | 1.0 | >> > > min_unit | 1 | >> > > max_unit | 456 | >> > > reserved | 0 | >> > > step_size | 1 | >> > > total | 456 | >> > > used | 0 | >> > +------------------+-------+ >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 IPV4_ADDRESS >> > No inventory of class IPV4_ADDRESS for >> c360cc82-f0fd-4662-bccd-e1f02b27af51 >> > (HTTP 404) >> > >> > MariaDB [nova]> select * from compute_nodes; >> > >> +---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ >> > > created_at | updated_at | deleted_at | id | >> > service_id | vcpus | memory_mb | local_gb | vcpus_used | memory_mb_used >> | >> > local_gb_used | hypervisor_type | hypervisor_version | cpu_info >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > | disk_available_least | free_ram_mb | free_disk_gb | >> > current_workload | running_vms | hypervisor_hostname | deleted | host_ip >> > | supported_instances >> > >> > >> > >> > >> > >> > >> > >> > >> > | pci_stats >> > >> > >> > > metrics | extra_resources | stats | numa_topology >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > | host | ram_allocation_ratio | >> cpu_allocation_ratio >> > > uuid | disk_allocation_ratio | mapped >> | >> > >> 
+---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ >> > > 2023-01-04 01:55:44 | 2023-01-04 03:02:28 | 2023-02-13 08:34:08 | 1 | >> > NULL | 4 | 3931 | 60 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pat", "cmov", >> > "ibrs-all", "pge", "sse4.2", "sse", "mmx", "ibrs", "avx2", "syscall", >> > "fpu", "mtrr", "xsaves", "mce", "invpcid", "tsc_adjust", "ssbd", "pku", >> > "ibpb", "xsave", "xsaveopt", "pae", "lm", "pdcm", "bmi1", "avx512vnni", >> > "stibp", "x2apic", "avx512dq", "pcid", "nx", "bmi2", "erms", >> > "3dnowprefetch", "de", "avx512bw", "arch-capabilities", "pni", "fma", >> > "rdctl-no", "sse4.1", "rdseed", "arat", "avx512vl", "avx512f", >> "pclmuldq", >> > "msr", "fxsr", "sse2", "amd-stibp", "hypervisor", "tsx-ctrl", >> "clflushopt", >> > "cx16", "clwb", "xgetbv1", "xsavec", "adx", "rdtscp", "mds-no", "cx8", >> > "aes", "tsc-deadline", "pse36", "fsgsbase", "umip", "spec-ctrl", >> "lahf_lm", >> > "md-clear", "avx512cd", "amd-ssbd", "vmx", "apic", "f16c", "pse", "tsc", >> > "movbe", "smep", "ss", "pschange-mc-no", "ssse3", "popcnt", "avx", >> "vme", >> > "smap", "pdpe1gb", "mca", "skip-l1dfl-vmentry", "abm", "sep", "clflush", >> > "rdrand"]} | 49 | 3419 | 60 | >> > 0 | 0 | gyw | 1 | 192.168.2.99 | >> > [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", >> > "hvm"], ["x86_64", "kvm", "hvm"]] >> > >> > >> > >> > >> > >> > >> > >> > | {"nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, >> 2, >> > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": >> [], >> > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 1006396, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "reserved", "size_kb", "total"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "reserved", "size_kb", "total"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > 
"total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "reserved", "size_kb", "total"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": null}, >> > "nova_object.changes": ["cpuset", "pinned_cpus", "mempages", >> > "network_metadata", "cpu_usage", "pcpuset", "memory", "id", "socket", >> > "siblings", "memory_usage"]}]}, "nova_object.changes": ["cells"]} | gyw >> > | 1.5 | 16 | >> > b1bf35bd-a9ad-4f0c-9033-776a5c6d1c9b | 1 | 1 | >> > > 2023-01-04 03:12:17 | 2023-01-31 06:36:36 | 2023-02-23 08:50:29 | 2 | >> > NULL | 4 | 3931 | 60 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pclmuldq", >> > "fsgsbase", "f16c", "fxsr", "ibpb", "adx", "movbe", "aes", "x2apic", >> "abm", >> > "mtrr", "arat", "sse4.2", "bmi1", "stibp", "sse4.1", "pae", "vme", >> "msr", >> > "skip-l1dfl-vmentry", "fma", "pcid", "avx2", "de", "ibrs-all", "ssse3", >> > "apic", "umip", "xsavec", "3dnowprefetch", "amd-ssbd", "sse", "nx", >> "fpu", >> > "pse", "smap", "smep", "lahf_lm", "pni", "spec-ctrl", "xsave", "xsaves", >> > "rdtscp", "vmx", "avx512f", "cmov", "invpcid", "hypervisor", "erms", >> > "rdctl-no", "cx16", "cx8", "tsc", "pge", "pdcm", "rdrand", "avx", >> > "amd-stibp", "avx512vl", "xsaveopt", "mds-no", "popcnt", "clflushopt", >> > "sse2", "xgetbv1", "rdseed", "pdpe1gb", "pschange-mc-no", "clwb", >> > "avx512vnni", "mca", "tsx-ctrl", "tsc_adjust", "syscall", "pse36", >> "mmx", >> > "avx512cd", "avx512bw", "pku", "tsc-deadline", "arch-capabilities", >> > "avx512dq", "ssbd", "clflush", "mce", "ss", "pat", "bmi2", "lm", "ibrs", >> > "sep", "md-clear"]} | 49 | 3419 | 60 >> | >> > 0 | 0 | c1c1 | 2 | >> 192.168.2.99 >> > | [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", >> > "hvm"], ["x86_64", "kvm", "hvm"]] >> > >> > >> > >> > >> > >> > >> > >> > | {"nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, >> 2, >> > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": >> [], >> > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 1006393, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "total", "size_kb", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "total", "size_kb", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, 
>> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "total", "size_kb", "reserved"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, >> > "nova_object.changes": ["memory_usage", "socket", "cpuset", "siblings", >> > "id", "mempages", "pinned_cpus", "memory", "pcpuset", >> "network_metadata", >> > "cpu_usage"]}]}, "nova_object.changes": ["cells"]} | c1c1 | >> > 1.5 | 16 | >> 1eac1c8d-d96a-4eeb-9868-5a341a80c6df | >> > 1 | 0 | >> > > 2023-02-07 08:25:27 | 2023-02-07 08:25:27 | 2023-02-13 08:34:22 | 3 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["sha-ni", >> > "intel-pt", "pat", "monitor", "movbe", "nx", "msr", "avx2", "md-clear", >> > "popcnt", "rdseed", "pse36", "mds-no", "ds", "sse", "fsrm", "rdctl-no", >> > "pse", "dtes64", "ds_cpl", "xgetbv1", "lahf_lm", "smep", "waitpkg", >> "smap", >> > "fsgsbase", "sep", "tsc_adjust", "cmov", "ibrs-all", "mtrr", "cx16", >> > "f16c", "arch-capabilities", "pclmuldq", "clflush", "erms", "umip", >> > "xsaves", "xsavec", "ssse3", "acpi", "tsc", "movdir64b", "vpclmulqdq", >> > "skip-l1dfl-vmentry", "xsave", "arat", "mmx", "rdpid", "sse2", "ssbd", >> > "pdpe1gb", "spec-ctrl", "adx", "pcid", "de", "pku", "est", "pae", >> > "tsc-deadline", "pdcm", "clwb", "vme", "rdtscp", "fxsr", >> "3dnowprefetch", >> > "invpcid", "x2apic", "tm", "lm", "fma", "bmi1", "sse4.1", "abm", >> > "xsaveopt", "pschange-mc-no", "syscall", "clflushopt", "pbe", "avx", >> "cx8", >> > "vmx", "gfni", "fpu", "mce", "tm2", "movdiri", "invtsc", "apic", "bmi2", >> > "mca", "pge", "rdrand", "xtpr", "sse4.2", "stibp", "ht", "ss", "pni", >> > "vaes", "aes"]} | 416 | 31378 | 456 | >> > 0 | 0 | c-MS-7D42 | 3 | >> 192.168.2.99 | >> > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", >> "qemu", >> > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", >> "kvm", >> > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", >> > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", >> "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, 
"memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["total", >> > "reserved", "used", "size_kb"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["total", "reserved", "used", "size_kb"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["total", >> > "reserved", "used", "size_kb"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["network_metadata", "cpuset", "mempages", "id", >> > "socket", "cpu_usage", "memory", "pinned_cpus", "pcpuset", "siblings", >> > "memory_usage"]}]}, "nova_object.changes": ["cells"]} | c-MS-7D42 | >> > 1.5 | 16 | >> f115a1c2-fda3-42c6-945a-8b54fef40daf >> > > 1 | 0 | >> > > 2023-02-07 09:53:12 | 2023-02-13 08:38:04 | 2023-02-13 08:39:33 | 4 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdctl-no", >> > "acpi", "umip", "invpcid", "bmi1", "clflushopt", "pclmuldq", >> "movdir64b", >> > "ssbd", "apic", "rdpid", "ht", "fsrm", "pni", "pse", "xsaves", "cx16", >> > "nx", "f16c", "arat", "popcnt", "mtrr", "vpclmulqdq", "intel-pt", >> > "spec-ctrl", "syscall", "3dnowprefetch", "ds", "mce", "bmi2", "tm2", >> > "md-clear", "fpu", "monitor", "pae", "erms", "dtes64", "tsc", >> "fsgsbase", >> > "xgetbv1", "est", "mds-no", "tm", "x2apic", "xsavec", "cx8", "stibp", >> > "clflush", "ssse3", "pge", "movdiri", "pdpe1gb", "vaes", "gfni", "mmx", >> > "clwb", "waitpkg", "xsaveopt", "pse36", "aes", "pschange-mc-no", "sse2", >> > "abm", "ss", "pcid", "sep", "rdseed", "mca", "skip-l1dfl-vmentry", >> "pat", >> > "smap", "sse", "lahf_lm", "avx", "cmov", "sse4.1", "sse4.2", "ibrs-all", >> > "smep", "vme", "tsc_adjust", "arch-capabilities", "fma", "movbe", "adx", >> > "avx2", "xtpr", "pku", "pbe", "rdrand", "tsc-deadline", "pdcm", >> "ds_cpl", >> > "de", "invtsc", "xsave", "msr", "fxsr", "lm", "vmx", "sha-ni", >> "rdtscp"]} | >> > 416 | 31378 | 456 | 0 | >> > 0 | c-MS-7D42 | 4 | 192.168.28.21 | [["alpha", >> > "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], >> > ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", >> "hvm"], >> > ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", >> > "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > 
["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", >> > "total", "used", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["size_kb", "total", "used", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "total", "used", "reserved"]}], "network_metadata": {"nova_object.name >> ": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["siblings", "cpuset", "mempages", "socket", >> > "pcpuset", "memory", "memory_usage", "id", "network_metadata", >> "cpu_usage", >> > "pinned_cpus"]}]}, "nova_object.changes": ["cells"]} | c1c2 | >> > 1.5 | 16 | >> 10ea8254-ad84-4db9-9acd-5c783cb8600e >> > > 1 | 0 | >> > > 2023-02-13 08:41:21 | 2023-02-13 08:41:22 | 2023-02-13 09:56:50 | 5 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["bmi2", "ht", >> > "pae", "pku", "monitor", "avx2", "sha-ni", "acpi", "ssbd", "syscall", >> > "mca", "mmx", "mds-no", "erms", "fsrm", "arat", "xsaves", "movbe", >> > "movdir64b", "fpu", "clflush", "nx", "mce", "pse", "cx8", "aes", "avx", >> > "xsavec", "invpcid", "est", "xgetbv1", "fxsr", "rdrand", "vaes", "cmov", >> > "intel-pt", "smep", "dtes64", "f16c", "adx", "sse2", "stibp", "rdseed", >> > "xsave", "skip-l1dfl-vmentry", "sse4.1", "rdpid", "ds", "umip", "pni", >> > "rdctl-no", "clwb", "md-clear", "pschange-mc-no", "msr", "popcnt", >> > "sse4.2", "pge", "tm2", "pat", "xtpr", "fma", "gfni", "sep", "ibrs-all", >> > "tsc", "ds_cpl", "tm", "clflushopt", "pcid", "de", "rdtscp", "vme", >> "cx16", >> > "lahf_lm", "ss", "pdcm", "x2apic", "pbe", "movdiri", "tsc-deadline", >> > "invtsc", "apic", "fsgsbase", "mtrr", "vpclmulqdq", "ssse3", >> > "3dnowprefetch", "abm", "xsaveopt", "tsc_adjust", "pse36", "pclmuldq", >> > 
"bmi1", "smap", "arch-capabilities", "lm", "vmx", "sse", "pdpe1gb", >> > "spec-ctrl", "waitpkg"]} | 416 | 31378 | >> > 456 | 0 | 0 | c-MS-7D42 | 5 | >> > 192.168.28.21 | [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], >> > ["aarch64", "qemu", "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", >> > "hvm"], ["i686", "kvm", "hvm"], ["lm32", "qemu", "hvm"], ["m68k", >> "qemu", >> > "hvm"], ["microblaze", "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], >> > ["mips", "qemu", "hvm"], ["mipsel", "qemu", "hvm"], ["mips64", "qemu", >> > "hvm"], ["mips64el", "qemu", "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", >> > "qemu", "hvm"], ["ppc64le", "qemu", "hvm"], ["s390x", "qemu", "hvm"], >> > ["sh4", "qemu", "hvm"], ["sh4eb", "qemu", "hvm"], ["sparc", "qemu", >> "hvm"], >> > ["sparc64", "qemu", "hvm"], ["unicore32", "qemu", "hvm"], ["x86_64", >> > "qemu", "hvm"], ["x86_64", "kvm", "hvm"], ["xtensa", "qemu", "hvm"], >> > ["xtensaeb", "qemu", "hvm"]] | {"nova_object.name": >> "PciDevicePoolList", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"objects": []}, "nova_object.changes": >> ["objects"]} | >> > [] | NULL | {"failed_builds": "0"} | {"nova_object.name >> ": >> > "NUMATopology", "nova_object.namespace": "nova", "nova_object.version": >> > "1.2", "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "size_kb", "total", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "size_kb", "total", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "size_kb", "total", "reserved"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["tunneled", "physnets"]}, "socket": 0}, >> > "nova_object.changes": ["pinned_cpus", "cpuset", "memory_usage", "id", >> > "cpu_usage", "network_metadata", "siblings", "mempages", "socket", >> > "memory", "pcpuset"]}]}, "nova_object.changes": ["cells"]} | c1c2 | >> > 1.5 | 16 | >> > 8efa100f-ab14-45fd-8c39-644b49772883 | 1 | 0 | >> > > 2023-02-13 09:57:30 | 2023-02-13 09:57:31 | 2023-02-13 13:52:57 | 6 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdpid", >> > "intel-pt", "fxsr", "pclmuldq", "xsaveopt", "pae", "xsave", "movdiri", >> > "syscall", "ibrs-all", "mmx", 
"tsc_adjust", "abm", "ssbd", "sse", "mce", >> > "clwb", "vmx", "dtes64", "ssse3", "fsrm", "est", "bmi1", "mtrr", "avx2", >> > "pse36", "pat", "gfni", "mds-no", "clflushopt", "cmov", "fma", "sep", >> > "mca", "ss", "umip", "popcnt", "skip-l1dfl-vmentry", "ht", "sha-ni", >> > "pdcm", "pdpe1gb", "rdrand", "pge", "lahf_lm", "aes", "xsavec", "pni", >> > "smep", "md-clear", "waitpkg", "tm", "xgetbv1", "stibp", "apic", "vaes", >> > "fpu", "ds_cpl", "ds", "sse4.2", "3dnowprefetch", "smap", "x2apic", >> > "vpclmulqdq", "acpi", "avx", "de", "pbe", "sse2", "xsaves", "monitor", >> > "clflush", "tm2", "pschange-mc-no", "bmi2", "movbe", "pku", "pcid", >> "xtpr", >> > "erms", "movdir64b", "cx8", "nx", "rdctl-no", "invpcid", "spec-ctrl", >> > "tsc", "adx", "invtsc", "f16c", "rdtscp", "vme", "pse", "lm", "cx16", >> > "fsgsbase", "rdseed", "msr", "sse4.1", "arch-capabilities", "arat", >> > "tsc-deadline"]} | 416 | 31378 | 456 | >> > 0 | 0 | c-MS-7D42 | 6 | >> 192.168.28.21 >> > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", >> "qemu", >> > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", >> "kvm", >> > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", >> > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", >> "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", >> > "used", "total", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "total", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "used", "total", "reserved"]}], "network_metadata": {"nova_object.name >> ": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], 
"tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["memory_usage", "id", "mempages", "pinned_cpus", >> > "network_metadata", "pcpuset", "cpuset", "siblings", "socket", >> "cpu_usage", >> > "memory"]}]}, "nova_object.changes": ["cells"]} | c1c2 | >> > 1.5 | 16 | 8f5b58c5-d5d7-452c-9ec7-cff24baf6c94 | >> > 1 | 0 | >> > > 2023-02-14 01:35:43 | 2023-02-14 01:35:43 | 2023-02-14 03:16:51 | 7 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["pcid", >> "pse36", >> > "movdir64b", "apic", "nx", "vpclmulqdq", "mtrr", "popcnt", "pdcm", >> > "fsgsbase", "lahf_lm", "sse2", "pae", "aes", "movdiri", "xsaves", >> "erms", >> > "invtsc", "waitpkg", "pbe", "ht", "pni", "avx2", "rdpid", "fxsr", "tm2", >> > "pku", "x2apic", "fma", "pge", "rdseed", "pdpe1gb", "mmx", "sse4.1", >> > "sha-ni", "xtpr", "tsc_adjust", "cx16", "xsave", "cx8", "mce", >> "md-clear", >> > "gfni", "clwb", "msr", "abm", "f16c", "ss", "xsaveopt", "ds_cpl", "pse", >> > "syscall", "cmov", "3dnowprefetch", "ssse3", "pclmuldq", >> > "arch-capabilities", "ibrs-all", "arat", "ds", "pat", "invpcid", "vaes", >> > "xsavec", "mds-no", "tm", "smep", "acpi", "fsrm", "movbe", "fpu", >> "sse4.2", >> > "umip", "rdtscp", "tsc-deadline", "skip-l1dfl-vmentry", "est", >> "rdctl-no", >> > "clflush", "spec-ctrl", "tsc", "lm", "avx", "vmx", "clflushopt", >> "rdrand", >> > "dtes64", "smap", "ssbd", "sse", "xgetbv1", "stibp", "mca", "adx", >> "vme", >> > "bmi1", "pschange-mc-no", "intel-pt", "de", "monitor", "bmi2", "sep"]} | >> > 416 | 31378 | 456 | 0 | >> > 0 | c-MS-7D42 | 7 | 192.168.28.21 | [["alpha", >> "qemu", >> > "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], ["cris", >> > "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], >> ["lm32", >> > "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", "hvm"], >> > ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], ["mipsel", >> > "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", "qemu", "hvm"], >> > ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", "qemu", >> > "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], ["sh4eb", >> "qemu", >> > "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> ["unicore32", >> > "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", "kvm", "hvm"], >> > ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, 
"used": 0, "reserved": 0}, "nova_object.changes": ["reserved", >> > "total", "used", "size_kb"]}, {"nova_object.name": "NUMAPag > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From uday.dikshit at myrealdata.in Wed Mar 1 05:24:16 2023 From: uday.dikshit at myrealdata.in (Uday Dikshit) Date: Wed, 1 Mar 2023 05:24:16 +0000 Subject: Autoscaling in Kolla Ansible Wallaby Openstack release In-Reply-To: References: Message-ID: I am currently working on a similar approach with Senlin, gnocchi and aodh, but I find gnocchi metrics inconsistent with data points. Hence autoscaling is working fine sometime but then not at all responding in some cases. So I was looking for another approach where we could achieve the same goal with quality in data. Sent from Outlook for Android ________________________________ From: Satish Patel Sent: Wednesday, March 1, 2023 3:05:12 AM To: Dmitriy Rabotyagov Cc: Uday Dikshit ; openstack-discuss at lists.openstack.org Subject: Re: Autoscaling in Kolla Ansible Wallaby Openstack release I did some lab work with senlin and its awesome project. I did deploy with OSA (openstack-ansible) - https://satishdotpatel.github.io/openstack-senlin-autoscaling/ On Tue, Feb 28, 2023 at 2:53?PM Dmitriy Rabotyagov > wrote: Hey, There's an OpenStack project called Senlin [1] that provides auto-scaling of customer environments by leveraging heat templates. I have no idea if kolla does support it's deployment or not though. [1] https://docs.openstack.org/senlin ??, 28 ????. 2023??. ? 18:54, Uday Dikshit >: Hello Team As a public cloud service providers our aim is to provide our customers with autoscaling for instances feature. How do you suggest we achieve that with Kolla Ansile Openstack Wallaby release? Thanks & Regards, [https://acefone.com/email-signature/logo-new.png] [https://acefone.com/email-signature/facebook.png] [https://acefone.com/email-signature/linkedin.png] [https://acefone.com/email-signature/twitter.png] [https://acefone.com/email-signature/youtube.png] [https://acefone.com/email-signature/glassdoor.png] Uday Dikshit Cloud DevOps Engineer, Product Development uday.dikshit at myrealdata.in www.myrealdata.in 809-A Udyog Vihar, Phase 5, Gurugram - 122015, Haryana ________________________________ This email has been scanned for spam and viruses by Proofpoint Essentials. Click here to report this email as spam. -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 06:51:36 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 14:51:36 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: Hi, 1. I try the 2nd method, which remove "remote-managed" tag in /etc/nova/nova.conf, but got ERROR in creating VM in compute node's nova-compute service. Detail log refer to LOG-1 section bellow, I think it's because hypervisor has no neutron-agent as I use DPU, neutron anget?which is ovn-controller? is on DPU. Is right ? 2. So I want to try the 1st method in the email, which is use vnic-type=direct. BUT, HOW TO USE ? IS THERE ANY DOCUMENT ? THANKS. 
LOG-1, which is compute node's nova-compute.log > ``` > 2023-03-01 14:24:02.631 504488 DEBUG oslo_concurrency.processutils > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Running cmd > (subprocess): /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 > --cpu=30 -- env LC_ALL=C LANG=C qemu-img info > /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk > --force-share --output=json execute > /usr/lib/python3/dist-packages/oslo_concurrency/processutils.py:384 > 2023-03-01 14:24:02.654 504488 DEBUG oslo_concurrency.processutils > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] CMD "/usr/bin/python3 > -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C > qemu-img info > /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk > --force-share --output=json" returned: 0 in 0.023s execute > /usr/lib/python3/dist-packages/oslo_concurrency/processutils.py:422 > 2023-03-01 14:24:02.655 504488 DEBUG nova.virt.disk.api > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Cannot resize image > /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk to a > smaller size. can_resize_image > /usr/lib/python3/dist-packages/nova/virt/disk/api.py:172 > 2023-03-01 14:24:02.655 504488 DEBUG nova.objects.instance > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lazy-loading > 'migration_context' on Instance uuid a2603eeb-8db0-489b-ba40-dff1d74be21f > obj_load_attr /usr/lib/python3/dist-packages/nova/objects/instance.py:1099 > 2023-03-01 14:24:02.673 504488 DEBUG nova.virt.libvirt.driver > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] Created local disks _create_image > /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:4768 > 2023-03-01 14:24:02.674 504488 DEBUG nova.virt.libvirt.driver > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] Ensure instance console log exists: > /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/console.log > _ensure_console_log_for_instance > /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:4531 > 2023-03-01 14:24:02.674 504488 DEBUG oslo_concurrency.lockutils > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "vgpu_resources" > acquired by "nova.virt.libvirt.driver.LibvirtDriver._allocate_mdevs" :: > waited 0.000s inner > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > 2023-03-01 14:24:02.675 504488 DEBUG oslo_concurrency.lockutils > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "vgpu_resources" > "released" by "nova.virt.libvirt.driver.LibvirtDriver._allocate_mdevs" :: > held 0.000s inner > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 
- default default] Instance failed network > setup after 1 attempt(s): nova.exception.PortBindingFailed: Binding failed > for port 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs > for more information. > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager Traceback (most > recent call last): > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in > _allocate_network_async > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager nwinfo = > self.network_api.allocate_for_instance( > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1215, in > allocate_for_instance > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > created_port_ids = self._update_ports_for_instance( > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1357, in > _update_ports_for_instance > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager vif.destroy() > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in > __exit__ > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > self.force_reraise() > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in > force_reraise > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager raise > self.value > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1326, in > _update_ports_for_instance > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager updated_port > = self._update_port( > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 584, in > _update_port > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > _ensure_no_port_binding_failure(port) > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 293, in > _ensure_no_port_binding_failure > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager raise > exception.PortBindingFailed(port_id=port['id']) > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > nova.exception.PortBindingFailed: Binding failed for port > 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more > information. > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > nova.exception.PortBindingFailed: Binding failed for port > 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more > information. > 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager > [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 > 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance failed to spawn: > nova.exception.PortBindingFailed: Binding failed for port > 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more > information. 
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] Traceback (most recent call last): > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2743, in > _build_resources > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] yield resources > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2503, in > _build_and_run_instance > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] self.driver.spawn(context, > instance, image_meta, > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4329, in > spawn > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] xml = > self._get_guest_xml(context, instance, network_info, > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7288, in > _get_guest_xml > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] network_info_str = > str(network_info) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/model.py", line 620, in __str__ > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] return self._sync_wrapper(fn, > *args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/model.py", line 603, in > _sync_wrapper > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] self.wait() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/model.py", line 635, in wait > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] self[:] = self._gt.wait() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/eventlet/greenthread.py", line 181, in wait > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] return self._exit_event.wait() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/eventlet/event.py", line 125, in wait > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] result = hub.switch() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > 
a2603eeb-8db0-489b-ba40-dff1d74be21f] return self.greenlet.switch() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/eventlet/greenthread.py", line 221, in main > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] result = function(*args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/utils.py", line 656, in context_wrapper > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] return func(*args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1890, in > _allocate_network_async > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] raise e > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in > _allocate_network_async > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] result = function(*args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/utils.py", line 656, in context_wrapper > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] return func(*args, **kwargs) > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1890, in > _allocate_network_async > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] raise e > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in > _allocate_network_async > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] nwinfo = > self.network_api.allocate_for_instance( > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1215, in > allocate_for_instance > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] created_port_ids = > self._update_ports_for_instance( > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1357, in > _update_ports_for_instance > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] vif.destroy() > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] File > "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in > __exit__ > 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: > a2603eeb-8db0-489b-ba40-dff1d74be21f] self.force_reraise() > 2023-03-01 
14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] File
> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in
> force_reraise
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise self.value
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] File
> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1326, in
> _update_ports_for_instance
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] updated_port = self._update_port(
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] File
> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 584, in
> _update_port
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f]
> _ensure_no_port_binding_failure(port)
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] File
> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 293, in
> _ensure_no_port_binding_failure
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise
> exception.PortBindingFailed(port_id=port['id'])
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] nova.exception.PortBindingFailed:
> Binding failed for port 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check
> neutron logs for more information.
> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f]
> 2023-03-01 14:24:03.349 504488 INFO nova.compute.manager
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Terminating instance
> 2023-03-01 14:24:03.349 504488 DEBUG oslo_concurrency.lockutils
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] Acquired lock
> "refresh_cache-a2603eeb-8db0-489b-ba40-dff1d74be21f" lock
> /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:294
> 2023-03-01 14:24:03.350 504488 DEBUG nova.network.neutron
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Building network info cache for
> instance _get_instance_nw_info
> /usr/lib/python3/dist-packages/nova/network/neutron.py:2014
> 2023-03-01 14:24:03.431 504488 DEBUG nova.network.neutron
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance cache missing network info.
> _get_preexisting_port_ids
> /usr/lib/python3/dist-packages/nova/network/neutron.py:3327
> 2023-03-01 14:24:03.624 504488 DEBUG nova.network.neutron
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Updating instance_info_cache with
> network_info: [] update_instance_cache_with_nw_info
> /usr/lib/python3/dist-packages/nova/network/neutron.py:117
> 2023-03-01 14:24:03.638 504488 DEBUG oslo_concurrency.lockutils
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] Releasing lock
> "refresh_cache-a2603eeb-8db0-489b-ba40-dff1d74be21f" lock
> /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:312
> 2023-03-01 14:24:03.639 504488 DEBUG nova.compute.manager
> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78
> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance:
> a2603eeb-8db0-489b-ba40-dff1d74be21f] Start destroying the instance on the
> hypervisor. _shutdown_instance
> /usr/lib/python3/dist-packages/nova/compute/manager.py:2999
> 2023-03-01 14:24:03.648 504488 DEBUG nova.virt.libvirt.driver [-]
> [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] During wait destroy,
> instance disappeared. _wait_for_destroy
> /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:1483
> 2023-03-01 14:24:03.648 504488 INFO nova.virt.libvirt.driver [-]
> [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance destroyed
> successfully.
> ```
> ---- Simon Jones
> Sean Mooney wrote on Wed, 1 Mar 2023 at 01:18:
> On Tue, 2023-02-28 at 19:43 +0800, Simon Jones wrote:
> > Hi all,
> >
> > I'm working on openstack Yoga's PCI passthrough feature, following this link:
> > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html
> >
> > I configured exactly as the link says, but when I create a server with this
> > command, I get this ERROR:
> > ```
> > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
> > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \
> > --security-group default --key-name mykey provider-instance
> >
> > | fault | {'code': 500, 'created':
> > '2023-02-23T06:13:43Z', 'message': 'No valid host was found.
There are > not > > enough hosts available.', 'details': 'Traceback (most recent call > last):\n > > File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line > > 1548, in schedule_and_build_instances\n host_lists = > > self._schedule_instances(context, request_specs[0],\n File > > "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 908, in > > _schedule_instances\n host_lists = > > self.query_client.select_destinations(\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/client/query.py", line 41, > > in select_destinations\n return > > self.scheduler_rpcapi.select_destinations(context, spec_obj,\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/rpcapi.py", line 160, in > > select_destinations\n return cctxt.call(ctxt, \'select_destinations\', > > **msg_args)\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 189, > in > > call\n result = self.transport._send(\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, > in > > _send\n return self._driver.send(target, ctxt, message,\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 689, in send\n return self._send(target, ctxt, message, > > wait_for_reply, timeout,\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", > > line 681, in _send\n raise > > result\nnova.exception_Remote.NoValidHost_Remote: No valid host was > found. > > There are not enough hosts available.\nTraceback (most recent call > > last):\n\n File > > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line 241, > in > > inner\n return func(*args, **kwargs)\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 223, in > > select_destinations\n selections = self._select_destinations(\n\n > File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 250, in > > _select_destinations\n selections = self._schedule(\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 416, in > > _schedule\n self._ensure_sufficient_hosts(\n\n File > > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 455, in > > _ensure_sufficient_hosts\n raise > > exception.NoValidHost(reason=reason)\n\nnova.exception.NoValidHost: No > > valid host was found. 
There are not enough hosts available.\n\n'} |
> >
> > // this is what I configured: NovaInstance
> >
> > gyw at c1:~$ openstack flavor show cirros-os-dpu-test-1
> > +----------------------------+------------------------------+
> > | Field                      | Value                        |
> > +----------------------------+------------------------------+
> > | OS-FLV-DISABLED:disabled   | False                        |
> > | OS-FLV-EXT-DATA:ephemeral  | 0                            |
> > | access_project_ids         | None                         |
> > | description                | None                         |
> > | disk                       | 1                            |
> > | id                         | 0                            |
> > | name                       | cirros-os-dpu-test-1         |
> > | os-flavor-access:is_public | True                         |
> > | properties                 | pci_passthrough:alias='a1:1' |
> > | ram                        | 64                           |
> > | rxtx_factor                | 1.0                          |
> > | swap                       |                              |
> > | vcpus                      | 1                            |
> > +----------------------------+------------------------------+
> >
> > // in controller node /etc/nova/nova.conf
> >
> > [filter_scheduler]
> > enabled_filters = PciPassthroughFilter
> > available_filters = nova.scheduler.filters.all_filters
> >
> > [pci]
> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
> > alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
> >
> > // in compute node /etc/nova/nova.conf
> >
> > [pci]
> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
> > alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
> "remote_managed": "true" is only valid for neutron SR-IOV ports,
> not flavor-based PCI passthrough.
> so you need to use vnic_type=direct, assuming you are trying to use
> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html
> which is not the same as generic PCI passthrough.
> if you just want to use generic PCI passthrough via a flavor, remove
> "remote_managed": "true"
> >
> > ```
> >
> > The detailed ERROR I found is:
> > - The reason for "There are not enough hosts available" is that
> > nova-scheduler's log shows "There are 0 hosts available but 1 instances
> > requested to build", which means no host supports the PCI passthrough feature.
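To make the two options in Sean's reply concrete, here is a minimal sketch assembled only from values already shown in this thread (vendor_id 15b3, product_id 101e, alias a1, flavor cirros-os-dpu-test-1); the angle-bracket placeholders and the port name dpu-vf-port are hypothetical, and the nova-compute restart is the usual consequence of editing nova.conf rather than something stated in the thread:

```
# Option A: generic, flavor-based PCI passthrough (Sean's "remove remote_managed").
# On the compute node, whitelist the VFs without the "remote_managed" key, keep
# the alias, and restart nova-compute so the devices are reported to the PCI tracker:
[pci]
passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null}
alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }

# The existing flavor (properties pci_passthrough:alias='a1:1') is then used as before:
#   openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
#     --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 provider-instance

# Option B: keep "remote_managed": "true" (the off-path/DPU model in the spec Sean
# links) and request the VF through a neutron port instead of a flavor alias,
# using the vnic_type Sean suggests:
#   openstack port create --network <tenant-net> --vnic-type direct dpu-vf-port
#   openstack server create --flavor <flavor-without-pci-alias> --image cirros \
#     --nic port-id=<uuid-of-dpu-vf-port> provider-instance
```

Either way, the scheduler result quoted above ("0 hosts available but 1 instances requested to build") is consistent with Sean's explanation: with "remote_managed": "true" in the whitelist the VFs are set aside for neutron-managed ports, so the flavor's alias request finds no matching PCI device pool (and the compute_nodes.pci_stats column further down indeed shows an empty PciDevicePoolList).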
> > > > This is nova-schduler's log > > ``` > > 2023-02-28 06:11:58.329 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting to schedule > > for instances: ['8ddfbe2c-f929-4b62-8b73-67902df8fb60'] > select_destinations > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:141 > > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] compute_status_filter > > request filter added forbidden trait COMPUTE_STATUS_DISABLED > > compute_status_filter > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:254 > > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'compute_status_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.331 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'accelerators_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.332 1942637 DEBUG nova.scheduler.request_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter > > 'remote_managed_ports_filter' took 0.0 seconds wrapper > > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 > > 2023-02-28 06:11:58.485 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock > > "567eb2f1-7173-4eee-b9e7-66932ed70fea" acquired by > > > "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" > > :: waited 0.000s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > > 2023-02-28 06:11:58.488 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock > > "567eb2f1-7173-4eee-b9e7-66932ed70fea" "released" by > > > "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" > > :: held 0.003s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > > 2023-02-28 06:11:58.494 1942637 DEBUG oslo_db.sqlalchemy.engines > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] MySQL server mode set > > to > > > STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION > > _check_effective_sql_mode > > /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/engines.py:314 > > 2023-02-28 06:11:58.520 1942637 INFO nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Host mapping not > found > > for host c1c2. Not tracking instance info for this host. 
> > 2023-02-28 06:11:58.520 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', > 'c1c2')" > > acquired by > > "nova.scheduler.host_manager.HostState.update.._locked_update" :: > > waited 0.000s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 > > 2023-02-28 06:11:58.521 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > from > > compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"arch": > > "x86_64", "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": > > {"cells": 1, "sockets": 1, "cores": 6, "threads": 2}, "features": > > ["sse4.2", "mds-no", "stibp", "pdpe1gb", "xsaveopt", "ht", "intel-pt", > > "mtrr", "abm", "tm", "lm", "umip", "mca", "pku", "ds_cpl", "rdrand", > "adx", > > "rdseed", "lahf_lm", "xgetbv1", "nx", "invpcid", "rdtscp", "tsc", > "xsavec", > > "pcid", "arch-capabilities", "pclmuldq", "spec-ctrl", "fsgsbase", "avx2", > > "md-clear", "vmx", "syscall", "mmx", "ds", "ssse3", "avx", "dtes64", > > "fxsr", "msr", "acpi", "vpclmulqdq", "smap", "erms", "pge", "cmov", > > "sha-ni", "fsrm", "x2apic", "xsaves", "cx8", "pse", "pse36", > "clflushopt", > > "vaes", "pni", "ssbd", "movdiri", "movbe", "clwb", "xtpr", "de", > "invtsc", > > "fpu", "tsc-deadline", "pae", "clflush", "ibrs-all", "waitpkg", "sse", > > "sse2", "bmi1", "3dnowprefetch", "cx16", "popcnt", "rdctl-no", "fma", > > "tsc_adjust", "xsave", "ss", "skip-l1dfl-vmentry", "sse4.1", "rdpid", > > "monitor", "vme", "tm2", "pat", "pschange-mc-no", "movdir64b", "gfni", > > "mce", "smep", "sep", "apic", "arat", "f16c", "bmi2", "aes", "pbe", > "est", > > > "pdcm"]}',created_at=2023-02-14T03:19:40Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=415,free_disk_gb=456,free_ram_mb=31378,host='c1c2',host_ip=192.168.28.21,hypervisor_hostname='c1c2',hypervisor_type='QEMU',hypervisor_version=4002001,id=8,local_gb=456,local_gb_used=0,mapped=0,memory_mb=31890,memory_mb_used=512,metrics='[]',numa_topology='{" > > nova_object.name": "NUMATopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.2", "nova_object.data": {"cells": [{" > > nova_object.name": "NUMACell", "nova_object.namespace": "nova", > > "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": [0, > > 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, > 8, > > 9, 10, 11], "memory": 31890, "cpu_usage": 0, "memory_usage": 0, > > "pinned_cpus": [], "siblings": [[0, 1], [10, 11], [2, 3], [6, 7], [4, 5], > > [8, 9]], "mempages": [{"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 4, "total": 8163962, "used": 0, > "reserved": > > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "reserved", "total"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, > "reserved": > > 0}, 
"nova_object.changes": ["size_kb", "used", "reserved", "total"]}], > > "network_metadata": {"nova_object.name": "NetworkMetadata", > > "nova_object.namespace": "nova", "nova_object.version": "1.0", > > "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["cpuset", "memory_usage", "cpu_usage", "id", > > "pinned_cpus", "pcpuset", "socket", "network_metadata", "siblings", > > "mempages", "memory"]}]}, "nova_object.changes": > > > ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.5,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-02-28T06:01:33Z,uuid=c360cc82-f0fd-4662-bccd-e1f02b27af51,vcpus=12,vcpus_used=0) > > _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:167 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > aggregates: [] _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:170 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > service dict: {'id': 17, 'uuid': '6d0921a6-427d-4a82-a7d2-41dfa003125a', > > 'host': 'c1c2', 'binary': 'nova-compute', 'topic': 'compute', > > 'report_count': 121959, 'disabled': False, 'disabled_reason': None, > > 'last_seen_up': datetime.datetime(2023, 2, 28, 6, 11, 49, > > tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 61, > > 'created_at': datetime.datetime(2023, 2, 14, 3, 19, 40, > > tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 2, > 28, > > 6, 11, 49, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': > > False} _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:173 > > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state > with > > instances: [] _locked_update > > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:176 > > 2023-02-28 06:11:58.525 1942637 DEBUG oslo_concurrency.lockutils > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', > 'c1c2')" > > "released" by > > "nova.scheduler.host_manager.HostState.update.._locked_update" :: > > held 0.004s inner > > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 > > 2023-02-28 06:11:58.525 1942637 DEBUG nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting with 1 > host(s) > > get_filtered_objects /usr/lib/python3/dist-packages/nova/filters.py:70 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- before ---- > > _filter_pools 
/usr/lib/python3/dist-packages/nova/pci/stats.py:542 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:543 > > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- after ---- > > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:545 > > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:546 > > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Not enough PCI > devices > > left to satisfy request _filter_pools > > /usr/lib/python3/dist-packages/nova/pci/stats.py:556 > > 2023-02-28 06:11:58.527 1942637 DEBUG > > nova.scheduler.filters.pci_passthrough_filter > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] (c1c2, c1c2) ram: > > 31378MB disk: 424960MB io_ops: 0 instances: 0 doesn't have the required > PCI > > devices > > (InstancePCIRequests(instance_uuid=,requests=[InstancePCIRequest])) > > host_passes > > > /usr/lib/python3/dist-packages/nova/scheduler/filters/pci_passthrough_filter.py:52 > > 2023-02-28 06:11:58.528 1942637 INFO nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filter > > PciPassthroughFilter returned 0 hosts > > 2023-02-28 06:11:58.528 1942637 DEBUG nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed all > > hosts for the request with instance ID > > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: > > [('PciPassthroughFilter', None)] get_filtered_objects > > /usr/lib/python3/dist-packages/nova/filters.py:114 > > 2023-02-28 06:11:58.528 1942637 INFO nova.filters > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed all > > hosts for the request with instance ID > > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: > > ['PciPassthroughFilter: (start: 1, end: 0)'] > > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtered [] > > _get_sorted_hosts > > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:610 > > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager > > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc > ff627ad39ed94479b9c5033bc462cf78 > > 512866f9994f4ad8916d8539a7cdeec9 - default default] There are 0 hosts > > available but 1 instances requested to build. 
_ensure_sufficient_hosts
> > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:450
> > ```
> >
> > Then I searched the database, and found that the compute node's PCI configuration was not uploaded:
> > ```
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
> > gyw at c1:~$ openstack resource class show PCI_DEVICE
> > +-------+------------+
> > | Field | Value      |
> > +-------+------------+
> > | name  | PCI_DEVICE |
> > +-------+------------+
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 MEMORY_MB
> > +------------------+-------+
> > | Field            | Value |
> > +------------------+-------+
> > | allocation_ratio | 1.5   |
> > | min_unit         | 1     |
> > | max_unit         | 31890 |
> > | reserved         | 512   |
> > | step_size        | 1     |
> > | total            | 31890 |
> > | used             | 0     |
> > +------------------+-------+
> > (is the 31890 above the value reported by the compute node resource tracker?)
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU
> > ^C
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU
> > +------------------+-------+
> > | Field            | Value |
> > +------------------+-------+
> > | allocation_ratio | 16.0  |
> > | min_unit         | 1     |
> > | max_unit         | 12    |
> > | reserved         | 0     |
> > | step_size        | 1     |
> > | total            | 12    |
> > | used             | 0     |
> > +------------------+-------+
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 SRIOV_NET_VF
> > No inventory of class SRIOV_NET_VF for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 DISK_GB
> > +------------------+-------+
> > | Field            | Value |
> > +------------------+-------+
> > | allocation_ratio | 1.0   |
> > | min_unit         | 1     |
> > | max_unit         | 456   |
> > | reserved         | 0     |
> > | step_size        | 1     |
> > | total            | 456   |
> > | used             | 0     |
> > +------------------+-------+
> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 IPV4_ADDRESS
> > No inventory of class IPV4_ADDRESS for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
> >
> > MariaDB [nova]> select * from compute_nodes;
> >
>
+---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ > > > created_at | updated_at | deleted_at | id | > > service_id | vcpus | memory_mb | local_gb | vcpus_used | memory_mb_used | > > local_gb_used | hypervisor_type | hypervisor_version | cpu_info > > > > > > > > > > > > > > > > > > > > > > > > > > > > | disk_available_least | free_ram_mb | free_disk_gb | > > current_workload | running_vms | hypervisor_hostname | deleted | host_ip > > | supported_instances > > > > > > > > > > > > > > > > > > | pci_stats > > > > > > > metrics | extra_resources | stats | numa_topology > > > > > > > > > > > > > > > > > > > > > > > > > > > > | host | ram_allocation_ratio | > cpu_allocation_ratio > > > uuid | disk_allocation_ratio | mapped | > > > +---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ > > > 2023-01-04 01:55:44 | 2023-01-04 03:02:28 | 2023-02-13 08:34:08 | 1 | > > NULL | 4 | 3931 | 60 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pat", "cmov", > > "ibrs-all", "pge", "sse4.2", "sse", "mmx", "ibrs", "avx2", "syscall", > > "fpu", "mtrr", "xsaves", "mce", "invpcid", "tsc_adjust", "ssbd", "pku", > > "ibpb", "xsave", "xsaveopt", "pae", "lm", "pdcm", "bmi1", "avx512vnni", > > "stibp", "x2apic", "avx512dq", "pcid", "nx", "bmi2", "erms", > > "3dnowprefetch", "de", "avx512bw", "arch-capabilities", "pni", "fma", > > "rdctl-no", "sse4.1", "rdseed", "arat", "avx512vl", "avx512f", > "pclmuldq", > > "msr", "fxsr", "sse2", "amd-stibp", "hypervisor", "tsx-ctrl", > "clflushopt", > > "cx16", "clwb", "xgetbv1", "xsavec", "adx", "rdtscp", "mds-no", "cx8", > > "aes", "tsc-deadline", "pse36", "fsgsbase", "umip", "spec-ctrl", > "lahf_lm", > > "md-clear", "avx512cd", "amd-ssbd", "vmx", "apic", "f16c", "pse", "tsc", > > "movbe", "smep", "ss", "pschange-mc-no", "ssse3", "popcnt", "avx", "vme", > > "smap", "pdpe1gb", "mca", "skip-l1dfl-vmentry", "abm", "sep", "clflush", > > "rdrand"]} | 49 | 3419 | 60 | > > 0 | 0 | gyw | 1 | 192.168.2.99 | > > [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", > > "hvm"], ["x86_64", "kvm", "hvm"]] > > > > > > > > > > > > > > > > | {"nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > 
"nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, > 2, > > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], > > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 1006396, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "reserved", "size_kb", "total"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "reserved", "size_kb", "total"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "reserved", "size_kb", "total"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": null}, > > "nova_object.changes": ["cpuset", "pinned_cpus", "mempages", > > "network_metadata", "cpu_usage", "pcpuset", "memory", "id", "socket", > > "siblings", "memory_usage"]}]}, "nova_object.changes": ["cells"]} | gyw > > | 1.5 | 16 | > > b1bf35bd-a9ad-4f0c-9033-776a5c6d1c9b | 1 | 1 | > > > 2023-01-04 03:12:17 | 2023-01-31 06:36:36 | 2023-02-23 08:50:29 | 2 | > > NULL | 4 | 3931 | 60 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pclmuldq", > > "fsgsbase", "f16c", "fxsr", "ibpb", "adx", "movbe", "aes", "x2apic", > "abm", > > "mtrr", "arat", "sse4.2", "bmi1", "stibp", "sse4.1", "pae", "vme", "msr", > > "skip-l1dfl-vmentry", "fma", "pcid", "avx2", "de", "ibrs-all", "ssse3", > > "apic", "umip", "xsavec", "3dnowprefetch", "amd-ssbd", "sse", "nx", > "fpu", > > "pse", "smap", "smep", "lahf_lm", "pni", "spec-ctrl", "xsave", "xsaves", > > "rdtscp", "vmx", "avx512f", "cmov", "invpcid", "hypervisor", "erms", > > "rdctl-no", "cx16", "cx8", "tsc", "pge", "pdcm", "rdrand", "avx", > > "amd-stibp", "avx512vl", "xsaveopt", "mds-no", "popcnt", "clflushopt", > > "sse2", "xgetbv1", "rdseed", "pdpe1gb", "pschange-mc-no", "clwb", > > "avx512vnni", "mca", "tsx-ctrl", "tsc_adjust", "syscall", "pse36", "mmx", > > "avx512cd", "avx512bw", "pku", "tsc-deadline", "arch-capabilities", > > "avx512dq", "ssbd", "clflush", "mce", "ss", "pat", "bmi2", "lm", "ibrs", > > "sep", "md-clear"]} | 49 | 3419 | 60 | > > 0 | 0 | c1c1 | 2 | > 192.168.2.99 > > | [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", > > "hvm"], ["x86_64", "kvm", "hvm"]] > > > > > > > > > > > > > > > > | {"nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > 
"nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, > 2, > > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], > > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 1006393, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "total", "size_kb", "reserved"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "total", "size_kb", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "total", "size_kb", "reserved"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, > > "nova_object.changes": ["memory_usage", "socket", "cpuset", "siblings", > > "id", "mempages", "pinned_cpus", "memory", "pcpuset", "network_metadata", > > "cpu_usage"]}]}, "nova_object.changes": ["cells"]} | c1c1 | > > 1.5 | 16 | 1eac1c8d-d96a-4eeb-9868-5a341a80c6df > | > > 1 | 0 | > > > 2023-02-07 08:25:27 | 2023-02-07 08:25:27 | 2023-02-13 08:34:22 | 3 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["sha-ni", > > "intel-pt", "pat", "monitor", "movbe", "nx", "msr", "avx2", "md-clear", > > "popcnt", "rdseed", "pse36", "mds-no", "ds", "sse", "fsrm", "rdctl-no", > > "pse", "dtes64", "ds_cpl", "xgetbv1", "lahf_lm", "smep", "waitpkg", > "smap", > > "fsgsbase", "sep", "tsc_adjust", "cmov", "ibrs-all", "mtrr", "cx16", > > "f16c", "arch-capabilities", "pclmuldq", "clflush", "erms", "umip", > > "xsaves", "xsavec", "ssse3", "acpi", "tsc", "movdir64b", "vpclmulqdq", > > "skip-l1dfl-vmentry", "xsave", "arat", "mmx", "rdpid", "sse2", "ssbd", > > "pdpe1gb", "spec-ctrl", "adx", "pcid", "de", "pku", "est", "pae", > > "tsc-deadline", "pdcm", "clwb", "vme", "rdtscp", "fxsr", "3dnowprefetch", > > "invpcid", "x2apic", "tm", "lm", "fma", "bmi1", "sse4.1", "abm", > > "xsaveopt", "pschange-mc-no", "syscall", "clflushopt", "pbe", "avx", > "cx8", > > "vmx", "gfni", "fpu", "mce", "tm2", "movdiri", "invtsc", "apic", "bmi2", > > "mca", "pge", "rdrand", "xtpr", "sse4.2", "stibp", "ht", "ss", "pni", > > "vaes", "aes"]} | 416 | 31378 | 456 | > > 0 | 0 | c-MS-7D42 | 3 | 192.168.2.99 > | > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", > > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", > > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", > > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", 
"hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["total", > > "reserved", "used", "size_kb"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["total", "reserved", "used", "size_kb"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["total", > > "reserved", "used", "size_kb"]}], "network_metadata": {"nova_object.name > ": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["network_metadata", "cpuset", "mempages", "id", > > "socket", "cpu_usage", "memory", "pinned_cpus", "pcpuset", "siblings", > > "memory_usage"]}]}, "nova_object.changes": ["cells"]} | c-MS-7D42 | > > 1.5 | 16 | > f115a1c2-fda3-42c6-945a-8b54fef40daf > > > 1 | 0 | > > > 2023-02-07 09:53:12 | 2023-02-13 08:38:04 | 2023-02-13 08:39:33 | 4 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdctl-no", > > "acpi", "umip", "invpcid", "bmi1", "clflushopt", "pclmuldq", "movdir64b", > > "ssbd", "apic", "rdpid", "ht", "fsrm", "pni", "pse", "xsaves", "cx16", > > "nx", "f16c", "arat", "popcnt", "mtrr", "vpclmulqdq", "intel-pt", > > "spec-ctrl", "syscall", "3dnowprefetch", "ds", "mce", "bmi2", "tm2", > > "md-clear", "fpu", "monitor", "pae", "erms", "dtes64", "tsc", "fsgsbase", > > "xgetbv1", "est", "mds-no", "tm", "x2apic", "xsavec", "cx8", "stibp", > > "clflush", "ssse3", "pge", "movdiri", "pdpe1gb", "vaes", "gfni", "mmx", > > "clwb", "waitpkg", "xsaveopt", "pse36", "aes", "pschange-mc-no", "sse2", > > "abm", "ss", "pcid", "sep", "rdseed", 
"mca", "skip-l1dfl-vmentry", "pat", > > "smap", "sse", "lahf_lm", "avx", "cmov", "sse4.1", "sse4.2", "ibrs-all", > > "smep", "vme", "tsc_adjust", "arch-capabilities", "fma", "movbe", "adx", > > "avx2", "xtpr", "pku", "pbe", "rdrand", "tsc-deadline", "pdcm", "ds_cpl", > > "de", "invtsc", "xsave", "msr", "fxsr", "lm", "vmx", "sha-ni", > "rdtscp"]} | > > 416 | 31378 | 456 | 0 | > > 0 | c-MS-7D42 | 4 | 192.168.28.21 | [["alpha", > > "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], > > ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], > > ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", > > "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "total", "used", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["size_kb", "total", "used", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "total", "used", "reserved"]}], "network_metadata": {"nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["siblings", "cpuset", "mempages", "socket", > > "pcpuset", "memory", "memory_usage", "id", "network_metadata", > "cpu_usage", > > "pinned_cpus"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | > 10ea8254-ad84-4db9-9acd-5c783cb8600e > > > 1 | 0 | > > > 2023-02-13 08:41:21 | 2023-02-13 08:41:22 | 2023-02-13 09:56:50 | 5 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, 
"sockets": 1, "cores": 6, "threads": 2}, "features": ["bmi2", "ht", > > "pae", "pku", "monitor", "avx2", "sha-ni", "acpi", "ssbd", "syscall", > > "mca", "mmx", "mds-no", "erms", "fsrm", "arat", "xsaves", "movbe", > > "movdir64b", "fpu", "clflush", "nx", "mce", "pse", "cx8", "aes", "avx", > > "xsavec", "invpcid", "est", "xgetbv1", "fxsr", "rdrand", "vaes", "cmov", > > "intel-pt", "smep", "dtes64", "f16c", "adx", "sse2", "stibp", "rdseed", > > "xsave", "skip-l1dfl-vmentry", "sse4.1", "rdpid", "ds", "umip", "pni", > > "rdctl-no", "clwb", "md-clear", "pschange-mc-no", "msr", "popcnt", > > "sse4.2", "pge", "tm2", "pat", "xtpr", "fma", "gfni", "sep", "ibrs-all", > > "tsc", "ds_cpl", "tm", "clflushopt", "pcid", "de", "rdtscp", "vme", > "cx16", > > "lahf_lm", "ss", "pdcm", "x2apic", "pbe", "movdiri", "tsc-deadline", > > "invtsc", "apic", "fsgsbase", "mtrr", "vpclmulqdq", "ssse3", > > "3dnowprefetch", "abm", "xsaveopt", "tsc_adjust", "pse36", "pclmuldq", > > "bmi1", "smap", "arch-capabilities", "lm", "vmx", "sse", "pdpe1gb", > > "spec-ctrl", "waitpkg"]} | 416 | 31378 | > > 456 | 0 | 0 | c-MS-7D42 | 5 | > > 192.168.28.21 | [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], > > ["aarch64", "qemu", "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", > > "hvm"], ["i686", "kvm", "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", > > "hvm"], ["microblaze", "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], > > ["mips", "qemu", "hvm"], ["mipsel", "qemu", "hvm"], ["mips64", "qemu", > > "hvm"], ["mips64el", "qemu", "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", > > "qemu", "hvm"], ["ppc64le", "qemu", "hvm"], ["s390x", "qemu", "hvm"], > > ["sh4", "qemu", "hvm"], ["sh4eb", "qemu", "hvm"], ["sparc", "qemu", > "hvm"], > > ["sparc64", "qemu", "hvm"], ["unicore32", "qemu", "hvm"], ["x86_64", > > "qemu", "hvm"], ["x86_64", "kvm", "hvm"], ["xtensa", "qemu", "hvm"], > > ["xtensaeb", "qemu", "hvm"]] | {"nova_object.name": "PciDevicePoolList", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"objects": []}, "nova_object.changes": ["objects"]} > | > > [] | NULL | {"failed_builds": "0"} | {"nova_object.name > ": > > "NUMATopology", "nova_object.namespace": "nova", "nova_object.version": > > "1.2", "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "size_kb", "total", "reserved"]}, {"nova_object.name": > "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["used", "size_kb", "total", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", > > "size_kb", "total", "reserved"]}], "network_metadata": {" > nova_object.name": > > "NetworkMetadata", 
"nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["tunneled", "physnets"]}, "socket": 0}, > > "nova_object.changes": ["pinned_cpus", "cpuset", "memory_usage", "id", > > "cpu_usage", "network_metadata", "siblings", "mempages", "socket", > > "memory", "pcpuset"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | > > 8efa100f-ab14-45fd-8c39-644b49772883 | 1 | 0 | > > > 2023-02-13 09:57:30 | 2023-02-13 09:57:31 | 2023-02-13 13:52:57 | 6 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdpid", > > "intel-pt", "fxsr", "pclmuldq", "xsaveopt", "pae", "xsave", "movdiri", > > "syscall", "ibrs-all", "mmx", "tsc_adjust", "abm", "ssbd", "sse", "mce", > > "clwb", "vmx", "dtes64", "ssse3", "fsrm", "est", "bmi1", "mtrr", "avx2", > > "pse36", "pat", "gfni", "mds-no", "clflushopt", "cmov", "fma", "sep", > > "mca", "ss", "umip", "popcnt", "skip-l1dfl-vmentry", "ht", "sha-ni", > > "pdcm", "pdpe1gb", "rdrand", "pge", "lahf_lm", "aes", "xsavec", "pni", > > "smep", "md-clear", "waitpkg", "tm", "xgetbv1", "stibp", "apic", "vaes", > > "fpu", "ds_cpl", "ds", "sse4.2", "3dnowprefetch", "smap", "x2apic", > > "vpclmulqdq", "acpi", "avx", "de", "pbe", "sse2", "xsaves", "monitor", > > "clflush", "tm2", "pschange-mc-no", "bmi2", "movbe", "pku", "pcid", > "xtpr", > > "erms", "movdir64b", "cx8", "nx", "rdctl-no", "invpcid", "spec-ctrl", > > "tsc", "adx", "invtsc", "f16c", "rdtscp", "vme", "pse", "lm", "cx16", > > "fsgsbase", "rdseed", "msr", "sse4.1", "arch-capabilities", "arat", > > "tsc-deadline"]} | 416 | 31378 | 456 | > > 0 | 0 | c-MS-7D42 | 6 | > 192.168.28.21 > > > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", > "qemu", > > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", > > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", > > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], > > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", > "qemu", > > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", > > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], > ["sh4eb", > > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", > "kvm", > > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > > nova_object.name": "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, 
"total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "total", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.1", > > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": > > 0}, "nova_object.changes": ["size_kb", "used", "total", "reserved"]}, {" > > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, > > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", > > "used", "total", "reserved"]}], "network_metadata": {"nova_object.name": > > "NetworkMetadata", "nova_object.namespace": "nova", > "nova_object.version": > > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, > > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, > > "nova_object.changes": ["memory_usage", "id", "mempages", "pinned_cpus", > > "network_metadata", "pcpuset", "cpuset", "siblings", "socket", > "cpu_usage", > > "memory"]}]}, "nova_object.changes": ["cells"]} | c1c2 | > > 1.5 | 16 | 8f5b58c5-d5d7-452c-9ec7-cff24baf6c94 | > > 1 | 0 | > > > 2023-02-14 01:35:43 | 2023-02-14 01:35:43 | 2023-02-14 03:16:51 | 7 | > > NULL | 12 | 31890 | 456 | 0 | 512 | > > 0 | QEMU | 4002001 | {"arch": "x86_64", > > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": > > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["pcid", "pse36", > > "movdir64b", "apic", "nx", "vpclmulqdq", "mtrr", "popcnt", "pdcm", > > "fsgsbase", "lahf_lm", "sse2", "pae", "aes", "movdiri", "xsaves", "erms", > > "invtsc", "waitpkg", "pbe", "ht", "pni", "avx2", "rdpid", "fxsr", "tm2", > > "pku", "x2apic", "fma", "pge", "rdseed", "pdpe1gb", "mmx", "sse4.1", > > "sha-ni", "xtpr", "tsc_adjust", "cx16", "xsave", "cx8", "mce", > "md-clear", > > "gfni", "clwb", "msr", "abm", "f16c", "ss", "xsaveopt", "ds_cpl", "pse", > > "syscall", "cmov", "3dnowprefetch", "ssse3", "pclmuldq", > > "arch-capabilities", "ibrs-all", "arat", "ds", "pat", "invpcid", "vaes", > > "xsavec", "mds-no", "tm", "smep", "acpi", "fsrm", "movbe", "fpu", > "sse4.2", > > "umip", "rdtscp", "tsc-deadline", "skip-l1dfl-vmentry", "est", > "rdctl-no", > > "clflush", "spec-ctrl", "tsc", "lm", "avx", "vmx", "clflushopt", > "rdrand", > > "dtes64", "smap", "ssbd", "sse", "xgetbv1", "stibp", "mca", "adx", "vme", > > "bmi1", "pschange-mc-no", "intel-pt", "de", "monitor", "bmi2", "sep"]} | > > 416 | 31378 | 456 | 0 | > > 0 | c-MS-7D42 | 7 | 192.168.28.21 | [["alpha", > "qemu", > > "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], ["cris", > > "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["lm32", > > "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", "hvm"], > > ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], ["mipsel", > > "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", "qemu", "hvm"], > > ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", "qemu", > > "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], ["sh4eb", > "qemu", > > "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], > ["unicore32", > > "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", "kvm", "hvm"], > > ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" > nova_object.name": > > "PciDevicePoolList", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, > > "nova_object.changes": ["objects"]} | [] | NULL | > > 
{"failed_builds": "0"} | {"nova_object.name": "NUMATopology", > > "nova_object.namespace": "nova", "nova_object.version": "1.2", > > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", > > "nova_object.namespace": "nova", "nova_object.version": "1.5", > > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, > > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, > > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, > 1], > > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" > nova_object.name": > > "NUMAPagesTopology", "nova_object.namespace": "nova", > > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": > > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["reserved", > > "total", "used", "size_kb"]}, {"nova_object.name": "NUMAPag -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 07:20:51 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 15:20:51 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: BTW, this link ( https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) said I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? ---- Simon Jones Simon Jones ?2023?3?1??? 14:51??? > Hi, > > 1. I try the 2nd method, which remove "remote-managed" tag in > /etc/nova/nova.conf, but got ERROR in creating VM in compute node's > nova-compute service. Detail log refer to LOG-1 section bellow, I think > it's because hypervisor has no neutron-agent as I use DPU, neutron > anget?which is ovn-controller? is on DPU. Is right ? > > 2. So I want to try the 1st method in the email, which is use > vnic-type=direct. BUT, HOW TO USE ? IS THERE ANY DOCUMENT ? > > THANKS. > > LOG-1, which is compute node's nova-compute.log > >> ``` >> 2023-03-01 14:24:02.631 504488 DEBUG oslo_concurrency.processutils >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Running cmd >> (subprocess): /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 >> --cpu=30 -- env LC_ALL=C LANG=C qemu-img info >> /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk >> --force-share --output=json execute >> /usr/lib/python3/dist-packages/oslo_concurrency/processutils.py:384 >> 2023-03-01 14:24:02.654 504488 DEBUG oslo_concurrency.processutils >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] CMD "/usr/bin/python3 >> -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C >> qemu-img info >> /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk >> --force-share --output=json" returned: 0 in 0.023s execute >> /usr/lib/python3/dist-packages/oslo_concurrency/processutils.py:422 >> 2023-03-01 14:24:02.655 504488 DEBUG nova.virt.disk.api >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Cannot resize image >> /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/disk to a >> smaller size. 
can_resize_image >> /usr/lib/python3/dist-packages/nova/virt/disk/api.py:172 >> 2023-03-01 14:24:02.655 504488 DEBUG nova.objects.instance >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Lazy-loading >> 'migration_context' on Instance uuid a2603eeb-8db0-489b-ba40-dff1d74be21f >> obj_load_attr /usr/lib/python3/dist-packages/nova/objects/instance.py:1099 >> 2023-03-01 14:24:02.673 504488 DEBUG nova.virt.libvirt.driver >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Created local disks _create_image >> /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:4768 >> 2023-03-01 14:24:02.674 504488 DEBUG nova.virt.libvirt.driver >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Ensure instance console log exists: >> /var/lib/nova/instances/a2603eeb-8db0-489b-ba40-dff1d74be21f/console.log >> _ensure_console_log_for_instance >> /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:4531 >> 2023-03-01 14:24:02.674 504488 DEBUG oslo_concurrency.lockutils >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "vgpu_resources" >> acquired by "nova.virt.libvirt.driver.LibvirtDriver._allocate_mdevs" :: >> waited 0.000s inner >> /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> 2023-03-01 14:24:02.675 504488 DEBUG oslo_concurrency.lockutils >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "vgpu_resources" >> "released" by "nova.virt.libvirt.driver.LibvirtDriver._allocate_mdevs" :: >> held 0.000s inner >> /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] Instance failed network >> setup after 1 attempt(s): nova.exception.PortBindingFailed: Binding failed >> for port 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs >> for more information. 
>> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager Traceback (most >> recent call last): >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in >> _allocate_network_async >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager nwinfo = >> self.network_api.allocate_for_instance( >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1215, in >> allocate_for_instance >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> created_port_ids = self._update_ports_for_instance( >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1357, in >> _update_ports_for_instance >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> vif.destroy() >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in >> __exit__ >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> self.force_reraise() >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in >> force_reraise >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager raise >> self.value >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1326, in >> _update_ports_for_instance >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> updated_port = self._update_port( >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 584, in >> _update_port >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> _ensure_no_port_binding_failure(port) >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 293, in >> _ensure_no_port_binding_failure >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager raise >> exception.PortBindingFailed(port_id=port['id']) >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> nova.exception.PortBindingFailed: Binding failed for port >> 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more >> information. >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> nova.exception.PortBindingFailed: Binding failed for port >> 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more >> information. >> 2023-03-01 14:24:03.325 504488 ERROR nova.compute.manager >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance failed to spawn: >> nova.exception.PortBindingFailed: Binding failed for port >> 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check neutron logs for more >> information. 
>> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Traceback (most recent call last): >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2743, in >> _build_resources >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] yield resources >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2503, in >> _build_and_run_instance >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] self.driver.spawn(context, >> instance, image_meta, >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4329, in >> spawn >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] xml = >> self._get_guest_xml(context, instance, network_info, >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7288, in >> _get_guest_xml >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] network_info_str = >> str(network_info) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/model.py", line 620, in __str__ >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return self._sync_wrapper(fn, >> *args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/model.py", line 603, in >> _sync_wrapper >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] self.wait() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/model.py", line 635, in wait >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] self[:] = self._gt.wait() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/eventlet/greenthread.py", line 181, in wait >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return self._exit_event.wait() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/eventlet/event.py", line 125, in wait >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] result = hub.switch() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch >> 2023-03-01 14:24:03.341 504488 ERROR 
nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return self.greenlet.switch() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/eventlet/greenthread.py", line 221, in main >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] result = function(*args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/utils.py", line 656, in context_wrapper >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return func(*args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1890, in >> _allocate_network_async >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise e >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in >> _allocate_network_async >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] result = function(*args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/utils.py", line 656, in context_wrapper >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] return func(*args, **kwargs) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1890, in >> _allocate_network_async >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise e >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1868, in >> _allocate_network_async >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] nwinfo = >> self.network_api.allocate_for_instance( >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1215, in >> allocate_for_instance >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] created_port_ids = >> self._update_ports_for_instance( >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1357, in >> _update_ports_for_instance >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] vif.destroy() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in >> __exit__ >> 2023-03-01 14:24:03.341 504488 ERROR 
nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] self.force_reraise() >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in >> force_reraise >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise self.value >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 1326, in >> _update_ports_for_instance >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] updated_port = self._update_port( >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 584, in >> _update_port >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] >> _ensure_no_port_binding_failure(port) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] File >> "/usr/lib/python3/dist-packages/nova/network/neutron.py", line 293, in >> _ensure_no_port_binding_failure >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] raise >> exception.PortBindingFailed(port_id=port['id']) >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] nova.exception.PortBindingFailed: >> Binding failed for port 2a29da9c-a6db-4eff-a073-e0f1c61fe178, please check >> neutron logs for more information. >> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] >> 2023-03-01 14:24:03.349 504488 INFO nova.compute.manager >> [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 >> 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: >> a2603eeb-8db0-489b-ba40-dff1d74be21f] Terminating instance >> a073-e0f1c61fe178, please check neutron logs for more information. 
>> 2023-03-01 14:24:03.341 504488 ERROR nova.compute.manager [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f]
>> 2023-03-01 14:24:03.349 504488 INFO nova.compute.manager [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Terminating instance
>> 2023-03-01 14:24:03.349 504488 DEBUG oslo_concurrency.lockutils [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] Acquired lock "refresh_cache-a2603eeb-8db0-489b-ba40-dff1d74be21f" lock /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:294
>> 2023-03-01 14:24:03.350 504488 DEBUG nova.network.neutron [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Building network info cache for instance _get_instance_nw_info /usr/lib/python3/dist-packages/nova/network/neutron.py:2014
>> 2023-03-01 14:24:03.431 504488 DEBUG nova.network.neutron [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance cache missing network info. _get_preexisting_port_ids /usr/lib/python3/dist-packages/nova/network/neutron.py:3327
>> 2023-03-01 14:24:03.624 504488 DEBUG nova.network.neutron [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Updating instance_info_cache with network_info: [] update_instance_cache_with_nw_info /usr/lib/python3/dist-packages/nova/network/neutron.py:117
>> 2023-03-01 14:24:03.638 504488 DEBUG oslo_concurrency.lockutils [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] Releasing lock "refresh_cache-a2603eeb-8db0-489b-ba40-dff1d74be21f" lock /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:312
>> 2023-03-01 14:24:03.639 504488 DEBUG nova.compute.manager [req-d4bad4d7-71c7-498e-8fd1-bb6d8884899f ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Start destroying the instance on the hypervisor. _shutdown_instance /usr/lib/python3/dist-packages/nova/compute/manager.py:2999
>> 2023-03-01 14:24:03.648 504488 DEBUG nova.virt.libvirt.driver [-] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] During wait destroy, instance disappeared. _wait_for_destroy /usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py:1483
>> 2023-03-01 14:24:03.648 504488 INFO nova.virt.libvirt.driver [-] [instance: a2603eeb-8db0-489b-ba40-dff1d74be21f] Instance destroyed successfully.
>> ```
>>
>
> ----
> Simon Jones
>
>
> Sean Mooney wrote on Wed, 1 Mar 2023 at 01:18:
> >> On Tue, 2023-02-28 at 19:43 +0800, Simon Jones wrote: >> > Hi all, >> > >> > I'm working on openstack Yoga's PCI passthrough feature, follow this >> link: >> > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >> > >> > I configure exactly as the link said, but when I create server use this >> > command, I found ERROR: >> > ``` >> > openstack server create --flavor cirros-os-dpu-test-1 --image cirros \ >> > --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \ >> > --security-group default --key-name mykey provider-instance >> > >> > >> > > fault | {'code': 500, 'created': >> > '2023-02-23T06:13:43Z', 'message': 'No valid host was found. There are >> not >> > enough hosts available.', 'details': 'Traceback (most recent call >> last):\n >> > File "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line >> > 1548, in schedule_and_build_instances\n host_lists = >> > self._schedule_instances(context, request_specs[0],\n File >> > "/usr/lib/python3/dist-packages/nova/conductor/manager.py", line 908, in >> > _schedule_instances\n host_lists = >> > self.query_client.select_destinations(\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/client/query.py", line >> 41, >> > in select_destinations\n return >> > self.scheduler_rpcapi.select_destinations(context, spec_obj,\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/rpcapi.py", line 160, in >> > select_destinations\n return cctxt.call(ctxt, >> \'select_destinations\', >> > **msg_args)\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line >> 189, in >> > call\n result = self.transport._send(\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, >> in >> > _send\n return self._driver.send(target, ctxt, message,\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 689, in send\n return self._send(target, ctxt, message, >> > wait_for_reply, timeout,\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", >> > line 681, in _send\n raise >> > result\nnova.exception_Remote.NoValidHost_Remote: No valid host was >> found. >> > There are not enough hosts available.\nTraceback (most recent call >> > last):\n\n File >> > "/usr/lib/python3/dist-packages/oslo_messaging/rpc/server.py", line >> 241, in >> > inner\n return func(*args, **kwargs)\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 223, in >> > select_destinations\n selections = self._select_destinations(\n\n >> File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 250, in >> > _select_destinations\n selections = self._schedule(\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 416, in >> > _schedule\n self._ensure_sufficient_hosts(\n\n File >> > "/usr/lib/python3/dist-packages/nova/scheduler/manager.py", line 455, in >> > _ensure_sufficient_hosts\n raise >> > exception.NoValidHost(reason=reason)\n\nnova.exception.NoValidHost: No >> > valid host was found. 
There are not enough hosts available.\n\n'} |
>> >
>> > // this is what I configured: NovaInstance
>> >
>> > gyw at c1:~$ openstack flavor show cirros-os-dpu-test-1
>> > +----------------------------+------------------------------+
>> > | Field                      | Value                        |
>> > +----------------------------+------------------------------+
>> > | OS-FLV-DISABLED:disabled   | False                        |
>> > | OS-FLV-EXT-DATA:ephemeral  | 0                            |
>> > | access_project_ids         | None                         |
>> > | description                | None                         |
>> > | disk                       | 1                            |
>> > | id                         | 0                            |
>> > | name                       | cirros-os-dpu-test-1         |
>> > | os-flavor-access:is_public | True                         |
>> > | properties                 | pci_passthrough:alias='a1:1' |
>> > | ram                        | 64                           |
>> > | rxtx_factor                | 1.0                          |
>> > | swap                       |                              |
>> > | vcpus                      | 1                            |
>> > +----------------------------+------------------------------+
>> >
>> > // in controller node /etc/nova/nova.conf
>> >
>> > [filter_scheduler]
>> > enabled_filters = PciPassthroughFilter
>> > available_filters = nova.scheduler.filters.all_filters
>> >
>> > [pci]
>> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
>> > alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
>> >
>> > // in compute node /etc/nova/nova.conf
>> >
>> > [pci]
>> > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null, "remote_managed": "true"}
>> > alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
>>
>> "remote_managed": "true" is only valid for neutron sriov ports, not flavor-based pci passthrough.
>>
>> so you need to use vnic_type=direct assuming you are trying to use
>> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html
>> which is not the same as generic pci passthrough.
>>
>> if you just want to use generic pci passthrough via a flavor, remove "remote_managed": "true"
>>
>> >
>> > ```
>> >
>> > The detailed ERROR I found is:
>> > - The reason for "There are not enough hosts available" is that nova-scheduler's log shows "There are 0 hosts available but 1 instances requested to build", which means no host supports the PCI passthrough feature.
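For reference, the two options Sean describes above look roughly as follows. This is a minimal, untested sketch that reuses the IDs already shown in this thread (vendor_id 15b3 / product_id 101e, flavor cirros-os-dpu-test-1, net 066c8dc2-c98b-4fb8-a541-8b367e8f6e69); the port name sriov-port-1 and the second flavor are illustrative placeholders, and the port-based path also needs the matching Neutron SR-IOV/OVN-on-DPU setup described in the documents linked in this thread.

```
# (a) Generic flavor-based PCI passthrough: drop "remote_managed" from the
#     [pci] whitelist on the controller and compute nodes, keep the alias, e.g.:
#
#     [pci]
#     passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "physical_network": null}
#     alias = { "vendor_id":"15b3", "product_id":"101e", "device_type":"type-VF", "name":"a1" }
#
# restart the nova services, then request the device through the flavor alias:
openstack flavor set cirros-os-dpu-test-1 --property "pci_passthrough:alias"="a1:1"
openstack server create --flavor cirros-os-dpu-test-1 --image cirros \
    --nic net-id=066c8dc2-c98b-4fb8-a541-8b367e8f6e69 --key-name mykey provider-instance

# (b) Port-based path (vnic_type=direct): keep the whitelist, use a flavor
#     without the pci_passthrough:alias property, and hand the VF to Neutron
#     by creating the port explicitly and booting the server with it:
openstack port create --network 066c8dc2-c98b-4fb8-a541-8b367e8f6e69 \
    --vnic-type direct sriov-port-1
openstack server create --flavor <flavor-without-pci-alias> --image cirros \
    --port sriov-port-1 --key-name mykey provider-instance
```

In both cases the PciPassthroughFilter shown in the [filter_scheduler] section above has to stay enabled on the scheduler.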
>> > >> > This is nova-schduler's log >> > ``` >> > 2023-02-28 06:11:58.329 1942637 DEBUG nova.scheduler.manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting to schedule >> > for instances: ['8ddfbe2c-f929-4b62-8b73-67902df8fb60'] >> select_destinations >> > /usr/lib/python3/dist-packages/nova/scheduler/manager.py:141 >> > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] >> compute_status_filter >> > request filter added forbidden trait COMPUTE_STATUS_DISABLED >> > compute_status_filter >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:254 >> > 2023-02-28 06:11:58.330 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'compute_status_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.331 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'accelerators_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.332 1942637 DEBUG nova.scheduler.request_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Request filter >> > 'remote_managed_ports_filter' took 0.0 seconds wrapper >> > /usr/lib/python3/dist-packages/nova/scheduler/request_filter.py:46 >> > 2023-02-28 06:11:58.485 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock >> > "567eb2f1-7173-4eee-b9e7-66932ed70fea" acquired by >> > >> "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" >> > :: waited 0.000s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> > 2023-02-28 06:11:58.488 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock >> > "567eb2f1-7173-4eee-b9e7-66932ed70fea" "released" by >> > >> "nova.context.set_target_cell..get_or_set_cached_cell_and_set_connections" >> > :: held 0.003s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> > 2023-02-28 06:11:58.494 1942637 DEBUG oslo_db.sqlalchemy.engines >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] MySQL server mode >> set >> > to >> > >> STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION >> > _check_effective_sql_mode >> > /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/engines.py:314 >> > 2023-02-28 06:11:58.520 1942637 INFO nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Host mapping not >> found >> > for host c1c2. 
Not tracking instance info for this host. >> > 2023-02-28 06:11:58.520 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', >> 'c1c2')" >> > acquired by >> > "nova.scheduler.host_manager.HostState.update.._locked_update" >> :: >> > waited 0.000s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:386 >> > 2023-02-28 06:11:58.521 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> from >> > compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"arch": >> > "x86_64", "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", >> "topology": >> > {"cells": 1, "sockets": 1, "cores": 6, "threads": 2}, "features": >> > ["sse4.2", "mds-no", "stibp", "pdpe1gb", "xsaveopt", "ht", "intel-pt", >> > "mtrr", "abm", "tm", "lm", "umip", "mca", "pku", "ds_cpl", "rdrand", >> "adx", >> > "rdseed", "lahf_lm", "xgetbv1", "nx", "invpcid", "rdtscp", "tsc", >> "xsavec", >> > "pcid", "arch-capabilities", "pclmuldq", "spec-ctrl", "fsgsbase", >> "avx2", >> > "md-clear", "vmx", "syscall", "mmx", "ds", "ssse3", "avx", "dtes64", >> > "fxsr", "msr", "acpi", "vpclmulqdq", "smap", "erms", "pge", "cmov", >> > "sha-ni", "fsrm", "x2apic", "xsaves", "cx8", "pse", "pse36", >> "clflushopt", >> > "vaes", "pni", "ssbd", "movdiri", "movbe", "clwb", "xtpr", "de", >> "invtsc", >> > "fpu", "tsc-deadline", "pae", "clflush", "ibrs-all", "waitpkg", "sse", >> > "sse2", "bmi1", "3dnowprefetch", "cx16", "popcnt", "rdctl-no", "fma", >> > "tsc_adjust", "xsave", "ss", "skip-l1dfl-vmentry", "sse4.1", "rdpid", >> > "monitor", "vme", "tm2", "pat", "pschange-mc-no", "movdir64b", "gfni", >> > "mce", "smep", "sep", "apic", "arat", "f16c", "bmi2", "aes", "pbe", >> "est", >> > >> "pdcm"]}',created_at=2023-02-14T03:19:40Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=415,free_disk_gb=456,free_ram_mb=31378,host='c1c2',host_ip=192.168.28.21,hypervisor_hostname='c1c2',hypervisor_type='QEMU',hypervisor_version=4002001,id=8,local_gb=456,local_gb_used=0,mapped=0,memory_mb=31890,memory_mb_used=512,metrics='[]',numa_topology='{" >> > nova_object.name": "NUMATopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.2", "nova_object.data": {"cells": [{" >> > nova_object.name": "NUMACell", "nova_object.namespace": "nova", >> > "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": >> [0, >> > 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, >> 8, >> > 9, 10, 11], "memory": 31890, "cpu_usage": 0, "memory_usage": 0, >> > "pinned_cpus": [], "siblings": [[0, 1], [10, 11], [2, 3], [6, 7], [4, >> 5], >> > [8, 9]], "mempages": [{"nova_object.name": "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 4, "total": 8163962, "used": 0, >> "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "used", "reserved", "total"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", 
"nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, >> "reserved": >> > 0}, "nova_object.changes": ["size_kb", "used", "reserved", "total"]}], >> > "network_metadata": {"nova_object.name": "NetworkMetadata", >> > "nova_object.namespace": "nova", "nova_object.version": "1.0", >> > "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["cpuset", "memory_usage", "cpu_usage", "id", >> > "pinned_cpus", "pcpuset", "socket", "network_metadata", "siblings", >> > "mempages", "memory"]}]}, "nova_object.changes": >> > >> ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.5,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-02-28T06:01:33Z,uuid=c360cc82-f0fd-4662-bccd-e1f02b27af51,vcpus=12,vcpus_used=0) >> > _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:167 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > aggregates: [] _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:170 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > service dict: {'id': 17, 'uuid': '6d0921a6-427d-4a82-a7d2-41dfa003125a', >> > 'host': 'c1c2', 'binary': 'nova-compute', 'topic': 'compute', >> > 'report_count': 121959, 'disabled': False, 'disabled_reason': None, >> > 'last_seen_up': datetime.datetime(2023, 2, 28, 6, 11, 49, >> > tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 61, >> > 'created_at': datetime.datetime(2023, 2, 14, 3, 19, 40, >> > tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 2, >> 28, >> > 6, 11, 49, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': >> > False} _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:173 >> > 2023-02-28 06:11:58.524 1942637 DEBUG nova.scheduler.host_manager >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Update host state >> with >> > instances: [] _locked_update >> > /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:176 >> > 2023-02-28 06:11:58.525 1942637 DEBUG oslo_concurrency.lockutils >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Lock "('c1c2', >> 'c1c2')" >> > "released" by >> > "nova.scheduler.host_manager.HostState.update.._locked_update" >> :: >> > held 0.004s inner >> > /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:400 >> > 2023-02-28 06:11:58.525 1942637 DEBUG nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Starting with 1 >> host(s) >> > get_filtered_objects /usr/lib/python3/dist-packages/nova/filters.py:70 >> > 2023-02-28 06:11:58.526 1942637 DEBUG 
nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- before ---- >> > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:542 >> > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:543 >> > 2023-02-28 06:11:58.526 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] ---- after ---- >> > _filter_pools /usr/lib/python3/dist-packages/nova/pci/stats.py:545 >> > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] [] _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:546 >> > 2023-02-28 06:11:58.527 1942637 DEBUG nova.pci.stats >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Not enough PCI >> devices >> > left to satisfy request _filter_pools >> > /usr/lib/python3/dist-packages/nova/pci/stats.py:556 >> > 2023-02-28 06:11:58.527 1942637 DEBUG >> > nova.scheduler.filters.pci_passthrough_filter >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] (c1c2, c1c2) ram: >> > 31378MB disk: 424960MB io_ops: 0 instances: 0 doesn't have the required >> PCI >> > devices >> > (InstancePCIRequests(instance_uuid=,requests=[InstancePCIRequest])) >> > host_passes >> > >> /usr/lib/python3/dist-packages/nova/scheduler/filters/pci_passthrough_filter.py:52 >> > 2023-02-28 06:11:58.528 1942637 INFO nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filter >> > PciPassthroughFilter returned 0 hosts >> > 2023-02-28 06:11:58.528 1942637 DEBUG nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed >> all >> > hosts for the request with instance ID >> > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. Filter results: >> > [('PciPassthroughFilter', None)] get_filtered_objects >> > /usr/lib/python3/dist-packages/nova/filters.py:114 >> > 2023-02-28 06:11:58.528 1942637 INFO nova.filters >> > [req-13b1baee-e02d-40fc-926d-d497e70ca0dc >> ff627ad39ed94479b9c5033bc462cf78 >> > 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtering removed >> all >> > hosts for the request with instance ID >> > '8ddfbe2c-f929-4b62-8b73-67902df8fb60'. 
Filter results: ['PciPassthroughFilter: (start: 1, end: 0)']
>> > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager [req-13b1baee-e02d-40fc-926d-d497e70ca0dc ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] Filtered [] _get_sorted_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:610
>> > 2023-02-28 06:11:58.529 1942637 DEBUG nova.scheduler.manager [req-13b1baee-e02d-40fc-926d-d497e70ca0dc ff627ad39ed94479b9c5033bc462cf78 512866f9994f4ad8916d8539a7cdeec9 - default default] There are 0 hosts available but 1 instances requested to build. _ensure_sufficient_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:450
>> > ```
>> >
>> > Then I searched the database and found that the compute node's PCI configuration was not uploaded:
>> > ```
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
>> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 PCI_DEVICE
>> > No inventory of class PCI_DEVICE for c360cc82-f0fd-4662-bccd-e1f02b27af51 (HTTP 404)
>> > gyw at c1:~$ openstack resource class show PCI_DEVICE
>> > +-------+------------+
>> > | Field | Value      |
>> > +-------+------------+
>> > | name  | PCI_DEVICE |
>> > +-------+------------+
>> > gyw at c1:~$ openstack resource provider inventory show c360cc82-f0fd-4662-bccd-e1f02b27af51 MEMORY_MB
>> > +------------------+-------+
>> > | Field            | Value |
>> > +------------------+-------+
>> > | allocation_ratio | 1.5   |
>> > | min_unit         | 1     |
>> > | max_unit         | 31890 |
>> > | reserved         | 512   |
>> > | step_size        | 1     |
>> > | total            | 31890 |
>> > | used             | 0     |
>> > +------------------+-------+
>> > (this 31890 is the value reported by the compute node's resource tracker)
>> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU >> > ?^Cgyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 VCPU >> > +------------------+-------+ >> > > Field | Value | >> > +------------------+-------+ >> > > allocation_ratio | 16.0 | >> > > min_unit | 1 | >> > > max_unit | 12 | >> > > reserved | 0 | >> > > step_size | 1 | >> > > total | 12 | >> > > used | 0 | >> > +------------------+-------+ >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 SRIOV_NET_VF >> > No inventory of class SRIOV_NET_VF for >> c360cc82-f0fd-4662-bccd-e1f02b27af51 >> > (HTTP 404) >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 DISK_GB >> > +------------------+-------+ >> > > Field | Value | >> > +------------------+-------+ >> > > allocation_ratio | 1.0 | >> > > min_unit | 1 | >> > > max_unit | 456 | >> > > reserved | 0 | >> > > step_size | 1 | >> > > total | 456 | >> > > used | 0 | >> > +------------------+-------+ >> > gyw at c1:~$ openstack resource provider inventory show >> > c360cc82-f0fd-4662-bccd-e1f02b27af51 IPV4_ADDRESS >> > No inventory of class IPV4_ADDRESS for >> c360cc82-f0fd-4662-bccd-e1f02b27af51 >> > (HTTP 404) >> > >> > MariaDB [nova]> select * from compute_nodes; >> > >> +---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ >> > > created_at | updated_at | deleted_at | id | >> > service_id | vcpus | memory_mb | local_gb | vcpus_used | memory_mb_used >> | >> > local_gb_used | hypervisor_type | hypervisor_version | cpu_info >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > | disk_available_least | free_ram_mb | free_disk_gb | >> > current_workload | running_vms | hypervisor_hostname | deleted | host_ip >> > | supported_instances >> > >> > >> > >> > >> > >> > >> > >> > >> > | pci_stats >> > >> > >> > > metrics | extra_resources | stats | numa_topology >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > | host | ram_allocation_ratio | >> cpu_allocation_ratio >> > > uuid | disk_allocation_ratio | mapped >> | >> > >> 
+---------------------+---------------------+---------------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-----------------+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+----------------------+----------------------+--------------------------------------+-----------------------+--------+ >> > > 2023-01-04 01:55:44 | 2023-01-04 03:02:28 | 2023-02-13 08:34:08 | 1 | >> > NULL | 4 | 3931 | 60 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pat", "cmov", >> > "ibrs-all", "pge", "sse4.2", "sse", "mmx", "ibrs", "avx2", "syscall", >> > "fpu", "mtrr", "xsaves", "mce", "invpcid", "tsc_adjust", "ssbd", "pku", >> > "ibpb", "xsave", "xsaveopt", "pae", "lm", "pdcm", "bmi1", "avx512vnni", >> > "stibp", "x2apic", "avx512dq", "pcid", "nx", "bmi2", "erms", >> > "3dnowprefetch", "de", "avx512bw", "arch-capabilities", "pni", "fma", >> > "rdctl-no", "sse4.1", "rdseed", "arat", "avx512vl", "avx512f", >> "pclmuldq", >> > "msr", "fxsr", "sse2", "amd-stibp", "hypervisor", "tsx-ctrl", >> "clflushopt", >> > "cx16", "clwb", "xgetbv1", "xsavec", "adx", "rdtscp", "mds-no", "cx8", >> > "aes", "tsc-deadline", "pse36", "fsgsbase", "umip", "spec-ctrl", >> "lahf_lm", >> > "md-clear", "avx512cd", "amd-ssbd", "vmx", "apic", "f16c", "pse", "tsc", >> > "movbe", "smep", "ss", "pschange-mc-no", "ssse3", "popcnt", "avx", >> "vme", >> > "smap", "pdpe1gb", "mca", "skip-l1dfl-vmentry", "abm", "sep", "clflush", >> > "rdrand"]} | 49 | 3419 | 60 | >> > 0 | 0 | gyw | 1 | 192.168.2.99 | >> > [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", >> > "hvm"], ["x86_64", "kvm", "hvm"]] >> > >> > >> > >> > >> > >> > >> > >> > | {"nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, >> 2, >> > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": >> [], >> > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 1006396, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "reserved", "size_kb", "total"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "reserved", "size_kb", "total"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > 
"total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "reserved", "size_kb", "total"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": null}, >> > "nova_object.changes": ["cpuset", "pinned_cpus", "mempages", >> > "network_metadata", "cpu_usage", "pcpuset", "memory", "id", "socket", >> > "siblings", "memory_usage"]}]}, "nova_object.changes": ["cells"]} | gyw >> > | 1.5 | 16 | >> > b1bf35bd-a9ad-4f0c-9033-776a5c6d1c9b | 1 | 1 | >> > > 2023-01-04 03:12:17 | 2023-01-31 06:36:36 | 2023-02-23 08:50:29 | 2 | >> > NULL | 4 | 3931 | 60 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 4, "cores": 1, "threads": 1}, "features": ["pclmuldq", >> > "fsgsbase", "f16c", "fxsr", "ibpb", "adx", "movbe", "aes", "x2apic", >> "abm", >> > "mtrr", "arat", "sse4.2", "bmi1", "stibp", "sse4.1", "pae", "vme", >> "msr", >> > "skip-l1dfl-vmentry", "fma", "pcid", "avx2", "de", "ibrs-all", "ssse3", >> > "apic", "umip", "xsavec", "3dnowprefetch", "amd-ssbd", "sse", "nx", >> "fpu", >> > "pse", "smap", "smep", "lahf_lm", "pni", "spec-ctrl", "xsave", "xsaves", >> > "rdtscp", "vmx", "avx512f", "cmov", "invpcid", "hypervisor", "erms", >> > "rdctl-no", "cx16", "cx8", "tsc", "pge", "pdcm", "rdrand", "avx", >> > "amd-stibp", "avx512vl", "xsaveopt", "mds-no", "popcnt", "clflushopt", >> > "sse2", "xgetbv1", "rdseed", "pdpe1gb", "pschange-mc-no", "clwb", >> > "avx512vnni", "mca", "tsx-ctrl", "tsc_adjust", "syscall", "pse36", >> "mmx", >> > "avx512cd", "avx512bw", "pku", "tsc-deadline", "arch-capabilities", >> > "avx512dq", "ssbd", "clflush", "mce", "ss", "pat", "bmi2", "lm", "ibrs", >> > "sep", "md-clear"]} | 49 | 3419 | 60 >> | >> > 0 | 0 | c1c1 | 2 | >> 192.168.2.99 >> > | [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", >> > "hvm"], ["x86_64", "kvm", "hvm"]] >> > >> > >> > >> > >> > >> > >> > >> > | {"nova_object.name": >> > "PciDevicePoolList", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3], "pcpuset": [0, 1, >> 2, >> > 3], "memory": 3931, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": >> [], >> > "siblings": [[0], [1], [2], [3]], "mempages": [{"nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 1006393, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "total", "size_kb", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["used", "total", "size_kb", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, 
>> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", >> > "total", "size_kb", "reserved"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, >> > "nova_object.changes": ["memory_usage", "socket", "cpuset", "siblings", >> > "id", "mempages", "pinned_cpus", "memory", "pcpuset", >> "network_metadata", >> > "cpu_usage"]}]}, "nova_object.changes": ["cells"]} | c1c1 | >> > 1.5 | 16 | >> 1eac1c8d-d96a-4eeb-9868-5a341a80c6df | >> > 1 | 0 | >> > > 2023-02-07 08:25:27 | 2023-02-07 08:25:27 | 2023-02-13 08:34:22 | 3 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["sha-ni", >> > "intel-pt", "pat", "monitor", "movbe", "nx", "msr", "avx2", "md-clear", >> > "popcnt", "rdseed", "pse36", "mds-no", "ds", "sse", "fsrm", "rdctl-no", >> > "pse", "dtes64", "ds_cpl", "xgetbv1", "lahf_lm", "smep", "waitpkg", >> "smap", >> > "fsgsbase", "sep", "tsc_adjust", "cmov", "ibrs-all", "mtrr", "cx16", >> > "f16c", "arch-capabilities", "pclmuldq", "clflush", "erms", "umip", >> > "xsaves", "xsavec", "ssse3", "acpi", "tsc", "movdir64b", "vpclmulqdq", >> > "skip-l1dfl-vmentry", "xsave", "arat", "mmx", "rdpid", "sse2", "ssbd", >> > "pdpe1gb", "spec-ctrl", "adx", "pcid", "de", "pku", "est", "pae", >> > "tsc-deadline", "pdcm", "clwb", "vme", "rdtscp", "fxsr", >> "3dnowprefetch", >> > "invpcid", "x2apic", "tm", "lm", "fma", "bmi1", "sse4.1", "abm", >> > "xsaveopt", "pschange-mc-no", "syscall", "clflushopt", "pbe", "avx", >> "cx8", >> > "vmx", "gfni", "fpu", "mce", "tm2", "movdiri", "invtsc", "apic", "bmi2", >> > "mca", "pge", "rdrand", "xtpr", "sse4.2", "stibp", "ht", "ss", "pni", >> > "vaes", "aes"]} | 416 | 31378 | 456 | >> > 0 | 0 | c-MS-7D42 | 3 | >> 192.168.2.99 | >> > [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", >> "qemu", >> > "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", >> "kvm", >> > "hvm"], ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", >> > "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", >> "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > ["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, 
"memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["total", >> > "reserved", "used", "size_kb"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["total", "reserved", "used", "size_kb"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["total", >> > "reserved", "used", "size_kb"]}], "network_metadata": {" >> nova_object.name": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["network_metadata", "cpuset", "mempages", "id", >> > "socket", "cpu_usage", "memory", "pinned_cpus", "pcpuset", "siblings", >> > "memory_usage"]}]}, "nova_object.changes": ["cells"]} | c-MS-7D42 | >> > 1.5 | 16 | >> f115a1c2-fda3-42c6-945a-8b54fef40daf >> > > 1 | 0 | >> > > 2023-02-07 09:53:12 | 2023-02-13 08:38:04 | 2023-02-13 08:39:33 | 4 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["rdctl-no", >> > "acpi", "umip", "invpcid", "bmi1", "clflushopt", "pclmuldq", >> "movdir64b", >> > "ssbd", "apic", "rdpid", "ht", "fsrm", "pni", "pse", "xsaves", "cx16", >> > "nx", "f16c", "arat", "popcnt", "mtrr", "vpclmulqdq", "intel-pt", >> > "spec-ctrl", "syscall", "3dnowprefetch", "ds", "mce", "bmi2", "tm2", >> > "md-clear", "fpu", "monitor", "pae", "erms", "dtes64", "tsc", >> "fsgsbase", >> > "xgetbv1", "est", "mds-no", "tm", "x2apic", "xsavec", "cx8", "stibp", >> > "clflush", "ssse3", "pge", "movdiri", "pdpe1gb", "vaes", "gfni", "mmx", >> > "clwb", "waitpkg", "xsaveopt", "pse36", "aes", "pschange-mc-no", "sse2", >> > "abm", "ss", "pcid", "sep", "rdseed", "mca", "skip-l1dfl-vmentry", >> "pat", >> > "smap", "sse", "lahf_lm", "avx", "cmov", "sse4.1", "sse4.2", "ibrs-all", >> > "smep", "vme", "tsc_adjust", "arch-capabilities", "fma", "movbe", "adx", >> > "avx2", "xtpr", "pku", "pbe", "rdrand", "tsc-deadline", "pdcm", >> "ds_cpl", >> > "de", "invtsc", "xsave", "msr", "fxsr", "lm", "vmx", "sha-ni", >> "rdtscp"]} | >> > 416 | 31378 | 456 | 0 | >> > 0 | c-MS-7D42 | 4 | 192.168.28.21 | [["alpha", >> > "qemu", "hvm"], ["armv7l", "qemu", "hvm"], ["aarch64", "qemu", "hvm"], >> > ["cris", "qemu", "hvm"], ["i686", "qemu", "hvm"], ["i686", "kvm", >> "hvm"], >> > ["lm32", "qemu", "hvm"], ["m68k", "qemu", "hvm"], ["microblaze", "qemu", >> > "hvm"], ["microblazeel", "qemu", "hvm"], ["mips", "qemu", "hvm"], >> > ["mipsel", "qemu", "hvm"], ["mips64", "qemu", "hvm"], ["mips64el", >> "qemu", >> > "hvm"], ["ppc", "qemu", "hvm"], ["ppc64", "qemu", "hvm"], ["ppc64le", >> > "qemu", "hvm"], ["s390x", "qemu", "hvm"], ["sh4", "qemu", "hvm"], >> ["sh4eb", >> > "qemu", "hvm"], ["sparc", "qemu", "hvm"], ["sparc64", "qemu", "hvm"], >> > 
["unicore32", "qemu", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", >> "kvm", >> > "hvm"], ["xtensa", "qemu", "hvm"], ["xtensaeb", "qemu", "hvm"]] | {" >> > nova_object.name": "PciDevicePoolList", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"objects": []}, >> > "nova_object.changes": ["objects"]} | [] | NULL | >> > {"failed_builds": "0"} | {"nova_object.name": "NUMATopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.2", >> > "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", >> > "nova_object.namespace": "nova", "nova_object.version": "1.5", >> > "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, >> 10, >> > 11], "pcpuset": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "memory": 31890, >> > "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[0, >> 1], >> > [10, 11], [2, 3], [6, 7], [4, 5], [8, 9]], "mempages": [{" >> nova_object.name": >> > "NUMAPagesTopology", "nova_object.namespace": "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, >> "total": >> > 8163866, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", >> > "total", "used", "reserved"]}, {"nova_object.name": >> "NUMAPagesTopology", >> > "nova_object.namespace": "nova", "nova_object.version": "1.1", >> > "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": >> > 0}, "nova_object.changes": ["size_kb", "total", "used", "reserved"]}, {" >> > nova_object.name": "NUMAPagesTopology", "nova_object.namespace": >> "nova", >> > "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, >> > "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": >> ["size_kb", >> > "total", "used", "reserved"]}], "network_metadata": {"nova_object.name >> ": >> > "NetworkMetadata", "nova_object.namespace": "nova", >> "nova_object.version": >> > "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, >> > "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, >> > "nova_object.changes": ["siblings", "cpuset", "mempages", "socket", >> > "pcpuset", "memory", "memory_usage", "id", "network_metadata", >> "cpu_usage", >> > "pinned_cpus"]}]}, "nova_object.changes": ["cells"]} | c1c2 | >> > 1.5 | 16 | >> 10ea8254-ad84-4db9-9acd-5c783cb8600e >> > > 1 | 0 | >> > > 2023-02-13 08:41:21 | 2023-02-13 08:41:22 | 2023-02-13 09:56:50 | 5 | >> > NULL | 12 | 31890 | 456 | 0 | 512 | >> > 0 | QEMU | 4002001 | {"arch": "x86_64", >> > "model": "Broadwell-noTSX-IBRS", "vendor": "Intel", "topology": >> {"cells": >> > 1, "sockets": 1, "cores": 6, "threads": 2}, "features": ["bmi2", "ht", >> > "pae", "pku", "monitor", "avx2", "sha-ni", "acpi", "ssbd", "syscall", >> > "mca", "mmx", "mds-no", "erms", "fsrm", "arat", "xsaves", "movbe", >> > "movdir64b", "fpu", "clflush", "nx", "mce", "pse", "cx8", "aes", "avx", >> > "xsavec", "invpcid", "est", "xgetbv1", "fxsr", "rdrand", "vaes", "cmov", >> > "intel-pt", "smep", "dtes64", "f16c", "adx", "sse2", "stibp", "rdseed", >> > "xsave", "skip-l1dfl-vmentry", "sse4.1", "rdpid", "ds", "umip", "pni", >> > "rdctl-no", "clwb", "md-clear", "pschange-mc-no", "msr", "popcnt", >> > "sse4.2", "pge", "tm2", "pat", "xtpr", "fma", "gfni", "sep", "ibrs-all", >> > "tsc", "ds_cpl", "tm", "clflushopt", "pcid", "de", "rdtscp", "vme", >> "cx16", >> > "lahf_lm", "ss", "pdcm", "x2apic", "pbe", "movdiri", "tsc-deadline", >> > "invtsc", "apic", "fsgsbase", "mtrr", "vpclmulqdq", "ssse3", >> > "3dnowprefetch", "abm", "xsaveopt", "tsc_adjust", "pse36", "pclmuldq", >> > 
"bmi1", "smap", "arch-capabilities", "lm", "vmx", "sse", "pdpe1gb", >> > "spec-ctrl", "waitpkg"]} | 416 | 31378 | >> > 456 | 0 | 0 | c-MS-7D42 | 5 | >> > 192.168.28.21 | [["alpha", "qemu", "hvm"], ["armv7l", "qemu", "hvm"], >> > ["aarch64", "qemu", "hvm"], ["cris", "qemu", "hvm"], ["i686", "qemu", >> > "hvm"], ["i686", "kvm", "hvm"], ["lm32", "qemu", "hvm"], ["m68k", >> "qemu", >> > "hvm"], ["microblaze", "qemu", "hvm"], ["microblazeel", "qemu", "hvm"], >> > ["mips", "qemu", "hvm"], ["mipsel", "qemu", "h > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Wed Mar 1 10:12:11 2023 From: batmanustc at gmail.com (Simon Jones) Date: Wed, 1 Mar 2023 18:12:11 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: Thanks a lot !!! As you say, I follow https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. And I want to use DPU mode. Not "disable DPU mode". So I think I should follow the link above exactlly, so I use vnic-type=remote_anaged. In my opnion, after I run first three command (which is "openstack network create ...", "openstack subnet create", "openstack port create ..."), the VF rep port and OVN and OVS rules are all ready. What I should do in "openstack server create ..." is to JUST add PCI device into VM, do NOT call neutron-server in nova-compute of compute node ( like call port_binding or something). But as the log and steps said in the emails above, nova-compute call port_binding to neutron-server while running the command "openstack server create ...". So I still have questions is: 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT call neutron-server in nova-compute of compute node ( like call port_binding or something)" . 2) If it's right, how to deal with this? Which is how to JUST add PCI device into VM, do NOT call neutron-server? By command or by configure? Is there come document ? ---- Simon Jones Sean Mooney ?2023?3?1??? 16:15??? > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > BTW, this link ( > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > said > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? > > no its not wrong but for dpu smart nics you have to make a choice when you > deploy > either they can be used in dpu mode in which case remote_managed shoudl be > set to true > and you can only use them via neutron ports with vnic-type=remote_managed > as descried in that doc > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > or if you disable dpu mode in the nic frimware then you shoudl remvoe > remote_managed form the pci device list and > then it can be used liek a normal vf either for neutron sriov ports > vnic-type=direct or via flavor based pci passthough. > > the issue you were havign is you configured the pci device list to contain > "remote_managed: ture" which means > the vf can only be consumed by a neutron port with > vnic-type=remote_managed, when you have "remote_managed: false" or unset > you can use it via vnic-type=direct i forgot that slight detail that > vnic-type=remote_managed is required for "remote_managed: ture". 
> > > in either case you foudn the correct doc > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > neutorn sriov port configuration is documented here > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > and nova flavor based pci passthough is documeted here > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > all three server slightly differnt uses. both neutron proceedures are > exclusivly fo network interfaces. > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > requires the use of ovn deployed on the dpu > to configure the VF contolplane. > https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses > the sriov nic agent > to manage the VF with ip tools. > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is > intended for pci passthough > of stateless acclerorators like qat devices. while the nova flavor approch > cna be used with nics it not how its generally > ment to be used and when used to passthough a nic expectation is that its > not related to a neuton network. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Wed Mar 1 17:02:59 2023 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 1 Mar 2023 12:02:59 -0500 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> Message-ID: <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> On 2/28/23 9:02 PM, Ghanshyam Mann wrote: [snip] > I think removing from client is good way to stop exposing this old/not-recommended way to users > but API is separate things and removing the API request parameter 'multiattach' from it can break > the existing users using it this way. Tempest test is one good example of such users use case. To maintain > the backward compatibility/interoperability it should be removed by bumping the microversion so that > it continue working for older microversions. This way we will not break the existing users and will > provide the new way for users to start using. It's not just that this is not recommended, it can lead to data loss. We should only allow multiattach for volume types that actually support it. So I see this as a case of "I broke your script now, but you'll thank me later". We could microversion this, but then an end user has to go out of the way and add the correct mv to their request to get the correct behavior. Someone using the default mv + multiattach=true will unknowingly put themselves into a data loss situation. I think it's better to break that person's API request. 
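For reference, a minimal sketch of the supported volume-type route described above (the type and volume names
and the size are placeholders, and creating or modifying volume types is an admin-only operation by default):

```
$ openstack volume type create multiattach
$ openstack volume type set --property multiattach="<is> True" multiattach
$ openstack volume create --type multiattach --size 10 vol1
$ openstack volume show vol1 -c multiattach
```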
cheers, brian From gmann at ghanshyammann.com Wed Mar 1 17:19:22 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 01 Mar 2023 09:19:22 -0800 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> Message-ID: <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> ---- On Wed, 01 Mar 2023 09:02:59 -0800 Brian Rosmaita wrote --- > On 2/28/23 9:02 PM, Ghanshyam Mann wrote: > [snip] > > > I think removing from client is good way to stop exposing this old/not-recommended way to users > > but API is separate things and removing the API request parameter 'multiattach' from it can break > > the existing users using it this way. Tempest test is one good example of such users use case. To maintain > > the backward compatibility/interoperability it should be removed by bumping the microversion so that > > it continue working for older microversions. This way we will not break the existing users and will > > provide the new way for users to start using. > > It's not just that this is not recommended, it can lead to data loss. > We should only allow multiattach for volume types that actually support > it. So I see this as a case of "I broke your script now, but you'll > thank me later". > > We could microversion this, but then an end user has to go out of the > way and add the correct mv to their request to get the correct behavior. > Someone using the default mv + multiattach=true will unknowingly put > themselves into a data loss situation. I think it's better to break > that person's API request. Ok, if multiattach=True in the request is always an unsuccessful case (or unknown successful sometimes) then I think changing it without microversion bump makes sense. But if we know there is any success case for xyz configuration/backend then I feel we should not break such success use case. I was just thinking from the Tempest test perspective which was passing but as you corrected me in IRC, the test does not check the data things so we do not completely test it in Tempest. -gmann > > > cheers, > brian > > > From fungi at yuggoth.org Wed Mar 1 20:01:59 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 1 Mar 2023 20:01:59 +0000 Subject: [security-sig] Polls in preparation to revive our meetings In-Reply-To: <20230126183907.tiamhukqq6ixpp43@yuggoth.org> References: <20230126183907.tiamhukqq6ixpp43@yuggoth.org> Message-ID: <20230301200158.qjnoobr25sjmrpp2@yuggoth.org> On 2023-01-26 18:40:08 +0000 (+0000), Jeremy Stanley wrote: > As discussed at the last PTG, the present meeting time (15:00 UTC on > the first Thursday of each month) is inconvenient for some > attendees, and that combined with year-end holidays and general busy > weeks recently have led to skipping them entirely. In order to start > narrowing down the potential meeting schedule, I have two initial > polls. > > The first is to determine what frequency we should meet. [...] > The second poll is to hopefully determine what day of the week is > optimal for potential attendees. [...] The results are in: we had three respondents weigh in on meeting frequency, with a tie between switching to every two weeks or weekly instead of monthly. 
I'll cast the tie-breaking vote for this and say we'll try for weekly meetings initially. As for day of the week, that poll had only one respondent, expressing a preference for Wednesdays. For the next step, we need to decide what time on Wednesdays to meet. In order to determine that, I've started another poll here: https://framadate.org/soF558fXvxrrXyt7 Please indicate your availability/preference by Wednesday, March 15 if you're interested in meeting, and that will give me a week to analyze the results and publish the new time. I've set the initial revised meeting date to Wednesday, March 22, which is the week prior to the virtual PTG, so a good opportunity for us to plan a little bit for how we can best utilize our upcoming PTG slot. In the meantime, let's plan to skip the normal March meeting tomorrow (Thursday) at the old time. Thanks to everyone who's participated in these polls so far! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Mar 1 20:53:45 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 1 Mar 2023 20:53:45 +0000 Subject: [infra][ironic][tact-sig] Intent to grant control of x/virtualpdu to OpenStack community Message-ID: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> The OpenStack Ironic project relies on VirtualPDU, which is no longer actively developed. Ironic contributors reached out to the VirtualPDU maintainers with an offer to officially assume maintenance responsibilities in order to avoid forking it, and received a response from one (Mathieu Mitchell) who indicated support for that plan. Unfortunately, communication died off shortly thereafter, and the Ironic team has been unable to raise any of the current maintainers to actually add their access to the x/virtualpdu project in OpenDev's Gerrit. We don't have a documented process since this is the first time it's really come up, but I'm officially announcing that I intend to use my administrative permissions as an OpenDev sysadmin to add membership of an OpenStack Ironic team representative to the following Gerrit groups if no objections are raised before Wednesday, March 8: * virtualpdu-core * virtualpdu-release This will effectively grant full control of the x/virtualpdu repository to OpenStack Ironic contributors. Please follow up to the service-discuss at lists.opendev.org or openstack-discuss at lists.openstack.org mailing list with any concerns as soon as possible, or you can feel free to reach out to me directly by email or in IRC (fungi in the #opendev channel on the OFTC network). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Mar 1 22:59:09 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 1 Mar 2023 22:59:09 +0000 Subject: [infra][ironic][tact-sig] Intent to grant control of x/virtualpdu to OpenStack community In-Reply-To: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> References: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> Message-ID: <20230301225909.enp5xetacmzpjg7o@yuggoth.org> On 2023-03-01 20:53:45 +0000 (+0000), Jeremy Stanley wrote: [...] 
> We don't have a documented process since this is the first time it's > really come up, but I'm officially announcing that I intend to use > my administrative permissions as an OpenDev sysadmin to add > membership of an OpenStack Ironic team representative to the > following Gerrit groups if no objections are raised before > Wednesday, March 8 [...] It seems the additional round of outreach worked, so no longer requires my direct intervention: https://lists.opendev.org/archives/list/service-discuss at lists.opendev.org/thread/BOF56L5PPIP6CQZJ75LCPHVN7532ZKNJ/ -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From alsotoes at gmail.com Thu Mar 2 00:52:13 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Wed, 1 Mar 2023 18:52:13 -0600 Subject: (OpenStack-Upgrade) In-Reply-To: References: Message-ID: Maybe you can get your way around this procedure: https://github.com/openstack/ansible-role-openstack-operations/blob/master/README-backup-ops.md Also, you can tar.gz the containers or take snapshots so you can rollback. Cheers! On Wed, Mar 1, 2023 at 6:47?AM Dmitriy Rabotyagov wrote: > Hey, > > Regarding rollaback of upgrade in OSA we indeed don't have any good > established/documented process for that. At the same time it should be > completely possible with some "BUT". It also depends on what exactly > you want to rollback - roles, openstack services or both. As OSA roles > can actually install any openstack service version. > > We keep all virtualenvs from the previous version, so during upgrade > we build just new virtualenvs and reconfigure systemd units to point > there. So fastest way likely would be to just edit systemd unit files > and point them to old venv version and reload systemd daemon and > service and restore DB from backup of course. > You can also define _venv_tag (ie `glance_venv_tag`) to the > old OSA version you was running and execute openstack-ansible > os--install.yml --tags systemd-service,uwsgi - that in most > cases will be enough to just edit systemd units for the service and > start old version of it. BUT running without tags will result in > having new packages in old venv which is smth you totally want to > avoid. > To prevent that you can also define _git_install_branch and > requirements_git_install_branch in /etc/openstack_deploy/group_vars > (it's important to use group vars if you want to rollback only one > service) and take value from > > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > (ofc pick your old version!) > > For a full rollback and not in-place workarounds, I think it should be > like that > * checkout to previous osa version > * re-execute scripts/bootstrap-ansible.sh > * you should still take current versions of mariadb and rabbitmq and > define them in user_variables (galera_major_version, > galera_minor_version, rabbitmq_package_version, > rabbitmq_erlang_version_spec) - it's close to never ends well > downgrading these. > * Restore DB backup > * Re-run setup-openstack.yml > > It's quite a rough summary of how I do see this process, but to be > frank I never had to execute full downgrade - I was limited mostly by > downgrading 1 service tops after the upgrade. > > Hope that helps! > > ??, 1 ???. 2023??. ? 
12:06, Adivya Singh : > > > > > hi Alvaro, > > > > i have installed using Openstack-ansible, The upgrade procedure is > consistent > > > > but what is the roll back procedure , i m looking for > > > > Regards > > Adivya Singh > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > >> > >> That will depend on how did you installed your environment: OSA, > TripleO, etc. > >> > >> Can you provide more information? > >> > >> --- > >> Alvaro Soto. > >> > >> Note: My work hours may not be your work hours. Please do not feel the > need to respond during a time that is not convenient for you. > >> ---------------------------------------------------------- > >> Great people talk about ideas, > >> ordinary people talk about things, > >> small people talk... about other people. > >> > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh > wrote: > >>> > >>> Hi Team, > >>> > >>> I am planning to upgrade my Current Environment, The Upgrade procedure > is available in OpenStack Site and Forums. > >>> > >>> But i am looking fwd to roll back Plan , Other then have a Local > backup copy of galera Database > >>> > >>> Regards > >>> Adivya Singh > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 2 02:55:47 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 2 Mar 2023 11:55:47 +0900 Subject: [tc][heat][tacker] Moving governance of tosca-parser(and heat-translator ?) to Tacker In-Reply-To: References: <1867ac70656.c5de609e1065667.3634775558652795921@ghanshyammann.com> <1869435593c.10a5026ca1424633.8160143839607463616@ghanshyammann.com> Message-ID: Thanks. So based on the agreement in this thread I've pushed the change to the governance repository to migrate tosca-parser and heat-translator to Tacker's governance. https://review.opendev.org/c/openstack/governance/+/876012 I'll keep heat-core group in heat-translator-core group for now, but we can revisit this in the future. On Wed, Mar 1, 2023 at 6:41?PM Yasufumi Ogawa wrote: > On 2023/02/28 3:49, Ghanshyam Mann wrote: > > ---- On Sun, 26 Feb 2023 19:54:45 -0800 Takashi Kajinami wrote --- > > > > > > > > > On Mon, Feb 27, 2023 at 11:38?AM Yasufumi Ogawa yasufum.o at gmail.com> > wrote: > > > Hi, > > > > > > On 2023/02/27 10:51, Takashi Kajinami wrote: > > > > On Thu, Feb 23, 2023 at 5:18?AM Ghanshyam Mann > gmann at ghanshyammann.com> > > > > wrote: > > > > > > > >> ---- On Sun, 19 Feb 2023 18:44:14 -0800 Takashi Kajinami > wrote --- > > > >> > Hello, > > > >> > > > > >> > Currently tosca-parser is part of heat's governance, but the > core > > > >> reviewers of this repositorydoes not contain any active heat > cores while we > > > >> see multiple Tacker cores in this group.Considering the fact the > project is > > > >> mainly maintained by Tacker cores, I'm wondering if we canmigrate > this > > > >> repository to Tacker's governance. Most of the current heat cores > are not > > > >> quitefamiliar with the codes in this repository, and if Tacker > team is not > > > >> interested in maintainingthis repository then I'd propose > retiring this. > > > As you mentioned, tacker still using tosca-parser and > heat-translator. 
> > > > > > >> > > > >> I think it makes sense and I remember its usage/maintenance by > the Tacker > > > >> team since starting. > > > >> But let's wait for the Tacker team opinion and accordingly you > can propose > > > >> the governance patch. > > > Although I've not joined to tacker team since starting, it might not > be > > > true because there was no cores of tosca-parser and heat-translator > in > > > tacker team. We've started to help maintenance the projects because > no > > > other active contributer. > > > > > > >> > > > >> > > > > >> > Similarly, we have heat-translator project which has both > heat cores > > > >> and tacker cores as itscore reviewers. IIUC this is tightly > related to the > > > >> work in tosca-parser, I'm wondering it makesmore sense to move > this project > > > >> to Tacker, because the requirement is mostly made fromTacker side > rather > > > >> than Heat side. > > > >> > > > >> I am not sure about this and from the name, it seems like more of > a heat > > > >> thing but it is not got beyond the Tosca template > > > >> conversion. Are there no users of it outside of the Tacker > service? or any > > > >> request to support more template conversions than > > > >> Tosca? > > > >> > > > > > > > > Current hea-translator supports only the TOSCA template[1]. > > > > The heat-translator project can be a generic template converter by > its > > > > nature but we haven't seen any interest > > > > in implementing support for different template formats. > > > > > > > > [1] > > > > > https://github.com/openstack/heat-translator/blob/master/translator/osc/v1/translate.py#L49 > > > > > > > > > > > > > > > >> If no other user or use case then I think one option can be to > merge it > > > >> into Tosca-parser itself and retire heat-translator. > > > >> > > > >> Opinion? > > > Hmm, as a core of tosca-parser, I'm not sure it's a good idea > because it > > > is just a parser TOSCA and independent from heat-translator. In > > > addition, there is no experts of Heat or HOT in current tacker team > > > actually, so it might be difficult to maintain heat-translator > without > > > any help from heat team. > > > > > > The hea-translator project was initially created to implement a > translator from TOSCA parser to HOT[1].Later tosca-parser was split out[2] > but we have never increased scope of tosca-parser. So it has beenno more > than the TOSCA template translator. > > > > > > [1] > https://blueprints.launchpad.net/heat/+spec/heat-translator-tosca[2] > > https://review.opendev.org/c/openstack/project-config/+/211204 > > > We (Heat team) can provide help with any problems with heat, but we > own no actual use case of template translation.Maintaining the > heat-translator repository with tacker, which currently provides actual use > cases would make more sense.This also gives the benefit that Tacker team > can decide when stable branches of heat-translator should be retiredalong > with the other Tacker repos. > > > > > > By the way, may I ask what will be happened if the governance is > move on > > > to tacker? Is there any extra tasks for maintenance? > > > > > > TC would have better (and more precise) explanation but my > understanding is that - creating a release > > > - maintaining stable branches > > > - maintaining gate healthwould be the required tasks along with > moderating dev discussion in mailing list/PTG/etc. > > > > I think you covered all and the Core team (Tacker members) might be > already doing a few of the tasks. 
From the > > governance perspective, tacker PTL will be the point of contact for this > repo in the case repo becomes inactive or so > > but it will be the project team's decision to merge/split things > whatever way makes maintenance easy. > I understand. I've shared the proposal again in the previous meeting and > no objection raised. So, we'd agree to move the governance as Tacker team. > > Thanks, > Yasufumi > > > > -gmann > > > > > > > Thanks, > > > Yasufumi > > > > > > >> > > > > > > > > That also sounds good to me. > > > > > > > > > > > >> Also, correcting the email subject tag as [tc]. > > > >> > > > >> -gmann > > > >> > > > >> > > > > >> > [1] > > > >> > https://review.opendev.org/admin/groups/1f7855baf3cf14fedf72e443eef18d844bcd43fa,members[2] > > > >> > https://review.opendev.org/admin/groups/66028971dcbb58add6f0e7c17ac72643c4826956,members > > > >> > Thank you,Takashi > > > >> > > > > >> > > > >> > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 2 07:54:14 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 2 Mar 2023 13:24:14 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> Message-ID: Hi I don't see any major packet loss. It seems the problem is somewhere in rabbitmq maybe but not due to packet loss. with regards, Swogat Pradhan On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan wrote: > Hi, > Yes the MTU is the same as the default '1500'. > Generally I haven't seen any packet loss, but never checked when launching > the instance. > I will check that and come back. > But everytime i launch an instance the instance gets stuck at spawning > state and there the hypervisor becomes down, so not sure if packet loss > causes this. > > With regards, > Swogat pradhan > > On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > >> One more thing coming to mind is MTU size. Are they identical between >> central and edge site? Do you see packet loss through the tunnel? >> >> Zitat von Swogat Pradhan : >> >> > Hi Eugen, >> > Request you to please add my email either on 'to' or 'cc' as i am not >> > getting email's from you. >> > Coming to the issue: >> > >> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p / >> > Listing policies for vhost "/" ... >> > vhost name pattern apply-to definition priority >> > / ha-all ^(?!amq\.).* queues >> > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} >> 0 >> > >> > I have the edge site compute nodes up, it only goes down when i am >> trying >> > to launch an instance and the instance comes to a spawning state and >> then >> > gets stuck. >> > >> > I have a tunnel setup between the central and the edge sites. >> > >> > With regards, >> > Swogat Pradhan >> > >> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> > wrote: >> > >> >> Hi Eugen, >> >> For some reason i am not getting your email to me directly, i am >> checking >> >> the email digest and there i am able to find your reply. >> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> >> Yes, these logs are from the time when the issue occurred. 
>> >> >> >> *Note: i am able to create vm's and perform other activities in the >> >> central site, only facing this issue in the edge site.* >> >> >> >> With regards, >> >> Swogat Pradhan >> >> >> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >> wrote: >> >> >> >>> Hi Eugen, >> >>> Thanks for your response. >> >>> I have actually a 4 controller setup so here are the details: >> >>> >> >>> *PCS Status:* >> >>> * Container bundle set: rabbitmq-bundle [ >> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-no-ceph-3 >> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-2 >> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-1 >> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-0 >> >>> >> >>> I have tried restarting the bundle multiple times but the issue is >> still >> >>> present. >> >>> >> >>> *Cluster status:* >> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> >>> Cluster status of node >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >> >>> Basics >> >>> >> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >>> >> >>> Disk Nodes >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>> >> >>> Running Nodes >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>> >> >>> Versions >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >> RabbitMQ >> >>> 3.8.3 on Erlang 22.3.4.1 >> >>> >> >>> Alarms >> >>> >> >>> (none) >> >>> >> >>> Network Partitions >> >>> >> >>> (none) >> >>> >> >>> Listeners >> >>> >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, 
>> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> >>> interface: [::], port: 25672, protocol: clustering, purpose: >> inter-node and >> >>> CLI tool communication >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >> 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, >> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >>> >> >>> Feature flags >> >>> >> >>> Flag: drop_unroutable_metric, state: enabled >> >>> Flag: empty_basic_get_metric, state: enabled >> >>> Flag: implicit_default_bindings, state: enabled >> >>> Flag: quorum_queue, state: enabled >> >>> Flag: virtual_host_metadata, state: enabled >> >>> >> >>> *Logs:* >> >>> *(Attached)* >> >>> >> >>> With regards, >> >>> Swogat Pradhan >> >>> >> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>> wrote: >> >>> >> >>>> Hi, >> >>>> Please find the nova conductor as well as nova api log. >> >>>> >> >>>> nova-conuctor: >> >>>> >> >>>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 16152921c1eb45c2b1f562087140168b >> >>>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >> >>>> 83dbe5f567a940b698acfe986f6194fa >> >>>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >> >>>> f3bfd7f65bd542b18d84cea3033abb43: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due >> to a >> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due >> to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 897911a234a445d8a0d8af02ece40f6f: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due >> to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> b240e3e89d99489284cd731e75f2a5db >> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >> with >> >>>> backend dogpile.cache.null. >> >>>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 8f723ceb10c3472db9a9f324861df2bb: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due >> to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >> >>>> With regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> wrote: >> >>>> >> >>>>> Hi, >> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >> >>>>> launch vm's. >> >>>>> When the VM is in spawning state the node goes down (openstack >> compute >> >>>>> service list), the node comes backup when i restart the nova compute >> >>>>> service but then the launch of the vm fails. >> >>>>> >> >>>>> nova-compute.log >> >>>>> >> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >> >>>>> instance usage >> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to >> >>>>> 2023-02-26 08:00:00. 0 instances. >> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >> >>>>> dcn01-hci-0.bdxworld.com >> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: >> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names >> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >> with >> >>>>> backend dogpile.cache.null. >> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >> >>>>> privsep helper: >> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >> 'privsep-helper', >> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >> privsep >> >>>>> daemon via rootwrap >> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> daemon starting >> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> process running with uid/gid: 0/0 >> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> process running with capabilities (eff/prm/inh): >> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> daemon running as pid 2647 >> >>>>> 2023-02-26 08:49:55.956 7 WARNING >> os_brick.initiator.connectors.nvmeof >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >> >>>>> execution error >> >>>>> in _get_host_uuid: Unexpected error while running command. >> >>>>> Command: blkid overlay -s UUID -o value >> >>>>> Exit code: 2 >> >>>>> Stdout: '' >> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >> >>>>> Unexpected error while running command. >> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >> >>>>> >> >>>>> Is there a way to solve this issue? >> >>>>> >> >>>>> >> >>>>> With regards, >> >>>>> >> >>>>> Swogat Pradhan >> >>>>> >> >>>> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 2 10:54:38 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 02 Mar 2023 10:54:38 +0000 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? 
In-Reply-To: References: Message-ID: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> adding Dmitrii who was the primary developer of the openstack integration so they can provide more insight. Dmitrii did you ever give a presentationon the DPU support and how its configured/integrated that might help fill in the gaps for simon? more inline. On Thu, 2023-03-02 at 11:05 +0800, Simon Jones wrote: > E... > > But there are these things: > > 1) Show some real happened in my test: > > - Let me clear that, I use DPU in compute node: > The graph in > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html . > > - I configure exactly follow > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, > which is said bellow in "3) Let me post all what I do follow this link". > > - In my test, I found after first three command (which is "openstack > network create ...", "openstack subnet create", "openstack port create ..."), > there are network topology exist in DPU side, and there are rules exist in > OVN north DB, south DB of controller, like this: > > > ``` > > root at c1:~# ovn-nbctl show > > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 > > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) > > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 > > addresses: ["unknown"] > > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) > > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] > > > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding > > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a > > chassis : [] > > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 > > encap : [] > > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24", > > "neutron:device_id"="", "neutron:device_owner"="", > > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, > > "neutron:port_name"=pf0vf1, > > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", > > "neutron:revision_number"="1", > > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} > > > > root at c1c2dpu:~# sudo ovs-vsctl show > > 62cf78e5-2c02-471e-927e-1d69c2c22195 > > Bridge br-int > > fail_mode: secure > > datapath_type: system > > Port br-int > > Interface br-int > > type: internal > > Port ovn--1 > > Interface ovn--1 > > type: geneve > > options: {csum="true", key=flow, remote_ip="172.168.2.98"} > > Port pf0vf1 > > Interface pf0vf1 > > ovs_version: "2.17.2-24a81c8" > > ``` > > > That's why I guess "first three command" has already create network > topology, and "openstack server create" command only need to plug VF into > VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. no that jsut looks like the standard bridge toplogy that gets created when you provision the dpu to be used with openstac vai ovn. that looks unrelated to the neuton comamnd you ran. > > - In my test, then I run "openstack server create" command, I got ERROR > which said "No valid host...", which is what the email said above. > The reason has already said, it's nova-scheduler's PCI filter module report > no valid host. The reason "nova-scheduler's PCI filter module report no > valid host" is nova-scheduler could NOT see PCI information of compute > node. 
The reason "nova-scheduler could NOT see PCI information of compute > node" is compute node's /etc/nova/nova.conf configure remote_managed tag > like this: > > > ``` > > [pci] > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > > "physical_network": null, "remote_managed": "true"} > > alias = { "vendor_id":"15b3", "product_id":"101e", > > "device_type":"type-VF", "name":"a1" } > > ``` > > > > 2) Discuss some detail design of "remote_managed" tag, I don't know if this > is right in the design of openstack with DPU: > > - In neutron-server side, use remote_managed tag in "openstack port create > ..." command. > This command will make neutron-server / OVN / ovn-controller / ovs to make > the network topology done, like above said. > I this this is right, because test shows that. that is not correct your test do not show what you think it does, they show the baisic bridge toplogy and flow configuraiton that ovn installs by defualt when it manages as ovs. please read the design docs for this feature for both nova and neutron to understand how the interacction works. https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html https://specs.openstack.org/openstack/neutron-specs/specs/yoga/off-path-smartnic-dpu-port-binding-with-ovn.html > > - In nova side, there are 2 things should process, first is PCI passthrough > filter, second is nova-compute to plug VF into VM. > > If the link above is right, which remote_managed tag exists in > /etc/nova/nova.conf of controller node and exists in /etc/nova/nova.conf of > compute node. > As above ("- In my test, then I run "openstack server create" command") > said, got ERROR in this step. > So what should do in "PCI passthrough filter" ? How to configure ? > > Then, if "PCI passthrough filter" stage pass, what will do of nova-compute > in compute node? > > 3) Post all what I do follow this link: > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > - build openstack physical env, link plug DPU into compute mode, use VM as > controller ... etc. > - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. > - configure DPU side /etc/neutron/neutron.conf > - configure host side /etc/nova/nova.conf > - configure host side /etc/nova/nova-compute.conf > - run first 3 command > - last, run this command, got ERROR > > ---- > Simon Jones > > > Sean Mooney ?2023?3?1??? 18:35??? > > > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: > > > Thanks a lot !!! > > > > > > As you say, I follow > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > > > And I want to use DPU mode. Not "disable DPU mode". > > > So I think I should follow the link above exactlly, so I use > > > vnic-type=remote_anaged. > > > In my opnion, after I run first three command (which is "openstack > > network > > > create ...", "openstack subnet create", "openstack port create ..."), the > > > VF rep port and OVN and OVS rules are all ready. > > not at that point nothign will have been done on ovn/ovs > > > > that will only happen after the port is bound to a vm and host. > > > > > What I should do in "openstack server create ..." is to JUST add PCI > > device > > > into VM, do NOT call neutron-server in nova-compute of compute node ( > > like > > > call port_binding or something). > > this is incorrect. 
> > > > > > But as the log and steps said in the emails above, nova-compute call > > > port_binding to neutron-server while running the command "openstack > > server > > > create ...". > > > > > > So I still have questions is: > > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT > > call > > > neutron-server in nova-compute of compute node ( like call port_binding > > or > > > something)" . > > no this is not how its designed. > > until you attach the logical port to a vm (either at runtime or as part of > > vm create) > > the logical port is not assocated with any host or phsical dpu/vf. > > > > so its not possibel to instanciate the openflow rules in ovs form the > > logical switch model > > in the ovn north db as no chassie info has been populated and we do not > > have the dpu serial > > info in the port binding details. > > > 2) If it's right, how to deal with this? Which is how to JUST add PCI > > > device into VM, do NOT call neutron-server? By command or by configure? > > Is > > > there come document ? > > no this happens automaticaly when nova does the port binding which cannot > > happen until after > > teh vm is schduled to a host. > > > > > > ---- > > > Simon Jones > > > > > > > > > Sean Mooney ?2023?3?1??? 16:15??? > > > > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > > > > BTW, this link ( > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > > > > said > > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? > > > > > > > > no its not wrong but for dpu smart nics you have to make a choice when > > you > > > > deploy > > > > either they can be used in dpu mode in which case remote_managed > > shoudl be > > > > set to true > > > > and you can only use them via neutron ports with > > vnic-type=remote_managed > > > > as descried in that doc > > > > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > > > > > > > > > > or if you disable dpu mode in the nic frimware then you shoudl remvoe > > > > remote_managed form the pci device list and > > > > then it can be used liek a normal vf either for neutron sriov ports > > > > vnic-type=direct or via flavor based pci passthough. > > > > > > > > the issue you were havign is you configured the pci device list to > > contain > > > > "remote_managed: ture" which means > > > > the vf can only be consumed by a neutron port with > > > > vnic-type=remote_managed, when you have "remote_managed: false" or > > unset > > > > you can use it via vnic-type=direct i forgot that slight detail that > > > > vnic-type=remote_managed is required for "remote_managed: ture". > > > > > > > > > > > > in either case you foudn the correct doc > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > > neutorn sriov port configuration is documented here > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > > > > and nova flavor based pci passthough is documeted here > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > > > > > all three server slightly differnt uses. both neutron proceedures are > > > > exclusivly fo network interfaces. > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > > requires the use of ovn deployed on the dpu > > > > to configure the VF contolplane. 
> > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses > > > > the sriov nic agent > > > > to manage the VF with ip tools. > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is > > > > intended for pci passthough > > > > of stateless acclerorators like qat devices. while the nova flavor > > approch > > > > cna be used with nics it not how its generally > > > > ment to be used and when used to passthough a nic expectation is that > > its > > > > not related to a neuton network. > > > > > > > > > > > > From elfosardo at gmail.com Thu Mar 2 11:26:41 2023 From: elfosardo at gmail.com (Riccardo Pittau) Date: Thu, 2 Mar 2023 12:26:41 +0100 Subject: [infra][ironic][tact-sig] Intent to grant control of x/virtualpdu to OpenStack community In-Reply-To: <20230301225909.enp5xetacmzpjg7o@yuggoth.org> References: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> <20230301225909.enp5xetacmzpjg7o@yuggoth.org> Message-ID: Thanks for this anyway Jeremy! Luckily one of the old maintainers added me to the virtualpdu-core and virtualpdu-release groups, so now we can move forward with the repository update and move easily. Ciao! Riccardo On Thu, Mar 2, 2023 at 12:05?AM Jeremy Stanley wrote: > On 2023-03-01 20:53:45 +0000 (+0000), Jeremy Stanley wrote: > [...] > > We don't have a documented process since this is the first time it's > > really come up, but I'm officially announcing that I intend to use > > my administrative permissions as an OpenDev sysadmin to add > > membership of an OpenStack Ironic team representative to the > > following Gerrit groups if no objections are raised before > > Wednesday, March 8 > [...] > > It seems the additional round of outreach worked, so no longer > requires my direct intervention: > > > https://lists.opendev.org/archives/list/service-discuss at lists.opendev.org/thread/BOF56L5PPIP6CQZJ75LCPHVN7532ZKNJ/ > > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmitrii.shcherbakov at canonical.com Thu Mar 2 12:29:25 2023 From: dmitrii.shcherbakov at canonical.com (Dmitrii Shcherbakov) Date: Thu, 2 Mar 2023 15:29:25 +0300 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> References: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> Message-ID: Hi {Sean, Simon}, > did you ever give a presentation on the DPU support Yes, there were a couple at different stages. The following is the one of the older ones that references the SMARTNIC VNIC type but we later switched to REMOTE_MANAGED in the final code: https://www.openvswitch.org/support/ovscon2021/slides/smartnic_port_binding.pdf, however, it has a useful diagram on page 15 which shows the interactions of different components. A lot of other content from it is present in the OpenStack docs now which we added during the feature development. There is also a presentation with a demo that we did at the Open Infra summit https://youtu.be/Amxp-9yEnsU (I could not attend but we prepared the material after the features got merged). Generally, as Sean described, the aim of this feature is to make the interaction between components present at the hypervisor and the DPU side automatic but, in order to make this workflow explicitly different from SR-IOV or offload at the hypervisor side, one has to use the "remote_managed" flag. 
This flag allows Nova to differentiate between "regular" VFs and the ones that have to be programmed by a remote host (DPU) - hence the name. A port needs to be pre-created with the remote-managed type - that way when Nova tries to schedule a VM with that port attached, it will find hosts which actually have PCI devices tagged with the "remote_managed": "true" in the PCI whitelist. The important thing to note here is that you must not use PCI passthrough directly for this - Nova will create a PCI device request automatically with the remote_managed flag included. There is currently no way to instruct Nova to choose one vendor/device ID vs the other for this (any remote_managed=true device from a pool will match) but maybe the work that was recently done to store PCI device information in the Placement service will pave the way for such granularity in the future. Best Regards, Dmitrii Shcherbakov LP/MM/oftc: dmitriis On Thu, Mar 2, 2023 at 1:54?PM Sean Mooney wrote: > adding Dmitrii who was the primary developer of the openstack integration > so > they can provide more insight. > > Dmitrii did you ever give a presentationon the DPU support and how its > configured/integrated > that might help fill in the gaps for simon? > > more inline. > > On Thu, 2023-03-02 at 11:05 +0800, Simon Jones wrote: > > E... > > > > But there are these things: > > > > 1) Show some real happened in my test: > > > > - Let me clear that, I use DPU in compute node: > > The graph in > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html . > > > > - I configure exactly follow > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, > > which is said bellow in "3) Let me post all what I do follow this link". > > > > - In my test, I found after first three command (which is "openstack > > network create ...", "openstack subnet create", "openstack port create > ..."), > > there are network topology exist in DPU side, and there are rules exist > in > > OVN north DB, south DB of controller, like this: > > > > > ``` > > > root at c1:~# ovn-nbctl show > > > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 > > > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) > > > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 > > > addresses: ["unknown"] > > > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) > > > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] > > > > > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding > > > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a > > > chassis : [] > > > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 > > > encap : [] > > > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24", > > > "neutron:device_id"="", "neutron:device_owner"="", > > > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, > > > "neutron:port_name"=pf0vf1, > > > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", > > > "neutron:revision_number"="1", > > > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} > > > > > > root at c1c2dpu:~# sudo ovs-vsctl show > > > 62cf78e5-2c02-471e-927e-1d69c2c22195 > > > Bridge br-int > > > fail_mode: secure > > > datapath_type: system > > > Port br-int > > > Interface br-int > > > type: internal > > > Port ovn--1 > > > Interface ovn--1 > > > type: geneve > > > options: {csum="true", key=flow, > remote_ip="172.168.2.98"} > > > Port pf0vf1 > > > Interface pf0vf1 > > > ovs_version: "2.17.2-24a81c8" > > > ``` > > > > > That's why I guess "first three command" has already create network > > topology, and 
"openstack server create" command only need to plug VF into > > VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. > no that jsut looks like the standard bridge toplogy that gets created when > you provision > the dpu to be used with openstac vai ovn. > > that looks unrelated to the neuton comamnd you ran. > > > > - In my test, then I run "openstack server create" command, I got ERROR > > which said "No valid host...", which is what the email said above. > > The reason has already said, it's nova-scheduler's PCI filter module > report > > no valid host. The reason "nova-scheduler's PCI filter module report no > > valid host" is nova-scheduler could NOT see PCI information of compute > > node. The reason "nova-scheduler could NOT see PCI information of compute > > node" is compute node's /etc/nova/nova.conf configure remote_managed tag > > like this: > > > > > ``` > > > [pci] > > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > > > "physical_network": null, "remote_managed": "true"} > > > alias = { "vendor_id":"15b3", "product_id":"101e", > > > "device_type":"type-VF", "name":"a1" } > > > ``` > > > > > > > 2) Discuss some detail design of "remote_managed" tag, I don't know if > this > > is right in the design of openstack with DPU: > > > > - In neutron-server side, use remote_managed tag in "openstack port > create > > ..." command. > > This command will make neutron-server / OVN / ovn-controller / ovs to > make > > the network topology done, like above said. > > I this this is right, because test shows that. > that is not correct > your test do not show what you think it does, they show the baisic bridge > toplogy and flow configuraiton that ovn installs by defualt when it manages > as ovs. > > please read the design docs for this feature for both nova and neutron to > understand how the interacction works. > > https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html > > https://specs.openstack.org/openstack/neutron-specs/specs/yoga/off-path-smartnic-dpu-port-binding-with-ovn.html > > > > - In nova side, there are 2 things should process, first is PCI > passthrough > > filter, second is nova-compute to plug VF into VM. > > > > If the link above is right, which remote_managed tag exists in > > /etc/nova/nova.conf of controller node and exists in /etc/nova/nova.conf > of > > compute node. > > As above ("- In my test, then I run "openstack server create" command") > > said, got ERROR in this step. > > So what should do in "PCI passthrough filter" ? How to configure ? > > > > Then, if "PCI passthrough filter" stage pass, what will do of > nova-compute > > in compute node? > > > > 3) Post all what I do follow this link: > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > > - build openstack physical env, link plug DPU into compute mode, use VM > as > > controller ... etc. > > - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. > > - configure DPU side /etc/neutron/neutron.conf > > - configure host side /etc/nova/nova.conf > > - configure host side /etc/nova/nova-compute.conf > > - run first 3 command > > - last, run this command, got ERROR > > > > ---- > > Simon Jones > > > > > > Sean Mooney ?2023?3?1??? 18:35??? > > > > > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: > > > > Thanks a lot !!! > > > > > > > > As you say, I follow > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > > > > And I want to use DPU mode. 
Not "disable DPU mode". > > > > So I think I should follow the link above exactlly, so I use > > > > vnic-type=remote_anaged. > > > > In my opnion, after I run first three command (which is "openstack > > > network > > > > create ...", "openstack subnet create", "openstack port create > ..."), the > > > > VF rep port and OVN and OVS rules are all ready. > > > not at that point nothign will have been done on ovn/ovs > > > > > > that will only happen after the port is bound to a vm and host. > > > > > > > What I should do in "openstack server create ..." is to JUST add PCI > > > device > > > > into VM, do NOT call neutron-server in nova-compute of compute node ( > > > like > > > > call port_binding or something). > > > this is incorrect. > > > > > > > > But as the log and steps said in the emails above, nova-compute call > > > > port_binding to neutron-server while running the command "openstack > > > server > > > > create ...". > > > > > > > > So I still have questions is: > > > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT > > > call > > > > neutron-server in nova-compute of compute node ( like call > port_binding > > > or > > > > something)" . > > > no this is not how its designed. > > > until you attach the logical port to a vm (either at runtime or as > part of > > > vm create) > > > the logical port is not assocated with any host or phsical dpu/vf. > > > > > > so its not possibel to instanciate the openflow rules in ovs form the > > > logical switch model > > > in the ovn north db as no chassie info has been populated and we do not > > > have the dpu serial > > > info in the port binding details. > > > > 2) If it's right, how to deal with this? Which is how to JUST add PCI > > > > device into VM, do NOT call neutron-server? By command or by > configure? > > > Is > > > > there come document ? > > > no this happens automaticaly when nova does the port binding which > cannot > > > happen until after > > > teh vm is schduled to a host. > > > > > > > > ---- > > > > Simon Jones > > > > > > > > > > > > Sean Mooney ?2023?3?1??? 16:15??? > > > > > > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > > > > > BTW, this link ( > > > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > > > > > said > > > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that > WRONG ? > > > > > > > > > > no its not wrong but for dpu smart nics you have to make a choice > when > > > you > > > > > deploy > > > > > either they can be used in dpu mode in which case remote_managed > > > shoudl be > > > > > set to true > > > > > and you can only use them via neutron ports with > > > vnic-type=remote_managed > > > > > as descried in that doc > > > > > > > > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > > > > > > > > > > > > > or if you disable dpu mode in the nic frimware then you shoudl > remvoe > > > > > remote_managed form the pci device list and > > > > > then it can be used liek a normal vf either for neutron sriov ports > > > > > vnic-type=direct or via flavor based pci passthough. 
> > > > > > > > > > the issue you were havign is you configured the pci device list to > > > contain > > > > > "remote_managed: ture" which means > > > > > the vf can only be consumed by a neutron port with > > > > > vnic-type=remote_managed, when you have "remote_managed: false" or > > > unset > > > > > you can use it via vnic-type=direct i forgot that slight detail > that > > > > > vnic-type=remote_managed is required for "remote_managed: ture". > > > > > > > > > > > > > > > in either case you foudn the correct doc > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > > > neutorn sriov port configuration is documented here > > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > > > > > and nova flavor based pci passthough is documeted here > > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > > > > > > > all three server slightly differnt uses. both neutron proceedures > are > > > > > exclusivly fo network interfaces. > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > > > requires the use of ovn deployed on the dpu > > > > > to configure the VF contolplane. > > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > uses > > > > > the sriov nic agent > > > > > to manage the VF with ip tools. > > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > is > > > > > intended for pci passthough > > > > > of stateless acclerorators like qat devices. while the nova flavor > > > approch > > > > > cna be used with nics it not how its generally > > > > > ment to be used and when used to passthough a nic expectation is > that > > > its > > > > > not related to a neuton network. > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.therond at bitswalk.com Thu Mar 2 14:00:03 2023 From: gael.therond at bitswalk.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Thu, 2 Mar 2023 15:00:03 +0100 Subject: [OPENSTACKSDK] - Missing feature or bad reader? Message-ID: Hi everyone, I'm currently adding a new module on ansible-collections-openstack, however, I'm having a hard time finding the appropriate function for my module. Within ansible-collections-openstack, we have a compute_services_info module that list services using the conn.compute.services() function that, to my knowledge, come from the openstacksdk.compute.v2.service.py module. Our ansible module replicate what you get with: openstack --os-cloud compute service list command. or openstack --os-cloud volume service list command. (If I'm not wrong, it seems, openstack client is leveraging osc-lib for that and not the SDK). My problem is, I want to add another similar module on our collection, (volume_services_info) that would do the same but for volumes services: Unfortunately, either I'm not looking at the right place, or any volume endpoint (v2/v3) within the openstack sdk is implementing the appropriate service module. Did I miss something or is that class simply missing? Thanks everyone! -------------- next part -------------- An HTML attachment was scrubbed... URL: From ozzzo at yahoo.com Thu Mar 2 14:15:03 2023 From: ozzzo at yahoo.com (Albert Braden) Date: Thu, 2 Mar 2023 14:15:03 +0000 (UTC) Subject: (OpenStack-Upgrade) In-Reply-To: References: Message-ID: <821443260.1309518.1677766503515@mail.yahoo.com> Having done a few upgrades, I can give you some general advice: 1. If you can avoid upgrading, do it! 
If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. 2. If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: Hey, Regarding rollaback of upgrade in OSA we indeed don't have any good established/documented process for that. At the same time it should be completely possible with some "BUT". It also depends on what exactly you want to rollback - roles, openstack services or both. As OSA roles can actually install any openstack service version. We keep all virtualenvs from the previous version, so during upgrade we build just new virtualenvs and reconfigure systemd units to point there. So fastest way likely would be to just edit systemd unit files and point them to old venv version and reload systemd daemon and service and restore DB from backup of course. You can also define? _venv_tag (ie `glance_venv_tag`) to the old OSA version you was running and execute openstack-ansible os--install.yml --tags? systemd-service,uwsgi - that in most cases will be enough to just edit systemd units for the service and start old version of it. BUT running without tags will result in having new packages in old venv which is smth you totally want to avoid. To prevent that you can also define _git_install_branch and requirements_git_install_branch in /etc/openstack_deploy/group_vars (it's important to use group vars if you want to rollback only one service) and take value from https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml (ofc pick your old version!) For a full rollback and not in-place workarounds, I think it should be like that * checkout to previous osa version * re-execute scripts/bootstrap-ansible.sh * you should still take current versions of mariadb and rabbitmq and define them in user_variables (galera_major_version, galera_minor_version, rabbitmq_package_version, rabbitmq_erlang_version_spec) - it's close to never ends well downgrading these. * Restore DB backup * Re-run setup-openstack.yml It's quite a rough summary of how I do see this process, but to be frank I never had to execute full downgrade - I was limited mostly by downgrading 1 service tops after the upgrade. Hope that helps! ??, 1 ???. 2023??. ? 
12:06, Adivya Singh : > > hi Alvaro, > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > but what is the roll back procedure , i m looking for > > Regards > Adivya Singh > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: >> >> That will depend on how did you installed your environment: OSA, TripleO, etc. >> >> Can you provide more information? >> >> --- >> Alvaro Soto. >> >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. >> ---------------------------------------------------------- >> Great people talk about ideas, >> ordinary people talk about things, >> small people talk... about other people. >> >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: >>> >>> Hi Team, >>> >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. >>> >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database >>> >>> Regards >>> Adivya Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From artem.goncharov at gmail.com Thu Mar 2 14:35:25 2023 From: artem.goncharov at gmail.com (artem.goncharov at gmail.com) Date: Thu, 02 Mar 2023 15:35:25 +0100 Subject: [OPENSTACKSDK] - Missing feature or bad reader? In-Reply-To: References: Message-ID: <2674425.mvXUDI8C0e@nuc> Hi On Thursday, 2 March 2023 15:00:03 CET Ga?l THEROND wrote: > Hi everyone, > > I'm currently adding a new module on ansible-collections-openstack, > however, I'm having a hard time finding the appropriate function for my > module. > > Within ansible-collections-openstack, we have a compute_services_info > module that list services using the conn.compute.services() function that, > to my knowledge, come from the openstacksdk.compute.v2.service.py module. > > Our ansible module replicate what you get with: > openstack --os-cloud compute service list command. > or > openstack --os-cloud volume service list command. > (If I'm not wrong, it seems, openstack client is leveraging osc-lib for > that and not the SDK). > > My problem is, I want to add another similar module on our collection, > (volume_services_info) that would do the same but for volumes services: > > Unfortunately, either I'm not looking at the right place, or any volume > endpoint (v2/v3) within the openstack sdk is implementing the appropriate > service module. From what I see block_storage in SDK is currently missing implementation for service management (it is an admin-only and such APIs tend to be of lower prio in SDK). > > Did I miss something or is that class simply missing? > > Thanks everyone! Artem From batmanustc at gmail.com Thu Mar 2 03:05:53 2023 From: batmanustc at gmail.com (Simon Jones) Date: Thu, 2 Mar 2023 11:05:53 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: Message-ID: E... But there are these things: 1) Show some real happened in my test: - Let me clear that, I use DPU in compute node: The graph in https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html . - I configure exactly follow https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, which is said bellow in "3) Let me post all what I do follow this link". 
- In my test, I found after first three command (which is "openstack network create ...", "openstack subnet create", "openstack port create ..."), there are network topology exist in DPU side, and there are rules exist in OVN north DB, south DB of controller, like this: > ``` > root at c1:~# ovn-nbctl show > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 > addresses: ["unknown"] > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a > chassis : [] > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 > encap : [] > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24", > "neutron:device_id"="", "neutron:device_owner"="", > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, > "neutron:port_name"=pf0vf1, > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", > "neutron:revision_number"="1", > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} > > root at c1c2dpu:~# sudo ovs-vsctl show > 62cf78e5-2c02-471e-927e-1d69c2c22195 > Bridge br-int > fail_mode: secure > datapath_type: system > Port br-int > Interface br-int > type: internal > Port ovn--1 > Interface ovn--1 > type: geneve > options: {csum="true", key=flow, remote_ip="172.168.2.98"} > Port pf0vf1 > Interface pf0vf1 > ovs_version: "2.17.2-24a81c8" > ``` > That's why I guess "first three command" has already create network topology, and "openstack server create" command only need to plug VF into VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. - In my test, then I run "openstack server create" command, I got ERROR which said "No valid host...", which is what the email said above. The reason has already said, it's nova-scheduler's PCI filter module report no valid host. The reason "nova-scheduler's PCI filter module report no valid host" is nova-scheduler could NOT see PCI information of compute node. The reason "nova-scheduler could NOT see PCI information of compute node" is compute node's /etc/nova/nova.conf configure remote_managed tag like this: > ``` > [pci] > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", > "physical_network": null, "remote_managed": "true"} > alias = { "vendor_id":"15b3", "product_id":"101e", > "device_type":"type-VF", "name":"a1" } > ``` > 2) Discuss some detail design of "remote_managed" tag, I don't know if this is right in the design of openstack with DPU: - In neutron-server side, use remote_managed tag in "openstack port create ..." command. This command will make neutron-server / OVN / ovn-controller / ovs to make the network topology done, like above said. I this this is right, because test shows that. - In nova side, there are 2 things should process, first is PCI passthrough filter, second is nova-compute to plug VF into VM. If the link above is right, which remote_managed tag exists in /etc/nova/nova.conf of controller node and exists in /etc/nova/nova.conf of compute node. As above ("- In my test, then I run "openstack server create" command") said, got ERROR in this step. So what should do in "PCI passthrough filter" ? How to configure ? Then, if "PCI passthrough filter" stage pass, what will do of nova-compute in compute node? 3) Post all what I do follow this link: https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. 
- build openstack physical env, link plug DPU into compute mode, use VM as controller ... etc. - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. - configure DPU side /etc/neutron/neutron.conf - configure host side /etc/nova/nova.conf - configure host side /etc/nova/nova-compute.conf - run first 3 command - last, run this command, got ERROR ---- Simon Jones Sean Mooney ?2023?3?1??? 18:35??? > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: > > Thanks a lot !!! > > > > As you say, I follow > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. > > And I want to use DPU mode. Not "disable DPU mode". > > So I think I should follow the link above exactlly, so I use > > vnic-type=remote_anaged. > > In my opnion, after I run first three command (which is "openstack > network > > create ...", "openstack subnet create", "openstack port create ..."), the > > VF rep port and OVN and OVS rules are all ready. > not at that point nothign will have been done on ovn/ovs > > that will only happen after the port is bound to a vm and host. > > > What I should do in "openstack server create ..." is to JUST add PCI > device > > into VM, do NOT call neutron-server in nova-compute of compute node ( > like > > call port_binding or something). > this is incorrect. > > > > But as the log and steps said in the emails above, nova-compute call > > port_binding to neutron-server while running the command "openstack > server > > create ...". > > > > So I still have questions is: > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do NOT > call > > neutron-server in nova-compute of compute node ( like call port_binding > or > > something)" . > no this is not how its designed. > until you attach the logical port to a vm (either at runtime or as part of > vm create) > the logical port is not assocated with any host or phsical dpu/vf. > > so its not possibel to instanciate the openflow rules in ovs form the > logical switch model > in the ovn north db as no chassie info has been populated and we do not > have the dpu serial > info in the port binding details. > > 2) If it's right, how to deal with this? Which is how to JUST add PCI > > device into VM, do NOT call neutron-server? By command or by configure? > Is > > there come document ? > no this happens automaticaly when nova does the port binding which cannot > happen until after > teh vm is schduled to a host. > > > > ---- > > Simon Jones > > > > > > Sean Mooney ?2023?3?1??? 16:15??? > > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: > > > > BTW, this link ( > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) > > > said > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that WRONG ? > > > > > > no its not wrong but for dpu smart nics you have to make a choice when > you > > > deploy > > > either they can be used in dpu mode in which case remote_managed > shoudl be > > > set to true > > > and you can only use them via neutron ports with > vnic-type=remote_managed > > > as descried in that doc > > > > > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port > > > > > > > > > or if you disable dpu mode in the nic frimware then you shoudl remvoe > > > remote_managed form the pci device list and > > > then it can be used liek a normal vf either for neutron sriov ports > > > vnic-type=direct or via flavor based pci passthough. 
> > > > > > the issue you were havign is you configured the pci device list to > contain > > > "remote_managed: ture" which means > > > the vf can only be consumed by a neutron port with > > > vnic-type=remote_managed, when you have "remote_managed: false" or > unset > > > you can use it via vnic-type=direct i forgot that slight detail that > > > vnic-type=remote_managed is required for "remote_managed: ture". > > > > > > > > > in either case you foudn the correct doc > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > neutorn sriov port configuration is documented here > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html > > > and nova flavor based pci passthough is documeted here > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html > > > > > > all three server slightly differnt uses. both neutron proceedures are > > > exclusivly fo network interfaces. > > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html > > > requires the use of ovn deployed on the dpu > > > to configure the VF contolplane. > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses > > > the sriov nic agent > > > to manage the VF with ip tools. > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is > > > intended for pci passthough > > > of stateless acclerorators like qat devices. while the nova flavor > approch > > > cna be used with nics it not how its generally > > > ment to be used and when used to passthough a nic expectation is that > its > > > not related to a neuton network. > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 2 15:47:36 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 2 Mar 2023 16:47:36 +0100 Subject: (OpenStack-Upgrade) In-Reply-To: <821443260.1309518.1677766503515@mail.yahoo.com> References: <821443260.1309518.1677766503515@mail.yahoo.com> Message-ID: These are very weird statements and I can not agree with most of them. 1. You should upgrade in time. All problems come if you try to avoid upgrades at any costs - then you're indeed in a situation when upgrades are painful as you're running obsolete stuff that is not supported anymore and not provided by your distro (or distro also is not supported as well). With SLURP releases you will be able to do upgrades yearly starting with Antelope. Before that upgrades should be done each 6 month basically. Jumping through 1 release was not supported before but is doable given some preparation and small hacks. Jumping through more than 1 release will almost certainly guarantee you pain. Upgrades to next releases are well tested both by individual projects and by OpenStack-Ansible, so given you've looked through release notes and adjusted configuration - it should be just fine. 2. It's quite an easy and relatively smooth process as of today. Yes, you will have small API interruptions during the upgrade and when services do restart they drop connections. But we control HAproxy backends to minimize the effect of this. In many cases upgrade can be performed just running scripts/run_upgrade.sh - it will work given it's ran against healthy cluster (meaning that you don't have dead galera or rabbit node in your cluster). At the moment we spend around a working day for upgrading a region, but planning to automate this process soonish to perform upgrades of production environments using Zuul. 
We also never had to rollback, as rollback is indeed painful process that you can hard process. So I won't sugggest rolling back production environment unless it's absolutely needed. 3. This is smth I will agree with. You can take a look at our MNAIO [1] that can help you to spawn a virtual sandbox with multiple nodes in it, where you can play with upgrades. Also I'd suggest running tempest or rally tests regularly. They are helpful indeed. 4. I'm not sure what's meant here at all. I can hardly imagine how you can fail an OpenStack upgrade in a way that you will lose customer data. I can recall such failures with Ceph though, but it was somewhere around Hammer release (0.84 or smth) which is not the case for quite a while as well. [1] https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/multi-node-aio ??, 2 ???. 2023??. ? 15:15, Albert Braden : > > Having done a few upgrades, I can give you some general advice: > > 1. If you can avoid upgrading, do it! If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. > > 2. If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. > > 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. > > 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. > > 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. > On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: > > > Hey, > > Regarding rollaback of upgrade in OSA we indeed don't have any good > established/documented process for that. At the same time it should be > completely possible with some "BUT". It also depends on what exactly > you want to rollback - roles, openstack services or both. As OSA roles > can actually install any openstack service version. > > We keep all virtualenvs from the previous version, so during upgrade > we build just new virtualenvs and reconfigure systemd units to point > there. So fastest way likely would be to just edit systemd unit files > and point them to old venv version and reload systemd daemon and > service and restore DB from backup of course. > You can also define _venv_tag (ie `glance_venv_tag`) to the > old OSA version you was running and execute openstack-ansible > os--install.yml --tags systemd-service,uwsgi - that in most > cases will be enough to just edit systemd units for the service and > start old version of it. BUT running without tags will result in > having new packages in old venv which is smth you totally want to > avoid. 
> To prevent that you can also define _git_install_branch and > requirements_git_install_branch in /etc/openstack_deploy/group_vars > (it's important to use group vars if you want to rollback only one > service) and take value from > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > (ofc pick your old version!) > > For a full rollback and not in-place workarounds, I think it should be like that > * checkout to previous osa version > * re-execute scripts/bootstrap-ansible.sh > * you should still take current versions of mariadb and rabbitmq and > define them in user_variables (galera_major_version, > galera_minor_version, rabbitmq_package_version, > rabbitmq_erlang_version_spec) - it's close to never ends well > downgrading these. > * Restore DB backup > * Re-run setup-openstack.yml > > It's quite a rough summary of how I do see this process, but to be > frank I never had to execute full downgrade - I was limited mostly by > downgrading 1 service tops after the upgrade. > > Hope that helps! > > ??, 1 ???. 2023??. ? 12:06, Adivya Singh : > > > > > hi Alvaro, > > > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > > > but what is the roll back procedure , i m looking for > > > > Regards > > Adivya Singh > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > >> > >> That will depend on how did you installed your environment: OSA, TripleO, etc. > >> > >> Can you provide more information? > >> > >> --- > >> Alvaro Soto. > >> > >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > >> ---------------------------------------------------------- > >> Great people talk about ideas, > >> ordinary people talk about things, > >> small people talk... about other people. > >> > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: > >>> > >>> Hi Team, > >>> > >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. > >>> > >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database > >>> > >>> Regards > >>> Adivya Singh > From gael.therond at bitswalk.com Thu Mar 2 16:07:07 2023 From: gael.therond at bitswalk.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Thu, 2 Mar 2023 17:07:07 +0100 Subject: [OPENSTACKSDK] - Missing feature or bad reader? Message-ID: > Hi everyone, > > I'm currently adding a new module on ansible-collections-openstack, > however, I'm having a hard time finding the appropriate function for my > module. > > Within ansible-collections-openstack, we have a compute_services_info > module that list services using the conn.compute.services() function that, > to my knowledge, come from the openstacksdk.compute.v2.service.py module. > > Our ansible module replicate what you get with: > openstack --os-cloud compute service list command. > or > openstack --os-cloud volume service list command. > (If I'm not wrong, it seems, openstack client is leveraging osc-lib for > that and not the SDK). > > My problem is, I want to add another similar module on our collection, > (volume_services_info) that would do the same but for volumes services: > > Unfortunately, either I'm not looking at the right place, or any volume > endpoint (v2/v3) within the openstack sdk is implementing the appropriate > service module. 
>From what I see block_storage in SDK is currently missing implementation for >service management (it is an admin-only and such APIs tend to be of lower prio >in SDK). >Artem All right, thanks a lot for the confirmation Artem, gonna see it we could add that pretty quickly on SDK as cinder already provide the appropriate /os-service API endpoint. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ozzzo at yahoo.com Thu Mar 2 17:03:21 2023 From: ozzzo at yahoo.com (Albert Braden) Date: Thu, 2 Mar 2023 17:03:21 +0000 (UTC) Subject: (OpenStack-Upgrade) In-Reply-To: References: <821443260.1309518.1677766503515@mail.yahoo.com> Message-ID: <1288757490.3453002.1677776601143@mail.yahoo.com> 1. Of course you should upgrade every 6 months. I've never seen or heard of anyone doing that, but if you have the resources, I agree, that would be great. And yes, if you're upgrading a few versions, you may need to do one or more operating system upgrades along the way. 2. I've never seen an easy, smooth process. That being said, I've never done a single-version upgrade. If you upgrade every 6 months, then maybe it would be smooth and easy. The standard situation I saw during my contracting years is that a company has got themselves into a bind because they have a small team (or maybe 1 guy) running Openstack, and they haven't upgraded for a long time, so they hire me to clean up the mess. 4 (I think you meant 5 here?). I've never lost resources during an upgrade, but I would never promise customers that there is 0 percentage chance of loss. I always recommend that the customer be resilient against loss, for example by duplicating their application in multiple clusters and by maintaining backups of important data, and I strengthen that recommendation during upgrades. On Thursday, March 2, 2023, 10:56:47 AM EST, Dmitriy Rabotyagov wrote: These are very weird statements and I can not agree with most of them. 1. You should upgrade in time. All problems come if you try to avoid upgrades at any costs - then you're indeed in a situation when upgrades are painful as you're running obsolete stuff that is not supported anymore and not provided by your distro (or distro also is not supported as well). With SLURP releases you will be able to do upgrades yearly starting with Antelope. Before that upgrades should be done each 6 month basically. Jumping through 1 release was not supported before but is doable given some preparation and small hacks. Jumping through more than 1 release will almost certainly guarantee you pain. Upgrades to next releases are well tested both by individual projects and by OpenStack-Ansible, so given you've looked through release notes and adjusted configuration - it should be just fine. 2. It's quite an easy and relatively smooth process as of today. Yes, you will have small API interruptions during the upgrade and when services do restart they drop connections. But we control HAproxy backends to minimize the effect of this. In many cases upgrade can be performed just running scripts/run_upgrade.sh - it will work given it's ran against healthy cluster (meaning that you don't have dead galera or rabbit node in your cluster). At the moment we spend around a working day for upgrading a region, but planning to automate this process soonish to perform upgrades of production environments using Zuul. We also never had to rollback, as rollback is indeed painful process that you can hard process. 
So I won't sugggest rolling back production environment unless it's absolutely needed. 3. This is smth I will agree with. You can take a look at our MNAIO [1] that can help you to spawn a virtual sandbox with multiple nodes in it, where you can play with upgrades. Also I'd suggest running tempest or rally tests regularly. They are helpful indeed. 4. I'm not sure what's meant here at all. I can hardly imagine how you can fail an OpenStack upgrade in a way that you will lose customer data. I can recall such failures with Ceph though, but it was somewhere around Hammer release (0.84 or smth) which is not the case for quite a while as well. [1] https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/multi-node-aio ??, 2 ???. 2023??. ? 15:15, Albert Braden : > > Having done a few upgrades, I can give you some general advice: > > 1. If you can avoid upgrading, do it! If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. > > 2. If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. > > 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. > > 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. > > 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. > On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: > > > Hey, > > Regarding rollaback of upgrade in OSA we indeed don't have any good > established/documented process for that. At the same time it should be > completely possible with some "BUT". It also depends on what exactly > you want to rollback - roles, openstack services or both. As OSA roles > can actually install any openstack service version. > > We keep all virtualenvs from the previous version, so during upgrade > we build just new virtualenvs and reconfigure systemd units to point > there. So fastest way likely would be to just edit systemd unit files > and point them to old venv version and reload systemd daemon and > service and restore DB from backup of course. > You can also define? _venv_tag (ie `glance_venv_tag`) to the > old OSA version you was running and execute openstack-ansible > os--install.yml --tags? systemd-service,uwsgi - that in most > cases will be enough to just edit systemd units for the service and > start old version of it. BUT running without tags will result in > having new packages in old venv which is smth you totally want to > avoid. 
> To prevent that you can also define _git_install_branch and > requirements_git_install_branch in /etc/openstack_deploy/group_vars > (it's important to use group vars if you want to rollback only one > service) and take value from > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > (ofc pick your old version!) > > For a full rollback and not in-place workarounds, I think it should be like that > * checkout to previous osa version > * re-execute scripts/bootstrap-ansible.sh > * you should still take current versions of mariadb and rabbitmq and > define them in user_variables (galera_major_version, > galera_minor_version, rabbitmq_package_version, > rabbitmq_erlang_version_spec) - it's close to never ends well > downgrading these. > * Restore DB backup > * Re-run setup-openstack.yml > > It's quite a rough summary of how I do see this process, but to be > frank I never had to execute full downgrade - I was limited mostly by > downgrading 1 service tops after the upgrade. > > Hope that helps! > > ??, 1 ???. 2023??. ? 12:06, Adivya Singh : > > > > > hi Alvaro, > > > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > > > but what is the roll back procedure , i m looking for > > > > Regards > > Adivya Singh > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > >> > >> That will depend on how did you installed your environment: OSA, TripleO, etc. > >> > >> Can you provide more information? > >> > >> --- > >> Alvaro Soto. > >> > >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > >> ---------------------------------------------------------- > >> Great people talk about ideas, > >> ordinary people talk about things, > >> small people talk... about other people. > >> > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: > >>> > >>> Hi Team, > >>> > >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. > >>> > >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database > >>> > >>> Regards > >>> Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimmy at openinfra.dev Thu Mar 2 19:21:04 2023 From: jimmy at openinfra.dev (Jimmy McArthur) Date: Thu, 2 Mar 2023 14:21:04 -0500 Subject: (OpenStack-Upgrade) In-Reply-To: <821443260.1309518.1677766503515@mail.yahoo.com> References: <821443260.1309518.1677766503515@mail.yahoo.com> Message-ID: Hi Albert, I would highly recommend checking out a few episodes of OpenInfra Live specifically around large scale upgrades. For example [1]. There are a number of organizations running OpenStack that stay on the most current release. From large to small, it?s well worth your time to stay up to date as much as possible. Cheers, Jimmy [1] https://superuser.openstack.org/articles/upgrades-in-large-scale-openstack-infrastructure-openinfra-live-episode-6/ > On Mar 2, 2023, at 9:15 AM, Albert Braden wrote: > > Having done a few upgrades, I can give you some general advice: > > 1. If you can avoid upgrading, do it! If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. > > 2. 
If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. > > 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. > > 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. > > 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. > On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: > > > Hey, > > Regarding rollaback of upgrade in OSA we indeed don't have any good > established/documented process for that. At the same time it should be > completely possible with some "BUT". It also depends on what exactly > you want to rollback - roles, openstack services or both. As OSA roles > can actually install any openstack service version. > > We keep all virtualenvs from the previous version, so during upgrade > we build just new virtualenvs and reconfigure systemd units to point > there. So fastest way likely would be to just edit systemd unit files > and point them to old venv version and reload systemd daemon and > service and restore DB from backup of course. > You can also define _venv_tag (ie `glance_venv_tag`) to the > old OSA version you was running and execute openstack-ansible > os--install.yml --tags systemd-service,uwsgi - that in most > cases will be enough to just edit systemd units for the service and > start old version of it. BUT running without tags will result in > having new packages in old venv which is smth you totally want to > avoid. > To prevent that you can also define _git_install_branch and > requirements_git_install_branch in /etc/openstack_deploy/group_vars > (it's important to use group vars if you want to rollback only one > service) and take value from > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > (ofc pick your old version!) > > For a full rollback and not in-place workarounds, I think it should be like that > * checkout to previous osa version > * re-execute scripts/bootstrap-ansible.sh > * you should still take current versions of mariadb and rabbitmq and > define them in user_variables (galera_major_version, > galera_minor_version, rabbitmq_package_version, > rabbitmq_erlang_version_spec) - it's close to never ends well > downgrading these. > * Restore DB backup > * Re-run setup-openstack.yml > > It's quite a rough summary of how I do see this process, but to be > frank I never had to execute full downgrade - I was limited mostly by > downgrading 1 service tops after the upgrade. > > Hope that helps! > > ??, 1 ???. 2023??. ? 
12:06, Adivya Singh >: > > > > > hi Alvaro, > > > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > > > but what is the roll back procedure , i m looking for > > > > Regards > > Adivya Singh > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto > wrote: > >> > >> That will depend on how did you installed your environment: OSA, TripleO, etc. > >> > >> Can you provide more information? > >> > >> --- > >> Alvaro Soto. > >> > >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > >> ---------------------------------------------------------- > >> Great people talk about ideas, > >> ordinary people talk about things, > >> small people talk... about other people. > >> > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh > wrote: > >>> > >>> Hi Team, > >>> > >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. > >>> > >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database > >>> > >>> Regards > >>> Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 2 19:24:26 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 2 Mar 2023 20:24:26 +0100 Subject: (OpenStack-Upgrade) In-Reply-To: <1288757490.3453002.1677776601143@mail.yahoo.com> References: <821443260.1309518.1677766503515@mail.yahoo.com> <1288757490.3453002.1677776601143@mail.yahoo.com> Message-ID: Oh, well, that explains your attitude to upgrades then. But basically it's all about collecting and sorting out a technical debt, that IS collected by avoiding upgrades for as long as possible. Out of my experience, a team of 3-4 engineers is capable of maintaining and regularly upgrading OpenStack. Yes, maybe not once in 6 month, but once a year for sure. And performing 2 sequential upgrades is not that big of a deal - it's kind of 20 hours per year per region if you don't have time or knowledge to deal with small hackeries for jumping through 1 release (which is usually not a big deal). Based on that my advice would be to prevent having and collecting technical debt, as while it might feel cheaper to not invest time in maintenance, dealing with debt is always more expensive. So do not be afraid of upgrades if they're done in a timely manner, using maintained and supported versions of software is always better then legacy and EOLed ones. We were also discussing the upgrade process with OpenStack-Ansible, which is being used as a deployment tool, which does simplify the upgrade process. I bet kolla-ansible also do a damn good job with their upgrades. But I do understand how much a PITA heterogeneous deployments can be. And yeah, I meant 5. Regarding 4 - I kind of agree - each deployment is individual especially with some time. And it's really true that on production you will see issues you never saw in CI or DEV environments, but such issues will be mostly related to the load or not exact same configuration of dev envs. I'd say a good example of that might be OVS or l3 agents, that will take way more time to startup on production compared to sandbox where you won't spot any downtime or issues. ??, 2 ???. 2023??. ? 18:03, Albert Braden : > > 1. Of course you should upgrade every 6 months. I've never seen or heard of anyone doing that, but if you have the resources, I agree, that would be great. 
And yes, if you're upgrading a few versions, you may need to do one or more operating system upgrades along the way. > > 2. I've never seen an easy, smooth process. That being said, I've never done a single-version upgrade. If you upgrade every 6 months, then maybe it would be smooth and easy. The standard situation I saw during my contracting years is that a company has got themselves into a bind because they have a small team (or maybe 1 guy) running Openstack, and they haven't upgraded for a long time, so they hire me to clean up the mess. > > 4 (I think you meant 5 here?). I've never lost resources during an upgrade, but I would never promise customers that there is 0 percentage chance of loss. I always recommend that the customer be resilient against loss, for example by duplicating their application in multiple clusters and by maintaining backups of important data, and I strengthen that recommendation during upgrades. > On Thursday, March 2, 2023, 10:56:47 AM EST, Dmitriy Rabotyagov wrote: > > > These are very weird statements and I can not agree with most of them. > > 1. You should upgrade in time. All problems come if you try to avoid > upgrades at any costs - then you're indeed in a situation when > upgrades are painful as you're running obsolete stuff that is not > supported anymore and not provided by your distro (or distro also is > not supported as well). > With SLURP releases you will be able to do upgrades yearly starting > with Antelope. Before that upgrades should be done each 6 month > basically. Jumping through 1 release was not supported before but is > doable given some preparation and small hacks. Jumping through more > than 1 release will almost certainly guarantee you pain. Upgrades to > next releases are well tested both by individual projects and by > OpenStack-Ansible, so given you've looked through release notes and > adjusted configuration - it should be just fine. > > 2. It's quite an easy and relatively smooth process as of today. Yes, > you will have small API interruptions during the upgrade and when > services do restart they drop connections. But we control HAproxy > backends to minimize the effect of this. In many cases upgrade can be > performed just running scripts/run_upgrade.sh - it will work given > it's ran against healthy cluster (meaning that you don't have dead > galera or rabbit node in your cluster). At the moment we spend around > a working day for upgrading a region, but planning to automate this > process soonish to perform upgrades of production environments using > Zuul. We also never had to rollback, as rollback is indeed painful > process that you can hard process. So I won't sugggest rolling back > production environment unless it's absolutely needed. > > 3. This is smth I will agree with. You can take a look at our MNAIO > [1] that can help you to spawn a virtual sandbox with multiple nodes > in it, where you can play with upgrades. Also I'd suggest running > tempest or rally tests regularly. They are helpful indeed. > > 4. I'm not sure what's meant here at all. I can hardly imagine how you > can fail an OpenStack upgrade in a way that you will lose customer > data. I can recall such failures with Ceph though, but it was > somewhere around Hammer release (0.84 or smth) which is not the case > for quite a while as well. > > > [1] https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/multi-node-aio > > ??, 2 ???. 2023??. ? 
15:15, Albert Braden : > > > > Having done a few upgrades, I can give you some general advice: > > > > 1. If you can avoid upgrading, do it! If you are lucky enough to have customers who are willing (or can be forced) to accept a "refresh" strategy whereby you build a new cluster and move them to it, that is substantially easier and safer. > > > > 2. If you must upgrade, go into it with the understanding that it is a difficult and dangerous process, and that avoiding failure will require meticulous preparation. Try to duplicate all of the weird things that your customers are doing, in your lab environment, then upgrade and roll it back repeatedly, documenting the steps in great detail (ideally automating them as much as possible) until you can roll forward and back in your sleep. > > > > 3. Develop a comprehensive test procedure (ideally automated) that tests standard, edge and corner cases before and after the upgrade/rollback. > > > > 4. Expect different clusters to behave differently during the upgrade, and to present unique problems, even though as far as you know they are setup identically. Expect to see issues in your prod clusters that you didn't see in lab/dev/QA, and budget extra downtime to solve those issues. > > > > 5. Recommend to your customers that they backup their data and configurations, so that they can recover if an upgrade fails and their resources are lost. Set the expectation that there is a non-zero probability of failure. > > On Wednesday, March 1, 2023, 07:54:30 AM EST, Dmitriy Rabotyagov wrote: > > > > > > Hey, > > > > Regarding rollaback of upgrade in OSA we indeed don't have any good > > established/documented process for that. At the same time it should be > > completely possible with some "BUT". It also depends on what exactly > > you want to rollback - roles, openstack services or both. As OSA roles > > can actually install any openstack service version. > > > > We keep all virtualenvs from the previous version, so during upgrade > > we build just new virtualenvs and reconfigure systemd units to point > > there. So fastest way likely would be to just edit systemd unit files > > and point them to old venv version and reload systemd daemon and > > service and restore DB from backup of course. > > You can also define _venv_tag (ie `glance_venv_tag`) to the > > old OSA version you was running and execute openstack-ansible > > os--install.yml --tags systemd-service,uwsgi - that in most > > cases will be enough to just edit systemd units for the service and > > start old version of it. BUT running without tags will result in > > having new packages in old venv which is smth you totally want to > > avoid. > > To prevent that you can also define _git_install_branch and > > requirements_git_install_branch in /etc/openstack_deploy/group_vars > > (it's important to use group vars if you want to rollback only one > > service) and take value from > > https://opendev.org/openstack/openstack-ansible/src/tag/26.0.1/playbooks/defaults/repo_packages/openstack_services.yml > > (ofc pick your old version!) > > > > For a full rollback and not in-place workarounds, I think it should be like that > > * checkout to previous osa version > > * re-execute scripts/bootstrap-ansible.sh > > * you should still take current versions of mariadb and rabbitmq and > > define them in user_variables (galera_major_version, > > galera_minor_version, rabbitmq_package_version, > > rabbitmq_erlang_version_spec) - it's close to never ends well > > downgrading these. 
> > * Restore DB backup > > * Re-run setup-openstack.yml > > > > It's quite a rough summary of how I do see this process, but to be > > frank I never had to execute full downgrade - I was limited mostly by > > downgrading 1 service tops after the upgrade. > > > > Hope that helps! > > > > ??, 1 ???. 2023??. ? 12:06, Adivya Singh : > > > > > > > > hi Alvaro, > > > > > > i have installed using Openstack-ansible, The upgrade procedure is consistent > > > > > > but what is the roll back procedure , i m looking for > > > > > > Regards > > > Adivya Singh > > > > > > On Wed, Mar 1, 2023 at 12:46?PM Alvaro Soto wrote: > > >> > > >> That will depend on how did you installed your environment: OSA, TripleO, etc. > > >> > > >> Can you provide more information? > > >> > > >> --- > > >> Alvaro Soto. > > >> > > >> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > > >> ---------------------------------------------------------- > > >> Great people talk about ideas, > > >> ordinary people talk about things, > > >> small people talk... about other people. > > >> > > >> On Tue, Feb 28, 2023, 11:46 PM Adivya Singh wrote: > > >>> > > >>> Hi Team, > > >>> > > >>> I am planning to upgrade my Current Environment, The Upgrade procedure is available in OpenStack Site and Forums. > > >>> > > >>> But i am looking fwd to roll back Plan , Other then have a Local backup copy of galera Database > > >>> > > >>> Regards > > >>> Adivya Singh > > > From kozhukalov at gmail.com Thu Mar 2 21:07:06 2023 From: kozhukalov at gmail.com (Vladimir Kozhukalov) Date: Fri, 3 Mar 2023 00:07:06 +0300 Subject: [openstack-helm] Get rid of cephfs and rbd provisioners Message-ID: Hi everyone, I would like to suggest getting rid of cephfs and rbd provisioners. They have been retired and have not been maintained for about 2.5 years now [1]. I believe the CSI approach is what all users rely on nowadays and we can safely remove them. The trigger for this suggestion is that we are currently experiencing issues while trying to switch cephfs provisioner to Ubuntu Focal and fixing this is just wasting time. [2] Stephen spent some time debugging the issues and can give more details if needed. What do you think? [1] https://github.com/kubernetes-retired/external-storage/tree/master/ceph [2] https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976 -- Best regards, Kozhukalov Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Thu Mar 2 22:13:47 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Thu, 2 Mar 2023 23:13:47 +0100 Subject: [openstack-helm] Get rid of cephfs and rbd provisioners In-Reply-To: References: Message-ID: Hi Vladimir, I agree. I also think we should stop maintaining the CSI provisioner chart and simply deploy the one provided by the Ceph CSI team Less code we maintain, the better. Thanks Mohammed On Thu, Mar 2, 2023 at 10:13?PM Vladimir Kozhukalov wrote: > Hi everyone, > > I would like to suggest getting rid of cephfs and rbd provisioners. They > have been retired and have not been maintained for about 2.5 years now [1]. > I believe the CSI approach is what all users rely on nowadays and we can > safely remove them. > > The trigger for this suggestion is that we are currently experiencing > issues while trying to switch cephfs provisioner to Ubuntu Focal and fixing > this is just wasting time. 
[2] Stephen spent some time debugging the issues > and can give more details if needed. > > What do you think? > > [1] > https://github.com/kubernetes-retired/external-storage/tree/master/ceph > [2] https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976 > -- > Best regards, > Kozhukalov Vladimir > -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 2 23:04:57 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 2 Mar 2023 15:04:57 -0800 Subject: [infra][ironic][tact-sig] Intent to grant control of x/virtualpdu to OpenStack community In-Reply-To: References: <20230301205344.oypz2ceadu73vqqz@yuggoth.org> <20230301225909.enp5xetacmzpjg7o@yuggoth.org> Message-ID: Thanks for the work, VirtualPDU-maintainers-emeritus, Jeremy, and Riccardo. The change just merged into governance for Ironic to officially take over this repo ( https://opendev.org/openstack/governance/commit/db8597ad9245ce2c115b4d5140a6d81d63b2a9af ). Over the next handful of weeks we'll work with our OpenDev sysadmin partners to get the repo moved into the openstack/ namespace. - Jay Faulkner Ironic PTL TC Vice-Chair On Thu, Mar 2, 2023 at 3:38?AM Riccardo Pittau wrote: > Thanks for this anyway Jeremy! > Luckily one of the old maintainers added me to the virtualpdu-core and > virtualpdu-release groups, so now we can move forward with the repository > update and move easily. > > Ciao! > Riccardo > > On Thu, Mar 2, 2023 at 12:05?AM Jeremy Stanley wrote: > >> On 2023-03-01 20:53:45 +0000 (+0000), Jeremy Stanley wrote: >> [...] >> > We don't have a documented process since this is the first time it's >> > really come up, but I'm officially announcing that I intend to use >> > my administrative permissions as an OpenDev sysadmin to add >> > membership of an OpenStack Ironic team representative to the >> > following Gerrit groups if no objections are raised before >> > Wednesday, March 8 >> [...] >> >> It seems the additional round of outreach worked, so no longer >> requires my direct intervention: >> >> >> https://lists.opendev.org/archives/list/service-discuss at lists.opendev.org/thread/BOF56L5PPIP6CQZJ75LCPHVN7532ZKNJ/ >> >> -- >> Jeremy Stanley >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yasufum.o at gmail.com Fri Mar 3 00:45:11 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Fri, 3 Mar 2023 09:45:11 +0900 Subject: [tc][heat][tacker] Moving governance of tosca-parser(and heat-translator ?) to Tacker In-Reply-To: References: <1867ac70656.c5de609e1065667.3634775558652795921@ghanshyammann.com> <1869435593c.10a5026ca1424633.8160143839607463616@ghanshyammann.com> Message-ID: On 2023/03/02 11:55, Takashi Kajinami wrote: > Thanks. So based on the agreement in this thread I've pushed the change to > the governance repository > to migrate tosca-parser and heat-translator to Tacker's governance. > > https://review.opendev.org/c/openstack/governance/+/876012 > > I'll keep heat-core group in heat-translator-core group for now, but we can > revisit this in the future. Thanks for the update. 
> > > On Wed, Mar 1, 2023 at 6:41?PM Yasufumi Ogawa wrote: > >> On 2023/02/28 3:49, Ghanshyam Mann wrote: >>> ---- On Sun, 26 Feb 2023 19:54:45 -0800 Takashi Kajinami wrote --- >>> > >>> > >>> > On Mon, Feb 27, 2023 at 11:38?AM Yasufumi Ogawa yasufum.o at gmail.com> >> wrote: >>> > Hi, >>> > >>> > On 2023/02/27 10:51, Takashi Kajinami wrote: >>> > > On Thu, Feb 23, 2023 at 5:18?AM Ghanshyam Mann >> gmann at ghanshyammann.com> >>> > > wrote: >>> > > >>> > >> ---- On Sun, 19 Feb 2023 18:44:14 -0800 Takashi Kajinami >> wrote --- >>> > >> > Hello, >>> > >> > >>> > >> > Currently tosca-parser is part of heat's governance, but the >> core >>> > >> reviewers of this repositorydoes not contain any active heat >> cores while we >>> > >> see multiple Tacker cores in this group.Considering the fact the >> project is >>> > >> mainly maintained by Tacker cores, I'm wondering if we canmigrate >> this >>> > >> repository to Tacker's governance. Most of the current heat cores >> are not >>> > >> quitefamiliar with the codes in this repository, and if Tacker >> team is not >>> > >> interested in maintainingthis repository then I'd propose >> retiring this. >>> > As you mentioned, tacker still using tosca-parser and >> heat-translator. >>> > >>> > >> >>> > >> I think it makes sense and I remember its usage/maintenance by >> the Tacker >>> > >> team since starting. >>> > >> But let's wait for the Tacker team opinion and accordingly you >> can propose >>> > >> the governance patch. >>> > Although I've not joined to tacker team since starting, it might not >> be >>> > true because there was no cores of tosca-parser and heat-translator >> in >>> > tacker team. We've started to help maintenance the projects because >> no >>> > other active contributer. >>> > >>> > >> >>> > >> > >>> > >> > Similarly, we have heat-translator project which has both >> heat cores >>> > >> and tacker cores as itscore reviewers. IIUC this is tightly >> related to the >>> > >> work in tosca-parser, I'm wondering it makesmore sense to move >> this project >>> > >> to Tacker, because the requirement is mostly made fromTacker side >> rather >>> > >> than Heat side. >>> > >> >>> > >> I am not sure about this and from the name, it seems like more of >> a heat >>> > >> thing but it is not got beyond the Tosca template >>> > >> conversion. Are there no users of it outside of the Tacker >> service? or any >>> > >> request to support more template conversions than >>> > >> Tosca? >>> > >> >>> > > >>> > > Current hea-translator supports only the TOSCA template[1]. >>> > > The heat-translator project can be a generic template converter by >> its >>> > > nature but we haven't seen any interest >>> > > in implementing support for different template formats. >>> > > >>> > > [1] >>> > > >> https://github.com/openstack/heat-translator/blob/master/translator/osc/v1/translate.py#L49 >>> > > >>> > > >>> > > >>> > >> If no other user or use case then I think one option can be to >> merge it >>> > >> into Tosca-parser itself and retire heat-translator. >>> > >> >>> > >> Opinion? >>> > Hmm, as a core of tosca-parser, I'm not sure it's a good idea >> because it >>> > is just a parser TOSCA and independent from heat-translator. In >>> > addition, there is no experts of Heat or HOT in current tacker team >>> > actually, so it might be difficult to maintain heat-translator >> without >>> > any help from heat team. 
>>> > >>> > The hea-translator project was initially created to implement a >> translator from TOSCA parser to HOT[1].Later tosca-parser was split out[2] >> but we have never increased scope of tosca-parser. So it has beenno more >> than the TOSCA template translator. >>> > >>> > [1] >> https://blueprints.launchpad.net/heat/+spec/heat-translator-tosca[2] >> >> https://review.opendev.org/c/openstack/project-config/+/211204 >>> > We (Heat team) can provide help with any problems with heat, but we >> own no actual use case of template translation.Maintaining the >> heat-translator repository with tacker, which currently provides actual use >> cases would make more sense.This also gives the benefit that Tacker team >> can decide when stable branches of heat-translator should be retiredalong >> with the other Tacker repos. >>> > >>> > By the way, may I ask what will be happened if the governance is >> move on >>> > to tacker? Is there any extra tasks for maintenance? >>> > >>> > TC would have better (and more precise) explanation but my >> understanding is that - creating a release >>> > - maintaining stable branches >>> > - maintaining gate healthwould be the required tasks along with >> moderating dev discussion in mailing list/PTG/etc. >>> >>> I think you covered all and the Core team (Tacker members) might be >> already doing a few of the tasks. From the >>> governance perspective, tacker PTL will be the point of contact for this >> repo in the case repo becomes inactive or so >>> but it will be the project team's decision to merge/split things >> whatever way makes maintenance easy. >> I understand. I've shared the proposal again in the previous meeting and >> no objection raised. So, we'd agree to move the governance as Tacker team. >> >> Thanks, >> Yasufumi >>> >>> -gmann >>> >>> >>> > Thanks, >>> > Yasufumi >>> > >>> > >> >>> > > >>> > > That also sounds good to me. >>> > > >>> > > >>> > >> Also, correcting the email subject tag as [tc]. >>> > >> >>> > >> -gmann >>> > >> >>> > >> > >>> > >> > [1] >>> > >> >> https://review.opendev.org/admin/groups/1f7855baf3cf14fedf72e443eef18d844bcd43fa,members[2] >>> > >> >> https://review.opendev.org/admin/groups/66028971dcbb58add6f0e7c17ac72643c4826956,members >>> > >> > Thank you,Takashi >>> > >> > >>> > >> >>> > >> >>> > > >>> > >>> > >> >> > From rosmaita.fossdev at gmail.com Fri Mar 3 03:17:32 2023 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 2 Mar 2023 22:17:32 -0500 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> Message-ID: On 3/1/23 12:19 PM, Ghanshyam Mann wrote: > ---- On Wed, 01 Mar 2023 09:02:59 -0800 Brian Rosmaita wrote --- > > On 2/28/23 9:02 PM, Ghanshyam Mann wrote: > > [snip] > > > > > I think removing from client is good way to stop exposing this old/not-recommended way to users > > > but API is separate things and removing the API request parameter 'multiattach' from it can break > > > the existing users using it this way. Tempest test is one good example of such users use case. 
To maintain > > > the backward compatibility/interoperability it should be removed by bumping the microversion so that > > > it continue working for older microversions. This way we will not break the existing users and will > > > provide the new way for users to start using. > > > > It's not just that this is not recommended, it can lead to data loss. > > We should only allow multiattach for volume types that actually support > > it. So I see this as a case of "I broke your script now, but you'll > > thank me later". > > > > We could microversion this, but then an end user has to go out of the > > way and add the correct mv to their request to get the correct behavior. > > Someone using the default mv + multiattach=true will unknowingly put > > themselves into a data loss situation. I think it's better to break > > that person's API request. > > Ok, if multiattach=True in the request is always an unsuccessful case (or unknown successful sometimes) > then I think changing it without microversion bump makes sense. But if we know there is any success case > for xyz configuration/backend then I feel we should not break such success use case. Thanks, Ghanshyam. An end user is setting themselves up for data loss if they rely on the request parameter rather than on using a volume type that explicitly supports multiattach. They could get lucky and not lose any data, but that's not really a success, so I think the best thing to do here is make this breaking change without a microversion. > I was just thinking from the Tempest test perspective which was passing but as you corrected me in IRC, > the test does not check the data things so we do not completely test it in Tempest. It's good that Tempest is there to keep us honest! I think what we can do to help out people whose scripts break is to return a specific error message explaining that the 'multiattach' element is not allowed in a volume-create request and instead the user should select a multiattach-capable volume type. > > -gmann > > > > > > > cheers, > > brian > > > > > > From gmann at ghanshyammann.com Fri Mar 3 03:24:22 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 02 Mar 2023 19:24:22 -0800 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> Message-ID: <186a57fcf87.e774c5b5154313.6206570147952069935@ghanshyammann.com> ---- On Thu, 02 Mar 2023 19:17:32 -0800 Brian Rosmaita wrote --- > On 3/1/23 12:19 PM, Ghanshyam Mann wrote: > > ---- On Wed, 01 Mar 2023 09:02:59 -0800 Brian Rosmaita wrote --- > > > On 2/28/23 9:02 PM, Ghanshyam Mann wrote: > > > [snip] > > > > > > > I think removing from client is good way to stop exposing this old/not-recommended way to users > > > > but API is separate things and removing the API request parameter 'multiattach' from it can break > > > > the existing users using it this way. Tempest test is one good example of such users use case. To maintain > > > > the backward compatibility/interoperability it should be removed by bumping the microversion so that > > > > it continue working for older microversions. 
This way we will not break the existing users and will > > > > provide the new way for users to start using. > > > > > > It's not just that this is not recommended, it can lead to data loss. > > > We should only allow multiattach for volume types that actually support > > > it. So I see this as a case of "I broke your script now, but you'll > > > thank me later". > > > > > > We could microversion this, but then an end user has to go out of the > > > way and add the correct mv to their request to get the correct behavior. > > > Someone using the default mv + multiattach=true will unknowingly put > > > themselves into a data loss situation. I think it's better to break > > > that person's API request. > > > > Ok, if multiattach=True in the request is always an unsuccessful case (or unknown successful sometimes) > > then I think changing it without microversion bump makes sense. But if we know there is any success case > > for xyz configuration/backend then I feel we should not break such success use case. > > Thanks, Ghanshyam. An end user is setting themselves up for data loss > if they rely on the request parameter rather than on using a volume type > that explicitly supports multiattach. They could get lucky and not lose > any data, but that's not really a success, so I think the best thing to > do here is make this breaking change without a microversion. > > > I was just thinking from the Tempest test perspective which was passing but as you corrected me in IRC, > > the test does not check the data things so we do not completely test it in Tempest. > > It's good that Tempest is there to keep us honest! I think what we can > do to help out people whose scripts break is to return a specific error > message explaining that the 'multiattach' element is not allowed in a > volume-create request and instead the user should select a > multiattach-capable volume type. Thanks, Brian for explaining. This sounds good to me. Explaining the situation in release notes and error message will be really helpful for users. I am +2 on the tempest change now - https://review.opendev.org/c/openstack/tempest/+/875372 -gmann > > > > > -gmann > > > > > > > > > > > cheers, > > > brian > > > > > > > > > > > > From rdhasman at redhat.com Fri Mar 3 09:49:09 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Fri, 3 Mar 2023 15:19:09 +0530 Subject: [kolla] [train] [cinder] Volume multiattach exposed to non-admin users via API In-Reply-To: <186a57fcf87.e774c5b5154313.6206570147952069935@ghanshyammann.com> References: <1708281385.5319584.1677085955832.ref@mail.yahoo.com> <1708281385.5319584.1677085955832@mail.yahoo.com> <2009529524.2155590.1677101634600@mail.yahoo.com> <1869ae83b09.febbf56f1544728.2561236161356691953@ghanshyammann.com> <8b5099fa-1ad5-5dd3-5975-239ba8d4cd69@gmail.com> <1869e2f8e6d.1105d15a326258.870388982387601498@ghanshyammann.com> <186a57fcf87.e774c5b5154313.6206570147952069935@ghanshyammann.com> Message-ID: Thanks Brian and Ghanshyam for the discussion. I've updated the tempest patch[1] to update one test that I missed earlier and also the cinder patch[2] which now returns a BadRequest stating the reason for the error and how to fix it. 
$ curl -g -i -X POST http://127.0.0.1/volume/v3/d6634f35c00f409883ecb10361b556c3/volumes -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-cinderclient" -H "X-Auth-Token: gAAAAABkAbZkWgdbpXNgObizvGy8jS6LoMGuxzMnMaMOw6wm2j5i5KrG2xIzWCDxrSAiaMJWqneNpKrwn8P852mPOyJB_WmxrhrmKiuafcP0KSljyW44mFwDtGN74VL50NLoVC-srL63L3xduyeF5EIlPEyDsWRqPSRZZwau7wQrngAZ8XBP3M8" -d '{"volume": {"size": 1, "consistencygroup_id": null, "snapshot_id": null, "name": null, "description": null, "volume_type": null, "availability_zone": null, "metadata": {}, "imageRef": null, "source_volid": null, "backup_id": null, "multiattach": "true"}}' HTTP/1.1 400 Bad Request Date: Fri, 03 Mar 2023 09:04:38 GMT Server: Apache/2.4.41 (Ubuntu) OpenStack-API-Version: volume 3.0 Vary: OpenStack-API-Version Content-Length: 261 Content-Type: application/json x-compute-request-id: req-a9f9999e-01e3-4970-9c32-35de193c04c1 x-openstack-request-id: req-a9f9999e-01e3-4970-9c32-35de193c04c1 Connection: close {"badRequest": {"code": 400, "message": "multiattach parameter has been removed. The default behavior is to use multiattach enabled volume types. Contact your administrator to create a multiattach enabled volume type and use it to create multiattach volumes."}} [1] https://review.opendev.org/c/openstack/tempest/+/875372 [2] https://review.opendev.org/c/openstack/cinder/+/874865 On Fri, Mar 3, 2023 at 9:00?AM Ghanshyam Mann wrote: > ---- On Thu, 02 Mar 2023 19:17:32 -0800 Brian Rosmaita wrote --- > > On 3/1/23 12:19 PM, Ghanshyam Mann wrote: > > > ---- On Wed, 01 Mar 2023 09:02:59 -0800 Brian Rosmaita wrote --- > > > > On 2/28/23 9:02 PM, Ghanshyam Mann wrote: > > > > [snip] > > > > > > > > > I think removing from client is good way to stop exposing this > old/not-recommended way to users > > > > > but API is separate things and removing the API request > parameter 'multiattach' from it can break > > > > > the existing users using it this way. Tempest test is one good > example of such users use case. To maintain > > > > > the backward compatibility/interoperability it should be > removed by bumping the microversion so that > > > > > it continue working for older microversions. This way we will > not break the existing users and will > > > > > provide the new way for users to start using. > > > > > > > > It's not just that this is not recommended, it can lead to data > loss. > > > > We should only allow multiattach for volume types that actually > support > > > > it. So I see this as a case of "I broke your script now, but > you'll > > > > thank me later". > > > > > > > > We could microversion this, but then an end user has to go out of > the > > > > way and add the correct mv to their request to get the correct > behavior. > > > > Someone using the default mv + multiattach=true will > unknowingly put > > > > themselves into a data loss situation. I think it's better to > break > > > > that person's API request. > > > > > > Ok, if multiattach=True in the request is always an unsuccessful case > (or unknown successful sometimes) > > > then I think changing it without microversion bump makes sense. But > if we know there is any success case > > > for xyz configuration/backend then I feel we should not break such > success use case. > > > > Thanks, Ghanshyam. An end user is setting themselves up for data loss > > if they rely on the request parameter rather than on using a volume > type > > that explicitly supports multiattach. 
They could get lucky and not > lose > > any data, but that's not really a success, so I think the best thing to > > do here is make this breaking change without a microversion. > > > > > I was just thinking from the Tempest test perspective which was > passing but as you corrected me in IRC, > > > the test does not check the data things so we do not completely test > it in Tempest. > > > > It's good that Tempest is there to keep us honest! I think what we can > > do to help out people whose scripts break is to return a specific error > > message explaining that the 'multiattach' element is not allowed in a > > volume-create request and instead the user should select a > > multiattach-capable volume type. > > Thanks, Brian for explaining. This sounds good to me. Explaining the > situation in release notes and error message > will be really helpful for users. > > I am +2 on the tempest change now - > https://review.opendev.org/c/openstack/tempest/+/875372 > > -gmann > > > > > > > > > -gmann > > > > > > > > > > > > > > > cheers, > > > > brian > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Fri Mar 3 11:00:20 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 3 Mar 2023 12:00:20 +0100 Subject: [neutron] Drivers meeting cancelled Message-ID: Hello Neutrinos: Due to the lack of agenda [1], today's meeting is cancelled. Have a nice weekend. *PS: do not forget to add your topics to the PTG agenda [2]**. PTG is coming!* [1]https://wiki.openstack.org/wiki/Meetings/NeutronDrivers [2]https://etherpad.opendev.org/p/neutron-bobcat-ptg -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Fri Mar 3 11:04:52 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Fri, 3 Mar 2023 16:34:52 +0530 Subject: [cinder] proposing Jon Bernard for cinder core Message-ID: Hello everyone, I would like to propose Jon Bernard as cinder core. Looking at the review stats for the past 60[1], 90[2], 120[3] days, he has been consistently in the top 5 reviewers with a good +/- ratio and leaving helpful comments indicating good quality of reviews. He has been managing the stable branch releases for the past 2 cycles (Zed and 2023.1) and has helped in releasing security issues as well. Jon has been part of the cinder and OpenStack community for a long time and has shown very active interest in upstream activities, be it release liaison, review contribution, attending cinder meetings and also involving in outreachy activities. He will be a very good addition to our team helping out with the review bandwidth and adding valuable input in our discussions. I will leave this thread open for a week and if there are no objections, I will add Jon Bernard to the cinder core team. [1] https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 [2] https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 [3] https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 Thanks Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosmaita.fossdev at gmail.com Fri Mar 3 14:15:31 2023 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Fri, 3 Mar 2023 09:15:31 -0500 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: References: Message-ID: <6f0e5056-ecae-83cb-2389-7117bb253ab6@gmail.com> On 3/3/23 6:04 AM, Rajat Dhasmana wrote: > Hello everyone, > > I would like to propose Jon Bernard as cinder core. [snip]> I will leave this thread open for a week and if there are no objections, > I will add Jon Bernard to the cinder core team. No objections from me! Jon is a careful and knowledgeable reviewer and he will be a great addition to the cinder core team. cheers, brian From thierry at openstack.org Fri Mar 3 14:51:13 2023 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 3 Mar 2023 15:51:13 +0100 Subject: [release] Release countdown for week R-2, March 6-10 Message-ID: <2a831c77-bc39-539b-c6c0-e9e198c84e10@openstack.org> Development Focus ----------------- At this point we should have release candidates (RC1 or recent intermediary release) for almost all the deliverables. Teams should be working on any release-critical bugs that would require another RC or intermediary release before the final release. Actions ------- Early in the week, the release team will be proposing stable/2023.1 branch creation for all deliverables that have not branched yet, using the latest available 2023.1 Antelope release as the branch point. If your team is ready to go for creating that branch, please let us know by leaving a +1 on these patches. If you would like to wait for another release before branching, you can -1 the patch and update it later in the week with the new release you would like to use. By the end of the week the release team will merge those patches though, unless an exception is granted. Once stable/2023.1 branches are created, if a release-critical bug is detected, you will need to fix the issue in the master branch first, then backport the fix to the stable/2023.1 branch before releasing out of the stable/2023.1 branch. After all of the cycle-with-rc projects have branched we will branch devstack, grenade, and the requirements repos. This will effectively open them up for Bobcat development, though the focus should still be on finishing up Antelope until the final 2023.1 release. For projects with translations, watch for any translation patches coming through and merge them quickly. A new release should be produced so that translations are included in the final 2023.1 Antelope release. Finally, now is a good time to finalize release notes. In particular, consider adding any relevant "prelude" content. Release notes are targeted for the downstream consumers of your project, so it would be great to include any useful information for those that are going to pick up and use or deploy the 2023.1 Antelope version of your project. Upcoming Deadlines & Dates -------------------------- Final RC deadline: March 17 (end of R-1 week) Final 2023.1 Antelope release: March 22 Virtual PTG: March 27-31 -- Thierry Carrez (ttx) From dtantsur at redhat.com Fri Mar 3 15:31:22 2023 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Fri, 3 Mar 2023 16:31:22 +0100 Subject: [ironic] The future of x/ironic-staging-drivers Message-ID: Hi folks! I have been maintaining $subj together with Riccardo and a few occasional volunteers for many years. Now that our priorities have changed, it is not maintained any more. We haven't even created stable/zed, and I'm afraid to check the CI status. 
We're looking for volunteers to maintain the repository long-term. If none are found, the project will be deprecated and frozen in its current state. Please speak up if you care. Dmitry -- Red Hat GmbH , Registered seat: Werner von Siemens Ring 12, D-85630 Grasbrunn, Germany Commercial register: Amtsgericht Muenchen/Munich, HRB 153243,Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross -------------- next part -------------- An HTML attachment was scrubbed... URL: From ozzzo at yahoo.com Fri Mar 3 16:34:44 2023 From: ozzzo at yahoo.com (Albert Braden) Date: Fri, 3 Mar 2023 16:34:44 +0000 (UTC) Subject: Paying for Openstack support References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> Message-ID: <1739978420.3955026.1677861284070@mail.yahoo.com> I have a question for the operators here. Is anyone paying for Openstack support and getting good value for your money? Can you contact someone for help with an issue, and get a useful response in a reasonable time? If you have an emergency, can you get help quickly? If so, I would like to hear about your experience. Who are you getting good support from? Do they support your operating system too? If not, where do you get your OS support, and how good is it? If you work for a company that provides openstack and/or Linux support, you are welcome to send me a sales pitch, but my goal is to hear from operators. From vrook at wikimedia.org Fri Mar 3 18:09:24 2023 From: vrook at wikimedia.org (Vivian Rook) Date: Fri, 3 Mar 2023 13:09:24 -0500 Subject: [magnum] security groups for magnum nodes Message-ID: Is there an option for adding security groups to a given magnum template, and thus the nodes that such a template would create? I have an NFS server, and it is setup to only allow connections from nodes with the "nfs" security group. A few pods in my cluster mount the NFS server, and are blocked as a result. Is it possible to setup magnum so that it adds the "nfs" security group to the worker nodes (it would be alright if it has to be worker and control nodes)? Thank you! -- *Vivian Rook (They/Them)* Site Reliability Engineer Wikimedia Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri Mar 3 19:57:41 2023 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 3 Mar 2023 20:57:41 +0100 Subject: Paying for Openstack support In-Reply-To: <1739978420.3955026.1677861284070@mail.yahoo.com> References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: Hello, I think you must look at https://www.openstack.org/marketplace/distros/ for adopting a supported distro. Ignazio Il Ven 3 Mar 2023, 17:37 Albert Braden ha scritto: > I have a question for the operators here. Is anyone paying for Openstack > support and getting good value for your money? Can you contact someone for > help with an issue, and get a useful response in a reasonable time? If you > have an emergency, can you get help quickly? If so, I would like to hear > about your experience. > > Who are you getting good support from? Do they support your operating > system too? If not, where do you get your OS support, and how good is it? > > If you work for a company that provides openstack and/or Linux support, > you are welcome to send me a sales pitch, but my goal is to hear from > operators. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jimmy at openinfra.dev Fri Mar 3 20:02:47 2023 From: jimmy at openinfra.dev (Jimmy McArthur) Date: Fri, 3 Mar 2023 14:02:47 -0600 Subject: Paying for Openstack support In-Reply-To: References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: <2BE356FB-1EFA-43BE-BFBB-DFD05F82636D@openinfra.dev> You can also look at https://www.openstack.org/marketplace/consultants for organizations working with and without distros. > On Mar 3, 2023, at 1:57 PM, Ignazio Cassano wrote: > > Hello, I think you must look at > https://www.openstack.org/marketplace/distros/ for adopting a supported distro. > Ignazio > > Il Ven 3 Mar 2023, 17:37 Albert Braden > ha scritto: > I have a question for the operators here. Is anyone paying for Openstack support and getting good value for your money? Can you contact someone for help with an issue, and get a useful response in a reasonable time? If you have an emergency, can you get help quickly? If so, I would like to hear about your experience. > > Who are you getting good support from? Do they support your operating system too? If not, where do you get your OS support, and how good is it? > > If you work for a company that provides openstack and/or Linux support, you are welcome to send me a sales pitch, but my goal is to hear from operators. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.vanommen at gmail.com Fri Mar 3 20:03:49 2023 From: john.vanommen at gmail.com (John van Ommen) Date: Fri, 3 Mar 2023 12:03:49 -0800 Subject: Paying for Openstack support In-Reply-To: <1739978420.3955026.1677861284070@mail.yahoo.com> References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: I've worked on a number of RHOSP deployments that relied on Red Hat support, and the experience was positive. The first OpenStack project that I did, it depended on SwiftStack for support and they were great. But they were acquired by Nvidia. AFAIK, there aren't many companies that still provide OpenStack support in the United States. From what I understand, RackSpace has been pivoting towards doing AWS support. On Fri, Mar 3, 2023 at 8:36?AM Albert Braden wrote: > I have a question for the operators here. Is anyone paying for Openstack > support and getting good value for your money? Can you contact someone for > help with an issue, and get a useful response in a reasonable time? If you > have an emergency, can you get help quickly? If so, I would like to hear > about your experience. > > Who are you getting good support from? Do they support your operating > system too? If not, where do you get your OS support, and how good is it? > > If you work for a company that provides openstack and/or Linux support, > you are welcome to send me a sales pitch, but my goal is to hear from > operators. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jimmy at openinfra.dev Fri Mar 3 20:08:45 2023 From: jimmy at openinfra.dev (Jimmy McArthur) Date: Fri, 3 Mar 2023 14:08:45 -0600 Subject: Paying for Openstack support In-Reply-To: References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: <4EDD9DD7-A2A5-48B4-8152-41B58D0CDEB7@openinfra.dev> Just to note, there are plenty of organizations that provide support in the US: Virtuozzo, Vexxhost, Red Hat, Canonical, SharkTech, Mirantis, OpenMetal (for hosted private cloud), to name a few. > On Mar 3, 2023, at 2:03 PM, John van Ommen wrote: > > I've worked on a number of RHOSP deployments that relied on Red Hat support, and the experience was positive. > > The first OpenStack project that I did, it depended on SwiftStack for support and they were great. But they were acquired by Nvidia. > > AFAIK, there aren't many companies that still provide OpenStack support in the United States. From what I understand, RackSpace has been pivoting towards doing AWS support. > > On Fri, Mar 3, 2023 at 8:36?AM Albert Braden > wrote: > I have a question for the operators here. Is anyone paying for Openstack support and getting good value for your money? Can you contact someone for help with an issue, and get a useful response in a reasonable time? If you have an emergency, can you get help quickly? If so, I would like to hear about your experience. > > Who are you getting good support from? Do they support your operating system too? If not, where do you get your OS support, and how good is it? > > If you work for a company that provides openstack and/or Linux support, you are welcome to send me a sales pitch, but my goal is to hear from operators. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Sat Mar 4 18:19:36 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Sat, 4 Mar 2023 23:49:36 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> Message-ID: Hi, Can someone please help me out on this issue? With regards, Swogat Pradhan On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan wrote: > Hi > I don't see any major packet loss. > It seems the problem is somewhere in rabbitmq maybe but not due to packet > loss. > > with regards, > Swogat Pradhan > > On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan > wrote: > >> Hi, >> Yes the MTU is the same as the default '1500'. >> Generally I haven't seen any packet loss, but never checked when >> launching the instance. >> I will check that and come back. >> But everytime i launch an instance the instance gets stuck at spawning >> state and there the hypervisor becomes down, so not sure if packet loss >> causes this. >> >> With regards, >> Swogat pradhan >> >> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >> >>> One more thing coming to mind is MTU size. Are they identical between >>> central and edge site? Do you see packet loss through the tunnel? >>> >>> Zitat von Swogat Pradhan : >>> >>> > Hi Eugen, >>> > Request you to please add my email either on 'to' or 'cc' as i am not >>> > getting email's from you. >>> > Coming to the issue: >>> > >>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>> / >>> > Listing policies for vhost "/" ... 
>>> > vhost name pattern apply-to definition priority >>> > / ha-all ^(?!amq\.).* queues >>> > >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> > >>> > I have the edge site compute nodes up, it only goes down when i am >>> trying >>> > to launch an instance and the instance comes to a spawning state and >>> then >>> > gets stuck. >>> > >>> > I have a tunnel setup between the central and the edge sites. >>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> > wrote: >>> > >>> >> Hi Eugen, >>> >> For some reason i am not getting your email to me directly, i am >>> checking >>> >> the email digest and there i am able to find your reply. >>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >> Yes, these logs are from the time when the issue occurred. >>> >> >>> >> *Note: i am able to create vm's and perform other activities in the >>> >> central site, only facing this issue in the edge site.* >>> >> >>> >> With regards, >>> >> Swogat Pradhan >>> >> >>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >> wrote: >>> >> >>> >>> Hi Eugen, >>> >>> Thanks for your response. >>> >>> I have actually a 4 controller setup so here are the details: >>> >>> >>> >>> *PCS Status:* >>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-no-ceph-3 >>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-2 >>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-1 >>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-0 >>> >>> >>> >>> I have tried restarting the bundle multiple times but the issue is >>> still >>> >>> present. >>> >>> >>> >>> *Cluster status:* >>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>> >>> Cluster status of node >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>> >>> Basics >>> >>> >>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>> >>> >>> Disk Nodes >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> >>> >>> Running Nodes >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> >>> >>> Versions >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> RabbitMQ >>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>> >>> >>> Alarms >>> >>> >>> >>> (none) >>> >>> >>> >>> Network Partitions >>> >>> >>> >>> (none) >>> >>> >>> >>> Listeners >>> >>> >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>> inter-node and >>> >>> CLI tool communication >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> >>> >>> Feature flags >>> >>> >>> >>> Flag: 
drop_unroutable_metric, state: enabled >>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>> Flag: implicit_default_bindings, state: enabled >>> >>> Flag: quorum_queue, state: enabled >>> >>> Flag: virtual_host_metadata, state: enabled >>> >>> >>> >>> *Logs:* >>> >>> *(Attached)* >>> >>> >>> >>> With regards, >>> >>> Swogat Pradhan >>> >>> >>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>> wrote: >>> >>> >>> >>>> Hi, >>> >>>> Please find the nova conductor as well as nova api log. >>> >>>> >>> >>>> nova-conuctor: >>> >>>> >>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> b240e3e89d99489284cd731e75f2a5db >>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>> with >>> >>>> backend dogpile.cache.null. 
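The conductor warnings above all point at reply_* queues that no longer exist. A quick, read-only check from one of the controllers is to list those queues and their consumers, and to see which clients are actually connected (a sketch; the vhost "/" is taken from the policy listing shown earlier in the thread):

    # are the reply queues from the warnings present, and does anything consume them?
    rabbitmqctl list_queues -p / name consumers messages | grep '^reply_'
    # which hosts hold AMQP connections right now (the edge compute nodes should show up here)
    rabbitmqctl list_connections user peer_host state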
>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>> >>>>> Hi, >>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>> >>>>> launch vm's. >>> >>>>> When the VM is in spawning state the node goes down (openstack >>> compute >>> >>>>> service list), the node comes backup when i restart the nova >>> compute >>> >>>>> service but then the launch of the vm fails. >>> >>>>> >>> >>>>> nova-compute.log >>> >>>>> >>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>> >>>>> instance usage >>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>> to >>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>> name: >>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>> with >>> >>>>> backend dogpile.cache.null. 
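Since the symptom described above is the hypervisor dropping out of "openstack compute service list" only while an instance is spawning, it can help to watch both the service state and the instance task state from the central site during the boot (a sketch; the instance UUID is a placeholder):

    watch -n 5 'openstack compute service list --service nova-compute'
    watch -n 5 'openstack server show <instance-uuid> -c status -c OS-EXT-STS:task_state'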
>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>> >>>>> privsep helper: >>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> 'privsep-helper', >>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>> privsep >>> >>>>> daemon via rootwrap >>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> daemon starting >>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> process running with uid/gid: 0/0 >>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> daemon running as pid 2647 >>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> os_brick.initiator.connectors.nvmeof >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>> >>>>> execution error >>> >>>>> in _get_host_uuid: Unexpected error while running command. >>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>>> Exit code: 2 >>> >>>>> Stdout: '' >>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>> >>>>> Unexpected error while running command. >>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>>> >>> >>>>> Is there a way to solve this issue? >>> >>>>> >>> >>>>> >>> >>>>> With regards, >>> >>>>> >>> >>>>> Swogat Pradhan >>> >>>>> >>> >>>> >>> >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Sat Mar 4 20:47:45 2023 From: eblock at nde.ag (Eugen Block) Date: Sat, 04 Mar 2023 20:47:45 +0000 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> Message-ID: <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> Hi, I tried to help someone with a similar issue some time ago in this thread: https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. 
I didn't have to do that in a production system yet so be careful. I can think of two routes: - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. Regards, Eugen Zitat von Swogat Pradhan : > Hi, > Can someone please help me out on this issue? > > With regards, > Swogat Pradhan > > On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan > wrote: > >> Hi >> I don't see any major packet loss. >> It seems the problem is somewhere in rabbitmq maybe but not due to packet >> loss. >> >> with regards, >> Swogat Pradhan >> >> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >> wrote: >> >>> Hi, >>> Yes the MTU is the same as the default '1500'. >>> Generally I haven't seen any packet loss, but never checked when >>> launching the instance. >>> I will check that and come back. >>> But everytime i launch an instance the instance gets stuck at spawning >>> state and there the hypervisor becomes down, so not sure if packet loss >>> causes this. >>> >>> With regards, >>> Swogat pradhan >>> >>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>> >>>> One more thing coming to mind is MTU size. Are they identical between >>>> central and edge site? Do you see packet loss through the tunnel? >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> > Hi Eugen, >>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>> > getting email's from you. >>>> > Coming to the issue: >>>> > >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>> / >>>> > Listing policies for vhost "/" ... >>>> > vhost name pattern apply-to definition priority >>>> > / ha-all ^(?!amq\.).* queues >>>> > >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> > >>>> > I have the edge site compute nodes up, it only goes down when i am >>>> trying >>>> > to launch an instance and the instance comes to a spawning state and >>>> then >>>> > gets stuck. >>>> > >>>> > I have a tunnel setup between the central and the edge sites. >>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> > wrote: >>>> > >>>> >> Hi Eugen, >>>> >> For some reason i am not getting your email to me directly, i am >>>> checking >>>> >> the email digest and there i am able to find your reply. >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >> Yes, these logs are from the time when the issue occurred. >>>> >> >>>> >> *Note: i am able to create vm's and perform other activities in the >>>> >> central site, only facing this issue in the edge site.* >>>> >> >>>> >> With regards, >>>> >> Swogat Pradhan >>>> >> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >> wrote: >>>> >> >>>> >>> Hi Eugen, >>>> >>> Thanks for your response. 
>>>> >>> I have actually a 4 controller setup so here are the details: >>>> >>> >>>> >>> *PCS Status:* >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-no-ceph-3 >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-2 >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-1 >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-0 >>>> >>> >>>> >>> I have tried restarting the bundle multiple times but the issue is >>>> still >>>> >>> present. >>>> >>> >>>> >>> *Cluster status:* >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>> >>> Cluster status of node >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>> >>> Basics >>>> >>> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>> >>>> >>> Disk Nodes >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>> >>>> >>> Running Nodes >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>> >>>> >>> Versions >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>> >>>> >>> Alarms >>>> >>> >>>> >>> (none) >>>> >>> >>>> >>> Network Partitions >>>> >>> >>>> >>> (none) >>>> >>> >>>> >>> Listeners >>>> >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: 
>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and >>>> >>> CLI tool communication >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>> 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> >>>> >>> Feature flags >>>> >>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>> Flag: quorum_queue, state: enabled >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>> >>>> >>> *Logs:* >>>> >>> *(Attached)* >>>> >>> >>>> >>> With regards, >>>> >>> Swogat Pradhan >>>> >>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>> wrote: >>>> >>> >>>> >>>> Hi, >>>> >>>> Please find the nova conductor as well as nova api log. >>>> >>>> >>>> >>>> nova-conuctor: >>>> >>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> with >>>> >>>> backend dogpile.cache.null. >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>> >>>> >>>>> Hi, >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> >>>>> launch vm's. >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>> compute >>>> >>>>> service list), the node comes backup when i restart the nova >>>> compute >>>> >>>>> service but then the launch of the vm fails. >>>> >>>>> >>>> >>>>> nova-compute.log >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> >>>>> instance usage >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>> to >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
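Eugen asks above for debug logs of nova-conductor and nova-compute. On a TripleO/director deployment the usual way to get them is to flip debug in the generated config and restart the service container; the path and container name below are the common defaults and should be checked against the actual deployment (the file can also be edited by hand if crudini is not available):

    # on the DCN compute node
    sudo crudini --set /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf DEFAULT debug true
    sudo podman restart nova_compute
    # on the controllers, do the same for nova.conf under .../puppet-generated/nova/ and restart nova_conductor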
>>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>> name: >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> with >>>> >>>>> backend dogpile.cache.null. >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> >>>>> privsep helper: >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> 'privsep-helper', >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>> privsep >>>> >>>>> daemon via rootwrap >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> daemon starting >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> daemon running as pid 2647 >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> os_brick.initiator.connectors.nvmeof >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> >>>>> execution error >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>>> Exit code: 2 >>>> >>>>> Stdout: '' >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>>> Unexpected error while running command. 
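For the first of the two routes Eugen describes above (clearing out stale queues while RabbitMQ keeps running), a rough sketch would be to find the orphaned reply queues and delete them one by one. This is disruptive to in-flight RPC calls, so it is better rehearsed outside production first; delete_queue should be available as a rabbitmqctl subcommand on recent 3.8 releases, and if it is not, the management API already listening on port 15672 can do the same:

    # reply queues that currently have no consumer
    rabbitmqctl list_queues -p / name consumers | awk '$1 ~ /^reply_/ && $2 == 0 {print $1}'
    # delete a specific one (repeat per queue)
    rabbitmqctl delete_queue -p / <queue-name>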
>>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>> >>>>> >>>> >>>>> >>>> >>>>> With regards, >>>> >>>>> >>>> >>>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> From jvisser at redhat.com Fri Mar 3 13:57:39 2023 From: jvisser at redhat.com (John Visser) Date: Fri, 3 Mar 2023 08:57:39 -0500 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: References: Message-ID: That's great, I fully support Jon being a core member, and he's certainly hitting all the important aspects. jv On Fri, Mar 3, 2023 at 6:10?AM Rajat Dhasmana wrote: > Hello everyone, > > I would like to propose Jon Bernard as cinder core. Looking at the review > stats > for the past 60[1], 90[2], 120[3] days, he has been consistently in the > top 5 > reviewers with a good +/- ratio and leaving helpful comments indicating > good > quality of reviews. He has been managing the stable branch releases for the > past 2 cycles (Zed and 2023.1) and has helped in releasing security issues > as well. > > Jon has been part of the cinder and OpenStack community for a long time and > has shown very active interest in upstream activities, be it release > liaison, review > contribution, attending cinder meetings and also involving in outreachy > activities. > He will be a very good addition to our team helping out with the review > bandwidth > and adding valuable input in our discussions. > > I will leave this thread open for a week and if there are no objections, I > will add > Jon Bernard to the cinder core team. > > [1] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 > [2] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 > [3] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 > > Thanks > Rajat Dhasmana > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bshephar at redhat.com Sun Mar 5 10:32:11 2023 From: bshephar at redhat.com (Brendan Shephard) Date: Sun, 5 Mar 2023 20:32:11 +1000 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> Message-ID: <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Does your environment use different network interfaces for each of the networks? Or does it have a bond with everything on it? One issue I have seen before is that when launching instances, there is a lot of network traffic between nodes as the hypervisor needs to download the image from Glance. Along with various other services sending normal network traffic, it can be enough to cause issues if everything is running over a single 1Gbe interface. I have seen the same situation in fact when using a single active/backup bond on 1Gbe nics. It?s worth checking the network traffic while you try to spawn the instance to see if you?re dropping packets. 
In the situation I described, there were dropped packets which resulted in a loss of communication between nova_compute and RMQ, so the node appeared offline. You should also confirm that nova_compute is being disconnected in the nova_compute logs if you tail them on the Hypervisor while spawning the instance. In my case, changing from active/backup to LACP helped. So, based on that experience, from my perspective, it certainly sounds like some kind of network issue. Regards, Brendan Shephard Senior Software Engineer Red Hat Australia > On 5 Mar 2023, at 6:47 am, Eugen Block wrote: > > Hi, > > I tried to help someone with a similar issue some time ago in this thread: > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > > But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. > If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. I didn't have to do that in a production system yet so be careful. I can think of two routes: > > - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. > - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. > > I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. > > Regards, > Eugen > > Zitat von Swogat Pradhan : > >> Hi, >> Can someone please help me out on this issue? >> >> With regards, >> Swogat Pradhan >> >> On Thu, Mar 2, 2023 at 1:24 PM Swogat Pradhan >> wrote: >> >>> Hi >>> I don't see any major packet loss. >>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>> loss. >>> >>> with regards, >>> Swogat Pradhan >>> >>> On Wed, Mar 1, 2023 at 3:34 PM Swogat Pradhan >>> wrote: >>> >>>> Hi, >>>> Yes the MTU is the same as the default '1500'. >>>> Generally I haven't seen any packet loss, but never checked when >>>> launching the instance. >>>> I will check that and come back. >>>> But everytime i launch an instance the instance gets stuck at spawning >>>> state and there the hypervisor becomes down, so not sure if packet loss >>>> causes this. >>>> >>>> With regards, >>>> Swogat pradhan >>>> >>>> On Wed, Mar 1, 2023 at 3:30 PM Eugen Block wrote: >>>> >>>>> One more thing coming to mind is MTU size. Are they identical between >>>>> central and edge site? Do you see packet loss through the tunnel? >>>>> >>>>> Zitat von Swogat Pradhan : >>>>> >>>>> > Hi Eugen, >>>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>>> > getting email's from you. >>>>> > Coming to the issue: >>>>> > >>>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>>> / >>>>> > Listing policies for vhost "/" ...
>>>>> > vhost name pattern apply-to definition priority >>>>> > / ha-all ^(?!amq\.).* queues >>>>> > >>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>> > >>>>> > I have the edge site compute nodes up, it only goes down when i am >>>>> trying >>>>> > to launch an instance and the instance comes to a spawning state and >>>>> then >>>>> > gets stuck. >>>>> > >>>>> > I have a tunnel setup between the central and the edge sites. >>>>> > >>>>> > With regards, >>>>> > Swogat Pradhan >>>>> > >>>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> > wrote: >>>>> > >>>>> >> Hi Eugen, >>>>> >> For some reason i am not getting your email to me directly, i am >>>>> checking >>>>> >> the email digest and there i am able to find your reply. >>>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>>> >> Yes, these logs are from the time when the issue occurred. >>>>> >> >>>>> >> *Note: i am able to create vm's and perform other activities in the >>>>> >> central site, only facing this issue in the edge site.* >>>>> >> >>>>> >> With regards, >>>>> >> Swogat Pradhan >>>>> >> >>>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> >> wrote: >>>>> >> >>>>> >>> Hi Eugen, >>>>> >>> Thanks for your response. >>>>> >>> I have actually a 4 controller setup so here are the details: >>>>> >>> >>>>> >>> *PCS Status:* >>>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>>> Started >>>>> >>> overcloud-controller-no-ceph-3 >>>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>>> Started >>>>> >>> overcloud-controller-2 >>>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>>> Started >>>>> >>> overcloud-controller-1 >>>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>>> Started >>>>> >>> overcloud-controller-0 >>>>> >>> >>>>> >>> I have tried restarting the bundle multiple times but the issue is >>>>> still >>>>> >>> present. >>>>> >>> >>>>> >>> *Cluster status:* >>>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>>> >>> Cluster status of node >>>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>> >>> Basics >>>>> >>> >>>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>> >>> >>>>> >>> Disk Nodes >>>>> >>> >>>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>> >>>>> >>> Running Nodes >>>>> >>> >>>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>> >>>>> >>> Versions >>>>> >>> >>>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>> 3.8.3 >>>>> >>> on Erlang 22.3.4.1 >>>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>> 3.8.3 >>>>> >>> on Erlang 22.3.4.1 >>>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>> 3.8.3 >>>>> >>> on Erlang 22.3.4.1 >>>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>> RabbitMQ >>>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>> >>> >>>>> >>> Alarms >>>>> >>> >>>>> >>> (none) >>>>> >>> >>>>> >>> Network Partitions >>>>> >>> >>>>> >>> (none) >>>>> >>> >>>>> >>> Listeners >>>>> >>> >>>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> tool >>>>> >>> communication >>>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>> and AMQP 1.0 >>>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> tool >>>>> >>> communication >>>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>> and AMQP 1.0 >>>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> tool >>>>> >>> communication >>>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>> and AMQP 1.0 >>>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> interface: >>>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> , >>>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>>> inter-node and >>>>> >>> CLI tool communication >>>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> , >>>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>>> 0-9-1 >>>>> >>> and AMQP 1.0 >>>>> >>> Node: rabbit at 
overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> , >>>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>> >>>>> >>> Feature flags >>>>> >>> >>>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>> >>> Flag: implicit_default_bindings, state: enabled >>>>> >>> Flag: quorum_queue, state: enabled >>>>> >>> Flag: virtual_host_metadata, state: enabled >>>>> >>> >>>>> >>> *Logs:* >>>>> >>> *(Attached)* >>>>> >>> >>>>> >>> With regards, >>>>> >>> Swogat Pradhan >>>>> >>> >>>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> >>> wrote: >>>>> >>> >>>>> >>>> Hi, >>>>> >>>> Please find the nova conductor as well as nova api log. >>>>> >>>> >>>>> >>>> nova-conuctor: >>>>> >>>> >>>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>>> due to a >>>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>>>> Abandoning...: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>>> due to a >>>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> Abandoning...: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>>> due to a >>>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>>>> Abandoning...: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>>> with >>>>> >>>> backend dogpile.cache.null. >>>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>>> due to a >>>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> Abandoning...: >>>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>>> >>>> With regards, >>>>> >>>> Swogat Pradhan >>>>> >>>> >>>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>> >>>>> >>>>> Hi, >>>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>>> >>>>> launch vm's. >>>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>>> compute >>>>> >>>>> service list), the node comes backup when i restart the nova >>>>> compute >>>>> >>>>> service but then the launch of the vm fails. >>>>> >>>>> >>>>> >>>>> nova-compute.log >>>>> >>>>> >>>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>>> >>>>> instance usage >>>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>>> to >>>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>>> >>>>> dcn01-hci-0.bdxworld.com >>>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>>> name: >>>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>>> with >>>>> >>>>> backend dogpile.cache.null. 
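To confirm the disconnects Brendan mentions above, the compute log can be followed on the edge node while the instance is booting; AMQP connection problems show up as oslo.messaging errors (the log path below is the standard TripleO location, adjust if it differs):

    sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'amqp|unreachable|heartbeat|timed out'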
>>>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>>> >>>>> privsep helper: >>>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>> 'privsep-helper', >>>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>>> privsep >>>>> >>>>> daemon via rootwrap >>>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>>> daemon starting >>>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>>> process running with uid/gid: 0/0 >>>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>>> process running with capabilities (eff/prm/inh): >>>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>>> daemon running as pid 2647 >>>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>> os_brick.initiator.connectors.nvmeof >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>>> >>>>> execution error >>>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>>> >>>>> Command: blkid overlay -s UUID -o value >>>>> >>>>> Exit code: 2 >>>>> >>>>> Stdout: '' >>>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>>> >>>>> Unexpected error while running command. >>>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>> >>>>> >>>>> >>>>> Is there a way to solve this issue? >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> With regards, >>>>> >>>>> >>>>> >>>>> Swogat Pradhan >>>>> >>>>> >>>>> >>>> >>>>> >>>>> >>>>> >>>>> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Sun Mar 5 11:00:26 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Sun, 5 Mar 2023 16:30:26 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Brendan, Thank you for your response. The edge1 site was just for testing so i used active-backup on 1gbe bonded interface. We are in the process of adding another edge site where we are using 2 linux bond vlan templates. I will test and try launching vm in the 2nd edge site and confirm if I am facing the same issue or no issue at all. 
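A few quick checks follow from the bonding and MTU discussion above: with an active-backup bond on 1GbE it is worth recording the bond state, the drop counters and the effective path MTU through the tunnel before the next spawn test (interface name and target address are placeholders):

    cat /proc/net/bonding/bond0          # bond mode and currently active slave
    ip -s link show bond0                # RX/TX errors and drops, compare before and after a spawn attempt
    ping -M do -s 1472 -c 5 <central-internal-api-ip>   # 1472 bytes + 28 bytes of headers = 1500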
With regards, Swogat Pradhan On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard wrote: > Does your environment use different network interfaces for each of the > networks? Or does it have a bond with everything on it? > > One issue I have seen before is that when launching instances, there is a > lot of network traffic between nodes as the hypervisor needs to download > the image from Glance. Along with various other services sending normal > network traffic, it can be enough to cause issues if everything is running > over a single 1Gbe interface. > > I have seen the same situation in fact when using a single active/backup > bond on 1Gbe nics. It?s worth checking the network traffic while you try to > spawn the instance to see if you?re dropping packets. In the situation I > described, there were dropped packets which resulted in a loss of > communication between nova_compute and RMQ, so the node appeared offline. > You should also confirm that nova_compute is being disconnected in the > nova_compute logs if you tail them on the Hypervisor while spawning the > instance. > > In my case, changing from active/backup to LACP helped. So, based on that > experience, from my perspective, is certainly sounds like some kind of > network issue. > > Regards, > > Brendan Shephard > Senior Software Engineer > Red Hat Australia > > > > On 5 Mar 2023, at 6:47 am, Eugen Block wrote: > > Hi, > > I tried to help someone with a similar issue some time ago in this thread: > > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > > But apparently a neutron reinstallation fixed it for that user, not sure > if that could apply here. But is it possible that your nova and neutron > versions are different between central and edge site? Have you restarted > nova and neutron services on the compute nodes after installation? Have you > debug logs of nova-conductor and maybe nova-compute? Maybe they can help > narrow down the issue. > If there isn't any additional information in the debug logs I probably > would start "tearing down" rabbitmq. I didn't have to do that in a > production system yet so be careful. I can think of two routes: > > - Either remove queues, exchanges etc. while rabbit is running, this will > most likely impact client IO depending on your load. Check out the > rabbitmqctl commands. > - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes > and restart rabbitmq so the exchanges, queues etc. rebuild. > > I can imagine that the failed reply "survives" while being replicated > across the rabbit nodes. But I don't really know the rabbit internals too > well, so maybe someone else can chime in here and give a better advice. > > Regards, > Eugen > > Zitat von Swogat Pradhan : > > Hi, > Can someone please help me out on this issue? > > With regards, > Swogat Pradhan > > On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan > wrote: > > Hi > I don't see any major packet loss. > It seems the problem is somewhere in rabbitmq maybe but not due to packet > loss. > > with regards, > Swogat Pradhan > > On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan > wrote: > > Hi, > Yes the MTU is the same as the default '1500'. > Generally I haven't seen any packet loss, but never checked when > launching the instance. > I will check that and come back. > But everytime i launch an instance the instance gets stuck at spawning > state and there the hypervisor becomes down, so not sure if packet loss > causes this. 
> > With regards, > Swogat pradhan > > On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > > One more thing coming to mind is MTU size. Are they identical between > central and edge site? Do you see packet loss through the tunnel? > > Zitat von Swogat Pradhan : > > > Hi Eugen, > > Request you to please add my email either on 'to' or 'cc' as i am not > > getting email's from you. > > Coming to the issue: > > > > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p > / > > Listing policies for vhost "/" ... > > vhost name pattern apply-to definition priority > > / ha-all ^(?!amq\.).* queues > > > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > > > > I have the edge site compute nodes up, it only goes down when i am > trying > > to launch an instance and the instance comes to a spawning state and > then > > gets stuck. > > > > I have a tunnel setup between the central and the edge sites. > > > > With regards, > > Swogat Pradhan > > > > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > > wrote: > > > >> Hi Eugen, > >> For some reason i am not getting your email to me directly, i am > checking > >> the email digest and there i am able to find your reply. > >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >> Yes, these logs are from the time when the issue occurred. > >> > >> *Note: i am able to create vm's and perform other activities in the > >> central site, only facing this issue in the edge site.* > >> > >> With regards, > >> Swogat Pradhan > >> > >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >> wrote: > >> > >>> Hi Eugen, > >>> Thanks for your response. > >>> I have actually a 4 controller setup so here are the details: > >>> > >>> *PCS Status:* > >>> * Container bundle set: rabbitmq-bundle [ > >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-no-ceph-3 > >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-2 > >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-1 > >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-0 > >>> > >>> I have tried restarting the bundle multiple times but the issue is > still > >>> present. > >>> > >>> *Cluster status:* > >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>> Cluster status of node > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
> >>> Basics > >>> > >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>> > >>> Disk Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Running Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Versions > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > RabbitMQ > >>> 3.8.3 on Erlang 22.3.4.1 > >>> > >>> Alarms > >>> > >>> (none) > >>> > >>> Network Partitions > >>> > >>> (none) > >>> > >>> Listeners > >>> > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: [::], port: 25672, protocol: clustering, purpose: > inter-node and > >>> CLI tool communication > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API > >>> > >>> Feature flags > >>> > >>> Flag: drop_unroutable_metric, state: enabled > >>> Flag: empty_basic_get_metric, state: enabled > >>> Flag: implicit_default_bindings, state: enabled > >>> Flag: quorum_queue, state: enabled > >>> Flag: 
virtual_host_metadata, state: enabled > >>> > >>> *Logs:* > >>> *(Attached)* > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>> wrote: > >>> > >>>> Hi, > >>>> Please find the nova conductor as well as nova api log. > >>>> > >>>> nova-conuctor: > >>>> > >>>> 2023-02-26 08:45:01.108 31 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 16152921c1eb45c2b1f562087140168b > >>>> 2023-02-26 08:45:02.144 26 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> 2023-02-26 08:45:02.314 32 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply > >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds > due to a > >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.282 35 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.303 33 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > b240e3e89d99489284cd731e75f2a5db > >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>> backend dogpile.cache.null. 
> >>>> 2023-02-26 08:50:01.264 27 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> wrote: > >>>> > >>>>> Hi, > >>>>> I currently have 3 compute nodes on edge site1 where i am trying to > >>>>> launch vm's. > >>>>> When the VM is in spawning state the node goes down (openstack > compute > >>>>> service list), the node comes backup when i restart the nova > compute > >>>>> service but then the launch of the vm fails. > >>>>> > >>>>> nova-compute.log > >>>>> > >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running > >>>>> instance usage > >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 > to > >>>>> 2023-02-26 08:00:00. 0 instances. > >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node > >>>>> dcn01-hci-0.bdxworld.com > >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device > name: > >>>>> /dev/vda. Libvirt can't honour user-supplied dev names > >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>>> backend dogpile.cache.null. 
> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running > >>>>> privsep helper: > >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > 'privsep-helper', > >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new > privsep > >>>>> daemon via rootwrap > >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon starting > >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with uid/gid: 0/0 > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with capabilities (eff/prm/inh): > >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon running as pid 2647 > >>>>> 2023-02-26 08:49:55.956 7 WARNING > os_brick.initiator.connectors.nvmeof > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process > >>>>> execution error > >>>>> in _get_host_uuid: Unexpected error while running command. > >>>>> Command: blkid overlay -s UUID -o value > >>>>> Exit code: 2 > >>>>> Stdout: '' > >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > >>>>> Unexpected error while running command. > >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>>> > >>>>> Is there a way to solve this issue? > >>>>> > >>>>> > >>>>> With regards, > >>>>> > >>>>> Swogat Pradhan > >>>>> > >>>> > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ddorra at t-online.de Sun Mar 5 13:02:51 2023 From: ddorra at t-online.de (ddorra at t-online.de) Date: Sun, 5 Mar 2023 14:02:51 +0100 (CET) Subject: [Openstack Trove] Fails to create DB container - "context deadline exceeded", even though images can be pulled manually Message-ID: <1678021371655.583885.3e98060dd2330376c64446c87304aa08d5d3462b@spica.telekom.de> Hi, my Trove service installed into a Openstack Victoria fails in starting DB instances: 2023-03-05 12:06:17.172 1015 INFO trove.guestagent.datastore.mysql_common.service [-] Starting docker container, image: mysql:5.7.29 2023-03-05 12:06:17.174 1015 WARNING trove.guestagent.utils.docker [-] Failed to get container database: docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service [-] Failed to start mysql: docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") It sounds that it's unable to connect to Docker in order to download the image. 
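The "context deadline exceeded" text is the daemon-side timeout while dockerd talks to the registry, so what matters is whether the default route and DNS were already in place at the moment the guest agent asked dockerd to pull mysql:5.7.29; a pull that works later in the instance's life doesn't prove they were there on the first attempt. A quick probe from inside the guest, assuming iproute2 and curl are present and dockerd runs under systemd as the usual docker unit, would be something like:

ip route show default                      # is the corrected gateway installed yet?
getent hosts registry-1.docker.io          # does name resolution work?
curl -m 15 -sS https://registry-1.docker.io/v2/ ; echo "curl exit=$?"   # a 401 response still means the registry is reachable
docker pull mysql:5.7.29                   # retry the exact image the agent needs
journalctl -u docker --since "15 min ago"  # dockerd's own view of the failed pull

If those only start succeeding some time after boot, the interesting question becomes what installs the route and DNS, and whether that happens before or after the guest agent's first pull attempt.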
However, after logging in to the DB instance I am able to pull docker images manually, or even a cloudinit action can successfully pull images during instance creation... root at t1:~# docker images REPOSITORY TAG IMAGE ID CREATED SIZE mysql 5.7.29 5d9483f9a7b2 2 years ago 455MB root at t1:~# The cloudinit I need to adjust the default GW in oder to reach internet. I hope that is early enough in the boot process and doesn't hamper the guestagent. Any hint what I can do? ============= Trove cloudinit root at voscontrol:~# cat /etc/trove/cloudinit/mysql.cloudinit #cloud-config write_files: - content: | ... path: /etc/ssh/sshd_config - content: | ... path: /root/.bash_aliases runcmd: - [ ip, route, delete, default, via, 10.0.0.79 ] - [ ip, route, add, default, via, 10.0.0.62 ] - [ service, ssh, restart ] - [ docker, pull, "mysql:5.7.29" ] root at voscontrol:~# ================ guest config root at t1:/etc/trove# cat ./conf.d/guest_info.conf [DEFAULT] guest_id=291070bc-671d-437a-a68c-9cee840e614c datastore_manager=mysql datastore_version=5.7.29 tenant_id=606b291caeab4dc2bb1072e2e43b082a instance_rpc_encr_key=EEKenGHSe4exOOU8a0ivV2aR8FwOr2yw root at t1:/etc/trove# cat ./conf.d/trove-guestagent.conf [DEFAULT] log_file = trove-guestagent.log log_dir = /var/log/trove/ ignore_users = os_admin control_exchange = trove #transport_url = rabbit://openstack:pass123 at 192.168.100.79:5672/ transport_url = rabbit://openstack:pass123 at 10.0.0.79:5672/ rpc_backend = rabbit command_process_timeout = 60 use_syslog = False debug = True [service_credentials] #auth_url = http://192.168.100.79:5000/v3 auth_url = http://10.0.0.79:5000/v3 region_name = RegionOne project_name = service password = pass123 project_domain_name = Default user_domain_name = Default username = trove root at t1:/etc/trove# ================ /var/log/trove/trove-guestagent.log ... 
2023-03-05 12:06:17.162 1015 DEBUG oslo_concurrency.processutils [-] CMD "sudo mkdir -p /var/run/mysqld" returned: 0 in 0.009s execute /opt/guest-agent-venv/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 2023-03-05 12:06:17.162 1015 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo chown -R 1001:1001 /var/run/mysqld execute /opt/guest-agent-venv/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 2023-03-05 12:06:17.171 1015 DEBUG oslo_concurrency.processutils [-] CMD "sudo chown -R 1001:1001 /var/run/mysqld" returned: 0 in 0.009s execute /opt/guest-agent-venv/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 2023-03-05 12:06:17.172 1015 INFO trove.guestagent.datastore.mysql_common.service [-] Starting docker container, image: mysql:5.7.29 2023-03-05 12:06:17.174 1015 WARNING trove.guestagent.utils.docker [-] Failed to get container database: docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service [-] Failed to start mysql: docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service response.raise_for_status() 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/database/json 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 58, in start_container 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service container = client.containers.get(name) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 887, in get 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service resp = self.client.api.inspect_container(container_id) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 
return f(self, resource_id, *args, **kwargs) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 771, in inspect_container 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self._get(self._url("/containers/{0}/json", container)), True 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self._raise_for_status(response) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service response.raise_for_status() 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/create?name=database 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 810, in run 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service detach=detach, **kwargs) 2023-03-05 12:06:32.540 1015 
ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 868, in create 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service resp = self.client.api.create_container(**create_kwargs) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 430, in create_container 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service return self.create_container_from_config(config, name) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 441, in create_container_from_config 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service return self._result(res, True) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self._raise_for_status(response) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service docker.errors.ImageNotFound: 404 Client Error: Not Found ("No such image: mysql:5.7.29") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service response.raise_for_status() 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.41/images/create?tag=5.7.29&fromImage=mysql 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 
During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service Traceback (most recent call last): 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 612, in start_db 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service command=command 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 73, in start_container 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service command=command 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 812, in run 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self.client.images.pull(image, platform=platform) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/images.py", line 445, in pull 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service repository, tag=tag, stream=True, **kwargs 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/image.py", line 415, in pull 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service self._raise_for_status(response) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") 2023-03-05 12:06:32.540 1015 ERROR trove.guestagent.datastore.mysql_common.service 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager [-] Failed to prepare datastore: Failed to start mysql: trove.common.exception.TroveError: Failed to start mysql 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager response.raise_for_status() 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.546 1015 ERROR 
trove.guestagent.datastore.manager raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/database/json 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 58, in start_container 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager container = client.containers.get(name) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 887, in get 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager resp = self.client.api.inspect_container(container_id) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager return f(self, resource_id, *args, **kwargs) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 771, in inspect_container 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self._get(self._url("/containers/{0}/json", container)), True 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self._raise_for_status(response) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 
response.raise_for_status() 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/create?name=database 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 810, in run 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager detach=detach, **kwargs) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 868, in create 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager resp = self.client.api.create_container(**create_kwargs) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 430, in create_container 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager return self.create_container_from_config(config, name) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 441, in create_container_from_config 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager return self._result(res, True) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self._raise_for_status(response) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager docker.errors.ImageNotFound: 404 Client Error: Not Found ("No such image: mysql:5.7.29") 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR 
trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager response.raise_for_status() 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.41/images/create?tag=5.7.29&fromImage=mysql 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 612, in start_db 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager command=command 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 73, in start_container 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager command=command 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 812, in run 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self.client.images.pull(image, platform=platform) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/images.py", line 445, in pull 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager repository, tag=tag, stream=True, **kwargs 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/image.py", line 415, in pull 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self._raise_for_status(response) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager During handling of the above exception, 
another exception occurred: 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager Traceback (most recent call last): 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/manager.py", line 223, in _prepare 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager cluster_config, snapshot, ds_version=ds_version) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager result = f(*args, **kwargs) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/manager.py", line 96, in do_prepare 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager self.app.start_db(ds_version=ds_version, command=command) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 620, in start_db 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager raise exception.TroveError(_("Failed to start mysql")) 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager trove.common.exception.TroveError: Failed to start mysql 2023-03-05 12:06:32.546 1015 ERROR trove.guestagent.datastore.manager 2023-03-05 12:06:32.549 1015 INFO trove.guestagent.datastore.manager [-] Ending datastore prepare for 'mysql'. 2023-03-05 12:06:32.550 1015 INFO trove.guestagent.datastore.service [-] Set final status to failed to spawn. 2023-03-05 12:06:32.550 1015 DEBUG trove.guestagent.datastore.service [-] Casting set_status message to conductor (status is 'failed to spawn'). set_status /opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/service.py:144 2023-03-05 12:06:32.550 1015 DEBUG trove.conductor.api [-] Making async call to cast heartbeat for instance: 291070bc-671d-437a-a68c-9cee840e614c heartbeat /opt/guest-agent-venv/lib/python3.6/site-packages/trove/conductor/api.py:73 2023-03-05 12:06:32.553 1015 DEBUG trove.guestagent.datastore.service [-] Successfully cast set_status. 
set_status /opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/service.py:151 2023-03-05 12:06:32.553 1015 DEBUG trove.conductor.api [-] Making async call to cast error notification notify_exc_info /opt/guest-agent-venv/lib/python3.6/site-packages/trove/conductor/api.py:115 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server [-] Exception during message handling: trove.common.exception.TroveError: Failed to start mysql 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server response.raise_for_status() 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/database/json 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 58, in start_container 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server container = client.containers.get(name) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 887, in get 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server resp = self.client.api.inspect_container(container_id) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server return f(self, resource_id, *args, **kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 771, in inspect_container 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self._get(self._url("/containers/{0}/json", container)), True 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self._raise_for_status(response) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise cls(e, 
response=response, explanation=explanation) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server docker.errors.NotFound: 404 Client Error: Not Found ("No such container: database") 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server response.raise_for_status() 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.41/containers/create?name=database 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 810, in run 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server detach=detach, **kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 868, in create 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server resp = self.client.api.create_container(**create_kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 430, in create_container 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server return self.create_container_from_config(config, name) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/container.py", line 441, in create_container_from_config 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server return self._result(res, True) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 265, in _result 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self._raise_for_status(response) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise cls(e, response=response, explanation=explanation) 
2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server docker.errors.ImageNotFound: 404 Client Error: Not Found ("No such image: mysql:5.7.29") 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 259, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server response.raise_for_status() 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise HTTPError(http_error_msg, response=self) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.41/images/create?tag=5.7.29&fromImage=mysql 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 612, in start_db 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server command=command 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/utils/docker.py", line 73, in start_container 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server command=command 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/containers.py", line 812, in run 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self.client.images.pull(image, platform=platform) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/models/images.py", line 445, in pull 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server repository, tag=tag, stream=True, **kwargs 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/image.py", line 415, in pull 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self._raise_for_status(response) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise create_api_error_from_http_exception(e) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise cls(e, response=response, explanation=explanation) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 
docker.errors.APIError: 500 Server Error: Internal Server Error ("Get "https://registry-1.docker.io/v2/": context deadline exceeded") 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred: 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server Traceback (most recent call last): 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/manager.py", line 207, in prepare 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server ds_version=ds_version) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/manager.py", line 223, in _prepare 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server cluster_config, snapshot, ds_version=ds_version) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/manager.py", line 96, in do_prepare 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server self.app.start_db(ds_version=ds_version, command=command) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server File "/opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/mysql_common/service.py", line 620, in start_db 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server raise exception.TroveError(_("Failed to start mysql")) 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server trove.common.exception.TroveError: Failed to start mysql 2023-03-05 12:06:32.557 1015 ERROR oslo_messaging.rpc.server 2023-03-05 12:06:58.303 1015 DEBUG trove.guestagent.datastore.manager [-] Getting file system stats for '/var/lib/mysql' get_filesystem_stats /opt/guest-agent-venv/lib/python3.6/site-packages/trove/guestagent/datastore/manager.py:368 2023-03-05 12:07:16.911 1015 DEBUG oslo_service.periodic_task [-] Running periodic task Manager.update_status run_periodic_tasks 
/opt/guest-agent-venv/lib/python3.6/site-packages/oslo_service/periodic_task.py:211 2023-03-05 12:07:16.911 1015 INFO trove.guestagent.datastore.manager [-] Database service is not installed, skip status check 2023-03-05 12:08:16.918 1015 DEBUG oslo_service.periodic_task [-] Running periodic task Manager.update_status run_periodic_tasks /opt/guest-agent-venv/lib/python3.6/site-packages/oslo_service/periodic_task.py:211 2023-03-05 12:08:16.919 1015 INFO trove.guestagent.datastore.manager [-] Database service is not installed, skip status check root at t1:/var/log/trove# ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Mon Mar 6 09:05:28 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Mon, 6 Mar 2023 10:05:28 +0100 Subject: [kolla] Weekly meeting on 8th March cancelled Message-ID: <4FB2F7F9-9368-4352-ADFA-C34718807B3C@gmail.com> Hola Koalas, Weekly meeting on Wed this week is cancelled - I?m off on vacation. See you next week! Best regards, Michal From thierry at openstack.org Mon Mar 6 09:08:05 2023 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 6 Mar 2023 10:08:05 +0100 Subject: [largescale-sig] Next meeting: March 8, 9utc Message-ID: <17105066-2c6a-d4ff-bbde-15fcd33edbad@openstack.org> Hi everyone, The Large Scale SIG will be meeting this Wednesday in #openstack-operators on OFTC IRC, at 9UTC, our APAC+EU-friendly time. You can doublecheck how that UTC time translates locally at: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20230308T09 Feel free to add topics to the agenda: https://etherpad.opendev.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez From lokendrarathour at gmail.com Mon Mar 6 09:34:23 2023 From: lokendrarathour at gmail.com (Lokendra Rathour) Date: Mon, 6 Mar 2023 15:04:23 +0530 Subject: [TripleO][Wallaby][openStack] - Creating Alma based Baremetal Instance Message-ID: Hi Team, we have a PoC where we wish to try Creating OpenStack Baremetal Instance using Alma Linux. Please help in case that is possible, any reference where we can use the Alma Images to instantiate the Baremetal Instance. -- ~ Lokendra skype: lokendrarathour -------------- next part -------------- An HTML attachment was scrubbed... URL: From katonalala at gmail.com Mon Mar 6 10:35:47 2023 From: katonalala at gmail.com (Lajos Katona) Date: Mon, 6 Mar 2023 11:35:47 +0100 Subject: [neutron] Bug deputy report (week 09, starting on Feb-27-2022) Message-ID: Hi neutrinos, Here are the bugs reported between February 27 and March 05. 
*First the unassigned bugs:* * https://bugs.launchpad.net/neutron/+bug/2009043 neutron-l3-agent restart some random ha routers get wrong state *High prio bugs* * https://bugs.launchpad.net/neutron/+bug/2008712 Security group rule deleted by cascade (because its remote group had been deleted) is not deleted in the backend * https://bugs.launchpad.net/neutron/+bug/2008947 Investigate test_restart_rpc_on_sighup_multiple_workers failure * https://bugs.launchpad.net/neutron/+bug/2009055 Performance issue when creating lots of ports * https://bugs.launchpad.net/neutron/+bug/2009215 [OVS] Error during OVS agent start *Medium or lower:* * https://bugs.launchpad.net/neutron/+bug/2008858 Call the api and do not return for a long time * https://bugs.launchpad.net/neutron/+bug/2008912 "_validate_create_network_callback" failing with 'NoneType' object has no attribute 'qos_policy_id' * https://bugs.launchpad.net/neutron/+bug/2008943 OVN DB Sync utility cannot find NB DB Port Group * https://bugs.launchpad.net/neutron/+bug/2009053 OVN: default stateless SG blocks metadata traffic * https://bugs.launchpad.net/neutron/+bug/2009221 [OVS] Custom ethertype traffic is not coming into the VM *One incomplete:* * https://bugs.launchpad.net/neutron/+bug/2008808 Duplicate packet when ping to external with out floating ip *Already merged / released bugs:* * https://bugs.launchpad.net/neutron/+bug/2008695 Remove any LB HM references from the external_ids upon deleting an HM * https://bugs.launchpad.net/neutron/+bug/2008767 [sqlalchemy-20][vnpaas] SQL execution without transaction in progress Have a bugless week :-) Lajos (lajoskatona) -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Mon Mar 6 11:35:43 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 6 Mar 2023 12:35:43 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: References: Message-ID: <20230306113543.a57aywefbn4cgsu3@localhost> On 16/02, Rishat Azizov wrote: > Hello! > > We have an error with creating backups from iscsi volume. Usually, this > happens with large backups over 100GB. > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > [req-f6619913-6f96-4226-8d75-2da3fca722f1 23de1b92e7674cf59486f07ac75b886b > a7585b47d1f143e9839c49b4e3bbe1b4 - - -] Exception during message handling: > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while > running command. 
> Command: multipath -f 3624a93705842cfae35d7483200015ec6 > Exit code: 1 > Stdout: '' > Stderr: 'Feb 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a > multipath device\n' > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Traceback > (most recent call last): > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, > in _process_incoming > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server res = > self.dispatcher.dispatch(message) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line > 309, in dispatch > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > self._do_dispatch(endpoint, method, ctxt, args) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line > 229, in _do_dispatch > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = > func(ctxt, **new_args) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/utils.py", line 890, in wrapper > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > func(self, *args, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 410, in > create_backup > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > volume_utils.update_backup_error(backup, str(err)) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in > __exit__ > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > self.force_reraise() > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in > force_reraise > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise > self.value > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 399, in > create_backup > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server updates = > self._run_backup(context, backup, volume) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 493, in > _run_backup > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > ignore_errors=True) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1066, in > _detach_device > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > force=force, ignore_errors=ignore_errors) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 141, in > trace_logging_wrapper > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > f(*args, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 360, > in inner > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > f(*args, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > line 880, in 
disconnect_volume > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > is_disconnect_call=True) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > line 942, in _cleanup_connection > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > self._linuxscsi.flush_multipath_device(multipath_name) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line > 382, in flush_multipath_device > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > root_helper=self._root_helper) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in > _execute > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = > self.__execute(*args, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line > 172, in execute > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > execute_root(*cmd, **kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, > in _wrap > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > self.channel.remote_call(name, args, kwargs) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in > remote_call > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise > exc_type(*result[2]) > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while > running command. > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Command: > multipath -f 3624a93705842cfae35d7483200015ec6 > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Exit code: 1 > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stdout: '' > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stderr: 'Feb > 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a multipath device\n' > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > Could you please help with this error? Hi, Does it work for smaller volumes or does it also fail? What are your defaults in your /etc/multipath.conf file? What Cinder release are you using? Cheers, Gorka. From jungleboyj at gmail.com Mon Mar 6 13:56:03 2023 From: jungleboyj at gmail.com (Jay Bryant) Date: Mon, 6 Mar 2023 07:56:03 -0600 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: References: Message-ID: <7cbe477b-b4a6-8d63-17fa-43bce14179aa@gmail.com> No objections from me!? I think Jon would be a great addition! Thanks, Jay On 3/3/2023 5:04 AM, Rajat Dhasmana wrote: > Hello everyone, > > I would like to propose Jon Bernard as cinder core. Looking at the > review stats > for the past 60[1], 90[2], 120[3] days, he has been consistently in > the top 5 > reviewers with a good?+/- ratio and leaving helpful comments > indicating good > quality of reviews. He has been managing the stable?branch releases > for the > past 2 cycles (Zed and 2023.1) and has helped in releasing security > issues as well. 
> > Jon has been part of the cinder and OpenStack community for a long > time and > has shown very active interest in upstream activities, be it release > liaison, review > contribution, attending cinder meetings and also involving in > outreachy activities. > He will be a very good addition to our team helping out with the > review bandwidth > and adding valuable input in our discussions. > > I will leave this thread open for a week and if there are no > objections, I will add > Jon Bernard to the cinder core team. > > [1] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 > > [2] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 > > [3] > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 > > > Thanks > Rajat Dhasmana From steveftaylor at gmail.com Mon Mar 6 15:40:36 2023 From: steveftaylor at gmail.com (Steve Taylor) Date: Mon, 06 Mar 2023 08:40:36 -0700 Subject: [openstack-helm] Get rid of cephfs and rbd provisioners In-Reply-To: References: Message-ID: I agree as well. I have been working to update the Ceph client images to Quincy and Focal to match the Ceph daemons that have already been updated, and the CephFS provisioner is proving difficult. It is outdated and incompatible with Python 3, but newer librados packages are only built for Python 3. If we want to keep the old provisioners around, the path that makes the most sense is to update them to be compatible with more modern frameworks and libraries, but personally, I don't see a need. I think pretty much everyone has moved to CSI, and anyone that hasn't probably should. I am in favor of removing the outdated provisioners. Steve On 3/2/2023 3:22:20 PM, Mohammed Naser wrote: Hi Vladimir, I agree.? I also think we should stop maintaining the CSI provisioner chart and simply deploy the one provided by the Ceph CSI team Less code we maintain, the better. Thanks Mohammed On Thu, Mar 2, 2023 at 10:13 PM Vladimir Kozhukalov wrote: Hi everyone, I would like to suggest getting rid of cephfs and rbd provisioners. They have been retired and have not been maintained for about 2.5 years now [1]. I believe the CSI approach is what all users rely on nowadays and we can safely remove them.? The trigger for this suggestion is that we are currently experiencing issues while trying to switch cephfs provisioner to Ubuntu Focal and fixing this is just wasting time. [2] Stephen spent some time debugging the issues and can give more details if needed.? What?do you think? ? [1]?https://github.com/kubernetes-retired/external-storage/tree/master/ceph [https://github.com/kubernetes-retired/external-storage/tree/master/ceph] [2]?https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976 [https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976] -- Best regards, Kozhukalov Vladimir -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcetto at gmail.com Mon Mar 6 16:10:44 2023 From: garcetto at gmail.com (garcetto) Date: Mon, 6 Mar 2023 17:10:44 +0100 Subject: [manila] ha for share server Message-ID: good afternoon, is it possible to have HA for share server vm in openstack? i mean, the vm that is created on every tenant and used as nfs server. thank you -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From garcetto at gmail.com Mon Mar 6 17:05:55 2023 From: garcetto at gmail.com (garcetto) Date: Mon, 6 Mar 2023 18:05:55 +0100 Subject: [manila] share server with multiple tenant networks Message-ID: good afternoon, i am trying to add a second share server or share to existing share server on different network inside same tenant, actually i have: tenant-net-01 (with the share server), ok working. tenant-net-02 (tried to add a share network, but the error says " create: Could not find an existing share server or allocate one on the share network provided. You may use a different share network, or verify the network details in the share network and retry your request. If this doesn't work, contact your administrator to troubleshoot issues with your network. " any clue or doc i can read? thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Mar 6 17:19:26 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 06 Mar 2023 18:19:26 +0100 Subject: [neutron] CI meeting Tuesday 7.03.2023 cancelled Message-ID: <5222101.MrzyGnTNMV@p1> Hi, Due to some internal event which I have tomorrow in the same time as our CI meeting I will not be able to run the CI meeting. I don't see any really serious issues in our CI this week and after discussing that with Rodolfo we decided to cancel this week's CI meeting. See You on the meeting next week. -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From james.denton at rackspace.com Mon Mar 6 17:25:15 2023 From: james.denton at rackspace.com (James Denton) Date: Mon, 6 Mar 2023 17:25:15 +0000 Subject: Paying for Openstack support In-Reply-To: References: <1739978420.3955026.1677861284070.ref@mail.yahoo.com> <1739978420.3955026.1677861284070@mail.yahoo.com> Message-ID: +1 for Red Hat support. Rackspace is still very much in the OpenStack game, especially for private clouds, with deployments mainly based on RHOSP and OpenStack-Ansible. Happy to put you in touch with someone if you?d like more info on various support services (short or long term). -- James Denton Principal Architect Rackspace Private Cloud - OpenStack james.denton at rackspace.com From: John van Ommen Date: Friday, March 3, 2023 at 2:11 PM To: ozzzo at yahoo.com Cc: OpenStack Discuss Subject: Re: Paying for Openstack support CAUTION: This message originated externally, please use caution when clicking on links or opening attachments! I've worked on a number of RHOSP deployments that relied on Red Hat support, and the experience was positive. The first OpenStack project that I did, it depended on SwiftStack for support and they were great. But they were acquired by Nvidia. AFAIK, there aren't many companies that still provide OpenStack support in the United States. From what I understand, RackSpace has been pivoting towards doing AWS support. On Fri, Mar 3, 2023 at 8:36?AM Albert Braden > wrote: I have a question for the operators here. Is anyone paying for Openstack support and getting good value for your money? Can you contact someone for help with an issue, and get a useful response in a reasonable time? If you have an emergency, can you get help quickly? If so, I would like to hear about your experience. Who are you getting good support from? Do they support your operating system too? 
If not, where do you get your OS support, and how good is it? If you work for a company that provides openstack and/or Linux support, you are welcome to send me a sales pitch, but my goal is to hear from operators. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pshchelokovskyy at mirantis.com Mon Mar 6 17:32:40 2023 From: pshchelokovskyy at mirantis.com (Pavlo Shchelokovskyy) Date: Mon, 6 Mar 2023 19:32:40 +0200 Subject: [barbican] database is growing and can not be purged Message-ID: Hi all, we are observing the following behavior in Barbican: - OpenStack environment is using both encrypted Cinder volumes and encrypted local storage (lvm) for Nova instances - over the time, the secrets and orders tables are growing - many soft-deleted entries in secrets DB can not be purged by the db cleanup script As I understand what is happening - both Cinder and Nova create secrets in Barbican on behalf of the user when creating an encrypted volume or booting an instance with encrypted local storage. They both do it via castellan library, that under the hood creates orders in Barbican, waits for them to become active and returns to the caller only the ID of the generated secret. When time comes to delete the thing (volume or instance) Cinder/Nova again use castellan, but only delete the secret, not the order (they are not aware that there was any 'order' created anyway). As a result, the orders are left in place, and DB cleanup procedure does not delete soft-deleted secrets when there's an ACTIVE order referencing such secret. This is troublesomes on many levels - users who use Cinder or Nova may not even be aware that they are creating something in Barbican. Orders accumulating like that may eventually result in cryptic errors when e.g. when you run out of quota for orders. And what's more, default Barbican policies do allow 'normal' (creator) users to create an order, but not delete it (only project admin can do it), so even if the users are aware of Barbican involvement, they can not delete those orders manually anyway. Plus there's no good way in API to determine outright which orders are referencing deleted secrets. I see several ways of dealing with that and would like to ask for your opinion on what would be the best one: 1. Amend Barbican API to allow filtering orders by the secrets, when castellan deletes a secret - search for corresponding order and delete it as well, change default policy to actually allow order deletion by the same users who can create them. 2. Cascade-delete orders when deleting secrets - this is easy but probably violates that very policy that disallowed normal users to delete orders. 3. improve the database cleanup so it first marks any order that references a deleted secret also as deleted, so later when time comes both could be purged (or something like that). This also has a similar downside to the previous option by not being explicit enough. I've filed a bug for that https://storyboard.openstack.org/#!/story/2010625 and proposed a patch for option 2 (cascade delete), but would like to ask what would you see as the most appropriate way or may be there's something else that I've missed. Btw, the problem is probably even more pronounced with keypairs - when castellan is used to create those, under the hood both order and container are created besides the actual secrets, and again only the secret ids are returned to the caller. 
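(Both kinds of leftovers are easy to spot from the client side, for example with "openstack secret order list" and "openstack secret container list" -- the orders stay ACTIVE even though the secrets they reference are already gone.)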
When time comes to delete things, the caller only knows about secret IDs, and can only delete them, leaving both container and order behind. Luckily, I did not find any place across OpenStack that actually creates keypairs using castellan... but the problem is definitely there. Best regards, -- Dr. Pavlo Shchelokovskyy Principal Software Engineer Mirantis Inc www.mirantis.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Mon Mar 6 18:36:30 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 06 Mar 2023 10:36:30 -0800 Subject: [qa][stable][gate] Tempest master failing on stable/yoga|xena, fix is in gate Message-ID: <186b835f813.12bb9c445376631.7478464121939418993@ghanshyammann.com> Hi All, In case any of you are seeing failure on stable/yoga and stable/xena with the below error, please hold the recheck until the fix (revert) in Tempest is merged "AttributeError: type object 'Draft4Validator' has no attribute 'FORMAT_CHECKER'" - https://review.opendev.org/c/openstack/tempest/+/876218 -gmann From kozhukalov at gmail.com Mon Mar 6 18:47:29 2023 From: kozhukalov at gmail.com (Vladimir Kozhukalov) Date: Mon, 6 Mar 2023 21:47:29 +0300 Subject: [openstack-helm] Get rid of cephfs and rbd provisioners In-Reply-To: References: Message-ID: Guys, Thank you for your thoughts. I appreciate it. Looks like we all agree about removal of old style provisioners. I'll prepare a PS for this. Also thanks for the good idea to switch to the upstream charts for CSI Ceph provisioners [1]. Let's do this as a separate PS. [1] https://github.com/ceph/ceph-csi/tree/devel/charts On Mon, Mar 6, 2023 at 6:40?PM Steve Taylor wrote: > I agree as well. I have been working to update the Ceph client images to > Quincy and Focal to match the Ceph daemons that have already been updated, > and the CephFS provisioner is proving difficult. It is outdated and > incompatible with Python 3, but newer librados packages are only built for > Python 3. > > If we want to keep the old provisioners around, the path that makes the > most sense is to update them to be compatible with more modern frameworks > and libraries, but personally, I don't see a need. I think pretty much > everyone has moved to CSI, and anyone that hasn't probably should. I am in > favor of removing the outdated provisioners. > > Steve > > On 3/2/2023 3:22:20 PM, Mohammed Naser wrote: > Hi Vladimir, > > I agree. I also think we should stop maintaining the CSI provisioner > chart and simply deploy the one provided by the Ceph CSI team > > Less code we maintain, the better. > > Thanks > Mohammed > > On Thu, Mar 2, 2023 at 10:13?PM Vladimir Kozhukalov > wrote: > >> Hi everyone, >> >> I would like to suggest getting rid of cephfs and rbd provisioners. They >> have been retired and have not been maintained for about 2.5 years now [1]. >> I believe the CSI approach is what all users rely on nowadays and we can >> safely remove them. >> >> The trigger for this suggestion is that we are currently experiencing >> issues while trying to switch cephfs provisioner to Ubuntu Focal and fixing >> this is just wasting time. [2] Stephen spent some time debugging the issues >> and can give more details if needed. >> >> What do you think? >> >> [1] >> https://github.com/kubernetes-retired/external-storage/tree/master/ceph >> [2] https://review.opendev.org/c/openstack/openstack-helm-infra/+/872976 >> -- >> Best regards, >> Kozhukalov Vladimir >> > > > -- > Mohammed Naser > VEXXHOST, Inc. 
> > -- Best regards, Kozhukalov Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Mon Mar 6 18:47:52 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Mon, 6 Mar 2023 10:47:52 -0800 Subject: [ironic] PTL Availability Message-ID: Hi all, Just a heads up -- I'll be out of town and mostly out of IRC from Wednesday, 3/8 to Monday 3/13. If there are emergent issues that need to be addressed urgently, please send an email directly to me and I can have a look. Alternatively, I have a large amount of trust in our delegated release managers and former Ironic PTLs and support them if they need to take action in my stead. - Jay Faulkenr -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Mon Mar 6 18:56:35 2023 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 6 Mar 2023 10:56:35 -0800 Subject: [manila] share server with multiple tenant networks In-Reply-To: References: Message-ID: Hi Garcetto, On Mon, Mar 6, 2023 at 9:07?AM garcetto wrote: > good afternoon, > i am trying to add a second share server or share to existing share > server on different network inside same tenant, actually i have: > > tenant-net-01 (with the share server), ok working. > tenant-net-02 (tried to add a share network, but the error says > " > create: Could not find an existing share server or allocate one on the > share network provided. You may use a different share network, or verify > the network details in the share network and retry your request. If this > doesn't work, contact your administrator to troubleshoot issues with your > network. > " > As the message suggests, manila was unable to obtain network allocations to create a share server for you on the second network. Are you a user, or an administrator of this cloud? If you are an administrator, you should look at the logs from manila's share-manager service to check what failed. You cannot expect to attach multiple tenant networks to the same share server unfortunately - that isn't supported today. > any clue or doc i can read? > thank you > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Mon Mar 6 19:06:22 2023 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 6 Mar 2023 11:06:22 -0800 Subject: [manila] ha for share server In-Reply-To: References: Message-ID: Hi Garcetto, On Mon, Mar 6, 2023 at 8:11?AM garcetto wrote: > good afternoon, > is it possible to have HA for share server vm in openstack? > i mean, the vm that is created on every tenant and used as nfs server. > VMs created by Manila's generic driver aren't set up with any sort of HA. We've had ideas in the past [1] to configure HA. For now, the investment in the generic driver in the upstream community is to just provide a reference architecture for a hard multi-tenancy driver. We'd love to have help to revive those efforts. [1] https://review.opendev.org/c/openstack/manila-specs/+/504987 > thank you > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gouthampravi at gmail.com Mon Mar 6 21:02:28 2023 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Mon, 6 Mar 2023 13:02:28 -0800 Subject: [manila] share server with multiple tenant networks In-Reply-To: References: Message-ID: On Mon, Mar 6, 2023 at 12:12?PM garcetto wrote: > thank you, so how can i have multiple shares on different tenant networks, > creating more share servers right? 
> Yes; map these tenant networks into share networks in manila, and create your shares with the share networks. Each share network will trigger the creation of a share server if one already doesn't exist. > > > On Mon, Mar 6, 2023 at 7:56?PM Goutham Pacha Ravi > wrote: > >> Hi Garcetto, >> >> On Mon, Mar 6, 2023 at 9:07?AM garcetto wrote: >> >>> good afternoon, >>> i am trying to add a second share server or share to existing share >>> server on different network inside same tenant, actually i have: >>> >> >>> tenant-net-01 (with the share server), ok working. >>> tenant-net-02 (tried to add a share network, but the error says >>> " >>> create: Could not find an existing share server or allocate one on the >>> share network provided. You may use a different share network, or verify >>> the network details in the share network and retry your request. If this >>> doesn't work, contact your administrator to troubleshoot issues with your >>> network. >>> " >>> >> >> As the message suggests, manila was unable to obtain network allocations >> to create a share server for you on the second network. Are you a user, or >> an administrator of this cloud? If you are an administrator, you >> should look at the logs from manila's share-manager service to check what >> failed. >> >> You cannot expect to attach multiple tenant networks to the same share >> server unfortunately - that isn't supported today. >> >> >> >>> any clue or doc i can read? >>> thank you >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Mon Mar 6 21:34:42 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Tue, 7 Mar 2023 04:34:42 +0700 Subject: [openstack][backup] Experience for instance backup Message-ID: Hello guys. I am looking for instance backup solution. I am using Cinder backup with nfs backup but it looks not too fast. I am using a 10Gbps network. I would like to know experience for best practice for instance backup solutions on Openstack. Thank you. Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From haiwu.us at gmail.com Mon Mar 6 22:27:23 2023 From: haiwu.us at gmail.com (hai wu) Date: Mon, 6 Mar 2023 16:27:23 -0600 Subject: [osprofiler][rally][openstack] Using osprofiler/rally directly on production Openstack system Message-ID: Is there any concern with using osprofiler/rally directly on the production Openstack system? From andr.kurilin at gmail.com Mon Mar 6 23:09:51 2023 From: andr.kurilin at gmail.com (Andriy Kurilin) Date: Tue, 7 Mar 2023 00:09:51 +0100 Subject: [osprofiler][rally][openstack] Using osprofiler/rally directly on production Openstack system In-Reply-To: References: Message-ID: Hi! OSProfiler enables tracing only requests with a special header. The header is not embedded in each request (even if you configure osprofiler in your system), you need to use a special CLI argument to set it. So even if the tracing of one particular request slows done the flow (which should not happen), it should give zero impact on the performance of the whole system. As for Rally (in particular, the task component), it creates resources with a special naming format that allows to filter out only these resources during the cleanup process. We are using Rally in production as a part of the monitoring for ~5 years or so. ??, 6 ???. 2023??. ? 23:34, hai wu : > Is there any concern with using osprofiler/rally directly on the > production Openstack system? 
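(For anyone who wants a concrete starting point with the task component: a minimal read-only task file looks roughly like the snippet below -- scenario and option names are from the rally-openstack plugins as I remember them, so treat it as a sketch and compare against the samples shipped in that repo:

  NovaServers.list_servers:
    - args:
        detailed: true
      runner:
        type: constant
        times: 1
        concurrency: 1

Save it as e.g. list-servers.yaml and run "rally task start list-servers.yaml" against an environment registered with "rally env create" or the older "rally deployment create".)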
> > -- Best regards, Andrey Kurilin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tomas.bredar at gmail.com Mon Mar 6 23:32:11 2023 From: tomas.bredar at gmail.com (=?UTF-8?B?VG9tw6HFoSBCcmVkw6Fy?=) Date: Tue, 7 Mar 2023 00:32:11 +0100 Subject: [ovn] safely change bridge_mappings Message-ID: Hi, I have a running production OpenStack deployment - version Wallaby installed using TripleO. I'm using the default OVN/OVS networking. For provider networks I have two bridges on the compute nodes br-ex and br-ex2. Instances mainly use br-ex for provider networks, but there are some instances which started using a provider network which should be mapped to br-ex2, however I didn't specify "bridge_mappings" on ml2_conf.ini, so the traffic wants to flow through the default datacentre:br-ex. My questions is, what services should I restart on the controller and compute nodes after defining bridge_mappings in [ovs] in ml2_conf.ini. And if this operation is safe and if the instances already using br-ex will lose connectivity? Thanks for your help Tomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From haiwu.us at gmail.com Tue Mar 7 01:36:37 2023 From: haiwu.us at gmail.com (hai wu) Date: Mon, 6 Mar 2023 19:36:37 -0600 Subject: [osprofiler][rally][openstack] Using osprofiler/rally directly on production Openstack system In-Reply-To: References: Message-ID: Thanks Andriy! That sounds very promising. I just tried to install rally, and hit one known bug: https://bugs.launchpad.net/rally/+bug/2004022. I downgraded SQLAlchemy and was able to get rally installed. But I could not find any sample task to run per this document url: https://rally.readthedocs.io/en/latest/quick_start/tutorial/step_1_setting_up_env_and_running_benchmark_from_samples.html#tutorial-step-1-setting-up-env-and-running-benchmark-from-samples. It seems most of the sample tasks have been deleted already per its github history. Which sample task I could use to try this out (for example, list all openstack instances..)? It seems its documentation is out of date.. Thanks, Hai On Mon, Mar 6, 2023 at 5:10?PM Andriy Kurilin wrote: > > Hi! > > OSProfiler enables tracing only requests with a special header. The header is not embedded in each request (even if you configure osprofiler in your system), you need to use a special CLI argument to set it. So even if the tracing of one particular request slows done the flow (which should not happen), it should give zero impact on the performance of the whole system. > > As for Rally (in particular, the task component), it creates resources with a special naming format that allows to filter out only these resources during the cleanup process. We are using Rally in production as a part of the monitoring for ~5 years or so. > > ??, 6 ???. 2023??. ? 23:34, hai wu : >> >> Is there any concern with using osprofiler/rally directly on the >> production Openstack system? >> > > > -- > Best regards, > Andrey Kurilin. From haiwu.us at gmail.com Tue Mar 7 01:41:34 2023 From: haiwu.us at gmail.com (hai wu) Date: Mon, 6 Mar 2023 19:41:34 -0600 Subject: [osprofiler][rally][openstack] Using osprofiler/rally directly on production Openstack system In-Reply-To: References: Message-ID: It seems they might have been moved here? https://github.com/openstack/rally-openstack/tree/master/samples/tasks/scenarios/nova. If so, the rally documentation needs to be updated .. On Mon, Mar 6, 2023 at 7:36?PM hai wu wrote: > > Thanks Andriy! 
That sounds very promising. I just tried to install > rally, and hit one known bug: > https://bugs.launchpad.net/rally/+bug/2004022. I downgraded SQLAlchemy > and was able to get rally installed. But I could not find any sample > task to run per this document url: > https://rally.readthedocs.io/en/latest/quick_start/tutorial/step_1_setting_up_env_and_running_benchmark_from_samples.html#tutorial-step-1-setting-up-env-and-running-benchmark-from-samples. > It seems most of the sample tasks have been deleted already per its > github history. > > Which sample task I could use to try this out (for example, list all > openstack instances..)? It seems its documentation is out of date.. > > Thanks, > Hai > > On Mon, Mar 6, 2023 at 5:10?PM Andriy Kurilin wrote: > > > > Hi! > > > > OSProfiler enables tracing only requests with a special header. The header is not embedded in each request (even if you configure osprofiler in your system), you need to use a special CLI argument to set it. So even if the tracing of one particular request slows done the flow (which should not happen), it should give zero impact on the performance of the whole system. > > > > As for Rally (in particular, the task component), it creates resources with a special naming format that allows to filter out only these resources during the cleanup process. We are using Rally in production as a part of the monitoring for ~5 years or so. > > > > ??, 6 ???. 2023??. ? 23:34, hai wu : > >> > >> Is there any concern with using osprofiler/rally directly on the > >> production Openstack system? > >> > > > > > > -- > > Best regards, > > Andrey Kurilin. From gmann at ghanshyammann.com Tue Mar 7 02:28:19 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Mon, 06 Mar 2023 18:28:19 -0800 Subject: [qa][stable][gate] Tempest master failing on stable/yoga|xena, fix is in gate In-Reply-To: <186b835f813.12bb9c445376631.7478464121939418993@ghanshyammann.com> References: <186b835f813.12bb9c445376631.7478464121939418993@ghanshyammann.com> Message-ID: <186b9e5ee5f.1290e52e2388064.1234243886718861499@ghanshyammann.com> ---- On Mon, 06 Mar 2023 10:36:30 -0800 Ghanshyam Mann wrote --- > Hi All, > > In case any of you are seeing failure on stable/yoga and stable/xena with the below error, > please hold the recheck until the fix (revert) in Tempest is merged > > "AttributeError: type object 'Draft4Validator' has no attribute 'FORMAT_CHECKER'" > > - https://review.opendev.org/c/openstack/tempest/+/876218 It is merged, feel free to recheck. -gmann > > -gmann > > From ralonsoh at redhat.com Tue Mar 7 09:12:54 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Tue, 7 Mar 2023 10:12:54 +0100 Subject: [ovn] safely change bridge_mappings In-Reply-To: References: Message-ID: Hello Tom??: You need to follow the steps in [1]: * You need to create the new physical bridge "br-ex2". * Then you need to add to the bridge the physical interface. * In the compute node you need to add the bridge mappings to the OVN database Open vSwitch register * In the controller, you need to add the reference for this second provider network in "flat_networks" and "network_vlan_ranges" (in the ml2.ini file). Then you need to restart the Neutron server to read these new parameters (this step is not mentioned in this link). $ cat ./etc/neutron/plugins/ml2/ml2_conf.ini [ml2_type_flat] flat_networks = public,public2 [ml2_type_vlan] network_vlan_ranges = public:11:200,public2:11:200 Regards. 
[1] https://docs.openstack.org/networking-ovn/pike/admin/refarch/provider-networks.html On Tue, Mar 7, 2023 at 12:33?AM Tom?? Bred?r wrote: > Hi, > > I have a running production OpenStack deployment - version Wallaby > installed using TripleO. I'm using the default OVN/OVS networking. > For provider networks I have two bridges on the compute nodes br-ex and > br-ex2. Instances mainly use br-ex for provider networks, but there are > some instances which started using a provider network which should be > mapped to br-ex2, however I didn't specify "bridge_mappings" on > ml2_conf.ini, so the traffic wants to flow through the default > datacentre:br-ex. > My questions is, what services should I restart on the controller and > compute nodes after defining bridge_mappings in [ovs] in ml2_conf.ini. And > if this operation is safe and if the instances already using br-ex will > lose connectivity? > > Thanks for your help > > Tomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hjensas at redhat.com Tue Mar 7 09:33:40 2023 From: hjensas at redhat.com (Harald Jensas) Date: Tue, 7 Mar 2023 10:33:40 +0100 Subject: [TripleO][Wallaby][openStack] - Creating Alma based Baremetal Instance In-Reply-To: References: Message-ID: On 3/6/23 10:34, Lokendra Rathour wrote: > Hi Team, > we have a PoC where we wish to try Creating OpenStack Baremetal Instance > using Alma Linux. > Please help in case that is possible, any reference where we can use the > Alma Images to instantiate?the Baremetal?Instance. > Since Alma is aiming to be binary compatible with RHEL I think doing this would certainly be possible. You may have to keep local, or propose patches to diskimage-builder to add Alma support. Also RDO packages are built against CentOS so re-building the RDO RPM's from source on Alma is probably required. (Since CentOS-Stream is not binary compatible with RHEL, I assume some RDO packages won't work without a re-compile against the ALMA libraries) -- Harald From rdhasman at redhat.com Tue Mar 7 11:00:17 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 7 Mar 2023 16:30:17 +0530 Subject: [cinder][PTG] Cinder 2023.2 (Bobcat) PTG Planning Message-ID: Hello All, The 2023.2 (Bobcat) virtual PTG is approaching and will be held between 27-31 March, 2023. I've created a planning etherpad[1] and a PTG etherpad[2] to gather topics for the PTG. Note that you only need to add topics in the planning etherpad and those will be arranged in the PTG etherpad later. Dates: Tuesday (28th March) to Friday (31st March) 2023 Time: 1300 to 1700 UTC Etherpad: https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning Please add the topics as early as possible as finalizing and arranging topics would require some buffer time. [1] https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning [2] https://etherpad.opendev.org/p/bobcat-ptg-cinder Thanks Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Mar 7 12:58:58 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 7 Mar 2023 12:58:58 +0000 Subject: [TripleO][Wallaby][openStack] - Creating Alma based Baremetal Instance In-Reply-To: References: Message-ID: <20230307125857.gqokp4mcpz7i22vj@yuggoth.org> On 2023-03-07 10:33:40 +0100 (+0100), Harald Jensas wrote: [...] > Since Alma is aiming to be binary compatible with RHEL I think doing this > would certainly be possible. You may have to keep local, or propose patches > to diskimage-builder to add Alma support. 
Also RDO packages are built > against CentOS so re-building the RDO RPM's from source on Alma is probably > required. (Since CentOS-Stream is not binary compatible with RHEL, I assume > some RDO packages won't work without a re-compile against the ALMA > libraries) There's already support in diskimage-builder for Rocky Linux and OpenEuler, both of which are RHEL clones, so Alma is likely very close to those (closer than to CentOS Stream anyway). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From knikolla at bu.edu Tue Mar 7 14:46:19 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Tue, 7 Mar 2023 14:46:19 +0000 Subject: [all][tc] Technical Committee next weekly meeting on 2023 Mar 8 at 1600 UTC Message-ID: <1D8445C6-68E3-475F-9A14-051EAE33BE09@bu.edu> Hi all, This is a reminder that the next weekly Technical Committee meeting is to be held tomorrow (March 8) at 1600 UTC on #openstack-tc on OFTC IRC A copy of the preliminary agenda can be found below. Items can be proposed by editing the wiki page at https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Roll call * Follow up on past action items * Deciding on meeting time * Gate health check * TC 2023.1 tracker status checks ** https://etherpad.opendev.org/p/tc-2023.1-tracker * Deprecation process for TripleO ** https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032083.html * Cleanup of PyPI maintainer list for OpenStack Projects ** Etherpad for audit and cleanup of additional PyPi maintainers *** https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup ** ML discussion *** https://lists.openstack.org/pipermail/openstack-discuss/2023-January/031848.html * Recurring tasks check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary * Virtual PTG Planning ** March 27-31, 2023, there's the Virtual PTG. * Check in on the voting for version names ** https://review.opendev.org/c/openstack/governance/+/874484 ** https://review.opendev.org/c/openstack/governance/+/875942 * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open Thank you, Kristi Nikolla From corey.bryant at canonical.com Tue Mar 7 16:19:26 2023 From: corey.bryant at canonical.com (Corey Bryant) Date: Tue, 7 Mar 2023 11:19:26 -0500 Subject: cryptography min version (non-rust) through 2024.1 Message-ID: Hi All, As you probably know, recent versions of cryptography have hard dependencies on rust. Are there any community plans to continue supporting a minimum (non-rust) version of cryptography until a specific release? The concern I have downstream in Ubuntu is that we need to continue being compatible with cryptography 3.4.8 through openstack 2024.1. This is because all releases through 2024.1 will be backported to the ubuntu 22.04 cloud archives which will use cryptography 3.4.8. Once we get to 2024.2, we will be backporting to 24.04 cloud archives, which will have the new rust-based versions of cryptography. The current upper-constraint for cryptography is 38.0.2, but the various requirements.txt min versions are much lower (e.g. keystone has cryptography>=2.7). This is likely to lead to patches landing with features that are only in 38.0.2, so it will likely be difficult to enforce min version support. But perhaps a stance toward maintaining compatibility could be established. Thoughts? 
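To make that concrete, the knob in question is simply the floor each project declares in requirements.txt, e.g. something along the lines of

  cryptography>=3.4.8  # oldest release we actually intend to keep working

rather than today's cryptography>=2.7, plus reviewers watching for calls that only exist in the 38.x-era releases. Treat the number as an illustration, not a proposal.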
Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Mar 7 17:13:17 2023 From: smooney at redhat.com (Sean Mooney) Date: Tue, 07 Mar 2023 17:13:17 +0000 Subject: cryptography min version (non-rust) through 2024.1 In-Reply-To: References: Message-ID: <32af22810a975f0eddaa4361638b12a2d25f8959.camel@redhat.com> On Tue, 2023-03-07 at 11:19 -0500, Corey Bryant wrote: > Hi All, > > As you probably know, recent versions of cryptography have hard > dependencies on rust. Are there any community plans to continue supporting > a minimum (non-rust) version of cryptography until a specific release? i tought we had already raised the min above the version that required rust so not that i am aware of. cryptography>=2.7 is our curret stated minium but we have been testing with a much much newwer version for alont time since we do not test miniums anymore https://github.com/openstack/nova/commit/6caedfd97675940eb3cf07e2f019926dae45d02c > > The concern I have downstream in Ubuntu is that we need to continue being > compatible with cryptography 3.4.8 through openstack 2024.1. This is > because all releases through 2024.1 will be backported to the ubuntu 22.04 > cloud archives which will use cryptography 3.4.8. Once we get to 2024.2, we > will be backporting to 24.04 cloud archives, which will have the new > rust-based versions of cryptography. > > The current upper-constraint for cryptography is 38.0.2, but the various > requirements.txt min versions are much lower (e.g. keystone has > cryptography>=2.7). This is likely to lead to patches landing with features > that are only in 38.0.2, so it will likely be difficult to enforce min > version support. But perhaps a stance toward maintaining compatibility > could be established. https://github.com/openstack/governance/blob/584e06b0c186d4355d1d51f2d6df96e822253bef/resolutions/20220414-drop-lower-constraints.rst we decided to "Drop Lower Constraints Maintenance" relitivly recently while we have pti guidance for some lanagues rust is not one of them https://github.com/openstack/governance/tree/584e06b0c186d4355d1d51f2d6df96e822253bef/reference/pti and its also not part of the tested runtims https://github.com/openstack/governance/blob/master/reference/runtimes/2023.2.rst so i would proably try to avoid makign any commitment to continuting to supprot non rust based pycryptography release > > Thoughts? > > Corey From fungi at yuggoth.org Tue Mar 7 17:23:18 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 7 Mar 2023 17:23:18 +0000 Subject: [dev][requirements][security-sig][tc]cryptography min version (non-rust) through 2024.1 In-Reply-To: References: Message-ID: <20230307172318.apcjjebwcf4atgyx@yuggoth.org> On 2023-03-07 11:19:26 -0500 (-0500), Corey Bryant wrote: [...] > The current upper-constraint for cryptography is 38.0.2, but the > various requirements.txt min versions are much lower (e.g. > keystone has cryptography>=2.7). This is likely to lead to patches > landing with features that are only in 38.0.2, so it will likely > be difficult to enforce min version support. But perhaps a stance > toward maintaining compatibility could be established. [...] While introducing specific tests for this would not be trivial, maybe it's one of those situations where we try to avoid breaking compatibility with older versions and don't reject patches when people find that something has inadvertently started depending on a feature only available in the Rust-based builds? 
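In practice catching that would not need much machinery either: a periodic or experimental job that installs the old floor before running the unit tests, roughly

  pip install 'cryptography==3.4.8'
  stestr run

(the version is just the Ubuntu 22.04 example from earlier in the thread, and it would be per-project), should surface such regressions without anyone having to police individual patches.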
-- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sbauza at redhat.com Tue Mar 7 18:14:55 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Tue, 7 Mar 2023 19:14:55 +0100 Subject: [nova][ptg] Strawman proposal for vPTG timeslots Message-ID: Hi team, We recently discussed this in the Nova meeting [1] but I'd like to reemphasize here and now that I proposed to book 4 timeslots of one hour during 4 days for the next vPTG. https://ptg.opendev.org/ptg.html As you see, the proposed timeline will be : Tuesday, Wednesday, Thursday, Friday between 13:00UTC and 17:00UTC. May you have concerns with this proposal, please express them by replying to this thread. As a reminder, please add the topics you'd like to cover during the vPTG in the PTG etherpad : https://etherpad.opendev.org/p/nova-bobcat-ptg Thanks, -Sylvain [1] https://meetings.opendev.org/meetings/nova/2023/nova.2023-03-07-16.00.log.html#l-229 -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.bryant at canonical.com Tue Mar 7 19:28:23 2023 From: corey.bryant at canonical.com (Corey Bryant) Date: Tue, 7 Mar 2023 14:28:23 -0500 Subject: [dev][requirements][security-sig][tc]cryptography min version (non-rust) through 2024.1 In-Reply-To: <20230307172318.apcjjebwcf4atgyx@yuggoth.org> References: <20230307172318.apcjjebwcf4atgyx@yuggoth.org> Message-ID: On Tue, Mar 7, 2023 at 12:30?PM Jeremy Stanley wrote: > On 2023-03-07 11:19:26 -0500 (-0500), Corey Bryant wrote: > [...] > > The current upper-constraint for cryptography is 38.0.2, but the > > various requirements.txt min versions are much lower (e.g. > > keystone has cryptography>=2.7). This is likely to lead to patches > > landing with features that are only in 38.0.2, so it will likely > > be difficult to enforce min version support. But perhaps a stance > > toward maintaining compatibility could be established. > [...] > > While introducing specific tests for this would not be trivial, > maybe it's one of those situations where we try to avoid breaking > compatibility with older versions and don't reject patches when > people find that something has inadvertently started depending on a > feature only available in the Rust-based builds? > -- > Jeremy Stanley > I'd be okay with an approach like this. Would this need to be formally adopted by the TC? Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Tue Mar 7 19:54:43 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 7 Mar 2023 11:54:43 -0800 Subject: cryptography min version (non-rust) through 2024.1 In-Reply-To: References: Message-ID: > The concern I have downstream in Ubuntu is that we need to continue being > compatible with cryptography 3.4.8 through openstack 2024.1. This is > because all releases through 2024.1 will be backported to the ubuntu 22.04 > cloud archives which will use cryptography 3.4.8. Once we get to 2024.2, we > will be backporting to 24.04 cloud archives, which will have the new > rust-based versions of cryptography. > > The current upper-constraint for cryptography is 38.0.2, but the various > requirements.txt min versions are much lower (e.g. keystone has > cryptography>=2.7). This is likely to lead to patches landing with features > that are only in 38.0.2, so it will likely be difficult to enforce min > version support. 
But perhaps a stance toward maintaining compatibility > could be established. > > What is the impetus needed for us to raise the lower-constraint? When do we decide we should do that, generally -- is it just ad-hoc, someone requests it, or is there a more involved process? We certainly don't dictate all versions of required compilers and such in our TC testing docs (although that is implied in distribution-platform) -- so I think there's another piece to it when talking about python dependencies. I do not think it's wise to commit to supporting older versions of cryptography through 2024.1. In fact, you *must* have a cryptography release that is rust-enabled in order to get OpenSSL 3 support. Not to mention the memory safety benefits from using a rust version. I'm not saying we should force newer cryptography immediately; but it is reason enough to give me significant pause about answering a question about supporting it through two additional releases. Thanks, Jay Faulkner Ironic PTL TC Member -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Mar 7 20:43:46 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 7 Mar 2023 20:43:46 +0000 Subject: cryptography min version (non-rust) through 2024.1 In-Reply-To: References: Message-ID: <20230307204345.b5hvqarqyp25gqj3@yuggoth.org> On 2023-03-07 11:54:43 -0800 (-0800), Jay Faulkner wrote: [...] > I do not think it's wise to commit to supporting older versions of > cryptography through 2024.1. In fact, you *must* have a cryptography > release that is rust-enabled in order to get OpenSSL 3 support. Not to > mention the memory safety benefits from using a rust version. I'm not > saying we should force newer cryptography immediately; but it is reason > enough to give me significant pause about answering a question about > supporting it through two additional releases. Well, the context here is not "supporting old versions of the PYCA/Cryptography library," it's rather "supporting downstream distributors who backport patches to their stable forks of PYCA/Cryptography." While it may sound like the same thing, there's a subtle difference. Obviously nobody should be running old versions of the library because they'll be missing critical security fixes, but there are stable distributions who take care of backporting security patches for their LTS versions and want newer OpenStack releases to still be usable there. The PYCA/Cryptography library has been particularly challenging here, since it decided to go all-in on Rust which, while a very exciting and compelling language from a security perspective, is not exactly the most stabilized ecosystem yet and has seen a lot of churn over the past few years leading to Rust-based projects often being entirely unfit for inclusion in stable server distros due to continually requiring newer toolchain versions and replacing build systems. This isn't just Ubuntu. The latest Debian stable release carries a python3-cryptography based on 3.3.2 from two years ago, older than what Corey's trying to support but still quite new from the perspective of an LTS server distribution. Rocky 9.1 (which I assume is the same as RHEL but I can never find where to look up RHEL package versions) is carrying a python3-cryptography based on 36.0.1 from 2021, so newer than what Corey is trying to support on Ubuntu but not by much (approximately 3 months). 
The point is, we can't reasonably test with all these different versions of the library, and that's just one library out of hundreds we're depending on for that matter... but what we can do is say that if people find regressions due to us testing exclusively with newer features of these libraries than are available on platforms we expect our users to deploy on, we'll gladly accept patches to fix that situation. I expect the TC is going to choose Ubuntu 22.04 LTS as a target platform for at least the OpenStack 2023.2 and 2024.1 coordinated releases, but almost certainly the 2024.2 coordinated release as well since Ubuntu 24.04 LTS won't be officially available before we start that development cycle. That means the first coordinated OpenStack release which would be able to effectively depend on features from a newer python3-cryptography package on Ubuntu is going to be 2025.1. Food for thought. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From vrook at wikimedia.org Tue Mar 7 21:22:45 2023 From: vrook at wikimedia.org (Vivian Rook) Date: Tue, 7 Mar 2023 16:22:45 -0500 Subject: [magnum] certificate authority key location Message-ID: If I want to create a credential for a user to access a magnum cluster I can do so as described in https://docs.openstack.org/magnum/latest/user/#id4 Namely by running: openstack coe ca sign secure-k8s-cluster client.csr > cert.pem I would like to do this without calling the openstack cli. Where does magnum store its ca key file? I could not find it on the control node under /etc/kubernetes/certs (the ca.crt is there, though no ca.key) Thank you! -- *Vivian Rook (They/Them)* Site Reliability Engineer Wikimedia Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.yip at ardc.edu.au Wed Mar 8 01:16:38 2023 From: jake.yip at ardc.edu.au (Jake Yip) Date: Wed, 8 Mar 2023 12:16:38 +1100 Subject: [magnum] security groups for magnum nodes In-Reply-To: References: Message-ID: <09c975e9-4c4d-ffce-4e63-709846aaa725@ardc.edu.au> Hi Vivian, I'm not aware of that, sorry. As an alternative, have you tried adding the security group of the workers to the NFS server instead? Regards, Jake On 4/3/2023 5:09 am, Vivian Rook wrote: > Is there an option for adding security groups to a given magnum > template, and thus the nodes that such a template would create? > > I have an NFS server, and it is setup to only allow connections from > nodes with the "nfs" security group. A few pods in my cluster mount the > NFS server, and are blocked as a result. Is it possible to setup magnum > so that it adds the "nfs" security group to the worker nodes (it would > be alright if it has to be worker and control nodes)? > > Thank you! > > -- > *Vivian Rook (They/Them) > * > Site Reliability Engineer > Wikimedia Foundation > From hanguangyu2 at gmail.com Wed Mar 8 06:50:13 2023 From: hanguangyu2 at gmail.com (=?UTF-8?B?6Z+p5YWJ5a6H?=) Date: Wed, 8 Mar 2023 06:50:13 +0000 Subject: [cinder] Could I use system lvm volume group for cinder-volume Message-ID: Hello, I have a physical node and the system uses lvm partitions. With the service already running above, is it possible for me to deploy cinder-volume that uses the lvm backend without affecting the existing service? The physical disk space is basically allocated to the /dev/sda4 physical volume, and the pv wad added to the `uniontechos` volume group. 
All the space of `uniontechos` vg is allocated to the system logical volume (/dev/uniontechos/root), there is 5T space (a lot of free space). I want to try to reduce the system logical volume??/dev/uniontechos/root, and let cinder-volume use the uniontechos volume group at the same time , I don't know if it is feasible. Could I get some advices? Thank for any help! ```shell # pvdisplay ... --- Physical volume --- PV Name /dev/sda4 VG Name uniontechos PV Size <5.43 TiB / not usable 2.00 MiB Allocatable yes (but full) PE Size 4.00 MiB Total PE 1422313 Free PE 0 Allocated PE 1422313 PV UUID 8LF270-LYD1-kuP1-iWZb-BZEQ-LdA8-UwEa58 # lvdisplay --- Logical volume --- LV Path /dev/uniontechos/root LV Name root VG Name uniontechos LV UUID cEypcY-xcbC-JFoO-d3MS-uMPU-ntQa-WEeE1c LV Write Access read/write LV Creation host, time compute2, 2022-04-28 15:21:27 +0800 LV Status available # open 1 LV Size 5.42 TiB Current LE 1421289 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 8192 Block device 253:0 ``` Best wishes, Han From manchandavishal143 at gmail.com Wed Mar 8 07:38:05 2023 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Wed, 8 Mar 2023 13:08:05 +0530 Subject: [horizon] Cancelling Today's Weekly meeting Message-ID: Hello Team, As it is a holiday for me, I will not be able to host today's horizon weekly meeting. So let's cancel today's weekly meeting. If anything urgent, please reach out to the horizon core team. Thanks & regards, Vishal Manchanda -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Mar 8 09:13:11 2023 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 8 Mar 2023 10:13:11 +0100 Subject: [largescale-sig] Next meeting: March 8, 9utc In-Reply-To: <17105066-2c6a-d4ff-bbde-15fcd33edbad@openstack.org> References: <17105066-2c6a-d4ff-bbde-15fcd33edbad@openstack.org> Message-ID: <33527716-9684-8606-553f-42e757981264@openstack.org> Due to lack of participants, we had a rather short SIG meeting today. I just gave an update on the preparation for our next OpenInfra Live episode and participation to Vancouver summit. You can read the detailed meeting logs at: https://meetings.opendev.org/meetings/large_scale_sig/2023/large_scale_sig.2023-03-08-09.00.html Our next IRC meeting will be March 22, at 1500utc on #openstack-operators on OFTC. Regards, -- Thierry Carrez (ttx) From bkslash at poczta.onet.pl Wed Mar 8 09:32:35 2023 From: bkslash at poczta.onet.pl (A Tom) Date: Wed, 8 Mar 2023 10:32:35 +0100 Subject: [Magnum] Magnum service status in Openstack Antelope Message-ID: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> Hi, Will magnum be supported in Antelope and further releases of Openstack? I was looking for some informations about this and I?m quite confused, because here I don?t see Magnum service: https://releases.openstack.org/antelope/index.html and here Magnum is mentioned: https://docs.openstack.org/2023.1.antelope/projects.html Which version of kubernetes is supported in Zed, and which (if) will be in Antelope? 
Because there?s no such information in compatibility matrix: https://wiki.openstack.org/wiki/Magnum Best regards, Adam Tomas From ralonsoh at redhat.com Wed Mar 8 10:43:52 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 8 Mar 2023 11:43:52 +0100 Subject: [neutron] New Neutron releases for Xena, Yoga and Zed Message-ID: Hello Neutrinos: Please check the patches for the new stable releases for the Neutron projects: * Xena: https://review.opendev.org/c/openstack/releases/+/876827 * Yoga: https://review.opendev.org/c/openstack/releases/+/876828 * Zed: https://review.opendev.org/c/openstack/releases/+/876835 Feel free to comment on the patch if you found something wrong or a missing pending patch. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vrook at wikimedia.org Wed Mar 8 11:48:24 2023 From: vrook at wikimedia.org (Vivian Rook) Date: Wed, 8 Mar 2023 06:48:24 -0500 Subject: [magnum] security groups for magnum nodes In-Reply-To: <09c975e9-4c4d-ffce-4e63-709846aaa725@ardc.edu.au> References: <09c975e9-4c4d-ffce-4e63-709846aaa725@ardc.edu.au> Message-ID: Hi Jake, Yeah I gave that a try, and it does work. Though when I've tried similar it causes problems with removing a cluster, failing on not being able to remove the cluster security group because something other than the cluster is using it. Mostly that is the answer that I was looking for, that this feature doesn't exist. So I can add and remove the security group manually, and can probably do something better in terraform, but we're not quite there yet :) Thank you! On Tue, Mar 7, 2023 at 8:16?PM Jake Yip wrote: > Hi Vivian, > > I'm not aware of that, sorry. > > As an alternative, have you tried adding the security group of the > workers to the NFS server instead? > > Regards, > Jake > > On 4/3/2023 5:09 am, Vivian Rook wrote: > > Is there an option for adding security groups to a given magnum > > template, and thus the nodes that such a template would create? > > > > I have an NFS server, and it is setup to only allow connections from > > nodes with the "nfs" security group. A few pods in my cluster mount the > > NFS server, and are blocked as a result. Is it possible to setup magnum > > so that it adds the "nfs" security group to the worker nodes (it would > > be alright if it has to be worker and control nodes)? > > > > Thank you! > > > > -- > > *Vivian Rook (They/Them) > > * > > Site Reliability Engineer > > Wikimedia Foundation > > > -- *Vivian Rook (They/Them)* Site Reliability Engineer Wikimedia Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnaud.morin at gmail.com Wed Mar 8 13:29:37 2023 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Wed, 8 Mar 2023 13:29:37 +0000 Subject: [neutron][largescale-sig] agent_down_time and report_interval Message-ID: Hello neutron and large-scalers, Is there any recommendation on tuning the neutron report_interval (agent) and agent_down_time (server) to "optimize" the communication between agents and servers without putting to much heavy duty on both rabbit and database? We are currently facing some scaling issue regarding this, and we found out that, at least, CERN did some tweak about this ([1] and [2]) Is there anyone else with specific configuration on that part? I have the feeling that this could be increased (so report will happen less often). 
One obvious side effect of this is the fact that the server will take more time to see a down agent, is there any other side effect that could happen? Is it something we can eventually add in our large-scale documentation ([3])? Cheers, [1] https://youtu.be/5WL47L1P5kE?t=1173 [2] https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/presentation-media/Evolution-of-OpenStack-Networking-at-CERN3.pdf [3] https://docs.openstack.org/large-scale/journey/configure/index.html From felix.huettner at mail.schwarz Wed Mar 8 13:58:56 2023 From: felix.huettner at mail.schwarz (=?iso-8859-1?Q?Felix_H=FCttner?=) Date: Wed, 8 Mar 2023 13:58:56 +0000 Subject: [neutron][largescale-sig] agent_down_time and report_interval In-Reply-To: References: Message-ID: Hi everyone, i can share that we use a agent_down_time of 3600 seconds and the default report_interval for clusters with around 600 compute nodes. -- Felix Huettner > -----Original Message----- > From: Arnaud Morin > Sent: Wednesday, March 8, 2023 2:30 PM > To: discuss openstack > Subject: [neutron][largescale-sig] agent_down_time and report_interval > > Hello neutron and large-scalers, > > Is there any recommendation on tuning the neutron report_interval > (agent) and agent_down_time (server) to "optimize" the communication > between agents and servers without putting to much heavy duty on both > rabbit and database? > > We are currently facing some scaling issue regarding this, and we found > out that, at least, CERN did some tweak about this ([1] and [2]) > > Is there anyone else with specific configuration on that part? > > I have the feeling that this could be increased (so report will happen > less often). One obvious side effect of this is the fact that the server > will take more time to see a down agent, is there any other side effect > that could happen? > > Is it something we can eventually add in our large-scale > documentation ([3])? > > Cheers, > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From michal.arbet at ultimum.io Wed Mar 8 14:37:34 2023 From: michal.arbet at ultimum.io (Michal Arbet) Date: Wed, 8 Mar 2023 15:37:34 +0100 Subject: [Magnum] Magnum service status in Openstack Antelope In-Reply-To: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> References: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> Message-ID: Hi, I am also curious about that. What I registered in email communication there were the plans to implement https://github.com/vexxhost/magnum-cluster-api, so I hope that it will be supported because I wanted to try it :). Any magnum core or developers to give us this information ? Thanks Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook st 8. 3. 2023 v 10:38 odes?latel A Tom napsal: > Hi, > Will magnum be supported in Antelope and further releases of Openstack? 
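For the report_interval / agent_down_time question above, a short sketch of where the two knobs live. The values are examples only, in the direction Felix describes (the defaults are report_interval = 30 and agent_down_time = 75, and agent_down_time should stay well above twice the report_interval):

```shell
# On the agent nodes (OVS/L3/DHCP agents): how often a state report is sent
# over RPC. Example value; the default is 30 seconds.
cat >> /etc/neutron/neutron.conf <<'EOF'
[agent]
report_interval = 120
EOF

# On the neutron-server nodes: how long without a report before an agent is
# considered down. Example value; the default is 75 seconds.
cat >> /etc/neutron/neutron.conf <<'EOF'
[DEFAULT]
agent_down_time = 600
EOF
```

The trade-off is the one already raised in the thread: fewer report messages on RabbitMQ and fewer heartbeat writes to the database, at the cost of the server noticing a dead agent (and rescheduling its DHCP/L3 resources) more slowly.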
I > was looking for some informations about this and I?m quite confused, > because here I don?t see Magnum service: > https://releases.openstack.org/antelope/index.html > > and here Magnum is mentioned: > https://docs.openstack.org/2023.1.antelope/projects.html > > Which version of kubernetes is supported in Zed, and which (if) will be in > Antelope? Because there?s no such information in compatibility matrix: > https://wiki.openstack.org/wiki/Magnum > > > Best regards, > > Adam Tomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Mar 8 14:46:05 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 Mar 2023 14:46:05 +0000 Subject: [Magnum] Magnum service status in Openstack Antelope In-Reply-To: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> References: <340E3118-3D66-4C91-81DC-E7D27AEBA22D@poczta.onet.pl> Message-ID: <20230308144605.qp2lpcyf3pojhmhu@yuggoth.org> On 2023-03-08 10:32:35 +0100 (+0100), A Tom wrote: > Will magnum be supported in Antelope and further releases of > Openstack? I was looking for some informations about this and I?m > quite confused, because here I don?t see Magnum service: > https://releases.openstack.org/antelope/index.html [...] There's an initial release candidate linked for it there now, a handful of OpenStack services needed a little more time to tag their RCs and got an requested an extension on last week's deadline. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From kristin at openinfra.dev Wed Mar 8 18:26:40 2023 From: kristin at openinfra.dev (Kristin Barrientos) Date: Wed, 8 Mar 2023 12:26:40 -0600 Subject: OpenInfra Live March 9, 2023, at 9 a.m. CT Message-ID: Hi everyone, This week?s OpenInfra Live episode is brought to you by OpenMetal Episode: Examination of Cost Differences b/w Private Cloud and Hyperscalers Todd Robinson, president of OpenMetal, will be discussing the examination of cost differences between private cloud and hyperscalers. Date and time: March 9, 2023, at 9 a.m. CT (15:00 UTC) You can watch us live on: YouTube: https://www.youtube.com/watch?v=6GCGhuRpPqM LinkedIn: https://www.linkedin.com/events/7038909988492242944/comments/ WeChat: recording will be posted on OpenStack WeChat after the live stream Speakers: Todd Robinson Have an idea for a future episode? Share it now at ideas.openinfra.live. Thanks, Kristin Barrientos Marketing Coordinator OpenInfra Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From roberto.acosta at luizalabs.com Wed Mar 8 20:49:12 2023 From: roberto.acosta at luizalabs.com (Roberto Bartzen Acosta) Date: Wed, 8 Mar 2023 17:49:12 -0300 Subject: [neutron] Openstack Network Interconnection Message-ID: Hey folks. Does anyone have ideas on how to interconnect different Openstack deployments? Consider that we have multiple Datacenters and need to interconnect tenant networks. How could this be done in the context of OpenStack (without using VPN) ? We have some ideas about the usage of OVN-IC (OVN Interconnect). It looks like a great solution to create a network layer between DCs/AZs with the help of the OVN driver. However, Neutron does not support the Transit Switches (OVN-IC design) that are required for this application. We've seen references to abandoned projects like [1] [2] [3]. 
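To make the transit-switch gap concrete, this is roughly what a manual OVN-IC hookup looks like today, done directly against the OVN databases rather than through Neutron. Names, the MAC and the 169.254.100.0/24 transit subnet are illustrative, and nothing below is reflected in the Neutron DB, which is exactly the missing integration:

```shell
# On the interconnection NB database: create a transit switch shared by the AZs.
ovn-ic-nbctl ts-add ts-az1-az2

# In each AZ (az1 shown), ovn-ic mirrors the transit switch into the local
# northbound DB; attach the AZ's logical router with a router port plus a
# peer port of type "router" on the transit switch.
ovn-nbctl lrp-add lr-az1 lrp-lr-az1-ts aa:aa:aa:aa:aa:01 169.254.100.1/24
ovn-nbctl lsp-add ts-az1-az2 lsp-ts-lr-az1 \
    -- lsp-set-addresses lsp-ts-lr-az1 router \
    -- lsp-set-type lsp-ts-lr-az1 router \
    -- lsp-set-options lsp-ts-lr-az1 router-port=lrp-lr-az1-ts

# Pin the router port to a gateway chassis so cross-AZ traffic has an exit point.
ovn-nbctl lrp-set-gateway-chassis lrp-lr-az1-ts <gateway-chassis-name> 10
```

Routes between the AZ routers then still have to be added statically or exchanged through ovn-ic's route learning and advertising options, and because Neutron knows nothing about any of it, a service plugin along the lines of the abandoned neutron-interconnection work would be needed to manage it cleanly.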
Does anyone use something similar in production or have an idea about how to do it? Imagine that we need to put workloads on two different AZs that run different Openstack installations, and we want to communicate with the local networks without using a FIP. I believe that the most coherent way to maintain databases consistent in each Openstack would be an integration with Neutron, but I haven't seen any movement on that. Regards, Roberto [1] https://www.youtube.com/watch?v=GizLmSiH1Q0 [2] https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html [3] https://opendev.org/x/neutron-interconnection -- _?Esta mensagem ? direcionada apenas para os endere?os constantes no cabe?alho inicial. Se voc? n?o est? listado nos endere?os constantes no cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o imediatamente anuladas e proibidas?._ *?**?Apesar do Magazine Luiza tomar todas as precau??es razo?veis para assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o poder? aceitar a responsabilidade por quaisquer perdas ou danos causados por esse e-mail ou por seus anexos?.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From desirebarine16 at gmail.com Thu Mar 9 06:45:21 2023 From: desirebarine16 at gmail.com (Desire Barine) Date: Thu, 9 Mar 2023 07:45:21 +0100 Subject: [outreachy][cinder] Message-ID: Hello Sofia Enriquez, I'm Desire Barine, an Outreachy applicant. I would love to work on Extend automated validation of API reference request/response samples project. I would like to get started with the contribution. I am currently going over the instructions on contributions given. This is my first time contributing on an open source project but I'm really excited to get started. I'm proficient in python, bash and have worked on Rest api creation before. I would love to hear from you. Desire. -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix.huettner at mail.schwarz Thu Mar 9 08:24:21 2023 From: felix.huettner at mail.schwarz (=?iso-8859-1?Q?Felix_H=FCttner?=) Date: Thu, 9 Mar 2023 08:24:21 +0000 Subject: [neutron] Openstack Network Interconnection In-Reply-To: References: Message-ID: Hi Roberto, We will face a similar issue in the future and have also looked at ovn-interconnect (but not yet tested it). There is also ovn-bgp-agent [1] which has an evpn mode that might be relevant. Whatever you find I would definitely be interested in your results [1] https://opendev.org/x/ovn-bgp-agent -- Felix Huettner From: Roberto Bartzen Acosta Sent: Wednesday, March 8, 2023 9:49 PM To: openstack-discuss at lists.openstack.org Cc: Tiago Pires Subject: [neutron] Openstack Network Interconnection Hey folks. Does anyone have ideas on how to interconnect different Openstack deployments? Consider that we have multiple Datacenters and need to interconnect tenant networks. How could this be done in the context of OpenStack (without using VPN) ? We have some ideas about the usage of OVN-IC (OVN Interconnect). It looks like a great solution to create a network layer between DCs/AZs with the help of the OVN driver. However, Neutron does not support the Transit Switches (OVN-IC design) that are required for this application. We've seen references to abandoned projects like [1] [2] [3]. Does anyone use something similar in production or have an idea about how to do it? 
Imagine that we need to put workloads on two different AZs that run different Openstack installations, and we want to communicate with the local networks without using a FIP. I believe that the most coherent way to maintain databases consistent in each Openstack would be an integration with Neutron, but I haven't seen any movement on that. Regards, Roberto [1] https://www.youtube.com/watch?v=GizLmSiH1Q0 [2] https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html [3] https://opendev.org/x/neutron-interconnection 'Esta mensagem ? direcionada apenas para os endere?os constantes no cabe?alho inicial. Se voc? n?o est? listado nos endere?os constantes no cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o imediatamente anuladas e proibidas'. 'Apesar do Magazine Luiza tomar todas as precau??es razo?veis para assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o poder? aceitar a responsabilidade por quaisquer perdas ou danos causados por esse e-mail ou por seus anexos'. Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Thu Mar 9 09:55:14 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 9 Mar 2023 10:55:14 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: References: <20230306113543.a57aywefbn4cgsu3@localhost> Message-ID: <20230309095514.l3i67tys2ujaq6dp@localhost> On 06/03, Rishat Azizov wrote: > Hi, > > It works with smaller volumes. > > multipath.conf attached to thist email. > > Cinder version - 18.2.0 Wallaby Hi, After giving it some thought I think I may know what is going on. If you have DEBUG logs enabled in cinder-backup when it fails, how many calls do you see in the cinder-backup to "multipath -f" from os-brick, only one or do you see more? Cheers, Gorka. > > ??, 6 ???. 2023??. ? 17:35, Gorka Eguileor : > > > On 16/02, Rishat Azizov wrote: > > > Hello! > > > > > > We have an error with creating backups from iscsi volume. Usually, this > > > happens with large backups over 100GB. > > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > [req-f6619913-6f96-4226-8d75-2da3fca722f1 > > 23de1b92e7674cf59486f07ac75b886b > > > a7585b47d1f143e9839c49b4e3bbe1b4 - - -] Exception during message > > handling: > > > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error > > while > > > running command. 
> > > Command: multipath -f 3624a93705842cfae35d7483200015ec6 > > > Exit code: 1 > > > Stdout: '' > > > Stderr: 'Feb 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a > > > multipath device\n' > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Traceback > > > (most recent call last): > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line > > 165, > > > in _process_incoming > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server res = > > > self.dispatcher.dispatch(message) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line > > > 309, in dispatch > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > self._do_dispatch(endpoint, method, ctxt, args) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line > > > 229, in _do_dispatch > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = > > > func(ctxt, **new_args) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/utils.py", line 890, in wrapper > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > func(self, *args, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 410, in > > > create_backup > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > volume_utils.update_backup_error(backup, str(err)) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in > > > __exit__ > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > self.force_reraise() > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in > > > force_reraise > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise > > > self.value > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 399, in > > > create_backup > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server updates > > = > > > self._run_backup(context, backup, volume) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 493, in > > > _run_backup > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > ignore_errors=True) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1066, > > in > > > _detach_device > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > force=force, ignore_errors=ignore_errors) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 141, in > > > trace_logging_wrapper > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > f(*args, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line > > 360, > 
> > in inner > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > f(*args, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > > > line 880, in disconnect_volume > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > is_disconnect_call=True) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > > > line 942, in _cleanup_connection > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > self._linuxscsi.flush_multipath_device(multipath_name) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line > > > 382, in flush_multipath_device > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > root_helper=self._root_helper) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in > > > _execute > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server result = > > > self.__execute(*args, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line > > > 172, in execute > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > execute_root(*cmd, **kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line > > 247, > > > in _wrap > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server return > > > self.channel.remote_call(name, args, kwargs) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in > > > remote_call > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server raise > > > exc_type(*result[2]) > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error > > while > > > running command. > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Command: > > > multipath -f 3624a93705842cfae35d7483200015ec6 > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Exit code: 1 > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stdout: '' > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stderr: 'Feb > > > 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a multipath > > device\n' > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > > Could you please help with this error? > > > > Hi, > > > > Does it work for smaller volumes or does it also fail? > > > > What are your defaults in your /etc/multipath.conf file? > > > > What Cinder release are you using? > > > > Cheers, > > Gorka. 
> > > > > defaults { > user_friendly_names no > find_multipaths yes > enable_foreign "^$" > } > > blacklist_exceptions { > property "(SCSI_IDENT_|ID_WWN)" > } > > blacklist { > } > > devices { > device { > vendor "PURE" > product "FlashArray" > fast_io_fail_tmo 10 > path_grouping_policy "group_by_prio" > failback "immediate" > prio "alua" > hardware_handler "1 alua" > max_sectors_kb 4096 > } > } From arnaud.morin at gmail.com Thu Mar 9 10:35:13 2023 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Thu, 9 Mar 2023 10:35:13 +0000 Subject: [neutron][largescale-sig] agent_down_time and report_interval In-Reply-To: References: Message-ID: Thanks! On 08.03.23 - 13:58, Felix H?ttner wrote: > Hi everyone, > > i can share that we use a agent_down_time of 3600 seconds and the default report_interval for clusters with around 600 compute nodes. > > -- > Felix Huettner > > > -----Original Message----- > > From: Arnaud Morin > > Sent: Wednesday, March 8, 2023 2:30 PM > > To: discuss openstack > > Subject: [neutron][largescale-sig] agent_down_time and report_interval > > > > Hello neutron and large-scalers, > > > > Is there any recommendation on tuning the neutron report_interval > > (agent) and agent_down_time (server) to "optimize" the communication > > between agents and servers without putting to much heavy duty on both > > rabbit and database? > > > > We are currently facing some scaling issue regarding this, and we found > > out that, at least, CERN did some tweak about this ([1] and [2]) > > > > Is there anyone else with specific configuration on that part? > > > > I have the feeling that this could be increased (so report will happen > > less often). One obvious side effect of this is the fact that the server > > will take more time to see a down agent, is there any other side effect > > that could happen? > > > > Is it something we can eventually add in our large-scale > > documentation ([3])? > > > > Cheers, > > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From nicolas.melot at lunarc.lu.se Thu Mar 9 10:38:57 2023 From: nicolas.melot at lunarc.lu.se (Nicolas Melot) Date: Thu, 9 Mar 2023 11:38:57 +0100 Subject: [cinder] Openstack-ansible and cinder GPFS backend Message-ID: <7e8cda5e51f7b02878ed92dc58920be4fff25f3e.camel@lunarc.lu.se> Hi, I can find doc on using various backends for cinder (https://docs.openstack.org/openstack-ansible-os_cinder/zed/configure-cinder.html#configuring-cinder-to-use-lvm) and some documentation to configure a GPFS backend for cinder (https://docs.openstack.org/cinder/zed/configuration/block-storage/drivers/ibm-gpfs-volume-driver.html) but I cannot find any documentation to deploy cinder with GPFS backend using openstack-ansible. Does this exist at all? Is there any documentation? /Nicolas From noonedeadpunk at gmail.com Thu Mar 9 11:32:36 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 9 Mar 2023 12:32:36 +0100 Subject: [cinder] Openstack-ansible and cinder GPFS backend In-Reply-To: <7e8cda5e51f7b02878ed92dc58920be4fff25f3e.camel@lunarc.lu.se> References: <7e8cda5e51f7b02878ed92dc58920be4fff25f3e.camel@lunarc.lu.se> Message-ID: Hi, Nicolas, No, we don't really maintain documentation for each cinder driver that's available. 
So we assume using an override variable for adjustment of cinder configuration to match the desired state. So basically, you can use smth like that in your user_variables.yml: cinder_backends: GPFSNFS: volume_backend_name: GPFSNFS volume_driver: cinder.volume.drivers.ibm.gpfs.GPFSNFSDriver cinder_cinder_conf_overrides: DEFAULT: gpfs_hosts: ip.add.re.ss gpfs_storage_pool: cinder gpfs_images_share_mode: copy_on_write .... I have no idea though if gpfs_* variables can be defined or not inside the backend section, as they're referenced in DEFAULT in docs. But overrides will work regardless. ??, 9 ???. 2023??. ? 11:41, Nicolas Melot : > > Hi, > > I can find doc on using various backends for cinder > (https://docs.openstack.org/openstack-ansible-os_cinder/zed/configure-cinder.html#configuring-cinder-to-use-lvm) > and some documentation to configure a GPFS backend for cinder > (https://docs.openstack.org/cinder/zed/configuration/block-storage/drivers/ibm-gpfs-volume-driver.html) > but I cannot find any documentation to deploy cinder with GPFS backend > using openstack-ansible. Does this exist at all? Is there any > documentation? > > /Nicolas > From finarffin at gmail.com Thu Mar 9 12:51:45 2023 From: finarffin at gmail.com (Jan Wasilewski) Date: Thu, 9 Mar 2023 13:51:45 +0100 Subject: [manila] Share configuration with cinder as a backend Message-ID: Hi, I am looking for instructions on how to configure a Manila service with Cinder as a backend. I have gone through the https://github.com/openstack/manila/blob/master/doc/source/configuration/shared-file-systems/drivers/generic-driver.rst page, and I am wondering if someone has a link to a preconfigured "golden image" that can be used as a service_image. Additionally, I am wondering if this configuration can be used with driver_handles_share_servers=True (generally, I see that both options are supported here), but are there any specific limitations? I have already configured other backends (standalone ZFS, NFS, and Huawei driver), but I would like to test share snapshot retrieval using Cinder as a backend. Unfortunately, as I see, Cinder is not as straightforward as other backends(or no one is using it), which is why I am asking for some hints. Thanks in advance /Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Thu Mar 9 14:06:23 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Thu, 9 Mar 2023 11:06:23 -0300 Subject: [manila] Share configuration with cinder as a backend In-Reply-To: References: Message-ID: Hello, Jan! Em qui., 9 de mar. de 2023 ?s 09:56, Jan Wasilewski escreveu: > Hi, > > I am looking for instructions on how to configure a Manila service with > Cinder as a backend. I have gone through the > https://github.com/openstack/manila/blob/master/doc/source/configuration/shared-file-systems/drivers/generic-driver.rst > page, and I am wondering if someone has a link to a preconfigured "golden > image" that can be used as a service_image. > As for the image to be used as a service image, we usually use [1] on CI. It is a Ubuntu 22 image with minimal things installed. You would need to create a Glance image with this image and then add the service image name to your Manila.conf in case you are deploying with driver_handles_share_servers=True (DHSS=True). Or in case you want to use DHSS=False, you will need to create the VM and add the instance name or ID to the manila.conf. The admin guide for Manila has instructions for both approaches [2]. 
> Additionally, I am wondering if this configuration can be used with > driver_handles_share_servers=True (generally, I see that both options are > supported here), but are there any specific limitations? > Yes, you can use this configuration with DHSS=True. There are known restrictions documented here [3]. > I have already configured other backends (standalone ZFS, NFS, and Huawei > driver), but I would like to test share snapshot retrieval using Cinder as > a backend. Unfortunately, as I see, Cinder is not as straightforward as > other backends(or no one is using it), which is why I am asking for some > hints. > The Generic driver is mostly used for testing on CI. We are aware of people using it in production environments but it's not something we recommend. You should be able to create and delete snapshots using the Generic driver, as well as create shares from snapshots. Reverting to snapshots is not something available, as per the feature support mapping [4]. > Thanks in advance > /Jan > [1] http://tarballs.openstack.org/manila-image-elements/images/manila-service-image-master.qcow2 [2] https://docs.openstack.org/manila/latest/admin/generic_driver.html [3] https://docs.openstack.org/manila/latest/admin/generic_driver.html#known-restrictions [4] https://docs.openstack.org/manila/latest/admin/share_back_ends_feature_support_mapping.html Please let me know if you have more questions carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From batmanustc at gmail.com Thu Mar 9 01:18:56 2023 From: batmanustc at gmail.com (Simon Jones) Date: Thu, 9 Mar 2023 09:18:56 +0800 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> Message-ID: Hi, all At last, I got the root cause of this 2 problem. And I suggest add these words to https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html: ``` Prerequisites: libvirt >= 7.9.0 . Like ubuntu-22.04, which use libvirt-8.0.0 by default. ``` Root cause of problem 1, which is "no valid host": - Because libvirt version is too low. Root cause of problem 2, which is "why there are topology in DPU in openstack create port command": - Because add --binding-profile params in openstack create port command, which is NOT right. ---- Simon Jones Dmitrii Shcherbakov ?2023?3?2??? 20:30??? > Hi {Sean, Simon}, > > > did you ever give a presentation on the DPU support > > Yes, there were a couple at different stages. > > The following is the one of the older ones that references the SMARTNIC > VNIC type but we later switched to REMOTE_MANAGED in the final code: > https://www.openvswitch.org/support/ovscon2021/slides/smartnic_port_binding.pdf, > however, it has a useful diagram on page 15 which shows the interactions of > different components. A lot of other content from it is present in the > OpenStack docs now which we added during the feature development. > > There is also a presentation with a demo that we did at the Open Infra > summit https://youtu.be/Amxp-9yEnsU (I could not attend but we prepared > the material after the features got merged). 
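Picking up the Manila generic-driver exchange above, a rough sketch of the DHSS=True pieces. The image name, flavor ID, credentials and backend name are placeholders, and the full option set (service network, Nova/Neutron/Cinder credentials) should be taken from the generic driver admin guide linked in [2]:

```shell
# Upload the service image referenced above (any recent manila-service-image
# build should do); the Glance image name is a placeholder.
openstack image create manila-service-image \
    --disk-format qcow2 --container-format bare \
    --file manila-service-image-master.qcow2

# Sketch of a generic-driver backend in /etc/manila/manila.conf (DHSS=True);
# also add "generic" to enabled_share_backends in [DEFAULT].
cat >> /etc/manila/manila.conf <<'EOF'
[generic]
share_backend_name = GENERIC
share_driver = manila.share.drivers.generic.GenericShareDriver
driver_handles_share_servers = True
service_image_name = manila-service-image
service_instance_flavor_id = 100
service_instance_user = manila
service_instance_password = manila
EOF
```

With DHSS=False the service image line is replaced by pointing the backend at an existing service VM by name or ID, as Carlos describes.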
> > Generally, as Sean described, the aim of this feature is to make the > interaction between components present at the hypervisor and the DPU side > automatic but, in order to make this workflow explicitly different from > SR-IOV or offload at the hypervisor side, one has to use the > "remote_managed" flag. This flag allows Nova to differentiate between > "regular" VFs and the ones that have to be programmed by a remote host > (DPU) - hence the name. > > A port needs to be pre-created with the remote-managed type - that way > when Nova tries to schedule a VM with that port attached, it will find > hosts which actually have PCI devices tagged with the "remote_managed": > "true" in the PCI whitelist. > > The important thing to note here is that you must not use PCI passthrough > directly for this - Nova will create a PCI device request automatically > with the remote_managed flag included. There is currently no way to > instruct Nova to choose one vendor/device ID vs the other for this (any > remote_managed=true device from a pool will match) but maybe the work that > was recently done to store PCI device information in the Placement service > will pave the way for such granularity in the future. > > Best Regards, > Dmitrii Shcherbakov > LP/MM/oftc: dmitriis > > > On Thu, Mar 2, 2023 at 1:54?PM Sean Mooney wrote: > >> adding Dmitrii who was the primary developer of the openstack integration >> so >> they can provide more insight. >> >> Dmitrii did you ever give a presentationon the DPU support and how its >> configured/integrated >> that might help fill in the gaps for simon? >> >> more inline. >> >> On Thu, 2023-03-02 at 11:05 +0800, Simon Jones wrote: >> > E... >> > >> > But there are these things: >> > >> > 1) Show some real happened in my test: >> > >> > - Let me clear that, I use DPU in compute node: >> > The graph in >> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html . >> > >> > - I configure exactly follow >> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, >> > which is said bellow in "3) Let me post all what I do follow this link". 
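A compressed sketch of the flow described here, matching the smartnic_dpu guide linked earlier. Network, flavor and image names are placeholders; note that there is no --binding-profile on the port and no PCI alias in the flavor, since Nova generates the PCI request itself from the remote_managed device spec:

```shell
# Pre-create the port with the remote-managed VNIC type.
openstack network create selfservice
openstack subnet create --network selfservice --subnet-range 172.1.1.0/24 selfservice-v4
openstack port create --network selfservice --vnic-type remote-managed pf0vf1

# Boot against that port; scheduling only succeeds on hosts whose [pci]
# whitelist tags matching VFs with "remote_managed": "true".
openstack server create --flavor m1.small --image ubuntu-22.04 \
    --nic port-id=pf0vf1 vm-with-dpu-port
```

Simon's note at the top of the thread applies on top of this: with a libvirt older than 7.9.0 on the compute host, the same "No valid host" error appears even when the whitelist is correct.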
>> > >> > - In my test, I found after first three command (which is "openstack >> > network create ...", "openstack subnet create", "openstack port create >> ..."), >> > there are network topology exist in DPU side, and there are rules exist >> in >> > OVN north DB, south DB of controller, like this: >> > >> > > ``` >> > > root at c1:~# ovn-nbctl show >> > > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 >> > > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) >> > > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 >> > > addresses: ["unknown"] >> > > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) >> > > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] >> > > >> > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding >> > > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a >> > > chassis : [] >> > > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 >> > > encap : [] >> > > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24", >> > > "neutron:device_id"="", "neutron:device_owner"="", >> > > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, >> > > "neutron:port_name"=pf0vf1, >> > > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", >> > > "neutron:revision_number"="1", >> > > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} >> > > >> > > root at c1c2dpu:~# sudo ovs-vsctl show >> > > 62cf78e5-2c02-471e-927e-1d69c2c22195 >> > > Bridge br-int >> > > fail_mode: secure >> > > datapath_type: system >> > > Port br-int >> > > Interface br-int >> > > type: internal >> > > Port ovn--1 >> > > Interface ovn--1 >> > > type: geneve >> > > options: {csum="true", key=flow, >> remote_ip="172.168.2.98"} >> > > Port pf0vf1 >> > > Interface pf0vf1 >> > > ovs_version: "2.17.2-24a81c8" >> > > ``` >> > > >> > That's why I guess "first three command" has already create network >> > topology, and "openstack server create" command only need to plug VF >> into >> > VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. >> no that jsut looks like the standard bridge toplogy that gets created >> when you provision >> the dpu to be used with openstac vai ovn. >> >> that looks unrelated to the neuton comamnd you ran. >> > >> > - In my test, then I run "openstack server create" command, I got ERROR >> > which said "No valid host...", which is what the email said above. >> > The reason has already said, it's nova-scheduler's PCI filter module >> report >> > no valid host. The reason "nova-scheduler's PCI filter module report no >> > valid host" is nova-scheduler could NOT see PCI information of compute >> > node. The reason "nova-scheduler could NOT see PCI information of >> compute >> > node" is compute node's /etc/nova/nova.conf configure remote_managed tag >> > like this: >> > >> > > ``` >> > > [pci] >> > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", >> > > "physical_network": null, "remote_managed": "true"} >> > > alias = { "vendor_id":"15b3", "product_id":"101e", >> > > "device_type":"type-VF", "name":"a1" } >> > > ``` >> > > >> > >> > 2) Discuss some detail design of "remote_managed" tag, I don't know if >> this >> > is right in the design of openstack with DPU: >> > >> > - In neutron-server side, use remote_managed tag in "openstack port >> create >> > ..." command. >> > This command will make neutron-server / OVN / ovn-controller / ovs to >> make >> > the network topology done, like above said. >> > I this this is right, because test shows that. 
>> that is not correct >> your test do not show what you think it does, they show the baisic bridge >> toplogy and flow configuraiton that ovn installs by defualt when it >> manages >> as ovs. >> >> please read the design docs for this feature for both nova and neutron to >> understand how the interacction works. >> >> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html >> >> https://specs.openstack.org/openstack/neutron-specs/specs/yoga/off-path-smartnic-dpu-port-binding-with-ovn.html >> > >> > - In nova side, there are 2 things should process, first is PCI >> passthrough >> > filter, second is nova-compute to plug VF into VM. >> > >> > If the link above is right, which remote_managed tag exists in >> > /etc/nova/nova.conf of controller node and exists in >> /etc/nova/nova.conf of >> > compute node. >> > As above ("- In my test, then I run "openstack server create" command") >> > said, got ERROR in this step. >> > So what should do in "PCI passthrough filter" ? How to configure ? >> > >> > Then, if "PCI passthrough filter" stage pass, what will do of >> nova-compute >> > in compute node? >> > >> > 3) Post all what I do follow this link: >> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. >> > - build openstack physical env, link plug DPU into compute mode, use VM >> as >> > controller ... etc. >> > - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. >> > - configure DPU side /etc/neutron/neutron.conf >> > - configure host side /etc/nova/nova.conf >> > - configure host side /etc/nova/nova-compute.conf >> > - run first 3 command >> > - last, run this command, got ERROR >> > >> > ---- >> > Simon Jones >> > >> > >> > Sean Mooney ?2023?3?1??? 18:35??? >> > >> > > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: >> > > > Thanks a lot !!! >> > > > >> > > > As you say, I follow >> > > > >> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. >> > > > And I want to use DPU mode. Not "disable DPU mode". >> > > > So I think I should follow the link above exactlly, so I use >> > > > vnic-type=remote_anaged. >> > > > In my opnion, after I run first three command (which is "openstack >> > > network >> > > > create ...", "openstack subnet create", "openstack port create >> ..."), the >> > > > VF rep port and OVN and OVS rules are all ready. >> > > not at that point nothign will have been done on ovn/ovs >> > > >> > > that will only happen after the port is bound to a vm and host. >> > > >> > > > What I should do in "openstack server create ..." is to JUST add PCI >> > > device >> > > > into VM, do NOT call neutron-server in nova-compute of compute node >> ( >> > > like >> > > > call port_binding or something). >> > > this is incorrect. >> > > > >> > > > But as the log and steps said in the emails above, nova-compute call >> > > > port_binding to neutron-server while running the command "openstack >> > > server >> > > > create ...". >> > > > >> > > > So I still have questions is: >> > > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do >> NOT >> > > call >> > > > neutron-server in nova-compute of compute node ( like call >> port_binding >> > > or >> > > > something)" . >> > > no this is not how its designed. >> > > until you attach the logical port to a vm (either at runtime or as >> part of >> > > vm create) >> > > the logical port is not assocated with any host or phsical dpu/vf. 
>> > > >> > > so its not possibel to instanciate the openflow rules in ovs form the >> > > logical switch model >> > > in the ovn north db as no chassie info has been populated and we do >> not >> > > have the dpu serial >> > > info in the port binding details. >> > > > 2) If it's right, how to deal with this? Which is how to JUST add >> PCI >> > > > device into VM, do NOT call neutron-server? By command or by >> configure? >> > > Is >> > > > there come document ? >> > > no this happens automaticaly when nova does the port binding which >> cannot >> > > happen until after >> > > teh vm is schduled to a host. >> > > > >> > > > ---- >> > > > Simon Jones >> > > > >> > > > >> > > > Sean Mooney ?2023?3?1??? 16:15??? >> > > > >> > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: >> > > > > > BTW, this link ( >> > > > > > >> > > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >> ) >> > > > > said >> > > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that >> WRONG ? >> > > > > >> > > > > no its not wrong but for dpu smart nics you have to make a choice >> when >> > > you >> > > > > deploy >> > > > > either they can be used in dpu mode in which case remote_managed >> > > shoudl be >> > > > > set to true >> > > > > and you can only use them via neutron ports with >> > > vnic-type=remote_managed >> > > > > as descried in that doc >> > > > > >> > > > > >> > > >> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port >> > > > > >> > > > > >> > > > > or if you disable dpu mode in the nic frimware then you shoudl >> remvoe >> > > > > remote_managed form the pci device list and >> > > > > then it can be used liek a normal vf either for neutron sriov >> ports >> > > > > vnic-type=direct or via flavor based pci passthough. >> > > > > >> > > > > the issue you were havign is you configured the pci device list to >> > > contain >> > > > > "remote_managed: ture" which means >> > > > > the vf can only be consumed by a neutron port with >> > > > > vnic-type=remote_managed, when you have "remote_managed: false" or >> > > unset >> > > > > you can use it via vnic-type=direct i forgot that slight detail >> that >> > > > > vnic-type=remote_managed is required for "remote_managed: ture". >> > > > > >> > > > > >> > > > > in either case you foudn the correct doc >> > > > > >> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >> > > > > neutorn sriov port configuration is documented here >> > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html >> > > > > and nova flavor based pci passthough is documeted here >> > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >> > > > > >> > > > > all three server slightly differnt uses. both neutron proceedures >> are >> > > > > exclusivly fo network interfaces. >> > > > > >> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >> > > > > requires the use of ovn deployed on the dpu >> > > > > to configure the VF contolplane. >> > > > > https://docs.openstack.org/neutron/latest/admin/config-sriov.html >> uses >> > > > > the sriov nic agent >> > > > > to manage the VF with ip tools. >> > > > > https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >> is >> > > > > intended for pci passthough >> > > > > of stateless acclerorators like qat devices. 
while the nova flavor >> > > approch >> > > > > cna be used with nics it not how its generally >> > > > > ment to be used and when used to passthough a nic expectation is >> that >> > > its >> > > > > not related to a neuton network. >> > > > > >> > > > > >> > > >> > > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From grasza at redhat.com Thu Mar 9 13:01:28 2023 From: grasza at redhat.com (Grzegorz Grasza) Date: Thu, 9 Mar 2023 14:01:28 +0100 Subject: [barbican] Canceling weekly meeting (March 14th) + PTG discussion topics Message-ID: Hi all, I'm on PTO next week, so I'm canceling the weekly meeting. We have just one last meeting before the Virtual PTG, so I booked a time slot for us on Tuesday, March 28th at 13:00 UTC. Please add topics you would like to discuss to the etherpad: https://etherpad.opendev.org/p/march2023-ptg-barbican If there are more things to discuss, I'll book another time slot later in the week. Thanks, / Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From x3om6ak at gmail.com Thu Mar 9 13:24:06 2023 From: x3om6ak at gmail.com (=?UTF-8?B?0JzQuNGF0LDQuNC7?=) Date: Thu, 9 Mar 2023 16:24:06 +0300 Subject: [neutron][ovn] domain-search dhcp option per network/subnet level Message-ID: Greetings to all! In my openstack setup (vanilla ubuntu 22.04 Zed release + OVN networking ) I try to find a way to setup a scheme where my instances should get dhcp search domain option on network/subnet level. For example - when I create network/subnet, I want to tell neutron, that all created ports in that network/subnet must receive my dhcp search-domain option Any help advices will appreciated. Thank you! -- Best regards, Mikhail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Mar 9 15:13:03 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 9 Mar 2023 16:13:03 +0100 Subject: [neutron][ovn] domain-search dhcp option per network/subnet level In-Reply-To: References: Message-ID: Hello Mikhail: Short answer is no, we don't support this. Please check [1] and [2] for more context. Regards. [1]https://bugs.launchpad.net/neutron/+bug/1960850 [2] https://review.opendev.org/c/openstack/neutron-specs/+/832658/12/specs/zed/support-dns-subdomains-at-a-network-level.rst On Thu, Mar 9, 2023 at 3:19?PM ?????? wrote: > Greetings to all! > In my openstack setup (vanilla ubuntu 22.04 Zed release + OVN networking ) > I try to find a way to setup a scheme where my instances should get dhcp > search domain option on network/subnet level. > > For example - when I create network/subnet, I want to tell neutron, that > all created ports in that network/subnet must receive my dhcp search-domain > option > > Any help advices will appreciated. Thank you! > > -- > Best regards, Mikhail. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From x3om6ak at gmail.com Thu Mar 9 15:23:38 2023 From: x3om6ak at gmail.com (=?UTF-8?B?0JzQuNGF0LDQuNC7?=) Date: Thu, 9 Mar 2023 18:23:38 +0300 Subject: [neutron][ovn] domain-search dhcp option per network/subnet level In-Reply-To: References: Message-ID: Thank you for your response. ??, 9 ???. 2023 ?., 18:13 Rodolfo Alonso Hernandez : > Hello Mikhail: > > Short answer is no, we don't support this. Please check [1] and [2] for > more context. > > Regards. 
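The closest workaround available today is per port rather than per network/subnet, via Neutron's extra DHCP options. Whether the OVN backend actually delivers option 119 this way depends on the Neutron/OVN versions in use, so treat this as something to verify on Zed rather than a guaranteed path; the port name and domain are placeholders:

```shell
# Hypothetical per-port workaround: set the domain-search extra DHCP option
# (option 119) on each port that should receive it.
openstack port set \
    --extra-dhcp-option name=domain-search,value=example.internal \
    my-instance-port
```

A true network or subnet level knob is what the spec Rodolfo links below was proposing, and that remains the open gap.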
> > [1]https://bugs.launchpad.net/neutron/+bug/1960850 > [2] > https://review.opendev.org/c/openstack/neutron-specs/+/832658/12/specs/zed/support-dns-subdomains-at-a-network-level.rst > > On Thu, Mar 9, 2023 at 3:19?PM ?????? wrote: > >> Greetings to all! >> In my openstack setup (vanilla ubuntu 22.04 Zed release + OVN networking >> ) I try to find a way to setup a scheme where my instances should get dhcp >> search domain option on network/subnet level. >> >> For example - when I create network/subnet, I want to tell neutron, that >> all created ports in that network/subnet must receive my dhcp search-domain >> option >> >> Any help advices will appreciated. Thank you! >> >> -- >> Best regards, Mikhail. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kristin at openinfra.dev Thu Mar 9 18:06:33 2023 From: kristin at openinfra.dev (Kristin Barrientos) Date: Thu, 9 Mar 2023 12:06:33 -0600 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope Message-ID: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Hi everyone, As we get closer to the OpenStack release, I wanted to reach out to see if any PTL?s were interested in providing their Antelope cycle highlights in an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we would get 4-6 projects represented. Previous examples of OpenStack release episodes can be found here[2]? and here[3] . Please let me know if you?re interested and I can provide next steps. If you would like to provide a project update but that time doesn?t work for you, please share a recording with me and I can get it added to the project navigator. Thanks, Kristin Barrientos Marketing Coordinator OpenInfra Foundation [1] https://openinfra.dev/live/ [2] https://www.youtube.com/watch?v=hwPfjvshxOM [3] https://www.youtube.com/watch?v=MSbB3L9_MeY -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Thu Mar 9 21:43:02 2023 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 9 Mar 2023 16:43:02 -0500 Subject: [neutron] bonding sriov nic inside VMs Message-ID: Folks, As you know, SR-IOV doesn't support bonding so the only solution is to implement LACP bonding inside the VM. I did some tests in the lab to create two physnet and map them with two physical nic and create VF and attach them to VM. So far all good but one problem I am seeing is each neutron port I create has an IP address associated and I can use only one IP on bond but that is just a waste of IP in the Public IP pool. Are there any way to create sriov port but without IP address? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 9 22:52:20 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 9 Mar 2023 14:52:20 -0800 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope In-Reply-To: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> References: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Message-ID: I'll gladly represent Ironic again for the Antelope cycle highlights session. Thanks for running it! - Jay Faulkner Ironic PTL TC Member On Thu, Mar 9, 2023 at 10:19?AM Kristin Barrientos wrote: > Hi everyone, > > As we get closer to the OpenStack release, I wanted to reach out to see if > any PTL?s were interested in providing their Antelope cycle highlights in > an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we > would get 4-6 projects represented. 
Previous examples of OpenStack release > episodes can be found here[2] > and here[3] > . > > Please let me know if you?re interested and I can provide next steps. If > you would like to provide a project update but that time doesn?t work for > you, please share a recording with me and I can get it added to the project > navigator. > > Thanks, > > Kristin Barrientos > Marketing Coordinator > OpenInfra Foundation > > [1] https://openinfra.dev/live/ > > [2] https://www.youtube.com/watch?v=hwPfjvshxOM > > [3] https://www.youtube.com/watch?v=MSbB3L9_MeY > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 9 23:15:43 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 9 Mar 2023 15:15:43 -0800 Subject: [ironic][ptg] vPTG scheduling Message-ID: Hey all, The vPTG will be upon us soon, the week of March 27. I booked the following times on behalf of Ironic + BM SIG Operator hour, in accordance with what times worked in Antelope. It's my hope that since we've had little contributor turnover, these times continue to work. I'm completely open to having things moved around if it's more convenient to participants. I've booked the following times, all in Folsom: - Tuesday 1400 UTC - 1700 UTC - Wednesday 1300 UTC Operator hour: baremetal SIG - Wednesday 1400 UTC - 1600 UTC - Wednesday 2200 - 2300 UTC I propose that after the Ironic meeting on March 20, we shortly sync up in the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) to pick topics and assign time. Again, this is all meant to be a suggestion, I'm happy to move things around but didn't want us to miss out on getting things booked. - Jay Faulkner Ironic PTL TC Member -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Fri Mar 10 07:20:57 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Fri, 10 Mar 2023 16:20:57 +0900 Subject: [all] broken pepe8 jobs caused by bandit 1.7.5 Message-ID: fyi; It seems the new release of bandit (1.7.5) just came out and this introduces a new lint rule to require defining the timeout parameter for all "requests" calls. https://github.com/PyCQA/bandit/commit/5ff73ff8ff956df7d63fde49c3bd671db8e821eb This is currently affecting heat and quick search shows some of the other projects contain some code not compliant with this rule(barbican, ceilometer, cinder, glance, manila, nova, ...). Also, it seems we do not pin bandit by u-c for some reason this likely affects all stable branches. Actually I first noticed this when I tried to backport one fix to 2023.1 branch of heat... Thank you, Takashi -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Fri Mar 10 07:27:59 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Fri, 10 Mar 2023 16:27:59 +0900 Subject: [all] broken pepe8 jobs caused by bandit 1.7.5 In-Reply-To: References: Message-ID: On Fri, Mar 10, 2023 at 4:20?PM Takashi Kajinami wrote: > fyi; > > It seems the new release of bandit (1.7.5) just came out and this > introduces a new lint rule > to require defining the timeout parameter for all "requests" calls. > > https://github.com/PyCQA/bandit/commit/5ff73ff8ff956df7d63fde49c3bd671db8e821eb > > This is currently affecting heat and quick search shows some of the other > projects contain some code > not compliant with this rule(barbican, ceilometer, cinder, glance, manila, > nova, ...). 
> Seems some of these (ceilometer, cinder, glance and manila) are not using bandit and others(nova) have the upper version defined. SO it might not affect limited number of projects using bandit without upper version but I'd recommend you check your own projects . > Also, it seems we do not pin bandit by u-c for some reason this likely > affects all stable branches. > Actually I first noticed this when I tried to backport one fix to 2023.1 > branch of heat... > > Thank you, > Takashi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Fri Mar 10 10:04:47 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 10 Mar 2023 11:04:47 +0100 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope In-Reply-To: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> References: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Message-ID: Le jeu. 9 mars 2023 ? 19:13, Kristin Barrientos a ?crit : > Hi everyone, > > As we get closer to the OpenStack release, I wanted to reach out to see if > any PTL?s were interested in providing their Antelope cycle highlights in > an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we > would get 4-6 projects represented. Previous examples of OpenStack release > episodes can be found here[2] > and here[3] > . > > Please let me know if you?re interested and I can provide next steps. If > you would like to provide a project update but that time doesn?t work for > you, please share a recording with me and I can get it added to the project > navigator. > > I can help again for the Nova project. > Thanks, > > Kristin Barrientos > Marketing Coordinator > OpenInfra Foundation > > [1] https://openinfra.dev/live/ > > [2] https://www.youtube.com/watch?v=hwPfjvshxOM > > [3] https://www.youtube.com/watch?v=MSbB3L9_MeY > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Fri Mar 10 10:44:30 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 10 Mar 2023 11:44:30 +0100 Subject: [all] broken pepe8 jobs caused by bandit 1.7.5 In-Reply-To: References: Message-ID: Le ven. 10 mars 2023 ? 08:33, Takashi Kajinami a ?crit : > > > On Fri, Mar 10, 2023 at 4:20?PM Takashi Kajinami > wrote: > >> fyi; >> >> It seems the new release of bandit (1.7.5) just came out and this >> introduces a new lint rule >> to require defining the timeout parameter for all "requests" calls. >> >> https://github.com/PyCQA/bandit/commit/5ff73ff8ff956df7d63fde49c3bd671db8e821eb >> >> This is currently affecting heat and quick search shows some of the other >> projects contain some code >> not compliant with this rule(barbican, ceilometer, cinder, glance, >> manila, nova, ...). >> > Seems some of these (ceilometer, cinder, glance and manila) are not using > bandit and others(nova) have > the upper version defined. SO it might not affect limited number of > projects using bandit without upper version > but I'd recommend you check your own projects . > > AFAIK, the Nova bandit specific tox target [1] isn't run on CI by any of the Zuul jobs we have [2] (we don't include a bandit check as part of a pep8 validation) I tested both 1.7.4 and 1.7.5 bandit versions on the tox target locally, and I don't see much of a difference. Sounds the issue is then unrelated to the Nova project, to clarify. 
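For projects that do gate on bandit, a sketch of the usual short-term options while the new B113 findings (requests calls without an explicit timeout) get addressed; the project name and paths are examples:

```shell
# Reproduce locally against the new release.
pip install 'bandit==1.7.5'
bandit -r myproject -x myproject/tests

# Option 1: cap bandit in test-requirements.txt until the code is fixed.
echo 'bandit>=1.7.0,<1.7.5 # cap until B113 request_without_timeout is handled' >> test-requirements.txt

# Option 2: fix the findings themselves by passing explicit timeouts, e.g.
#   requests.get(url)  ->  requests.get(url, timeout=30)
```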
-Sylvain

[1] https://github.com/openstack/nova/blob/master/tox.ini#L260-L265
[2] https://github.com/openstack/nova/blob/master/.zuul.yaml

>> Also, it seems we do not pin bandit by u-c for some reason, so this likely
>> affects all stable branches.
>> Actually I first noticed this when I tried to backport one fix to the 2023.1
>> branch of heat...
>>
>> Thank you,
>> Takashi
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From xek at redhat.com Fri Mar 10 10:48:56 2023
From: xek at redhat.com (Grzegorz Grasza)
Date: Fri, 10 Mar 2023 11:48:56 +0100
Subject: [barbican] Canceling weekly meeting (March 14th) + PTG discussion topics
Message-ID:

Hi all,

I'm on PTO next week, so I'm canceling the weekly meeting.

We have just one last meeting before the Virtual PTG, so I booked a time slot
for us on Tuesday, March 28th at 13:00 UTC. Please add topics you would like
to discuss to the etherpad:
https://etherpad.opendev.org/p/march2023-ptg-barbican

If there are more things to discuss, I'll book another time slot later in the
week.

Thanks,
/ Greg
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From smooney at redhat.com Fri Mar 10 11:57:28 2023
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 10 Mar 2023 11:57:28 +0000
Subject: [neutron] bonding sriov nic inside VMs
In-Reply-To:
References:
Message-ID:

On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote:
> Folks,
>
> As you know, SR-IOV doesn't support bonding so the only solution is to
> implement LACP bonding inside the VM.
>
> I did some tests in the lab to create two physnet and map them with two
> physical nic and create VF and attach them to VM. So far all good but one
> problem I am seeing is each neutron port I create has an IP address
> associated and I can use only one IP on bond but that is just a waste of IP
> in the Public IP pool.
>
> Are there any way to create sriov port but without IP address?
Technically we now support addressless ports in Neutron and Nova, so that
should be possible.
If you tried to do this with hardware-offloaded OVS rather than the standard
SR-IOV setup with the SR-IOV NIC agent, you will likely also need the
allowed_address_pairs extension to ensure that OVS does not drop the packets
based on the IP address. If you are using hierarchical port binding, where
your TOR is managed by an ML2 driver, you might also need the
allowed_address_pairs extension with the SR-IOV NIC agent to make sure the
packets are not dropped at the switch level.

As you likely already know, we do not support VF bonding in OpenStack, or
bonded ports in general in the Neutron API. There was an effort a few years
ago to make a bond port extension that mirrors how trunk ports work, i.e.
having 2 Neutron subports and a bond port that aggregates them, but we never
got that far with the design. That would have enabled bonding to be
implemented in the different ML2 drivers (OVS/SR-IOV/OVN etc.) with a
consistent/common API.

Some people have used Mellanox's VF LAG functionality, however that was never
actually enabled properly in Nova/Neutron, so it is not officially supported
upstream; but that functionality allows you to attach only a single VF to the
guest from bonded ports on a single card.

There is no official support in Nova/Neutron for that; as I said, it just
happens to work unintentionally, so I would not advise that you use it in
production unless you are happy to work through any issues you find yourself.
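To make the addressless-port idea above concrete, a rough sketch with the OpenStack client could look like the following. The network name, port name and address are placeholders, and exact flag support depends on your client and Neutron release:

```
# create an SR-IOV port with no fixed IP allocation
openstack port create --network provider-vlan100 --vnic-type direct \
    --no-fixed-ip sriov-bond-member-0

# let the bond's shared address through the port's anti-spoofing rules
openstack port set --allowed-address ip-address=203.0.113.10 sriov-bond-member-0
```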
From roberto.acosta at luizalabs.com Fri Mar 10 11:58:32 2023 From: roberto.acosta at luizalabs.com (Roberto Bartzen Acosta) Date: Fri, 10 Mar 2023 08:58:32 -0300 Subject: [neutron] Openstack Network Interconnection In-Reply-To: References: Message-ID: Hi Felix, Thanks for your feedback. The ovn-bgp-agent is a very powerful application to interconnect multi-tenancy networks using BGP evpn type 5. This application integrates the br-ext with FRR and provides the interconnect using the BGP session. That would be one way to do it, but the problem is that bgpvpn service plugin is only integrated with Neutron. Imagine in the future that we need to integrate the tenant network between different cloud solutions (e.g using OpenStack, Kubernetes, LXD, etc.)... this could be possible if everyone uses OVN as a network backend and ovn-ic to interconnect the LRPs between AZs. Maybe I'm missing some point and there's no community interest in something like that. But back to the OpenStack/Neutron case, it might be interesting to continue the work on Neutron interconnect (or something like that), but maybe this time with the service plugin for ovn-ic. Regards, Roberto Em qui., 9 de mar. de 2023 ?s 05:24, Felix H?ttner escreveu: > Hi Roberto, > > > > We will face a similar issue in the future and have also looked at > ovn-interconnect (but not yet tested it). > > There is also ovn-bgp-agent [1] which has an evpn mode that might be > relevant. > > > > Whatever you find I would definitely be interested in your results > > > > [1] https://opendev.org/x/ovn-bgp-agent > > > > -- > > Felix Huettner > > > > *From:* Roberto Bartzen Acosta > *Sent:* Wednesday, March 8, 2023 9:49 PM > *To:* openstack-discuss at lists.openstack.org > *Cc:* Tiago Pires > *Subject:* [neutron] Openstack Network Interconnection > > > > Hey folks. > > Does anyone have ideas on how to interconnect different Openstack > deployments? > Consider that we have multiple Datacenters and need to interconnect tenant > networks. How could this be done in the context of OpenStack (without using > VPN) ? > > We have some ideas about the usage of OVN-IC (OVN Interconnect). It looks > like a great solution to create a network layer between DCs/AZs with the > help of the OVN driver. However, Neutron does not support the Transit > Switches (OVN-IC design) that are required for this application. > > We've seen references to abandoned projects like [1] [2] [3]. > > Does anyone use something similar in production or have an idea about how > to do it? Imagine that we need to put workloads on two different AZs that > run different Openstack installations, and we want to communicate with the > local networks without using a FIP. > > I believe that the most coherent way to maintain databases consistent in > each Openstack would be an integration with Neutron, but I haven't seen any > movement on that. > > Regards, > Roberto > > [1] https://www.youtube.com/watch?v=GizLmSiH1Q0 > > [2] > https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html > > [3] https://opendev.org/x/neutron-interconnection > > > > > > > *?Esta mensagem ? direcionada apenas para os endere?os constantes no > cabe?alho inicial. Se voc? n?o est? 
listado nos endere?os constantes no > cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa > mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o > imediatamente anuladas e proibidas?.* > > * ?Apesar do Magazine Luiza tomar todas as precau??es razo?veis para > assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o > poder? aceitar a responsabilidade por quaisquer perdas ou danos causados > por esse e-mail ou por seus anexos?.* > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier . > -- _?Esta mensagem ? direcionada apenas para os endere?os constantes no cabe?alho inicial. Se voc? n?o est? listado nos endere?os constantes no cabe?alho, pedimos-lhe que desconsidere completamente o conte?do dessa mensagem e cuja c?pia, encaminhamento e/ou execu??o das a??es citadas est?o imediatamente anuladas e proibidas?._ *?**?Apesar do Magazine Luiza tomar todas as precau??es razo?veis para assegurar que nenhum v?rus esteja presente nesse e-mail, a empresa n?o poder? aceitar a responsabilidade por quaisquer perdas ou danos causados por esse e-mail ou por seus anexos?.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Fri Mar 10 12:27:01 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 10 Mar 2023 13:27:01 +0100 Subject: [neutron] Drivers meeting cancelled Message-ID: Hello Neutrinos: Due to the lack of agenda, today's drivers meeting is cancelled. Have a nice weekend! -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 10 13:30:21 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 10 Mar 2023 08:30:21 -0500 Subject: [neutron] bonding sriov nic inside VMs In-Reply-To: References: Message-ID: Thanks Sean, I don't have NIC which supports hardware offloading or any kind of feature. I am using intel nic 82599 just for SRIOV and looking for bonding support which is only possible inside VM. As you know we already run a large SRIOV environment with openstack but my biggest issue is to upgrade switches without downtime. I want to be more resilient to not worry about that. Do you still think it's dangerous or not a good idea to bond sriov nic inside VM? what could go wrong here just trying to understand before i go crazy :) On Fri, Mar 10, 2023 at 6:57?AM Sean Mooney wrote: > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote: > > Folks, > > > > As you know, SR-IOV doesn't support bonding so the only solution is to > > implement LACP bonding inside the VM. > > > > I did some tests in the lab to create two physnet and map them with two > > physical nic and create VF and attach them to VM. So far all good but one > > problem I am seeing is each neutron port I create has an IP address > > associated and I can use only one IP on bond but that is just a waste of > IP > > in the Public IP pool. > > > > Are there any way to create sriov port but without IP address? > techinially we now support adressless port in neutron and nova. > so that shoudl be possible. 
> if you tried to do this with hardware offloaed ovs rather then the
> standard sriov with the sriov
> nic agent you likel will need to also use the allowed_adress_pairs
> extension to ensure that ovs did not
> drop the packets based on the ip adress. if you are using heriarcical port
> binding where you TOR is manged
> by an ml2 driver you might also need the allowed_adress_pairs extension
> with the sriov nic agent to make sure
> the packets are not drop at the swtitch level.
>
> as you likely arlready no we do not support VF bonding in openstack or
> bonded ports in general in then neutron api.
> there was an effort a few years ago to make a bond port extention that
> mirror hwo trunk ports work
> i.e. hanving 2 neutron subport and a bond port that agreates them but we
> never got that far with
> the design. that would have enabeld boning to be implemtned in diffent ml2
> driver like ovs/sriov/ovn ectra with
> a consitent/common api.
>
> some people have used mellonox's VF lag functionalty howver that was never
> actully enable propelry in nova/neutron
> so its not officlaly supported upstream but that functional allow you to
> attach only a singel VF to the guest form
> bonded ports on a single card.
>
> there is no supprot in nova/neutron for that offically as i said it just
> happens to work unitnetionally so i would not
> advise that you use it in produciton unless your happy to work though any
> issues you find yourself.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From smooney at redhat.com Fri Mar 10 14:02:00 2023
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 10 Mar 2023 14:02:00 +0000
Subject: [neutron] bonding sriov nic inside VMs
In-Reply-To:
References:
Message-ID: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com>

On Fri, 2023-03-10 at 08:30 -0500, Satish Patel wrote:
> Thanks Sean,
>
> I don't have NIC which supports hardware offloading or any kind of feature.
> I am using intel nic 82599 just for SRIOV and looking for bonding
> support which is only possible inside VM. As you know we already run a
> large SRIOV environment with openstack but my biggest issue is to upgrade
> switches without downtime. I want to be more resilient to not worry
> about that.
>
> Do you still think it's dangerous or not a good idea to bond sriov nic
> inside VM? what could go wrong here just trying to understand before i go
> crazy :)
LACP bond modes generally don't work fully, but you should be able to get
basic failover bonding working, and perhaps TCP load balancing, provided it
does not require switch cooperation to work from inside the guest.

Just keep in mind that, by definition, if you declare a network as being on a
separate physnet from another, then you as the operator are asserting that
there is no L2 connectivity between those networks: VLAN 100 on physnet_1 is
intended to be a separate VLAN from VLAN 100 on physnet_2. If you break that
and use physnets to select PFs, you are also breaking the Neutron
multi-tenancy model, meaning it is not safe to allow end users to create VLAN
networks; instead you can only use provider-created VLAN networks.

So what you want to do is probably achievable, but you mention physnets per
PF, and that sounds like you are breaking the "physnets are separate isolated
physical networks" rule.
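As a rough sketch of the failover-only approach that needs no switch-side LACP cooperation, an active-backup bond inside the guest could be built along these lines (interface names and the address are made up for illustration):

```
# inside the VM: enslave the two VF interfaces into an active-backup bond
ip link add bond0 type bond mode active-backup miimon 100
ip link set ens4 down
ip link set ens4 master bond0
ip link set ens5 down
ip link set ens5 master bond0
ip link set bond0 up
ip addr add 203.0.113.10/24 dev bond0
```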
> > > > > On Fri, Mar 10, 2023 at 6:57?AM Sean Mooney wrote: > > > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote: > > > Folks, > > > > > > As you know, SR-IOV doesn't support bonding so the only solution is to > > > implement LACP bonding inside the VM. > > > > > > I did some tests in the lab to create two physnet and map them with two > > > physical nic and create VF and attach them to VM. So far all good but one > > > problem I am seeing is each neutron port I create has an IP address > > > associated and I can use only one IP on bond but that is just a waste of > > IP > > > in the Public IP pool. > > > > > > Are there any way to create sriov port but without IP address? > > techinially we now support adressless port in neutron and nova. > > so that shoudl be possible. > > if you tried to do this with hardware offloaed ovs rather then the > > standard sriov with the sriov > > nic agent you likel will need to also use the allowed_adress_pairs > > extension to ensure that ovs did not > > drop the packets based on the ip adress. if you are using heriarcical port > > binding where you TOR is manged > > by an ml2 driver you might also need the allowed_adress_pairs extension > > with the sriov nic agent to make sure > > the packets are not drop at the swtitch level. > > > > as you likely arlready no we do not support VF bonding in openstack or > > bonded ports in general in then neutron api. > > there was an effort a few years ago to make a bond port extention that > > mirror hwo trunk ports work > > i.e. hanving 2 neutron subport and a bond port that agreates them but we > > never got that far with > > the design. that would have enabeld boning to be implemtned in diffent ml2 > > driver like ovs/sriov/ovn ectra with > > a consitent/common api. > > > > some people have used mellonox's VF lag functionalty howver that was never > > actully enable propelry in nova/neutron > > so its not officlaly supported upstream but that functional allow you to > > attach only a singel VF to the guest form > > bonded ports on a single card. > > > > there is no supprot in nova/neutron for that offically as i said it just > > happens to work unitnetionally so i would not > > advise that you use it in produciton unless your happy to work though any > > issues you find yourself. > > > > From katonalala at gmail.com Fri Mar 10 14:39:15 2023 From: katonalala at gmail.com (Lajos Katona) Date: Fri, 10 Mar 2023 15:39:15 +0100 Subject: [openstack-dev][PCI passthrough] How to use PCI passthrough feature correctly? And is this BUG in update_devices_from_hypervisor_resources? In-Reply-To: References: <48505965e0a9f0b8ae67358079864711d1755274.camel@redhat.com> Message-ID: Hi, Could you please open a doc bug for this issue on launchpad: https://bugs.launchpad.net/neutron Thanks for the efforts. Lajos Katona (lajoskatona) Simon Jones ezt ?rta (id?pont: 2023. m?rc. 9., Cs, 15:29): > Hi, all > > At last, I got the root cause of this 2 problem. > And I suggest add these words to > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html: > ``` > Prerequisites: > libvirt >= 7.9.0 . Like ubuntu-22.04, which use libvirt-8.0.0 by default. > ``` > > Root cause of problem 1, which is "no valid host": > - Because libvirt version is too low. > > Root cause of problem 2, which is "why there are topology in DPU in > openstack create port command": > - Because add --binding-profile params in openstack create port command, > which is NOT right. > > ---- > Simon Jones > > > Dmitrii Shcherbakov ?2023?3?2??? > 20:30??? 
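For clarity, the physnet-per-PF pattern discussed above usually comes down to an SR-IOV agent mapping along these lines; the file path, physnet names and interface names are only an example of the pattern, not a recommendation:

```
# /etc/neutron/plugins/ml2/sriov_agent.ini (example values)
[sriov_nic]
physical_device_mappings = physnet_1:eth1,physnet_2:eth2
```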
> >> Hi {Sean, Simon}, >> >> > did you ever give a presentation on the DPU support >> >> Yes, there were a couple at different stages. >> >> The following is the one of the older ones that references the SMARTNIC >> VNIC type but we later switched to REMOTE_MANAGED in the final code: >> https://www.openvswitch.org/support/ovscon2021/slides/smartnic_port_binding.pdf, >> however, it has a useful diagram on page 15 which shows the interactions of >> different components. A lot of other content from it is present in the >> OpenStack docs now which we added during the feature development. >> >> There is also a presentation with a demo that we did at the Open Infra >> summit https://youtu.be/Amxp-9yEnsU (I could not attend but we prepared >> the material after the features got merged). >> >> Generally, as Sean described, the aim of this feature is to make the >> interaction between components present at the hypervisor and the DPU side >> automatic but, in order to make this workflow explicitly different from >> SR-IOV or offload at the hypervisor side, one has to use the >> "remote_managed" flag. This flag allows Nova to differentiate between >> "regular" VFs and the ones that have to be programmed by a remote host >> (DPU) - hence the name. >> >> A port needs to be pre-created with the remote-managed type - that way >> when Nova tries to schedule a VM with that port attached, it will find >> hosts which actually have PCI devices tagged with the "remote_managed": >> "true" in the PCI whitelist. >> >> The important thing to note here is that you must not use PCI passthrough >> directly for this - Nova will create a PCI device request automatically >> with the remote_managed flag included. There is currently no way to >> instruct Nova to choose one vendor/device ID vs the other for this (any >> remote_managed=true device from a pool will match) but maybe the work that >> was recently done to store PCI device information in the Placement service >> will pave the way for such granularity in the future. >> >> Best Regards, >> Dmitrii Shcherbakov >> LP/MM/oftc: dmitriis >> >> >> On Thu, Mar 2, 2023 at 1:54?PM Sean Mooney wrote: >> >>> adding Dmitrii who was the primary developer of the openstack >>> integration so >>> they can provide more insight. >>> >>> Dmitrii did you ever give a presentationon the DPU support and how its >>> configured/integrated >>> that might help fill in the gaps for simon? >>> >>> more inline. >>> >>> On Thu, 2023-03-02 at 11:05 +0800, Simon Jones wrote: >>> > E... >>> > >>> > But there are these things: >>> > >>> > 1) Show some real happened in my test: >>> > >>> > - Let me clear that, I use DPU in compute node: >>> > The graph in >>> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >>> . >>> > >>> > - I configure exactly follow >>> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html, >>> > which is said bellow in "3) Let me post all what I do follow this >>> link". 
>>> > >>> > - In my test, I found after first three command (which is "openstack >>> > network create ...", "openstack subnet create", "openstack port create >>> ..."), >>> > there are network topology exist in DPU side, and there are rules >>> exist in >>> > OVN north DB, south DB of controller, like this: >>> > >>> > > ``` >>> > > root at c1:~# ovn-nbctl show >>> > > switch 9bdacdd4-ca2a-4e35-82ca-0b5fbd3a5976 >>> > > (neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69) (aka selfservice) >>> > > port 01a68701-0e6a-4c30-bfba-904d1b9813e1 >>> > > addresses: ["unknown"] >>> > > port 18a44c6f-af50-4830-ba86-54865abb60a1 (aka pf0vf1) >>> > > addresses: ["fa:16:3e:13:36:e2 172.1.1.228"] >>> > > >>> > > gyw at c1:~$ sudo ovn-sbctl list Port_Binding >>> > > _uuid : 61dc8bc0-ab33-4d67-ac13-0781f89c905a >>> > > chassis : [] >>> > > datapath : 91d3509c-d794-496a-ba11-3706ebf143c8 >>> > > encap : [] >>> > > external_ids : {name=pf0vf1, "neutron:cidrs"="172.1.1.241/24 >>> ", >>> > > "neutron:device_id"="", "neutron:device_owner"="", >>> > > "neutron:network_name"=neutron-066c8dc2-c98b-4fb8-a541-8b367e8f6e69, >>> > > "neutron:port_name"=pf0vf1, >>> > > "neutron:project_id"="512866f9994f4ad8916d8539a7cdeec9", >>> > > "neutron:revision_number"="1", >>> > > "neutron:security_group_ids"="de8883e8-ccac-4be2-9bb2-95e732b0c114"} >>> > > >>> > > root at c1c2dpu:~# sudo ovs-vsctl show >>> > > 62cf78e5-2c02-471e-927e-1d69c2c22195 >>> > > Bridge br-int >>> > > fail_mode: secure >>> > > datapath_type: system >>> > > Port br-int >>> > > Interface br-int >>> > > type: internal >>> > > Port ovn--1 >>> > > Interface ovn--1 >>> > > type: geneve >>> > > options: {csum="true", key=flow, >>> remote_ip="172.168.2.98"} >>> > > Port pf0vf1 >>> > > Interface pf0vf1 >>> > > ovs_version: "2.17.2-24a81c8" >>> > > ``` >>> > > >>> > That's why I guess "first three command" has already create network >>> > topology, and "openstack server create" command only need to plug VF >>> into >>> > VM in HOST SIDE, DO NOT CALL NEUTRON. As network has already done. >>> no that jsut looks like the standard bridge toplogy that gets created >>> when you provision >>> the dpu to be used with openstac vai ovn. >>> >>> that looks unrelated to the neuton comamnd you ran. >>> > >>> > - In my test, then I run "openstack server create" command, I got ERROR >>> > which said "No valid host...", which is what the email said above. >>> > The reason has already said, it's nova-scheduler's PCI filter module >>> report >>> > no valid host. The reason "nova-scheduler's PCI filter module report no >>> > valid host" is nova-scheduler could NOT see PCI information of compute >>> > node. The reason "nova-scheduler could NOT see PCI information of >>> compute >>> > node" is compute node's /etc/nova/nova.conf configure remote_managed >>> tag >>> > like this: >>> > >>> > > ``` >>> > > [pci] >>> > > passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", >>> > > "physical_network": null, "remote_managed": "true"} >>> > > alias = { "vendor_id":"15b3", "product_id":"101e", >>> > > "device_type":"type-VF", "name":"a1" } >>> > > ``` >>> > > >>> > >>> > 2) Discuss some detail design of "remote_managed" tag, I don't know if >>> this >>> > is right in the design of openstack with DPU: >>> > >>> > - In neutron-server side, use remote_managed tag in "openstack port >>> create >>> > ..." command. >>> > This command will make neutron-server / OVN / ovn-controller / ovs to >>> make >>> > the network topology done, like above said. 
>>> > I this this is right, because test shows that. >>> that is not correct >>> your test do not show what you think it does, they show the baisic bridge >>> toplogy and flow configuraiton that ovn installs by defualt when it >>> manages >>> as ovs. >>> >>> please read the design docs for this feature for both nova and neutron >>> to understand how the interacction works. >>> >>> https://specs.openstack.org/openstack/nova-specs/specs/yoga/implemented/integration-with-off-path-network-backends.html >>> >>> https://specs.openstack.org/openstack/neutron-specs/specs/yoga/off-path-smartnic-dpu-port-binding-with-ovn.html >>> > >>> > - In nova side, there are 2 things should process, first is PCI >>> passthrough >>> > filter, second is nova-compute to plug VF into VM. >>> > >>> > If the link above is right, which remote_managed tag exists in >>> > /etc/nova/nova.conf of controller node and exists in >>> /etc/nova/nova.conf of >>> > compute node. >>> > As above ("- In my test, then I run "openstack server create" command") >>> > said, got ERROR in this step. >>> > So what should do in "PCI passthrough filter" ? How to configure ? >>> > >>> > Then, if "PCI passthrough filter" stage pass, what will do of >>> nova-compute >>> > in compute node? >>> > >>> > 3) Post all what I do follow this link: >>> > https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. >>> > - build openstack physical env, link plug DPU into compute mode, use >>> VM as >>> > controller ... etc. >>> > - build openstack nova, neutron, ovn, ovn-vif, ovs follow that link. >>> > - configure DPU side /etc/neutron/neutron.conf >>> > - configure host side /etc/nova/nova.conf >>> > - configure host side /etc/nova/nova-compute.conf >>> > - run first 3 command >>> > - last, run this command, got ERROR >>> > >>> > ---- >>> > Simon Jones >>> > >>> > >>> > Sean Mooney ?2023?3?1??? 18:35??? >>> > >>> > > On Wed, 2023-03-01 at 18:12 +0800, Simon Jones wrote: >>> > > > Thanks a lot !!! >>> > > > >>> > > > As you say, I follow >>> > > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html. >>> > > > And I want to use DPU mode. Not "disable DPU mode". >>> > > > So I think I should follow the link above exactlly, so I use >>> > > > vnic-type=remote_anaged. >>> > > > In my opnion, after I run first three command (which is "openstack >>> > > network >>> > > > create ...", "openstack subnet create", "openstack port create >>> ..."), the >>> > > > VF rep port and OVN and OVS rules are all ready. >>> > > not at that point nothign will have been done on ovn/ovs >>> > > >>> > > that will only happen after the port is bound to a vm and host. >>> > > >>> > > > What I should do in "openstack server create ..." is to JUST add >>> PCI >>> > > device >>> > > > into VM, do NOT call neutron-server in nova-compute of compute >>> node ( >>> > > like >>> > > > call port_binding or something). >>> > > this is incorrect. >>> > > > >>> > > > But as the log and steps said in the emails above, nova-compute >>> call >>> > > > port_binding to neutron-server while running the command "openstack >>> > > server >>> > > > create ...". >>> > > > >>> > > > So I still have questions is: >>> > > > 1) Is my opinion right? Which is "JUST add PCI device into VM, do >>> NOT >>> > > call >>> > > > neutron-server in nova-compute of compute node ( like call >>> port_binding >>> > > or >>> > > > something)" . >>> > > no this is not how its designed. 
>>> > > until you attach the logical port to a vm (either at runtime or as >>> part of >>> > > vm create) >>> > > the logical port is not assocated with any host or phsical dpu/vf. >>> > > >>> > > so its not possibel to instanciate the openflow rules in ovs form the >>> > > logical switch model >>> > > in the ovn north db as no chassie info has been populated and we do >>> not >>> > > have the dpu serial >>> > > info in the port binding details. >>> > > > 2) If it's right, how to deal with this? Which is how to JUST add >>> PCI >>> > > > device into VM, do NOT call neutron-server? By command or by >>> configure? >>> > > Is >>> > > > there come document ? >>> > > no this happens automaticaly when nova does the port binding which >>> cannot >>> > > happen until after >>> > > teh vm is schduled to a host. >>> > > > >>> > > > ---- >>> > > > Simon Jones >>> > > > >>> > > > >>> > > > Sean Mooney ?2023?3?1??? 16:15??? >>> > > > >>> > > > > On Wed, 2023-03-01 at 15:20 +0800, Simon Jones wrote: >>> > > > > > BTW, this link ( >>> > > > > > >>> > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html) >>> > > > > said >>> > > > > > I SHOULD add "remote_managed" in /etc/nova/nova.conf, is that >>> WRONG ? >>> > > > > >>> > > > > no its not wrong but for dpu smart nics you have to make a >>> choice when >>> > > you >>> > > > > deploy >>> > > > > either they can be used in dpu mode in which case remote_managed >>> > > shoudl be >>> > > > > set to true >>> > > > > and you can only use them via neutron ports with >>> > > vnic-type=remote_managed >>> > > > > as descried in that doc >>> > > > > >>> > > > > >>> > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html#launch-an-instance-with-remote-managed-port >>> > > > > >>> > > > > >>> > > > > or if you disable dpu mode in the nic frimware then you shoudl >>> remvoe >>> > > > > remote_managed form the pci device list and >>> > > > > then it can be used liek a normal vf either for neutron sriov >>> ports >>> > > > > vnic-type=direct or via flavor based pci passthough. >>> > > > > >>> > > > > the issue you were havign is you configured the pci device list >>> to >>> > > contain >>> > > > > "remote_managed: ture" which means >>> > > > > the vf can only be consumed by a neutron port with >>> > > > > vnic-type=remote_managed, when you have "remote_managed: false" >>> or >>> > > unset >>> > > > > you can use it via vnic-type=direct i forgot that slight detail >>> that >>> > > > > vnic-type=remote_managed is required for "remote_managed: ture". >>> > > > > >>> > > > > >>> > > > > in either case you foudn the correct doc >>> > > > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >>> > > > > neutorn sriov port configuration is documented here >>> > > > > >>> https://docs.openstack.org/neutron/latest/admin/config-sriov.html >>> > > > > and nova flavor based pci passthough is documeted here >>> > > > > >>> https://docs.openstack.org/nova/latest/admin/pci-passthrough.html >>> > > > > >>> > > > > all three server slightly differnt uses. both neutron >>> proceedures are >>> > > > > exclusivly fo network interfaces. >>> > > > > >>> https://docs.openstack.org/neutron/latest/admin/ovn/smartnic_dpu.html >>> > > > > requires the use of ovn deployed on the dpu >>> > > > > to configure the VF contolplane. >>> > > > > >>> https://docs.openstack.org/neutron/latest/admin/config-sriov.html uses >>> > > > > the sriov nic agent >>> > > > > to manage the VF with ip tools. 
>>> > > > > >>> https://docs.openstack.org/nova/latest/admin/pci-passthrough.html is >>> > > > > intended for pci passthough >>> > > > > of stateless acclerorators like qat devices. while the nova >>> flavor >>> > > approch >>> > > > > cna be used with nics it not how its generally >>> > > > > ment to be used and when used to passthough a nic expectation is >>> that >>> > > its >>> > > > > not related to a neuton network. >>> > > > > >>> > > > > >>> > > >>> > > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Fri Mar 10 14:48:05 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Fri, 10 Mar 2023 14:48:05 +0000 Subject: [release] Release countdown for week R-1, March 13-17 Message-ID: Development Focus ----------------- We are on the final mile of the 2023.1 Antelope development cycle! Remember that the 2023.1 Antelope final release will include the latest release candidate (for cycle-with-rc deliverables) or the latest intermediary release (for cycle-with-intermediary deliverables) available. March 17th, 2023 is the deadline for final 2023.1 Antelope release candidates as well as any last cycle-with-intermediary deliverables. We will then enter a quiet period until we tag the final release on March 22nd, 2023. Teams should be prioritizing fixing release-critical bugs, before that deadline. Otherwise it's time to start planning the 2023.2 Bobcat development cycle, including discussing PTG sessions content, in preparation of the 2023.2 Bobcat Virtual PTG (March 27-31, 2023). Actions ------- Watch for any translation patches coming through on the stable/2023.1 branch and merge them quickly. If you discover a release-critical issue, please make sure to fix it on the master branch first, then backport the bugfix to the stable/2023.1 branch before triggering a new release. Please drop by #openstack-release with any questions or concerns about the upcoming release ! Upcoming Deadlines & Dates -------------------------- Final 2023.1 Antelope release: March 22nd, 2023 2023.2 Bobcat Virtual PTG: March 27-31, 2023 El?d Ill?s irc: elodilles @ #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcetto at gmail.com Fri Mar 10 15:04:43 2023 From: garcetto at gmail.com (garcetto) Date: Fri, 10 Mar 2023 16:04:43 +0100 Subject: [manila] support for encryption Message-ID: good afternoon, does manila support encryption in some sort ? thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 10 16:54:40 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 10 Mar 2023 11:54:40 -0500 Subject: [neutron] bonding sriov nic inside VMs In-Reply-To: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com> References: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com> Message-ID: Hi Sean, I have a few questions and they are in-line. This is the reference doc i am trying to achieve in my private cloud - https://www.redpill-linpro.com/techblog/2021/01/30/bonding-sriov-nics-with-openstack.html On Fri, Mar 10, 2023 at 9:02?AM Sean Mooney wrote: > On Fri, 2023-03-10 at 08:30 -0500, Satish Patel wrote: > > Thanks Sean, > > > > I don't have NIC which supports hardware offloading or any kind of > feature. > > I am using intel nic 82599 just for SRIOV and looking for bonding > > support which is only possible inside VM. 
As you know we already run a > > large SRIOV environment with openstack but my biggest issue is to upgrade > > switches without downtime. I want to be more resilient to not worry > > about that. > > > > Do you still think it's dangerous or not a good idea to bond sriov nic > > inside VM? what could go wrong here just trying to understand before i > go > > crazy :) > lacp bond mode generaly dont work fully but you should be abel to get > basic failover bondign working > and perhaps tcp loadbalcing provide it does not require switch coperator > to work form inside the guest. > What do you mean by not working fully? Are you talking about active-active vs active-standby? > > just keep in mind that by defintion if you decalre a network as on a > seperate phsynet to another > then you as the operator are asserting that there is no l2 connectivity > between those networks. > > This is interesting why not both physnet have the same L2 segment? Are you worried STP about the loop? But that is how LACP works both physical interfaces on the same segments. > as vlan 100 on physnet_1 is intended ot be a sperate vlan form vlan 100 on > phsynet_2 > I did a test in the lab with physnet_1 and physnet_2 both on the same VLAN ID in the same L2 domain and all works. > > if you break that and use phsynets to select PFs you are also breaking > neutron multi teancy model > meaning it is not safy to aloow end uers to create vlan networks and > instead you can only use provider created > vlan networks. > This is a private cloud and we don't have any multi-tenancy model. We have all VLAN base providers and my Datacenter core router is the gateway for all my vlans providers. > > so what you want to do is proably achiveable but you menthion phsyntes per > pf and that sounds like you are breaking > the physnets are seperate isolagged phsycial netowrks rule. > I can understand each physnet should be in a different tenant but in my case its vlan base provider and not sure what rules it's going to break. > > > > > > > > > > > On Fri, Mar 10, 2023 at 6:57?AM Sean Mooney wrote: > > > > > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote: > > > > Folks, > > > > > > > > As you know, SR-IOV doesn't support bonding so the only solution is > to > > > > implement LACP bonding inside the VM. > > > > > > > > I did some tests in the lab to create two physnet and map them with > two > > > > physical nic and create VF and attach them to VM. So far all good > but one > > > > problem I am seeing is each neutron port I create has an IP address > > > > associated and I can use only one IP on bond but that is just a > waste of > > > IP > > > > in the Public IP pool. > > > > > > > > Are there any way to create sriov port but without IP address? > > > techinially we now support adressless port in neutron and nova. > > > so that shoudl be possible. > > > if you tried to do this with hardware offloaed ovs rather then the > > > standard sriov with the sriov > > > nic agent you likel will need to also use the allowed_adress_pairs > > > extension to ensure that ovs did not > > > drop the packets based on the ip adress. if you are using heriarcical > port > > > binding where you TOR is manged > > > by an ml2 driver you might also need the allowed_adress_pairs extension > > > with the sriov nic agent to make sure > > > the packets are not drop at the swtitch level. > > > > > > as you likely arlready no we do not support VF bonding in openstack or > > > bonded ports in general in then neutron api. 
> > > there was an effort a few years ago to make a bond port extention that > > > mirror hwo trunk ports work > > > i.e. hanving 2 neutron subport and a bond port that agreates them but > we > > > never got that far with > > > the design. that would have enabeld boning to be implemtned in diffent > ml2 > > > driver like ovs/sriov/ovn ectra with > > > a consitent/common api. > > > > > > some people have used mellonox's VF lag functionalty howver that was > never > > > actully enable propelry in nova/neutron > > > so its not officlaly supported upstream but that functional allow you > to > > > attach only a singel VF to the guest form > > > bonded ports on a single card. > > > > > > there is no supprot in nova/neutron for that offically as i said it > just > > > happens to work unitnetionally so i would not > > > advise that you use it in produciton unless your happy to work though > any > > > issues you find yourself. > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at stackhpc.com Fri Mar 10 17:07:53 2023 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 10 Mar 2023 18:07:53 +0100 Subject: [blazar][ptg] Bobcat PTG scheduling Message-ID: Hello, The Bobcat PTG will happen online during the week starting March 27. As the Blazar project has done in the past, I suggest we meet on Thursday, but starting 1400 UTC rather than the usual 1500 of our biweekly meeting. I have booked two hours in the Bexar room. If you want to join, please let me know if this works for you. To summarise, the Blazar project will meet on Thursday March 30 from 1400 UTC to 1600 UTC. We will prepare discussion topics on Etherpad. Cheers, Pierre Riteau (priteau) -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Fri Mar 10 17:37:27 2023 From: smooney at redhat.com (Sean Mooney) Date: Fri, 10 Mar 2023 17:37:27 +0000 Subject: [neutron] bonding sriov nic inside VMs In-Reply-To: References: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com> Message-ID: <40b0dfa26e2c0d869c1dfd9d0fb23d7bd719dc03.camel@redhat.com> On Fri, 2023-03-10 at 11:54 -0500, Satish Patel wrote: > Hi Sean, > > I have a few questions and they are in-line. This is the reference doc i am > trying to achieve in my private cloud - > https://www.redpill-linpro.com/techblog/2021/01/30/bonding-sriov-nics-with-openstack.html ^ is only safe in a multi tenant envionment if https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2.tenant_network_types does not container vlan or flat. it is technially breaking neutron rules for how to use phsyents. in private cloud where tenatn isolation is not required operators have abused this for years for things like selecting numa nodes and many other usecase which are unsafe in a public cloud. > > On Fri, Mar 10, 2023 at 9:02?AM Sean Mooney wrote: > > > On Fri, 2023-03-10 at 08:30 -0500, Satish Patel wrote: > > > Thanks Sean, > > > > > > I don't have NIC which supports hardware offloading or any kind of > > feature. > > > I am using intel nic 82599 just for SRIOV and looking for bonding > > > support which is only possible inside VM. As you know we already run a > > > large SRIOV environment with openstack but my biggest issue is to upgrade > > > switches without downtime. I want to be more resilient to not worry > > > about that. > > > > > > Do you still think it's dangerous or not a good idea to bond sriov nic > > > inside VM? 
what could go wrong here just trying to understand before i > > go > > > crazy :) > > lacp bond mode generaly dont work fully but you should be abel to get > > basic failover bondign working > > and perhaps tcp loadbalcing provide it does not require switch coperator > > to work form inside the guest. > > > > What do you mean by not working fully? Are you talking about active-active > vs active-standby? some lacp modes require configuration on the swtich others do not you can only really do that form the pf as at the switch level you can bring down the port fo ronly some vlans in a failover case. https://docs.rackspace.com/blog/lacp-bonding-and-linux-configuration/ i belive mode 0, 1, 2, 5 and 6 can work withour sepcial switgh config. 3 and 4 i think reuqired switch cooperation IEEE 802.3ad (mode 4) in particalar i think neeed coperation with the switch. """The link is set up dynamically between two LACP-supporting peers.""" https://en.wikipedia.org/wiki/Link_aggregation that peerign session can only really run on the PFs balance-tlb (5) and balance-alb(6) shoudl work fine for teh VFs in the guest however. > > > > > > just keep in mind that by defintion if you decalre a network as on a > > seperate phsynet to another > > then you as the operator are asserting that there is no l2 connectivity > > between those networks. > > > > > This is interesting why not both physnet have the same L2 segment? Are you > worried STP about the loop? But that is how LACP works both physical > interfaces on the same segments. if they are on the same l2 segment then there is no multi tancy when using vlan or flat netowrks. more on this below. > > > > > as vlan 100 on physnet_1 is intended ot be a sperate vlan form vlan 100 on > > phsynet_2 > > > > I did a test in the lab with physnet_1 and physnet_2 both on the same VLAN > ID in the same L2 domain and all works. if you create 2 neutron networks physnet_1_vlan_100 and physnet_2_vlan_100 and map phsynet_1 to eth1 and phsnet_2 to eth2 and plug the both into the same TOR with vlan 100 trunked to both then boot one vm on physnet_1_vlan_100 and a second on physnet_2_vlan_100 then a few things will hapen. the vms will boot fine and both will get ips. second there will be no isolation between the two networks so if you use the same subnet on both then they will be able to direcly ping each other. its unsafe to have teant cretable vlan networks in this if you have overlaping vlan ranges between physnet_1 and physnet_2 as there will be no tenant isolation enforeced at teh network level. form a neutron point of view physnet_1_vlan_100 and physnet_2_vlan_100 are two entrily differnt netowrks and its the oeprators responsiblity to ensure there network fabric ensure the same vlan on two phsnets cant comunicate. > > > > > > if you break that and use phsynets to select PFs you are also breaking > > neutron multi teancy model > > meaning it is not safy to aloow end uers to create vlan networks and > > instead you can only use provider created > > vlan networks. > > > > This is a private cloud and we don't have any multi-tenancy model. We have > all VLAN base providers and my Datacenter core router is the gateway for > all my vlans providers. ack in which case you can live with the fact that there is no mulit taenancy guarentees because the rules areound phsynets have been broken. this is prrety common in telco cloud by the way so you would not be the first to do this. 
> > > > > > so what you want to do is proably achiveable but you menthion phsyntes per > > pf and that sounds like you are breaking > > the physnets are seperate isolagged phsycial netowrks rule. > > > > I can understand each physnet should be in a different tenant but in my > case its vlan base provider and not sure what rules it's going to break. each physnet does not need to be a diffent tenatn the imporant thing is that neutron expects vlans on differnt physnets to be allcoateable seperatly. so the same vlan on 2 phsynets logically represnet 2 differnt networks. > > > > > > > > > > > > > > > > > > > On Fri, Mar 10, 2023 at 6:57?AM Sean Mooney wrote: > > > > > > > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote: > > > > > Folks, > > > > > > > > > > As you know, SR-IOV doesn't support bonding so the only solution is > > to > > > > > implement LACP bonding inside the VM. > > > > > > > > > > I did some tests in the lab to create two physnet and map them with > > two > > > > > physical nic and create VF and attach them to VM. So far all good > > but one > > > > > problem I am seeing is each neutron port I create has an IP address > > > > > associated and I can use only one IP on bond but that is just a > > waste of > > > > IP > > > > > in the Public IP pool. > > > > > > > > > > Are there any way to create sriov port but without IP address? > > > > techinially we now support adressless port in neutron and nova. > > > > so that shoudl be possible. > > > > if you tried to do this with hardware offloaed ovs rather then the > > > > standard sriov with the sriov > > > > nic agent you likel will need to also use the allowed_adress_pairs > > > > extension to ensure that ovs did not > > > > drop the packets based on the ip adress. if you are using heriarcical > > port > > > > binding where you TOR is manged > > > > by an ml2 driver you might also need the allowed_adress_pairs extension > > > > with the sriov nic agent to make sure > > > > the packets are not drop at the swtitch level. > > > > > > > > as you likely arlready no we do not support VF bonding in openstack or > > > > bonded ports in general in then neutron api. > > > > there was an effort a few years ago to make a bond port extention that > > > > mirror hwo trunk ports work > > > > i.e. hanving 2 neutron subport and a bond port that agreates them but > > we > > > > never got that far with > > > > the design. that would have enabeld boning to be implemtned in diffent > > ml2 > > > > driver like ovs/sriov/ovn ectra with > > > > a consitent/common api. > > > > > > > > some people have used mellonox's VF lag functionalty howver that was > > never > > > > actully enable propelry in nova/neutron > > > > so its not officlaly supported upstream but that functional allow you > > to > > > > attach only a singel VF to the guest form > > > > bonded ports on a single card. > > > > > > > > there is no supprot in nova/neutron for that offically as i said it > > just > > > > happens to work unitnetionally so i would not > > > > advise that you use it in produciton unless your happy to work though > > any > > > > issues you find yourself. > > > > > > > > > > > > From jay at gr-oss.io Fri Mar 10 18:27:34 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Fri, 10 Mar 2023 10:27:34 -0800 Subject: [security-sig][ironic] Ironic + the VMT In-Reply-To: <20230227181721.jkxd2q3pfr6d7fo6@yuggoth.org> References: <20230227181721.jkxd2q3pfr6d7fo6@yuggoth.org> Message-ID: I've reviewed the requirements, and it's my intention to set Ironic as under the VMT. 
I'll wait until it can be announced at Monday's meeting to make it official so folks can have a chance to object if they wish.

- Jay Faulkner
Ironic PTL
TC Member

On Mon, Feb 27, 2023 at 10:26 AM Jeremy Stanley wrote:
> On 2023-02-27 08:16:50 -0800 (-0800), Jay Faulkner wrote:
> [...]
> > Is there any reason Ironic should not be vulnerability-managed? Is the
> > security team willing to have us?
>
> As long as you make sure you're good with this checklist, just
> propose the specific repositories in question as an update to the
> top section of the document (in openstack/ossa):
>
> https://security.openstack.org/repos-overseen.html#requirements
>
> > The only potential complication is that Ironic may receive reports
> > for vendor libraries used by Ironic but not maintained by
> > Ironic -- I was hoping there might already be some historical
> > precedent for how we handle those; it can't be that unique to
> > Ironic.
> [...]
>
> 2. The VMT will not track or issue advisories for external
> software components. Only source code provided by official
> OpenStack project teams is eligible for oversight by the VMT.
> For example, base operating system components included in a
> server/container image or libraries vendored into compiled
> binary artifacts are not within the VMT's scope.
>
> Receiving bug reports about such things is fine, but the VMT doesn't
> coordinate those reports nor issue official security advisories
> about them since they need fixing by their upstream maintainers with
> whom we have no direct relationship. You can still propose security
> notes urging operators to update software in those situations, if it
> seems appropriate to do so:
>
> https://wiki.openstack.org/wiki/Security_Notes
>
> --
> Jeremy Stanley
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jay at gr-oss.io Fri Mar 10 18:39:53 2023
From: jay at gr-oss.io (Jay Faulkner)
Date: Fri, 10 Mar 2023 10:39:53 -0800
Subject: cryptography min version (non-rust) through 2024.1
In-Reply-To: <20230307204345.b5hvqarqyp25gqj3@yuggoth.org>
References: <20230307204345.b5hvqarqyp25gqj3@yuggoth.org>
Message-ID:

> I expect the TC is going to choose Ubuntu 22.04 LTS as a target
> platform for at least the OpenStack 2023.2 and 2024.1 coordinated
> releases, but almost certainly the 2024.2 coordinated release as
> well since Ubuntu 24.04 LTS won't be officially available before we
> start that development cycle. That means the first coordinated
> OpenStack release which would be able to effectively depend on
> features from a newer python3-cryptography package on Ubuntu is
> going to be 2025.1. Food for thought.
>

To be explicit, that testing platform means we require that OpenStack and its
dependencies are installable *in a virtualenv*, not using distro python. So
while this is a small limitation in this case (we cannot use a cryptography
release that might require newer build tooling than that Ubuntu LTS provides),
the real motivation behind holding onto a lower constraint for a longer time
is our partnership with the stable distros that ship OpenStack, whom we don't
want to alienate by giving them the extra work of adopting newer releases than
they are ready for.

I am OK with this approach; but there is a trade-off: we may be leading
OpenStack consumers who don't use distro packaging to believe that it's OK to
use an older cryptography which may not have security support.
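For consumers installing from PyPI, one partial mitigation is simply to install against the coordinated upper-constraints file for their release, which tracks a vetted version set; a quick sketch, where the floor shown is only an example and not a recommendation:

```
# install within the 2023.1 coordinated constraints while honouring a declared floor
pip install -c https://releases.openstack.org/constraints/upper/2023.1 \
    'cryptography>=3.4.8'
```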
It'd be interesting if we could ensure backwards compatibility through testing while also ensuring that anything installed with pip gets a new enough version... but frankly, I don't know offhand what approach would be best to do that, and I don't have the time to pursue it myself so the status quo wins :D. Thanks for talking this through, I like when we're explicit about the motivations for what we do! -- Jay Faulkner Ironic PTL TC Member -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 10 19:14:30 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 10 Mar 2023 14:14:30 -0500 Subject: [neutron] bonding sriov nic inside VMs In-Reply-To: <40b0dfa26e2c0d869c1dfd9d0fb23d7bd719dc03.camel@redhat.com> References: <78ee9e543b5bda121d04bd41c1454dca38de334a.camel@redhat.com> <40b0dfa26e2c0d869c1dfd9d0fb23d7bd719dc03.camel@redhat.com> Message-ID: Thank you Sean for the detailed explanation, I agree on LACP mode and I think Active-Standby would be a better and safer option for me. Yes, as you said telco is abusing many neutron rules and i think i am one of them because pretty much we are running telco applications :) As I said, it's a private cloud so I can break and bend rules to just make applications available 24x7. We don't have any multi-tenancy where I should be worried about security. Last question, Related MAC Address change because neutron doesn't allow change of Mac address correct so i have to set the same MAC Address on both sriov port. As per reference blog. On Fri, Mar 10, 2023 at 12:37?PM Sean Mooney wrote: > On Fri, 2023-03-10 at 11:54 -0500, Satish Patel wrote: > > Hi Sean, > > > > I have a few questions and they are in-line. This is the reference doc i > am > > trying to achieve in my private cloud - > > > https://www.redpill-linpro.com/techblog/2021/01/30/bonding-sriov-nics-with-openstack.html > ^ is only safe in a multi tenant envionment if > > https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2.tenant_network_types > does not container vlan or flat. > > it is technially breaking neutron rules for how to use phsyents. > > in private cloud where tenatn isolation is not required operators have > abused this for years for things like selecting numa nodes > and many other usecase which are unsafe in a public cloud. > > > > > On Fri, Mar 10, 2023 at 9:02?AM Sean Mooney wrote: > > > > > On Fri, 2023-03-10 at 08:30 -0500, Satish Patel wrote: > > > > Thanks Sean, > > > > > > > > I don't have NIC which supports hardware offloading or any kind of > > > feature. > > > > I am using intel nic 82599 just for SRIOV and looking for bonding > > > > support which is only possible inside VM. As you know we already run > a > > > > large SRIOV environment with openstack but my biggest issue is to > upgrade > > > > switches without downtime. I want to be more resilient to not worry > > > > about that. > > > > > > > > Do you still think it's dangerous or not a good idea to bond sriov > nic > > > > inside VM? what could go wrong here just trying to understand > before i > > > go > > > > crazy :) > > > lacp bond mode generaly dont work fully but you should be abel to get > > > basic failover bondign working > > > and perhaps tcp loadbalcing provide it does not require switch > coperator > > > to work form inside the guest. > > > > > > > What do you mean by not working fully? Are you talking about > active-active > > vs active-standby? 
> some lacp modes require configuration on the swtich others do not > you can only really do that form the pf as at the switch level you can > bring down > the port fo ronly some vlans in a failover case. > > https://docs.rackspace.com/blog/lacp-bonding-and-linux-configuration/ > > i belive mode 0, 1, 2, 5 and 6 can work withour sepcial switgh config. > > 3 and 4 i think reuqired switch cooperation > > IEEE 802.3ad (mode 4) in particalar i think neeed coperation with the > switch. > """The link is set up dynamically between two LACP-supporting peers.""" > https://en.wikipedia.org/wiki/Link_aggregation > > that peerign session can only really run on the PFs > > balance-tlb (5) and balance-alb(6) shoudl work fine for teh VFs in the > guest however. > > > > > > > > > > > just keep in mind that by defintion if you decalre a network as on a > > > seperate phsynet to another > > > then you as the operator are asserting that there is no l2 connectivity > > > between those networks. > > > > > > > > This is interesting why not both physnet have the same L2 segment? Are > you > > worried STP about the loop? But that is how LACP works both physical > > interfaces on the same segments. > if they are on the same l2 segment then there is no multi tancy when using > vlan or flat netowrks. > more on this below. > > > > > > > > > as vlan 100 on physnet_1 is intended ot be a sperate vlan form vlan > 100 on > > > phsynet_2 > > > > > > > I did a test in the lab with physnet_1 and physnet_2 both on the same > VLAN > > ID in the same L2 domain and all works. > > if you create 2 neutron networks > > physnet_1_vlan_100 and physnet_2_vlan_100 > > and map phsynet_1 to eth1 and phsnet_2 to eth2 > and plug the both into the same TOR with vlan 100 trunked to both > > then boot one vm on physnet_1_vlan_100 and a second on physnet_2_vlan_100 > > then a few things will hapen. > > the vms will boot fine and both will get ips. > second there will be no isolation between the two networks > so if you use the same subnet on both then they will be able to direcly > ping each other. > > its unsafe to have teant cretable vlan networks in this if you have > overlaping vlan ranges between physnet_1 and physnet_2 > as there will be no tenant isolation enforeced at teh network level. > > form a neutron point of view physnet_1_vlan_100 and physnet_2_vlan_100 are > two entrily differnt netowrks and > its the oeprators responsiblity to ensure there network fabric ensure the > same vlan on two phsnets cant comunicate. > > > > > > > > > > > > if you break that and use phsynets to select PFs you are also breaking > > > neutron multi teancy model > > > meaning it is not safy to aloow end uers to create vlan networks and > > > instead you can only use provider created > > > vlan networks. > > > > > > > This is a private cloud and we don't have any multi-tenancy model. We > have > > all VLAN base providers and my Datacenter core router is the gateway for > > all my vlans providers. > ack in which case you can live with the fact that there is no mulit > taenancy > guarentees because the rules areound phsynets have been broken. > > this is prrety common in telco cloud by the way so you would not be the > first to do this. > > > > > > > > > > so what you want to do is proably achiveable but you menthion phsyntes > per > > > pf and that sounds like you are breaking > > > the physnets are seperate isolagged phsycial netowrks rule. 
> > I can understand each physnet should be in a different tenant, but in my
> > case it's a VLAN-based provider and I'm not sure what rules it's going to
> > break.
> Each physnet does not need to be a different tenant.
> The important thing is that neutron expects VLANs on different physnets to
> be allocatable separately.
>
> So the same VLAN on 2 physnets logically represents 2 different networks.
>
> > > On Fri, Mar 10, 2023 at 6:57 AM Sean Mooney wrote:
> > >
> > > > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote:
> > > > > Folks,
> > > > >
> > > > > As you know, SR-IOV doesn't support bonding, so the only solution is
> > > > > to implement LACP bonding inside the VM.
> > > > >
> > > > > I did some tests in the lab to create two physnets, map them to two
> > > > > physical NICs, create VFs and attach them to a VM. So far all good,
> > > > > but one problem I am seeing is that each neutron port I create has an
> > > > > IP address associated, and I can use only one IP on the bond; the
> > > > > other is just a waste of IP in the public IP pool.
> > > > >
> > > > > Is there any way to create an SR-IOV port but without an IP address?
> > > > Technically we now support addressless ports in neutron and nova,
> > > > so that should be possible.
> > > > If you tried to do this with hardware-offloaded OVS rather than the
> > > > standard SR-IOV with the SR-IOV NIC agent, you will likely also need the
> > > > allowed_address_pairs extension to ensure that OVS does not drop the
> > > > packets based on the IP address. If you are using hierarchical port
> > > > binding, where your TOR is managed by an ML2 driver, you might also need
> > > > the allowed_address_pairs extension with the SR-IOV NIC agent to make
> > > > sure the packets are not dropped at the switch level.
> > > >
> > > > As you likely already know, we do not support VF bonding in OpenStack,
> > > > or bonded ports in general in the neutron API.
> > > > There was an effort a few years ago to make a bond port extension that
> > > > mirrors how trunk ports work,
> > > > i.e. having 2 neutron subports and a bond port that aggregates them, but
> > > > we never got that far with the design. That would have enabled bonding
> > > > to be implemented in different ML2 drivers like ovs/sriov/ovn etc. with
> > > > a consistent/common API.
> > > >
> > > > Some people have used Mellanox's VF LAG functionality, however that was
> > > > never actually enabled properly in nova/neutron,
> > > > so it's not officially supported upstream, but that functionality allows
> > > > you to attach only a single VF to the guest from bonded ports on a
> > > > single card.
> > > >
> > > > There is no support in nova/neutron for that officially; as I said, it
> > > > just happens to work unintentionally, so I would not advise that you use
> > > > it in production unless you're happy to work through any issues you find
> > > > yourself.
> > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From fungi at yuggoth.org Fri Mar 10 19:19:25 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 10 Mar 2023 19:19:25 +0000 Subject: cryptography min version (non-rust) through 2024.1 In-Reply-To: References: <20230307204345.b5hvqarqyp25gqj3@yuggoth.org> Message-ID: <20230310191925.m5amjf3idd6mss2q@yuggoth.org> On 2023-03-10 10:39:53 -0800 (-0800), Jay Faulkner wrote: [...] > To be explicit, that testing platform means we require that > OpenStack and its dependencies are installable *in a virtualenv*, > not using distro python. Well, "sort of." Let's unpack those assertions: 1. "[we're testing] that OpenStack and its dependencies are installable in a virtualenv" Here I think you mean specifically its Python library dependencies, OpenStack has lots of (in many cases versioned) dependencies on non-Python components of a system and that's a big part of what we're checking by testing on multiple platforms. Where Python libraries are concerned, some may include non-Python "binary" extensions so it's not purely Python in the Python dependencies either. We also install some things like Javascript libraries from places other than the distribution (e.g. NPM) on those platforms. But we don't only test in virtualenvs/venvs either. Today at least, DevStack/Grenade jobs install a vast majority of those dependencies into the base system environment. That probably won't be the case in the future thanks to increasing adoption of the PEP 668 EXTERNALLY-MANAGED flag, but for now at least we're comingling pip-installed and distro-installed Python libraries in those jobs. 2. "[we're testing] OpenStack and its dependencies [...] not using distro python" This is definitely not true, but is maybe not what you meant. We definitely test with the exact build of Python interpreter and stdlib that these platforms ship, backported fixes and all. That's also a major reason why we test on multiple platforms, so that we can know our software will work with the Python that's provided on those platforms, which is how many of our users will be running our software too. > So while this is a small imitation for this case (we cannot use > cryptography that might require newer tooling to build than that > ubuntu LTS would), the real motivation behind holding onto a > lower-constraint for a longer time is in partnership with stable > distros that ship OpenStack, who we don't want to alienate by > giving them extra work of using newer releases than they are ready > to. Yes, more precisely these distros are not likely to backport an entire new Rust toolchain just so that new versions of OpenStack can be installed. Instead, they're going to patch the new versions of OpenStack so that they work with older libraries that don't require an entire new Rust toolchain, because that's the easier of the two options. If we can avoid requiring them to do either of those things, obviously it's even nicer for them. If they're going to do the second thing anyway, we could just say "hey send us those patches and we'll consider them bug fixes, because we want people to be able to easily use OpenStack on stable server platforms." > I am OK with this approach; but there is a trade-off: we may be > leading OpenStack consumers who don't use distro packaging that > it's OK to use an older cryptography which may not have security > support. I'd argue that we already do this, because our stable branches use frozen-in-time constraints lists specifying obsolete versions of dependencies which are not receiving upstream security support. 
The intent is that we only use those to test backported fixes without destabilizing our CI jobs, but some of our deployment projects even still build container images or deploy packages from PyPI based on what's in those frozen constraints lists, much to my dismay. > It'd be interesting if we could ensure backwards compatibility > through testing while also ensuring that anything installed with > pip gets a new enough version... but frankly, I don't know offhand > what approach would be best to do that, and I don't have the time > to pursue it myself so the status quo wins :D. [...] My preference would be to just keep testing with the latest version of PYCA/Cryptography from PyPI in our master branch jobs, and tell people who want to use OpenStack on stable server distributions that we'll accept bug reports and review their fixes if we start doing something that the python3-cryptography package on those distributions doesn't support. That's just one approach though, there are certainly other ways to go about it. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From alsotoes at gmail.com Fri Mar 10 20:48:34 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Fri, 10 Mar 2023 14:48:34 -0600 Subject: [manila] support for encryption In-Reply-To: References: Message-ID: You mean data or token encryption? Cheers! On Fri, Mar 10, 2023 at 9:08?AM garcetto wrote: > good afternoon, > does manila support encryption in some sort ? > > thank you > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Mar 10 20:55:49 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 10 Mar 2023 12:55:49 -0800 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> Message-ID: <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> ---- On Wed, 22 Feb 2023 10:13:32 -0800 James Slagle wrote --- > On Wed, Feb 22, 2023 at 12:43 PM Ghanshyam Mann gmann at ghanshyammann.com> wrote: > > Hi James, > > > > Just checking if you got a chance to discuss this with the TripleO team? > > Yes, I asked folks to reply here if there are any volunteers for > stable/zed maintenance, or any other feedback about the approach. I do > not personally know of any volunteers. Ok. We discussed the stable/zed case in the TC meeting and decided[1] to keep stable/zed as 'supported but no maintainers' (will update this information in stable/zed README.rst file). For the master branch, you can follow the normal deprecation process mentioned in the project-team-guide[2]. I have proposed step 1 in governance to mark it deprecated, please check and we need PTL +1 on that. 
- https://review.opendev.org/c/openstack/governance/+/877132 NOTE: As this is deprecated and not retired yet, we still need PTL nomination for TrilpeO[3] [1] https://meetings.opendev.org/meetings/tc/2023/tc.2023-03-08-15.59.log.html#l-256 [2] https://docs.openstack.org/project-team-guide/repository.html#deprecating-a-repository [3] https://etherpad.opendev.org/p/2023.2-leaderless#L26 -gmann' > > -- > -- James Slagle > -- > From tomas.bredar at gmail.com Fri Mar 10 23:02:20 2023 From: tomas.bredar at gmail.com (=?UTF-8?B?VG9tw6HFoSBCcmVkw6Fy?=) Date: Sat, 11 Mar 2023 00:02:20 +0100 Subject: [ovn] safely change bridge_mappings In-Reply-To: References: Message-ID: Hi Rodolfo, you helped a lot. I managed configure this, manually. Just for future reference let me write down what I did. - First I already had the interface br-ex2 configured and correctly assigned physical interfaces in it - I added the bridge mappings to the OVN DB: ovs-vsctl set open . external-ids:ovn-bridge-mappings=datacentre:br-ex,m-storage:br-ex2 - I added my nw m-storage to ml2_conf.ini: [ml2_type_vlan] network_vlan_ranges=datacentre:1:2700,m-storage:3700:4000 [ml2_type_flat] flat_networks=datacentre,m-storage - I restarted the neutron service - since I already had the m-storage nw created in openstack, but as provider "datacenter" and I already had instance ports using it (but it was not working), I had to create a new network and subnet. Delete the original ports and recreate and reassign it to the instances. If I may, now I have two questions: 1. Shouldn't I also define this in ml2_conf.ini [ovs] bridge_mappings = datacentre:br-ex,m-storage:br-ex2 or is the setting of the vswitch register via ovs-vsctl persistent between redeployments or reboots? 2. Which parameters in tripleo-heat-templates sets the above ml2_conf.ini? I found these params: NeutronFlatNetworks NeutronNetworkVLANRanges NeutronBridgeMappings Thanks for your help Tomas ut 7. 3. 2023 o 10:13 Rodolfo Alonso Hernandez nap?sal(a): > Hello Tom??: > > You need to follow the steps in [1]: > * You need to create the new physical bridge "br-ex2". > * Then you need to add to the bridge the physical interface. > * In the compute node you need to add the bridge mappings to the OVN > database Open vSwitch register > * In the controller, you need to add the reference for this second > provider network in "flat_networks" and "network_vlan_ranges" (in the > ml2.ini file). Then you need to restart the Neutron server to read these > new parameters (this step is not mentioned in this link). > $ cat ./etc/neutron/plugins/ml2/ml2_conf.ini > [ml2_type_flat] > flat_networks = public,public2 > [ml2_type_vlan] > network_vlan_ranges = public:11:200,public2:11:200 > > Regards. > > [1] > https://docs.openstack.org/networking-ovn/pike/admin/refarch/provider-networks.html > > On Tue, Mar 7, 2023 at 12:33?AM Tom?? Bred?r > wrote: > >> Hi, >> >> I have a running production OpenStack deployment - version Wallaby >> installed using TripleO. I'm using the default OVN/OVS networking. >> For provider networks I have two bridges on the compute nodes br-ex and >> br-ex2. Instances mainly use br-ex for provider networks, but there are >> some instances which started using a provider network which should be >> mapped to br-ex2, however I didn't specify "bridge_mappings" on >> ml2_conf.ini, so the traffic wants to flow through the default >> datacentre:br-ex. 
>> My questions is, what services should I restart on the controller and >> compute nodes after defining bridge_mappings in [ovs] in ml2_conf.ini. And >> if this operation is safe and if the instances already using br-ex will >> lose connectivity? >> >> Thanks for your help >> >> Tomas >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fsmith at techwiki.info Fri Mar 10 23:22:56 2023 From: fsmith at techwiki.info (Frank Smith) Date: Fri, 10 Mar 2023 16:22:56 -0700 Subject: bare metal provisioner replacement for Fuel Message-ID: Hey all, quick question: Previously I would use Fuel to deploy Openstack. It was a good tool, as it built a deployment server where I would see new, un-provisioned servers show up in the list, and I could choose the roles, including stacking roles, onto the servers and then provision the whole cluster. There were no complex yaml or JSON files to deal with and it would retry the deployment on an error state. When done, I had a nice OpenStack cluster, a Ceph backend, and life was good. Now it seems Fuel has been retired. What is comparable to the abilities of Fuel today? What are people using for such deployments now? I get that devstack, microstack and packstack are great for all-in-one installs, but Fuel was great for deploying a whole rack of servers from bare metal. I am not asking for any free training here, but instead asking for info on a similar product, if there is one, so I can learn that and be OpenStack productive once more. Thank you for any help, --Francis Smith -------------- next part -------------- An HTML attachment was scrubbed... URL: From alsotoes at gmail.com Sat Mar 11 00:24:47 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Fri, 10 Mar 2023 18:24:47 -0600 Subject: bare metal provisioner replacement for Fuel In-Reply-To: References: Message-ID: Hey Frank, Take a look at kolla-ansible [1] and then OSA [2] 1.- https://docs.openstack.org/kolla-ansible/latest/ 2.- https://docs.openstack.org/openstack-ansible/latest/ IMHO, kolla-ansible will be the best one to use =) Cheers! On Fri, Mar 10, 2023 at 5:59?PM Frank Smith wrote: > Hey all, quick question: > > Previously I would use Fuel to deploy Openstack. It was a good tool, as it > built a deployment server where I would see new, un-provisioned servers > show up in the list, and I could choose the roles, including stacking > roles, onto the servers and then provision the whole cluster. There were no > complex yaml or JSON files to deal with and it would retry the deployment > on an error state. When done, I had a nice OpenStack cluster, a Ceph > backend, and life was good. > > Now it seems Fuel has been retired. What is comparable to the abilities of > Fuel today? What are people using for such deployments now? I get > that devstack, microstack and packstack are great for all-in-one installs, > but Fuel was great for deploying a whole rack of servers from bare metal. I > am not asking for any free training here, but instead asking for info on a > similar product, if there is one, so I can learn that and be OpenStack > productive once more. > > Thank you for any help, > --Francis Smith > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. 
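To make the kolla-ansible route more concrete, a rough sketch of a multinode run; the paths, the inventory name and the install-deps step follow the upstream quickstart and may differ per release, so treat this as an outline only:

  $ python3 -m venv ~/kolla-venv && source ~/kolla-venv/bin/activate
  $ pip install -U pip kolla-ansible
  $ sudo mkdir -p /etc/kolla && sudo chown $USER:$USER /etc/kolla
  $ cp -r ~/kolla-venv/share/kolla-ansible/etc_examples/kolla/* /etc/kolla/
  $ cp ~/kolla-venv/share/kolla-ansible/ansible/inventory/multinode .
  # edit /etc/kolla/globals.yml and the multinode inventory to map roles to your servers
  $ kolla-genpwd
  $ kolla-ansible install-deps                  # recent releases; pulls the Ansible Galaxy deps
  $ kolla-ansible -i ./multinode bootstrap-servers
  $ kolla-ansible -i ./multinode prechecks
  $ kolla-ansible -i ./multinode deploy
  $ kolla-ansible post-deploy                   # writes admin credentials under /etc/kolla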
-------------- next part -------------- An HTML attachment was scrubbed... URL: From kamrankhadijadj at gmail.com Sat Mar 11 21:25:58 2023 From: kamrankhadijadj at gmail.com (Khadija) Date: Sun, 12 Mar 2023 02:25:58 +0500 Subject: [Outreachy] Setting up development envirenment Message-ID: Hi Sofia! I have made myself familiar with the launchpad, storyboard and gerrit. I have also experimented in the Sandbox projects. I was trying to set up my development environment following https://docs.openstack.org/cinder/latest/contributor/development.environment.html but I was unable to run unit tests, on running command 'tox -e py3' I get error saying AttributeError: module 'py' has no attribute 'io' Kindly help me with this so that I can start working on my first issue :) Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Sat Mar 11 21:41:23 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Sat, 11 Mar 2023 13:41:23 -0800 Subject: bare metal provisioner replacement for Fuel In-Reply-To: References: Message-ID: Echoing and enhancing what Alvaro said, kolla-ansible has great docs on deploying Bifrost+Ironic: https://docs.openstack.org/kolla-ansible/latest/reference/deployment-and-bootstrapping/bifrost.html . I don't know what you class as "complex YAML or JSON"; bifrost does require an inventory file with BMC information and credentials at a minimum. Bifrost is kinda a hidden gem in OpenStack; give it a shot. If there's a specific reason it doesn't fit your use case, or you encounter a problem if you choose to try it, please close the loop and let us know! Thanks! Good luck with your project. - Jay Faulkner Ironic PTL OpenStack TC Member On Fri, Mar 10, 2023 at 4:08?PM Frank Smith wrote: > Hey all, quick question: > > Previously I would use Fuel to deploy Openstack. It was a good tool, as it > built a deployment server where I would see new, un-provisioned servers > show up in the list, and I could choose the roles, including stacking > roles, onto the servers and then provision the whole cluster. There were no > complex yaml or JSON files to deal with and it would retry the deployment > on an error state. When done, I had a nice OpenStack cluster, a Ceph > backend, and life was good. > > Now it seems Fuel has been retired. What is comparable to the abilities of > Fuel today? What are people using for such deployments now? I get > that devstack, microstack and packstack are great for all-in-one installs, > but Fuel was great for deploying a whole rack of servers from bare metal. I > am not asking for any free training here, but instead asking for info on a > similar product, if there is one, so I can learn that and be OpenStack > productive once more. > > Thank you for any help, > --Francis Smith > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincentlee676 at gmail.com Sun Mar 12 04:33:30 2023 From: vincentlee676 at gmail.com (vincent lee) Date: Sat, 11 Mar 2023 22:33:30 -0600 Subject: Member access calendar Message-ID: Hi all, I am using kolla-ansible for openstack deployment in yoga version. Is it possible to allow normal user (member role) to view the calendar graph under reservation (lease)? Best Vincent -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnaser at vexxhost.com Sun Mar 12 10:09:53 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sun, 12 Mar 2023 11:09:53 +0100 Subject: [neutron] detecting l3-agent readiness Message-ID: Hi folks, I'm working on improving the stability of rollouts when using Kubernetes as a control plane, specifically around the L3 agent, it seems that I have not found a clear way to detect in the code path where the L3 agent has finished it's initial sync.. Am I missing it somewhere or is the architecture built in a way that doesn't really answer that question? Thanks Mohammed -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincentlee676 at gmail.com Sun Mar 12 12:18:08 2023 From: vincentlee676 at gmail.com (vincent lee) Date: Sun, 12 Mar 2023 07:18:08 -0500 Subject: Allow user access calendar Message-ID: Hi all, I am using kolla-ansible for openstack deployment in yoga version. Is it possible to allow normal user (member role) to view the calendar graph under reservation (lease)? Best Vincent -------------- next part -------------- An HTML attachment was scrubbed... URL: From kamrankhadijadj at gmail.com Sun Mar 12 13:30:34 2023 From: kamrankhadijadj at gmail.com (Khadija Kamran) Date: Sun, 12 Mar 2023 18:30:34 +0500 Subject: [Outreachy] Setting up development envirenment In-Reply-To: References: Message-ID: On Sat, Mar 11, 2023 at 10:28:33PM +0000, Sofia Enriquez wrote: > ? Hi, > > I think you clone the cinder repository and try to run tox. Have you > install tox? > https://pypi.org/project/tox/ > > If you installed tox and this seeing a error: Please copy and paste the > full error on > https://paste.openstack.org/ and share the link here. > > Cheers, > Sofia > Hey Sofia! Yes I have installed tox. Here is the link to the error: https://paste.openstack.org/show/bQBHKrMiphG75uWHjkRj/ Kindly look into this. Thank you for your time :) > El El s?b, 11 mar 2023 a las 21:26, Khadija > escribi?: > > > Hi Sofia! > > I have made myself familiar with the launchpad, storyboard and gerrit. I > > have also experimented in the Sandbox projects. > > I was trying to set up my development environment following > > https://docs.openstack.org/cinder/latest/contributor/development.environment.html > > but I was unable to run unit tests, on running command 'tox -e py3' I get > > error saying AttributeError: module 'py' has no attribute 'io' > > Kindly help me with this so that I can start working on my first issue :) > > Thank you! > > > -- > Sofia Enriquez From thomas at goirand.fr Sat Mar 11 08:55:45 2023 From: thomas at goirand.fr (Thomas Goirand) Date: Sat, 11 Mar 2023 09:55:45 +0100 Subject: bare metal provisioner replacement for Fuel Message-ID: An HTML attachment was scrubbed... URL: From lsofia.enriquez at gmail.com Sat Mar 11 22:28:33 2023 From: lsofia.enriquez at gmail.com (Sofia Enriquez) Date: Sat, 11 Mar 2023 22:28:33 +0000 Subject: [Outreachy] Setting up development envirenment In-Reply-To: References: Message-ID: ? Hi, I think you clone the cinder repository and try to run tox. Have you install tox? https://pypi.org/project/tox/ If you installed tox and this seeing a error: Please copy and paste the full error on https://paste.openstack.org/ and share the link here. Cheers, Sofia El El s?b, 11 mar 2023 a las 21:26, Khadija escribi?: > Hi Sofia! > I have made myself familiar with the launchpad, storyboard and gerrit. I > have also experimented in the Sandbox projects. 
> I was trying to set up my development environment following > https://docs.openstack.org/cinder/latest/contributor/development.environment.html > but I was unable to run unit tests, on running command 'tox -e py3' I get > error saying AttributeError: module 'py' has no attribute 'io' > Kindly help me with this so that I can start working on my first issue :) > Thank you! > -- Sofia Enriquez -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Sun Mar 12 13:52:07 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 12 Mar 2023 13:52:07 +0000 Subject: [Outreachy] Setting up development envirenment In-Reply-To: References: Message-ID: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> On 2023-03-12 18:30:34 +0500 (+0500), Khadija Kamran wrote: [...] > Yes I have installed tox. > Here is the link to the error: > https://paste.openstack.org/show/bQBHKrMiphG75uWHjkRj/ > Kindly look into this. [...] Based on the slew of bug reports I found, this looks like an incompatibility between the latest versions of tox and pytest. Unfortunately there's a lot of finger-pointing, and the suggestions from their respective upstreams are to either uninstall one of the two packages, or pin one of them to an older version, or use an isolated environment such as a venv instead of using `sudo pip install ...` in the system Python's context. You could try one of the following: sudo pip install tox py sudo pip install tox 'pytest<7.2' sudo pip uninstall pytest ; sudo pip install tox Ultimately, though, we should be teaching people to install and run tox and similar tools with their distribution package manager or in a venv rather than running `sudo pip install ...` since PEP 668 is starting to disallow that workflow on newer platforms anyway. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From kamrankhadijadj at gmail.com Sun Mar 12 14:10:44 2023 From: kamrankhadijadj at gmail.com (Khadija Kamran) Date: Sun, 12 Mar 2023 19:10:44 +0500 Subject: [Outreachy] Setting up development envirenment In-Reply-To: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> References: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> Message-ID: Hey Jeremy! Thank you for the reply. I have tried all the above commands and it still doesn't seem to work. Also, I am using PyCharm with a venv using python3.9 Regards, Khadija From fungi at yuggoth.org Sun Mar 12 14:20:18 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 12 Mar 2023 14:20:18 +0000 Subject: [Outreachy] Setting up development envirenment In-Reply-To: References: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> Message-ID: <20230312142018.3ohzzex2f5pospp4@yuggoth.org> On 2023-03-12 19:10:44 +0500 (+0500), Khadija Kamran wrote: [...] > I have tried all the above commands and it still doesn't seem to > work. By "doesn't work" do you mean you get the exact same error message, or did you get different errors? > Also, I am using PyCharm with a venv using python3.9 I don't know much about PyCharm (someone with more familiarity would need to chime in on how it might influence this situation, if at all), but if you're using `sudo pip install tox` as indicated in your earlier paste then that's definitely not installing tox into a venv of any kind, as evidenced by the traceback you pasted referencing modules in /usr/lib/python3.10 instead of a venv path. 
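One way to do that and keep tox out of the system Python entirely, with the venv path being arbitrary:

  $ python3 -m venv ~/.venvs/tox
  $ ~/.venvs/tox/bin/pip install -U pip tox
  $ cd cinder
  $ ~/.venvs/tox/bin/tox -e py3

pipx (`pipx install tox`) achieves the same isolation if you prefer a managed tool.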
-- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From kamrankhadijadj at gmail.com Sun Mar 12 16:08:37 2023 From: kamrankhadijadj at gmail.com (Khadija Kamran) Date: Sun, 12 Mar 2023 21:08:37 +0500 Subject: [Outreachy] Setting up development envirenment In-Reply-To: <20230312142018.3ohzzex2f5pospp4@yuggoth.org> References: <20230312135207.lah4kpi2xh3getbg@yuggoth.org> <20230312142018.3ohzzex2f5pospp4@yuggoth.org> Message-ID: On Sun, Mar 12, 2023 at 02:20:18PM +0000, Jeremy Stanley wrote: > On 2023-03-12 19:10:44 +0500 (+0500), Khadija Kamran wrote: > [...] > > I have tried all the above commands and it still doesn't seem to > > work. > Hi Jeremy, The command runs successfully now. Yes, I was getting the exact same errors. But it worked when I restarted the IDE. Thank you for your time. Regards, Khadija > By "doesn't work" do you mean you get the exact same error message, > or did you get different errors? > > > Also, I am using PyCharm with a venv using python3.9 > > I don't know much about PyCharm (someone with more familiarity would > need to chime in on how it might influence this situation, if at > all), but if you're using `sudo pip install tox` as indicated in > your earlier paste then that's definitely not installing tox into a > venv of any kind, as evidenced by the traceback you pasted > referencing modules in /usr/lib/python3.10 instead of a venv path. > -- > Jeremy Stanley From benjaminfaruna at gmail.com Sun Mar 12 21:03:34 2023 From: benjaminfaruna at gmail.com (Benjamin Faruna) Date: Sun, 12 Mar 2023 22:03:34 +0100 Subject: [outreachy][cinder] Cannot access code repository Message-ID: Hello, good day, I was selected for the outreachy internship program and I want to contribute to cinders' codebase, but I am having trouble accessing it. Whenever I open the code tab on launchpad I get 2 repositories that haven't been updated in a while but the main page shows a lot of activities on the codebase. Please I want to request help setting up and getting started making contributions. The images of what I see is attached to this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cinder code.png Type: image/png Size: 101436 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cinder2.png Type: image/png Size: 257583 bytes Desc: not available URL: From fungi at yuggoth.org Sun Mar 12 21:17:49 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 12 Mar 2023 21:17:49 +0000 Subject: [outreachy][cinder] Cannot access code repository In-Reply-To: References: Message-ID: <20230312211748.vhxztfd2yroj5lxl@yuggoth.org> On 2023-03-12 22:03:34 +0100 (+0100), Benjamin Faruna wrote: [...] > I want to contribute to cinders' codebase, but I am having trouble > accessing it. Whenever I open the code tab on launchpad I get 2 > repositories that haven't been updated in a while but the main > page shows a lot of activities on the codebase. [...] Like many OpenStack projects, Cinder uses Launchpad for defect tracking. OpenStack projects (including Cinder) do not use Launchpad for code hosting however, they use the OpenDev Collaboratory. 
Please see the OpenStack Code & Documentation Contributor Guide for details: https://docs.openstack.org/contributors/code-and-documentation/ -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jamesleong123098 at gmail.com Mon Mar 13 03:36:32 2023 From: jamesleong123098 at gmail.com (James Leong) Date: Sun, 12 Mar 2023 22:36:32 -0500 Subject: [Horizon]: allow user to access calendar on horizon Message-ID: Hi all, I am using kolla-ansible for OpenStack deployment in the yoga version. Is it possible to allow the user (member role) to view the calendar graph under the reservation tab (lease)? Currently, only the admin will be able to view the calendar graph with all the reserved leases. However, a user with other roles cannot load the calendar information. On the dashboard, I saw it displayed " Unable to load reservations." When I look into the log file, I get the below error message. "blazarclient.exception.BlazarClientException: ERROR: Policy doesn't allow blazar:oshosts:get to be performed." Is there a way to allow the policy? Thanks for your help. James -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Mar 13 06:15:54 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 13 Mar 2023 11:45:54 +0530 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: <7cbe477b-b4a6-8d63-17fa-43bce14179aa@gmail.com> References: <7cbe477b-b4a6-8d63-17fa-43bce14179aa@gmail.com> Message-ID: It's been a week and having heard no objections, I have added Jon Bernard to the cinder-core team. Jon, you should see a +2 and +W option in your review now. Welcome to the team! On Mon, Mar 6, 2023 at 7:32?PM Jay Bryant wrote: > No objections from me! I think Jon would be a great addition! > > Thanks, > > Jay > > On 3/3/2023 5:04 AM, Rajat Dhasmana wrote: > > Hello everyone, > > > > I would like to propose Jon Bernard as cinder core. Looking at the > > review stats > > for the past 60[1], 90[2], 120[3] days, he has been consistently in > > the top 5 > > reviewers with a good +/- ratio and leaving helpful comments > > indicating good > > quality of reviews. He has been managing the stable branch releases > > for the > > past 2 cycles (Zed and 2023.1) and has helped in releasing security > > issues as well. > > > > Jon has been part of the cinder and OpenStack community for a long > > time and > > has shown very active interest in upstream activities, be it release > > liaison, review > > contribution, attending cinder meetings and also involving in > > outreachy activities. > > He will be a very good addition to our team helping out with the > > review bandwidth > > and adding valuable input in our discussions. > > > > I will leave this thread open for a week and if there are no > > objections, I will add > > Jon Bernard to the cinder core team. 
> > > > [1] > > > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 > > < > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 > > > > [2] > > > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 > > < > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 > > > > [3] > > > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 > > < > https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 > > > > > > Thanks > > Rajat Dhasmana > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Mar 13 07:13:41 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 13 Mar 2023 12:43:41 +0530 Subject: [cinder][PTG] Cinder 2023.2 (Bobcat) PTG Planning In-Reply-To: References: Message-ID: REMINDER! We have PTG in less than 2 weeks and only 1 topic has been added (apart from mine). Please add topics as soon as possible since it takes time to arrange them based on different parameters, like, availability of author, context of the topics, allocating a driver discussion day or not etc. Thanks Rajat Dhasmana On Tue, Mar 7, 2023 at 4:30?PM Rajat Dhasmana wrote: > Hello All, > > The 2023.2 (Bobcat) virtual PTG is approaching and will be held between > 27-31 March, 2023. > I've created a planning etherpad[1] and a PTG etherpad[2] to gather topics > for the PTG. > Note that you only need to add topics in the planning etherpad and those > will be arranged > in the PTG etherpad later. > > Dates: Tuesday (28th March) to Friday (31st March) 2023 > Time: 1300 to 1700 UTC > Etherpad: https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning > > Please add the topics as early as possible as finalizing and arranging > topics would require some > buffer time. > > [1] https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning > [2] https://etherpad.opendev.org/p/bobcat-ptg-cinder > > Thanks > Rajat Dhasmana > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Mon Mar 13 07:39:20 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 13 Mar 2023 16:39:20 +0900 Subject: [heat][PTG] 2023.2 (Bobcat) PTG Planning Message-ID: Hello, I've signed up for the upcoming virtual PTG so that we can have some slots for Heat discussion. In case you are interested in attending the sessions or have any topics you want to discuss, please put your name and the proposed topics in the etherpad. https://etherpad.opendev.org/p/march2023-ptg-heat-planning It'd be nice if we can update the planning etherpad this week so that I'll fix our slots and topics early next week. Thank you, Takashi Kajinami -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Mar 13 08:18:00 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 13 Mar 2023 13:48:00 +0530 Subject: [outreachy][cinder] In-Reply-To: References: Message-ID: Hi Desire, I'm the co-mentor for the "Extend automated validation of API" project. Good to know you have interest in the project. You can contact Sofia or me on IRC in the #openstack-cinder channel if you have any doubts/queries regarding the onboarding process. 
If you're not on IRC, then that would be my first recommendation to configure IRC, connect to OFTC network and join the #openstack-cinder channel, you can find Cinder team members and other outreachy applicants also there. IRC nicks: Rajat: whoami-rajat Sofia: enriquetaso Thanks Rajat Dhasmana On Thu, Mar 9, 2023 at 12:22?PM Desire Barine wrote: > Hello Sofia Enriquez, > > I'm Desire Barine, an Outreachy applicant. I would love to work on Extend > automated validation of API reference request/response samples project. I > would like to get started with the contribution. > I am currently going over the instructions on contributions given. This is > my first time contributing on an open source project but I'm really > excited to get started. > I'm proficient in python, bash and have worked on Rest api creation > before. I would love to hear from you. > > Desire. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Mon Mar 13 08:26:13 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Mon, 13 Mar 2023 13:56:13 +0530 Subject: [cinder] Openstack-ansible and cinder GPFS backend In-Reply-To: References: <7e8cda5e51f7b02878ed92dc58920be4fff25f3e.camel@lunarc.lu.se> Message-ID: Hi, On Thu, Mar 9, 2023 at 5:09?PM Dmitriy Rabotyagov wrote: > Hi, Nicolas, > > No, we don't really maintain documentation for each cinder driver > that's available. So we assume using an override variable for > adjustment of cinder configuration to match the desired state. > > So basically, you can use smth like that in your user_variables.yml: > > cinder_backends: > GPFSNFS: > volume_backend_name: GPFSNFS > volume_driver: cinder.volume.drivers.ibm.gpfs.GPFSNFSDriver > > cinder_cinder_conf_overrides: > DEFAULT: > gpfs_hosts: ip.add.re.ss > gpfs_storage_pool: cinder > gpfs_images_share_mode: copy_on_write > .... > > I have no idea though if gpfs_* variables can be defined or not inside > the backend section, as they're referenced in DEFAULT in docs. But > overrides will work regardless. > > Backend related configuration should always go in the [BACKEND] section and not in the [DEFAULT] section so the documentation needs to be corrected for GPFS. > ??, 9 ???. 2023??. ? 11:41, Nicolas Melot : > > > > Hi, > > > > I can find doc on using various backends for cinder > > ( > https://docs.openstack.org/openstack-ansible-os_cinder/zed/configure-cinder.html#configuring-cinder-to-use-lvm > ) > > and some documentation to configure a GPFS backend for cinder > > ( > https://docs.openstack.org/cinder/zed/configuration/block-storage/drivers/ibm-gpfs-volume-driver.html > ) > > but I cannot find any documentation to deploy cinder with GPFS backend > > using openstack-ansible. Does this exist at all? Is there any > > documentation? > > > > /Nicolas > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.rohmann at inovex.de Mon Mar 13 08:43:40 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Mon, 13 Mar 2023 09:43:40 +0100 Subject: [keystone] Best-practice for (admin) endpoint config Message-ID: Hello Openstack-discuss, I am wondering what the current recommendations / best-practices are in regards to the endpoints of keystone (and other services). There are three types of endpoints: public, internal and admin: ?* "public" certainly is for API access done by cloud users - so measures like rate limiting are likely in place. ?* "internal" is an alternative URL to be used by other services. 
   Sounds reasonable to have an alternate path for the internal communication of OpenStack services, like having a management network. Also, reading https://docs.openstack.org/security-guide/api-endpoints/api-endpoint-configuration-recommendations.html, this seems to be the recommendation.
 * "admin" - This is the one that gives me a little headache. According to commits
   1) https://opendev.org/openstack/keystone/commit/4ec69218454d9f8be7150e2cee50c28765d50c94
   2) https://github.com/openstack/keystone/commit/ecf721a3c176daf67d00536c48e80e78bded1af6
   there should actually be no admin endpoint for Keystone anymore. Or should there?
   But looking at openstack-ansible doing the endpoint config
   (https://opendev.org/openstack/openstack-ansible-os_keystone/src/commit/a020ff87cde136a5c507b2cdc719d8c1dd85824d/tasks/main.yml#L246)
   all three types are still configured? Backwards compatibility for services still expecting this endpoint?
   1) Ceilometer - https://bugs.launchpad.net/ceilometer/+bug/1981207
   2) Heat - https://review.opendev.org/c/openstack/openstacksdk/+/777343

Apart from Keystone, other services also have "admin" endpoints which can be configured and placed as such into the service catalog. What is the reasoning behind that?

Thanks and with kind regards,

Christian

From garcetto at gmail.com  Mon Mar 13 09:30:55 2023
From: garcetto at gmail.com (garcetto)
Date: Mon, 13 Mar 2023 10:30:55 +0100
Subject: [manila] create snapshot from share not permitted
Message-ID: 

Good morning,
I am using Manila and the generic driver with dhss=true, but I cannot create snapshots from shares. Any help? Where can I look? (The Cinder backend is a Linux NFS server.)

Thank you

$ manila snapshot-create share-01 --name Snapshot1
ERROR: Snapshots cannot be created for share '2c8b1b3d-ef82-4372-94df-678539f0d843' since it does not have that capability. (HTTP 422) (Request-ID: req-cab23a46-37dc-4f2b-b26c-d6b21b7453ba)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From felix.huettner at mail.schwarz  Mon Mar 13 09:38:39 2023
From: felix.huettner at mail.schwarz (Felix Hüttner)
Date: Mon, 13 Mar 2023 09:38:39 +0000
Subject: [neutron] Openstack Network Interconnection
In-Reply-To: 
References: 
Message-ID: 

Hi Roberto,

yeah, then I guess ovn-interconnect sounds like the more correct solution. We are also aiming for that for similar reasons.
Our idea for now is to handle the creation of the Transit Logical Switches outside of neutron (as otherwise one neutron would rule over the other neutrons). As the transit switches are then created in the individual OVN deployments, we thought about treating them as provider networks.

So the creation flow would be:
1. Create a transit switch on the ic-northbound
2. Wait for it to replicate to all OVN deployments
3. Create the provider networks on the neutron side with a new `provider-network-type` and `provider-physical-network` set to the transit switch name

So we would probably only be interested in the provider-network-type and not in the handling of the transit switches themselves in neutron.

--
Felix Huettner

> Hi Felix,
>
> Thanks for your feedback.
>
> The ovn-bgp-agent is a very powerful application to interconnect multi-tenancy networks using BGP EVPN type 5. This application integrates the br-ext with FRR and provides the interconnect using the BGP session. That would be one way to do it, but the problem is that the bgpvpn service plugin is only integrated with Neutron.
> Imagine in the future that we need to integrate the tenant network between different cloud solutions (e.g. using OpenStack, Kubernetes, LXD, etc.)... this could be possible if everyone uses OVN as a network backend and ovn-ic to interconnect the LRPs between AZs.
>
> Maybe I'm missing some point and there's no community interest in something like that. But back to the OpenStack/Neutron case, it might be interesting to continue the work on Neutron interconnect (or something like that), but maybe this time with the service plugin for ovn-ic.
>
> Regards,
> Roberto
>
> On Thu, Mar 9, 2023 at 05:24, Felix Hüttner wrote:
> > Hi Roberto,
> >
> > We will face a similar issue in the future and have also looked at ovn-interconnect (but not yet tested it).
> > There is also ovn-bgp-agent [1] which has an evpn mode that might be relevant.
> >
> > Whatever you find, I would definitely be interested in your results.
> >
> > [1] https://opendev.org/x/ovn-bgp-agent
> >
> > --
> > Felix Huettner
> >
> > > From: Roberto Bartzen Acosta
> > > Sent: Wednesday, March 8, 2023 9:49 PM
> > > To: openstack-discuss at lists.openstack.org
> > > Cc: Tiago Pires
> > > Subject: [neutron] Openstack Network Interconnection
> > >
> > > Hey folks.
> > >
> > > Does anyone have ideas on how to interconnect different Openstack deployments?
> > > Consider that we have multiple Datacenters and need to interconnect tenant networks. How could this be done in the context of OpenStack (without using VPN)?
> > >
> > > We have some ideas about the usage of OVN-IC (OVN Interconnect). It looks like a great solution to create a network layer between DCs/AZs with the help of the OVN driver. However, Neutron does not support the Transit Switches (OVN-IC design) that are required for this application.
> > >
> > > We've seen references to abandoned projects like [1] [2] [3].
> > >
> > > Does anyone use something similar in production or have an idea about how to do it? Imagine that we need to put workloads on two different AZs that run different Openstack installations, and we want to communicate with the local networks without using a FIP.
> > > I believe that the most coherent way to keep the databases consistent in each Openstack would be an integration with Neutron, but I haven't seen any movement on that.
> > >
> > > Regards,
> > > Roberto
> > >
> > > [1] https://www.youtube.com/watch?v=GizLmSiH1Q0
> > > [2] https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html
> > > [3] https://opendev.org/x/neutron-interconnection

This e-mail may contain confidential content and is intended only for the designated recipient. If you are not the intended recipient, please notify the sender immediately and delete this e-mail. Information on data protection can be found here.

From senrique at redhat.com  Mon Mar 13 09:45:07 2023
From: senrique at redhat.com (Sofia Enriquez)
Date: Mon, 13 Mar 2023 09:45:07 +0000
Subject: [outreachy][cinder] Cannot access code repository
In-Reply-To: <20230312211748.vhxztfd2yroj5lxl@yuggoth.org>
References: <20230312211748.vhxztfd2yroj5lxl@yuggoth.org>
Message-ID: 

As Jeremy mentioned, we use OpenDev [1]. You may be more familiar with GitHub for hosting source code. However, you can find an openstack/cinder repository on GitHub; please don't push there, since it's only a mirror. The source code of Cinder is at https://opendev.org/openstack/cinder, and you can `git clone` and `git review` from there.
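For reference, that clone-and-review workflow condenses to roughly the following; the topic branch name is just an illustrative placeholder, and `git review -s` assumes a Gerrit account and SSH key set up as described in [1]:

  $ git clone https://opendev.org/openstack/cinder && cd cinder
  $ pip install git-review
  $ git review -s                 # verifies/creates the gerrit remote
  $ git checkout -b my-first-fix  # placeholder topic branch name
  # ...edit, then commit with a proper commit message...
  $ git commit -a
  $ git review                    # pushes the change to review.opendev.org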
Best, Sofia [1] https://docs.openstack.org/contributors/code-and-documentation/using-gerrit.html On Sun, Mar 12, 2023 at 9:20?PM Jeremy Stanley wrote: > On 2023-03-12 22:03:34 +0100 (+0100), Benjamin Faruna wrote: > [...] > > I want to contribute to cinders' codebase, but I am having trouble > > accessing it. Whenever I open the code tab on launchpad I get 2 > > repositories that haven't been updated in a while but the main > > page shows a lot of activities on the codebase. > [...] > > Like many OpenStack projects, Cinder uses Launchpad for defect > tracking. OpenStack projects (including Cinder) do not use Launchpad > for code hosting however, they use the OpenDev Collaboratory. > > Please see the OpenStack Code & Documentation Contributor Guide for > details: > > https://docs.openstack.org/contributors/code-and-documentation/ > > -- > Jeremy Stanley > -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Mon Mar 13 10:27:45 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 13 Mar 2023 11:27:45 +0100 Subject: [ovn] safely change bridge_mappings In-Reply-To: References: Message-ID: Hi Tom??: The ml2 config option "bridge_mappings" is used by ML2/OVS. The mechanism driver ML2/OVN read this config from the local OVS DB, reading the "ovn-bridge-mappings" option from the Open vSwitch register. In other words, this is not needed in ML2/OVN. NeutronFlatNetworks --> ml2_type_flat.flat_networks NeutronNetworkVLANRanges --> ml2_type_vlan.network_vlan_ranges NeutronBridgeMappings --> that will set "ovn_bridge_mappings" on the OVS register during the installation (when using ML2/OVN) Regards. On Sat, Mar 11, 2023 at 12:02?AM Tom?? Bred?r wrote: > Hi Rodolfo, > > you helped a lot. I managed configure this, manually. Just for future > reference let me write down what I did. > - First I already had the interface br-ex2 configured and correctly > assigned physical interfaces in it > - I added the bridge mappings to the OVN DB: > ovs-vsctl set open . > external-ids:ovn-bridge-mappings=datacentre:br-ex,m-storage:br-ex2 > > - I added my nw m-storage to ml2_conf.ini: > [ml2_type_vlan] > network_vlan_ranges=datacentre:1:2700,m-storage:3700:4000 > > [ml2_type_flat] > flat_networks=datacentre,m-storage > > - I restarted the neutron service > - since I already had the m-storage nw created in openstack, but as > provider "datacenter" and I already had instance ports using it (but it was > not working), I had to create a new network and subnet. Delete the original > ports and recreate and reassign it to the instances. > > If I may, now I have two questions: > 1. Shouldn't I also define this in ml2_conf.ini > [ovs] > bridge_mappings = datacentre:br-ex,m-storage:br-ex2 > > or is the setting of the vswitch register via ovs-vsctl persistent between > redeployments or reboots? > > 2. Which parameters in tripleo-heat-templates sets the above ml2_conf.ini? > I found these params: > NeutronFlatNetworks > NeutronNetworkVLANRanges > NeutronBridgeMappings > > Thanks for your help > > Tomas > > > ut 7. 3. 2023 o 10:13 Rodolfo Alonso Hernandez > nap?sal(a): > >> Hello Tom??: >> >> You need to follow the steps in [1]: >> * You need to create the new physical bridge "br-ex2". >> * Then you need to add to the bridge the physical interface. 
>> * In the compute node you need to add the bridge mappings to the OVN >> database Open vSwitch register >> * In the controller, you need to add the reference for this second >> provider network in "flat_networks" and "network_vlan_ranges" (in the >> ml2.ini file). Then you need to restart the Neutron server to read these >> new parameters (this step is not mentioned in this link). >> $ cat ./etc/neutron/plugins/ml2/ml2_conf.ini >> [ml2_type_flat] >> flat_networks = public,public2 >> [ml2_type_vlan] >> network_vlan_ranges = public:11:200,public2:11:200 >> >> Regards. >> >> [1] >> https://docs.openstack.org/networking-ovn/pike/admin/refarch/provider-networks.html >> >> On Tue, Mar 7, 2023 at 12:33?AM Tom?? Bred?r >> wrote: >> >>> Hi, >>> >>> I have a running production OpenStack deployment - version Wallaby >>> installed using TripleO. I'm using the default OVN/OVS networking. >>> For provider networks I have two bridges on the compute nodes br-ex and >>> br-ex2. Instances mainly use br-ex for provider networks, but there are >>> some instances which started using a provider network which should be >>> mapped to br-ex2, however I didn't specify "bridge_mappings" on >>> ml2_conf.ini, so the traffic wants to flow through the default >>> datacentre:br-ex. >>> My questions is, what services should I restart on the controller and >>> compute nodes after defining bridge_mappings in [ovs] in ml2_conf.ini. And >>> if this operation is safe and if the instances already using br-ex will >>> lose connectivity? >>> >>> Thanks for your help >>> >>> Tomas >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Mon Mar 13 10:31:40 2023 From: senrique at redhat.com (Sofia Enriquez) Date: Mon, 13 Mar 2023 10:31:40 +0000 Subject: [cinder] proposing Jon Bernard for cinder core In-Reply-To: References: <7cbe477b-b4a6-8d63-17fa-43bce14179aa@gmail.com> Message-ID: ? On Mon, Mar 13, 2023 at 6:20?AM Rajat Dhasmana wrote: > It's been a week and having heard no objections, I have added Jon Bernard > to the cinder-core team. > Jon, you should see a +2 and +W option in your review now. > Welcome to the team! > > On Mon, Mar 6, 2023 at 7:32?PM Jay Bryant wrote: > >> No objections from me! I think Jon would be a great addition! >> >> Thanks, >> >> Jay >> >> On 3/3/2023 5:04 AM, Rajat Dhasmana wrote: >> > Hello everyone, >> > >> > I would like to propose Jon Bernard as cinder core. Looking at the >> > review stats >> > for the past 60[1], 90[2], 120[3] days, he has been consistently in >> > the top 5 >> > reviewers with a good +/- ratio and leaving helpful comments >> > indicating good >> > quality of reviews. He has been managing the stable branch releases >> > for the >> > past 2 cycles (Zed and 2023.1) and has helped in releasing security >> > issues as well. >> > >> > Jon has been part of the cinder and OpenStack community for a long >> > time and >> > has shown very active interest in upstream activities, be it release >> > liaison, review >> > contribution, attending cinder meetings and also involving in >> > outreachy activities. >> > He will be a very good addition to our team helping out with the >> > review bandwidth >> > and adding valuable input in our discussions. >> > >> > I will leave this thread open for a week and if there are no >> > objections, I will add >> > Jon Bernard to the cinder core team. 
>> > >> > [1] >> > >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 >> > < >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=60 >> > >> > [2] >> > >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 >> > < >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=90 >> > >> > [3] >> > >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 >> > < >> https://www.stackalytics.io/report/contribution?module=cinder-group&project_type=openstack&days=120 >> > >> > >> > Thanks >> > Rajat Dhasmana >> >> -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Mon Mar 13 11:07:32 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 13 Mar 2023 12:07:32 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: Hello Mohammed: So far we don't have any mechanism to report the sync status of an agent. I know that, for example, the DHCP agent reports an INFO message with the statement 'Synchronizing state complete'. But other agents don't provide this information or you need to manually observe the logs to detect that. Because this could be an interesting information, I'll open a RFE bug to try to bring this information to the existing agents. Regards. On Sun, Mar 12, 2023 at 11:11?AM Mohammed Naser wrote: > Hi folks, > > I'm working on improving the stability of rollouts when using Kubernetes > as a control plane, specifically around the L3 agent, it seems that I have > not found a clear way to detect in the code path where the L3 agent has > finished it's initial sync.. > > Am I missing it somewhere or is the architecture built in a way that > doesn't really answer that question? > > Thanks > Mohammed > > -- > Mohammed Naser > VEXXHOST, Inc. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Mon Mar 13 11:24:34 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 13 Mar 2023 12:24:34 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: It looks like this has sparked a cool ops discussion. I?ve tried an attempt here, though I am not sure how I feel about it yet. https://github.com/vexxhost/atmosphere/pull/359/files I have not extensively tested it but would be good to hear from Neutron team on this approach vs the approach from Felix. On Mon, Mar 13, 2023 at 12:07 PM Rodolfo Alonso Hernandez < ralonsoh at redhat.com> wrote: > Hello Mohammed: > > So far we don't have any mechanism to report the sync status of an agent. > I know that, for example, the DHCP agent reports an INFO message with the > statement 'Synchronizing state complete'. But other agents don't provide > this information or you need to manually observe the logs to detect that. > > Because this could be an interesting information, I'll open a RFE bug to > try to bring this information to the existing agents. > > Regards. 
> > On Sun, Mar 12, 2023 at 11:11?AM Mohammed Naser > wrote: > >> Hi folks, >> >> I'm working on improving the stability of rollouts when using Kubernetes >> as a control plane, specifically around the L3 agent, it seems that I have >> not found a clear way to detect in the code path where the L3 agent has >> finished it's initial sync.. >> >> Am I missing it somewhere or is the architecture built in a way that >> doesn't really answer that question? >> >> Thanks >> Mohammed >> >> -- >> Mohammed Naser >> VEXXHOST, Inc. >> > -- Mohammed Naser VEXXHOST, Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eblock at nde.ag Mon Mar 13 11:46:41 2023 From: eblock at nde.ag (Eugen Block) Date: Mon, 13 Mar 2023 11:46:41 +0000 Subject: [openstack][backup] Experience for instance backup In-Reply-To: Message-ID: <20230313114641.Horde.IuxsgTM4j7xJeTh3Q4J16nB@webmail.nde.ag> Hi, could you be more specific what is "not too fast" for you? I don't really have too much information from my side, but I can describe how we do it. We use Ceph as back end for all services (nova, glance, cinder), and the most important machines are backed up by our backup server directly via rbd commands: - We create a snapshot of the running instances and export the snapshot to an external drive, this is a full backup. In addition to that many of our machines have their home or working directories mounted from CephFS which we also backup as a tar ball to an external drive once a week (also full backup). And then we have also a "real" backup solution in place (bacula) where we store incremental as well as full backups from individually configured resources for different intervals. All these different approaches have different runtimes, of course. Just as an example, one 40 GB VM (rbd image) which has areound 24 GB in-use takes around 6 minutes for the full backup. Although we also have the cinder-backup service up and running nobody is using it because the important volumes are attached to instances which we backup by our rbd solution, so there's no real need for that. Regards, Eugen Zitat von Nguy?n H?u Kh?i : > Hello guys. > I am looking for instance backup solution. I am using Cinder backup with > nfs backup but it looks not too fast. I am using a 10Gbps network. I would > like to know experience for best practice for instance backup solutions on > Openstack. > Thank you. > Nguyen Huu Khoi From ralonsoh at redhat.com Mon Mar 13 12:11:01 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 13 Mar 2023 13:11:01 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: Technically is correct but you can imagine what my answer is about enabling the green threads backdoors. This functionality is for troubleshooting only and should not be enabled in a production environment. Just as a temporary workaround, we can add INFO messages in the "periodic_sync_routers_task" method that you can easily parse reading the logs. This patch could be also backported to stable versions. Bug for reporting full sync state in Neutron agents: https://bugs.launchpad.net/neutron/+bug/2011422 On Mon, Mar 13, 2023 at 12:24?PM Mohammed Naser wrote: > It looks like this has sparked a cool ops discussion. > > I?ve tried an attempt here, though I am not sure how I feel about it yet. > > https://github.com/vexxhost/atmosphere/pull/359/files > > I have not extensively tested it but would be good to hear from Neutron > team on this approach vs the approach from Felix. 
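For operators who want a stopgap before such an RFE lands, a readiness probe can be approximated by watching the agent log for a sync-complete marker. The sketch below is a minimal, hypothetical Python example: the log path and the marker string are assumptions (the DHCP agent logs "Synchronizing state complete" today, and the L3 agent would only emit something comparable once an INFO message is added to periodic_sync_routers_task as suggested above).

#!/usr/bin/env python3
# Minimal readiness-probe sketch. The log path and marker below are
# assumptions, not upstream defaults.
import sys

LOG_FILE = "/var/log/neutron/l3-agent.log"   # hypothetical path
MARKER = "Synchronizing state complete"      # hypothetical L3-agent marker

def agent_ready(path, marker):
    try:
        with open(path, "r", errors="replace") as f:
            return any(marker in line for line in f)
    except OSError:
        return False

if __name__ == "__main__":
    # Exit 0 when the marker has been seen so Kubernetes marks the pod Ready.
    sys.exit(0 if agent_ready(LOG_FILE, MARKER) else 1)

Used as a Kubernetes readiness probe, the pod would only be marked Ready once the marker has appeared, which is the behaviour this thread is asking for.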
> > On Mon, Mar 13, 2023 at 12:07 PM Rodolfo Alonso Hernandez < > ralonsoh at redhat.com> wrote: > >> Hello Mohammed: >> >> So far we don't have any mechanism to report the sync status of an agent. >> I know that, for example, the DHCP agent reports an INFO message with the >> statement 'Synchronizing state complete'. But other agents don't provide >> this information or you need to manually observe the logs to detect that. >> >> Because this could be an interesting information, I'll open a RFE bug to >> try to bring this information to the existing agents. >> >> Regards. >> >> On Sun, Mar 12, 2023 at 11:11?AM Mohammed Naser >> wrote: >> >>> Hi folks, >>> >>> I'm working on improving the stability of rollouts when using Kubernetes >>> as a control plane, specifically around the L3 agent, it seems that I have >>> not found a clear way to detect in the code path where the L3 agent has >>> finished it's initial sync.. >>> >>> Am I missing it somewhere or is the architecture built in a way that >>> doesn't really answer that question? >>> >>> Thanks >>> Mohammed >>> >>> -- >>> Mohammed Naser >>> VEXXHOST, Inc. >>> >> -- > Mohammed Naser > VEXXHOST, Inc. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mister.mackarow at yandex.ru Mon Mar 13 07:55:46 2023 From: mister.mackarow at yandex.ru (=?utf-8?B?0JzQkNCa0JDQoNCe0JIg0JzQkNCa0KE=?=) Date: Mon, 13 Mar 2023 10:55:46 +0300 Subject: Magnum in yoga release on Ubuntu 22.04 Message-ID: <30941678692580@mail.yandex.ru> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From adivya1.singh at gmail.com Mon Mar 13 14:46:03 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Mon, 13 Mar 2023 20:16:03 +0530 Subject: *Migration)Red Hat Open Stack Migration Lift and Shift Message-ID: Hi Team, I am making a plan for Red hat OpenStack migration(Lift and Shift), The Director is in VM sourced in VMware, which we are migrated in other DC, But the IP and VLAN will change Will it be advisable to change the Provisioning IP and other IP , according to the New VLAN designed in the new DC, Configured in the template , Do a introspection of the new design and Configured the Red hat OpenStack, Any addition on this or guidance would be highly usefull Regards Adivya Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Mon Mar 13 15:05:23 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 13 Mar 2023 08:05:23 -0700 Subject: *Migration)Red Hat Open Stack Migration Lift and Shift In-Reply-To: References: Message-ID: Greetings, I suspect you might want to reach out to RH support. I think it is entirely going to depend on a union of your end user needs/requirements, as well as your planning needs/requirements. 
-Julia On Mon, Mar 13, 2023 at 7:49?AM Adivya Singh wrote: > Hi Team, > > I am making a plan for Red hat OpenStack migration(Lift and Shift), The > Director is in VM sourced in VMware, which we are migrated in other DC, But > the IP and VLAN will change > > Will it be advisable to change the Provisioning IP and other IP , > according to the New VLAN designed in the new DC, Configured in the > template , Do a introspection of the new design and Configured the Red hat > OpenStack, > > Any addition on this or guidance would be highly usefull > > Regards > Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix.huettner at mail.schwarz Mon Mar 13 15:35:43 2023 From: felix.huettner at mail.schwarz (=?utf-8?B?RmVsaXggSMO8dHRuZXI=?=) Date: Mon, 13 Mar 2023 15:35:43 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: Hi Mohammed, > Subject: [neutron] detecting l3-agent readiness > > Hi folks, > > I'm working on improving the stability of rollouts when using Kubernetes as a control plane, specifically around the L3 agent, it seems that I have not found a clear way to detect in the code path where the L3 agent has finished it's initial sync.. > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/-/blob/devel/files/startup_wait_for_ns.py Basically we are checking against the neutron api what routers should be on the node and then validate that all keepalived processes are up and running. > Am I missing it somewhere or is the architecture built in a way that doesn't really answer that question? > Adding a option in the neutron api would be a lot nicer. But i guess that also counts for l2 and dhcp agents. > Thanks > Mohammed > > > -- > Mohammed Naser > VEXXHOST, Inc. -- Felix Huettner Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From christian.rohmann at inovex.de Mon Mar 13 16:11:17 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Mon, 13 Mar 2023 17:11:17 +0100 Subject: [openstack][backup] Experience for instance backup In-Reply-To: References: Message-ID: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> Hey there, On 06/03/2023 22:34, Nguy?n H?u Kh?i wrote: > > I am looking for instance backup solution. I am using Cinder backup > with nfs backup but it looks not too fast. I am using a 10Gbps > network. I would like to know experience for best practice for > instance backup solutions?on Openstack. On 13/03/2023 12:46, Eugen Block wrote: > We use Ceph as back end for all services (nova, glance, cinder), and > the most important machines are backed up by our backup server > directly via rbd commands: There is RBD and "the other" drivers. While RBD uses the native export / import feature of Ceph, all other drivers (file, NFS, object storages like S3) are based on the abstract chunked driver (https://opendev.org/openstack/cinder/src/branch/master/cinder/backup/chunkeddriver.py). This driver reads the volume / image and treats it as chunks before making use of a concrete driver (e.g. NFS or S3) to send those chunks off somewhere to be stored. Restore works just the opposite way. The performance of the chunked driver based back-ends is not (yet) comparable to what RBD can achieve due to various reasons. 
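To make the difference concrete, here is an illustrative Python sketch of the general shape of a chunked backup; it is not the actual cinder ChunkedBackupDriver code. The chunk size and the put_object callable are assumptions for illustration only: put_object stands in for whatever the concrete driver (NFS file write, Swift/S3 object upload, ...) uses to persist a single piece.

import hashlib

CHUNK_SIZE = 32 * 1024 * 1024  # hypothetical chunk size

def backup_volume(volume_file, put_object):
    # Read the source in fixed-size pieces and hand each one to the store.
    chunks = []
    offset = 0
    while True:
        data = volume_file.read(CHUNK_SIZE)
        if not data:
            break
        name = "backup-chunk-%05d" % len(chunks)
        put_object(name, data)
        # Per-chunk metadata lets a restore reassemble the volume in order.
        chunks.append({"name": name, "offset": offset, "length": len(data),
                       "sha256": hashlib.sha256(data).hexdigest()})
        offset += len(data)
    return chunks

Every chunk passes through the backup host and the target store's own protocol, which is part of why the chunked back ends currently trail the native RBD export/import path.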
But again, while "RBD" uses Ceph's mechanisms internally all other "targets" for backup storage work differently. We ourselves were looking into using and S3-compatible storage and thus I started a dicsussion about the state of those other drivers at https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030263.html This then led to a discussion at the Cinder PTG https://etherpad.opendev.org/p/antelope-ptg-cinder#L119 with many observations. There also are changes in the works, like restore into sparse volumes (https://review.opendev.org/c/openstack/cinder/+/852654) when going via the chunked driver. But also features like "encryption" (https://review.opendev.org/c/openstack/cinder-specs/+/862601) are being discussed. Regards Christian From juliaashleykreger at gmail.com Mon Mar 13 18:43:35 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 13 Mar 2023 11:43:35 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: Greetings! Time slot wise, I think that works for me. Time wise, in regards to the amount, I'm wondering if we need more. By my count, we have 11 new topics, 4 topics to revisit, in about six hours of non-operator dedicated time, not accounting for breaks for coffee/tea. Granted, some topics might be super quick at the 10 minute quick poll of the room, whereas other topics I feel like will require extensive discussion. If I were to size them, I think we would have 6 large-ish topics along with 3-4 medium sized topics. -Julia On Thu, Mar 9, 2023 at 3:19?PM Jay Faulkner wrote: > Hey all, > > The vPTG will be upon us soon, the week of March 27. > > I booked the following times on behalf of Ironic + BM SIG Operator hour, > in accordance with what times worked in Antelope. It's my hope that since > we've had little contributor turnover, these times continue to work. I'm > completely open to having things moved around if it's more convenient to > participants. > > I've booked the following times, all in Folsom: > - Tuesday 1400 UTC - 1700 UTC > - Wednesday 1300 UTC Operator hour: baremetal SIG > - Wednesday 1400 UTC - 1600 UTC > - Wednesday 2200 - 2300 UTC > > > I propose that after the Ironic meeting on March 20, we shortly sync up in > the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) > to pick topics and assign time. > > > Again, this is all meant to be a suggestion, I'm happy to move things > around but didn't want us to miss out on getting things booked. > > > - > Jay Faulkner > Ironic PTL > TC Member > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.urdin at binero.com Mon Mar 13 18:46:38 2023 From: tobias.urdin at binero.com (Tobias Urdin) Date: Mon, 13 Mar 2023 18:46:38 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: <510A9181-1D22-41F2-AE3C-EE354CD6F895@binero.com> Hello, Interesting thread! We are also interested in this for use when we are upgrading services, we are currently doing our best to parse the logs but that?s only for OVS agent and I was going to look into this. I can imagine having something like this for containers would be crucial as well. Best regards Tobias > On 12 Mar 2023, at 11:09, Mohammed Naser wrote: > > Hi folks, > > I'm working on improving the stability of rollouts when using Kubernetes as a control plane, specifically around the L3 agent, it seems that I have not found a clear way to detect in the code path where the L3 agent has finished it's initial sync.. 
> > Am I missing it somewhere or is the architecture built in a way that doesn't really answer that question? > > Thanks > Mohammed > > -- > Mohammed Naser > VEXXHOST, Inc. From tobias.urdin at binero.com Mon Mar 13 18:48:50 2023 From: tobias.urdin at binero.com (Tobias Urdin) Date: Mon, 13 Mar 2023 18:48:50 +0000 Subject: [openstack][backup] Experience for instance backup In-Reply-To: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> References: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> Message-ID: <533DE7EF-DEE7-4D36-9580-FDB757DC6658@binero.com> Hello, Indeed and interesting topic for the PTG, we are using the Swift backup driver and also had issues with backups timing out due to Keystone tokens of which we have done some work to mitigate. Best regards Tobias > On 13 Mar 2023, at 17:11, Christian Rohmann wrote: > > Hey there, > > On 06/03/2023 22:34, Nguy?n H?u Kh?i wrote: >> >> I am looking for instance backup solution. I am using Cinder backup with nfs backup but it looks not too fast. I am using a 10Gbps network. I would like to know experience for best practice for instance backup solutions on Openstack. > > > On 13/03/2023 12:46, Eugen Block wrote: >> We use Ceph as back end for all services (nova, glance, cinder), and the most important machines are backed up by our backup server directly via rbd commands: > > There is RBD and "the other" drivers. While RBD uses the native export / import feature of Ceph, all other drivers (file, NFS, object storages like S3) are based on the abstract chunked driver > (https://opendev.org/openstack/cinder/src/branch/master/cinder/backup/chunkeddriver.py). > This driver reads the volume / image and treats it as chunks before making use of a concrete driver (e.g. NFS or S3) to send those chunks off somewhere to be stored. Restore works just the opposite way. The performance of the chunked driver based back-ends is not (yet) comparable to what RBD can achieve due to various reasons. > > But again, while "RBD" uses Ceph's mechanisms internally all other "targets" for backup storage work differently. > We ourselves were looking into using and S3-compatible storage and thus I started a dicsussion about the state of those other drivers at https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030263.html > > This then led to a discussion at the Cinder PTG https://etherpad.opendev.org/p/antelope-ptg-cinder#L119 with many observations. > > There also are changes in the works, like restore into sparse volumes (https://review.opendev.org/c/openstack/cinder/+/852654) when going via the chunked driver. > But also features like "encryption" (https://review.opendev.org/c/openstack/cinder-specs/+/862601) are being discussed. > > > > Regards > > > Christian > > From amy at demarco.com Mon Mar 13 20:57:30 2023 From: amy at demarco.com (Amy Marrich) Date: Mon, 13 Mar 2023 15:57:30 -0500 Subject: [Diversity] Diversity and Inclusion WG Meeting reminder Message-ID: This is a reminder that the Diversity and Inclusion WG will be meeting tomorrow at 14:00 UTC in the #openinfra-diversity channel on OFTC. We hope members of all OpenInfra projects join us as we look at the Code of Conduct, and continue working on planning for the OpenInfra Summit as well as Foundation-wide diversity surveys. 
Thanks, Amy (spotz) 0 - https://etherpad.opendev.org/p/diversity-wg-agenda From alsotoes at gmail.com Mon Mar 13 22:45:04 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Mon, 13 Mar 2023 16:45:04 -0600 Subject: [manila] create snapshot from share not permitted In-Reply-To: References: Message-ID: If you are inside this features support matrix https://docs.openstack.org/manila/latest/admin/share_back_ends_feature_support_mapping.html#share-back-ends-feature-support-mapping Examine your configuration as well: - snapshot_support indicates whether snapshots are supported for shares created on the pool/backend. When administrators do not set this capability as an extra-spec in a share type, the scheduler can place new shares of that type in pools without regard for whether snapshots are supported, and those shares will not support snapshots. https://docs.openstack.org/manila/latest/admin/capabilities_and_extra_specs.html Cheers! On Mon, Mar 13, 2023 at 3:35?AM garcetto wrote: > good morning, > i am using manila and generic driver with dhss true, but cannot create > snapshot from shares, any help? where can i look at? > (cinder backend is a linux nfs server) > > thank you > > $ manila snapshot-create share-01 --name Snapshot1 > ERROR: Snapshots cannot be created for share > '2c8b1b3d-ef82-4372-94df-678539f0d843' since it does not have that > capability. (HTTP 422) (Request-ID: > req-cab23a46-37dc-4f2b-b26c-d6b21b7453ba) > > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alsotoes at gmail.com Mon Mar 13 22:46:25 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Mon, 13 Mar 2023 16:46:25 -0600 Subject: [manila] support for encryption In-Reply-To: References: Message-ID: I don't think so, and it also doesn't sound like a feature that Manila should implement; try to do that in your backend storage. Cheers! On Sat, Mar 11, 2023 at 7:10?AM garcetto wrote: > good afternoon, > data encryption. > thank you > > Il Ven 10 Mar 2023, 21:48 Alvaro Soto ha scritto: > >> You mean data or token encryption? >> >> Cheers! >> >> On Fri, Mar 10, 2023 at 9:08?AM garcetto wrote: >> >>> good afternoon, >>> does manila support encryption in some sort ? >>> >>> thank you >>> >> >> >> -- >> >> Alvaro Soto >> >> *Note: My work hours may not be your work hours. Please do not feel the >> need to respond during a time that is not convenient for you.* >> ---------------------------------------------------------- >> Great people talk about ideas, >> ordinary people talk about things, >> small people talk... about other people. >> > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdhasman at redhat.com Tue Mar 14 05:18:02 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Tue, 14 Mar 2023 10:48:02 +0530 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope In-Reply-To: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> References: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Message-ID: Hi, I can provide updates for Cinder. Thanks Rajat Dhasmana On Thu, Mar 9, 2023 at 11:43?PM Kristin Barrientos wrote: > Hi everyone, > > As we get closer to the OpenStack release, I wanted to reach out to see if > any PTL?s were interested in providing their Antelope cycle highlights in > an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we > would get 4-6 projects represented. Previous examples of OpenStack release > episodes can be found here[2] > and here[3] > . > > Please let me know if you?re interested and I can provide next steps. If > you would like to provide a project update but that time doesn?t work for > you, please share a recording with me and I can get it added to the project > navigator. > > Thanks, > > Kristin Barrientos > Marketing Coordinator > OpenInfra Foundation > > [1] https://openinfra.dev/live/ > > [2] https://www.youtube.com/watch?v=hwPfjvshxOM > > [3] https://www.youtube.com/watch?v=MSbB3L9_MeY > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliver.weinmann at me.com Tue Mar 14 05:43:04 2023 From: oliver.weinmann at me.com (Oliver Weinmann) Date: Tue, 14 Mar 2023 06:43:04 +0100 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: <30941678692580@mail.yandex.ru> References: <30941678692580@mail.yandex.ru> Message-ID: <183F8F44-25BD-4E6F-A5D6-0BC93A8B6EAF@me.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: favicon.ico Type: image/png Size: 338 bytes Desc: not available URL: From zakhar at gmail.com Tue Mar 14 06:34:48 2023 From: zakhar at gmail.com (Zakhar Kirpichenko) Date: Tue, 14 Mar 2023 08:34:48 +0200 Subject: Wallaby on Ubuntu 20.04, Neutron 18.6.0 neutron-dhcp-agent RPC unusually slow Message-ID: Hi! We're running Openstack Wallaby on Ubuntu 20.04, 3 high-performance infra nodes with a RabbitMQ cluster. I updated Neutron components to version 18.6.0, which recently became available in the cloud repository ( http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby main). 
The exact package versions updated are as follows: Install: libunbound8:amd64 (1.9.4-2ubuntu1.4, automatic), openvswitch-common:amd64 (2.15.2-0ubuntu1~cloud0, automatic) Upgrade: neutron-common:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-werkzeug:amd64 (0.16.1+dfsg1-2, 0.16.1+dfsg1-2ubuntu0.1), neutron-dhcp-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-l3-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-neutron:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-server:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-plugin-ml2:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-metadata-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-linuxbridge-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1) Installed Neutron packages: ii neutron-common 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - common ii neutron-dhcp-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - DHCP agent Firewall-as-a-Service driver for OpenStack Neutron ii neutron-l3-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - l3 agent ii neutron-linuxbridge-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - linuxbridge agent ii neutron-metadata-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - metadata agent ii neutron-plugin-ml2 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - ML2 plugin ii neutron-server 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - server ii python3-neutron 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - Python library ii python3-neutron-lib 2.10.1-0ubuntu1~cloud0 all Neutron shared routines and utilities - Python 3.x ii python3-neutronclient 1:7.2.1-0ubuntu1~cloud0 all client API library for Neutron - Python 3.x Normally this would be an easy update, but this time neutron-dhcp-agent doesn't work properly: 2023-03-14 05:44:27.572 2534501 INFO neutron.agent.dhcp.agent [req-4a362701-cc1f-4b9d-87e6-045b6a388709 - - - - -] Synchronizing state complete 2023-03-14 05:44:38.868 2534501 ERROR neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 55 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c 2023-03-14 05:44:38.871 2534501 WARNING neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 120 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c 2023-03-14 05:45:34.244 2534501 ERROR neutron.agent.dhcp.agent [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout notifying server of ports ready. 
Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c 2023-03-14 05:47:10.876 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : bd97110b004e413cb2d6b05d9fb3b57c 2023-03-14 05:47:34.353 2534501 ERROR neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534 2023-03-14 05:47:34.354 2534501 WARNING neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 240 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534 2023-03-14 05:47:46.681 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : f254f735998243c4b0a58ce95c974534 2023-03-14 05:48:01.086 2534501 ERROR neutron.agent.dhcp.agent [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout notifying server of ports ready. Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534 2023-03-14 05:49:45.035 2534501 INFO neutron.agent.dhcp.agent [req-5935a0d0-a981-463c-a4ea-23ccbb54c896 - - - - -] DHCP configuration for ports ... (A successful configuration here). While neutron-dhcp-agent is waiting, neutron-server log gets filled up with: neutron-server.log:2023-03-14 05:47:05.761 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 1 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76 ... neutron-server.log:2023-03-14 05:47:10.727 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 10 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76 This repeats for each port of each network neutron-dhcp-agent needs to configure. Each subsequent configuration for each network takes about 1-2 minutes, depending on the network size. With earlier Neutron versions the whole process of configuring all networks would finish in under a minute, i.e. DHCP configuration per port (and network) is several orders of magnitude slower than it should be. Once neutron-dhcp-agent finishes synchronization, it seems to work without issues although there aren't that many changes in our cloud to tell whether it's fast or slow, individual port updates seem to happen quickly. All other services are working well, RabbitMQ cluster is working well, infra nodes are not overloaded and there are no apparent issues other than this one with Neutron, thus I am inclined to think that the issue is specific to version 18.6.0 of neutron-dhcp-agent or neutron-server. I would appreciate any advice! Best regards, Zakhar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From geguileo at redhat.com Tue Mar 14 08:46:01 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Tue, 14 Mar 2023 09:46:01 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: <20230313163251.xpnzyvzb65b6zaal@localhost> References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> <20230313163251.xpnzyvzb65b6zaal@localhost> Message-ID: <20230314084601.t2ez24gcljnu5plq@localhost> [Sending the email again as it seems it didn't reach the ML] On 13/03, Gorka Eguileor wrote: > On 11/03, Rishat Azizov wrote: > > Hi, Gorka, > > > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > > Hi, There are multiple things going on here: 1. There is a bug in os-brick, because the disconnect_volume should not fail, since it is being called with force=True and ignore_errors=True. The issues is that this call [1] is not wrapped in the ExceptionChainer context manager, and it should not even be a flush call, it should be a call to "multipathd remove map $map" instead. 2. The way multipath code is written [2][3], the error we see about "3624a93705842cfae35d7483200015fce is not a multipath device" means 2 different things: it is not a multipath or an error happened. So we don't really know what happened without enabling more verbose multipathd log levels. 3. The "multipath -f" call should not be failing in the first place, because the failure is happening on disconnecting the source volume, which has no data buffered to be written and therefore no reason to fail the flush (unless it's using a friendly name). I don't know if it could be happening that the first flush fails with a timeout (maybe because there is an extend operation happening), but multipathd keeps trying to flush it in the background and when it succeeds it removes the multipath device, which makes following calls fail. If that's the case we would need to change the retry from automatic [4] to manual and check in-between to see if the device has been removed in-between calls. The first issue is definitely a bug, the 2nd one is something that could be changed in the deployment to try to get additional information on the failure, and the 3rd one could be a bug. I'll see if I can find someone who wants to work on the 1st and 3rd points. Cheers, Gorka. [1]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 [2]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 [3]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 [4]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > > > On 06/03, Rishat Azizov wrote: > > > > Hi, > > > > > > > > It works with smaller volumes. > > > > > > > > multipath.conf attached to thist email. > > > > > > > > Cinder version - 18.2.0 Wallaby > > > From kamil.madac at gmail.com Tue Mar 14 09:46:07 2023 From: kamil.madac at gmail.com (Kamil Madac) Date: Tue, 14 Mar 2023 10:46:07 +0100 Subject: [neutron] Message-ID: Hi All, I'm in the process of planning a small public cloud based on OpenStack. I have quite experience with kolla-ansible deployments which use OVS networking and I have no issues with that. 
It works stable for my use cases (Vlan provider networks, DVR, tenant networks, floating IPs). For that new deployment I'm looking at OVN deployment which from what I read should be more performant (faster build of instances) and with ability to cover more networking features in OVN instead of needing external software like iptables/dnsmasq. Does anyone use OVN in production and what is your experience (pros/cons)? Is OVN mature enough to replace OVS in the production deployment (are there some basic features from OVS missing)? Thanks in advance for sharing the experience. -- Kamil Madac -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Tue Mar 14 13:15:29 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Tue, 14 Mar 2023 10:15:29 -0300 Subject: [ptls][Antelope] OpenInfra Live: OpenStack Antelope In-Reply-To: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> References: <0F4ADE89-D51B-46BB-B1B6-4E6320DA1B9B@openinfra.dev> Message-ID: I would like to share the updates for Manila! :) Em qui., 9 de mar. de 2023 ?s 15:12, Kristin Barrientos < kristin at openinfra.dev> escreveu: > Hi everyone, > > As we get closer to the OpenStack release, I wanted to reach out to see if > any PTL?s were interested in providing their Antelope cycle highlights in > an OpenInfra Live[1] episode on Thursday, March 23 at 1500 UTC. Ideally, we > would get 4-6 projects represented. Previous examples of OpenStack release > episodes can be found here[2] > and here[3] > . > > Please let me know if you?re interested and I can provide next steps. If > you would like to provide a project update but that time doesn?t work for > you, please share a recording with me and I can get it added to the project > navigator. > > Thanks, > > Kristin Barrientos > Marketing Coordinator > OpenInfra Foundation > > [1] https://openinfra.dev/live/ > > [2] https://www.youtube.com/watch?v=hwPfjvshxOM > > [3] https://www.youtube.com/watch?v=MSbB3L9_MeY > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From geguileo at redhat.com Mon Mar 13 16:32:51 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 13 Mar 2023 17:32:51 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> Message-ID: <20230313163251.xpnzyvzb65b6zaal@localhost> On 11/03, Rishat Azizov wrote: > Hi, Gorka, > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > Hi, There are multiple things going on here: 1. There is a bug in os-brick, because the disconnect_volume should not fail, since it is being called with force=True and ignore_errors=True. The issues is that this call [1] is not wrapped in the ExceptionChainer context manager, and it should not even be a flush call, it should be a call to "multipathd remove map $map" instead. 2. The way multipath code is written [2][3], the error we see about "3624a93705842cfae35d7483200015fce is not a multipath device" means 2 different things: it is not a multipath or an error happened. So we don't really know what happened without enabling more verbose multipathd log levels. 3. The "multipath -f" call should not be failing in the first place, because the failure is happening on disconnecting the source volume, which has no data buffered to be written and therefore no reason to fail the flush (unless it's using a friendly name). 
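A rough sketch of what a more defensive cleanup could look like, following the "multipathd remove map" idea from point 1 and treating "is not a multipath device" as "already gone" rather than as a failure. This is not os-brick code; the retry count and delay are made-up values.

import subprocess
import time

def map_exists(wwid):
    # "multipath -l <map>" prints the map's topology only if it still exists.
    res = subprocess.run(["multipath", "-l", wwid],
                         capture_output=True, text=True)
    return wwid in res.stdout

def remove_map(wwid, attempts=3, delay=5.0):
    for _ in range(attempts):
        if not map_exists(wwid):
            return  # removed in the background; nothing left to flush
        res = subprocess.run(["multipathd", "remove", "map", wwid],
                             capture_output=True, text=True)
        if res.returncode == 0:
            return
        time.sleep(delay)  # manual retry, re-checking the device in between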
I don't know if it could be happening that the first flush fails with a timeout (maybe because there is an extend operation happening), but multipathd keeps trying to flush it in the background and when it succeeds it removes the multipath device, which makes following calls fail. If that's the case we would need to change the retry from automatic [4] to manual and check in-between to see if the device has been removed in-between calls. The first issue is definitely a bug, the 2nd one is something that could be changed in the deployment to try to get additional information on the failure, and the 3rd one could be a bug. I'll see if I can find someone who wants to work on the 1st and 3rd points. Cheers, Gorka. [1]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 [2]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 [3]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 [4]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > On 06/03, Rishat Azizov wrote: > > > Hi, > > > > > > It works with smaller volumes. > > > > > > multipath.conf attached to thist email. > > > > > > Cinder version - 18.2.0 Wallaby > > > > Hi, > > > > After giving it some thought I think I may know what is going on. > > > > If you have DEBUG logs enabled in cinder-backup when it fails, how many > > calls do you see in the cinder-backup to "multipath -f" from os-brick, > > only one or do you see more? > > > > Cheers, > > Gorka. > > > > > > > > ??, 6 ???. 2023??. ? 17:35, Gorka Eguileor : > > > > > > > On 16/02, Rishat Azizov wrote: > > > > > Hello! > > > > > > > > > > We have an error with creating backups from iscsi volume. Usually, > > this > > > > > happens with large backups over 100GB. > > > > > > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > [req-f6619913-6f96-4226-8d75-2da3fca722f1 > > > > 23de1b92e7674cf59486f07ac75b886b > > > > > a7585b47d1f143e9839c49b4e3bbe1b4 - - -] Exception during message > > > > handling: > > > > > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error > > > > while > > > > > running command. 
> > > > > Command: multipath -f 3624a93705842cfae35d7483200015ec6 > > > > > Exit code: 1 > > > > > Stdout: '' > > > > > Stderr: 'Feb 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a > > > > > multipath device\n' > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > Traceback > > > > > (most recent call last): > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line > > > > 165, > > > > > in _process_incoming > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server res > > = > > > > > self.dispatcher.dispatch(message) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", > > line > > > > > 309, in dispatch > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > self._do_dispatch(endpoint, method, ctxt, args) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", > > line > > > > > 229, in _do_dispatch > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > result = > > > > > func(ctxt, **new_args) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/utils.py", line 890, in > > wrapper > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > func(self, *args, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line > > 410, in > > > > > create_backup > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > volume_utils.update_backup_error(backup, str(err)) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, > > in > > > > > __exit__ > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > self.force_reraise() > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, > > in > > > > > force_reraise > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > raise > > > > > self.value > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line > > 399, in > > > > > create_backup > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > updates > > > > = > > > > > self._run_backup(context, backup, volume) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line > > 493, in > > > > > _run_backup > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > ignore_errors=True) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line > > 1066, > > > > in > > > > > _detach_device > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > force=force, ignore_errors=ignore_errors) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > 
"/usr/lib/python3.6/site-packages/os_brick/utils.py", line 141, in > > > > > trace_logging_wrapper > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > f(*args, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", > > line > > > > 360, > > > > > in inner > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > f(*args, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > > > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > > > > > line 880, in disconnect_volume > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > is_disconnect_call=True) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > > > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", > > > > > line 942, in _cleanup_connection > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > self._linuxscsi.flush_multipath_device(multipath_name) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", > > line > > > > > 382, in flush_multipath_device > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > root_helper=self._root_helper) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in > > > > > _execute > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > result = > > > > > self.__execute(*args, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", > > line > > > > > 172, in execute > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > execute_root(*cmd, **kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line > > > > 247, > > > > > in _wrap > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > return > > > > > self.channel.remote_call(name, args, kwargs) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server File > > > > > "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, > > in > > > > > remote_call > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > raise > > > > > exc_type(*result[2]) > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error > > > > while > > > > > running command. > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Command: > > > > > multipath -f 3624a93705842cfae35d7483200015ec6 > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Exit > > code: 1 > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stdout: > > '' > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server Stderr: > > 'Feb > > > > > 16 00:22:45 | 3624a93705842cfae35d7483200015ec6 is not a multipath > > > > device\n' > > > > > 2023-02-16 00:23:25.298 1920 ERROR oslo_messaging.rpc.server > > > > > > > > > > Could you please help with this error? 
> > > > > > > > Hi, > > > > > > > > Does it work for smaller volumes or does it also fail? > > > > > > > > What are your defaults in your /etc/multipath.conf file? > > > > > > > > What Cinder release are you using? > > > > > > > > Cheers, > > > > Gorka. > > > > > > > > > > > > > defaults { > > > user_friendly_names no > > > find_multipaths yes > > > enable_foreign "^$" > > > } > > > > > > blacklist_exceptions { > > > property "(SCSI_IDENT_|ID_WWN)" > > > } > > > > > > blacklist { > > > } > > > > > > devices { > > > device { > > > vendor "PURE" > > > product "FlashArray" > > > fast_io_fail_tmo 10 > > > path_grouping_policy "group_by_prio" > > > failback "immediate" > > > prio "alua" > > > hardware_handler "1 alua" > > > max_sectors_kb 4096 > > > } > > > } > > > > > 2023-03-10 16:42:41.785 2878341 DEBUG cinder.backup.drivers.ceph [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Transferred chunk 1600 of 1600 (84387K/s) _transfer_data /usr/lib/python3.6/site-packages/cinder/backup/drivers/ceph.py:426 > 2023-03-10 16:42:42.107 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Getting metadata type 'volume-base-metadata' _save_vol_base_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:79 > 2023-03-10 16:42:42.139 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Completed fetching metadata type 'volume-base-metadata' _save_vol_base_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:98 > 2023-03-10 16:42:42.139 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Getting metadata type 'volume-metadata' _save_vol_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:109 > 2023-03-10 16:42:42.147 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] No metadata type 'volume-metadata' available _save_vol_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:123 > 2023-03-10 16:42:42.148 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Getting metadata type 'volume-glance-metadata' _save_vol_glance_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:132 > 2023-03-10 16:42:42.156 2878341 DEBUG cinder.backup.driver [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Completed fetching metadata type 'volume-glance-metadata' _save_vol_glance_meta /usr/lib/python3.6/site-packages/cinder/backup/driver.py:145 > 2023-03-10 16:42:42.157 2878341 DEBUG cinder.backup.drivers.ceph [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Backing up metadata for volume 7e9beead-01a6-488e-bc2f-47093a8cdbd9. _backup_metadata /usr/lib/python3.6/site-packages/cinder/backup/drivers/ceph.py:946 > 2023-03-10 16:42:42.251 2878341 DEBUG cinder.backup.drivers.ceph [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Backup '4f32e959-509d-49d9-9674-acc3776b4b6a' of volume 7e9beead-01a6-488e-bc2f-47093a8cdbd9 finished. 
backup /usr/lib/python3.6/site-packages/cinder/backup/drivers/ceph.py:1011 > 2023-03-10 16:42:42.252 2878341 DEBUG oslo_concurrency.processutils [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf chown 0 /dev/dm-5 execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:42.613 2878341 DEBUG oslo_concurrency.processutils [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf chown 0 /dev/dm-5" returned: 0 in 0.361s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:42:42.614 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] ==> disconnect_volume: call "{'args': (, {'target_discovered': False, 'discard': True, 'target_luns': [1, 1, 1, 1], 'target_iqns': ['iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'], 'target_portals': ['10.224.18.46:3260', '10.224.18.47:3260', '10.224.18.48:3260', '10.224.18.49:3260'], 'wwn': '3624a93705842cfae35d7483200015fce', 'qos_specs': {'total_bytes_sec': '524288000', 'read_iops_sec': '12800', 'write_iops_sec': '6400'}, 'access_mode': 'rw', 'encrypted': False, 'cacheable': False}, {'type': 'block', 'scsi_wwn': '3624a93705842cfae35d7483200015fce', 'path': '/dev/dm-5', 'multipath_id': '3624a93705842cfae35d7483200015fce'}), 'kwargs': {'force': True, 'ignore_errors': True}}" trace_logging_wrapper /usr/lib/python3.6/site-packages/os_brick/utils.py:150 > 2023-03-10 16:42:42.616 2878341 DEBUG oslo_concurrency.lockutils [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Lock "connect_volume" acquired by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: waited 0.001s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359 > 2023-03-10 16:42:42.616 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Getting connected devices for (ips,iqns,luns)=[('10.224.18.46:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 1), ('10.224.18.47:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 1), ('10.224.18.48:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 1), ('10.224.18.49:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1', 1)] _get_connection_devices /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:821 > 2023-03-10 16:42:42.618 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:42.631 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node" returned: 0 in 0.013s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:42:42.632 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('10.224.18.46:3260,4294967295 
iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1\n10.224.18.47:3260,4294967295 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1\n10.224.18.48:3260,4294967295 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1\n10.224.18.49:3260,4294967295 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:42:42.634 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m session execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:42.647 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m session" returned: 0 in 0.013s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:42:42.647 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('tcp: [234] 10.224.18.46:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash)\ntcp: [235] 10.224.18.47:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash)\ntcp: [236] 10.224.18.49:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash)\ntcp: [237] 10.224.18.48:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash)\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:42:42.649 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('-m', 'session'): stdout=tcp: [234] 10.224.18.46:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [235] 10.224.18.47:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [236] 10.224.18.49:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [237] 10.224.18.48:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > stderr= _run_iscsiadm_bare /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1155 > 2023-03-10 16:42:42.649 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsi session list stdout=tcp: [234] 10.224.18.46:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [235] 10.224.18.47:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [236] 10.224.18.49:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > tcp: [237] 10.224.18.48:3260,4013 iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 (non-flash) > stderr= _run_iscsi_session /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1144 > 2023-03-10 16:42:42.653 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Resulting device map defaultdict(. 
at 0x7fe6921bf268>, {('10.224.18.46:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'): ({'sdb'}, set()), ('10.224.18.47:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'): ({'sda'}, set()), ('10.224.18.48:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'): ({'sdc'}, set()), ('10.224.18.49:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'): ({'sdd'}, set())}) _get_connection_devices /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:853 > 2023-03-10 16:42:42.654 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Removing multipathed devices sdc, sdd, sda, sdb remove_connection /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:308 > 2023-03-10 16:42:42.655 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Flush multipath device 3624a93705842cfae35d7483200015fce flush_multipath_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:376 > 2023-03-10 16:42:42.657 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:46.675 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 4.019s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:42:46.676 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:478 > 2023-03-10 16:42:46.676 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:42:46.685 2880468 DEBUG os_brick.privileged.rootwrap [-] Sleeping for 20 seconds on_execute /usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py:106 > 2023-03-10 16:42:55.994 2878341 DEBUG oslo_service.periodic_task [req-b7d7b4ae-b092-40bf-9105-97bede752389 - - - - -] Running periodic task BackupManager.publish_service_capabilities run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:211 > 2023-03-10 16:42:55.995 2878341 DEBUG cinder.manager [req-b7d7b4ae-b092-40bf-9105-97bede752389 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python3.6/site-packages/cinder/manager.py:197 > 2023-03-10 16:43:06.707 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 20.031s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:06.707 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Retrying. 
execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:478 > 2023-03-10 16:43:06.707 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:06.714 2880468 DEBUG os_brick.privileged.rootwrap [-] Sleeping for 40 seconds on_execute /usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py:106 > 2023-03-10 16:43:46.753 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 40.046s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.754 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Not Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:474 > 2023-03-10 16:43:46.754 2880468 DEBUG oslo.privsep.daemon [-] privsep: Exception during request[140628271231200]: Unexpected error while running command. > Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:43:06 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' _process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490 > Traceback (most recent call last): > File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd > ret = func(*f_args, **f_kwargs) > File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 249, in _wrap > return func(*args, **kwargs) > File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 197, in execute_root > return custom_execute(*cmd, shell=False, run_as_root=False, **kwargs) > File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 146, in custom_execute > on_completion=on_completion, *cmd, **kwargs) > File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 441, in execute > cmd=sanitized_cmd) > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. > Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:43:06 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' > 2023-03-10 16:43:46.757 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (5, 'oslo_concurrency.processutils.ProcessExecutionError', ('', 'Mar 10 16:43:06 | 3624a93705842cfae35d7483200015fce is not a multipath device\n', 1, 'multipath -f 3624a93705842cfae35d7483200015fce', None)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.758 2878341 WARNING os_brick.exception [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Flushing 3624a93705842cfae35d7483200015fce failed: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. 
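For anyone retracing the failure above by hand: the flush that os-brick keeps retrying can be checked directly with standard multipath-tools commands. This is only a diagnostic sketch; the WWID is the one taken from the log, and on a node where the map has already been torn down the flush is expected to fail exactly as shown.

  WWID=3624a93705842cfae35d7483200015fce
  sudo multipath -ll "$WWID"                                  # shows the map and its paths, if the map still exists
  sudo multipathd show maps | grep "$WWID" || echo "map gone"  # confirms whether multipathd still knows the map
  sudo multipath -f "$WWID"                                    # the command os-brick retries; exits 1 once the map is gone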
> 2023-03-10 16:43:46.761 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipathd del path /dev/sdc execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.772 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipathd del path /dev/sdc" returned: 0 in 0.011s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.772 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('ok\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.774 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Remove SCSI device /dev/sdc with /sys/block/sdc/device/delete remove_scsi_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:75 > 2023-03-10 16:43:46.774 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): tee -a /sys/block/sdc/device/delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.814 2880468 DEBUG oslo_concurrency.processutils [-] CMD "tee -a /sys/block/sdc/device/delete" returned: 0 in 0.040s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.815 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('1', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.817 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipathd del path /dev/sdd execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.828 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipathd del path /dev/sdd" returned: 0 in 0.011s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.828 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('ok\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.829 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Remove SCSI device /dev/sdd with /sys/block/sdd/device/delete remove_scsi_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:75 > 2023-03-10 16:43:46.831 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): tee -a /sys/block/sdd/device/delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.869 2880468 DEBUG oslo_concurrency.processutils [-] CMD "tee -a /sys/block/sdd/device/delete" returned: 0 in 0.039s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.870 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('1', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.872 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipathd del path /dev/sda execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.883 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipathd del path /dev/sda" returned: 0 in 0.011s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.884 2880468 DEBUG 
oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('ok\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.885 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Remove SCSI device /dev/sda with /sys/block/sda/device/delete remove_scsi_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:75 > 2023-03-10 16:43:46.887 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): tee -a /sys/block/sda/device/delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.929 2880468 DEBUG oslo_concurrency.processutils [-] CMD "tee -a /sys/block/sda/device/delete" returned: 0 in 0.042s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.930 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('1', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.931 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipathd del path /dev/sdb execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.942 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipathd del path /dev/sdb" returned: 0 in 0.011s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.943 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('ok\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.944 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Remove SCSI device /dev/sdb with /sys/block/sdb/device/delete remove_scsi_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:75 > 2023-03-10 16:43:46.945 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): tee -a /sys/block/sdb/device/delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.979 2880468 DEBUG oslo_concurrency.processutils [-] CMD "tee -a /sys/block/sdb/device/delete" returned: 0 in 0.034s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.979 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('1', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.980 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Checking to see if SCSI volumes sdc, sdd, sda, sdb have been removed. wait_for_volumes_removal /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:83 > 2023-03-10 16:43:46.981 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] SCSI volumes sdc, sdd, sda, sdb have been removed. 
wait_for_volumes_removal /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:92 > 2023-03-10 16:43:46.982 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Disconnecting from: [('10.224.18.46:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'), ('10.224.18.47:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'), ('10.224.18.48:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1'), ('10.224.18.49:3260', 'iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1')] _disconnect_connection /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1132 > 2023-03-10 16:43:46.983 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --op update -n node.startup -v manual execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:46.996 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --op update -n node.startup -v manual" returned: 0 in 0.013s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:46.996 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:46.997 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'manual'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:46.998 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --logout execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.026 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --logout" returned: 0 in 0.028s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.027 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('Logging out of session [sid: 234, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.46,3260]\nLogout of [sid: 234, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.46,3260] successful.\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.028 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--logout',): stdout=Logging out of session [sid: 234, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.46,3260] > Logout of [sid: 234, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.46,3260] successful. 
> stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.029 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --op delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.041 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.46:3260 --op delete" returned: 0 in 0.012s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.041 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.042 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'delete'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.043 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --op update -n node.startup -v manual execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.055 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --op update -n node.startup -v manual" returned: 0 in 0.012s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.055 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.056 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'manual'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.057 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --logout execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.084 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --logout" returned: 0 in 0.026s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.084 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('Logging out of session [sid: 235, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.47,3260]\nLogout of [sid: 235, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.47,3260] successful.\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.086 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - 
-] iscsiadm ('--logout',): stdout=Logging out of session [sid: 235, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.47,3260] > Logout of [sid: 235, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.47,3260] successful. > stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.088 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --op delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.101 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.47:3260 --op delete" returned: 0 in 0.013s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.102 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.103 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'delete'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.104 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --op update -n node.startup -v manual execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.114 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --op update -n node.startup -v manual" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.114 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.115 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'manual'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.116 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --logout execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.140 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --logout" returned: 0 in 0.024s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.140 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('Logging out of session [sid: 237, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.48,3260]\nLogout of [sid: 237, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 
10.224.18.48,3260] successful.\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.141 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--logout',): stdout=Logging out of session [sid: 237, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.48,3260] > Logout of [sid: 237, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.48,3260] successful. > stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.142 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --op delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.153 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.48:3260 --op delete" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.153 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.154 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'delete'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.155 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --op update -n node.startup -v manual execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.165 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --op update -n node.startup -v manual" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.165 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.166 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'update', '-n', 'node.startup', '-v', 'manual'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.167 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --logout execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.189 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --logout" returned: 0 in 0.022s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.190 2880468 
DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('Logging out of session [sid: 236, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.49,3260]\nLogout of [sid: 236, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.49,3260] successful.\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.191 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--logout',): stdout=Logging out of session [sid: 236, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.49,3260] > Logout of [sid: 236, target: iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1, portal: 10.224.18.49,3260] successful. > stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.192 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --op delete execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.202 2880468 DEBUG oslo_concurrency.processutils [-] CMD "iscsiadm -m node -T iqn.2010-06.com.purestorage:flasharray.55893eb505d1d2a1 -p 10.224.18.49:3260 --op delete" returned: 0 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.202 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (4, ('', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:43:47.203 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] iscsiadm ('--op', 'delete'): stdout= stderr= _run_iscsiadm /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:1009 > 2023-03-10 16:43:47.204 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Flushing again multipath 3624a93705842cfae35d7483200015fce now that we removed the devices. _cleanup_connection /usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py:941 > 2023-03-10 16:43:47.204 2878341 DEBUG os_brick.initiator.linuxscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Flush multipath device 3624a93705842cfae35d7483200015fce flush_multipath_device /usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py:376 > 2023-03-10 16:43:47.205 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.215 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 0.010s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:43:47.216 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Retrying. 
execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:478 > 2023-03-10 16:43:47.216 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:43:47.221 2880468 DEBUG os_brick.privileged.rootwrap [-] Sleeping for 20 seconds on_execute /usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py:106 > 2023-03-10 16:43:55.994 2878341 DEBUG oslo_service.periodic_task [req-b7d7b4ae-b092-40bf-9105-97bede752389 - - - - -] Running periodic task BackupManager.publish_service_capabilities run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:211 > 2023-03-10 16:43:55.995 2878341 DEBUG cinder.manager [req-b7d7b4ae-b092-40bf-9105-97bede752389 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python3.6/site-packages/cinder/manager.py:197 > 2023-03-10 16:44:07.242 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 20.026s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:44:07.242 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:478 > 2023-03-10 16:44:07.242 2880468 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): multipath -f 3624a93705842cfae35d7483200015fce execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384 > 2023-03-10 16:44:07.249 2880468 DEBUG os_brick.privileged.rootwrap [-] Sleeping for 40 seconds on_execute /usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py:106 > 2023-03-10 16:44:47.287 2880468 DEBUG oslo_concurrency.processutils [-] CMD "multipath -f 3624a93705842cfae35d7483200015fce" returned: 1 in 40.045s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423 > 2023-03-10 16:44:47.287 2880468 DEBUG oslo_concurrency.processutils [-] 'multipath -f 3624a93705842cfae35d7483200015fce' failed. Not Retrying. execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:474 > 2023-03-10 16:44:47.288 2880468 DEBUG oslo.privsep.daemon [-] privsep: Exception during request[140628271231200]: Unexpected error while running command. > Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' _process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490 > Traceback (most recent call last): > File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd > ret = func(*f_args, **f_kwargs) > File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 249, in _wrap > return func(*args, **kwargs) > File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 197, in execute_root > return custom_execute(*cmd, shell=False, run_as_root=False, **kwargs) > File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 146, in custom_execute > on_completion=on_completion, *cmd, **kwargs) > File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 441, in execute > cmd=sanitized_cmd) > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. 
> Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' > 2023-03-10 16:44:47.288 2880468 DEBUG oslo.privsep.daemon [-] privsep: reply[140628271231200]: (5, 'oslo_concurrency.processutils.ProcessExecutionError', ('', 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n', 1, 'multipath -f 3624a93705842cfae35d7483200015fce', None)) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:511 > 2023-03-10 16:44:47.289 2878341 DEBUG oslo_concurrency.lockutils [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Lock "connect_volume" released by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 124.674s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:371 > 2023-03-10 16:44:47.289 2878341 DEBUG os_brick.initiator.connectors.iscsi [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] <== disconnect_volume: exception (124674ms) ProcessExecutionError('', 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n', 1, 'multipath -f 3624a93705842cfae35d7483200015fce', None) trace_logging_wrapper /usr/lib/python3.6/site-packages/os_brick/utils.py:160 > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server [req-bdebfdef-daf3-4250-8594-9ed91adb7c00 f91779ad06064ebfbeeff54de535a6cd 8a676a415f9541c59705a373a36b0ec4 - - -] Exception during message handling: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. > Command: multipath -f 3624a93705842cfae35d7483200015fce > Exit code: 1 > Stdout: '' > Stderr: 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Traceback (most recent call last): > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/utils.py", line 890, in wrapper > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return func(self, *args, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 410, in create_backup > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server volume_utils.update_backup_error(backup, str(err)) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__ > 2023-03-10 16:44:47.314 2878341 ERROR 
oslo_messaging.rpc.server self.force_reraise() > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server raise self.value > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 399, in create_backup > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server updates = self._run_backup(context, backup, volume) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 493, in _run_backup > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server ignore_errors=True) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1066, in _detach_device > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server force=force, ignore_errors=ignore_errors) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 154, in trace_logging_wrapper > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 360, in inner > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return f(*args, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 880, in disconnect_volume > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server is_disconnect_call=True) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 942, in _cleanup_connection > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server self._linuxscsi.flush_multipath_device(multipath_name) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line 382, in flush_multipath_device > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server root_helper=self._root_helper) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server result = self.__execute(*args, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 172, in execute > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return execute_root(*cmd, **kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server return self.channel.remote_call(name, args, kwargs) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server raise exc_type(*result[2]) > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server 
oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Command: multipath -f 3624a93705842cfae35d7483200015fce > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Exit code: 1 > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Stdout: '' > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server Stderr: 'Mar 10 16:44:07 | 3624a93705842cfae35d7483200015fce is not a multipath device\n' > 2023-03-10 16:44:47.314 2878341 ERROR oslo_messaging.rpc.server

From knikolla at bu.edu Tue Mar 14 14:40:57 2023
From: knikolla at bu.edu (Nikolla, Kristi)
Date: Tue, 14 Mar 2023 14:40:57 +0000
Subject: [tc][all] What's happening in the Technical Committee - 2023 March 14
Message-ID:

Hi all,

Please find below a summary for the last 2 weeks of the TC.

Meetings
=======
- March 1, 2023
A video recording of the meeting is available
https://www.youtube.com/watch?v=HA1owc9qGiE
- March 8, 2023
Meeting notes are available
https://meetings.opendev.org/meetings/tc/2023/tc.2023-03-08-15.59.html
- The next Technical Committee meeting will be tomorrow, March 14, 2023 at 16:00 UTC, chaired by Jay Faulkner. Agenda is available at the link below. Please contact me or Jay if you notice missing items that should be discussed.
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee

Merged Changes
==============
- Appoint Felipe Reyes as OpenStack_Charms PTL (formal-vote)
https://review.opendev.org/c/openstack/governance/+/874971
- Appoint Liu as Senlin PTL (formal-vote)
https://review.opendev.org/c/openstack/governance/+/874969
- Adding mailto link in upstream opportunities doc (documentation-change)
https://review.opendev.org/c/openstack/governance/+/874968
- Move heat-translator and tosca-parser to tacker's governance (project-update)
https://review.opendev.org/c/openstack/governance/+/876012
- Fix doc referencing 'admin_api' rule (typo-fix)
https://review.opendev.org/c/openstack/governance/+/875860
- Correct CHAIR.rst: reflect vice-chair nom practice
https://review.opendev.org/c/openstack/governance/+/875788
- Ironic program adopting x/virtualpdu
https://review.opendev.org/c/openstack/governance/+/876208
- Volunteer to serve at 2023.2 TC vice chair
https://review.opendev.org/c/openstack/governance/+/875787
- Add chair role to knikolla for 2023.2
https://review.opendev.org/c/openstack/governance/+/875742

Open Changes
============
There are 15 open changes to the Governance repo.
https://review.opendev.org/q/project:openstack/governance+status:open

Happenings
==========

New TC Chair and Vice Chair
----------------------------------------
The TC has a new Chair (Kristi Nikolla) and Vice Chair (Jay Faulkner). Thank you Ghanshyam Mann for your 2 years of service as the TC chair.
https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032506.html

TripleO is being deprecated
--------------------------------------
The TC voted to continue supporting the Zed release of TripleO, however without a project team.
https://review.opendev.org/c/openstack/governance/+/877132

Leaderless projects
--------------------------
We're now down to 7 projects without any PTL candidacies (Monasca, Rally, Sahara, Swift, TripleO, Vitrage, and Winstackers). The TC has started reaching out to the teams and former PTLs. If you have any information or interest, please reach out to the TC via the etherpad linked below.
https://etherpad.opendev.org/p/2023.2-leaderless

Release naming for projects
--------------------------------------
The TC continued discussing changes related to version naming for projects. Specifically, there are two proposals about allowing projects to omit the OpenStack version when referring to their project, or flipping the OpenStack and project version order.
https://review.opendev.org/c/openstack/governance/+/874484
https://review.opendev.org/c/openstack/governance/+/875942

Virtual PTG
----------------
The TC is considering scheduling the same PTG time slots for itself as the previous PTG. Namely, Thursday and Friday 15:00 - 19:00 UTC. I will be booking the slots later today.
https://etherpad.opendev.org/p/tc-2023-2-ptg

How to contact the TC:
==================
If you would like to discuss or give feedback to TC, you can reach out to us in multiple ways:
1. Email: you can send an email with the tag [tc] on the openstack-discuss mailing list.
2. Weekly meeting: The Technical Committee conducts a weekly meeting every Thursday at 16:00 UTC.
3. IRC: Ping us using the 'tc-members' keyword on the #openstack-tc IRC channel on OFTC.

From skaplons at redhat.com Tue Mar 14 16:06:52 2023
From: skaplons at redhat.com (Slawek Kaplonski)
Date: Tue, 14 Mar 2023 17:06:52 +0100
Subject: [neutron] Bug deputy - report from week of 6th March
Message-ID: <12170940.O9o76ZdvQC@p1>

Hi,

Sorry for sending it so late but I was off Monday and Tuesday and I forgot about it.
Here's the list of new bugs reported last week:

## High
* https://bugs.launchpad.net/neutron/+bug/2009678 - [OVN] The OVN agent config entry point is incorrect in setup.cfg - assigned to Rodolfo, fix proposed
* https://bugs.launchpad.net/neutron/+bug/2009632 - ARP requests from ovnmeta namespaces are sent to physical interfaces of computing nodes - **not assigned yet**
* https://bugs.launchpad.net/neutron/+bug/2009703 - [OVN] HW offload event "QoSMinimumBandwidthEvent" fails if the min-bw rule is removed - assigned to Rodolfo, fix proposed

## Medium
* https://bugs.launchpad.net/neutron/+bug/2009509 - Large number of FIPs and subnets causes slow sync_routers response - assigned to Adam Oswick
* https://bugs.launchpad.net/neutron/+bug/2009804 - [OVN] Method ``get_port_qos`` should always return 2 values - **not assigned yet, low hanging fruit bug**

## Low
* https://bugs.launchpad.net/neutron/+bug/2009728 - [OVS] "permitted_ethertypes" should be validated and filtered during the OVS agent initialization - **not assigned yet, low hanging fruit bug**
* https://bugs.launchpad.net/neutron/+bug/2009832 - FWaaS docs lack required packages - **not assigned yet, low hanging fruit bug**
* https://bugs.launchpad.net/neutron/+bug/2009831 - VPNaaS docs lack packages to install - **not assigned yet, low hanging fruit bug**

## Incomplete
* https://bugs.launchpad.net/neutron/+bug/2009807 - Not able to create external physical network - needs some more info, but for now it looks more like a user's issue rather than a bug

## Others
* https://bugs.launchpad.net/neutron/+bug/2009705 - [FWaaS ]Openstack Zed - firewall group status doesn't change to ACTIVE. - **unassigned**, it needs someone from FWaaS to look at it and ZhaoHeng is looking into it already.

-- 
Slawek Kaplonski
Principal Software Engineer
Red Hat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: This is a digitally signed message part.
URL: From nguyenhuukhoinw at gmail.com Tue Mar 14 16:12:03 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Tue, 14 Mar 2023 23:12:03 +0700 Subject: [openstack][backup] Experience for instance backup In-Reply-To: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> References: <0d54528b-eac7-d39f-2d5d-141fde0d9a9e@inovex.de> Message-ID: Thank Christian, I will try to follow it. *Hello * Eugen, I use SAN to back our openstack services and I planned to use NFS for Cinder backup. Because of that, We separate tenants for different departments. So they can back up by themself. Thank you for your sharing. Nguyen Huu Khoi On Mon, Mar 13, 2023 at 11:11?PM Christian Rohmann < christian.rohmann at inovex.de> wrote: > Hey there, > > On 06/03/2023 22:34, Nguy?n H?u Kh?i wrote: > > > > I am looking for instance backup solution. I am using Cinder backup > > with nfs backup but it looks not too fast. I am using a 10Gbps > > network. I would like to know experience for best practice for > > instance backup solutions on Openstack. > > > On 13/03/2023 12:46, Eugen Block wrote: > > We use Ceph as back end for all services (nova, glance, cinder), and > > the most important machines are backed up by our backup server > > directly via rbd commands: > > There is RBD and "the other" drivers. While RBD uses the native export / > import feature of Ceph, all other drivers (file, NFS, object storages > like S3) are based on the abstract chunked driver > ( > https://opendev.org/openstack/cinder/src/branch/master/cinder/backup/chunkeddriver.py > ). > This driver reads the volume / image and treats it as chunks before > making use of a concrete driver (e.g. NFS or S3) to send those chunks > off somewhere to be stored. Restore works just the opposite way. The > performance of the chunked driver based back-ends is not (yet) > comparable to what RBD can achieve due to various reasons. > > But again, while "RBD" uses Ceph's mechanisms internally all other > "targets" for backup storage work differently. > We ourselves were looking into using and S3-compatible storage and thus > I started a dicsussion about the state of those other drivers at > > https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030263.html > > This then led to a discussion at the Cinder PTG > https://etherpad.opendev.org/p/antelope-ptg-cinder#L119 with many > observations. > > There also are changes in the works, like restore into sparse volumes > (https://review.opendev.org/c/openstack/cinder/+/852654) when going via > the chunked driver. > But also features like "encryption" > (https://review.opendev.org/c/openstack/cinder-specs/+/862601) are being > discussed. > > > > Regards > > > Christian > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Mar 14 16:44:32 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 14 Mar 2023 17:44:32 +0100 Subject: [neutron] In-Reply-To: References: Message-ID: <5696019.DvuYhMxLoT@p1> Hi, Dnia wtorek, 14 marca 2023 10:46:07 CET Kamil Madac pisze: > Hi All, > > I'm in the process of planning a small public cloud based on OpenStack. I > have quite experience with kolla-ansible deployments which use OVS > networking and I have no issues with that. It works stable for my use cases > (Vlan provider networks, DVR, tenant networks, floating IPs). 
> > For that new deployment I'm looking at OVN deployment which from what I > read should be more performant (faster build of instances) and with ability > to cover more networking features in OVN instead of needing external > software like iptables/dnsmasq. > > Does anyone use OVN in production and what is your experience (pros/cons)? > Is OVN mature enough to replace OVS in the production deployment (are there > some basic features from OVS missing)? I'm not using it in production as I'm not cloud operator but I can say that it is stable and mature enough to use it. In Red Hat OpenStack (RH OSP) it's default networking backend since OSP16 (based on upstream Train version). Regarding list of the feature parity gaps You can check https://docs.openstack.org/neutron/latest/ovn/gaps.html - this list should be more or less up to date. In case of any doubts You can always ask on neutron channel on IRC about specific feature which You would need :) > > Thanks in advance for sharing the experience. > > -- > Kamil Madac > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From skaplons at redhat.com Tue Mar 14 16:47:27 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Tue, 14 Mar 2023 17:47:27 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: Message-ID: <2315188.ElGaqSPkdT@p1> Hi, Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > Hi Mohammed, > > > Subject: [neutron] detecting l3-agent readiness > > > > Hi folks, > > > > I'm working on improving the stability of rollouts when using Kubernetes as a control plane, specifically around the L3 agent, it seems that I have not found a clear way to detect in the code path where the L3 agent has finished it's initial sync.. > > > > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/-/blob/devel/files/startup_wait_for_ns.py > Basically we are checking against the neutron api what routers should be on the node and then validate that all keepalived processes are up and running. That would work only for HA routers. If You would also have routers which aren't "ha" this method may fail. > > > Am I missing it somewhere or is the architecture built in a way that doesn't really answer that question? > > > > Adding a option in the neutron api would be a lot nicer. But i guess that also counts for l2 and dhcp agents. > > > > Thanks > > Mohammed > > > > > > -- > > Mohammed Naser > > VEXXHOST, Inc. > > -- > Felix Huettner > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. 
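To make the readiness check discussed above a bit more concrete, here is a rough, hand-run sketch of the same idea using the OpenStack CLI and iproute2. It assumes a recent python-openstackclient where 'openstack router list --agent' is available and the default l3-agent state_path, and it only covers the keepalived part for HA routers, which is exactly the limitation pointed out above.

  HOST=$(hostname)
  AGENT_ID=$(openstack network agent list --agent-type l3 --host "$HOST" -f value -c ID)
  for ROUTER in $(openstack router list --agent "$AGENT_ID" -f value -c ID); do
      # every router scheduled to this node's L3 agent should have its namespace
      ip netns list | grep -q "qrouter-$ROUTER" || echo "namespace missing for $ROUTER"
      # keepalived only exists for HA routers; its config path contains the router id
      pgrep -f "ha_confs/$ROUTER" > /dev/null || echo "no keepalived for $ROUTER"
  done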
URL: From zakhar at gmail.com Tue Mar 14 18:28:32 2023 From: zakhar at gmail.com (Zakhar Kirpichenko) Date: Tue, 14 Mar 2023 20:28:32 +0200 Subject: Wallaby on Ubuntu 20.04, Neutron 18.6.0 neutron-dhcp-agent RPC unusually slow In-Reply-To: References: Message-ID: If anyone is interested, I reported the bug/regression: https://bugs.launchpad.net/cloud-archive/+bug/2011513 Is anyone else facing such issues? /Z On Tue, 14 Mar 2023 at 08:34, Zakhar Kirpichenko wrote: > Hi! > > We're running Openstack Wallaby on Ubuntu 20.04, 3 high-performance infra > nodes with a RabbitMQ cluster. I updated Neutron components to version > 18.6.0, which recently became available in the cloud repository ( > http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby > main). The exact package versions updated are as follows: > > Install: libunbound8:amd64 (1.9.4-2ubuntu1.4, automatic), > openvswitch-common:amd64 (2.15.2-0ubuntu1~cloud0, automatic) > Upgrade: neutron-common:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), python3-werkzeug:amd64 (0.16.1+dfsg1-2, > 0.16.1+dfsg1-2ubuntu0.1), neutron-dhcp-agent:amd64 > (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), > neutron-l3-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), python3-neutron:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), neutron-server:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), neutron-plugin-ml2:amd64 > (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), > neutron-metadata-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, > 2:18.6.0-0ubuntu1~cloud1), neutron-linuxbridge-agent:amd64 > (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1) > > Installed Neutron packages: > > ii neutron-common 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - common > ii neutron-dhcp-agent 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - DHCP agent > Firewall-as-a-Service driver for OpenStack Neutron > ii neutron-l3-agent 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - l3 agent > ii neutron-linuxbridge-agent 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - linuxbridge agent > ii neutron-metadata-agent 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - metadata agent > ii neutron-plugin-ml2 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - ML2 plugin > ii neutron-server 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - server > ii python3-neutron 2:18.6.0-0ubuntu1~cloud1 > all Neutron is a virtual network service for > Openstack - Python library > ii python3-neutron-lib 2.10.1-0ubuntu1~cloud0 > all Neutron shared routines and utilities - > Python 3.x > ii python3-neutronclient 1:7.2.1-0ubuntu1~cloud0 > all client API library for Neutron - Python > 3.x > > Normally this would be an easy update, but this time neutron-dhcp-agent > doesn't work properly: > > 2023-03-14 05:44:27.572 2534501 INFO neutron.agent.dhcp.agent > [req-4a362701-cc1f-4b9d-87e6-045b6a388709 - - - - -] Synchronizing state > complete > 2023-03-14 05:44:38.868 2534501 ERROR neutron_lib.rpc > [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout in RPC method > dhcp_ready_on_ports. Waiting for 55 seconds before next attempt. 
If the > server is not down, consider increasing the rpc_response_timeout option as > Neutron server(s) may be overloaded and unable to respond quickly enough.: > oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply > to message ID bd97110b004e413cb2d6b05d9fb3b57c > 2023-03-14 05:44:38.871 2534501 WARNING neutron_lib.rpc > [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Increasing timeout for > dhcp_ready_on_ports calls to 120 seconds. Restart the agent to restore it > to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed > out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c > 2023-03-14 05:45:34.244 2534501 ERROR neutron.agent.dhcp.agent > [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout notifying > server of ports ready. Retrying...: > oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply > to message ID bd97110b004e413cb2d6b05d9fb3b57c > 2023-03-14 05:47:10.876 2534501 INFO oslo_messaging._drivers.amqpdriver > [-] No calling threads waiting for msg_id : bd97110b004e413cb2d6b05d9fb3b57c > 2023-03-14 05:47:34.353 2534501 ERROR neutron_lib.rpc > [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout in RPC method > dhcp_ready_on_ports. Waiting for 27 seconds before next attempt. If the > server is not down, consider increasing the rpc_response_timeout option as > Neutron server(s) may be overloaded and unable to respond quickly enough.: > oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply > to message ID f254f735998243c4b0a58ce95c974534 > 2023-03-14 05:47:34.354 2534501 WARNING neutron_lib.rpc > [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Increasing timeout for > dhcp_ready_on_ports calls to 240 seconds. Restart the agent to restore it > to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed > out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534 > 2023-03-14 05:47:46.681 2534501 INFO oslo_messaging._drivers.amqpdriver > [-] No calling threads waiting for msg_id : f254f735998243c4b0a58ce95c974534 > 2023-03-14 05:48:01.086 2534501 ERROR neutron.agent.dhcp.agent > [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout notifying > server of ports ready. Retrying...: > oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply > to message ID f254f735998243c4b0a58ce95c974534 > 2023-03-14 05:49:45.035 2534501 INFO neutron.agent.dhcp.agent > [req-5935a0d0-a981-463c-a4ea-23ccbb54c896 - - - - -] DHCP configuration for > ports ... (A successful configuration here). > > While neutron-dhcp-agent is waiting, neutron-server log gets filled up > with: > > neutron-server.log:2023-03-14 05:47:05.761 4171971 INFO > neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - > - -] Attempt 1 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76 > ... > neutron-server.log:2023-03-14 05:47:10.727 4171971 INFO > neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - > - -] Attempt 10 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76 > > This repeats for each port of each network neutron-dhcp-agent needs to > configure. > > Each subsequent configuration for each network takes about 1-2 > minutes, depending on the network size. With earlier Neutron versions the > whole process of configuring all networks would finish in under a minute, > i.e. DHCP configuration per port (and network) is several orders of > magnitude slower than it should be. 
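For completeness, the knob the log message above points at is oslo.messaging's rpc_response_timeout (default 60 seconds) in the [DEFAULT] section of neutron.conf on the node running neutron-dhcp-agent. Raising it is only a sketch of a workaround that hides the symptom while the regression is investigated, not a fix; crudini is assumed to be installed, and editing the file by hand is equivalent.

  sudo crudini --set /etc/neutron/neutron.conf DEFAULT rpc_response_timeout 300
  sudo systemctl restart neutron-dhcp-agent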
Once neutron-dhcp-agent finishes > synchronization, it seems to work without issues although there aren't that > many changes in our cloud to tell whether it's fast or slow, individual > port updates seem to happen quickly. > > All other services are working well, RabbitMQ cluster is working well, > infra nodes are not overloaded and there are no apparent issues other than > this one with Neutron, thus I am inclined to think that the issue is > specific to version 18.6.0 of neutron-dhcp-agent or neutron-server. > > I would appreciate any advice! > > Best regards, > Zakhar > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fkr at hazardous.org Tue Mar 14 19:53:22 2023 From: fkr at hazardous.org (Felix Kronlage-Dammers) Date: Tue, 14 Mar 2023 20:53:22 +0100 Subject: [publiccloud-sig] Reminder - next meeting March 15th - 0800 UTC Message-ID: Hi everyone, better late than not at all ;) Here comes the reminder for the next meeting of the Public Cloud SIG: This is on March 15th (this wednesday) at 0800 UTC. We meet on IRC in #openstack-operators. A preliminary agenda can be found in the pad: https://etherpad.opendev.org/p/publiccloud-sig-meeting See also here for all other details: https://wiki.openstack.org/wiki/PublicCloudSIG read you on wednesday! felix From jay at gr-oss.io Tue Mar 14 21:08:20 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 14 Mar 2023 14:08:20 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: I'm not opposed to adding more time necessarily, but I wonder if the solution is to triage what we talk about better. Last PTG, we planned several features which we didn't have enough contribution to complete. IMO, it might be better to limit what we discuss to things that are likely to be accomplished next cycle. If we can't determine that without more discussion, then sure, let's add vPTG sessions.. Is there a suggestion as to specifically what times, and how much to add? Thanks, Jay Faulkner On Mon, Mar 13, 2023 at 11:43?AM Julia Kreger wrote: > Greetings! > > Time slot wise, I think that works for me. > > Time wise, in regards to the amount, I'm wondering if we need more. By my > count, we have 11 new topics, 4 topics to revisit, in about six hours > of non-operator dedicated time, not accounting for breaks for coffee/tea. > Granted, some topics might be super quick at the 10 minute quick poll of > the room, whereas other topics I feel like will require extensive > discussion. If I were to size them, I think we would have 6 large-ish > topics along with 3-4 medium sized topics. > > -Julia > > > On Thu, Mar 9, 2023 at 3:19?PM Jay Faulkner wrote: > >> Hey all, >> >> The vPTG will be upon us soon, the week of March 27. >> >> I booked the following times on behalf of Ironic + BM SIG Operator hour, >> in accordance with what times worked in Antelope. It's my hope that since >> we've had little contributor turnover, these times continue to work. I'm >> completely open to having things moved around if it's more convenient to >> participants. >> >> I've booked the following times, all in Folsom: >> - Tuesday 1400 UTC - 1700 UTC >> - Wednesday 1300 UTC Operator hour: baremetal SIG >> - Wednesday 1400 UTC - 1600 UTC >> - Wednesday 2200 - 2300 UTC >> >> >> I propose that after the Ironic meeting on March 20, we shortly sync up >> in the Bobcat PTG etherpad ( >> https://etherpad.opendev.org/p/ironic-bobcat-ptg) to pick topics and >> assign time. 
>> >> >> Again, this is all meant to be a suggestion, I'm happy to move things >> around but didn't want us to miss out on getting things booked. >> >> >> - >> Jay Faulkner >> Ironic PTL >> TC Member >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Tue Mar 14 21:41:18 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 14 Mar 2023 14:41:18 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: I think there is a bit of a challenge to navigate though in that the PTG as a sync point is needed especially on items which may take more than just one cycle to deliver. A great example is driver composition. The other unknown is if people are just not interested in some topics, which can result in that topic being very quick. One thing we also did in the past is guess how much time as a group in advance. I know for a few cycles we had a quick 15 minute call to discuss sizing. Based upon output from that, I think we could adjust the time slots accordingly. Maybe that might make sense to do? -Julia On Tue, Mar 14, 2023 at 2:08?PM Jay Faulkner wrote: > I'm not opposed to adding more time necessarily, but I wonder if the > solution is to triage what we talk about better. > > Last PTG, we planned several features which we didn't have enough > contribution to complete. IMO, it might be better to limit what we discuss > to things that are likely to be accomplished next cycle. If we can't > determine that without more discussion, then sure, let's add vPTG sessions.. > > Is there a suggestion as to specifically what times, and how much to add? > > Thanks, > Jay Faulkner > >> [trim] >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Tue Mar 14 21:53:59 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 14 Mar 2023 14:53:59 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: That sync is exactly what I hoped > I propose that after the Ironic meeting on March 20, we shortly sync up in the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) to pick topics and assign time. ^ that would be. Does that sound good to you? If it comes out in there we need more time, we should also have interested parties around to book it. -Jay On Tue, Mar 14, 2023 at 2:41?PM Julia Kreger wrote: > I think there is a bit of a challenge to navigate though in that the PTG > as a sync point is needed especially on items which may take more than just > one cycle to deliver. A great example is driver composition. The other > unknown is if people are just not interested in some topics, which can > result in that topic being very quick. > > One thing we also did in the past is guess how much time as a group in > advance. I know for a few cycles we had a quick 15 minute call to discuss > sizing. Based upon output from that, I think we could adjust the time > slots accordingly. Maybe that might make sense to do? > > -Julia > > On Tue, Mar 14, 2023 at 2:08?PM Jay Faulkner wrote: > >> I'm not opposed to adding more time necessarily, but I wonder if the >> solution is to triage what we talk about better. >> >> Last PTG, we planned several features which we didn't have enough >> contribution to complete. IMO, it might be better to limit what we discuss >> to things that are likely to be accomplished next cycle. If we can't >> determine that without more discussion, then sure, let's add vPTG sessions.. 
>> >> Is there a suggestion as to specifically what times, and how much to add? >> >> Thanks, >> Jay Faulkner >> >>> [trim] >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Tue Mar 14 22:07:05 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 14 Mar 2023 15:07:05 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: Sounds good to me! Thanks! -Julia On Tue, Mar 14, 2023 at 2:54?PM Jay Faulkner wrote: > > That sync is exactly what I hoped > > > > I propose that after the Ironic meeting on March 20, we shortly sync up in > the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) > to pick topics and assign time. > > ^ that would be. > > Does that sound good to you? If it comes out in there we need more time, > we should also have interested parties around to book it. > > -Jay > > On Tue, Mar 14, 2023 at 2:41?PM Julia Kreger > wrote: > >> I think there is a bit of a challenge to navigate though in that the PTG >> as a sync point is needed especially on items which may take more than just >> one cycle to deliver. A great example is driver composition. The other >> unknown is if people are just not interested in some topics, which can >> result in that topic being very quick. >> >> One thing we also did in the past is guess how much time as a group in >> advance. I know for a few cycles we had a quick 15 minute call to discuss >> sizing. Based upon output from that, I think we could adjust the time >> slots accordingly. Maybe that might make sense to do? >> >> -Julia >> >> On Tue, Mar 14, 2023 at 2:08?PM Jay Faulkner wrote: >> >>> I'm not opposed to adding more time necessarily, but I wonder if the >>> solution is to triage what we talk about better. >>> >>> Last PTG, we planned several features which we didn't have enough >>> contribution to complete. IMO, it might be better to limit what we discuss >>> to things that are likely to be accomplished next cycle. If we can't >>> determine that without more discussion, then sure, let's add vPTG sessions.. >>> >>> Is there a suggestion as to specifically what times, and how much to >>> add? >>> >>> Thanks, >>> Jay Faulkner >>> >>>> [trim] >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 15 09:33:03 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 15 Mar 2023 10:33:03 +0100 Subject: [nova][PTG] PTG etherpad, please ideally add your topics before next week Message-ID: As we discussed on the last nova meeting [1], I'll create an agenda for all the topics we will have for the next vPTG by next Tuesday. For this, it would be nice if most of the topics people would like to discuss are already on the PTG etherpad before Tuesday, so please look at this etherpad and add your own topics rather sooner than later ;-) https://etherpad.opendev.org/p/nova-bobcat-ptg I also add a courtesy ping list item for each of the existing topics. Please add your IRC nick if you can't be around for all of the PTG time, so basically at the beginning of every PTG topic, I'd ping all the folks for it. 
As a reminder, I booked those time slots for the Diablo room : - *Tuesday* *13:00 UTC - 17:00 UTC* - *Wednesday 13:00 UTC - 17:00 UTC* - *Thursday 13:00 UTC - 17:00 UTC* - *Friday 13:00 UTC - 17:00 UTC* Thanks, -S [1] https://meetings.opendev.org/meetings/nova/2023/nova.2023-03-14-16.00.log.html#l-150 -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Wed Mar 15 11:06:34 2023 From: senrique at redhat.com (Sofia Enriquez) Date: Wed, 15 Mar 2023 11:06:34 +0000 Subject: [cinder] Bug Report | 15-03-2023 Message-ID: Hello Argonauts, Medium - Extending SCSI multipath doesn't work if Nova configuration changed. - *Status*: Unassigned. - An Extended multipathed device should not require a reconfigure. - *Status*: Unassigned. - [rbac] Reader user able to delete a user message. - *Status*: Unassigned. - Cinder Message API creates failure "'NoneType' object is not subscriptable". - *Status*: Marked as duplicate of similar bug . Fix proposed to master . Low - [HPE] Volume name migration fails with keyerror . - *Status*: No fix proposed to master yet. Incomplete - Cannot delete snapshot: invalid backing file. Cinder removed: - [nova] size_iops_sec does behave differently than mentioned in docs. Cheers, -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Wed Mar 15 11:26:37 2023 From: zigo at debian.org (Thomas Goirand) Date: Wed, 15 Mar 2023 12:26:37 +0100 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: Message-ID: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > Watcher is not good because It need cpu metric > such as cpu load in Ceilometer?which is removed so we cannot use it. Hi! What do you mean by "Ceilometer [is] removed"? It certainly isn't dead, and it works well... If by that, you mean "ceilometer-api" is removed, then yes, but then you can use gnocchi. Cheers, Thomas Goirand (zigo) From smooney at redhat.com Wed Mar 15 11:57:58 2023 From: smooney at redhat.com (Sean Mooney) Date: Wed, 15 Mar 2023 11:57:58 -0000 Subject: live resize In-Reply-To: References: Message-ID: <4bfa4e3fa2f0396d8963bb0dff76cf8dfb557872.camel@redhat.com> On Tue, 2022-05-24 at 22:49 +0430, Parsa Aminian wrote: > hello > on openstack with ceph backend is it possible to live resize instances ? I > want to change flavor without any down time . no that is not support in nova with any storage backend or hypervior. From nguyenhuukhoinw at gmail.com Wed Mar 15 12:09:07 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 15 Mar 2023 19:09:07 +0700 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: Hello. I cannot use because missing cpu_util metric. I try to match it work but not yet. It need some code to make it work. It seem none care about balance reources on cloud. On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > Watcher is not good because It need cpu metric > > such as cpu load in Ceilometer which is removed so we cannot use it. > > Hi! > > What do you mean by "Ceilometer [is] removed"? It certainly isn't dead, > and it works well... 
If by that, you mean "ceilometer-api" is removed, > then yes, but then you can use gnocchi. > > Cheers, > > Thomas Goirand (zigo) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Wed Mar 15 12:10:37 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 15 Mar 2023 19:10:37 +0700 Subject: live resize In-Reply-To: <4bfa4e3fa2f0396d8963bb0dff76cf8dfb557872.camel@redhat.com> References: <4bfa4e3fa2f0396d8963bb0dff76cf8dfb557872.camel@redhat.com> Message-ID: Hello. It looks like no way at this time. On Wed, Mar 15, 2023, 7:05 PM Sean Mooney wrote: > On Tue, 2022-05-24 at 22:49 +0430, Parsa Aminian wrote: > > hello > > on openstack with ceph backend is it possible to live resize instances ? > I > > want to change flavor without any down time . > no that is not support in nova with any storage backend or hypervior. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Mar 15 12:25:05 2023 From: smooney at redhat.com (Sean Mooney) Date: Wed, 15 Mar 2023 12:25:05 +0000 Subject: live resize In-Reply-To: References: <4bfa4e3fa2f0396d8963bb0dff76cf8dfb557872.camel@redhat.com> Message-ID: On Wed, 2023-03-15 at 19:10 +0700, Nguy?n H?u Kh?i wrote: > Hello. > It looks like no way at this time. correct live reisze is not supproted and not planned to be supported in a future release. you can extend the disk live if its a boot form voluem instance but nova resize api is and will continue to be an offline operatoion. live resize wiht qemu/kvm has a lot of edge cases like startign the domain with the maxium ram set to the largeset value you might resize too and setting current to what is in the flavor. the same would be required for cpu. its really not something that is compatible with how we do flavor as there are too many ways that it could fail without signifciatly modifying how flavors work. > > On Wed, Mar 15, 2023, 7:05 PM Sean Mooney wrote: > > > On Tue, 2022-05-24 at 22:49 +0430, Parsa Aminian wrote: > > > hello > > > on openstack with ceph backend is it possible to live resize instances ? > > I > > > want to change flavor without any down time . > > no that is not support in nova with any storage backend or hypervior. > > > > > > From michal.arbet at ultimum.io Wed Mar 15 12:28:20 2023 From: michal.arbet at ultimum.io (Michal Arbet) Date: Wed, 15 Mar 2023 13:28:20 +0100 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: <30941678692580@mail.yandex.ru> References: <30941678692580@mail.yandex.ru> Message-ID: Hi, You can't import qcow and mark it as raw, before upload to glance you have to convert qcow2 to raw. openstack image create Fedora-CoreOS --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 --disk-format=raw --container-format=bare --property os_distro='fedora-coreos' --public to qemu-img convert fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 fedora-coreos-35.20220313.3.1-openstack.x86_64.raw openstack image create Fedora-CoreOS --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.raw --disk-format=raw --container-format=bare --property os_distro='fedora-coreos' --public Hope it will help you. Regards, Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Po???? 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet at ultimum.io *https://ultimum.io * LinkedIn | Twitter | Facebook po 13. 3. 2023 v 13:25 odes?latel ??????? ???? napsal: > Hello, openstack team! Please help me! 
I'm trying to use magnum in the > yoga release on ubuntu 22.04 I can't understand why it doesn't work, when > creating a container I get an error > " > 2023-03-13 09:07:38.090 1507357 ERROR magnum.drivers.heat.driver > [req-b0b05017-af6b-4f4c-bf2c-003b34f17ba0 - - - - -]Nodegroup error, stack > status: CREATE_FAILED, stack_id: 54f98fee-c070-40f9b337-b9b6df49e73b, > reason:Resource CREATE failed: ResourceInError: > resources.kube_masters.resources[0].resources.kube-master: > Went to status ERROR due to "Message: Build of instance > 912d1432-c692-4569-a511-3fb9291c97dc aborted: Image > 54c026f6-6ff7-4be6-b76e-a0732d9b8814 is unacceptable: Image is not raw > format, Code: 500" > " > I downloaded the container image and tried to create it in raw with the > command and create a new Cluster template > " > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > --disk-format=raw --container-format=bare --property > os_distro='fedora-coreos' --public > " > and create a new Cluster template, when I tried to create a cluster, it > had the status CREATE_IN_PROGRESS for a very long time I got an error > " > 2023-03-13 10:21:58.110 1507357 ERROR magnum.drivers.heat.driver > [req-29753ae4-b9c3-4602-ba8e-acf9a0d024c5 - - - - -] Nodegroup error, stack > status: > CREATE_FAILED, stack_id: 5958bd59-c6f9-464d-a4f7-ddd530fdf804, reason: > Timed out= > " > why it doesn't work? I saw a massage on your website. > > Does this mean that magnum only works in the Zed release? > -- > With respect, > Makarov Maxim > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From bence.romsics at gmail.com Wed Mar 15 12:39:22 2023 From: bence.romsics at gmail.com (Bence Romsics) Date: Wed, 15 Mar 2023 13:39:22 +0100 Subject: [nova][cinder] future of rebuild without reimaging Message-ID: Hi All! We have users who use 'rebuild' on volume booted servers before nova microversion 2.93, relying on the behavior that it keeps the volume as is. And they would like to keep doing this even after the openstack distro moves to a(n at least) zed base (sometime in the future). As a naive user, it seems to me both behaviors make sense. I can easily imagine use cases for rebuild with and without reimaging. However since the implementation of https://specs.openstack.org/openstack/nova-specs/specs/zed/implemented/volume-backed-server-rebuild.html rebuild without reimaging is only possible using an old microversion (<2.93). With that change merged, rebuild without reimaging seems to be a somewhat less than fully supported feature. A few examples of what I mean by that: First, there's this warning: https://opendev.org/openstack/python-openstackclient/src/commit/5eb89e4ca1cebad9245c27d58a0dafd7f363ece0/openstackclient/compute/v2/server.py#L3452-L3453 In which it is unclear to me what exactly will become an error in a future release. Rebuild with a different image? Or any rebuild with microversion <2.93? Then old nova microversions may get dropped. Though what I heard from nova folks, this is unlikely to happen. Then there are a few hypothetical situations like: a) Rebuild gets a new api feature (in a new microversion) which can never be combined with the do-not-reimage behavior. b) Rebuild may have a bug, whose fix requires a microversion bump. This again can never be combined with the old behavior. 
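For readers less familiar with the microversion mechanics being discussed, the two behaviours are selected purely by the negotiated version, not by a separate call. A hedged illustration with python-openstackclient, using placeholder IDs and assuming a volume-backed server:

# pre-2.93 semantics: the root volume content is kept as is
# (to my knowledge only the original image is accepted here for volume-backed servers)
openstack --os-compute-api-version 2.92 server rebuild --image <original-image-id> <server-id>

# 2.93 and later: the root volume is reimaged as part of the rebuild
openstack --os-compute-api-version 2.93 server rebuild --image <image-id> <server-id>

This is exactly the pinning in question: it works today, the open question is for how long.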
What do you think, are these concerns purely theoretical or real? If we would like to keep having rebuild without reimaging, can we rely on the old microversion indefinitely? Alternatively shall we propose and implement a nova spec to explicitly expose the choice in the rebuild api (just to express the idea: osc server rebuild --reimage|--no-reimage)? If the topic is worth further discussion beyond the ML, I can also bring it to the nova ptg. Thanks in advance, Bence Romsics (rubasov) ps: I'll be afk for a few days, but I'll follow up next Tuesday. From nguyenhuukhoinw at gmail.com Wed Mar 15 13:20:49 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 15 Mar 2023 20:20:49 +0700 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: References: <30941678692580@mail.yandex.ru> Message-ID: This is image error. Not magnum On Wed, Mar 15, 2023, 7:34 PM Michal Arbet wrote: > Hi, > > You can't import qcow and mark it as raw, before upload to glance you have > to convert qcow2 to raw. > > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > --disk-format=raw --container-format=bare --property > os_distro='fedora-coreos' --public > > to > > qemu-img convert fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > fedora-coreos-35.20220313.3.1-openstack.x86_64.raw > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.raw --disk-format=raw > --container-format=bare --property os_distro='fedora-coreos' --public > > Hope it will help you. > > Regards, > Michal Arbet > Openstack Engineer > > Ultimum Technologies a.s. > Na Po???? 1047/26, 11000 Praha 1 > Czech Republic > > +420 604 228 897 > michal.arbet at ultimum.io > *https://ultimum.io * > > LinkedIn | Twitter > | Facebook > > > > po 13. 3. 2023 v 13:25 odes?latel ??????? ???? > napsal: > >> Hello, openstack team! Please help me! I'm trying to use magnum in the >> yoga release on ubuntu 22.04 I can't understand why it doesn't work, when >> creating a container I get an error >> " >> 2023-03-13 09:07:38.090 1507357 ERROR magnum.drivers.heat.driver >> [req-b0b05017-af6b-4f4c-bf2c-003b34f17ba0 - - - - -]Nodegroup error, stack >> status: CREATE_FAILED, stack_id: 54f98fee-c070-40f9b337-b9b6df49e73b, >> reason:Resource CREATE failed: ResourceInError: >> resources.kube_masters.resources[0].resources.kube-master: >> Went to status ERROR due to "Message: Build of instance >> 912d1432-c692-4569-a511-3fb9291c97dc aborted: Image >> 54c026f6-6ff7-4be6-b76e-a0732d9b8814 is unacceptable: Image is not raw >> format, Code: 500" >> " >> I downloaded the container image and tried to create it in raw with the >> command and create a new Cluster template >> " >> openstack image create Fedora-CoreOS >> --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 >> --disk-format=raw --container-format=bare --property >> os_distro='fedora-coreos' --public >> " >> and create a new Cluster template, when I tried to create a cluster, it >> had the status CREATE_IN_PROGRESS for a very long time I got an error >> " >> 2023-03-13 10:21:58.110 1507357 ERROR magnum.drivers.heat.driver >> [req-29753ae4-b9c3-4602-ba8e-acf9a0d024c5 - - - - -] Nodegroup error, stack >> status: >> CREATE_FAILED, stack_id: 5958bd59-c6f9-464d-a4f7-ddd530fdf804, reason: >> Timed out= >> " >> why it doesn't work? I saw a massage on your website. >> >> Does this mean that magnum only works in the Zed release? 
>> -- >> With respect, >> Makarov Maxim >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From mister.mackarow at yandex.ru Wed Mar 15 13:35:04 2023 From: mister.mackarow at yandex.ru (=?utf-8?B?0JzQkNCa0JDQoNCe0JIg0JzQkNCa0KE=?=) Date: Wed, 15 Mar 2023 16:35:04 +0300 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: References: <30941678692580@mail.yandex.ru> Message-ID: <401541678887257@mail.yandex.ru> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From nguyenhuukhoinw at gmail.com Wed Mar 15 13:49:41 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 15 Mar 2023 20:49:41 +0700 Subject: Magnum in yoga release on Ubuntu 22.04 In-Reply-To: <401541678887257@mail.yandex.ru> References: <30941678692580@mail.yandex.ru> <401541678887257@mail.yandex.ru> Message-ID: Just forcus heat log in k8s master instance. It is all you need to specify problems. On Wed, Mar 15, 2023, 8:46 PM ??????? ???? wrote: > Thanks for the answer! I solved this problem by specifying in the settings > nova.conf image-type = qcow2 now I can use qcow2, but unfortunately the > cluster does not create, it crashes with a time_out error, if I run heat > stack-list I see that kube_masters create in progress, now I'm stuck at > this step > > 15.03.2023, 15:28, "Michal Arbet" : > > Hi, > > You can't import qcow and mark it as raw, before upload to glance you have > to convert qcow2 to raw. > > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > --disk-format=raw --container-format=bare --property > os_distro='fedora-coreos' --public > > to > > qemu-img convert fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > fedora-coreos-35.20220313.3.1-openstack.x86_64.raw > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.raw --disk-format=raw > --container-format=bare --property os_distro='fedora-coreos' --public > > Hope it will help you. > > Regards, > *Michal Arbet* > Openstack Engineer > > Ultimum Technologies a.s. > Na Po???? 1047/26, 11000 Praha 1 > Czech Republic > > +420 604 228 897 > michal.arbet at ultimum.io > *https://ultimum.io * > > LinkedIn | Twitter > | Facebook > > > > > > po 13. 3. 2023 v 13:25 odes?latel ??????? ???? > napsal: > > Hello, openstack team! Please help me! 
I'm trying to use magnum in the > yoga release on ubuntu 22.04 I can't understand why it doesn't work, when > creating a container I get an error > " > 2023-03-13 09:07:38.090 1507357 ERROR magnum.drivers.heat.driver > [req-b0b05017-af6b-4f4c-bf2c-003b34f17ba0 - - - - -]Nodegroup error, stack > status: CREATE_FAILED, stack_id: 54f98fee-c070-40f9b337-b9b6df49e73b, > reason:Resource CREATE failed: ResourceInError: > resources.kube_masters.resources[0].resources.kube-master: > Went to status ERROR due to "Message: Build of instance > 912d1432-c692-4569-a511-3fb9291c97dc aborted: Image > 54c026f6-6ff7-4be6-b76e-a0732d9b8814 is unacceptable: Image is not raw > format, Code: 500" > " > I downloaded the container image and tried to create it in raw with the > command and create a new Cluster template > " > openstack image create Fedora-CoreOS > --file=fedora-coreos-35.20220313.3.1-openstack.x86_64.qcow2 > --disk-format=raw --container-format=bare --property > os_distro='fedora-coreos' --public > " > and create a new Cluster template, when I tried to create a cluster, it > had the status CREATE_IN_PROGRESS for a very long time I got an error > " > 2023-03-13 10:21:58.110 1507357 ERROR magnum.drivers.heat.driver > [req-29753ae4-b9c3-4602-ba8e-acf9a0d024c5 - - - - -] Nodegroup error, stack > status: > CREATE_FAILED, stack_id: 5958bd59-c6f9-464d-a4f7-ddd530fdf804, reason: > Timed out= > " > why it doesn't work? I saw a massage on your website. > > Does this mean that magnum only works in the Zed release? > -- > With respect, > Makarov Maxim > > > > > -- > With respect, > Makarov Maxim > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 4048 bytes Desc: not available URL: From felix.huettner at mail.schwarz Wed Mar 15 16:10:44 2023 From: felix.huettner at mail.schwarz (=?utf-8?B?RmVsaXggSMO8dHRuZXI=?=) Date: Wed, 15 Mar 2023 16:10:44 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: <2315188.ElGaqSPkdT@p1> References: <2315188.ElGaqSPkdT@p1> Message-ID: Hi, > Subject: Re: [neutron] detecting l3-agent readiness > > Hi, > > Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > > Hi Mohammed, > > > > > Subject: [neutron] detecting l3-agent readiness > > > > > > Hi folks, > > > > > > I'm working on improving the stability of rollouts when using Kubernetes as a control > plane, specifically around the L3 agent, it seems that I have not found a clear way to > detect in the code path where the L3 agent has finished it's initial sync.. > > > > > > > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/- > /blob/devel/files/startup_wait_for_ns.py > > Basically we are checking against the neutron api what routers should be on the node and > then validate that all keepalived processes are up and running. > > That would work only for HA routers. If You would also have routers which aren't "ha" this > method may fail. > Yep, since we only have HA routers that works fine for us. But I guess it should also work for non-ha routers without too much adoption (maybe just check for namespaces instead of keepalived). > > > > > Am I missing it somewhere or is the architecture built in a way that doesn't really > answer that question? 
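To make that concrete, roughly the same readiness check can be approximated from the outside with plain CLI calls; a rough sketch only (it assumes the usual qrouter- namespace naming, an OSC version whose "router list" supports the --agent filter, and it deliberately skips the keepalived part, which is why the linked startup_wait_for_ns.py goes further for HA routers):

# ID of the L3 agent on this node
AGENT_ID=$(openstack network agent list --host "$(hostname -f)" -f value -c ID -c "Agent Type" | awk '/L3 agent/ {print $1}')
# routers the Neutron API expects this agent to host vs. namespaces actually present
EXPECTED=$(openstack router list --agent "$AGENT_ID" -f value -c ID | wc -l)
ACTUAL=$(ip netns list | grep -c '^qrouter-')
[ "$ACTUAL" -ge "$EXPECTED" ] && echo "l3-agent looks synced ($ACTUAL/$EXPECTED)" || echo "still syncing ($ACTUAL/$EXPECTED)"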
> > > > > > > Adding a option in the neutron api would be a lot nicer. But i guess that also counts > for l2 and dhcp agents. > > > > > > > Thanks > > > Mohammed > > > > > > > > > -- > > > Mohammed Naser > > > VEXXHOST, Inc. > > > > -- > > Felix Huettner > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung > durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger > sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. > Hinweise zum Datenschutz finden Sie hier. > > > > > -- > Slawek Kaplonski > Principal Software Engineer > Red Hat -- Felix Huettner Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. From adivya1.singh at gmail.com Wed Mar 15 16:48:22 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Wed, 15 Mar 2023 22:18:22 +0530 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack Message-ID: Hi Team, I am unable to open Open OpenStack horizon page, after installation When i am opening the link , it says Haproxy service seems up and running, I have tried to Flush IP tables also, Seeing this might be causing the issue Port 443 is also listening. Any thoughts on this [image: image.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 9294 bytes Desc: not available URL: From rdopiera at redhat.com Wed Mar 15 17:37:14 2023 From: rdopiera at redhat.com (Radomir Dopieralski) Date: Wed, 15 Mar 2023 18:37:14 +0100 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack In-Reply-To: References: Message-ID: try /dashboard On Wed, Mar 15, 2023 at 5:56?PM Adivya Singh wrote: > Hi Team, > > I am unable to open Open OpenStack horizon page, after installation > When i am opening the link , it says > > Haproxy service seems up and running, I have tried to Flush IP tables > also, Seeing this might be causing the issue > > Port 443 is also listening. > > Any thoughts on this > > [image: image.png] > -- Radomir Dopieralski -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 9294 bytes Desc: not available URL: From swogatpradhan22 at gmail.com Wed Mar 15 17:41:45 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 15 Mar 2023 23:11:45 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Brendan, Now i have deployed another site where i have used 2 linux bonds network template for both 3 compute nodes and 3 ceph nodes. The bonding options is set to mode=802.3ad (lacp=active). I used a cirros image to launch instance but the instance timed out so i waited for the volume to be created. 
Once the volume was created I tried launching the instance from the volume and still the instance is stuck in spawning state.

Here is the nova-compute log:

2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon starting
2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon running as pid 185437
2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command.
Command: blkid overlay -s UUID -o value
Exit code: 2
Stdout: ''
Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image

It is stuck at "Creating image"; do I need to run the template mentioned here?: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html

The volume is already created and I do not understand why the instance is stuck in spawning state.

With regards,
Swogat Pradhan

On Sun, Mar 5, 2023 at 4:02 PM Brendan Shephard wrote:

> Does your environment use different network interfaces for each of the
> networks? Or does it have a bond with everything on it?
>
> One issue I have seen before is that when launching instances, there is a
> lot of network traffic between nodes as the hypervisor needs to download
> the image from Glance. Along with various other services sending normal
> network traffic, it can be enough to cause issues if everything is running
> over a single 1Gbe interface.
>
> I have seen the same situation in fact when using a single active/backup
> bond on 1Gbe nics. It's worth checking the network traffic while you try to
> spawn the instance to see if you're dropping packets. In the situation I
> described, there were dropped packets which resulted in a loss of
> communication between nova_compute and RMQ, so the node appeared offline.
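To make the "check for drops" advice concrete, something along these lines on the hypervisor while reproducing the spawn is usually enough (bond0 is an assumption, substitute whichever bond or NIC carries the internal API and storage traffic):

# confirm the bond really negotiated 802.3ad/LACP with the switch
grep -E 'Bonding Mode|MII Status|LACP' /proc/net/bonding/bond0

# watch error/drop counters while the instance is spawning
watch -n 1 'ip -s link show bond0'

Climbing "dropped" or "errors" counters, or a bond stuck in a fallback mode, would point at the same root cause described here.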
> If there isn't any additional information in the debug logs I probably > would start "tearing down" rabbitmq. I didn't have to do that in a > production system yet so be careful. I can think of two routes: > > - Either remove queues, exchanges etc. while rabbit is running, this will > most likely impact client IO depending on your load. Check out the > rabbitmqctl commands. > - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes > and restart rabbitmq so the exchanges, queues etc. rebuild. > > I can imagine that the failed reply "survives" while being replicated > across the rabbit nodes. But I don't really know the rabbit internals too > well, so maybe someone else can chime in here and give a better advice. > > Regards, > Eugen > > Zitat von Swogat Pradhan : > > Hi, > Can someone please help me out on this issue? > > With regards, > Swogat Pradhan > > On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan > wrote: > > Hi > I don't see any major packet loss. > It seems the problem is somewhere in rabbitmq maybe but not due to packet > loss. > > with regards, > Swogat Pradhan > > On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan > wrote: > > Hi, > Yes the MTU is the same as the default '1500'. > Generally I haven't seen any packet loss, but never checked when > launching the instance. > I will check that and come back. > But everytime i launch an instance the instance gets stuck at spawning > state and there the hypervisor becomes down, so not sure if packet loss > causes this. > > With regards, > Swogat pradhan > > On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > > One more thing coming to mind is MTU size. Are they identical between > central and edge site? Do you see packet loss through the tunnel? > > Zitat von Swogat Pradhan : > > > Hi Eugen, > > Request you to please add my email either on 'to' or 'cc' as i am not > > getting email's from you. > > Coming to the issue: > > > > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p > / > > Listing policies for vhost "/" ... > > vhost name pattern apply-to definition priority > > / ha-all ^(?!amq\.).* queues > > > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > > > > I have the edge site compute nodes up, it only goes down when i am > trying > > to launch an instance and the instance comes to a spawning state and > then > > gets stuck. > > > > I have a tunnel setup between the central and the edge sites. > > > > With regards, > > Swogat Pradhan > > > > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > > wrote: > > > >> Hi Eugen, > >> For some reason i am not getting your email to me directly, i am > checking > >> the email digest and there i am able to find your reply. > >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >> Yes, these logs are from the time when the issue occurred. > >> > >> *Note: i am able to create vm's and perform other activities in the > >> central site, only facing this issue in the edge site.* > >> > >> With regards, > >> Swogat Pradhan > >> > >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >> wrote: > >> > >>> Hi Eugen, > >>> Thanks for your response. 
> >>> I have actually a 4 controller setup so here are the details: > >>> > >>> *PCS Status:* > >>> * Container bundle set: rabbitmq-bundle [ > >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-no-ceph-3 > >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-2 > >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-1 > >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > Started > >>> overcloud-controller-0 > >>> > >>> I have tried restarting the bundle multiple times but the issue is > still > >>> present. > >>> > >>> *Cluster status:* > >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>> Cluster status of node > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... > >>> Basics > >>> > >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>> > >>> Disk Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Running Nodes > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>> > >>> Versions > >>> > >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ > 3.8.3 > >>> on Erlang 22.3.4.1 > >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > RabbitMQ > >>> 3.8.3 on Erlang 22.3.4.1 > >>> > >>> Alarms > >>> > >>> (none) > >>> > >>> Network Partitions > >>> > >>> (none) > >>> > >>> Listeners > >>> > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI > tool > >>> communication > >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>> and AMQP 1.0 > >>> Node: 
rabbit at overcloud-controller-2.internalapi.bdxworld.com, > interface: > >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: [::], port: 25672, protocol: clustering, purpose: > inter-node and > >>> CLI tool communication > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>> and AMQP 1.0 > >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > , > >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API > >>> > >>> Feature flags > >>> > >>> Flag: drop_unroutable_metric, state: enabled > >>> Flag: empty_basic_get_metric, state: enabled > >>> Flag: implicit_default_bindings, state: enabled > >>> Flag: quorum_queue, state: enabled > >>> Flag: virtual_host_metadata, state: enabled > >>> > >>> *Logs:* > >>> *(Attached)* > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>> wrote: > >>> > >>>> Hi, > >>>> Please find the nova conductor as well as nova api log. > >>>> > >>>> nova-conuctor: > >>>> > >>>> 2023-02-26 08:45:01.108 31 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 16152921c1eb45c2b1f562087140168b > >>>> 2023-02-26 08:45:02.144 26 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> 2023-02-26 08:45:02.314 32 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to > >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply > >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds > due to a > >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.282 35 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
> Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.303 33 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > b240e3e89d99489284cd731e75f2a5db > >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>> backend dogpile.cache.null. > >>>> 2023-02-26 08:50:01.264 27 WARNING > oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to > >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver > >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds > due to a > >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > Abandoning...: > >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> wrote: > >>>> > >>>>> Hi, > >>>>> I currently have 3 compute nodes on edge site1 where i am trying to > >>>>> launch vm's. > >>>>> When the VM is in spawning state the node goes down (openstack > compute > >>>>> service list), the node comes backup when i restart the nova > compute > >>>>> service but then the launch of the vm fails. > >>>>> > >>>>> nova-compute.log > >>>>> > >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running > >>>>> instance usage > >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 > to > >>>>> 2023-02-26 08:00:00. 0 instances. > >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node > >>>>> dcn01-hci-0.bdxworld.com > >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device > name: > >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names > >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > with > >>>>> backend dogpile.cache.null. > >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running > >>>>> privsep helper: > >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > 'privsep-helper', > >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new > privsep > >>>>> daemon via rootwrap > >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon starting > >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with uid/gid: 0/0 > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> process running with capabilities (eff/prm/inh): > >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep > >>>>> daemon running as pid 2647 > >>>>> 2023-02-26 08:49:55.956 7 WARNING > os_brick.initiator.connectors.nvmeof > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process > >>>>> execution error > >>>>> in _get_host_uuid: Unexpected error while running command. > >>>>> Command: blkid overlay -s UUID -o value > >>>>> Exit code: 2 > >>>>> Stdout: '' > >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > >>>>> Unexpected error while running command. > >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>>> > >>>>> Is there a way to solve this issue? > >>>>> > >>>>> > >>>>> With regards, > >>>>> > >>>>> Swogat Pradhan > >>>>> > >>>> > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From swogatpradhan22 at gmail.com Wed Mar 15 17:43:42 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 15 Mar 2023 23:13:42 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Update: In the hypervisor list the compute node state is showing down. On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan wrote: > Hi Brendan, > Now i have deployed another site where i have used 2 linux bonds network > template for both 3 compute nodes and 3 ceph nodes. > The bonding options is set to mode=802.3ad (lacp=active). > I used a cirros image to launch instance but the instance timed out so i > waited for the volume to be created. > Once the volume was created i tried launching the instance from the volume > and still the instance is stuck in spawning state. > > Here is the nova-compute log: > > 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon > starting > 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep > process running with uid/gid: 0/0 > 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep > process running with capabilities (eff/prm/inh): > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon > running as pid 185437 > 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error > in _get_host_uuid: Unexpected error while running command. > Command: blkid overlay -s UUID -o value > Exit code: 2 > Stdout: '' > Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > Unexpected error while running command. > 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > 450b749c-a10a-4308-80a9-3b8020fee758] Creating image > > It is stuck in creating image, do i need to run the template mentioned > here ?: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > > The volume is already created and i do not understand why the instance is > stuck in spawning state. > > With regards, > Swogat Pradhan > > > On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard > wrote: > >> Does your environment use different network interfaces for each of the >> networks? Or does it have a bond with everything on it? >> >> One issue I have seen before is that when launching instances, there is a >> lot of network traffic between nodes as the hypervisor needs to download >> the image from Glance. Along with various other services sending normal >> network traffic, it can be enough to cause issues if everything is running >> over a single 1Gbe interface. >> >> I have seen the same situation in fact when using a single active/backup >> bond on 1Gbe nics. It?s worth checking the network traffic while you try to >> spawn the instance to see if you?re dropping packets. In the situation I >> described, there were dropped packets which resulted in a loss of >> communication between nova_compute and RMQ, so the node appeared offline. 
>> You should also confirm that nova_compute is being disconnected in the >> nova_compute logs if you tail them on the Hypervisor while spawning the >> instance. >> >> In my case, changing from active/backup to LACP helped. So, based on that >> experience, from my perspective, is certainly sounds like some kind of >> network issue. >> >> Regards, >> >> Brendan Shephard >> Senior Software Engineer >> Red Hat Australia >> >> >> >> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >> >> Hi, >> >> I tried to help someone with a similar issue some time ago in this thread: >> >> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >> >> But apparently a neutron reinstallation fixed it for that user, not sure >> if that could apply here. But is it possible that your nova and neutron >> versions are different between central and edge site? Have you restarted >> nova and neutron services on the compute nodes after installation? Have you >> debug logs of nova-conductor and maybe nova-compute? Maybe they can help >> narrow down the issue. >> If there isn't any additional information in the debug logs I probably >> would start "tearing down" rabbitmq. I didn't have to do that in a >> production system yet so be careful. I can think of two routes: >> >> - Either remove queues, exchanges etc. while rabbit is running, this will >> most likely impact client IO depending on your load. Check out the >> rabbitmqctl commands. >> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes >> and restart rabbitmq so the exchanges, queues etc. rebuild. >> >> I can imagine that the failed reply "survives" while being replicated >> across the rabbit nodes. But I don't really know the rabbit internals too >> well, so maybe someone else can chime in here and give a better advice. >> >> Regards, >> Eugen >> >> Zitat von Swogat Pradhan : >> >> Hi, >> Can someone please help me out on this issue? >> >> With regards, >> Swogat Pradhan >> >> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >> wrote: >> >> Hi >> I don't see any major packet loss. >> It seems the problem is somewhere in rabbitmq maybe but not due to packet >> loss. >> >> with regards, >> Swogat Pradhan >> >> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >> wrote: >> >> Hi, >> Yes the MTU is the same as the default '1500'. >> Generally I haven't seen any packet loss, but never checked when >> launching the instance. >> I will check that and come back. >> But everytime i launch an instance the instance gets stuck at spawning >> state and there the hypervisor becomes down, so not sure if packet loss >> causes this. >> >> With regards, >> Swogat pradhan >> >> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >> >> One more thing coming to mind is MTU size. Are they identical between >> central and edge site? Do you see packet loss through the tunnel? >> >> Zitat von Swogat Pradhan : >> >> > Hi Eugen, >> > Request you to please add my email either on 'to' or 'cc' as i am not >> > getting email's from you. >> > Coming to the issue: >> > >> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >> / >> > Listing policies for vhost "/" ... >> > vhost name pattern apply-to definition priority >> > / ha-all ^(?!amq\.).* queues >> > >> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >> > >> > I have the edge site compute nodes up, it only goes down when i am >> trying >> > to launch an instance and the instance comes to a spawning state and >> then >> > gets stuck. 
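For what it is worth, the MessageUndeliverable errors shown later in this thread concern the transient reply_* queues created by the calling service for RPC replies. A quick, illustrative way to check whether those queues exist and are being consumed while a spawn is in flight (run wherever rabbitmqctl is available, for example inside the rabbitmq bundle container):

  rabbitmqctl list_queues name messages consumers | grep reply_
  rabbitmqctl list_connections user peer_host state | grep nova

A reply_* queue that has no consumer at the moment the conductor tries to answer matches the "failed to send after 60 seconds due to a missing queue" errors.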
>> > >> > I have a tunnel setup between the central and the edge sites. >> > >> > With regards, >> > Swogat Pradhan >> > >> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> > wrote: >> > >> >> Hi Eugen, >> >> For some reason i am not getting your email to me directly, i am >> checking >> >> the email digest and there i am able to find your reply. >> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> >> Yes, these logs are from the time when the issue occurred. >> >> >> >> *Note: i am able to create vm's and perform other activities in the >> >> central site, only facing this issue in the edge site.* >> >> >> >> With regards, >> >> Swogat Pradhan >> >> >> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >> wrote: >> >> >> >>> Hi Eugen, >> >>> Thanks for your response. >> >>> I have actually a 4 controller setup so here are the details: >> >>> >> >>> *PCS Status:* >> >>> * Container bundle set: rabbitmq-bundle [ >> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-no-ceph-3 >> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-2 >> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-1 >> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >> Started >> >>> overcloud-controller-0 >> >>> >> >>> I have tried restarting the bundle multiple times but the issue is >> still >> >>> present. >> >>> >> >>> *Cluster status:* >> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> >>> Cluster status of node >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>> >>> Basics >> >>> >> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >>> >> >>> Disk Nodes >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>> >> >>> Running Nodes >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>> >> >>> Versions >> >>> >> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >> 3.8.3 >> >>> on Erlang 22.3.4.1 >> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >> RabbitMQ >> >>> 3.8.3 on Erlang 22.3.4.1 >> >>> >> >>> Alarms >> >>> >> >>> (none) >> >>> >> >>> Network Partitions >> >>> >> >>> (none) >> >>> >> >>> Listeners >> >>> >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >> tool >> >>> communication >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> interface: >> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> , >> >>> interface: [::], port: 25672, protocol: clustering, purpose: >> inter-node and >> >>> CLI tool communication >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> , >> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >> 0-9-1 >> >>> and AMQP 1.0 >> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> , >> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >>> >> >>> Feature flags >> >>> >> >>> Flag: drop_unroutable_metric, state: enabled >> >>> Flag: empty_basic_get_metric, state: enabled >> >>> Flag: 
implicit_default_bindings, state: enabled >> >>> Flag: quorum_queue, state: enabled >> >>> Flag: virtual_host_metadata, state: enabled >> >>> >> >>> *Logs:* >> >>> *(Attached)* >> >>> >> >>> With regards, >> >>> Swogat Pradhan >> >>> >> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>> wrote: >> >>> >> >>>> Hi, >> >>>> Please find the nova conductor as well as nova api log. >> >>>> >> >>>> nova-conuctor: >> >>>> >> >>>> 2023-02-26 08:45:01.108 31 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 16152921c1eb45c2b1f562087140168b >> >>>> 2023-02-26 08:45:02.144 26 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >> >>>> 83dbe5f567a940b698acfe986f6194fa >> >>>> 2023-02-26 08:45:02.314 32 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >> >>>> f3bfd7f65bd542b18d84cea3033abb43: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >> due to a >> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:48:01.282 35 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >> due to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:01.303 33 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 897911a234a445d8a0d8af02ece40f6f: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >> due to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> b240e3e89d99489284cd731e75f2a5db >> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >> with >> >>>> backend dogpile.cache.null. 
>> >>>> 2023-02-26 08:50:01.264 27 WARNING >> oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >> >>>> 8f723ceb10c3472db9a9f324861df2bb: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >> due to a >> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> Abandoning...: >> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >> >>>> With regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> wrote: >> >>>> >> >>>>> Hi, >> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >> >>>>> launch vm's. >> >>>>> When the VM is in spawning state the node goes down (openstack >> compute >> >>>>> service list), the node comes backup when i restart the nova >> compute >> >>>>> service but then the launch of the vm fails. >> >>>>> >> >>>>> nova-compute.log >> >>>>> >> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >> >>>>> instance usage >> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >> to >> >>>>> 2023-02-26 08:00:00. 0 instances. >> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >> >>>>> dcn01-hci-0.bdxworld.com >> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >> name: >> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >> with >> >>>>> backend dogpile.cache.null. 
>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >> >>>>> privsep helper: >> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >> 'privsep-helper', >> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >> privsep >> >>>>> daemon via rootwrap >> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> daemon starting >> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> process running with uid/gid: 0/0 >> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> process running with capabilities (eff/prm/inh): >> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >> >>>>> daemon running as pid 2647 >> >>>>> 2023-02-26 08:49:55.956 7 WARNING >> os_brick.initiator.connectors.nvmeof >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >> >>>>> execution error >> >>>>> in _get_host_uuid: Unexpected error while running command. >> >>>>> Command: blkid overlay -s UUID -o value >> >>>>> Exit code: 2 >> >>>>> Stdout: '' >> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >> >>>>> Unexpected error while running command. >> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >> >>>>> >> >>>>> Is there a way to solve this issue? >> >>>>> >> >>>>> >> >>>>> With regards, >> >>>>> >> >>>>> Swogat Pradhan >> >>>>> >> >>>> >> >> >> >> >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From adivya1.singh at gmail.com Wed Mar 15 17:56:56 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Wed, 15 Mar 2023 23:26:56 +0530 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack In-Reply-To: References: Message-ID: Same result On Wed, Mar 15, 2023 at 11:07?PM Radomir Dopieralski wrote: > try /dashboard > > On Wed, Mar 15, 2023 at 5:56?PM Adivya Singh > wrote: > >> Hi Team, >> >> I am unable to open Open OpenStack horizon page, after installation >> When i am opening the link , it says >> >> Haproxy service seems up and running, I have tried to Flush IP tables >> also, Seeing this might be causing the issue >> >> Port 443 is also listening. >> >> Any thoughts on this >> >> [image: image.png] >> > > > -- > Radomir Dopieralski > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 9294 bytes Desc: not available URL: From sbauza at redhat.com Wed Mar 15 18:28:58 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 15 Mar 2023 19:28:58 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Le mer. 15 mars 2023 ? 13:45, Bence Romsics a ?crit : > Hi All! > > We have users who use 'rebuild' on volume booted servers before nova > microversion 2.93, relying on the behavior that it keeps the volume as > is. And they would like to keep doing this even after the openstack > distro moves to a(n at least) zed base (sometime in the future). > > As a naive user, it seems to me both behaviors make sense. I can > easily imagine use cases for rebuild with and without reimaging. > However since the implementation of > > https://specs.openstack.org/openstack/nova-specs/specs/zed/implemented/volume-backed-server-rebuild.html > rebuild without reimaging is only possible using an old microversion > (<2.93). With that change merged, rebuild without reimaging seems to > be a somewhat less than fully supported feature. A few examples of > what I mean by that: > > That's not really true : the new microversion just means we change the default behaviour, but you can still opt into the previous behaviour by requesting an older microversion. That being said, I do understand your concerns, further below. > First, there's this warning: > > https://opendev.org/openstack/python-openstackclient/src/commit/5eb89e4ca1cebad9245c27d58a0dafd7f363ece0/openstackclient/compute/v2/server.py#L3452-L3453 > > In which it is unclear to me what exactly will become an error in a > future release. Rebuild with a different image? Or any rebuild with > microversion <2.93? > > The latter (in theory) : if you opt into a microversion older or equal than 2.93, you shouldn't expect your volume to *not* be rebuilt. Then old nova microversions may get dropped. Though what I heard from > nova folks, this is unlikely to happen. > > Correct, I never want to say never, but we don't have any plans in any subsequent futures to bump the minimum versions, for many many reasons, not only due to the tech debt but also and mainly because of the interoperatibility we must guarantee. > Then there are a few hypothetical situations like: > a) Rebuild gets a new api feature (in a new microversion) which can > never be combined with the do-not-reimage behavior. > b) Rebuild may have a bug, whose fix requires a microversion bump. > This again can never be combined with the old behavior. > > What do you think, are these concerns purely theoretical or real? > If we would like to keep having rebuild without reimaging, can we rely > on the old microversion indefinitely? > Alternatively shall we propose and implement a nova spec to explicitly > expose the choice in the rebuild api (just to express the idea: osc > server rebuild --reimage|--no-reimage)? > I'm not opposed to challenge the usecases in a spec, for sure. > > If the topic is worth further discussion beyond the ML, I can also > bring it to the nova ptg. > That's already the case. Add yourself to the courtesy ping list of that topic. https://etherpad.opendev.org/p/nova-bobcat-ptg#L152 -Sylvain > > Thanks in advance, > Bence Romsics (rubasov) > > ps: I'll be afk for a few days, but I'll follow up next Tuesday. > > -------------- next part -------------- An HTML attachment was scrubbed... 
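To make the point about opting into the older behaviour concrete, a hedged sketch with python-openstackclient (placeholders in angle brackets; the exact semantics for volume-backed servers depend on the microversion and the image supplied, so treat this as illustrative rather than authoritative):

  # pre-2.93 behaviour: the root volume of a volume-backed server is left untouched
  openstack --os-compute-api-version 2.92 server rebuild --image <image-uuid> <server>

  # 2.93 and later: the rebuild reimages the root volume
  openstack --os-compute-api-version 2.93 server rebuild --image <image-uuid> <server>

Pinning the microversion per request is what keeps the old behaviour available without any server-side configuration change.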
URL: From alsotoes at gmail.com Wed Mar 15 18:33:47 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Wed, 15 Mar 2023 12:33:47 -0600 Subject: [manila] create snapshot from share not permitted In-Reply-To: References: Message-ID: So you can test and get fresh log outputs, please do the following manila create manila list manila show Then go to https://pastebin.com/ and paste the CLI output and errors you see on the manila log files Friendly reminder: please click on 'reply all' so this thread can help more people Cheers! On Tue, Mar 14, 2023 at 1:53?AM garcetto wrote: > seems enabled...dont know why is not working... > > $ manila pool-list --detail > > +--------------------------------------+----------------------------------------------+ > | Property | Value > | > > +--------------------------------------+----------------------------------------------+ > | name | ostack-test at generic#GENERIC > | > | share_backend_name | GENERIC > | > | driver_handles_share_servers | True > | > | vendor_name | Open Source > | > | driver_version | 1.0 > | > | storage_protocol | NFS_CIFS > | > | total_capacity_gb | unknown > | > | free_capacity_gb | unknown > | > | reserved_percentage | 0 > | > | reserved_snapshot_percentage | 0 > | > | reserved_share_extend_percentage | 0 > | > | qos | False > | > | pools | None > | > | snapshot_support | True > | > | create_share_from_snapshot_support | True > | > | revert_to_snapshot_support | False > | > | mount_snapshot_support | False > | > | replication_domain | None > | > | filter_function | None > | > | goodness_function | None > | > | security_service_update_support | False > | > | network_allocation_update_support | False > | > | share_server_multiple_subnet_support | False > | > | max_shares_per_share_server | -1 > | > | max_share_server_size | -1 > | > | share_group_stats | {'consistent_snapshot_support': > None} | > | ipv4_support | True > | > | ipv6_support | False > | > | server_pools_mapping | > {'2242964f-be38-4f11-8c90-cdcfcd20c20a': []} | > | timestamp | 2023-03-13T09:13:06.331713 > | > > +--------------------------------------+----------------------------------------------+ > > > On Mon, Mar 13, 2023 at 11:45?PM Alvaro Soto wrote: > >> If you are inside this features support matrix >> >> >> https://docs.openstack.org/manila/latest/admin/share_back_ends_feature_support_mapping.html#share-back-ends-feature-support-mapping >> >> Examine your configuration as well: >> >> >> - >> >> snapshot_support indicates whether snapshots are supported for shares >> created on the pool/backend. When administrators do not set this capability >> as an extra-spec in a share type, the scheduler can place new shares of >> that type in pools without regard for whether snapshots are supported, and >> those shares will not support snapshots. >> >> >> https://docs.openstack.org/manila/latest/admin/capabilities_and_extra_specs.html >> >> Cheers! >> >> On Mon, Mar 13, 2023 at 3:35?AM garcetto wrote: >> >>> good morning, >>> i am using manila and generic driver with dhss true, but cannot create >>> snapshot from shares, any help? where can i look at? >>> (cinder backend is a linux nfs server) >>> >>> thank you >>> >>> $ manila snapshot-create share-01 --name Snapshot1 >>> ERROR: Snapshots cannot be created for share >>> '2c8b1b3d-ef82-4372-94df-678539f0d843' since it does not have that >>> capability. (HTTP 422) (Request-ID: >>> req-cab23a46-37dc-4f2b-b26c-d6b21b7453ba) >>> >>> >> >> -- >> >> Alvaro Soto >> >> *Note: My work hours may not be your work hours. 
Please do not feel the >> need to respond during a time that is not convenient for you.* >> ---------------------------------------------------------- >> Great people talk about ideas, >> ordinary people talk about things, >> small people talk... about other people. >> > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Wed Mar 15 18:54:32 2023 From: dms at danplanet.com (Dan Smith) Date: Wed, 15 Mar 2023 11:54:32 -0700 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: (Sylvain Bauza's message of "Wed, 15 Mar 2023 19:28:58 +0100") References: Message-ID: > We have users who use 'rebuild' on volume booted servers before nova > microversion 2.93, relying on the behavior that it keeps the volume as > is. And they would like to keep doing this even after the openstack > distro moves to a(n at least) zed base (sometime in the future). Maybe I'm missing something, but what are the reasons you would want to rebuild an instance without ... rebuilding it? I assume it's because you want to redefine the metadata or name or something. There's a reason why those things are not easily mutable today, and why we had a lot of discussion on how to make user metadata mutable on an existing instance in the last cycle. However, I would really suggest that we not override "recreate the thing" to "maybe recreate the thing or just update a few fields". Instead, for things we think really should be mutable on a server at runtime, we should probably just do that. Imagine if the way you changed permissions recursively was to run 'rm -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but that is (IMHO) what "recreate but don't just change $name" means to a user. > As a naive user, it seems to me both behaviors make sense. I can > easily imagine use cases for rebuild with and without reimaging. I think that's because you're already familiar with the difference. For users not already in that mindset, I think it probably seems very weird that rebuild is destructive in one case and not the other. > Then there are a few hypothetical situations like: > a) Rebuild gets a new api feature (in a new microversion) which can > never be combined with the do-not-reimage behavior. > b) Rebuild may have a bug, whose fix requires a microversion bump. > This again can never be combined with the old behavior. > > What do you think, are these concerns purely theoretical or real? > If we would like to keep having rebuild without reimaging, can we rely > on the old microversion indefinitely? > Alternatively shall we propose and implement a nova spec to explicitly > expose the choice in the rebuild api (just to express the idea: osc > server rebuild --reimage|--no-reimage)? > > I'm not opposed to challenge the usecases in a spec, for sure. I really want to know what the use-case is for "rebuild but not really". And also what "rebuild" means to a user if --no-reimage is passed. What's being rebuilt? The docs[0] for the API say very clearly: "This operation recreates the root disk of the server." That was a lie for volume-backed instances for technical reasons. It was a bug, not a feature. 
I also strongly believe that if we're going to add a "but not really" flag, it needs to apply to volume-backed and regular instances identically. Because that's what the change here was doing - unifying the behavior for a single API operation. Going the other direction does not seem useful to me. --Dan [0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action From alsotoes at gmail.com Wed Mar 15 19:17:53 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Wed, 15 Mar 2023 13:17:53 -0600 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack In-Reply-To: References: Message-ID: Try to curl from the controller node where the dashboard lives; if it works, the dashboard is up and running, but maybe the access is behind a firewall on an upper level and you will need to tunnel your way. Cheers! On Wed, Mar 15, 2023 at 12:01?PM Adivya Singh wrote: > Same result > > On Wed, Mar 15, 2023 at 11:07?PM Radomir Dopieralski > wrote: > >> try /dashboard >> >> On Wed, Mar 15, 2023 at 5:56?PM Adivya Singh >> wrote: >> >>> Hi Team, >>> >>> I am unable to open Open OpenStack horizon page, after installation >>> When i am opening the link , it says >>> >>> Haproxy service seems up and running, I have tried to Flush IP tables >>> also, Seeing this might be causing the issue >>> >>> Port 443 is also listening. >>> >>> Any thoughts on this >>> >>> [image: image.png] >>> >> >> >> -- >> Radomir Dopieralski >> > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 9294 bytes Desc: not available URL: From gouthampravi at gmail.com Wed Mar 15 20:27:16 2023 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Wed, 15 Mar 2023 13:27:16 -0700 Subject: [manila] create snapshot from share not permitted In-Reply-To: References: Message-ID: On Wed, Mar 15, 2023 at 11:34?AM Alvaro Soto wrote: > > So you can test and get fresh log outputs, please do the following > > manila create > manila list > manila show > > Then go to https://pastebin.com/ and paste the CLI output and errors you see on the manila log files > > Friendly reminder: please click on 'reply all' so this thread can help more people ++ thanks Alvaro! I'd like to point back to the doc that Alvaro linked in his original response: https://docs.openstack.org/manila/latest/admin/capabilities_and_extra_specs.html With "manila pool-list --detail", you are able to see the backend's capabilities. You would use this information to create share types. The share type you're using needs to have the extra-spec "snapshot_support=True". Without it, shares created of that type will not support snapshots. > > Cheers! > > On Tue, Mar 14, 2023 at 1:53?AM garcetto wrote: >> >> seems enabled...dont know why is not working... 
>> >> $ manila pool-list --detail >> +--------------------------------------+----------------------------------------------+ >> | Property | Value | >> +--------------------------------------+----------------------------------------------+ >> | name | ostack-test at generic#GENERIC | >> | share_backend_name | GENERIC | >> | driver_handles_share_servers | True | >> | vendor_name | Open Source | >> | driver_version | 1.0 | >> | storage_protocol | NFS_CIFS | >> | total_capacity_gb | unknown | >> | free_capacity_gb | unknown | >> | reserved_percentage | 0 | >> | reserved_snapshot_percentage | 0 | >> | reserved_share_extend_percentage | 0 | >> | qos | False | >> | pools | None | >> | snapshot_support | True | >> | create_share_from_snapshot_support | True | >> | revert_to_snapshot_support | False | >> | mount_snapshot_support | False | >> | replication_domain | None | >> | filter_function | None | >> | goodness_function | None | >> | security_service_update_support | False | >> | network_allocation_update_support | False | >> | share_server_multiple_subnet_support | False | >> | max_shares_per_share_server | -1 | >> | max_share_server_size | -1 | >> | share_group_stats | {'consistent_snapshot_support': None} | >> | ipv4_support | True | >> | ipv6_support | False | >> | server_pools_mapping | {'2242964f-be38-4f11-8c90-cdcfcd20c20a': []} | >> | timestamp | 2023-03-13T09:13:06.331713 | >> +--------------------------------------+----------------------------------------------+ >> >> >> On Mon, Mar 13, 2023 at 11:45?PM Alvaro Soto wrote: >>> >>> If you are inside this features support matrix >>> >>> https://docs.openstack.org/manila/latest/admin/share_back_ends_feature_support_mapping.html#share-back-ends-feature-support-mapping >>> >>> Examine your configuration as well: >>> >>> snapshot_support indicates whether snapshots are supported for shares created on the pool/backend. When administrators do not set this capability as an extra-spec in a share type, the scheduler can place new shares of that type in pools without regard for whether snapshots are supported, and those shares will not support snapshots. >>> >>> https://docs.openstack.org/manila/latest/admin/capabilities_and_extra_specs.html >>> >>> Cheers! >>> >>> On Mon, Mar 13, 2023 at 3:35?AM garcetto wrote: >>>> >>>> good morning, >>>> i am using manila and generic driver with dhss true, but cannot create snapshot from shares, any help? where can i look at? >>>> (cinder backend is a linux nfs server) >>>> >>>> thank you >>>> >>>> $ manila snapshot-create share-01 --name Snapshot1 >>>> ERROR: Snapshots cannot be created for share '2c8b1b3d-ef82-4372-94df-678539f0d843' since it does not have that capability. (HTTP 422) (Request-ID: req-cab23a46-37dc-4f2b-b26c-d6b21b7453ba) >>>> >>> >>> >>> -- >>> >>> Alvaro Soto >>> >>> Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. >>> ---------------------------------------------------------- >>> Great people talk about ideas, >>> ordinary people talk about things, >>> small people talk... about other people. > > > > -- > > Alvaro Soto > > Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you. > ---------------------------------------------------------- > Great people talk about ideas, > ordinary people talk about things, > small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... 
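As a sketch of the share-type fix described above (the type name is an example; the DHSS value must match the backend, which reports driver_handles_share_servers=True here):

  manila type-create generic_snapshots True --snapshot_support True
  # or add the extra-spec to an existing type
  manila type-key <share-type> set snapshot_support=True
  manila create NFS 1 --name share-02 --share-type generic_snapshots
  manila snapshot-create share-02 --name Snapshot1

Note that shares already created from a type without snapshot_support keep reporting the original error; the capability is stamped on the share at creation time, so a new share of the updated type is needed.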
URL: From rdhasman at redhat.com Wed Mar 15 23:25:47 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Thu, 16 Mar 2023 04:55:47 +0530 Subject: [cinder] Canceling upstream meeting 22nd March Message-ID: Hello Argonauts, As discussed in this week's meeting[1], we will be canceling the Cinder upstream meeting next week i.e. 22nd March, 2023. Since we have RC2 this week, 2023.1 release next week and PTG after that, we don't expect many topics next week, but if you still have any, please add them to the PTG planning etherpad[2]. See you all at the PTG! [1] https://meetings.opendev.org/irclogs/%23openstack-meeting-alt/%23openstack-meeting-alt.2023-03-15.log.html#t2023-03-15T14:12:41 [2] https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdhasman at redhat.com Wed Mar 15 23:28:19 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Thu, 16 Mar 2023 04:58:19 +0530 Subject: [cinder] festival of feature reviews 17th March 2023 Message-ID: Hello Argonauts, We will be having our monthly festival of reviews tomorrow i.e. 17th March (Friday) from 1400-1600 UTC. We are close to the PTG but we still have backlog so good to have a head start in reviews for the next (2023.2) cycle. Following are some additional details: Date: 17th March, 2023 Time: 1400-1600 UTC Meeting link: https://bluejeans.com/556681290 etherpad: https://etherpad.opendev.org/p/cinder-festival-of-reviews See you there! Thanks Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Wed Mar 15 19:20:12 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 16 Mar 2023 00:50:12 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Update: After restarting the nova services on the controller and running the deploy script on the edge site, I was able to launch the VM from volume. Right now the instance creation is failing as the block device creation is stuck in creating state, it is taking more than 10 mins for the volume to be created, whereas the image has already been imported to the edge glance. I will try and create a new fresh image and test again then update. With regards, Swogat Pradhan On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan wrote: > Update: > In the hypervisor list the compute node state is showing down. > > > On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan > wrote: > >> Hi Brendan, >> Now i have deployed another site where i have used 2 linux bonds network >> template for both 3 compute nodes and 3 ceph nodes. >> The bonding options is set to mode=802.3ad (lacp=active). >> I used a cirros image to launch instance but the instance timed out so i >> waited for the volume to be created. >> Once the volume was created i tried launching the instance from the >> volume and still the instance is stuck in spawning state. 
>> >> Here is the nova-compute log: >> >> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep >> daemon starting >> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep >> process running with uid/gid: 0/0 >> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >> process running with capabilities (eff/prm/inh): >> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >> daemon running as pid 185437 >> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >> in _get_host_uuid: Unexpected error while running command. >> Command: blkid overlay -s UUID -o value >> Exit code: 2 >> Stdout: '' >> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >> Unexpected error while running command. >> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >> >> It is stuck in creating image, do i need to run the template mentioned >> here ?: >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> >> The volume is already created and i do not understand why the instance is >> stuck in spawning state. >> >> With regards, >> Swogat Pradhan >> >> >> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard >> wrote: >> >>> Does your environment use different network interfaces for each of the >>> networks? Or does it have a bond with everything on it? >>> >>> One issue I have seen before is that when launching instances, there is >>> a lot of network traffic between nodes as the hypervisor needs to download >>> the image from Glance. Along with various other services sending normal >>> network traffic, it can be enough to cause issues if everything is running >>> over a single 1Gbe interface. >>> >>> I have seen the same situation in fact when using a single active/backup >>> bond on 1Gbe nics. It?s worth checking the network traffic while you try to >>> spawn the instance to see if you?re dropping packets. In the situation I >>> described, there were dropped packets which resulted in a loss of >>> communication between nova_compute and RMQ, so the node appeared offline. >>> You should also confirm that nova_compute is being disconnected in the >>> nova_compute logs if you tail them on the Hypervisor while spawning the >>> instance. >>> >>> In my case, changing from active/backup to LACP helped. So, based on >>> that experience, from my perspective, is certainly sounds like some kind of >>> network issue. >>> >>> Regards, >>> >>> Brendan Shephard >>> Senior Software Engineer >>> Red Hat Australia >>> >>> >>> >>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>> >>> Hi, >>> >>> I tried to help someone with a similar issue some time ago in this >>> thread: >>> >>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>> >>> But apparently a neutron reinstallation fixed it for that user, not sure >>> if that could apply here. But is it possible that your nova and neutron >>> versions are different between central and edge site? Have you restarted >>> nova and neutron services on the compute nodes after installation? 
Have you >>> debug logs of nova-conductor and maybe nova-compute? Maybe they can help >>> narrow down the issue. >>> If there isn't any additional information in the debug logs I probably >>> would start "tearing down" rabbitmq. I didn't have to do that in a >>> production system yet so be careful. I can think of two routes: >>> >>> - Either remove queues, exchanges etc. while rabbit is running, this >>> will most likely impact client IO depending on your load. Check out the >>> rabbitmqctl commands. >>> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes >>> and restart rabbitmq so the exchanges, queues etc. rebuild. >>> >>> I can imagine that the failed reply "survives" while being replicated >>> across the rabbit nodes. But I don't really know the rabbit internals too >>> well, so maybe someone else can chime in here and give a better advice. >>> >>> Regards, >>> Eugen >>> >>> Zitat von Swogat Pradhan : >>> >>> Hi, >>> Can someone please help me out on this issue? >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >> > >>> wrote: >>> >>> Hi >>> I don't see any major packet loss. >>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>> loss. >>> >>> with regards, >>> Swogat Pradhan >>> >>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >> > >>> wrote: >>> >>> Hi, >>> Yes the MTU is the same as the default '1500'. >>> Generally I haven't seen any packet loss, but never checked when >>> launching the instance. >>> I will check that and come back. >>> But everytime i launch an instance the instance gets stuck at spawning >>> state and there the hypervisor becomes down, so not sure if packet loss >>> causes this. >>> >>> With regards, >>> Swogat pradhan >>> >>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>> >>> One more thing coming to mind is MTU size. Are they identical between >>> central and edge site? Do you see packet loss through the tunnel? >>> >>> Zitat von Swogat Pradhan : >>> >>> > Hi Eugen, >>> > Request you to please add my email either on 'to' or 'cc' as i am not >>> > getting email's from you. >>> > Coming to the issue: >>> > >>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>> / >>> > Listing policies for vhost "/" ... >>> > vhost name pattern apply-to definition priority >>> > / ha-all ^(?!amq\.).* queues >>> > >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> > >>> > I have the edge site compute nodes up, it only goes down when i am >>> trying >>> > to launch an instance and the instance comes to a spawning state and >>> then >>> > gets stuck. >>> > >>> > I have a tunnel setup between the central and the edge sites. >>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> > wrote: >>> > >>> >> Hi Eugen, >>> >> For some reason i am not getting your email to me directly, i am >>> checking >>> >> the email digest and there i am able to find your reply. >>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >> Yes, these logs are from the time when the issue occurred. >>> >> >>> >> *Note: i am able to create vm's and perform other activities in the >>> >> central site, only facing this issue in the edge site.* >>> >> >>> >> With regards, >>> >> Swogat Pradhan >>> >> >>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >> wrote: >>> >> >>> >>> Hi Eugen, >>> >>> Thanks for your response. 
>>> >>> I have actually a 4 controller setup so here are the details: >>> >>> >>> >>> *PCS Status:* >>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-no-ceph-3 >>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-2 >>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-1 >>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> Started >>> >>> overcloud-controller-0 >>> >>> >>> >>> I have tried restarting the bundle multiple times but the issue is >>> still >>> >>> present. >>> >>> >>> >>> *Cluster status:* >>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>> >>> Cluster status of node >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>> >>> Basics >>> >>> >>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>> >>> >>> Disk Nodes >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> >>> >>> Running Nodes >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>> >>> >>> Versions >>> >>> >>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>> 3.8.3 >>> >>> on Erlang 22.3.4.1 >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> RabbitMQ >>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>> >>> >>> Alarms >>> >>> >>> >>> (none) >>> >>> >>> >>> Network Partitions >>> >>> >>> >>> (none) >>> >>> >>> >>> Listeners >>> >>> >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> communication >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>> tool >>> >>> 
communication >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> interface: >>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>> inter-node and >>> >>> CLI tool communication >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>> and AMQP 1.0 >>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> , >>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>> >>> >>> >>> Feature flags >>> >>> >>> >>> Flag: drop_unroutable_metric, state: enabled >>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>> Flag: implicit_default_bindings, state: enabled >>> >>> Flag: quorum_queue, state: enabled >>> >>> Flag: virtual_host_metadata, state: enabled >>> >>> >>> >>> *Logs:* >>> >>> *(Attached)* >>> >>> >>> >>> With regards, >>> >>> Swogat Pradhan >>> >>> >>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>> wrote: >>> >>> >>> >>>> Hi, >>> >>>> Please find the nova conductor as well as nova api log. >>> >>>> >>> >>>> nova-conuctor: >>> >>>> >>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> b240e3e89d99489284cd731e75f2a5db >>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>> with >>> >>>> backend dogpile.cache.null. >>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>> due to a >>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> Abandoning...: >>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>> >>>>> Hi, >>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>> >>>>> launch vm's. >>> >>>>> When the VM is in spawning state the node goes down (openstack >>> compute >>> >>>>> service list), the node comes backup when i restart the nova >>> compute >>> >>>>> service but then the launch of the vm fails. >>> >>>>> >>> >>>>> nova-compute.log >>> >>>>> >>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>> >>>>> instance usage >>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>> to >>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>> name: >>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names >>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>> with >>> >>>>> backend dogpile.cache.null. >>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>> >>>>> privsep helper: >>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> 'privsep-helper', >>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>> privsep >>> >>>>> daemon via rootwrap >>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> daemon starting >>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> process running with uid/gid: 0/0 >>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>> >>>>> daemon running as pid 2647 >>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> os_brick.initiator.connectors.nvmeof >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>> >>>>> execution error >>> >>>>> in _get_host_uuid: Unexpected error while running command. >>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>>> Exit code: 2 >>> >>>>> Stdout: '' >>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>> >>>>> Unexpected error while running command. >>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>>> >>> >>>>> Is there a way to solve this issue? >>> >>>>> >>> >>>>> >>> >>>>> With regards, >>> >>>>> >>> >>>>> Swogat Pradhan >>> >>>>> >>> >>>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 16 01:03:25 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 16 Mar 2023 02:03:25 +0100 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: Eventually I don't fully understand reasons behind need of such service. 
As fighting with high load by migrating instances between computes is fighting with consequences rather then with root cause, not saying that it brings more negative effects then positive for experience of the end-users, as you're just moving problem to another place affecting more workloads with degraded performance. If you struggling from high load on a daily basis - then you have too high cpu_allocation_ratio set for computes. As high load issues always come from attempts to oversell too agressively. If you have workloads in the cloud that always utilize all CPUs available - then you should consider having flavors and aggregates with cpu-pinning, meaning providing physical CPUs for such workloads. Also don't forget, that it's worth setting more realistic numbers for reserved resources on computes, because default 2gb of RAM is usually too small. ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i : > Hello. > I cannot use because missing cpu_util metric. I try to match it work but > not yet. It need some code to make it work. It seem none care about balance > reources on cloud. > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > >> On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: >> > Watcher is not good because It need cpu metric >> > such as cpu load in Ceilometer which is removed so we cannot use it. >> >> Hi! >> >> What do you mean by "Ceilometer [is] removed"? It certainly isn't dead, >> and it works well... If by that, you mean "ceilometer-api" is removed, >> then yes, but then you can use gnocchi. >> >> Cheers, >> >> Thomas Goirand (zigo) >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 16 08:46:51 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 16 Mar 2023 08:46:51 +0000 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote: > Eventually I don't fully understand reasons behind need of such service. > > As fighting with high load by migrating instances between computes is > fighting with consequences rather then with root cause, not saying that it > brings more negative effects then positive for experience of the end-users, > as you're just moving problem to another place affecting more workloads > with degraded performance. > > If you struggling from high load on a daily basis - then you have too high > cpu_allocation_ratio set for computes. As high load issues always come from > attempts to oversell too agressively. > > If you have workloads in the cloud that always utilize all CPUs available - > then you should consider having flavors and aggregates with cpu-pinning, > meaning providing physical CPUs for such workloads. > > Also don't forget, that it's worth setting more realistic numbers for > reserved resources on computes, because default 2gb of RAM is usually too > small. i tend to agree although there are some thing you can do in the nova schduler ot help e.g. prefering spreading over packing. for cpu load in particalar you can also enable the metric weigher i have not read this thread in detail altough skiming i see refrences to ceilometer. nova's metrics weigher has no depency on it. 
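As a concrete illustration of the tuning suggested earlier in the thread, a minimal sketch could look like the following (option names are the standard nova.conf and flavor extra-spec ones, the values are purely illustrative and not recommendations):

# nova.conf on the compute nodes
[DEFAULT]
cpu_allocation_ratio = 4.0
reserved_host_memory_mb = 8192
reserved_host_cpus = 2

# a pinned flavor for workloads that constantly use all of their CPUs
$ openstack flavor create --vcpus 4 --ram 8192 --disk 40 pinned.medium
$ openstack flavor set pinned.medium --property hw:cpu_policy=dedicated

In practice such a flavor is normally paired with a dedicated host aggregate so that pinned and over-committed instances do not share the same hosts.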
the metrics weigher https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py is configured by adding weight_setting in the schduler config https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting [metrics] weight_setting = name1=1.0, name2=-1.0 and enabeling the monitors in the nova-comptue config https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors [DEFAULT] compute_monitors = cpu.virt_driver ^ that is the only one we support the datafiles we report are set here https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101 the more intersting values are "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent" we have a fairly large internal cloud that is used for dev and ci and as of about 12 to 18 months ago they have been using this to help balance the schduling fo instance as we have a mix of hyperviros skus and this help blance systme load. [metrics] weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, cpu.idle.percent=1.0 you want iowait and cpu.percent to be negitive since you want to avoid host with high iowait or high cpu utilsation. and you woudl want to prefer idle host if your intent is to blance load. iowait is actully included in cpu.percent and infact cpu.percent is basicaly cpu load - idel so [metrics] weight_setting = cpu.percent=-1.0 would have a simialreffect but you might want the extra granularity to weight iowait vs idle differntly so if you find the normal cpu/ram/disk weigher are not sufficent to blance based onload check out the metrics weigher and see it that helps. just be awere that collecting the cpu metrics and providing them to the schduelr will increase rabbitmq load a little since we perodicly have ot update those values for each compute. if you have a lot of compute that might be problematic. its one of the reasons we decided not to add more metrics like this. > > > > ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i : > > > Hello. > > I cannot use because missing cpu_util metric. I try to match it work but > > not yet. It need some code to make it work. It seem none care about balance > > reources on cloud. > > > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > > > > > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > > > Watcher is not good because It need cpu metric > > > > such as cpu load in Ceilometer which is removed so we cannot use it. > > > > > > Hi! > > > > > > What do you mean by "Ceilometer [is] removed"? It certainly isn't dead, > > > and it works well... If by that, you mean "ceilometer-api" is removed, > > > then yes, but then you can use gnocchi. > > > > > > Cheers, > > > > > > Thomas Goirand (zigo) > > > > > > From noonedeadpunk at gmail.com Thu Mar 16 09:35:52 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 16 Mar 2023 10:35:52 +0100 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: Oh, thanks for that detailed explanation! I was looking at metrics weighter for years and looked through code couple of times but never got it properly configured. That is very helpful, thanks a lot! ??, 16 ???. 2023 ?., 09:46 Sean Mooney : > On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote: > > Eventually I don't fully understand reasons behind need of such service. 
> > > > As fighting with high load by migrating instances between computes is > > fighting with consequences rather then with root cause, not saying that > it > > brings more negative effects then positive for experience of the > end-users, > > as you're just moving problem to another place affecting more workloads > > with degraded performance. > > > > If you struggling from high load on a daily basis - then you have too > high > > cpu_allocation_ratio set for computes. As high load issues always come > from > > attempts to oversell too agressively. > > > > If you have workloads in the cloud that always utilize all CPUs > available - > > then you should consider having flavors and aggregates with cpu-pinning, > > meaning providing physical CPUs for such workloads. > > > > Also don't forget, that it's worth setting more realistic numbers for > > reserved resources on computes, because default 2gb of RAM is usually too > > small. > i tend to agree although there are some thing you can do in the nova > schduler ot help > e.g. prefering spreading over packing. > > for cpu load in particalar you can also enable the metric weigher > > i have not read this thread in detail altough skiming i see refrences to > ceilometer. > nova's metrics weigher has no depency on it. > the metrics weigher > > https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py > is configured by adding weight_setting in the schduler config > > https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting > > [metrics] > weight_setting = name1=1.0, name2=-1.0 > and enabeling the monitors in the nova-comptue config > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors > [DEFAULT] > compute_monitors = cpu.virt_driver > > ^ that is the only one we support > > the datafiles we report are set here > > https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101 > > the more intersting values are > "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent" > > we have a fairly large internal cloud that is used for dev and ci and as > of about 12 to 18 months ago they > have been using this to help balance the schduling fo instance as we have > a mix of hyperviros skus > and this help blance systme load. > > [metrics] > weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, > cpu.idle.percent=1.0 > > you want iowait and cpu.percent to be negitive since you want to avoid > host with high iowait or high cpu utilsation. > and you woudl want to prefer idle host if your intent is to blance load. > > iowait is actully included in cpu.percent and infact cpu.percent is > basicaly cpu load - idel so > [metrics] > weight_setting = cpu.percent=-1.0 > would have a simialreffect but you might want the extra granularity to > weight iowait vs idle differntly > > so if you find the normal cpu/ram/disk weigher are not sufficent to blance > based onload check out the > metrics weigher and see it that helps. just be awere that collecting the > cpu metrics and providing them > to the schduelr will increase rabbitmq load a little since we perodicly > have ot update those values for > each compute. if you have a lot of compute that might be problematic. its > one of the reasons we > decided not to add more metrics like this. > > > > > > > > > > > ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i : > > > > > Hello. > > > I cannot use because missing cpu_util metric. I try to match it work > but > > > not yet. 
It need some code to make it work. It seem none care about > balance > > > reources on cloud. > > > > > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > > > > > > > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > > > > Watcher is not good because It need cpu metric > > > > > such as cpu load in Ceilometer which is removed so we cannot use > it. > > > > > > > > Hi! > > > > > > > > What do you mean by "Ceilometer [is] removed"? It certainly isn't > dead, > > > > and it works well... If by that, you mean "ceilometer-api" is > removed, > > > > then yes, but then you can use gnocchi. > > > > > > > > Cheers, > > > > > > > > Thomas Goirand (zigo) > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 16 09:46:20 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 16 Mar 2023 09:46:20 +0000 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> Message-ID: <6c936677c510a3888ee113f26b91231b7b78a8ec.camel@redhat.com> On Thu, 2023-03-16 at 10:35 +0100, Dmitriy Rabotyagov wrote: > Oh, thanks for that detailed explanation! > I was looking at metrics weighter for years and looked through code couple > of times but never got it properly configured. That is very helpful, thanks > a lot! that tells me i sure porbaly update the docs... > > ??, 16 ???. 2023 ?., 09:46 Sean Mooney : > > > On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote: > > > Eventually I don't fully understand reasons behind need of such service. > > > > > > As fighting with high load by migrating instances between computes is > > > fighting with consequences rather then with root cause, not saying that > > it > > > brings more negative effects then positive for experience of the > > end-users, > > > as you're just moving problem to another place affecting more workloads > > > with degraded performance. > > > > > > If you struggling from high load on a daily basis - then you have too > > high > > > cpu_allocation_ratio set for computes. As high load issues always come > > from > > > attempts to oversell too agressively. > > > > > > If you have workloads in the cloud that always utilize all CPUs > > available - > > > then you should consider having flavors and aggregates with cpu-pinning, > > > meaning providing physical CPUs for such workloads. > > > > > > Also don't forget, that it's worth setting more realistic numbers for > > > reserved resources on computes, because default 2gb of RAM is usually too > > > small. > > i tend to agree although there are some thing you can do in the nova > > schduler ot help > > e.g. prefering spreading over packing. > > > > for cpu load in particalar you can also enable the metric weigher > > > > i have not read this thread in detail altough skiming i see refrences to > > ceilometer. > > nova's metrics weigher has no depency on it. 
> > the metrics weigher > > > > https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py > > is configured by adding weight_setting in the schduler config > > > > https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting > > > > [metrics] > > weight_setting = name1=1.0, name2=-1.0 > > and enabeling the monitors in the nova-comptue config > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors > > [DEFAULT] > > compute_monitors = cpu.virt_driver > > > > ^ that is the only one we support > > > > the datafiles we report are set here > > > > https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101 > > > > the more intersting values are > > "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent" > > > > we have a fairly large internal cloud that is used for dev and ci and as > > of about 12 to 18 months ago they > > have been using this to help balance the schduling fo instance as we have > > a mix of hyperviros skus > > and this help blance systme load. > > > > [metrics] > > weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, > > cpu.idle.percent=1.0 > > > > you want iowait and cpu.percent to be negitive since you want to avoid > > host with high iowait or high cpu utilsation. > > and you woudl want to prefer idle host if your intent is to blance load. > > > > iowait is actully included in cpu.percent and infact cpu.percent is > > basicaly cpu load - idel so > > [metrics] > > weight_setting = cpu.percent=-1.0 > > would have a simialreffect but you might want the extra granularity to > > weight iowait vs idle differntly > > > > so if you find the normal cpu/ram/disk weigher are not sufficent to blance > > based onload check out the > > metrics weigher and see it that helps. just be awere that collecting the > > cpu metrics and providing them > > to the schduelr will increase rabbitmq load a little since we perodicly > > have ot update those values for > > each compute. if you have a lot of compute that might be problematic. its > > one of the reasons we > > decided not to add more metrics like this. > > > > > > > > > > > > > > > > > > ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i : > > > > > > > Hello. > > > > I cannot use because missing cpu_util metric. I try to match it work > > but > > > > not yet. It need some code to make it work. It seem none care about > > balance > > > > reources on cloud. > > > > > > > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand wrote: > > > > > > > > > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > > > > > Watcher is not good because It need cpu metric > > > > > > such as cpu load in Ceilometer which is removed so we cannot use > > it. > > > > > > > > > > Hi! > > > > > > > > > > What do you mean by "Ceilometer [is] removed"? It certainly isn't > > dead, > > > > > and it works well... If by that, you mean "ceilometer-api" is > > removed, > > > > > then yes, but then you can use gnocchi. > > > > > > > > > > Cheers, > > > > > > > > > > Thomas Goirand (zigo) > > > > > > > > > > > > > > From christian.rohmann at inovex.de Thu Mar 16 11:09:35 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Thu, 16 Mar 2023 12:09:35 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: <510A9181-1D22-41F2-AE3C-EE354CD6F895@binero.com> References: <510A9181-1D22-41F2-AE3C-EE354CD6F895@binero.com> Message-ID: <0f117b65-7006-d6b3-7b96-ba5e01bbf09e@inovex.de> On 13/03/2023 19:46, Tobias Urdin wrote: > Interesting thread! 
+1 Most installations run into this issue of wondering when a network node is really ready / fully synced. While the tooling that Mohammed or Felix does work in "observing" or "determining" the sync state independently, I strongly believe a network agent should report it's sync state back to the control plane. Orchestration of e.g. rolling upgrades of agents should be possible with state information provided by neutron itself and not require external tooling. By implementing the state data structure and then having the drivers (OVN, OVS, linuxbridge) report this back, this is independent from the particular implementation details (network NS, certain processes running, ...). Looking at this problem the taints and tolerations model use for node "readiness" from Kubernetes come to mind (https://kubernetes.io/docs/reference/labels-annotations-taints/#node-kubernetes-io-network-unavailable). Regards Christian From nguyenhuukhoinw at gmail.com Thu Mar 16 11:34:33 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 16 Mar 2023 18:34:33 +0700 Subject: [Openstack] Lack of Balance solution such as Watcher. In-Reply-To: <6c936677c510a3888ee113f26b91231b7b78a8ec.camel@redhat.com> References: <01e79afe-64f9-b49e-a316-b9980c41d71d@debian.org> <6c936677c510a3888ee113f26b91231b7b78a8ec.camel@redhat.com> Message-ID: Thank you very much for sharing! I will dig dive with it. Nguyen Huu Khoi On Thu, Mar 16, 2023 at 4:54?PM Sean Mooney wrote: > On Thu, 2023-03-16 at 10:35 +0100, Dmitriy Rabotyagov wrote: > > Oh, thanks for that detailed explanation! > > I was looking at metrics weighter for years and looked through code > couple > > of times but never got it properly configured. That is very helpful, > thanks > > a lot! > > that tells me i sure porbaly update the docs... > > > > ??, 16 ???. 2023 ?., 09:46 Sean Mooney : > > > > > On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote: > > > > Eventually I don't fully understand reasons behind need of such > service. > > > > > > > > As fighting with high load by migrating instances between computes is > > > > fighting with consequences rather then with root cause, not saying > that > > > it > > > > brings more negative effects then positive for experience of the > > > end-users, > > > > as you're just moving problem to another place affecting more > workloads > > > > with degraded performance. > > > > > > > > If you struggling from high load on a daily basis - then you have too > > > high > > > > cpu_allocation_ratio set for computes. As high load issues always > come > > > from > > > > attempts to oversell too agressively. > > > > > > > > If you have workloads in the cloud that always utilize all CPUs > > > available - > > > > then you should consider having flavors and aggregates with > cpu-pinning, > > > > meaning providing physical CPUs for such workloads. > > > > > > > > Also don't forget, that it's worth setting more realistic numbers for > > > > reserved resources on computes, because default 2gb of RAM is > usually too > > > > small. > > > i tend to agree although there are some thing you can do in the nova > > > schduler ot help > > > e.g. prefering spreading over packing. > > > > > > for cpu load in particalar you can also enable the metric weigher > > > > > > i have not read this thread in detail altough skiming i see refrences > to > > > ceilometer. > > > nova's metrics weigher has no depency on it. 
> > > the metrics weigher > > > > > > > https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py > > > is configured by adding weight_setting in the schduler config > > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting > > > > > > [metrics] > > > weight_setting = name1=1.0, name2=-1.0 > > > and enabeling the monitors in the nova-comptue config > > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors > > > [DEFAULT] > > > compute_monitors = cpu.virt_driver > > > > > > ^ that is the only one we support > > > > > > the datafiles we report are set here > > > > > > > https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101 > > > > > > the more intersting values are > > > "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent" > > > > > > we have a fairly large internal cloud that is used for dev and ci and > as > > > of about 12 to 18 months ago they > > > have been using this to help balance the schduling fo instance as we > have > > > a mix of hyperviros skus > > > and this help blance systme load. > > > > > > [metrics] > > > weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, > > > cpu.idle.percent=1.0 > > > > > > you want iowait and cpu.percent to be negitive since you want to avoid > > > host with high iowait or high cpu utilsation. > > > and you woudl want to prefer idle host if your intent is to blance > load. > > > > > > iowait is actully included in cpu.percent and infact cpu.percent is > > > basicaly cpu load - idel so > > > [metrics] > > > weight_setting = cpu.percent=-1.0 > > > would have a simialreffect but you might want the extra granularity to > > > weight iowait vs idle differntly > > > > > > so if you find the normal cpu/ram/disk weigher are not sufficent to > blance > > > based onload check out the > > > metrics weigher and see it that helps. just be awere that collecting > the > > > cpu metrics and providing them > > > to the schduelr will increase rabbitmq load a little since we perodicly > > > have ot update those values for > > > each compute. if you have a lot of compute that might be problematic. > its > > > one of the reasons we > > > decided not to add more metrics like this. > > > > > > > > > > > > > > > > > > > > > > > > > ??, 15 ???. 2023 ?., 13:11 Nguy?n H?u Kh?i < > nguyenhuukhoinw at gmail.com>: > > > > > > > > > Hello. > > > > > I cannot use because missing cpu_util metric. I try to match it > work > > > but > > > > > not yet. It need some code to make it work. It seem none care about > > > balance > > > > > reources on cloud. > > > > > > > > > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand > wrote: > > > > > > > > > > > On 12/11/22 01:59, Nguy?n H?u Kh?i wrote: > > > > > > > Watcher is not good because It need cpu metric > > > > > > > such as cpu load in Ceilometer which is removed so we cannot > use > > > it. > > > > > > > > > > > > Hi! > > > > > > > > > > > > What do you mean by "Ceilometer [is] removed"? It certainly isn't > > > dead, > > > > > > and it works well... If by that, you mean "ceilometer-api" is > > > removed, > > > > > > then yes, but then you can use gnocchi. > > > > > > > > > > > > Cheers, > > > > > > > > > > > > Thomas Goirand (zigo) > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From johfulto at redhat.com Thu Mar 16 11:54:36 2023 From: johfulto at redhat.com (John Fulton) Date: Thu, 16 Mar 2023 07:54:36 -0400 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan wrote: > > Update: After restarting the nova services on the controller and running the deploy script on the edge site, I was able to launch the VM from volume. > > Right now the instance creation is failing as the block device creation is stuck in creating state, it is taking more than 10 mins for the volume to be created, whereas the image has already been imported to the edge glance. Try following this document and making the same observations in your environment for AZs and their local ceph cluster. https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites On a DCN site if you run a command like this: $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring /etc/ceph/dcn0.client.admin.keyring $ rbd --cluster dcn0 -p volumes ls -l NAME SIZE PARENT FMT PROT LOCK volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl $ Then, you should see the parent of the volume is the image which is on the same local ceph cluster. I wonder if something is misconfigured and thus you're encountering the streaming behavior described here: Ideally all images should reside in the central Glance and be copied to DCN sites before instances of those images are booted on DCN sites. If an image is not copied to a DCN site before it is booted, then the image will be streamed to the DCN site and then the image will boot as an instance. This happens because Glance at the DCN site has access to the images store at the Central ceph cluster. Though the booting of the image will take time because it has not been copied in advance, this is still preferable to failing to boot the image. You can also exec into the cinder container at the DCN site and confirm it's using it's local ceph cluster. John > > I will try and create a new fresh image and test again then update. > > With regards, > Swogat Pradhan > > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan wrote: >> >> Update: >> In the hypervisor list the compute node state is showing down. >> >> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan wrote: >>> >>> Hi Brendan, >>> Now i have deployed another site where i have used 2 linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>> The bonding options is set to mode=802.3ad (lacp=active). >>> I used a cirros image to launch instance but the instance timed out so i waited for the volume to be created. >>> Once the volume was created i tried launching the instance from the volume and still the instance is stuck in spawning state. 
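To illustrate the "exec into the cinder container at the DCN site" check suggested above, something along these lines can be run on the DCN node hosting cinder-volume (the container name placeholder and the exact option names are assumptions and may differ per deployment):

$ sudo podman ps --format '{{.Names}}' | grep -i cinder
$ sudo podman exec <cinder-volume-container> \
    grep -E 'enabled_backends|rbd_ceph_conf|rbd_cluster_name|rbd_pool|rbd_user' \
    /etc/cinder/cinder.conf

The rbd_* values should point at the edge site's own ceph cluster (the dcn0-style cluster in the example above) rather than at the central one.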
>>> >>> Here is the nova-compute log: >>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon starting >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command. >>> Command: blkid overlay -s UUID -o value >>> Exit code: 2 >>> Stdout: '' >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>> >>> It is stuck in creating image, do i need to run the template mentioned here ?: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> >>> The volume is already created and i do not understand why the instance is stuck in spawning state. >>> >>> With regards, >>> Swogat Pradhan >>> >>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard wrote: >>>> >>>> Does your environment use different network interfaces for each of the networks? Or does it have a bond with everything on it? >>>> >>>> One issue I have seen before is that when launching instances, there is a lot of network traffic between nodes as the hypervisor needs to download the image from Glance. Along with various other services sending normal network traffic, it can be enough to cause issues if everything is running over a single 1Gbe interface. >>>> >>>> I have seen the same situation in fact when using a single active/backup bond on 1Gbe nics. It?s worth checking the network traffic while you try to spawn the instance to see if you?re dropping packets. In the situation I described, there were dropped packets which resulted in a loss of communication between nova_compute and RMQ, so the node appeared offline. You should also confirm that nova_compute is being disconnected in the nova_compute logs if you tail them on the Hypervisor while spawning the instance. >>>> >>>> In my case, changing from active/backup to LACP helped. So, based on that experience, from my perspective, is certainly sounds like some kind of network issue. >>>> >>>> Regards, >>>> >>>> Brendan Shephard >>>> Senior Software Engineer >>>> Red Hat Australia >>>> >>>> >>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>>> >>>> Hi, >>>> >>>> I tried to help someone with a similar issue some time ago in this thread: >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>> >>>> But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. 
>>>> If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. I didn't have to do that in a production system yet so be careful. I can think of two routes: >>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>> >>>> I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. >>>> >>>> Regards, >>>> Eugen >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> Hi, >>>> Can someone please help me out on this issue? >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >>>> wrote: >>>> >>>> Hi >>>> I don't see any major packet loss. >>>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>>> loss. >>>> >>>> with regards, >>>> Swogat Pradhan >>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >>>> wrote: >>>> >>>> Hi, >>>> Yes the MTU is the same as the default '1500'. >>>> Generally I haven't seen any packet loss, but never checked when >>>> launching the instance. >>>> I will check that and come back. >>>> But everytime i launch an instance the instance gets stuck at spawning >>>> state and there the hypervisor becomes down, so not sure if packet loss >>>> causes this. >>>> >>>> With regards, >>>> Swogat pradhan >>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>>> >>>> One more thing coming to mind is MTU size. Are they identical between >>>> central and edge site? Do you see packet loss through the tunnel? >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> > Hi Eugen, >>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>> > getting email's from you. >>>> > Coming to the issue: >>>> > >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>> / >>>> > Listing policies for vhost "/" ... >>>> > vhost name pattern apply-to definition priority >>>> > / ha-all ^(?!amq\.).* queues >>>> > >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> > >>>> > I have the edge site compute nodes up, it only goes down when i am >>>> trying >>>> > to launch an instance and the instance comes to a spawning state and >>>> then >>>> > gets stuck. >>>> > >>>> > I have a tunnel setup between the central and the edge sites. >>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> > wrote: >>>> > >>>> >> Hi Eugen, >>>> >> For some reason i am not getting your email to me directly, i am >>>> checking >>>> >> the email digest and there i am able to find your reply. >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >> Yes, these logs are from the time when the issue occurred. >>>> >> >>>> >> *Note: i am able to create vm's and perform other activities in the >>>> >> central site, only facing this issue in the edge site.* >>>> >> >>>> >> With regards, >>>> >> Swogat Pradhan >>>> >> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >> wrote: >>>> >> >>>> >>> Hi Eugen, >>>> >>> Thanks for your response. 
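Before removing anything on the rabbit side, a few read-only checks along these lines are usually a safe first step (run inside one of the rabbitmq nodes or containers; the queue name is the one from the nova-conductor errors quoted earlier in the thread):

$ rabbitmqctl cluster_status
$ rabbitmqctl list_queues name messages consumers | grep reply_
$ rabbitmqctl list_queues name | grep 349bcb075f8c49329435a0f884b33066

For the full teardown option, the mnesia data usually lives under /var/lib/rabbitmq/mnesia on each node, though the exact path may differ in containerized deployments.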
>>>> >>> I have actually a 4 controller setup so here are the details: >>>> >>> >>>> >>> *PCS Status:* >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-no-ceph-3 >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-2 >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-1 >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>> Started >>>> >>> overcloud-controller-0 >>>> >>> >>>> >>> I have tried restarting the bundle multiple times but the issue is >>>> still >>>> >>> present. >>>> >>> >>>> >>> *Cluster status:* >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>> >>> Cluster status of node >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>> >>> Basics >>>> >>> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>> >>>> >>> Disk Nodes >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>> >>>> >>> Running Nodes >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>> >>>> >>> Versions >>>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>> 3.8.3 >>>> >>> on Erlang 22.3.4.1 >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>> >>>> >>> Alarms >>>> >>> >>>> >>> (none) >>>> >>> >>>> >>> Network Partitions >>>> >>> >>>> >>> (none) >>>> >>> >>>> >>> Listeners >>>> >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: 
>>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> tool >>>> >>> communication >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> interface: >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and >>>> >>> CLI tool communication >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>> 0-9-1 >>>> >>> and AMQP 1.0 >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> , >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>> >>>> >>> Feature flags >>>> >>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>> Flag: quorum_queue, state: enabled >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>> >>>> >>> *Logs:* >>>> >>> *(Attached)* >>>> >>> >>>> >>> With regards, >>>> >>> Swogat Pradhan >>>> >>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>> wrote: >>>> >>> >>>> >>>> Hi, >>>> >>>> Please find the nova conductor as well as nova api log. >>>> >>>> >>>> >>>> nova-conuctor: >>>> >>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> with >>>> >>>> backend dogpile.cache.null. >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>> due to a >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> Abandoning...: >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>> >>>> >>>>> Hi, >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> >>>>> launch vm's. >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>> compute >>>> >>>>> service list), the node comes backup when i restart the nova >>>> compute >>>> >>>>> service but then the launch of the vm fails. >>>> >>>>> >>>> >>>>> nova-compute.log >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> >>>>> instance usage >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>> to >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>> name: >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> with >>>> >>>>> backend dogpile.cache.null. >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> >>>>> privsep helper: >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> 'privsep-helper', >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>> privsep >>>> >>>>> daemon via rootwrap >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> daemon starting >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> daemon running as pid 2647 >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> os_brick.initiator.connectors.nvmeof >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> >>>>> execution error >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>>> Exit code: 2 >>>> >>>>> Stdout: '' >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>>> Unexpected error while running command. 
>>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>> >>>>> >>>> >>>>> >>>> >>>>> With regards, >>>> >>>>> >>>> >>>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> From geguileo at redhat.com Thu Mar 16 12:10:23 2023 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 16 Mar 2023 13:10:23 +0100 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> <20230313163251.xpnzyvzb65b6zaal@localhost> <20230314084601.t2ez24gcljnu5plq@localhost> Message-ID: <20230316121023.tdzgu6zinm7spvjp@localhost> On 16/03, Rishat Azizov wrote: > Hi Gorka, > > Thanks! > I fixed issue by adding to multipathd config uxsock_timeout directive: > uxsock_timeout 10000 > > Because in multipathd logs I saw this error: > 3624a93705842cfae35d7483200015fd8: map flushed > cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after > 4.858561 secs > > Now large disk backups work fine. > > 2. This happens because despite the timeout of the first attempt and exit > code 1, the multipath device was disconnected, so the next attempts ended > with an error "is not a multipath device", since the multipath device had > already disconnected. > Hi, That's a nice workaround until we fix it upstream!! Thanks for confirming my suspicions were right. This is the 3rd thing I mentioned was happening, flush call failed but it actually removed the device. We'll proceed to fix the flushing code in master. Cheers, Gorka. > > ??, 14 ???. 2023??. ? 14:46, Gorka Eguileor : > > > [Sending the email again as it seems it didn't reach the ML] > > > > > > On 13/03, Gorka Eguileor wrote: > > > On 11/03, Rishat Azizov wrote: > > > > Hi, Gorka, > > > > > > > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > > > > > > > > > > > > Hi, > > > > There are multiple things going on here: > > > > 1. There is a bug in os-brick, because the disconnect_volume should not > > fail, since it is being called with force=True and > > ignore_errors=True. > > > > The issues is that this call [1] is not wrapped in the > > ExceptionChainer context manager, and it should not even be a flush > > call, it should be a call to "multipathd remove map $map" instead. > > > > 2. The way multipath code is written [2][3], the error we see about > > "3624a93705842cfae35d7483200015fce is not a multipath device" means 2 > > different things: it is not a multipath or an error happened. > > > > So we don't really know what happened without enabling more verbose > > multipathd log levels. > > > > 3. The "multipath -f" call should not be failing in the first place, > > because the failure is happening on disconnecting the source volume, > > which has no data buffered to be written and therefore no reason to > > fail the flush (unless it's using a friendly name). > > > > I don't know if it could be happening that the first flush fails with > > a timeout (maybe because there is an extend operation happening), but > > multipathd keeps trying to flush it in the background and when it > > succeeds it removes the multipath device, which makes following calls > > fail. 
> > > > If that's the case we would need to change the retry from automatic > > [4] to manual and check in-between to see if the device has been > > removed in-between calls. > > > > The first issue is definitely a bug, the 2nd one is something that could > > be changed in the deployment to try to get additional information on the > > failure, and the 3rd one could be a bug. > > > > I'll see if I can find someone who wants to work on the 1st and 3rd > > points. > > > > Cheers, > > Gorka. > > > > [1]: > > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 > > [2]: > > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 > > [3]: > > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 > > [4]: > > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > > > > > > > > > > > > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > > > > > > > On 06/03, Rishat Azizov wrote: > > > > > > Hi, > > > > > > > > > > > > It works with smaller volumes. > > > > > > > > > > > > multipath.conf attached to thist email. > > > > > > > > > > > > Cinder version - 18.2.0 Wallaby > > > > > > > > > From dtantsur at protonmail.com Thu Mar 16 12:21:54 2023 From: dtantsur at protonmail.com (Dmitry Tantsur) Date: Thu, 16 Mar 2023 12:21:54 +0000 Subject: [ironic] [infra] Cleaning up old IPA images from tarballs.o.o Message-ID: Hi all, I would like to do a clean-up of old or wrongly created IPA images on the tarballs site. Before I do that, could you please check the proposed list to make sure we don't remove something we expect to be used? The list is https://paste.opendev.org/show/btDps0HFoYG9TKMmv1LB/ Dmitry From noonedeadpunk at gmail.com Thu Mar 16 12:35:10 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 16 Mar 2023 13:35:10 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: > Maybe I'm missing something, but what are the reasons you would want to > rebuild an instance without ... rebuilding it? I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own. ??, 15 ???. 2023??. ? 19:57, Dan Smith : > > > We have users who use 'rebuild' on volume booted servers before nova > > microversion 2.93, relying on the behavior that it keeps the volume as > > is. And they would like to keep doing this even after the openstack > > distro moves to a(n at least) zed base (sometime in the future). > > Maybe I'm missing something, but what are the reasons you would want to > rebuild an instance without ... rebuilding it? > > I assume it's because you want to redefine the metadata or name or > something. There's a reason why those things are not easily mutable > today, and why we had a lot of discussion on how to make user metadata > mutable on an existing instance in the last cycle. However, I would > really suggest that we not override "recreate the thing" to "maybe > recreate the thing or just update a few fields". Instead, for things we > think really should be mutable on a server at runtime, we should > probably just do that. 
> > Imagine if the way you changed permissions recursively was to run 'rm > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > that is (IMHO) what "recreate but don't just change $name" means to a > user. > > > As a naive user, it seems to me both behaviors make sense. I can > > easily imagine use cases for rebuild with and without reimaging. > > I think that's because you're already familiar with the difference. For > users not already in that mindset, I think it probably seems very weird > that rebuild is destructive in one case and not the other. > > > Then there are a few hypothetical situations like: > > a) Rebuild gets a new api feature (in a new microversion) which can > > never be combined with the do-not-reimage behavior. > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > This again can never be combined with the old behavior. > > > > What do you think, are these concerns purely theoretical or real? > > If we would like to keep having rebuild without reimaging, can we rely > > on the old microversion indefinitely? > > Alternatively shall we propose and implement a nova spec to explicitly > > expose the choice in the rebuild api (just to express the idea: osc > > server rebuild --reimage|--no-reimage)? > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > I really want to know what the use-case is for "rebuild but not > really". And also what "rebuild" means to a user if --no-reimage is > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > "This operation recreates the root disk of the server." > > That was a lie for volume-backed instances for technical reasons. It was > a bug, not a feature. > > I also strongly believe that if we're going to add a "but not > really" flag, it needs to apply to volume-backed and regular instances > identically. Because that's what the change here was doing - unifying > the behavior for a single API operation. Going the other direction does > not seem useful to me. > > --Dan > > [0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > From rishat.azizov at gmail.com Thu Mar 16 11:02:07 2023 From: rishat.azizov at gmail.com (Rishat Azizov) Date: Thu, 16 Mar 2023 17:02:07 +0600 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: <20230314084601.t2ez24gcljnu5plq@localhost> References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> <20230313163251.xpnzyvzb65b6zaal@localhost> <20230314084601.t2ez24gcljnu5plq@localhost> Message-ID: Hi Gorka, Thanks! I fixed issue by adding to multipathd config uxsock_timeout directive: uxsock_timeout 10000 Because in multipathd logs I saw this error: 3624a93705842cfae35d7483200015fd8: map flushed cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after 4.858561 secs Now large disk backups work fine. 2. This happens because despite the timeout of the first attempt and exit code 1, the multipath device was disconnected, so the next attempts ended with an error "is not a multipath device", since the multipath device had already disconnected. ??, 14 ???. 2023??. ? 14:46, Gorka Eguileor : > [Sending the email again as it seems it didn't reach the ML] > > > On 13/03, Gorka Eguileor wrote: > > On 11/03, Rishat Azizov wrote: > > > Hi, Gorka, > > > > > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > > > > > > > Hi, > > There are multiple things going on here: > > 1. 
There is a bug in os-brick, because the disconnect_volume should not > fail, since it is being called with force=True and > ignore_errors=True. > > The issues is that this call [1] is not wrapped in the > ExceptionChainer context manager, and it should not even be a flush > call, it should be a call to "multipathd remove map $map" instead. > > 2. The way multipath code is written [2][3], the error we see about > "3624a93705842cfae35d7483200015fce is not a multipath device" means 2 > different things: it is not a multipath or an error happened. > > So we don't really know what happened without enabling more verbose > multipathd log levels. > > 3. The "multipath -f" call should not be failing in the first place, > because the failure is happening on disconnecting the source volume, > which has no data buffered to be written and therefore no reason to > fail the flush (unless it's using a friendly name). > > I don't know if it could be happening that the first flush fails with > a timeout (maybe because there is an extend operation happening), but > multipathd keeps trying to flush it in the background and when it > succeeds it removes the multipath device, which makes following calls > fail. > > If that's the case we would need to change the retry from automatic > [4] to manual and check in-between to see if the device has been > removed in-between calls. > > The first issue is definitely a bug, the 2nd one is something that could > be changed in the deployment to try to get additional information on the > failure, and the 3rd one could be a bug. > > I'll see if I can find someone who wants to work on the 1st and 3rd > points. > > Cheers, > Gorka. > > [1]: > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 > [2]: > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 > [3]: > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 > [4]: > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > > > > > > > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > > > > > On 06/03, Rishat Azizov wrote: > > > > > Hi, > > > > > > > > > > It works with smaller volumes. > > > > > > > > > > multipath.conf attached to thist email. > > > > > > > > > > Cinder version - 18.2.0 Wallaby > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shr.chauhan at gmail.com Thu Mar 16 12:00:04 2023 From: shr.chauhan at gmail.com (Shrey Chauhan) Date: Thu, 16 Mar 2023 17:30:04 +0530 Subject: Neutron Message-ID: <661430E8-438A-4619-AA52-E7FD09DA5966@hxcore.ol> An HTML attachment was scrubbed... URL: From eblock at nde.ag Thu Mar 16 13:57:45 2023 From: eblock at nde.ag (Eugen Block) Date: Thu, 16 Mar 2023 13:57:45 +0000 Subject: (OpenStack-horizon) unable to open horizon page after installing Open Stack In-Reply-To: References: Message-ID: <20230316135745.Horde.JMkdtl7hq-QngIyx4x6MI1S@webmail.nde.ag> Can you also try /horizon ? 
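For the os-brick/multipath discussion above: Gorka's third point suggests replacing the automatic retry around "multipath -f" with a manual loop that checks between attempts whether multipathd has already removed the map in the background. The sketch below only illustrates that idea in plain Python; it is not the actual os-brick code, which goes through privsep/rootwrap and its own retry helpers, and the /dev/mapper existence check is an assumption about how one might detect that the map is already gone.

import os
import subprocess
import time

def flush_multipath_map(map_name, attempts=3, interval=1):
    """Flush a multipath map, treating 'already removed' as success."""
    # Manual retry instead of an automatic retry decorator: before each
    # attempt, check whether multipathd already removed the map in the
    # background, in which case an earlier "failed" flush actually worked.
    dm_path = os.path.join('/dev/mapper', map_name)
    last_error = ''
    for _ in range(attempts):
        if not os.path.exists(dm_path):
            return  # map is gone, nothing left to flush
        result = subprocess.run(['multipath', '-f', map_name],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return
        last_error = result.stderr.strip()
        time.sleep(interval)
    if not os.path.exists(dm_path):
        return  # removed while we were waiting between attempts
    raise RuntimeError('flushing %s kept failing: %s' % (map_name, last_error))

Treating "the map has disappeared" as success is what would make the later "is not a multipath device" errors harmless in the scenario Gorka describes.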
I have an Ubuntu based Victoria test environment and we had to modify the apache openstack-dashboard.conf because the default didn't work for me as well: # before openstack-dashboard.conf: WSGIScriptAlias /horizon /usr/share/openstack-dashboard/openstack_dashboard/wsgi.py process-group=horizon # after openstack-dashboard.conf: WSGIScriptAlias / /usr/share/openstack-dashboard/openstack_dashboard/wsgi.py process-group=horizon Regards, Eugen Zitat von Adivya Singh : > Same result > > On Wed, Mar 15, 2023 at 11:07?PM Radomir Dopieralski > wrote: > >> try /dashboard >> >> On Wed, Mar 15, 2023 at 5:56?PM Adivya Singh >> wrote: >> >>> Hi Team, >>> >>> I am unable to open Open OpenStack horizon page, after installation >>> When i am opening the link , it says >>> >>> Haproxy service seems up and running, I have tried to Flush IP tables >>> also, Seeing this might be causing the issue >>> >>> Port 443 is also listening. >>> >>> Any thoughts on this >>> >>> [image: image.png] >>> >> >> >> -- >> Radomir Dopieralski >> From smooney at redhat.com Thu Mar 16 14:03:47 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 16 Mar 2023 14:03:47 +0000 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: <1f9b4be304b2a9c1e463eb28420635630374529b.camel@redhat.com> On Thu, 2023-03-16 at 13:35 +0100, Dmitriy Rabotyagov wrote: > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > I think it might be the case of rescheduling the VM to other compute > without a long-lasting shelve/unshelve and when you don't need to > change the flavor. So kind of self-service when the user does detect > some weirdness, but before bothering the tech team will attempt to > reschedule to another compute on their own. rebuild is __not__ a move operation the curernt special case is a hack to alow image metadata properties to be updated for an exsitng vm but it will not reschedule the vm to another host. we talks about this in paste ptg where i propsoed adding a recreate api. i do not think we should ever make rebuilt a move oepratation but we could supprot a new api to enable the vm to recreate (keeping its data) on a new host with updated flavor/image extra specs based on teh current value of either. i really wish we coudl remvoe the current rebuild beahvior but when we discussed doing that before we decied it woudl break to many people. > > ??, 15 ???. 2023??. ? 19:57, Dan Smith : > > > > > ?We have users who use 'rebuild' on volume booted servers before nova > > > ?microversion 2.93, relying on the behavior that it keeps the volume as > > > ?is. And they would like to keep doing this even after the openstack > > > ?distro moves to a(n at least) zed base (sometime in the future). > > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > > > I assume it's because you want to redefine the metadata or name or > > something. There's a reason why those things are not easily mutable > > today, and why we had a lot of discussion on how to make user metadata > > mutable on an existing instance in the last cycle. However, I would > > really suggest that we not override "recreate the thing" to "maybe > > recreate the thing or just update a few fields". Instead, for things we > > think really should be mutable on a server at runtime, we should > > probably just do that. 
> > > > Imagine if the way you changed permissions recursively was to run 'rm > > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > > that is (IMHO) what "recreate but don't just change $name" means to a > > user. > > > > > ?As a naive user, it seems to me both behaviors make sense. I can > > > ?easily imagine use cases for rebuild with and without reimaging. > > > > I think that's because you're already familiar with the difference. For > > users not already in that mindset, I think it probably seems very weird > > that rebuild is destructive in one case and not the other. > > > > > ?Then there are a few hypothetical situations like: > > > ?a) Rebuild gets a new api feature (in a new microversion) which can > > > ?never be combined with the do-not-reimage behavior. > > > ?b) Rebuild may have a bug, whose fix requires a microversion bump. > > > ?This again can never be combined with the old behavior. > > > > > > ?What do you think, are these concerns purely theoretical or real? > > > ?If we would like to keep having rebuild without reimaging, can we rely > > > ?on the old microversion indefinitely? > > > ?Alternatively shall we propose and implement a nova spec to explicitly > > > ?expose the choice in the rebuild api (just to express the idea: osc > > > ?server rebuild --reimage|--no-reimage)? > > > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > > > I really want to know what the use-case is for "rebuild but not > > really". And also what "rebuild" means to a user if --no-reimage is > > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > > > "This operation recreates the root disk of the server." > > > > That was a lie for volume-backed instances for technical reasons. It was > > a bug, not a feature. > > > > I also strongly believe that if we're going to add a "but not > > really" flag, it needs to apply to volume-backed and regular instances > > identically. Because that's what the change here was doing - unifying > > the behavior for a single API operation. Going the other direction does > > not seem useful to me. > > > > --Dan > > > > [0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > > > From sylvain.bauza at gmail.com Thu Mar 16 14:21:14 2023 From: sylvain.bauza at gmail.com (Sylvain Bauza) Date: Thu, 16 Mar 2023 15:21:14 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Le jeu. 16 mars 2023 ? 13:38, Dmitriy Rabotyagov a ?crit : > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > I think it might be the case of rescheduling the VM to other compute > without a long-lasting shelve/unshelve and when you don't need to > change the flavor. So kind of self-service when the user does detect > some weirdness, but before bothering the tech team will attempt to > reschedule to another compute on their own. > > We already have an existing API method for this, which is 'cold-migrate' (and it does the same that resize, without changing the flavor) ??, 15 ???. 2023??. ? 19:57, Dan Smith : > > > > > We have users who use 'rebuild' on volume booted servers before nova > > > microversion 2.93, relying on the behavior that it keeps the volume as > > > is. And they would like to keep doing this even after the openstack > > > distro moves to a(n at least) zed base (sometime in the future). 
> > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > > > I assume it's because you want to redefine the metadata or name or > > something. There's a reason why those things are not easily mutable > > today, and why we had a lot of discussion on how to make user metadata > > mutable on an existing instance in the last cycle. However, I would > > really suggest that we not override "recreate the thing" to "maybe > > recreate the thing or just update a few fields". Instead, for things we > > think really should be mutable on a server at runtime, we should > > probably just do that. > > > > Imagine if the way you changed permissions recursively was to run 'rm > > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > > that is (IMHO) what "recreate but don't just change $name" means to a > > user. > > > > > As a naive user, it seems to me both behaviors make sense. I can > > > easily imagine use cases for rebuild with and without reimaging. > > > > I think that's because you're already familiar with the difference. For > > users not already in that mindset, I think it probably seems very weird > > that rebuild is destructive in one case and not the other. > > > > > Then there are a few hypothetical situations like: > > > a) Rebuild gets a new api feature (in a new microversion) which can > > > never be combined with the do-not-reimage behavior. > > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > > This again can never be combined with the old behavior. > > > > > > What do you think, are these concerns purely theoretical or real? > > > If we would like to keep having rebuild without reimaging, can we rely > > > on the old microversion indefinitely? > > > Alternatively shall we propose and implement a nova spec to explicitly > > > expose the choice in the rebuild api (just to express the idea: osc > > > server rebuild --reimage|--no-reimage)? > > > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > > > I really want to know what the use-case is for "rebuild but not > > really". And also what "rebuild" means to a user if --no-reimage is > > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > > > "This operation recreates the root disk of the server." > > > > That was a lie for volume-backed instances for technical reasons. It was > > a bug, not a feature. > > > > I also strongly believe that if we're going to add a "but not > > really" flag, it needs to apply to volume-backed and regular instances > > identically. Because that's what the change here was doing - unifying > > the behavior for a single API operation. Going the other direction does > > not seem useful to me. > > > > --Dan > > > > [0] > https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Thu Mar 16 14:26:53 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Thu, 16 Mar 2023 11:26:53 -0300 Subject: [manila] Bobcat vPTG slots and topics Message-ID: Hello, Zorillas! PTG is right around the corner and I would like to remind you to please add the topics you would like to bring up during our sessions to the planning etherpad [1] until next Tuesday (Mar 21st). 
I have already allocated some slots for our sessions: - Monday: 16:00 to 17:00 UTC - Wednesday: 14:00 to 16:00 UTC - Thursday: 14:00 to 16:00 UTC - Friday: 14:00 to 17:00 UTC We will be meeting in the Austin room, you can access the meeting room through the PTG page [2]. If you have a preference of date/time for your topic to be discussed, please let me know and I will try to accommodate it. Looking forward to meeting you! [1] https://etherpad.opendev.org/p/manila-bobcat-ptg-planning [2] https://ptg.opendev.org/ptg.html Thanks, carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 16 15:40:14 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 16 Mar 2023 08:40:14 -0700 Subject: [ironic] [infra] Cleaning up old IPA images from tarballs.o.o In-Reply-To: References: Message-ID: It's good that we get these out of the way of folks looking for modern images. I'm on board. Thanks, Jay Faulkner Ironic PTL TC Member On Thu, Mar 16, 2023 at 5:29?AM Dmitry Tantsur wrote: > Hi all, > > I would like to do a clean-up of old or wrongly created IPA images on > the tarballs site. Before I do that, could you please check the proposed > list to make sure we don't remove something we expect to be used? > > The list is https://paste.opendev.org/show/btDps0HFoYG9TKMmv1LB/ > > Dmitry > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Mar 16 15:44:20 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 16 Mar 2023 16:44:20 +0100 Subject: [neutron] PTG poll for meeting slots Message-ID: Hello Neutrinos: This is the link [1] (that I should have sent yesterday) to vote for the meeting slots during the PTG week. I think that 3 hours per day, from Tuesday to Friday, will be enough to cover the topics we need to discuss. Please feel free to send your vote here. Next week I'll close the poll and schedule the meetings. Thank you! [1]https://doodle.com/meeting/participate/id/eZ83mmvb -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Mar 16 16:15:42 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 16 Mar 2023 17:15:42 +0100 Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) Message-ID: Hello: I'm sending this mail in advance to propose transitioning Neutron and all related projects to EOL. I'll propose this topic too during the next Neutron meeting. The announcement is the first step [1] to transition a stable branch to EOL. The patch to mark these branches as EOL will be pushed in two weeks. If you have any inconvenience, please let me know in this mail chain or in IRC (ralonsoh, #openstack-neutron channel). You can also contact any Neutron core reviewer in the IRC channel. Regards. [1] https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Thu Mar 16 16:24:07 2023 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 16 Mar 2023 09:24:07 -0700 Subject: [ironic] [infra] Cleaning up old IPA images from tarballs.o.o In-Reply-To: References: Message-ID: <519d69da-a1fb-4db9-8594-a1417bbe2eac@app.fastmail.com> On Thu, Mar 16, 2023, at 8:40 AM, Jay Faulkner wrote: > It's good that we get these out of the way of folks looking for modern > images. I'm on board. 
I'm not familiar enough with IPA to know the answers to these questions, but I think they play an important role in the decision making here. Can a user use the latest version of IPA with an old deployment of Ironic? If so why do we bother to publish a bunch of version and distro specific images? You should be able to keep an up to date image published that users find and use? If the versions do matter then you should be very careful to avoid deleting images that a user may need to run with their version of Ironic. > > Thanks, > Jay Faulkner > Ironic PTL > TC Member > > On Thu, Mar 16, 2023 at 5:29?AM Dmitry Tantsur wrote: >> Hi all, >> >> I would like to do a clean-up of old or wrongly created IPA images on >> the tarballs site. Before I do that, could you please check the proposed >> list to make sure we don't remove something we expect to be used? >> >> The list is https://paste.opendev.org/show/btDps0HFoYG9TKMmv1LB/ >> >> Dmitry >> >> From Danny.Webb at thehutgroup.com Thu Mar 16 16:43:02 2023 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Thu, 16 Mar 2023 16:43:02 +0000 Subject: [neutron] In-Reply-To: References: Message-ID: Hi Kamil, We're currently running 4 (soon to be 5) production regions all using kolla ansible as our deployer with OVN as our neutron backend. It's been fairly solid for us and we've had less issues with OVN than the traditional hybrid OVS / Iptables neutron driver (which we ran for about a year before switching to OVN). Our regions are anywhere from 50-60 compute hosts with 1-2k+ VMs per region. As far as I know most of the new development is going into OVN so would be a good place to start. Ultimately, we've only really had 2 real issues whilst running it. First was an issue where we had the provider network spamming gateway changes into southbound as we had our anycast SVI bound to our top of rack switches which made OVN keep updating it's location. We mitigated this by moving the provider SVIs to our border routers and the issue went away and dropped the load on our OVN controllers significantly. Only other real issue we had was during an upgrade of a region we ended up with what we believed to be some sort of stale flows that resulted in some hypervisors losing connectivity until we rebooted them. Hope this helps! Cheers, Danny ________________________________ From: Kamil Madac Sent: 14 March 2023 09:46 To: openstack-discuss Subject: [neutron] CAUTION: This email originates from outside THG ________________________________ Hi All, I'm in the process of planning a small public cloud based on OpenStack. I have quite experience with kolla-ansible deployments which use OVS networking and I have no issues with that. It works stable for my use cases (Vlan provider networks, DVR, tenant networks, floating IPs). For that new deployment I'm looking at OVN deployment which from what I read should be more performant (faster build of instances) and with ability to cover more networking features in OVN instead of needing external software like iptables/dnsmasq. Does anyone use OVN in production and what is your experience (pros/cons)? Is OVN mature enough to replace OVS in the production deployment (are there some basic features from OVS missing)? Thanks in advance for sharing the experience. -- Kamil Madac Danny Webb Principal OpenStack Engineer Danny.Webb at thehutgroup.com [THG Ingenuity Logo] [https://i.imgur.com/wbpVRW6.png] [https://i.imgur.com/c3040tr.png] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jay at gr-oss.io Thu Mar 16 16:48:43 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 16 Mar 2023 09:48:43 -0700 Subject: [ironic] [infra] Cleaning up old IPA images from tarballs.o.o In-Reply-To: <519d69da-a1fb-4db9-8594-a1417bbe2eac@app.fastmail.com> References: <519d69da-a1fb-4db9-8594-a1417bbe2eac@app.fastmail.com> Message-ID: The first half of the posted list are the ramdisk artifacts corresponding to now-retired bugfix branches. I could see an argument being made that we should continue providing those deliverables, as we do on PyPI -- I am OK with deleting them even with that in mind, as they contain massively out of date software (beyond IPA) that is likely unfit for running on production servers anymore. These are potential targets for movement to a deprecated location instead of deletion, if we feel we should continue providing them. The second half of the list are extra dangerous; they are advertised as builds from "master" branch, but they are very old and out of date due to us no longer creating images for those distributions or using those tools anymore. The CoreOS images mentioned are from 2016, to put this in perspective. These should be deleted IMO, even if we find a softer way for the retired bugfix branch ramdisks. To be frank, if someone *is* consuming these old images, and deleting them forced them to make a support contact with upstream, it'd likely end up with a better solution for them overall than running years-old software on their bare metal. -- Jay Faulkner Ironic PTL TC Member On Thu, Mar 16, 2023 at 9:32?AM Clark Boylan wrote: > On Thu, Mar 16, 2023, at 8:40 AM, Jay Faulkner wrote: > > It's good that we get these out of the way of folks looking for modern > > images. I'm on board. > > I'm not familiar enough with IPA to know the answers to these questions, > but I think they play an important role in the decision making here. > > Can a user use the latest version of IPA with an old deployment of Ironic? > If so why do we bother to publish a bunch of version and distro specific > images? You should be able to keep an up to date image published that users > find and use? > > If the versions do matter then you should be very careful to avoid > deleting images that a user may need to run with their version of Ironic. > > > > > Thanks, > > Jay Faulkner > > Ironic PTL > > TC Member > > > > On Thu, Mar 16, 2023 at 5:29?AM Dmitry Tantsur > wrote: > >> Hi all, > >> > >> I would like to do a clean-up of old or wrongly created IPA images on > >> the tarballs site. Before I do that, could you please check the proposed > >> list to make sure we don't remove something we expect to be used? > >> > >> The list is https://paste.opendev.org/show/btDps0HFoYG9TKMmv1LB/ > >> > >> Dmitry > >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thremes172 at gmail.com Thu Mar 16 17:05:23 2023 From: thremes172 at gmail.com (kaqiu pi) Date: Fri, 17 Mar 2023 01:05:23 +0800 Subject: Fwd: [kolla-ansible] Whether the cluster of two control nodes is available In-Reply-To: References: Message-ID: Hi? I'm a newer in kolla-ansibe. And I could deploy a cluster of two controll nodes by kolla-ansible. But I don't konw whether the cluster is anvailable? I would like to ask, when the number of control nodes is 2, the status of mariadb and rabbitmq clusters, are they safe and available for production? Thanks for any guidance. Good Luck -------------- next part -------------- An HTML attachment was scrubbed... 
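The replies that follow spell out why two control nodes give no fault tolerance for quorum-based services such as MariaDB/Galera and RabbitMQ. The snippet below is only the usual majority-quorum arithmetic, not tied to any particular Galera or RabbitMQ setting:

def quorum(n):
    # majority rule: strictly more than half of the members must be reachable
    return n // 2 + 1

def tolerated_failures(n):
    return n - quorum(n)

for n in (1, 2, 3, 5):
    print('%d node(s): quorum %d, tolerates %d failure(s)'
          % (n, quorum(n), tolerated_failures(n)))

With two members the quorum is two, so losing either member (or splitting the network between them) stops the cluster, which is why the replies recommend one, three or more controllers.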
URL: From Danny.Webb at thehutgroup.com Thu Mar 16 18:05:48 2023 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Thu, 16 Mar 2023 18:05:48 +0000 Subject: [kolla-ansible] Whether the cluster of two control nodes is available In-Reply-To: References: Message-ID: You can't do 2 control nodes with kolla as you need an odd number of mariadb nodes for quorum purposes (1 or 3 or more). ________________________________ From: kaqiu pi Sent: 16 March 2023 17:05 To: openstack-discuss at lists.openstack.org Subject: Fwd: [kolla-ansible] Whether the cluster of two control nodes is available CAUTION: This email originates from outside THG ________________________________ Hi? I'm a newer in kolla-ansibe. And I could deploy a cluster of two controll nodes by kolla-ansible. But I don't konw whether the cluster is anvailable? I would like to ask, when the number of control nodes is 2, the status of mariadb and rabbitmq clusters, are they safe and available for production? Thanks for any guidance. Good Luck Danny Webb Principal OpenStack Engineer Danny.Webb at thehutgroup.com [THG Ingenuity Logo] [https://i.imgur.com/wbpVRW6.png] [https://i.imgur.com/c3040tr.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Thu Mar 16 16:56:26 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 16 Mar 2023 17:56:26 +0100 Subject: Neutron In-Reply-To: <661430E8-438A-4619-AA52-E7FD09DA5966@hxcore.ol> References: <661430E8-438A-4619-AA52-E7FD09DA5966@hxcore.ol> Message-ID: Hello Shrey: First of all, let me say that the ML2 Linux Bridge mechanism driver is now considered as "experimental support". That means we no longer have active developers working on this driver and we always recommend using others like ML2/OVS or ML2/OVN (or ML2/SR-IOV in case you have the needed hardware). Let me also point you to launchpad [1] that is the place to report a defect like this one. Please open a bug in this link. In order to debug and try to reproduce this issue, can you please print the values you are passing in [2] (the name, the namespace name and kwargs)? Thanks! [1]https://bugs.launchpad.net/neutron/ [2] https://github.com/openstack/neutron/blob/85b82d4452ed3199c7f1f7c2455d2a75faaa2991/neutron/agent/linux/ip_lib.py#L321 On Thu, Mar 16, 2023 at 2:19?PM Shrey Chauhan wrote: > Hi, > > I am running an openstack xena environment > > > > My neutron version > > dnf list installed | grep neutron > > > > > > > > > *openstack-neutron.noarch 1:19.4.0-2.el8 > @ecnlocalrepoopenstack-neutron-common.noarch 1:19.4.0-2.el8 > @ecnlocalrepoopenstack-neutron-linuxbridge.noarch 1:19.4.0-2.el8 > @ecnlocalrepoopenstack-neutron-ml2.noarch 1:19.4.0-2.el8 > @ecnlocalrepopython3-neutron.noarch 1:19.4.0-2.el8 > @ecnlocalrepopython3-neutron-lib.noarch 2.15.2-1.el8 > @ecnlocalrepopython3-neutronclient.noarch 7.6.0-1.el8 @ecnlocalrepo* > > > I observed when I create a vm, sometimes the vm was not getting the right > dhcp ip which was getting assigned from openstack: > I looked inside the vm dhcp call was just timing out, I am still not able > to figure out what is wrong with the setup. > One thing that we have noticed the Linux bridge logs and are just filled > with these errors, the whole log file is just filled with these errors: > 2023-03-16 07:38:29.182 118098 INFO > neutron.plugins.ml2.drivers.agent._common_agent > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Port tapc0ef9dda-42 > updated. 
Details: {'device': 'tapc0ef9dda-42', 'network_id': > 'e67e45d1-cf29-416b-89ef-353db4ef3586', 'port_id': > 'c0ef9dda-4278-4e97-a014-09bc125bb57d', 'mac_address': 'fa:16:3e:32:f4:94', > 'admin_state_up': True, 'network_type': 'vxlan', 'segmentation_id': 1908, > 'physical_network': None, 'mtu': 1450, 'fixed_ips': [{'subnet_id': > '67d6e448-0976-4fc8-a54e-7df40d0a438d', 'ip_address': '169.254.195.108'}], > 'device_owner': 'network:router_ha_interface', 'allowed_address_pairs': [], > 'port_security_enabled': False, 'qos_policy_id': None, > 'network_qos_policy_id': None, 'profile': {}, 'propagate_uplink_status': > False} > > 2023-03-16 07:38:29.191 118098 INFO > neutron.plugins.ml2.drivers.linuxbridge.agent.arp_protect > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Skipping ARP spoofing > rules for port 'tapc0ef9dda-42' because it has port security disabled > > 2023-03-16 07:38:29.239 118393 ERROR pr2modules.netlink [-] File > "/usr/lib64/python3.6/threading.py", line 905, in _bootstrap > > self._bootstrap_inner() > > File "/usr/lib64/python3.6/threading.py", line 937, in _bootstrap_inner > > self.run() > > File "/usr/lib64/python3.6/threading.py", line 885, in run > > self._target(*self._args, **self._kwargs) > > File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 69, in > _worker > > work_item.run() > > File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run > > result = self.fn(*self.args, **self.kwargs) > > File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line > 477, in _process_cmd > > ret = func(*f_args, **f_kwargs) > > File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", > line 274, in _wrap > > return func(*args, **kwargs) > > File > "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", > line 317, in create_interface > > return ip.link("add", ifname=ifname, kind=kind, **kwargs) > > File "/usr/lib/python3.6/site-packages/pr2modules/iproute/linux.py", > line 1461, in link > > msg_flags=msg_flags) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/nlsocket.py", > line 397, in nlm_request > > return tuple(self._genlm_request(*argv, **kwarg)) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/nlsocket.py", > line 888, in nlm_request > > self.put(msg, msg_type, msg_flags, msg_seq=msg_seq) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/nlsocket.py", > line 636, in put > > self.sendto_gate(msg, addr) > > File > "/usr/lib/python3.6/site-packages/pr2modules/netlink/rtnl/iprsocket.py", > line 61, in _gate_linux > > msg.encode() > > File > "/usr/lib/python3.6/site-packages/pr2modules/netlink/rtnl/ifinfmsg/__init__.py", > line 511, in encode > > return super(ifinfbase, self).encode() > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1062, in encode > > offset = self.encode_nlas(offset) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1323, in encode_nlas > > nla_instance.encode() > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1062, in encode > > offset = self.encode_nlas(offset) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1323, in encode_nlas > > nla_instance.encode() > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1062, in encode > > offset = self.encode_nlas(offset) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1323, in encode_nlas > > nla_instance.encode() > 
> File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1059, in encode > > offset, diff = self.ft_encode(offset) > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1493, in ft_encode > > log.error(''.join(traceback.format_stack())) > > > > 2023-03-16 07:38:29.239 118393 ERROR pr2modules.netlink [-] Traceback > (most recent call last): > > File "/usr/lib/python3.6/site-packages/pr2modules/netlink/__init__.py", > line 1491, in ft_encode > > struct.pack_into(efmt, self.data, offset, value) > > struct.error: required argument is not an integer > > > > 2023-03-16 07:38:29.239 118393 ERROR pr2modules.netlink [-] error pack: B > b'inherit' > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Error in agent loop. > Devices info: {'current': {'tapc0ef9dda-42'}, 'timestamps': > {'tapc0ef9dda-42': 27}, 'added': {'tapc0ef9dda-42'}, 'removed': set(), > 'updated': set()}: struct.error: required argument is not an integer > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent Traceback (most recent call > last): > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", > line 465, in daemon_loop > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent sync = > self.process_network_devices(device_info) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in > wrapper > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent result = f(*args, > **kwargs) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", > line 214, in process_network_devices > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent resync_a = > self.treat_devices_added_updated(devices_added_updated) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in > wrapper > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent result = f(*args, > **kwargs) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", > line 231, in treat_devices_added_updated > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent > self._process_device_if_exists(device_details) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", > line 258, in _process_device_if_exists > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent device, > device_details['device_owner']) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 585, in plug_interface > > 2023-03-16 07:38:29.242 118098 ERROR > 
neutron.plugins.ml2.drivers.agent._common_agent network_segment.mtu) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 520, in add_tap_interface > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent return False > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in > __exit__ > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent self.force_reraise() > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in > force_reraise > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent raise self.value > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 512, in add_tap_interface > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent tap_device_name, > device_owner, mtu) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 545, in _add_tap_interface > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent mtu): > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 484, in ensure_physical_in_bridge > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent return > self.ensure_vxlan_bridge(network_id, segmentation_id, mtu) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 259, in ensure_vxlan_bridge > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent interface = > self.ensure_vxlan(segmentation_id, mtu) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", > line 356, in ensure_vxlan > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent self.local_int, **args) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 322, > in add_vxlan > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent > privileged.create_interface(name, self.namespace, "vxlan", **kwargs) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 272, > in _wrap > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent r_call_timeout) > > 2023-03-16 07:38:29.242 118098 ERROR > 
neutron.plugins.ml2.drivers.agent._common_agent File > "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 216, in > remote_call > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent raise > exc_type(*result[2]) > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent struct.error: required > argument is not an integer > > 2023-03-16 07:38:29.242 118098 ERROR > neutron.plugins.ml2.drivers.agent._common_agent > > 2023-03-16 07:38:30.849 118098 INFO > neutron.plugins.ml2.drivers.agent._common_agent > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Linux bridge agent > Agent out of sync with plugin! > > 2023-03-16 07:38:30.850 118098 INFO neutron.agent.securitygroups_rpc > [req-3e756b88-6d40-4aac-b161-4ec8dce4d1c9 - - - - -] Preparing filters for > devices {'tapc0ef9dda-42'} > > > > I have been struggling with these in our environment, any suggestion what > could be the reason behind this? > What is wrong in my setup here? > Thanks in advance for any help > > > > Sent from Mail for > Windows > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toheeb.olawale.to23 at gmail.com Thu Mar 16 19:10:54 2023 From: toheeb.olawale.to23 at gmail.com (Toheeb Oyekola) Date: Thu, 16 Mar 2023 20:10:54 +0100 Subject: [outreachy][cinder] Running test in dev Enviornment Message-ID: Hi everyone, I am having some error when i run "tox -e py3", here is a linke to the error https://paste.openstack.org/show/b5SAyZ35FdDuK2sRwTxD/. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cboylan at sapwetik.org Thu Mar 16 20:06:36 2023 From: cboylan at sapwetik.org (Clark Boylan) Date: Thu, 16 Mar 2023 13:06:36 -0700 Subject: [outreachy][cinder] Running test in dev Enviornment In-Reply-To: References: Message-ID: <2b73d833-e1b9-46a0-a7fb-f0fd956dec4b@app.fastmail.com> On Thu, Mar 16, 2023, at 12:10 PM, Toheeb Oyekola wrote: > Hi everyone, I am having some error when i run "tox -e py3", here is a > linke to the error > https://paste.openstack.org/show/b5SAyZ35FdDuK2sRwTxD/. The paste indicates "Error: pg_config executable not found." which appears to be necessary to install the psycopg2 PostgreSQL python library. Cinder's bindep.txt file [0] captures the system level dependencies for PostgreSQL which i expect cover this. You can use the bindep tool to ensure you've got all the necessary system libs installed. However, I note in your paste that you have windows filesystem paths and I'm not sure if bindep will run properly in that environment. We'd be happy to hear if it works or not as that is good info to have. [0] https://opendev.org/openstack/cinder/src/branch/master/bindep.txt#L28-L31 From toheeb.olawale.to23 at gmail.com Thu Mar 16 20:51:32 2023 From: toheeb.olawale.to23 at gmail.com (Toheeb Oyekola) Date: Thu, 16 Mar 2023 21:51:32 +0100 Subject: [outreachy][cinder] Running test in dev Enviornment In-Reply-To: <2b73d833-e1b9-46a0-a7fb-f0fd956dec4b@app.fastmail.com> References: <2b73d833-e1b9-46a0-a7fb-f0fd956dec4b@app.fastmail.com> Message-ID: Thanks, I'll check it out now. On Thu, Mar 16, 2023 at 9:07?PM Clark Boylan wrote: > On Thu, Mar 16, 2023, at 12:10 PM, Toheeb Oyekola wrote: > > Hi everyone, I am having some error when i run "tox -e py3", here is a > > linke to the error > > https://paste.openstack.org/show/b5SAyZ35FdDuK2sRwTxD/. > > The paste indicates "Error: pg_config executable not found." 
which appears > to be necessary to install the psycopg2 PostgreSQL python library. Cinder's > bindep.txt file [0] captures the system level dependencies for PostgreSQL > which i expect cover this. You can use the bindep tool to ensure you've got > all the necessary system libs installed. However, I note in your paste that > you have windows filesystem paths and I'm not sure if bindep will run > properly in that environment. We'd be happy to hear if it works or not as > that is good info to have. > > [0] > https://opendev.org/openstack/cinder/src/branch/master/bindep.txt#L28-L31 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Thu Mar 16 22:22:51 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Thu, 16 Mar 2023 15:22:51 -0700 Subject: [ptls] PyPI maintainer cleanup - Action needed: Contact extra maintainers Message-ID: Hi PTLs, The TC recently voted[1] to require humans be removed from PyPI access for OpenStack-managed projects. This helps ensure all releases are created via releases team tooling and makes it less likely for a user account compromise to impact OpenStack packages. Many projects have already updated https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L33 with a list of packages that contain extra maintainers. We'd like to request that PTLs, or their designate, reach out to any extra maintainers listed for projects you are responsible for and request they remove their access in accordance with policy. An example email, and detailed steps to follow have been provided at https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup-email-template . Thank you for your cooperation as we work to improve our security posture and harden against supply chain attacks. Thank you, Jay Faulkner TC Vice-Chair 1: https://opendev.org/openstack/governance/commit/979e339f899ef62d2a6871a99c99537744c5808d -------------- next part -------------- An HTML attachment was scrubbed... URL: From alsotoes at gmail.com Thu Mar 16 23:18:51 2023 From: alsotoes at gmail.com (Alvaro Soto) Date: Thu, 16 Mar 2023 17:18:51 -0600 Subject: [kolla-ansible] Whether the cluster of two control nodes is available In-Reply-To: References: Message-ID: Just to complement Danny's comment: this applies to any kind of distributed cluster; if you have an even number of members or only one member in a cluster that requires a quorum to work, you will be prone to a split-brain situation. So it's not safe for production. https://en.wikipedia.org/wiki/Split-brain_(computing) Cheers! On Thu, Mar 16, 2023 at 12:12?PM Danny Webb wrote: > You can't do 2 control nodes with kolla as you need an odd number of > mariadb nodes for quorum purposes (1 or 3 or more). > ------------------------------ > *From:* kaqiu pi > *Sent:* 16 March 2023 17:05 > *To:* openstack-discuss at lists.openstack.org < > openstack-discuss at lists.openstack.org> > *Subject:* Fwd: [kolla-ansible] Whether the cluster of two control nodes > is available > > > * CAUTION: This email originates from outside THG * > ------------------------------ > Hi? > > I'm a newer in kolla-ansibe. And I could deploy a cluster of two controll > nodes by kolla-ansible. But I don't konw whether the cluster is anvailable? > > I would like to ask, when the number of control nodes is 2, the status of > mariadb and rabbitmq clusters, are they safe and available for production? > > Thanks for any guidance. 
> Good Luck > > *Danny Webb* > Principal OpenStack Engineer > Danny.Webb at thehutgroup.com > [image: THG Ingenuity Logo] > > > -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Fri Mar 17 00:22:26 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 17 Mar 2023 07:22:26 +0700 Subject: [kolla-ansible] migrate from OVS to OVN Message-ID: Hello guys. Can we use kolla ansible to migrate from OVS to OVN? If then will it have downtime or impacts? Thank you much, Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Fri Mar 17 00:55:11 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 17 Mar 2023 07:55:11 +0700 Subject: [magnum][kolla ansible]Ask about use higher magnum image on previous opentack version Message-ID: Hello guys. I use Openstack Xena by using Kolla Ansible tool. due to some reason, I want to use Zed Magnum on my current system(Xena). Can I do this task by rebuilding Magnum images from code then retagging and replacing the Magnum container with new images? Any experience for this? Thank you very much. Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Fri Mar 17 04:50:38 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 17 Mar 2023 04:50:38 +0000 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: IMHO, 0.001% of the time someone might be running rebuild to do something that?s to fix an issue in metadata or something (and probably an operator too) and 99.999% of the time it?s a user expecting a fresh VM Get Outlook for iOS ________________________________ From: Sylvain Bauza Sent: Thursday, March 16, 2023 10:21:14 AM To: Dmitriy Rabotyagov Cc: openstack-discuss Subject: Re: [nova][cinder] future of rebuild without reimaging Le jeu. 16 mars 2023 ? 13:38, Dmitriy Rabotyagov > a ?crit : > Maybe I'm missing something, but what are the reasons you would want to > rebuild an instance without ... rebuilding it? I think it might be the case of rescheduling the VM to other compute without a long-lasting shelve/unshelve and when you don't need to change the flavor. So kind of self-service when the user does detect some weirdness, but before bothering the tech team will attempt to reschedule to another compute on their own. We already have an existing API method for this, which is 'cold-migrate' (and it does the same that resize, without changing the flavor) ??, 15 ???. 2023??. ? 19:57, Dan Smith >: > > > We have users who use 'rebuild' on volume booted servers before nova > > microversion 2.93, relying on the behavior that it keeps the volume as > > is. And they would like to keep doing this even after the openstack > > distro moves to a(n at least) zed base (sometime in the future). > > Maybe I'm missing something, but what are the reasons you would want to > rebuild an instance without ... rebuilding it? > > I assume it's because you want to redefine the metadata or name or > something. 
There's a reason why those things are not easily mutable > today, and why we had a lot of discussion on how to make user metadata > mutable on an existing instance in the last cycle. However, I would > really suggest that we not override "recreate the thing" to "maybe > recreate the thing or just update a few fields". Instead, for things we > think really should be mutable on a server at runtime, we should > probably just do that. > > Imagine if the way you changed permissions recursively was to run 'rm > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > that is (IMHO) what "recreate but don't just change $name" means to a > user. > > > As a naive user, it seems to me both behaviors make sense. I can > > easily imagine use cases for rebuild with and without reimaging. > > I think that's because you're already familiar with the difference. For > users not already in that mindset, I think it probably seems very weird > that rebuild is destructive in one case and not the other. > > > Then there are a few hypothetical situations like: > > a) Rebuild gets a new api feature (in a new microversion) which can > > never be combined with the do-not-reimage behavior. > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > This again can never be combined with the old behavior. > > > > What do you think, are these concerns purely theoretical or real? > > If we would like to keep having rebuild without reimaging, can we rely > > on the old microversion indefinitely? > > Alternatively shall we propose and implement a nova spec to explicitly > > expose the choice in the rebuild api (just to express the idea: osc > > server rebuild --reimage|--no-reimage)? > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > I really want to know what the use-case is for "rebuild but not > really". And also what "rebuild" means to a user if --no-reimage is > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > "This operation recreates the root disk of the server." > > That was a lie for volume-backed instances for technical reasons. It was > a bug, not a feature. > > I also strongly believe that if we're going to add a "but not > really" flag, it needs to apply to volume-backed and regular instances > identically. Because that's what the change here was doing - unifying > the behavior for a single API operation. Going the other direction does > not seem useful to me. > > --Dan > > [0] https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 17 07:31:35 2023 From: mnasiadka at gmail.com (=?UTF-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 17 Mar 2023 08:31:35 +0100 Subject: [kolla-ansible] migrate from OVS to OVN In-Reply-To: References: Message-ID: Hello, Kolla-Ansible does not support migration from OVS to OVN yet. Best regards, Michal W dniu pt., 17.03.2023 o 01:26 Nguy?n H?u Kh?i napisa?(a): > Hello guys. > Can we use kolla ansible to migrate from OVS to OVN? If then will it have > downtime or impacts? > Thank you much, > Nguyen Huu Khoi > -- Micha? Nasiadka mnasiadka at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nguyenhuukhoinw at gmail.com Fri Mar 17 07:35:18 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 17 Mar 2023 14:35:18 +0700 Subject: [kolla-ansible] migrate from OVS to OVN In-Reply-To: References: Message-ID: Thank you for your information. Will we do it in the future? Nguyen Huu Khoi On Fri, Mar 17, 2023 at 2:31?PM Micha? Nasiadka wrote: > Hello, > > Kolla-Ansible does not support migration from OVS to OVN yet. > > Best regards, > Michal > > W dniu pt., 17.03.2023 o 01:26 Nguy?n H?u Kh?i > napisa?(a): > >> Hello guys. >> Can we use kolla ansible to migrate from OVS to OVN? If then will it have >> downtime or impacts? >> Thank you much, >> Nguyen Huu Khoi >> > -- > Micha? Nasiadka > mnasiadka at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 17 07:51:38 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 17 Mar 2023 08:51:38 +0100 Subject: [kolla-ansible] migrate from OVS to OVN In-Reply-To: References: Message-ID: <5898EAF6-16E1-4BB8-8F57-070E84D6431F@gmail.com> No contributors have mentioned that they want to contribute this feature for now, but I?ll add this topic for the upcoming PTG. Best regards, Michal > On 17 Mar 2023, at 08:35, Nguy?n H?u Kh?i wrote: > > Thank you for your information. > Will we do it in the future? > Nguyen Huu Khoi > > > On Fri, Mar 17, 2023 at 2:31?PM Micha? Nasiadka > wrote: >> Hello, >> >> Kolla-Ansible does not support migration from OVS to OVN yet. >> >> Best regards, >> Michal >> >> W dniu pt., 17.03.2023 o 01:26 Nguy?n H?u Kh?i > napisa?(a): >>> Hello guys. >>> Can we use kolla ansible to migrate from OVS to OVN? If then will it have downtime or impacts? >>> Thank you much, >>> Nguyen Huu Khoi >> -- >> Micha? Nasiadka >> mnasiadka at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Fri Mar 17 08:04:53 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Fri, 17 Mar 2023 09:04:53 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Just in case I wasn't saying anything about how legit or widespread this use case is, I was just providing an example of how rebuild without real rebuild could be leveraged by operators. Regarding cold migrate, I'd love to have then another policy, like os_compute_api:os-migrate-server:migrate-specify-host or smth, so that non-admins could not pick any arbitrary compute and had to rely on scheduler only. ??, 17 ???. 2023 ?., 05:50 Mohammed Naser : > IMHO, 0.001% of the time someone might be running rebuild to do something > that?s to fix an issue in metadata or something (and probably an operator > too) and 99.999% of the time it?s a user expecting a fresh VM > > Get Outlook for iOS > ------------------------------ > *From:* Sylvain Bauza > *Sent:* Thursday, March 16, 2023 10:21:14 AM > *To:* Dmitriy Rabotyagov > *Cc:* openstack-discuss > *Subject:* Re: [nova][cinder] future of rebuild without reimaging > > > > Le jeu. 16 mars 2023 ? 13:38, Dmitriy Rabotyagov > a ?crit : > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > I think it might be the case of rescheduling the VM to other compute > without a long-lasting shelve/unshelve and when you don't need to > change the flavor. 
So kind of self-service when the user does detect > some weirdness, but before bothering the tech team will attempt to > reschedule to another compute on their own. > > > We already have an existing API method for this, which is 'cold-migrate' > (and it does the same that resize, without changing the flavor) > > > ??, 15 ???. 2023??. ? 19:57, Dan Smith : > > > > > We have users who use 'rebuild' on volume booted servers before nova > > > microversion 2.93, relying on the behavior that it keeps the volume as > > > is. And they would like to keep doing this even after the openstack > > > distro moves to a(n at least) zed base (sometime in the future). > > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > > > I assume it's because you want to redefine the metadata or name or > > something. There's a reason why those things are not easily mutable > > today, and why we had a lot of discussion on how to make user metadata > > mutable on an existing instance in the last cycle. However, I would > > really suggest that we not override "recreate the thing" to "maybe > > recreate the thing or just update a few fields". Instead, for things we > > think really should be mutable on a server at runtime, we should > > probably just do that. > > > > Imagine if the way you changed permissions recursively was to run 'rm > > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > > that is (IMHO) what "recreate but don't just change $name" means to a > > user. > > > > > As a naive user, it seems to me both behaviors make sense. I can > > > easily imagine use cases for rebuild with and without reimaging. > > > > I think that's because you're already familiar with the difference. For > > users not already in that mindset, I think it probably seems very weird > > that rebuild is destructive in one case and not the other. > > > > > Then there are a few hypothetical situations like: > > > a) Rebuild gets a new api feature (in a new microversion) which can > > > never be combined with the do-not-reimage behavior. > > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > > This again can never be combined with the old behavior. > > > > > > What do you think, are these concerns purely theoretical or real? > > > If we would like to keep having rebuild without reimaging, can we rely > > > on the old microversion indefinitely? > > > Alternatively shall we propose and implement a nova spec to explicitly > > > expose the choice in the rebuild api (just to express the idea: osc > > > server rebuild --reimage|--no-reimage)? > > > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > > > I really want to know what the use-case is for "rebuild but not > > really". And also what "rebuild" means to a user if --no-reimage is > > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > > > "This operation recreates the root disk of the server." > > > > That was a lie for volume-backed instances for technical reasons. It was > > a bug, not a feature. > > > > I also strongly believe that if we're going to add a "but not > > really" flag, it needs to apply to volume-backed and regular instances > > identically. Because that's what the change here was doing - unifying > > the behavior for a single API operation. Going the other direction does > > not seem useful to me. 
> > > > --Dan > > > > [0] > https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Fri Mar 17 08:05:26 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 17 Mar 2023 15:05:26 +0700 Subject: [kolla-ansible] migrate from OVS to OVN In-Reply-To: <5898EAF6-16E1-4BB8-8F57-070E84D6431F@gmail.com> References: <5898EAF6-16E1-4BB8-8F57-070E84D6431F@gmail.com> Message-ID: Thank you very much. :) Nguyen Huu Khoi On Fri, Mar 17, 2023 at 2:51?PM Micha? Nasiadka wrote: > No contributors have mentioned that they want to contribute this feature > for now, but I?ll add this topic for the upcoming PTG. > > Best regards, > Michal > > On 17 Mar 2023, at 08:35, Nguy?n H?u Kh?i > wrote: > > Thank you for your information. > Will we do it in the future? > Nguyen Huu Khoi > > > On Fri, Mar 17, 2023 at 2:31?PM Micha? Nasiadka > wrote: > >> Hello, >> >> Kolla-Ansible does not support migration from OVS to OVN yet. >> >> Best regards, >> Michal >> >> W dniu pt., 17.03.2023 o 01:26 Nguy?n H?u Kh?i >> napisa?(a): >> >>> Hello guys. >>> Can we use kolla ansible to migrate from OVS to OVN? If then will it >>> have downtime or impacts? >>> Thank you much, >>> Nguyen Huu Khoi >>> >> -- >> Micha? Nasiadka >> mnasiadka at gmail.com >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Fri Mar 17 08:52:55 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 17 Mar 2023 09:52:55 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Le ven. 17 mars 2023 ? 09:10, Dmitriy Rabotyagov a ?crit : > Just in case I wasn't saying anything about how legit or widespread this > use case is, I was just providing an example of how rebuild without real > rebuild could be leveraged by operators. > > Regarding cold migrate, I'd love to have then another policy, like os_compute_api:os-migrate-server:migrate-specify-host > or smth, so that non-admins could not pick any arbitrary compute and had > to rely on scheduler only. > > Ah, I see your point, I'll add it for the vPTG agenda. -Sylvain ??, 17 ???. 2023 ?., 05:50 Mohammed Naser : > IMHO, 0.001% of the time someone might be running rebuild to do something > that?s to fix an issue in metadata or something (and probably an operator > too) and 99.999% of the time it?s a user expecting a fresh VM > > Get Outlook for iOS > ------------------------------ > *From:* Sylvain Bauza > *Sent:* Thursday, March 16, 2023 10:21:14 AM > *To:* Dmitriy Rabotyagov > *Cc:* openstack-discuss > *Subject:* Re: [nova][cinder] future of rebuild without reimaging > > > > Le jeu. 16 mars 2023 ? 13:38, Dmitriy Rabotyagov > a ?crit : > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > I think it might be the case of rescheduling the VM to other compute > without a long-lasting shelve/unshelve and when you don't need to > change the flavor. So kind of self-service when the user does detect > some weirdness, but before bothering the tech team will attempt to > reschedule to another compute on their own. > > > We already have an existing API method for this, which is 'cold-migrate' > (and it does the same that resize, without changing the flavor) > > > ??, 15 ???. 2023??. ? 
19:57, Dan Smith : > > > > > We have users who use 'rebuild' on volume booted servers before nova > > > microversion 2.93, relying on the behavior that it keeps the volume as > > > is. And they would like to keep doing this even after the openstack > > > distro moves to a(n at least) zed base (sometime in the future). > > > > Maybe I'm missing something, but what are the reasons you would want to > > rebuild an instance without ... rebuilding it? > > > > I assume it's because you want to redefine the metadata or name or > > something. There's a reason why those things are not easily mutable > > today, and why we had a lot of discussion on how to make user metadata > > mutable on an existing instance in the last cycle. However, I would > > really suggest that we not override "recreate the thing" to "maybe > > recreate the thing or just update a few fields". Instead, for things we > > think really should be mutable on a server at runtime, we should > > probably just do that. > > > > Imagine if the way you changed permissions recursively was to run 'rm > > -Rf --no-delete-just-change-ownership'. That would be kinda crazy, but > > that is (IMHO) what "recreate but don't just change $name" means to a > > user. > > > > > As a naive user, it seems to me both behaviors make sense. I can > > > easily imagine use cases for rebuild with and without reimaging. > > > > I think that's because you're already familiar with the difference. For > > users not already in that mindset, I think it probably seems very weird > > that rebuild is destructive in one case and not the other. > > > > > Then there are a few hypothetical situations like: > > > a) Rebuild gets a new api feature (in a new microversion) which can > > > never be combined with the do-not-reimage behavior. > > > b) Rebuild may have a bug, whose fix requires a microversion bump. > > > This again can never be combined with the old behavior. > > > > > > What do you think, are these concerns purely theoretical or real? > > > If we would like to keep having rebuild without reimaging, can we rely > > > on the old microversion indefinitely? > > > Alternatively shall we propose and implement a nova spec to explicitly > > > expose the choice in the rebuild api (just to express the idea: osc > > > server rebuild --reimage|--no-reimage)? > > > > > > I'm not opposed to challenge the usecases in a spec, for sure. > > > > I really want to know what the use-case is for "rebuild but not > > really". And also what "rebuild" means to a user if --no-reimage is > > passed. What's being rebuilt? The docs[0] for the API say very clearly: > > > > "This operation recreates the root disk of the server." > > > > That was a lie for volume-backed instances for technical reasons. It was > > a bug, not a feature. > > > > I also strongly believe that if we're going to add a "but not > > really" flag, it needs to apply to volume-backed and regular instances > > identically. Because that's what the change here was doing - unifying > > the behavior for a single API operation. Going the other direction does > > not seem useful to me. > > > > --Dan > > > > [0] > https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
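To make the cold-migrate suggestion above concrete, here is a minimal sketch of driving it through openstacksdk. The cloud name and server id are placeholders, no target host is passed (so the scheduler picks one), and whether non-admin users may call it at all depends on the os_compute_api:os-migrate-server policy in a given deployment:

```python
# Sketch only: cold-migrate a server without reimaging it, via openstacksdk.
# "mycloud" and SERVER_UUID are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")
server = conn.compute.get_server("SERVER_UUID")

# Cold migrate: same flavor, root disk preserved, scheduler chooses the host.
conn.compute.migrate_server(server)

# Like a resize, a cold migration ends up in VERIFY_RESIZE and must be confirmed.
server = conn.compute.wait_for_server(server, status="VERIFY_RESIZE", wait=600)
conn.compute.confirm_server_resize(server)
```

Rebuild, by contrast, is the call that recreates the root disk, which is the distinction being discussed above.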
URL: From hiromu.asahina.az at hco.ntt.co.jp Fri Mar 17 09:57:35 2023 From: hiromu.asahina.az at hco.ntt.co.jp (Hiromu Asahina) Date: Fri, 17 Mar 2023 18:57:35 +0900 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> Message-ID: <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Thank you for your reply. I'd like to decide the time slot for this topic. I just checked PTG schedule [1]. We have the following time slots. Which one is convenient to gether? (I didn't get reply but I listed Barbican, as its cores are almost the same as Keystone) Mon, 27: - 14 (keystone) - 15 (keystone) Tue, 28 - 13 (barbican) - 14 (keystone, ironic) - 15 (keysonte, ironic) - 16 (ironic) Wed, 29 - 13 (ironic) - 14 (keystone, ironic) - 15 (keystone, ironic) - 21 (ironic) Thanks, [1] https://ptg.opendev.org/ptg.html Hiromu Asahina On 2023/02/11 1:41, Jay Faulkner wrote: > I think it's safe to say the Ironic community would be very invested in > such an effort. Let's make sure the time chosen for vPTG with this is such > that Ironic contributors can attend as well. > > Thanks, > Jay Faulkner > Ironic PTL > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > >> Hello Everyone, >> >> Recently, Tacker and Keystone have been working together on a new Keystone >> Middleware that can work with external authentication >> services, such as Keycloak. The code has already been submitted [1], but >> we want to make this middleware a generic plugin that works >> with as many OpenStack services as possible. To that end, we would like to >> hear from other projects with similar use cases >> (especially Ironic and Barbican, which run as standalone services). We >> will make a time slot to discuss this topic at the next vPTG. >> Please contact me if you are interested and available to participate. >> >> [1] https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >> >> -- >> Hiromu Asahina >> >> >> >> > -- ?-------------------------------------? NTT Network Innovation Center Hiromu Asahina ------------------------------------- 3-9-11, Midori-cho, Musashino-shi Tokyo 180-8585, Japan ? Phone: +81-422-59-7008 ? Email: hiromu.asahina.az at hco.ntt.co.jp ?-------------------------------------? From mnasiadka at gmail.com Fri Mar 17 10:52:32 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 17 Mar 2023 11:52:32 +0100 Subject: [kolla] Bobcat vPTG slots and topics Message-ID: Hello, Koalas! Allocated slots for Kolla sessions: 27-30 March 2023: Monday - 13.00 - 17.00 UTC (general, Kolla and Kolla-Ansible) Tuesday - 13.00 - 15.00 UTC (Kolla-Ansible) Tuesday - 15.00 - 17.00 UTC (Operator Hours Kolla) Thursday - 13.00 - 15.00 UTC (Kayobe) Please look at Kolla planning etherpad [1] and fill out topic proposals. Looking forward to meeting you! [1] https://etherpad.opendev.org/p/manila-bobcat-ptg-planning [2] https://ptg.opendev.org/ptg.html Thanks, mnasiadka -------------- next part -------------- An HTML attachment was scrubbed... URL: From manchandavishal143 at gmail.com Fri Mar 17 10:59:37 2023 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Fri, 17 Mar 2023 16:29:37 +0530 Subject: [horizon] Bobcat PTG Schedule Message-ID: Hello Team, Please Find the Schedule for Horizon Bobcat PTG in the eherpad [1]. Feel Free to add the topics you want to discuss in the PTG. 
Don't forget to register for PTG, if not done yet [2]. See you at the PTG! Thanks & Regards, Vishal Manchanda (irc: vishalmanchanda) [1] https://etherpad.opendev.org/p/horizon-bobcat-ptg [2] https://openinfra-ptg.eventbrite.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Fri Mar 17 13:20:17 2023 From: mkopec at redhat.com (Martin Kopec) Date: Fri, 17 Mar 2023 14:20:17 +0100 Subject: [qa][ptg] Virtual Bobcat vPTG Planning Message-ID: Hello everyone, here is [1] our etherpad for the 2023.2 Bobcat PTG. Please, add your topics there if there is anything you would like to discuss / propose ... You can also vote for time slots for our sessions so that they fit your schedule at [2]. We will go most likely with 1-hour slot per day, as they usually fit easier into everyone's schedule. The number of slots will depend on the number of topics proposed in [1]. [1] https://etherpad.opendev.org/p/qa-bobcat-ptg [2] https://framadate.org/sLZppMVkFw2FcEhO Thanks, -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA IM: kopecmartin -------------- next part -------------- An HTML attachment was scrubbed... URL: From hberaud at redhat.com Fri Mar 17 14:19:12 2023 From: hberaud at redhat.com (Herve Beraud) Date: Fri, 17 Mar 2023 15:19:12 +0100 Subject: [release] Release countdown for week R-0, March 20-24 Message-ID: Development Focus ----------------- We will be releasing the coordinated OpenStack Antelope 2023.1 release next week, on March 22. Thanks to everyone involved in the Antelope 2023.1 cycle! We are now in pre-release freeze, so no new deliverable will be created until final release, unless a release-critical regression is spotted. Otherwise, teams attending the virtual PTG should start to plan what they will be discussing there, by creating and filling team etherpads. You can access the list of PTG etherpads at: http://ptg.openstack.org/etherpads.html General Information ------------------- On release day, the release team will produce final versions of deliverables following the cycle-with-rc release model, by re-tagging the commit used for the last RC. A patch doing just that will be proposed. PTLs and release liaisons should watch for that final release patch from the release team. While not required, we would appreciate having an ack from each team before we approve it on the 22nd, so that their approval is included in the metadata that goes onto the signed tag. Upcoming Deadlines & Dates -------------------------- Final Antelope 2023.1 release: March 22 Virtual PTG: March 27-31 -- Herv? Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Fri Mar 17 14:29:31 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Fri, 17 Mar 2023 07:29:31 -0700 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Message-ID: I'm not sure how many Ironic contributors would be the ones to attend a discussion, in part because this is disjointed from the items they need to focus on. It is much more of a "big picture" item for those of us who are leaders in the project. 
I think it would help to understand how much time you expect the discussion to take to determine a path forward and how we can collaborate. Ironic has a huge number of topics we want to discuss during the PTG, and I suspect our team meeting on Monday next week should yield more interest/awareness as well as an amount of time for each topic which will aid us in scheduling. If you can let us know how long, then I think we can figure out when the best day/time will be. Thanks! -Julia On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < hiromu.asahina.az at hco.ntt.co.jp> wrote: > Thank you for your reply. > > I'd like to decide the time slot for this topic. > I just checked PTG schedule [1]. > > We have the following time slots. Which one is convenient to gether? > (I didn't get reply but I listed Barbican, as its cores are almost the > same as Keystone) > > Mon, 27: > > - 14 (keystone) > - 15 (keystone) > > Tue, 28 > > - 13 (barbican) > - 14 (keystone, ironic) > - 15 (keysonte, ironic) > - 16 (ironic) > > Wed, 29 > > - 13 (ironic) > - 14 (keystone, ironic) > - 15 (keystone, ironic) > - 21 (ironic) > > Thanks, > > [1] https://ptg.opendev.org/ptg.html > > Hiromu Asahina > > > On 2023/02/11 1:41, Jay Faulkner wrote: > > I think it's safe to say the Ironic community would be very invested in > > such an effort. Let's make sure the time chosen for vPTG with this is > such > > that Ironic contributors can attend as well. > > > > Thanks, > > Jay Faulkner > > Ironic PTL > > > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > >> Hello Everyone, > >> > >> Recently, Tacker and Keystone have been working together on a new > Keystone > >> Middleware that can work with external authentication > >> services, such as Keycloak. The code has already been submitted [1], but > >> we want to make this middleware a generic plugin that works > >> with as many OpenStack services as possible. To that end, we would like > to > >> hear from other projects with similar use cases > >> (especially Ironic and Barbican, which run as standalone services). We > >> will make a time slot to discuss this topic at the next vPTG. > >> Please contact me if you are interested and available to participate. > >> > >> [1] https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > >> > >> -- > >> Hiromu Asahina > >> > >> > >> > >> > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ihrachys at redhat.com Fri Mar 17 15:07:44 2023 From: ihrachys at redhat.com (Ihar Hrachyshka) Date: Fri, 17 Mar 2023 11:07:44 -0400 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 Message-ID: Hi all, (I've tagged the thread with [ovn] because this question was raised in the context of OVN, but it really is about the intent of neutron stateless SG API.) 
Neutron API supports 'stateless' field for security groups: https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group The API reference doesn't explain the intent of the API, merely walking through the field mechanics, as in "The stateful security group extension (stateful-security-group) adds the stateful field to security groups, allowing users to configure stateful or stateless security groups for ports. The existing security groups will all be considered as stateful. Update of the stateful attribute is allowed when there is no port associated with the security group." The meaning of the API is left for users to deduce. It's customary understood as something like "allowing to bypass connection tracking in the firewall, potentially providing performance and simplicity benefits" (while imposing additional complexity onto rule definitions - the user now has to explicitly define rules for both directions of a duplex connection.) [This is not an official definition, nor it's quoted from a respected source, please don't criticize it. I don't think this is an important point here.] Either way, the definition doesn't explain what should happen with basic network services that a user of Neutron SG API is used to rely on. Specifically, what happens for a port related to a stateless SG when it trying to fetch metadata from 169.254.169.254 (or its IPv6 equivalent), or what happens when it attempts to use SLAAC / DHCPv6 procedure to configure its IPv6 stack. As part of our testing of stateless SG implementation for OVN backend, we've noticed that VMs fail to configure via metadata, or use SLAAC to configure IPv6. metadata: https://bugs.launchpad.net/neutron/+bug/2009053 slaac: https://bugs.launchpad.net/neutron/+bug/2006949 We've noticed that adding explicit SG rules to allow 'returning' communication for 169.254.169.254:80 and RA / NA fixes the problem. I figured that these services are "base" / "basic" and should be provided to ports regardless of the stateful-ness of SG. I proposed patches for this here: metadata series: https://review.opendev.org/q/topic:bug%252F2009053 RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 Discussion in the patch that adjusts the existing stateless SG test scenarios to not create explicit SG rules for metadata and ICMP replies suggests that it's not a given / common understanding that these "base" services should work by default for stateless SGs. See discussion in comments here: https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 While this discussion is happening in the context of OVN, I think it should be resolved in a broader context. Specifically, a decision should be made about what Neutron API "means" by stateless SGs, and how "base" services are supposed to behave. Then backends can act accordingly. There's also an open question of how this should be implemented. Whether Neutron would like to create explicit SG rules visible in API that would allow for the returning traffic and that could be deleted as needed, or whether backends should do it implicitly. We already have "default" egress rules, so there's a precedent here. On the other hand, the egress rules are broad (allowing everything) and there's more rationale to delete them and replace them with tighter filters. In my OVN series, I implement ACLs directly in OVN database, without creating SG rules in Neutron API. 
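To make the current user-visible workaround concrete, the explicit rules mentioned above look roughly like the following openstacksdk sketch. The cloud name is a placeholder, the stateful=False field assumes an SDK and Neutron recent enough to expose the stateful extension, and the exact rule set required may differ per backend and per the bugs linked above:

```python
# Sketch of the per-group workaround: explicitly allow "returning" metadata
# traffic and RA/NA into a stateless security group.
import openstack

conn = openstack.connect(cloud="mycloud")

sg = conn.network.create_security_group(name="stateless-demo", stateful=False)

# Replies from the metadata service (169.254.169.254:80) are not conntracked
# for a stateless group, so they need an explicit ingress allow.
conn.network.create_security_group_rule(
    security_group_id=sg.id, direction="ingress", ethertype="IPv4",
    protocol="tcp", remote_ip_prefix="169.254.169.254/32")

# Router advertisements (ICMPv6 type 134) and neighbor advertisements (type 136)
# so that SLAAC and address resolution keep working.
for icmp_type in (134, 136):
    conn.network.create_security_group_rule(
        security_group_id=sg.id, direction="ingress", ethertype="IPv6",
        protocol="ipv6-icmp", port_range_min=icmp_type)
```

Whether rules of this kind should stay explicit in the API or be handled implicitly by backends is exactly what the questions below are about.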
So, questions for the community to clarify: - whether Neutron API should define behavior of stateless SGs in general, - if so, whether Neutron API should also define behavior of stateless SGs in terms of "base" services like metadata and DHCP, - if so, whether backends should implement the necessary filters themselves, or Neutron will create default SG rules itself. I hope I laid the problem out clearly, let me know if anything needs clarification or explanation. Yours, Ihar From dpawlik at redhat.com Fri Mar 17 15:42:49 2023 From: dpawlik at redhat.com (Daniel Pawlik) Date: Fri, 17 Mar 2023 16:42:49 +0100 Subject: Opensearch service upgrade Message-ID: Hello, We would like to notify you that the Opensearch service [1] would be updated to a newer version on 03.04.2023 at 12:00 PM UTC. This procedure might take a while, depending of the cluster size. During that time, the Opensearch service would not be available. If anyone has any doubts, please reply to the email. Dan [1] - https://opensearch.logs.openstack.org/_dashboards/app/home -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Mar 17 21:19:22 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 17 Mar 2023 14:19:22 -0700 Subject: [ptl][tc][ops][ptg] Operator + Developers interaction (operator-hours) slots in 2023.2 Bobcat PTG Message-ID: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> Hello Everyone/PTL, To improve the interaction/feedback between operators and developers, one of the efforts is to schedule the 'operator-hour' in developers' events. We scheduled the 'operator-hour' in the last vPTG, which had mixed productivity feedback[1]. The TC discussed it and thinks we should continue the 'operator-hour' in March vPTG also. TC will not book the placeholder this time so that slots can be booked in the project room itself, and operators can join developers to have a joint discussion. But at the same time, we need to avoid slot conflict for operators. Every project needs to make sure its 'operator-hour' does not overlap with the related projects (integrated projects which might have common operators, for example. nova, cinder, neutron etc needs to avoid conflict) 'operator-hour'. Guidelines for the project team to book 'operator-hour' --------------------------------------------------------------------------------------- * Request in #openinfra-events IRC channel to register the new track 'operator-hour-'. For example, 'operator-hour-nova' * Once the track is registered, find a spot in your project slots where no other project (which you think is related/integrated project and might have common operators) has already booked their operator-hour. Accordingly, book with the newly registered track 'operator-hour-'. For example, #operator-hour-nova book essex-WedB1 . * Do not book more than one slot (1 hour) so that other projects will have enough slots open to book. If more discussion is needed on anything, it can be continued in project-specific slots. We request that every project book an 'operator hour' slot for operators to join your PTG session. For any query/conflict, ping TC in #openstack-tc or #openinfra-events IRC channel. 
[1] https://etherpad.opendev.org/p/Oct2022_PTGFeedback#L32 -gmann From jamesleong123098 at gmail.com Sat Mar 18 04:49:09 2023 From: jamesleong123098 at gmail.com (James Leong) Date: Fri, 17 Mar 2023 23:49:09 -0500 Subject: [zun] allow zun to get information from blazar database Message-ID: Hi all, I am using kolla-ansible for OpenStack deployment in the yoga version. Is It possible to allow zun to retrieve information from the blazar database in zun_api container? I have tried to include the blazar database connection information in the zun.conf file. However, when I try to use the newly added blazar information, I am getting the following error message. oslo_config.cfg.NoSuchOptError: no such option connectionTest in group [database] It seems like I have to set up some option somewhere else in the code. But I was not able to identify them. Thanks for your help James -------------- next part -------------- An HTML attachment was scrubbed... URL: From udaydikshit2007 at gmail.com Sat Mar 18 05:18:35 2023 From: udaydikshit2007 at gmail.com (Uday Dikshit) Date: Sat, 18 Mar 2023 10:48:35 +0530 Subject: Autoscaling in Kolla Ansible wallaby series Message-ID: Hello Team I am looking forward to have an autoscaling feature on Kolla Ansible wallaby series openstack. I am using Senlin to create cluster, gnocchi for metrics and aodh for alarm. However I am facing issue with aodh alarm state as it gets stuck in the state in which it is. I also found once the load is hiking, the gnocchi metrics get inconsistent. Due to this also the alarm state sticks and do not trigger alarm. To solve this, i relied upon collectd. But collectd service does not push any metric from the Hypervisor to gnocchi. My objective is that, in case of a metric reaching the threshold, alarm should automatically trigger and cluster should scale up or down based on the load. Once the scale up or down task is successfully completed, alarm should return to normal state automatically. Does anybody have any experience with this use case or wanna propose any other solution? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 19 12:23:57 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 19 Mar 2023 19:23:57 +0700 Subject: [openstack][masakari] Ask about Masakari segment Message-ID: Hello guys. I want to ask if I create two Masakari segments then instances will failover on only segment has a group of computes? Because I do test with this scenario, my instance failover on a different segment which has different compute hosts? Do I understand Masakari wrong? Thank you. Regards Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From techstep at gmail.com Sun Mar 19 15:00:30 2023 From: techstep at gmail.com (Rob Jefferson) Date: Sun, 19 Mar 2023 11:00:30 -0400 Subject: [openstack][masakari] Ask about Masakari segment In-Reply-To: References: Message-ID: On Sun, Mar 19, 2023 at 8:32?AM Nguy?n H?u Kh?i wrote: > > Hello guys. > I want to ask if I create two Masakari segments then instances will failover on only segment has a group of computes? Because I do test with this scenario, my instance failover on a different segment which has different compute hosts? Do I understand Masakari wrong? I would check which recovery method you're using. 
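One quick way to check is to ask the instance-ha (masakari) API directly, for example with openstacksdk. This is only a sketch: the cloud name is a placeholder and the attribute and method names are as I understand them from the SDK's instance_ha proxy, so double-check against your SDK version:

```python
# Sketch: list each failover segment, its recovery method, and its hosts.
# Assumes masakari endpoints are in the catalog and "mycloud" is in clouds.yaml.
import openstack

conn = openstack.connect(cloud="mycloud")

for segment in conn.instance_ha.segments():
    print(f"segment {segment.name}: recovery_method={segment.recovery_method}")
    for host in conn.instance_ha.hosts(segment.uuid):
        print(f"  host {host.name} reserved={host.reserved} "
              f"on_maintenance={host.on_maintenance}")
```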
If you have two failover segments, and you set the recovery method to `reserved_host`, the failover will happen on a node in the non-active segment. If you set the method to `rh_priority`, it will try that first, but then attempt fall back on a machine in the active segment. If you want to recover on a host in the *same* segment (possibly the same host), use `auto` or `auto_priority` as the recovery method. Rob From nguyenhuukhoinw at gmail.com Sun Mar 19 15:10:49 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 19 Mar 2023 22:10:49 +0700 Subject: [openstack][masakari] Ask about Masakari segment In-Reply-To: References: Message-ID: Hello., thanks for sharing, I use auto as recovery method but instance recovery on a different segment, I just want to separate segment by different hypervisor hosts. Nguyen Huu Khoi On Sun, Mar 19, 2023 at 10:00?PM Rob Jefferson wrote: > On Sun, Mar 19, 2023 at 8:32?AM Nguy?n H?u Kh?i > wrote: > > > > Hello guys. > > I want to ask if I create two Masakari segments then instances will > failover on only segment has a group of computes? Because I do test with > this scenario, my instance failover on a different segment which has > different compute hosts? Do I understand Masakari wrong? > > I would check which recovery method you're using. > > If you have two failover segments, and you set the recovery method to > `reserved_host`, the failover will happen on a node in the non-active > segment. If you set the method to `rh_priority`, it will try that > first, but then attempt fall back on a machine in the active segment. > > If you want to recover on a host in the *same* segment (possibly the > same host), use `auto` or `auto_priority` as the recovery method. > > Rob > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 19 15:19:08 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 19 Mar 2023 22:19:08 +0700 Subject: [openstack][masakari] Ask about Masakari segment In-Reply-To: References: Message-ID: I have read it https://docs.openstack.org/masakari/xena/install/overview.html I tested with two segments but It dont have my desired result. Segment A: compute01 compute02 compute03 Segment B: compute04 compute05 When I turn off compute01, I hope that instance will recover on compute02 or compute03 but it recovered on segment B. I feel strange. Nguyen Huu Khoi On Sun, Mar 19, 2023 at 10:10?PM Nguy?n H?u Kh?i wrote: > Hello., thanks for sharing, > I use auto as recovery method but instance recovery on a > different segment, I just want to separate segment by different hypervisor > hosts. > Nguyen Huu Khoi > > > On Sun, Mar 19, 2023 at 10:00?PM Rob Jefferson wrote: > >> On Sun, Mar 19, 2023 at 8:32?AM Nguy?n H?u Kh?i >> wrote: >> > >> > Hello guys. >> > I want to ask if I create two Masakari segments then instances will >> failover on only segment has a group of computes? Because I do test with >> this scenario, my instance failover on a different segment which has >> different compute hosts? Do I understand Masakari wrong? >> >> I would check which recovery method you're using. >> >> If you have two failover segments, and you set the recovery method to >> `reserved_host`, the failover will happen on a node in the non-active >> segment. If you set the method to `rh_priority`, it will try that >> first, but then attempt fall back on a machine in the active segment. 
>> >> If you want to recover on a host in the *same* segment (possibly the >> same host), use `auto` or `auto_priority` as the recovery method. >> >> Rob >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Sun Mar 19 16:01:43 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Mon, 20 Mar 2023 01:01:43 +0900 Subject: [storlets] Proposal to make Train/Ussuri/Victoria EOL Message-ID: Hello, Currently we have multiple stable branches open but we haven't seen any backport proposed so far. To reduce number of branches we have to maintain, I'd like to propose retiring old stable branches(train, ussuri and victoria). In case you have any concerns, please let me know. Thank you, Takashi Kajinami -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Sun Mar 19 17:36:18 2023 From: wodel.youchi at gmail.com (wodel youchi) Date: Sun, 19 Mar 2023 18:36:18 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod Message-ID: Hi, I am trying to attach a cinder volume to my pod, but it does not work. The long story, the default version of kubernetes used in Yoga is 1.23.3 fcore35. When creating a default kubernetes cluster we got : > Image: quay.io/k8scsi/csi-attacher:v2.0.0 > Image: quay.io/k8scsi/csi-provisioner:v1.4.0 > Image: quay.io/k8scsi/csi-snapshotter:v1.2.2 > Image: quay.io/k8scsi/csi-resizer:v0.3.0 > Image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.18.0 > Image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0 > Image: > docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.18.1 > Which 1 - Does not correspond to the documentation of Magnum, the documentation states these defaults for yoga : > Image: 10.0.0.165:4000/csi-attacher:v3.3.0 > Image: 10.0.0.165:4000/csi-provisioner:v3.0.0 > Image: 10.0.0.165:4000/csi-snapshotter:v4.2.1 > Image: 10.0.0.165:4000/csi-resizer:v1.3.0 > Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 > Image: 10.0.0.165:4000/csi-node-driver-registrar:v2.4.0 > Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 > (cinder-csi-plugin:v1.23.0 which does not exists anymore) > 2 - And does not work, csi-cinder-controllerplugin keeps crashing. 
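For reference, image overrides like the ones listed above are normally fed in through cluster template labels pointing at the local registry. This is only a sketch of that mechanism: the label names come from the Magnum user guide, the image and network names are placeholders, other required template fields are omitted, and the available labels can differ between releases:

```python
# Sketch: create a cluster template whose CSI images come from a local registry.
# Label names are from the Magnum user guide; values mirror the tags quoted above.
import openstack

conn = openstack.connect(cloud="mycloud")

labels = {
    "container_infra_prefix": "10.0.0.165:4000/",
    "cinder_csi_enabled": "true",
    "cinder_csi_plugin_tag": "v1.26.2",
    "csi_attacher_tag": "v3.3.0",
    "csi_provisioner_tag": "v3.0.0",
    "csi_snapshotter_tag": "v4.2.1",
    "csi_resizer_tag": "v1.3.0",
    "csi_node_driver_registrar_tag": "v2.4.0",
}

conn.container_infrastructure_management.create_cluster_template(
    name="k8s-fcos35-local-registry",   # placeholder name
    coe="kubernetes",
    image_id="fedora-coreos-35",        # placeholder glance image
    external_network_id="public",       # placeholder network
    labels=labels,
)
```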
I tried to use the updates images (using a local registry), but I couldn't attach the cinder-volume, I got : Volumes: html-volume: Type: Cinder (a Persistent Disk resource in OpenStack) VolumeID: f780cb46-ed2a-405d-b901-7201b49c3df1 FSType: ext4 ReadOnly: false SecretRef: nil kube-api-access-slqf4: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: DownwardAPI: true QoS Class: Burstable Node-Selectors: Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- * Warning FailedMount 26m (x10 over 135m) kubelet Unable to attach or mount volumes: unmounted volumes=[html-volume], unattached volumes=[kube-api-access-slqf4 html-volume]: timed out waiting for the condition Warning FailedAttachVolume 3m39s (x40 over 146m) attachdetach-controller AttachVolume.Attach failed for volume "cinder.csi.openstack.org-f780cb46-ed2a-405d-b901-7201b49c3df1" : Attach timeout for volume f780cb46-ed2a-405d-b901-7201b49c3df1 Warning FailedMount 104s (x54 over 146m) kubelet Unable to attach or mount volumes: unmounted volumes=[html-volume], unattached volumes=[html-volume kube-api-access-slqf4]: timed out waiting for the condition* "volume":{"capacity_bytes":5368709120,"volume_id":"7e377933-4ae6-47b7-a685-f484d35153af"}},{"status":{"published_node_ids":["c2531ccf-842e-44d1-85bd-72c811cea199"]},"volume":{"capacity_bytes":1073741824,"volume_id":"f9d5273b-e73d-4b37-8b50-1fcecb910b2a"}}]} I0319 12:36:50.910443 1 connection.go:201] GRPC error: I0319 12:36:56.925658 1 controller.go:210] Started VA processing "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925682 1 csi_handler.go:224] CSIHandler: processing VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925687 1 csi_handler.go:251] Attaching "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925691 1 csi_handler.go:421] Starting attach operation for "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925705 1 csi_handler.go:740] Found NodeID 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode k8intcalnewer-56bgom6jntbm-node-0 I0319 12:36:56.925828 1 csi_handler.go:312] VA finalizer added to "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.925836 1 csi_handler.go:326] NodeID annotation added to "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:56.947632 1 connection.go:193] GRPC call: /csi.v1.Controller/ControllerPublishVolume I0319 12:36:56.947646 1 connection.go:194] GRPC request: {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"f780cb46-ed2a-405d-b901-7201b49c3df1"} I0319 12:36:58.343821 1 connection.go:200] GRPC response: {"publish_context":{"DevicePath":"/dev/vdc"}} I0319 12:36:58.343834 1 connection.go:201] GRPC error: I0319 12:36:58.343841 1 csi_handler.go:264] Attached "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.343848 1 util.go:38] Marking as attached "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" *I0319 12:36:58.348467 1 csi_handler.go:234] Error processing "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": failed to mark as 
attached: volumeattachments.storage.k8s.io "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" is forbidden: User "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot patch resource "volumeattachments/status" in API group "storage.k8s.io " at the cluster scope* I0319 12:36:58.348503 1 controller.go:210] Started VA processing "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348509 1 csi_handler.go:224] CSIHandler: processing VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348513 1 csi_handler.go:251] Attaching "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348517 1 csi_handler.go:421] Starting attach operation for "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348525 1 csi_handler.go:740] Found NodeID 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode k8intcalnewer-56bgom6jntbm-node-0 I0319 12:36:58.348540 1 csi_handler.go:304] VA finalizer is already set on "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348552 1 csi_handler.go:318] NodeID annotation is already set on "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" I0319 12:36:58.348564 1 connection.go:193] GRPC call: /csi.v1.Controller/ControllerPublishVolume I0319 12:36:58.348567 1 connection.go:194] GRPC request: {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext: The I tried even the most updated images : > 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 > 10.0.0.165:4000/csi-provisioner:v3.4.0 > 10.0.0.165:4000/csi-resizer:v1.7.0 > 10.0.0.165:4000/csi-snapshotter:v6.2.1 > 10.0.0.165:4000/csi-attacher:v4.2.0 > 10.0.0.165:4000/csi-node-driver-registrar:v2.7.0 > I had the same problem. Then I tried to use an older version of kubernetes : 1.21.11 with the older images shown above (following this link https://www.roksblog.de/deploy-kubernetes-clusters-in-openstack-within-minutes-with-magnum/), and it worked, the cinder volume was successfully mounted inside my nginx pod. - What is the meaning of the error I am having? - Is it magnum related or kubernetes related or both? Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliver.weinmann at me.com Sun Mar 19 20:31:28 2023 From: oliver.weinmann at me.com (Oliver Weinmann) Date: Sun, 19 Mar 2023 21:31:28 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Mon Mar 20 02:34:11 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 20 Mar 2023 09:34:11 +0700 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: Hello. Are you enable enable_cluster_user_trust? Nguyen Huu Khoi On Mon, Mar 20, 2023 at 12:42?AM wodel youchi wrote: > Hi, > > I am trying to attach a cinder volume to my pod, but it does not work. > > The long story, the default version of kubernetes used in Yoga is 1.23.3 > fcore35. 
When creating a default kubernetes cluster we got : > >> Image: quay.io/k8scsi/csi-attacher:v2.0.0 >> Image: quay.io/k8scsi/csi-provisioner:v1.4.0 >> Image: quay.io/k8scsi/csi-snapshotter:v1.2.2 >> Image: quay.io/k8scsi/csi-resizer:v0.3.0 >> Image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.18.0 >> Image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0 >> Image: >> docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.18.1 >> > > Which > 1 - Does not correspond to the documentation of Magnum, the documentation > states these defaults for yoga : > >> Image: 10.0.0.165:4000/csi-attacher:v3.3.0 >> Image: 10.0.0.165:4000/csi-provisioner:v3.0.0 >> Image: 10.0.0.165:4000/csi-snapshotter:v4.2.1 >> Image: 10.0.0.165:4000/csi-resizer:v1.3.0 >> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >> Image: 10.0.0.165:4000/csi-node-driver-registrar:v2.4.0 >> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >> (cinder-csi-plugin:v1.23.0 which does not exists anymore) >> > > 2 - And does not work, csi-cinder-controllerplugin keeps crashing. > > I tried to use the updates images (using a local registry), but I couldn't > attach the cinder-volume, I got : > > Volumes: > html-volume: > Type: Cinder (a Persistent Disk resource in OpenStack) > VolumeID: f780cb46-ed2a-405d-b901-7201b49c3df1 > FSType: ext4 > ReadOnly: false > SecretRef: nil > kube-api-access-slqf4: > Type: Projected (a volume that contains injected > data from multiple sources) > TokenExpirationSeconds: 3607 > ConfigMapName: kube-root-ca.crt > ConfigMapOptional: > DownwardAPI: true > QoS Class: Burstable > Node-Selectors: > Tolerations: node.kubernetes.io/not-ready:NoExecute > op=Exists for 300s > node.kubernetes.io/unreachable:NoExecute > op=Exists for 300s > Events: > Type Reason Age From > Message > ---- ------ ---- ---- > ------- > > > * Warning FailedMount 26m (x10 over 135m) kubelet > Unable to attach or mount volumes: unmounted volumes=[html-volume], > unattached volumes=[kube-api-access-slqf4 html-volume]: timed out waiting > for the condition Warning FailedAttachVolume 3m39s (x40 over 146m) > attachdetach-controller AttachVolume.Attach failed for volume > "cinder.csi.openstack.org-f780cb46-ed2a-405d-b901-7201b49c3df1" : Attach > timeout for volume f780cb46-ed2a-405d-b901-7201b49c3df1 Warning > FailedMount 104s (x54 over 146m) kubelet Unable > to attach or mount volumes: unmounted volumes=[html-volume], unattached > volumes=[html-volume kube-api-access-slqf4]: timed out waiting for the > condition* > > > > "volume":{"capacity_bytes":5368709120,"volume_id":"7e377933-4ae6-47b7-a685-f484d35153af"}},{"status":{"published_node_ids":["c2531ccf-842e-44d1-85bd-72c811cea199"]},"volume":{"capacity_bytes":1073741824,"volume_id":"f9d5273b-e73d-4b37-8b50-1fcecb910b2a"}}]} > I0319 12:36:50.910443 1 connection.go:201] GRPC error: > I0319 12:36:56.925658 1 controller.go:210] Started VA processing > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925682 1 csi_handler.go:224] CSIHandler: processing > VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925687 1 csi_handler.go:251] Attaching > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925691 1 csi_handler.go:421] Starting attach > operation for > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925705 1 csi_handler.go:740] Found NodeID > 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode > k8intcalnewer-56bgom6jntbm-node-0 > I0319 
12:36:56.925828 1 csi_handler.go:312] VA finalizer added to > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.925836 1 csi_handler.go:326] NodeID annotation added > to "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:56.947632 1 connection.go:193] GRPC call: > /csi.v1.Controller/ControllerPublishVolume > I0319 12:36:56.947646 1 connection.go:194] GRPC request: > {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"f780cb46-ed2a-405d-b901-7201b49c3df1"} > I0319 12:36:58.343821 1 connection.go:200] GRPC response: > {"publish_context":{"DevicePath":"/dev/vdc"}} > I0319 12:36:58.343834 1 connection.go:201] GRPC error: > I0319 12:36:58.343841 1 csi_handler.go:264] Attached > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.343848 1 util.go:38] Marking as attached > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > *I0319 12:36:58.348467 1 csi_handler.go:234] Error processing > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": > failed to mark as attached: volumeattachments.storage.k8s.io > > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" is > forbidden: User > "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot patch > resource "volumeattachments/status" in API group "storage.k8s.io > " at the cluster scope* > I0319 12:36:58.348503 1 controller.go:210] Started VA processing > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348509 1 csi_handler.go:224] CSIHandler: processing > VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348513 1 csi_handler.go:251] Attaching > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348517 1 csi_handler.go:421] Starting attach > operation for > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348525 1 csi_handler.go:740] Found NodeID > 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode > k8intcalnewer-56bgom6jntbm-node-0 > I0319 12:36:58.348540 1 csi_handler.go:304] VA finalizer is already > set on > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348552 1 csi_handler.go:318] NodeID annotation is > already set on > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > I0319 12:36:58.348564 1 connection.go:193] GRPC call: > /csi.v1.Controller/ControllerPublishVolume > I0319 12:36:58.348567 1 connection.go:194] GRPC request: > {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext: > > > The I tried even the most updated images : > >> 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >> 10.0.0.165:4000/csi-provisioner:v3.4.0 >> 10.0.0.165:4000/csi-resizer:v1.7.0 >> 10.0.0.165:4000/csi-snapshotter:v6.2.1 >> 10.0.0.165:4000/csi-attacher:v4.2.0 >> 10.0.0.165:4000/csi-node-driver-registrar:v2.7.0 >> > > I had the same problem. > > Then I tried to use an older version of kubernetes : 1.21.11 with the > older images shown above (following this link > https://www.roksblog.de/deploy-kubernetes-clusters-in-openstack-within-minutes-with-magnum/), > and it worked, the cinder volume was successfully mounted inside my nginx > pod. > > > > - What is the meaning of the error I am having? 
> - Is it magnum related or kubernetes related or both? > > > Regards. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnaud.morin at gmail.com Mon Mar 20 09:17:52 2023 From: arnaud.morin at gmail.com (Arnaud Morin) Date: Mon, 20 Mar 2023 09:17:52 +0000 Subject: [neutron] Extra routes Message-ID: Hey all, When using DVR, is there any way to set extra-routes only on the snat network nodes? I want routes to apply only on north/south communication, not on east/west. I can't find something like this is API. Cheers, From jake.yip at ardc.edu.au Mon Mar 20 10:33:58 2023 From: jake.yip at ardc.edu.au (Jake Yip) Date: Mon, 20 Mar 2023 21:33:58 +1100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: <7014a240-19c1-ae44-795d-0123d3c0b7b1@ardc.edu.au> Hi, I had a feeling the below two issues are due to a missing backport[1] to Yoga. I tried to backport it locally but it failed devstack, so it might take a while before we have something. Regards, Jake [1] https://review.opendev.org/c/openstack/magnum/+/833354 On 20/3/2023 4:36 am, wodel youchi wrote: > Hi, > > I am trying to attach a cinder volume to my pod, but it does not work. > > The long story, the default version of kubernetes used in Yoga is 1.23.3 > fcore35. When creating a default kubernetes cluster we got : > > ? ? Image: quay.io/k8scsi/csi-attacher:v2.0.0 > > ... > > Which > 1 - Does not correspond to the documentation of Magnum, the > documentation states these defaults for yoga : > > ??? Image: 10.0.0.165:4000/csi-attacher:v3.3.0 > > ... > > > > I tried to use the updates images (using a local registry), but I > couldn't attach the cinder-volume, I got : > > ... > *I0319 12:36:58.348467 ? ? ? 1 csi_handler.go:234] Error processing > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": > failed to mark as attached: volumeattachments.storage.k8s.io > > "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" > is forbidden: User > "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot > patch resource "volumeattachments/status" in API group "storage.k8s.io > " at the cluster scope* From wodel.youchi at gmail.com Mon Mar 20 10:40:35 2023 From: wodel.youchi at gmail.com (wodel youchi) Date: Mon, 20 Mar 2023 11:40:35 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: Hi, @Oliver, thanks to you for your blog, it was simple yet it helped me a lot. I am a newbie in the kubernetes world. @Nguyen, yes I do have enable_cluster_user_trust enabled in my globals.yml >From these two threads (https://github.com/rook/rook/issues/6457, https://bugzilla.redhat.com/show_bug.cgi?id=1769693), I think it's an access right problem, a missing access right, what I don't know, is should I add this access right manually? should I update the rest of the images in the cluster, maybe one of them contains the missing right? In the first thread it is said : Solution:- - apiGroups: ["storage.k8s.io"] resources: ["volumeattachments/status"] verbs: ["patch"] need to be added to rbd-external-provisioner-runner and cephfs-external-provisioner-runner ClusterRole In the second thread : csi-external-attacher has changed in 4.3 external attacher needs extra privileges to patch various API objects. Regards. Le lun. 20 mars 2023 ? 03:34, Nguy?n H?u Kh?i a ?crit : > Hello. > Are you enable enable_cluster_user_trust? 
> Nguyen Huu Khoi > > > On Mon, Mar 20, 2023 at 12:42?AM wodel youchi > wrote: > >> Hi, >> >> I am trying to attach a cinder volume to my pod, but it does not work. >> >> The long story, the default version of kubernetes used in Yoga is 1.23.3 >> fcore35. When creating a default kubernetes cluster we got : >> >>> Image: quay.io/k8scsi/csi-attacher:v2.0.0 >>> Image: quay.io/k8scsi/csi-provisioner:v1.4.0 >>> Image: quay.io/k8scsi/csi-snapshotter:v1.2.2 >>> Image: quay.io/k8scsi/csi-resizer:v0.3.0 >>> Image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.18.0 >>> Image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0 >>> Image: >>> docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.18.1 >>> >> >> Which >> 1 - Does not correspond to the documentation of Magnum, the documentation >> states these defaults for yoga : >> >>> Image: 10.0.0.165:4000/csi-attacher:v3.3.0 >>> Image: 10.0.0.165:4000/csi-provisioner:v3.0.0 >>> Image: 10.0.0.165:4000/csi-snapshotter:v4.2.1 >>> Image: 10.0.0.165:4000/csi-resizer:v1.3.0 >>> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>> Image: 10.0.0.165:4000/csi-node-driver-registrar:v2.4.0 >>> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>> (cinder-csi-plugin:v1.23.0 which does not exists anymore) >>> >> >> 2 - And does not work, csi-cinder-controllerplugin keeps crashing. >> >> I tried to use the updates images (using a local registry), but I >> couldn't attach the cinder-volume, I got : >> >> Volumes: >> html-volume: >> Type: Cinder (a Persistent Disk resource in OpenStack) >> VolumeID: f780cb46-ed2a-405d-b901-7201b49c3df1 >> FSType: ext4 >> ReadOnly: false >> SecretRef: nil >> kube-api-access-slqf4: >> Type: Projected (a volume that contains injected >> data from multiple sources) >> TokenExpirationSeconds: 3607 >> ConfigMapName: kube-root-ca.crt >> ConfigMapOptional: >> DownwardAPI: true >> QoS Class: Burstable >> Node-Selectors: >> Tolerations: node.kubernetes.io/not-ready:NoExecute >> op=Exists for 300s >> node.kubernetes.io/unreachable:NoExecute >> op=Exists for 300s >> Events: >> Type Reason Age From >> Message >> ---- ------ ---- ---- >> ------- >> >> >> * Warning FailedMount 26m (x10 over 135m) kubelet >> Unable to attach or mount volumes: unmounted volumes=[html-volume], >> unattached volumes=[kube-api-access-slqf4 html-volume]: timed out waiting >> for the condition Warning FailedAttachVolume 3m39s (x40 over 146m) >> attachdetach-controller AttachVolume.Attach failed for volume >> "cinder.csi.openstack.org-f780cb46-ed2a-405d-b901-7201b49c3df1" : Attach >> timeout for volume f780cb46-ed2a-405d-b901-7201b49c3df1 Warning >> FailedMount 104s (x54 over 146m) kubelet Unable >> to attach or mount volumes: unmounted volumes=[html-volume], unattached >> volumes=[html-volume kube-api-access-slqf4]: timed out waiting for the >> condition* >> >> >> >> "volume":{"capacity_bytes":5368709120,"volume_id":"7e377933-4ae6-47b7-a685-f484d35153af"}},{"status":{"published_node_ids":["c2531ccf-842e-44d1-85bd-72c811cea199"]},"volume":{"capacity_bytes":1073741824,"volume_id":"f9d5273b-e73d-4b37-8b50-1fcecb910b2a"}}]} >> I0319 12:36:50.910443 1 connection.go:201] GRPC error: >> I0319 12:36:56.925658 1 controller.go:210] Started VA processing >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925682 1 csi_handler.go:224] CSIHandler: processing >> VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925687 1 csi_handler.go:251] Attaching >> 
"csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925691 1 csi_handler.go:421] Starting attach >> operation for >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925705 1 csi_handler.go:740] Found NodeID >> 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode >> k8intcalnewer-56bgom6jntbm-node-0 >> I0319 12:36:56.925828 1 csi_handler.go:312] VA finalizer added to >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.925836 1 csi_handler.go:326] NodeID annotation added >> to "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:56.947632 1 connection.go:193] GRPC call: >> /csi.v1.Controller/ControllerPublishVolume >> I0319 12:36:56.947646 1 connection.go:194] GRPC request: >> {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"f780cb46-ed2a-405d-b901-7201b49c3df1"} >> I0319 12:36:58.343821 1 connection.go:200] GRPC response: >> {"publish_context":{"DevicePath":"/dev/vdc"}} >> I0319 12:36:58.343834 1 connection.go:201] GRPC error: >> I0319 12:36:58.343841 1 csi_handler.go:264] Attached >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.343848 1 util.go:38] Marking as attached >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> *I0319 12:36:58.348467 1 csi_handler.go:234] Error processing >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": >> failed to mark as attached: volumeattachments.storage.k8s.io >> >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" is >> forbidden: User >> "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot patch >> resource "volumeattachments/status" in API group "storage.k8s.io >> " at the cluster scope* >> I0319 12:36:58.348503 1 controller.go:210] Started VA processing >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348509 1 csi_handler.go:224] CSIHandler: processing >> VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348513 1 csi_handler.go:251] Attaching >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348517 1 csi_handler.go:421] Starting attach >> operation for >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348525 1 csi_handler.go:740] Found NodeID >> 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode >> k8intcalnewer-56bgom6jntbm-node-0 >> I0319 12:36:58.348540 1 csi_handler.go:304] VA finalizer is already >> set on >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348552 1 csi_handler.go:318] NodeID annotation is >> already set on >> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >> I0319 12:36:58.348564 1 connection.go:193] GRPC call: >> /csi.v1.Controller/ControllerPublishVolume >> I0319 12:36:58.348567 1 connection.go:194] GRPC request: >> {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext: >> >> >> The I tried even the most updated images : >> >>> 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>> 10.0.0.165:4000/csi-provisioner:v3.4.0 >>> 10.0.0.165:4000/csi-resizer:v1.7.0 >>> 10.0.0.165:4000/csi-snapshotter:v6.2.1 >>> 10.0.0.165:4000/csi-attacher:v4.2.0 >>> 
10.0.0.165:4000/csi-node-driver-registrar:v2.7.0 >>> >> >> I had the same problem. >> >> Then I tried to use an older version of kubernetes : 1.21.11 with the >> older images shown above (following this link >> https://www.roksblog.de/deploy-kubernetes-clusters-in-openstack-within-minutes-with-magnum/), >> and it worked, the cinder volume was successfully mounted inside my nginx >> pod. >> >> >> >> - What is the meaning of the error I am having? >> - Is it magnum related or kubernetes related or both? >> >> >> Regards. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Mon Mar 20 10:42:47 2023 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 20 Mar 2023 11:42:47 +0100 Subject: [largescale-sig] Next meeting: March 22, 15utc Message-ID: <2afa2e24-b2b9-4954-ad89-9112a7714f1b@openstack.org> Hi everyone, The Large Scale SIG will be meeting this Wednesday, the Antelope release day, in #openstack-operators on OFTC IRC, at 15UTC, our EU+US-friendly time. Since we currently are in DST hell, you should doublecheck how that UTC time translates locally at: https://www.timeanddate.com/worldclock/fixedtime.html?iso=20230322T15 Feel free to add topics to the agenda: https://etherpad.opendev.org/p/large-scale-sig-meeting Regards, -- Thierry Carrez From pdeore at redhat.com Mon Mar 20 13:45:07 2023 From: pdeore at redhat.com (Pranali Deore) Date: Mon, 20 Mar 2023 19:15:07 +0530 Subject: [Glance][PTG] Glance 2023.2 (Bobcat) vPTG Schedule Message-ID: Hello All, The 2023.2 (Bobcat) virtual PTG is going to start next week and we have created our PTG etherpad [1] and also added day wise topics along with timings we are going to discuss. Kindly let me know if you have any concerns with allotted time slots. Friday is reserved for any unplanned discussions. So please feel free to add your topics if you haven't added yet. As a reminder, these are the time slots for our discussion. Tuesday 28 MARCH 2023 1400 UTC to 1700 UTC Wednesday 29 MARCH 2023 1400 UTC to 1700 UTC Thursday 30 MARCH 2023 1400 UTC to 1700 UTC Friday 31 MARCH 2023 1400 UTC to 1700 UTC NOTE: We have scheduled glance operator hours on Thursday at 16:20 UTC(we can extend it if required), let us know your availability for the same. At the moment we don't have any sessions scheduled on Friday, if there are any last moment request(s)/topic(s) we will discuss that on Friday else we will conclude our PTG on Thursday 30th March. We will be using bluejeans for our discussion, kindly try to use it once before the actual discussion. The meeting URL is mentioned in etherpad [1] and will be the same throughout the PTG. [1]: https://etherpad.opendev.org/p/glance-bobcat-ptg Hope to see you there!! Thanks & Regards, Pranali Deore -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliver.weinmann at me.com Mon Mar 20 14:53:35 2023 From: oliver.weinmann at me.com (Oliver Weinmann) Date: Mon, 20 Mar 2023 15:53:35 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: References: Message-ID: <681B9B94-F589-475F-BB4C-D8959445AFCB@me.com> An HTML attachment was scrubbed... URL: From ts-takahashi at nec.com Mon Mar 20 15:28:25 2023 From: ts-takahashi at nec.com (=?iso-2022-jp?B?VEFLQUhBU0hJIFRPU0hJQUtJKBskQjliNjYhIUlSTEAbKEIp?=) Date: Mon, 20 Mar 2023 15:28:25 +0000 Subject: [openstack-helm][tacker] Message-ID: Hi Openstack-helm team, I?m Toshiaki Takahashi, Tacker?s core developer. 
To my understanding, OpenStack-helm does not currently provide a Tacker?s helm chart. Recently there have been some requests to deploy Tacker with helm, and we would like to proceed with development of it if possible. Do we need to take any action such as to participate meeting of Openstack-helm project and propose our plan? Or, if OpenStack-helm is planning to have a PTG, I?d like to propose the plan at PTG, but is it planned? (I don't see any schedule at the moment). Regards, Toshiaki -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5764 bytes Desc: not available URL: From J.Horstmann at mittwald.de Mon Mar 20 15:33:26 2023 From: J.Horstmann at mittwald.de (Jan Horstmann) Date: Mon, 20 Mar 2023 15:33:26 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: <2315188.ElGaqSPkdT@p1> Message-ID: <14af9155a882030464f4adce1bf71f8ffac74d0f.camel@mittwald.de> On Wed, 2023-03-15 at 16:10 +0000, Felix H?ttner wrote: > Hi, > > > Subject: Re: [neutron] detecting l3-agent readiness > > > > Hi, > > > > Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > > > Hi Mohammed, > > > > > > > Subject: [neutron] detecting l3-agent readiness > > > > > > > > Hi folks, > > > > > > > > I'm working on improving the stability of rollouts when using Kubernetes as a control > > plane, specifically around the L3 agent, it seems that I have not found a clear way to > > detect in the code path where the L3 agent has finished it's initial sync.. > > > > > > > > > > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/- > > /blob/devel/files/startup_wait_for_ns.py > > > Basically we are checking against the neutron api what routers should be on the node and > > then validate that all keepalived processes are up and running. > > > > That would work only for HA routers. If You would also have routers which aren't "ha" this > > method may fail. > > > > Yep, since we only have HA routers that works fine for us. But I guess it should also work for non-ha routers without too much adoption (maybe just check for namespaces instead of keepalived). > Instead of counting processes I have been using the l3 agent's `configurations.routers` field to determine its readiness. From my understanding comparing this number with the number of active routers hosted by the agent should be a good indicator of its sync status. Using two api calls for this is inherently racy, but could be a sufficient workaround for environments with a moderate number of router events. So a simple test snippet for the sync status of all agents could be: ``` import sys import openstack client = openstack.connection.Connection( ... ) l3_agent_synced = [ len([ router for router in client.network.agent_hosted_routers(agent) if router.is_admin_state_up ]) <= client.network.get_agent(agent).configuration["routers"] for agent in client.network.agents() if agent.agent_type == "L3 agent" and (agent.configuration["agent_mode"] == "dvr_snat" or agent.configuration["agent_mode"] == "legacy") ] if not all(l3_agent_synced): sys.exit(1) ``` Please let me know if I am way off with this approach :) > > > > > > > Am I missing it somewhere or is the architecture built in a way that doesn't really > > answer that question? > > > > > > > > > > Adding a option in the neutron api would be a lot nicer. But i guess that also counts > > for l2 and dhcp agents. 
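For the original use case in this thread (gating a rollout on the L3 agent having finished its initial sync), a one-shot check like the snippet above is usually wrapped in a poll loop. Below is a minimal sketch of such a wait, reusing the same openstacksdk calls as the snippet; the timeout and poll interval are arbitrary choices, not anything Neutron prescribes, and credentials are assumed to come from the usual clouds.yaml or environment variables:

```
import sys
import time

import openstack


def l3_agents_synced(client):
    # An agent is treated as synced once its reported
    # configurations["routers"] has caught up with the number of
    # admin-up routers the API says it should be hosting.
    for agent in client.network.agents():
        if agent.agent_type != "L3 agent":
            continue
        if agent.configuration.get("agent_mode") not in ("dvr_snat", "legacy"):
            continue
        expected = sum(
            1
            for router in client.network.agent_hosted_routers(agent)
            if router.is_admin_state_up
        )
        if expected > agent.configuration.get("routers", 0):
            return False
    return True


def wait_for_sync(timeout=600, interval=10):
    client = openstack.connect()
    deadline = time.time() + timeout
    while time.time() < deadline:
        if l3_agents_synced(client):
            return 0
        time.sleep(interval)
    return 1


if __name__ == "__main__":
    sys.exit(wait_for_sync())
```

The same caveat applies as for the original snippet: the two API calls per agent are inherently racy, so this is only a heuristic readiness gate until something like the proposed sync-status reporting lands in the agent API.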
> > > > > > > > > > Thanks > > > > Mohammed > > > > > > > > > > > > -- > > > > Mohammed Naser > > > > VEXXHOST, Inc. > > > > > > -- > > > Felix Huettner > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung > > durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger > > sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. > > Hinweise zum Datenschutz finden Sie hier. > > > > > > > > > -- > > Slawek Kaplonski > > Principal Software Engineer > > Red Hat > > -- > Felix Huettner > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier. -- Jan Horstmann From ts-takahashi at nec.com Mon Mar 20 15:33:35 2023 From: ts-takahashi at nec.com (=?iso-2022-jp?B?VEFLQUhBU0hJIFRPU0hJQUtJKBskQjliNjYhIUlSTEAbKEIp?=) Date: Mon, 20 Mar 2023 15:33:35 +0000 Subject: [openstack-helm][tacker] Proposal to provide HelmChart for Tacker In-Reply-To: References: Message-ID: Sorry, I forgot to put the subject in my email ... From: TAKAHASHI TOSHIAKI(?????) Sent: Tuesday, March 21, 2023 12:28 AM To: openstack-discuss at lists.openstack.org Subject: [openstack-helm][tacker] Hi Openstack-helm team, I?m Toshiaki Takahashi, Tacker?s core developer. To my understanding, OpenStack-helm does not currently provide a Tacker?s helm chart. Recently there have been some requests to deploy Tacker with helm, and we would like to proceed with development of it if possible. Do we need to take any action such as to participate meeting of Openstack-helm project and propose our plan? Or, if OpenStack-helm is planning to have a PTG, I?d like to propose the plan at PTG, but is it planned? (I don't see any schedule at the moment). Regards, Toshiaki -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5764 bytes Desc: not available URL: From mnaser at vexxhost.com Mon Mar 20 15:42:35 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 20 Mar 2023 15:42:35 +0000 Subject: [openstack-helm][tacker] In-Reply-To: References: Message-ID: Hi! I think we?re always open to folks who are looking to contribute new charts. As a matter of fact, we?re in the process of adding Manila support. We?ve got a new PTL so perhaps it might be good to get them up to date on the PTG and reserve ?space?. I added Vladimir to this email thread so they can hopefully provide some input too. Thanks Mohammed From: TAKAHASHI TOSHIAKI(?????) Date: Monday, March 20, 2023 at 11:36 AM To: openstack-discuss at lists.openstack.org Subject: [openstack-helm][tacker] Hi Openstack-helm team, I?m Toshiaki Takahashi, Tacker?s core developer. To my understanding, OpenStack-helm does not currently provide a Tacker?s helm chart. Recently there have been some requests to deploy Tacker with helm, and we would like to proceed with development of it if possible. Do we need to take any action such as to participate meeting of Openstack-helm project and propose our plan? Or, if OpenStack-helm is planning to have a PTG, I?d like to propose the plan at PTG, but is it planned? (I don't see any schedule at the moment). 
Regards, Toshiaki -------------- next part -------------- An HTML attachment was scrubbed... URL: From wodel.youchi at gmail.com Mon Mar 20 15:50:03 2023 From: wodel.youchi at gmail.com (wodel youchi) Date: Mon, 20 Mar 2023 16:50:03 +0100 Subject: [kolla-ansible][yoga][Magnum] Cannot attach cinder volume to pod In-Reply-To: <681B9B94-F589-475F-BB4C-D8959445AFCB@me.com> References: <681B9B94-F589-475F-BB4C-D8959445AFCB@me.com> Message-ID: Hi, As stated by @Jake, there is some code lingering in https://review.opendev.org/c/openstack/magnum/+/833354 but it has not been merged, it does not exist even in the master branch of Magnum. It looks like we have no choice but the 1.21 version of kubernetes for now. Regards. Le lun. 20 mars 2023 ? 15:53, Oliver Weinmann a ?crit : > Hi, > > Good point: > > external attacher needs extra privileges to patch various API objects > > > I remember that I played around with this an tried to apply some yaml files but couldn?t make it work. > > > Cheers, > > Oliver > > > Von meinem iPhone gesendet > > Am 20.03.2023 um 11:43 schrieb wodel youchi : > > ? > Hi, > @Oliver, thanks to you for your blog, it was simple yet it helped me a > lot. I am a newbie in the kubernetes world. > > @Nguyen, yes I do have enable_cluster_user_trust enabled in my globals.yml > > From these two threads (https://github.com/rook/rook/issues/6457, > https://bugzilla.redhat.com/show_bug.cgi?id=1769693), I think it's an > access right problem, a missing access right, what I don't know, is should > I add this access right manually? should I update the rest of the images in > the cluster, maybe one of them contains the missing right? > > In the first thread it is said : > > Solution:- > > - apiGroups: ["storage.k8s.io"] > resources: ["volumeattachments/status"] > verbs: ["patch"] > need to be added to rbd-external-provisioner-runner and > cephfs-external-provisioner-runner ClusterRole > > In the second thread : > > csi-external-attacher has changed in 4.3 > > external attacher needs extra privileges to patch various API objects. > > > > Regards. > > Le lun. 20 mars 2023 ? 03:34, Nguy?n H?u Kh?i > a ?crit : > >> Hello. >> Are you enable enable_cluster_user_trust? >> Nguyen Huu Khoi >> >> >> On Mon, Mar 20, 2023 at 12:42?AM wodel youchi >> wrote: >> >>> Hi, >>> >>> I am trying to attach a cinder volume to my pod, but it does not work. >>> >>> The long story, the default version of kubernetes used in Yoga is 1.23.3 >>> fcore35. 
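For reference, the "Solution" rule quoted above corresponds to the 'cannot patch resource "volumeattachments/status"' failure in the attacher log. A hedged sketch of adding that rule with the Kubernetes Python client follows; the ClusterRole name used here is an assumption (check which role is actually bound to the csi-cinder-controller-sa service account in your cluster), and editing the role with kubectl achieves the same result:

```
# Sketch only: grant the external-attacher permission to patch
# volumeattachments/status, per the rule quoted above.
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

# Assumed name; verify with: kubectl get clusterrolebinding
role_name = "csi-attacher-role"
role = rbac.read_cluster_role(role_name)

missing_rule = client.V1PolicyRule(
    api_groups=["storage.k8s.io"],
    resources=["volumeattachments/status"],
    verbs=["patch"],
)

rules = role.rules or []
already_present = any(
    r.resources and "volumeattachments/status" in r.resources for r in rules
)
if not already_present:
    rules.append(missing_rule)
    role.rules = rules
    rbac.patch_cluster_role(role_name, role)
```
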
When creating a default kubernetes cluster we got : >>> >>>> Image: quay.io/k8scsi/csi-attacher:v2.0.0 >>>> Image: quay.io/k8scsi/csi-provisioner:v1.4.0 >>>> Image: quay.io/k8scsi/csi-snapshotter:v1.2.2 >>>> Image: quay.io/k8scsi/csi-resizer:v0.3.0 >>>> Image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.18.0 >>>> Image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0 >>>> Image: >>>> docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.18.1 >>>> >>> >>> Which >>> 1 - Does not correspond to the documentation of Magnum, the >>> documentation states these defaults for yoga : >>> >>>> Image: 10.0.0.165:4000/csi-attacher:v3.3.0 >>>> Image: 10.0.0.165:4000/csi-provisioner:v3.0.0 >>>> Image: 10.0.0.165:4000/csi-snapshotter:v4.2.1 >>>> Image: 10.0.0.165:4000/csi-resizer:v1.3.0 >>>> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>>> Image: 10.0.0.165:4000/csi-node-driver-registrar:v2.4.0 >>>> Image: 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>>> (cinder-csi-plugin:v1.23.0 which does not exists anymore) >>>> >>> >>> 2 - And does not work, csi-cinder-controllerplugin keeps crashing. >>> >>> I tried to use the updates images (using a local registry), but I >>> couldn't attach the cinder-volume, I got : >>> >>> Volumes: >>> html-volume: >>> Type: Cinder (a Persistent Disk resource in OpenStack) >>> VolumeID: f780cb46-ed2a-405d-b901-7201b49c3df1 >>> FSType: ext4 >>> ReadOnly: false >>> SecretRef: nil >>> kube-api-access-slqf4: >>> Type: Projected (a volume that contains injected >>> data from multiple sources) >>> TokenExpirationSeconds: 3607 >>> ConfigMapName: kube-root-ca.crt >>> ConfigMapOptional: >>> DownwardAPI: true >>> QoS Class: Burstable >>> Node-Selectors: >>> Tolerations: node.kubernetes.io/not-ready:NoExecute >>> op=Exists for 300s >>> node.kubernetes.io/unreachable:NoExecute >>> op=Exists for 300s >>> Events: >>> Type Reason Age From >>> Message >>> ---- ------ ---- ---- >>> ------- >>> >>> >>> * Warning FailedMount 26m (x10 over 135m) kubelet >>> Unable to attach or mount volumes: unmounted volumes=[html-volume], >>> unattached volumes=[kube-api-access-slqf4 html-volume]: timed out waiting >>> for the condition Warning FailedAttachVolume 3m39s (x40 over 146m) >>> attachdetach-controller AttachVolume.Attach failed for volume >>> "cinder.csi.openstack.org-f780cb46-ed2a-405d-b901-7201b49c3df1" : Attach >>> timeout for volume f780cb46-ed2a-405d-b901-7201b49c3df1 Warning >>> FailedMount 104s (x54 over 146m) kubelet Unable >>> to attach or mount volumes: unmounted volumes=[html-volume], unattached >>> volumes=[html-volume kube-api-access-slqf4]: timed out waiting for the >>> condition* >>> >>> >>> >>> "volume":{"capacity_bytes":5368709120,"volume_id":"7e377933-4ae6-47b7-a685-f484d35153af"}},{"status":{"published_node_ids":["c2531ccf-842e-44d1-85bd-72c811cea199"]},"volume":{"capacity_bytes":1073741824,"volume_id":"f9d5273b-e73d-4b37-8b50-1fcecb910b2a"}}]} >>> I0319 12:36:50.910443 1 connection.go:201] GRPC error: >>> I0319 12:36:56.925658 1 controller.go:210] Started VA processing >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925682 1 csi_handler.go:224] CSIHandler: processing >>> VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925687 1 csi_handler.go:251] Attaching >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925691 1 csi_handler.go:421] Starting attach >>> operation for >>> 
"csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925705 1 csi_handler.go:740] Found NodeID >>> 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode >>> k8intcalnewer-56bgom6jntbm-node-0 >>> I0319 12:36:56.925828 1 csi_handler.go:312] VA finalizer added to >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.925836 1 csi_handler.go:326] NodeID annotation >>> added to >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:56.947632 1 connection.go:193] GRPC call: >>> /csi.v1.Controller/ControllerPublishVolume >>> I0319 12:36:56.947646 1 connection.go:194] GRPC request: >>> {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"f780cb46-ed2a-405d-b901-7201b49c3df1"} >>> I0319 12:36:58.343821 1 connection.go:200] GRPC response: >>> {"publish_context":{"DevicePath":"/dev/vdc"}} >>> I0319 12:36:58.343834 1 connection.go:201] GRPC error: >>> I0319 12:36:58.343841 1 csi_handler.go:264] Attached >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.343848 1 util.go:38] Marking as attached >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> *I0319 12:36:58.348467 1 csi_handler.go:234] Error processing >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7": >>> failed to mark as attached: volumeattachments.storage.k8s.io >>> >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" is >>> forbidden: User >>> "system:serviceaccount:kube-system:csi-cinder-controller-sa" cannot patch >>> resource "volumeattachments/status" in API group "storage.k8s.io >>> " at the cluster scope* >>> I0319 12:36:58.348503 1 controller.go:210] Started VA processing >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348509 1 csi_handler.go:224] CSIHandler: processing >>> VA "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348513 1 csi_handler.go:251] Attaching >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348517 1 csi_handler.go:421] Starting attach >>> operation for >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348525 1 csi_handler.go:740] Found NodeID >>> 472bf42d-5ce0-4751-8fec-57bede0024d6 in CSINode >>> k8intcalnewer-56bgom6jntbm-node-0 >>> I0319 12:36:58.348540 1 csi_handler.go:304] VA finalizer is >>> already set on >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348552 1 csi_handler.go:318] NodeID annotation is >>> already set on >>> "csi-9f81405424dc2cf210b6465f8b649ef20f85024f169b660fab235c03f64753b7" >>> I0319 12:36:58.348564 1 connection.go:193] GRPC call: >>> /csi.v1.Controller/ControllerPublishVolume >>> I0319 12:36:58.348567 1 connection.go:194] GRPC request: >>> {"node_id":"472bf42d-5ce0-4751-8fec-57bede0024d6","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext: >>> >>> >>> The I tried even the most updated images : >>> >>>> 10.0.0.165:4000/cinder-csi-plugin:v1.26.2 >>>> 10.0.0.165:4000/csi-provisioner:v3.4.0 >>>> 10.0.0.165:4000/csi-resizer:v1.7.0 >>>> 10.0.0.165:4000/csi-snapshotter:v6.2.1 >>>> 10.0.0.165:4000/csi-attacher:v4.2.0 >>>> 10.0.0.165:4000/csi-node-driver-registrar:v2.7.0 >>>> >>> >>> I had the same problem. 
>>> >>> Then I tried to use an older version of kubernetes : 1.21.11 with the >>> older images shown above (following this link >>> https://www.roksblog.de/deploy-kubernetes-clusters-in-openstack-within-minutes-with-magnum/), >>> and it worked, the cinder volume was successfully mounted inside my nginx >>> pod. >>> >>> >>> >>> - What is the meaning of the error I am having? >>> - Is it magnum related or kubernetes related or both? >>> >>> >>> Regards. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkopec at redhat.com Mon Mar 20 16:02:55 2023 From: mkopec at redhat.com (Martin Kopec) Date: Mon, 20 Mar 2023 17:02:55 +0100 Subject: [interop][ptg] Virtual Bobcat vPTG Planning Message-ID: Hello everyone, here is [1] our etherpad for the 2023.2 Bobcat PTG. Please, add your topics there if there is anything you would like to discuss / propose ... You can also vote for time slots for our session(s), so that they fit your schedule, at [2]. If you have any questions, feel free to reach out to me. [1] https://etherpad.opendev.org/p/bobcat-ptg-interop [2] https://framadate.org/2IPXOCvJNNoSGqHu Thanks, -- Martin Kopec Senior Software Quality Engineer Red Hat EMEA IM: kopecmartin -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Mon Mar 20 16:03:07 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Mon, 20 Mar 2023 17:03:07 +0100 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: Message-ID: <3840757.STTH5IQzZg@p1> Hi, Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > Hi all, > > (I've tagged the thread with [ovn] because this question was raised in > the context of OVN, but it really is about the intent of neutron > stateless SG API.) > > Neutron API supports 'stateless' field for security groups: > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > > The API reference doesn't explain the intent of the API, merely > walking through the field mechanics, as in > > "The stateful security group extension (stateful-security-group) adds > the stateful field to security groups, allowing users to configure > stateful or stateless security groups for ports. The existing security > groups will all be considered as stateful. Update of the stateful > attribute is allowed when there is no port associated with the > security group." > > The meaning of the API is left for users to deduce. It's customary > understood as something like > > "allowing to bypass connection tracking in the firewall, potentially > providing performance and simplicity benefits" (while imposing > additional complexity onto rule definitions - the user now has to > explicitly define rules for both directions of a duplex connection.) > [This is not an official definition, nor it's quoted from a respected > source, please don't criticize it. I don't think this is an important > point here.] > > Either way, the definition doesn't explain what should happen with > basic network services that a user of Neutron SG API is used to rely > on. Specifically, what happens for a port related to a stateless SG > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > procedure to configure its IPv6 stack. 
> > As part of our testing of stateless SG implementation for OVN backend, > we've noticed that VMs fail to configure via metadata, or use SLAAC to > configure IPv6. > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 > > We've noticed that adding explicit SG rules to allow 'returning' > communication for 169.254.169.254:80 and RA / NA fixes the problem. > > I figured that these services are "base" / "basic" and should be > provided to ports regardless of the stateful-ness of SG. I proposed > patches for this here: > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 > > Discussion in the patch that adjusts the existing stateless SG test > scenarios to not create explicit SG rules for metadata and ICMP > replies suggests that it's not a given / common understanding that > these "base" services should work by default for stateless SGs. > > See discussion in comments here: > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 > > While this discussion is happening in the context of OVN, I think it > should be resolved in a broader context. Specifically, a decision > should be made about what Neutron API "means" by stateless SGs, and > how "base" services are supposed to behave. Then backends can act > accordingly. > > There's also an open question of how this should be implemented. > Whether Neutron would like to create explicit SG rules visible in API > that would allow for the returning traffic and that could be deleted > as needed, or whether backends should do it implicitly. We already > have "default" egress rules, so there's a precedent here. On the other > hand, the egress rules are broad (allowing everything) and there's > more rationale to delete them and replace them with tighter filters. > In my OVN series, I implement ACLs directly in OVN database, without > creating SG rules in Neutron API. > > So, questions for the community to clarify: > - whether Neutron API should define behavior of stateless SGs in general, > - if so, whether Neutron API should also define behavior of stateless > SGs in terms of "base" services like metadata and DHCP, > - if so, whether backends should implement the necessary filters > themselves, or Neutron will create default SG rules itself. I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user. We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it. If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there? 
> > I hope I laid the problem out clearly, let me know if anything needs > clarification or explanation. Yes :) At least for me. > > Yours, > Ihar > > > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. URL: From ralonsoh at redhat.com Mon Mar 20 16:09:12 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 20 Mar 2023 17:09:12 +0100 Subject: [neutron] detecting l3-agent readiness In-Reply-To: <14af9155a882030464f4adce1bf71f8ffac74d0f.camel@mittwald.de> References: <2315188.ElGaqSPkdT@p1> <14af9155a882030464f4adce1bf71f8ffac74d0f.camel@mittwald.de> Message-ID: Hello: I think I'm repeating myself here but we have two approaches to solve this problem: * The easiest one, at least for the L3 agent, is to report an INFO level log before and after the full sync. That could be parsed by any tool to detect that. You can propose a patch to the Neutron repository. * https://bugs.launchpad.net/neutron/+bug/2011422: a more elaborated way to report the agent status. That could provide the start flag, the revived flag, the sync processing flag and many other ones that could be defined only for this specific agent. Regards. On Mon, Mar 20, 2023 at 4:33?PM Jan Horstmann wrote: > On Wed, 2023-03-15 at 16:10 +0000, Felix H?ttner wrote: > > Hi, > > > > > Subject: Re: [neutron] detecting l3-agent readiness > > > > > > Hi, > > > > > > Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > > > > Hi Mohammed, > > > > > > > > > Subject: [neutron] detecting l3-agent readiness > > > > > > > > > > Hi folks, > > > > > > > > > > I'm working on improving the stability of rollouts when using > Kubernetes as a control > > > plane, specifically around the L3 agent, it seems that I have not > found a clear way to > > > detect in the code path where the L3 agent has finished it's initial > sync.. > > > > > > > > > > > > > We build such a solution here: > https://gitlab.com/yaook/images/neutron-l3-agent/- > > > /blob/devel/files/startup_wait_for_ns.py > > > > Basically we are checking against the neutron api what routers > should be on the node and > > > then validate that all keepalived processes are up and running. > > > > > > That would work only for HA routers. If You would also have routers > which aren't "ha" this > > > method may fail. > > > > > > > Yep, since we only have HA routers that works fine for us. But I guess > it should also work for non-ha routers without too much adoption (maybe > just check for namespaces instead of keepalived). > > > > Instead of counting processes I have been using the l3 agent's > `configurations.routers` field to determine its readiness. > From my understanding comparing this number with the number of active > routers hosted by the agent should be a good indicator of its sync > status. > Using two api calls for this is inherently racy, but could be a > sufficient workaround for environments with a moderate number of > router events. > So a simple test snippet for the sync status of all agents could be: > > ``` > import sys > import openstack > client = openstack.connection.Connection( > ... 
> ) > l3_agent_synced = [ > len([ > router > for router in client.network.agent_hosted_routers(agent) > if router.is_admin_state_up > ]) <= client.network.get_agent(agent).configuration["routers"] > for agent in client.network.agents() > if agent.agent_type == "L3 agent" > and (agent.configuration["agent_mode"] == "dvr_snat" > or agent.configuration["agent_mode"] == "legacy") > ] > if not all(l3_agent_synced): > sys.exit(1) > ``` > > Please let me know if I am way off with this approach :) > > > > > > > > > > > Am I missing it somewhere or is the architecture built in a way > that doesn't really > > > answer that question? > > > > > > > > > > > > > Adding a option in the neutron api would be a lot nicer. But i guess > that also counts > > > for l2 and dhcp agents. > > > > > > > > > > > > > Thanks > > > > > Mohammed > > > > > > > > > > > > > > > -- > > > > > Mohammed Naser > > > > > VEXXHOST, Inc. > > > > > > > > -- > > > > Felix Huettner > > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur > f?r die Verwertung > > > durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der > vorgesehene Empf?nger > > > sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und > l?schen diese E Mail. > > > Hinweise zum Datenschutz finden Sie hier< > https://www.datenschutz.schwarz>. > > > > > > > > > > > > > -- > > > Slawek Kaplonski > > > Principal Software Engineer > > > Red Hat > > > > -- > > Felix Huettner > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r > die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht > der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich > in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie > hier. > > -- > Jan Horstmann > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ts-takahashi at nec.com Mon Mar 20 16:15:01 2023 From: ts-takahashi at nec.com (=?iso-2022-jp?B?VEFLQUhBU0hJIFRPU0hJQUtJKBskQjliNjYhIUlSTEAbKEIp?=) Date: Mon, 20 Mar 2023 16:15:01 +0000 Subject: [openstack-helm][tacker] Proposal to provide HelmChart for Tacker In-Reply-To: References: Message-ID: Hi Mohammed, Thank you for your quick response. If openstack-helm team will reserve PTG time and I can participate at the time, I?ll participate it. (My timezone is Asia Tokyo, and Tacker will have PTG at 6-8 UTC on 28th, 29th and 30th.) Anyway, I?d like to proceed with development for Tacker?s helm chart! Regards, Toshiaki From: Mohammed Naser Sent: Tuesday, March 21, 2023 12:43 AM To: TAKAHASHI TOSHIAKI(?????) ; openstack-discuss at lists.openstack.org; kozhukalov at gmail.com Subject: Re: [openstack-helm][tacker] Hi! I think we?re always open to folks who are looking to contribute new charts. As a matter of fact, we?re in the process of adding Manila support. We?ve got a new PTL so perhaps it might be good to get them up to date on the PTG and reserve ?space?. I added Vladimir to this email thread so they can hopefully provide some input too. Thanks Mohammed From: TAKAHASHI TOSHIAKI(?????) > Date: Monday, March 20, 2023 at 11:36 AM To: openstack-discuss at lists.openstack.org > Subject: [openstack-helm][tacker] Hi Openstack-helm team, I?m Toshiaki Takahashi, Tacker?s core developer. To my understanding, OpenStack-helm does not currently provide a Tacker?s helm chart. Recently there have been some requests to deploy Tacker with helm, and we would like to proceed with development of it if possible. 
Do we need to take any action such as to participate meeting of Openstack-helm project and propose our plan? Or, if OpenStack-helm is planning to have a PTG, I?d like to propose the plan at PTG, but is it planned? (I don't see any schedule at the moment). Regards, Toshiaki -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5764 bytes Desc: not available URL: From haleyb.dev at gmail.com Mon Mar 20 18:46:59 2023 From: haleyb.dev at gmail.com (Brian Haley) Date: Mon, 20 Mar 2023 14:46:59 -0400 Subject: [neutron] Bug deputy report for week of March 13th Message-ID: <2c8cea57-7d58-4a2f-f416-be16a31dbea0@gmail.com> Hi, I was Neutron bug deputy last week. Below is a short summary about the reported bugs. -Brian High bugs --------- * https://bugs.launchpad.net/neutron/+bug/2011573 - [ovn-octavia-provider] Job pep8 failing due to bandit new lint rule - https://review.opendev.org/c/openstack/ovn-octavia-provider/+/877357 * https://bugs.launchpad.net/neutron/+bug/2011590 - Startup times for large OVN dbs is greatly increased by frozen_row() calls - https://review.opendev.org/c/openstack/neutron/+/877383 * https://bugs.launchpad.net/neutron/+bug/2011600 - functional test_get_datapath_id fails with neutron.common.utils.WaitTimeout: Timed out after 5 seconds - needs owner * https://bugs.launchpad.net/neutron/+bug/2011800 - ovn qos extension: update router does not remove no longer present qos rules - https://review.opendev.org/c/openstack/neutron/+/877603 (test only) - Needs code fix still Medium bugs ----------- * https://bugs.launchpad.net/neutron/+bug/2011377 - test_agent_resync_on_non_existing_bridge failing intermittently sp - https://review.opendev.org/c/openstack/neutron/+/877535 * https://bugs.launchpad.net/neutron/+bug/2011724 - [OVN] Method "create_metadata_port" should pass the "fixed_ips" when creating the port - https://review.opendev.org/c/openstack/neutron/+/877528 * https://bugs.launchpad.net/neutron/+bug/2012104 - Neutron picking incorrect ovn records - Possibly related to https://bugs.launchpad.net/neutron/+bug/1951149 - OVN chassis deleted issue - does user need to manually clean-up? 
Low bugs -------- * https://bugs.launchpad.net/neutron/+bug/2011687 - O flag is not enabled when ipv6_ra_mode is dhcpv6-stateful - O=1 is actually unnecessary in this case with M=1 based on the RFC, will need to figure out how to update code and/or docs to be in sync - https://review.opendev.org/c/openstack/neutron/+/877601 Misc bugs --------- * https://bugs.launchpad.net/neutron/+bug/2012144 - [OVN] adding/removing floating IPs neutron server errors about binding port - Mech driver shows ovn-bridge-mappings=[], but ovn-sbctl has them - Asked for more information Wishlist bugs ------------- * https://bugs.launchpad.net/neutron/+bug/2011422 - [RFE] The Neutron agents should report the sync process status * https://bugs.launchpad.net/neutron/+bug/2012069 - [OVN] Flooding issue on provider networks with disabled port security - https://review.opendev.org/c/openstack/neutron/+/877675 From swogatpradhan22 at gmail.com Mon Mar 20 16:56:20 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Mon, 20 Mar 2023 22:26:20 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Jhon, I checked in the ceph od dcn02, I can see the images created after importing from the central site. But launching an instance normally fails as it takes a long time for the volume to get created. When launching an instance from volume the instance is getting created properly without any errors. I tried to cache images in nova using https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html but getting checksum failed error. With regards, Swogat Pradhan On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: > On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan > wrote: > > > > Update: After restarting the nova services on the controller and running > the deploy script on the edge site, I was able to launch the VM from volume. > > > > Right now the instance creation is failing as the block device creation > is stuck in creating state, it is taking more than 10 mins for the volume > to be created, whereas the image has already been imported to the edge > glance. > > Try following this document and making the same observations in your > environment for AZs and their local ceph cluster. > > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites > > On a DCN site if you run a command like this: > > $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring > /etc/ceph/dcn0.client.admin.keyring > $ rbd --cluster dcn0 -p volumes ls -l > NAME SIZE PARENT > FMT PROT LOCK > volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB > images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl > $ > > Then, you should see the parent of the volume is the image which is on > the same local ceph cluster. > > I wonder if something is misconfigured and thus you're encountering > the streaming behavior described here: > > Ideally all images should reside in the central Glance and be copied > to DCN sites before instances of those images are booted on DCN sites. > If an image is not copied to a DCN site before it is booted, then the > image will be streamed to the DCN site and then the image will boot as > an instance. 
This happens because Glance at the DCN site has access to > the images store at the Central ceph cluster. Though the booting of > the image will take time because it has not been copied in advance, > this is still preferable to failing to boot the image. > > You can also exec into the cinder container at the DCN site and > confirm it's using it's local ceph cluster. > > John > > > > > I will try and create a new fresh image and test again then update. > > > > With regards, > > Swogat Pradhan > > > > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >> > >> Update: > >> In the hypervisor list the compute node state is showing down. > >> > >> > >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>> > >>> Hi Brendan, > >>> Now i have deployed another site where i have used 2 linux bonds > network template for both 3 compute nodes and 3 ceph nodes. > >>> The bonding options is set to mode=802.3ad (lacp=active). > >>> I used a cirros image to launch instance but the instance timed out so > i waited for the volume to be created. > >>> Once the volume was created i tried launching the instance from the > volume and still the instance is stuck in spawning state. > >>> > >>> Here is the nova-compute log: > >>> > >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep > daemon starting > >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep > process running with uid/gid: 0/0 > >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep > process running with capabilities (eff/prm/inh): > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep > daemon running as pid 185437 > >>> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error > in _get_host_uuid: Unexpected error while running command. > >>> Command: blkid overlay -s UUID -o value > >>> Exit code: 2 > >>> Stdout: '' > >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > Unexpected error while running command. > >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > 450b749c-a10a-4308-80a9-3b8020fee758] Creating image > >>> > >>> It is stuck in creating image, do i need to run the template mentioned > here ?: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > >>> > >>> The volume is already created and i do not understand why the instance > is stuck in spawning state. > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> > >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard > wrote: > >>>> > >>>> Does your environment use different network interfaces for each of > the networks? Or does it have a bond with everything on it? > >>>> > >>>> One issue I have seen before is that when launching instances, there > is a lot of network traffic between nodes as the hypervisor needs to > download the image from Glance. Along with various other services sending > normal network traffic, it can be enough to cause issues if everything is > running over a single 1Gbe interface. > >>>> > >>>> I have seen the same situation in fact when using a single > active/backup bond on 1Gbe nics. 
It?s worth checking the network traffic > while you try to spawn the instance to see if you?re dropping packets. In > the situation I described, there were dropped packets which resulted in a > loss of communication between nova_compute and RMQ, so the node appeared > offline. You should also confirm that nova_compute is being disconnected in > the nova_compute logs if you tail them on the Hypervisor while spawning the > instance. > >>>> > >>>> In my case, changing from active/backup to LACP helped. So, based on > that experience, from my perspective, is certainly sounds like some kind of > network issue. > >>>> > >>>> Regards, > >>>> > >>>> Brendan Shephard > >>>> Senior Software Engineer > >>>> Red Hat Australia > >>>> > >>>> > >>>> > >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: > >>>> > >>>> Hi, > >>>> > >>>> I tried to help someone with a similar issue some time ago in this > thread: > >>>> > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > >>>> > >>>> But apparently a neutron reinstallation fixed it for that user, not > sure if that could apply here. But is it possible that your nova and > neutron versions are different between central and edge site? Have you > restarted nova and neutron services on the compute nodes after > installation? Have you debug logs of nova-conductor and maybe nova-compute? > Maybe they can help narrow down the issue. > >>>> If there isn't any additional information in the debug logs I > probably would start "tearing down" rabbitmq. I didn't have to do that in a > production system yet so be careful. I can think of two routes: > >>>> > >>>> - Either remove queues, exchanges etc. while rabbit is running, this > will most likely impact client IO depending on your load. Check out the > rabbitmqctl commands. > >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all > nodes and restart rabbitmq so the exchanges, queues etc. rebuild. > >>>> > >>>> I can imagine that the failed reply "survives" while being replicated > across the rabbit nodes. But I don't really know the rabbit internals too > well, so maybe someone else can chime in here and give a better advice. > >>>> > >>>> Regards, > >>>> Eugen > >>>> > >>>> Zitat von Swogat Pradhan : > >>>> > >>>> Hi, > >>>> Can someone please help me out on this issue? > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>> wrote: > >>>> > >>>> Hi > >>>> I don't see any major packet loss. > >>>> It seems the problem is somewhere in rabbitmq maybe but not due to > packet > >>>> loss. > >>>> > >>>> with regards, > >>>> Swogat Pradhan > >>>> > >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>> wrote: > >>>> > >>>> Hi, > >>>> Yes the MTU is the same as the default '1500'. > >>>> Generally I haven't seen any packet loss, but never checked when > >>>> launching the instance. > >>>> I will check that and come back. > >>>> But everytime i launch an instance the instance gets stuck at spawning > >>>> state and there the hypervisor becomes down, so not sure if packet > loss > >>>> causes this. > >>>> > >>>> With regards, > >>>> Swogat pradhan > >>>> > >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: > >>>> > >>>> One more thing coming to mind is MTU size. Are they identical between > >>>> central and edge site? Do you see packet loss through the tunnel? 
> >>>> > >>>> Zitat von Swogat Pradhan : > >>>> > >>>> > Hi Eugen, > >>>> > Request you to please add my email either on 'to' or 'cc' as i am > not > >>>> > getting email's from you. > >>>> > Coming to the issue: > >>>> > > >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies > -p > >>>> / > >>>> > Listing policies for vhost "/" ... > >>>> > vhost name pattern apply-to definition priority > >>>> > / ha-all ^(?!amq\.).* queues > >>>> > > >>>> > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > >>>> > > >>>> > I have the edge site compute nodes up, it only goes down when i am > >>>> trying > >>>> > to launch an instance and the instance comes to a spawning state and > >>>> then > >>>> > gets stuck. > >>>> > > >>>> > I have a tunnel setup between the central and the edge sites. > >>>> > > >>>> > With regards, > >>>> > Swogat Pradhan > >>>> > > >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> > >>>> > wrote: > >>>> > > >>>> >> Hi Eugen, > >>>> >> For some reason i am not getting your email to me directly, i am > >>>> checking > >>>> >> the email digest and there i am able to find your reply. > >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >>>> >> Yes, these logs are from the time when the issue occurred. > >>>> >> > >>>> >> *Note: i am able to create vm's and perform other activities in the > >>>> >> central site, only facing this issue in the edge site.* > >>>> >> > >>>> >> With regards, > >>>> >> Swogat Pradhan > >>>> >> > >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> > >>>> >> wrote: > >>>> >> > >>>> >>> Hi Eugen, > >>>> >>> Thanks for your response. > >>>> >>> I have actually a 4 controller setup so here are the details: > >>>> >>> > >>>> >>> *PCS Status:* > >>>> >>> * Container bundle set: rabbitmq-bundle [ > >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > >>>> Started > >>>> >>> overcloud-controller-no-ceph-3 > >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > >>>> Started > >>>> >>> overcloud-controller-2 > >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > >>>> Started > >>>> >>> overcloud-controller-1 > >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > >>>> Started > >>>> >>> overcloud-controller-0 > >>>> >>> > >>>> >>> I have tried restarting the bundle multiple times but the issue is > >>>> still > >>>> >>> present. > >>>> >>> > >>>> >>> *Cluster status:* > >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>>> >>> Cluster status of node > >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
> >>>> >>> Basics > >>>> >>> > >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>>> >>> > >>>> >>> Disk Nodes > >>>> >>> > >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>> > >>>> >>> Running Nodes > >>>> >>> > >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>> > >>>> >>> Versions > >>>> >>> > >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ > >>>> 3.8.3 > >>>> >>> on Erlang 22.3.4.1 > >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ > >>>> 3.8.3 > >>>> >>> on Erlang 22.3.4.1 > >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ > >>>> 3.8.3 > >>>> >>> on Erlang 22.3.4.1 > >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > >>>> RabbitMQ > >>>> >>> 3.8.3 on Erlang 22.3.4.1 > >>>> >>> > >>>> >>> Alarms > >>>> >>> > >>>> >>> (none) > >>>> >>> > >>>> >>> Network Partitions > >>>> >>> > >>>> >>> (none) > >>>> >>> > >>>> >>> Listeners > >>>> >>> > >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and > CLI > >>>> tool > >>>> >>> communication > >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>>> >>> and AMQP 1.0 > >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and > CLI > >>>> tool > >>>> >>> communication > >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>>> >>> and AMQP 1.0 > >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and > CLI > >>>> tool > >>>> >>> communication > >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 > >>>> >>> and AMQP 1.0 > >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> interface: > >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> , > >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: > >>>> inter-node and > >>>> >>> CLI tool communication > >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> , > >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, 
purpose: > AMQP > >>>> 0-9-1 > >>>> >>> and AMQP 1.0 > >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> , > >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>> > >>>> >>> Feature flags > >>>> >>> > >>>> >>> Flag: drop_unroutable_metric, state: enabled > >>>> >>> Flag: empty_basic_get_metric, state: enabled > >>>> >>> Flag: implicit_default_bindings, state: enabled > >>>> >>> Flag: quorum_queue, state: enabled > >>>> >>> Flag: virtual_host_metadata, state: enabled > >>>> >>> > >>>> >>> *Logs:* > >>>> >>> *(Attached)* > >>>> >>> > >>>> >>> With regards, > >>>> >>> Swogat Pradhan > >>>> >>> > >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > >>>> swogatpradhan22 at gmail.com> > >>>> >>> wrote: > >>>> >>> > >>>> >>>> Hi, > >>>> >>>> Please find the nova conductor as well as nova api log. > >>>> >>>> > >>>> >>>> nova-conuctor: > >>>> >>>> > >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply > to > >>>> >>>> 16152921c1eb45c2b1f562087140168b > >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply > to > >>>> >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply > to > >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply > >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds > >>>> due to a > >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). > >>>> Abandoning...: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply > to > >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds > >>>> due to a > >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
> >>>> Abandoning...: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply > to > >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds > >>>> due to a > >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>> Abandoning...: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled > >>>> with > >>>> >>>> backend dogpile.cache.null. > >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING > >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply > to > >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply > >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds > >>>> due to a > >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>> Abandoning...: > >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>> > >>>> >>>> With regards, > >>>> >>>> Swogat Pradhan > >>>> >>>> > >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> >>>> swogatpradhan22 at gmail.com> wrote: > >>>> >>>> > >>>> >>>>> Hi, > >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am > trying to > >>>> >>>>> launch vm's. > >>>> >>>>> When the VM is in spawning state the node goes down (openstack > >>>> compute > >>>> >>>>> service list), the node comes backup when i restart the nova > >>>> compute > >>>> >>>>> service but then the launch of the vm fails. > >>>> >>>>> > >>>> >>>>> nova-compute.log > >>>> >>>>> > >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running > >>>> >>>>> instance usage > >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 > 07:00:00 > >>>> to > >>>> >>>>> 2023-02-26 08:00:00. 0 instances. > >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node > >>>> >>>>> dcn01-hci-0.bdxworld.com > >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device > >>>> name: > >>>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names > >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache > enabled > >>>> with > >>>> >>>>> backend dogpile.cache.null. > >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running > >>>> >>>>> privsep helper: > >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > >>>> 'privsep-helper', > >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new > >>>> privsep > >>>> >>>>> daemon via rootwrap > >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] > privsep > >>>> >>>>> daemon starting > >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] > privsep > >>>> >>>>> process running with uid/gid: 0/0 > >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] > privsep > >>>> >>>>> process running with capabilities (eff/prm/inh): > >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] > privsep > >>>> >>>>> daemon running as pid 2647 > >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING > >>>> os_brick.initiator.connectors.nvmeof > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process > >>>> >>>>> execution error > >>>> >>>>> in _get_host_uuid: Unexpected error while running command. > >>>> >>>>> Command: blkid overlay -s UUID -o value > >>>> >>>>> Exit code: 2 > >>>> >>>>> Stdout: '' > >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > >>>> >>>>> Unexpected error while running command. > >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>> >>>>> > >>>> >>>>> Is there a way to solve this issue? > >>>> >>>>> > >>>> >>>>> > >>>> >>>>> With regards, > >>>> >>>>> > >>>> >>>>> Swogat Pradhan > >>>> >>>>> > >>>> >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yasufum.o at gmail.com Mon Mar 20 19:38:46 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Tue, 21 Mar 2023 04:38:46 +0900 Subject: [tacker][ptg] Bobcat vPTG Planning Message-ID: <79abe530-5ce0-1ad1-d3f6-4cb61cc970cf@gmail.com> Hi team, We are going to have the Bobcat vPTG through three days, 28-30 Mar 6am-8am UTC as agreed at the IRC meeting last week. I've booked rooms for the sessions and uploaded etherpad [1]. Please feel free to add your proposal on the etherpad. [1] https://etherpad.opendev.org/p/tacker-bobcat-ptg Thanks, Yasufumi From yasufum.o at gmail.com Mon Mar 20 19:50:33 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Tue, 21 Mar 2023 04:50:33 +0900 Subject: [tacker] Cancelling next two IRC meetings Message-ID: <2aa1abc9-5a7b-3af2-6104-3b3fa4043e2c@gmail.com> Hi, I'd like to skip the next two IRC meetings due to a holiday tomorrow for many of us joining from Japan and next week for the bobcat vPTG. I'm looking forward to meet you guys on the vPTG! Cheers, Yasufumi From ihrachys at redhat.com Mon Mar 20 21:18:00 2023 From: ihrachys at redhat.com (Ihar Hrachyshka) Date: Mon, 20 Mar 2023 17:18:00 -0400 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: <3840757.STTH5IQzZg@p1> References: <3840757.STTH5IQzZg@p1> Message-ID: On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski wrote: > > Hi, > > > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > > > Hi all, > > > > > > (I've tagged the thread with [ovn] because this question was raised in > > > the context of OVN, but it really is about the intent of neutron > > > stateless SG API.) > > > > > > Neutron API supports 'stateless' field for security groups: > > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > > > > > > The API reference doesn't explain the intent of the API, merely > > > walking through the field mechanics, as in > > > > > > "The stateful security group extension (stateful-security-group) adds > > > the stateful field to security groups, allowing users to configure > > > stateful or stateless security groups for ports. The existing security > > > groups will all be considered as stateful. Update of the stateful > > > attribute is allowed when there is no port associated with the > > > security group." > > > > > > The meaning of the API is left for users to deduce. It's customary > > > understood as something like > > > > > > "allowing to bypass connection tracking in the firewall, potentially > > > providing performance and simplicity benefits" (while imposing > > > additional complexity onto rule definitions - the user now has to > > > explicitly define rules for both directions of a duplex connection.) > > > [This is not an official definition, nor it's quoted from a respected > > > source, please don't criticize it. I don't think this is an important > > > point here.] > > > > > > Either way, the definition doesn't explain what should happen with > > > basic network services that a user of Neutron SG API is used to rely > > > on. Specifically, what happens for a port related to a stateless SG > > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > > > procedure to configure its IPv6 stack. > > > > > > As part of our testing of stateless SG implementation for OVN backend, > > > we've noticed that VMs fail to configure via metadata, or use SLAAC to > > > configure IPv6. 
> > >
> > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053
> > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949
> > >
> > > We've noticed that adding explicit SG rules to allow 'returning'
> > > communication for 169.254.169.254:80 and RA / NA fixes the problem.
> > >
> > > I figured that these services are "base" / "basic" and should be
> > > provided to ports regardless of the stateful-ness of SG. I proposed
> > > patches for this here:
> > >
> > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053
> > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049
> > >
> > > Discussion in the patch that adjusts the existing stateless SG test
> > > scenarios to not create explicit SG rules for metadata and ICMP
> > > replies suggests that it's not a given / common understanding that
> > > these "base" services should work by default for stateless SGs.
> > >
> > > See discussion in comments here:
> > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692
> > >
> > > While this discussion is happening in the context of OVN, I think it
> > > should be resolved in a broader context. Specifically, a decision
> > > should be made about what Neutron API "means" by stateless SGs, and
> > > how "base" services are supposed to behave. Then backends can act
> > > accordingly.
> > >
> > > There's also an open question of how this should be implemented.
> > > Whether Neutron would like to create explicit SG rules visible in API
> > > that would allow for the returning traffic and that could be deleted
> > > as needed, or whether backends should do it implicitly. We already
> > > have "default" egress rules, so there's a precedent here. On the other
> > > hand, the egress rules are broad (allowing everything) and there's
> > > more rationale to delete them and replace them with tighter filters.
> > > In my OVN series, I implement ACLs directly in OVN database, without
> > > creating SG rules in Neutron API.
> > >
> > > So, questions for the community to clarify:
> > > - whether Neutron API should define behavior of stateless SGs in general,
> > > - if so, whether Neutron API should also define behavior of stateless
> > > SGs in terms of "base" services like metadata and DHCP,
> > > - if so, whether backends should implement the necessary filters
> > > themselves, or Neutron will create default SG rules itself.
>
> I think we should be transparent here: if we need SG rules like that to allow some traffic, those rules should be added in a way that is visible to the user.
> We also have the in-progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators define the set of default SG rules placed in each new SG. So if we make those additional ACLs visible as SG rules now, it will be easier to customize them later.
> If we hard-code ACLs to allow ingress traffic from the metadata server or RA/NA packets, there will IMO be an inconsistency in behaviour between stateful and stateless SGs: with a stateful SG a user can disallow traffic between the VM and the metadata service (probably there's no real use case for that, but it is possible), while with a stateless SG that would no longer be possible because the ingress rules would always be there. Also, a user who knows how stateless SGs work may even treat it as a bug, since from the Neutron API PoV this traffic to/from the metadata server would behave as stateful - there would be a rule allowing the egress traffic, but what actually allows the ingress response?
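(For anyone reproducing the workaround mentioned above, the explicit per-direction rules look roughly like the following. This is only a sketch against the stock openstack CLI; "stateless-sg" is a placeholder group name, and whether you also need rules for DHCPv6 replies depends on the environment.)

$ # returning metadata traffic (cannot be narrowed to source port 80; the SG API has no remote-port attribute)
$ openstack security group rule create --ingress --ethertype IPv4 \
      --protocol tcp --remote-ip 169.254.169.254/32 stateless-sg
$ # ICMPv6 router advertisements (type 134) and neighbor advertisements (type 136)
$ openstack security group rule create --ingress --ethertype IPv6 \
      --protocol ipv6-icmp --icmp-type 134 stateless-sg
$ openstack security group rule create --ingress --ethertype IPv6 \
      --protocol ipv6-icmp --icmp-type 136 stateless-sg

The fact that the first rule cannot be restricted to the metadata server's source port is part of the granularity argument discussed below.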
> Thanks for clarifying the rationale on picking SG rules and not per-backend implementation. What would be your answer to the two other questions in the list above, specifically, "whether Neutron API should define behavior of stateless SGs in general" and "whether Neutron API should define behavior of stateless SGs in relation to metadata / RA / NA". Once we have agreement on these points, we can discuss the exact mechanism - whether to implement in backend or in API. But these two questions are first order in my view. (To give an idea of my thinking, I believe API definition should not only define fields and their mechanics but also semantics, so - yes, api-ref should define the meaning ("behavior") of stateless SG in general, and - yes, api-ref should also define the meaning ("behavior") of stateless SG in relation to "standard" services like ipv6 addressing or metadata. As to the last question - whether it's up to ml2 backend to implement the behavior, or up to the core SG database plugin - I don't have a strong opinion. I lean to "backend" solution just because it allows for more granular definition because SG rules may not express some filter rules, e.g. source port for metadata replies (an unfortunate limitation of SG API that we inherited from AWS?). But perhaps others prefer paying the price for having neutron ml2 plugin enforcing the behavior consistently across all backends. > > > > > > I hope I laid the problem out clearly, let me know if anything needs > > > clarification or explanation. > > > Yes :) At least for me. > > > > > > > Yours, > > > Ihar > > > > > > > > > > > > > -- > > Slawek Kaplonski > > Principal Software Engineer > > Red Hat From jay at gr-oss.io Mon Mar 20 21:39:24 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Mon, 20 Mar 2023 14:39:24 -0700 Subject: [ironic][ptg] vPTG scheduling In-Reply-To: References: Message-ID: Hey all, Based on the results of our quick vPTG sync this morning, I've done the following: * I booked one additional slot for Ironic; Wednesday 1600 UTC - 1700 UTC, to ensure we'd have plenty of discussion time once accounting for breaks that we'll certainly need. * I've tentatively scheduled all topics here: https://etherpad.opendev.org/p/ironic-bobcat-ptg -- please review, if there's anything that creates a hardship lets work it out, in the the etherpad, IRC, or on the mail list :). Thanks, looking forward to planning another release of Ironic with you all! -Jay On Thu, Mar 9, 2023 at 3:15?PM Jay Faulkner wrote: > Hey all, > > The vPTG will be upon us soon, the week of March 27. > > I booked the following times on behalf of Ironic + BM SIG Operator hour, > in accordance with what times worked in Antelope. It's my hope that since > we've had little contributor turnover, these times continue to work. I'm > completely open to having things moved around if it's more convenient to > participants. > > I've booked the following times, all in Folsom: > - Tuesday 1400 UTC - 1700 UTC > - Wednesday 1300 UTC Operator hour: baremetal SIG > - Wednesday 1400 UTC - 1600 UTC > - Wednesday 2200 - 2300 UTC > > > I propose that after the Ironic meeting on March 20, we shortly sync up in > the Bobcat PTG etherpad (https://etherpad.opendev.org/p/ironic-bobcat-ptg) > to pick topics and assign time. > > > Again, this is all meant to be a suggestion, I'm happy to move things > around but didn't want us to miss out on getting things booked. > > > - > Jay Faulkner > Ironic PTL > TC Member > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kamil.madac at gmail.com Tue Mar 21 10:27:37 2023 From: kamil.madac at gmail.com (Kamil Madac) Date: Tue, 21 Mar 2023 11:27:37 +0100 Subject: [neutron] In-Reply-To: References: Message-ID: Hi Danny, thanks for sharing your positive experience. I'm going to deploy OVN in dev environment with kolla-ansible. Maybe one more question. Is there any official way to migrate from OVS to OVN with kolla-ansible, or have you used the official migration script https://docs.openstack.org/networking-ovn/latest/install/migration.html? On Thu, Mar 16, 2023 at 5:43?PM Danny Webb wrote: > Hi Kamil, > > We're currently running 4 (soon to be 5) production regions all using > kolla ansible as our deployer with OVN as our neutron backend. It's been > fairly solid for us and we've had less issues with OVN than the > traditional hybrid OVS / Iptables neutron driver (which we ran for about a > year before switching to OVN). Our regions are anywhere from 50-60 compute > hosts with 1-2k+ VMs per region. As far as I know most of the new > development is going into OVN so would be a good place to start. > Ultimately, we've only really had 2 real issues whilst running it. First > was an issue where we had the provider network spamming gateway changes > into southbound as we had our anycast SVI bound to our top of rack switches > which made OVN keep updating it's location. We mitigated this by moving > the provider SVIs to our border routers and the issue went away and dropped > the load on our OVN controllers significantly. Only other real issue we > had was during an upgrade of a region we ended up with what we believed to > be some sort of stale flows that resulted in some hypervisors losing > connectivity until we rebooted them. > > Hope this helps! > > Cheers, > > Danny > ------------------------------ > *From:* Kamil Madac > *Sent:* 14 March 2023 09:46 > *To:* openstack-discuss > *Subject:* [neutron] > > > * CAUTION: This email originates from outside THG * > ------------------------------ > Hi All, > > I'm in the process of planning a small public cloud based on OpenStack. I > have quite experience with kolla-ansible deployments which use OVS > networking and I have no issues with that. It works stable for my use cases > (Vlan provider networks, DVR, tenant networks, floating IPs). > > For that new deployment I'm looking at OVN deployment which from what I > read should be more performant (faster build of instances) and with ability > to cover more networking features in OVN instead of needing external > software like iptables/dnsmasq. > > Does anyone use OVN in production and what is your experience (pros/cons)? > Is OVN mature enough to replace OVS in the production deployment (are > there some basic features from OVS missing)? > > Thanks in advance for sharing the experience. > > -- > Kamil Madac > > *Danny Webb* > Principal OpenStack Engineer > Danny.Webb at thehutgroup.com > [image: THG Ingenuity Logo] > > > -- Kamil Madac -------------- next part -------------- An HTML attachment was scrubbed... URL: From Danny.Webb at thehutgroup.com Tue Mar 21 11:22:43 2023 From: Danny.Webb at thehutgroup.com (Danny Webb) Date: Tue, 21 Mar 2023 11:22:43 +0000 Subject: [neutron] In-Reply-To: References: Message-ID: Not via kolla-ansible as far as I know, tripleo had some migration steps built in that you may be able to mimic. When we did our migration we had our internal tenants evacuate the region and we rebuilt in situ. 
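(For reference, since the plan is a fresh dev deployment rather than a conversion: in kolla-ansible the OVN backend is selected in globals.yml. A minimal sketch, assuming a recent kolla-ansible release; please double-check the variable name against the documentation for your exact version:

    # /etc/kolla/globals.yml
    neutron_plugin_agent: "ovn"

then run the usual "kolla-ansible -i <inventory> bootstrap-servers / prechecks / deploy" sequence. An in-place OVS-to-OVN conversion of an already deployed kolla-ansible cloud is not automated, which matches the evacuate-and-rebuild approach described above.)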
________________________________ From: Kamil Madac Sent: 21 March 2023 10:27 To: Danny Webb Cc: openstack-discuss Subject: Re: [neutron] CAUTION: This email originates from outside THG ________________________________ Hi Danny, thanks for sharing your positive experience. I'm going to deploy OVN in dev environment with kolla-ansible. Maybe one more question. Is there any official way to migrate from OVS to OVN with kolla-ansible, or have you used the official migration script https://docs.openstack.org/networking-ovn/latest/install/migration.html? On Thu, Mar 16, 2023 at 5:43?PM Danny Webb > wrote: Hi Kamil, We're currently running 4 (soon to be 5) production regions all using kolla ansible as our deployer with OVN as our neutron backend. It's been fairly solid for us and we've had less issues with OVN than the traditional hybrid OVS / Iptables neutron driver (which we ran for about a year before switching to OVN). Our regions are anywhere from 50-60 compute hosts with 1-2k+ VMs per region. As far as I know most of the new development is going into OVN so would be a good place to start. Ultimately, we've only really had 2 real issues whilst running it. First was an issue where we had the provider network spamming gateway changes into southbound as we had our anycast SVI bound to our top of rack switches which made OVN keep updating it's location. We mitigated this by moving the provider SVIs to our border routers and the issue went away and dropped the load on our OVN controllers significantly. Only other real issue we had was during an upgrade of a region we ended up with what we believed to be some sort of stale flows that resulted in some hypervisors losing connectivity until we rebooted them. Hope this helps! Cheers, Danny ________________________________ From: Kamil Madac > Sent: 14 March 2023 09:46 To: openstack-discuss > Subject: [neutron] CAUTION: This email originates from outside THG ________________________________ Hi All, I'm in the process of planning a small public cloud based on OpenStack. I have quite experience with kolla-ansible deployments which use OVS networking and I have no issues with that. It works stable for my use cases (Vlan provider networks, DVR, tenant networks, floating IPs). For that new deployment I'm looking at OVN deployment which from what I read should be more performant (faster build of instances) and with ability to cover more networking features in OVN instead of needing external software like iptables/dnsmasq. Does anyone use OVN in production and what is your experience (pros/cons)? Is OVN mature enough to replace OVS in the production deployment (are there some basic features from OVS missing)? Thanks in advance for sharing the experience. -- Kamil Madac Danny Webb Principal OpenStack Engineer Danny.Webb at thehutgroup.com [THG Ingenuity Logo] [https://i.imgur.com/wbpVRW6.png] [https://i.imgur.com/c3040tr.png] -- Kamil Madac Danny Webb Principal OpenStack Engineer Danny.Webb at thehutgroup.com [THG Ingenuity Logo] [https://i.imgur.com/wbpVRW6.png] [https://i.imgur.com/c3040tr.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: From pdeore at redhat.com Tue Mar 21 11:30:55 2023 From: pdeore at redhat.com (Pranali Deore) Date: Tue, 21 Mar 2023 17:00:55 +0530 Subject: [Glance]Weekly Meeting Cancelled for this week Message-ID: Hello, As discussed during last weekly meeting[1], Glance upstream weekly meeting for this week i.e., 23rd March, 2023 has been cancelled. See you all at PTG ! 
Thanks & Regards, Pranali [1]: https://meetings.opendev.org/irclogs/%23openstack-meeting/%23openstack-meeting.2023-03-16.log.html#t2023-03-16T14:38:06 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gthiemonge at redhat.com Tue Mar 21 11:41:57 2023 From: gthiemonge at redhat.com (Gregory Thiemonge) Date: Tue, 21 Mar 2023 12:41:57 +0100 Subject: [Octavia][PTG] Bobcat PTG planning Message-ID: Hi Folks, A reminder: the Octavia PTG will be on March 28th (14:00-18:00 UTC), There's a dedicated etherpad for this session: https://etherpad.opendev.org/p/bobcat-ptg-octavia Feel free to add your topic, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 21 04:06:08 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 21 Mar 2023 09:36:08 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Update: I uploaded an image directly to the dcn02 store, and it takes around 10,15 minutes to create a volume with image in dcn02. The image size is 389 MB. On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan wrote: > Hi Jhon, > I checked in the ceph od dcn02, I can see the images created after > importing from the central site. > But launching an instance normally fails as it takes a long time for the > volume to get created. > > When launching an instance from volume the instance is getting created > properly without any errors. > > I tried to cache images in nova using > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > but getting checksum failed error. > > With regards, > Swogat Pradhan > > On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: > >> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >> wrote: >> > >> > Update: After restarting the nova services on the controller and >> running the deploy script on the edge site, I was able to launch the VM >> from volume. >> > >> > Right now the instance creation is failing as the block device creation >> is stuck in creating state, it is taking more than 10 mins for the volume >> to be created, whereas the image has already been imported to the edge >> glance. >> >> Try following this document and making the same observations in your >> environment for AZs and their local ceph cluster. >> >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >> >> On a DCN site if you run a command like this: >> >> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >> /etc/ceph/dcn0.client.admin.keyring >> $ rbd --cluster dcn0 -p volumes ls -l >> NAME SIZE PARENT >> FMT PROT LOCK >> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >> $ >> >> Then, you should see the parent of the volume is the image which is on >> the same local ceph cluster. >> >> I wonder if something is misconfigured and thus you're encountering >> the streaming behavior described here: >> >> Ideally all images should reside in the central Glance and be copied >> to DCN sites before instances of those images are booted on DCN sites. 
>> If an image is not copied to a DCN site before it is booted, then the >> image will be streamed to the DCN site and then the image will boot as >> an instance. This happens because Glance at the DCN site has access to >> the images store at the Central ceph cluster. Though the booting of >> the image will take time because it has not been copied in advance, >> this is still preferable to failing to boot the image. >> >> You can also exec into the cinder container at the DCN site and >> confirm it's using it's local ceph cluster. >> >> John >> >> > >> > I will try and create a new fresh image and test again then update. >> > >> > With regards, >> > Swogat Pradhan >> > >> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >> >> >> Update: >> >> In the hypervisor list the compute node state is showing down. >> >> >> >> >> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>> >> >>> Hi Brendan, >> >>> Now i have deployed another site where i have used 2 linux bonds >> network template for both 3 compute nodes and 3 ceph nodes. >> >>> The bonding options is set to mode=802.3ad (lacp=active). >> >>> I used a cirros image to launch instance but the instance timed out >> so i waited for the volume to be created. >> >>> Once the volume was created i tried launching the instance from the >> volume and still the instance is stuck in spawning state. >> >>> >> >>> Here is the nova-compute log: >> >>> >> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep >> daemon starting >> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep >> process running with uid/gid: 0/0 >> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >> process running with capabilities (eff/prm/inh): >> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >> daemon running as pid 185437 >> >>> 2023-03-15 17:35:47.974 8 WARNING >> os_brick.initiator.connectors.nvmeof >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >> in _get_host_uuid: Unexpected error while running command. >> >>> Command: blkid overlay -s UUID -o value >> >>> Exit code: 2 >> >>> Stdout: '' >> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >> Unexpected error while running command. >> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >> >>> >> >>> It is stuck in creating image, do i need to run the template >> mentioned here ?: >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> >>> >> >>> The volume is already created and i do not understand why the >> instance is stuck in spawning state. >> >>> >> >>> With regards, >> >>> Swogat Pradhan >> >>> >> >>> >> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard >> wrote: >> >>>> >> >>>> Does your environment use different network interfaces for each of >> the networks? Or does it have a bond with everything on it? >> >>>> >> >>>> One issue I have seen before is that when launching instances, there >> is a lot of network traffic between nodes as the hypervisor needs to >> download the image from Glance. 
Along with various other services sending >> normal network traffic, it can be enough to cause issues if everything is >> running over a single 1Gbe interface. >> >>>> >> >>>> I have seen the same situation in fact when using a single >> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >> while you try to spawn the instance to see if you?re dropping packets. In >> the situation I described, there were dropped packets which resulted in a >> loss of communication between nova_compute and RMQ, so the node appeared >> offline. You should also confirm that nova_compute is being disconnected in >> the nova_compute logs if you tail them on the Hypervisor while spawning the >> instance. >> >>>> >> >>>> In my case, changing from active/backup to LACP helped. So, based on >> that experience, from my perspective, is certainly sounds like some kind of >> network issue. >> >>>> >> >>>> Regards, >> >>>> >> >>>> Brendan Shephard >> >>>> Senior Software Engineer >> >>>> Red Hat Australia >> >>>> >> >>>> >> >>>> >> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >> >>>> >> >>>> Hi, >> >>>> >> >>>> I tried to help someone with a similar issue some time ago in this >> thread: >> >>>> >> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >> >>>> >> >>>> But apparently a neutron reinstallation fixed it for that user, not >> sure if that could apply here. But is it possible that your nova and >> neutron versions are different between central and edge site? Have you >> restarted nova and neutron services on the compute nodes after >> installation? Have you debug logs of nova-conductor and maybe nova-compute? >> Maybe they can help narrow down the issue. >> >>>> If there isn't any additional information in the debug logs I >> probably would start "tearing down" rabbitmq. I didn't have to do that in a >> production system yet so be careful. I can think of two routes: >> >>>> >> >>>> - Either remove queues, exchanges etc. while rabbit is running, this >> will most likely impact client IO depending on your load. Check out the >> rabbitmqctl commands. >> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all >> nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >> >>>> >> >>>> I can imagine that the failed reply "survives" while being >> replicated across the rabbit nodes. But I don't really know the rabbit >> internals too well, so maybe someone else can chime in here and give a >> better advice. >> >>>> >> >>>> Regards, >> >>>> Eugen >> >>>> >> >>>> Zitat von Swogat Pradhan : >> >>>> >> >>>> Hi, >> >>>> Can someone please help me out on this issue? >> >>>> >> >>>> With regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>>> wrote: >> >>>> >> >>>> Hi >> >>>> I don't see any major packet loss. >> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to >> packet >> >>>> loss. >> >>>> >> >>>> with regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>>> wrote: >> >>>> >> >>>> Hi, >> >>>> Yes the MTU is the same as the default '1500'. >> >>>> Generally I haven't seen any packet loss, but never checked when >> >>>> launching the instance. >> >>>> I will check that and come back. 
>> >>>> But everytime i launch an instance the instance gets stuck at >> spawning >> >>>> state and there the hypervisor becomes down, so not sure if packet >> loss >> >>>> causes this. >> >>>> >> >>>> With regards, >> >>>> Swogat pradhan >> >>>> >> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >> >>>> >> >>>> One more thing coming to mind is MTU size. Are they identical between >> >>>> central and edge site? Do you see packet loss through the tunnel? >> >>>> >> >>>> Zitat von Swogat Pradhan : >> >>>> >> >>>> > Hi Eugen, >> >>>> > Request you to please add my email either on 'to' or 'cc' as i am >> not >> >>>> > getting email's from you. >> >>>> > Coming to the issue: >> >>>> > >> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >> list_policies -p >> >>>> / >> >>>> > Listing policies for vhost "/" ... >> >>>> > vhost name pattern apply-to definition priority >> >>>> > / ha-all ^(?!amq\.).* queues >> >>>> > >> >>>> >> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >> >>>> > >> >>>> > I have the edge site compute nodes up, it only goes down when i am >> >>>> trying >> >>>> > to launch an instance and the instance comes to a spawning state >> and >> >>>> then >> >>>> > gets stuck. >> >>>> > >> >>>> > I have a tunnel setup between the central and the edge sites. >> >>>> > >> >>>> > With regards, >> >>>> > Swogat Pradhan >> >>>> > >> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> >> >>>> > wrote: >> >>>> > >> >>>> >> Hi Eugen, >> >>>> >> For some reason i am not getting your email to me directly, i am >> >>>> checking >> >>>> >> the email digest and there i am able to find your reply. >> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> >>>> >> Yes, these logs are from the time when the issue occurred. >> >>>> >> >> >>>> >> *Note: i am able to create vm's and perform other activities in >> the >> >>>> >> central site, only facing this issue in the edge site.* >> >>>> >> >> >>>> >> With regards, >> >>>> >> Swogat Pradhan >> >>>> >> >> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> >> >>>> >> wrote: >> >>>> >> >> >>>> >>> Hi Eugen, >> >>>> >>> Thanks for your response. >> >>>> >>> I have actually a 4 controller setup so here are the details: >> >>>> >>> >> >>>> >>> *PCS Status:* >> >>>> >>> * Container bundle set: rabbitmq-bundle [ >> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >> >>>> Started >> >>>> >>> overcloud-controller-no-ceph-3 >> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >> >>>> Started >> >>>> >>> overcloud-controller-2 >> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >> >>>> Started >> >>>> >>> overcloud-controller-1 >> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >> >>>> Started >> >>>> >>> overcloud-controller-0 >> >>>> >>> >> >>>> >>> I have tried restarting the bundle multiple times but the issue >> is >> >>>> still >> >>>> >>> present. >> >>>> >>> >> >>>> >>> *Cluster status:* >> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >> >>>> >>> Cluster status of node >> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>> >>>> >>> Basics >> >>>> >>> >> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >>>> >>> >> >>>> >>> Disk Nodes >> >>>> >>> >> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> >>> >> >>>> >>> Running Nodes >> >>>> >>> >> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> >>> >> >>>> >>> Versions >> >>>> >>> >> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >> >>>> 3.8.3 >> >>>> >>> on Erlang 22.3.4.1 >> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >> >>>> 3.8.3 >> >>>> >>> on Erlang 22.3.4.1 >> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >> >>>> 3.8.3 >> >>>> >>> on Erlang 22.3.4.1 >> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >> >>>> RabbitMQ >> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >> >>>> >>> >> >>>> >>> Alarms >> >>>> >>> >> >>>> >>> (none) >> >>>> >>> >> >>>> >>> Network Partitions >> >>>> >>> >> >>>> >>> (none) >> >>>> >>> >> >>>> >>> Listeners >> >>>> >>> >> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI >> >>>> tool >> >>>> >>> communication >> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>>> >>> and AMQP 1.0 >> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI >> >>>> tool >> >>>> >>> communication >> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>>> >>> and AMQP 1.0 >> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and >> CLI >> >>>> tool >> >>>> >>> communication >> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >> >>>> >>> and AMQP 1.0 >> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>> interface: >> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> , >> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >> >>>> inter-node and >> >>>> >>> CLI tool communication >> >>>> >>> Node: >> rabbit at 
overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> , >> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: >> AMQP >> >>>> 0-9-1 >> >>>> >>> and AMQP 1.0 >> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>> , >> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >> >>>> >>> >> >>>> >>> Feature flags >> >>>> >>> >> >>>> >>> Flag: drop_unroutable_metric, state: enabled >> >>>> >>> Flag: empty_basic_get_metric, state: enabled >> >>>> >>> Flag: implicit_default_bindings, state: enabled >> >>>> >>> Flag: quorum_queue, state: enabled >> >>>> >>> Flag: virtual_host_metadata, state: enabled >> >>>> >>> >> >>>> >>> *Logs:* >> >>>> >>> *(Attached)* >> >>>> >>> >> >>>> >>> With regards, >> >>>> >>> Swogat Pradhan >> >>>> >>> >> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >> >>>> swogatpradhan22 at gmail.com> >> >>>> >>> wrote: >> >>>> >>> >> >>>> >>>> Hi, >> >>>> >>>> Please find the nova conductor as well as nova api log. >> >>>> >>>> >> >>>> >>>> nova-conuctor: >> >>>> >>>> >> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >> reply to >> >>>> >>>> 16152921c1eb45c2b1f562087140168b >> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >> reply to >> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >> reply to >> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >> >>>> due to a >> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >> >>>> Abandoning...: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >> reply to >> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >> >>>> due to a >> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>> >>>> Abandoning...: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >> reply to >> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >> >>>> due to a >> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>> Abandoning...: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >> enabled >> >>>> with >> >>>> >>>> backend dogpile.cache.null. >> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >> >>>> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >> reply to >> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >> >>>> due to a >> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>> Abandoning...: >> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>> >>>> >> >>>> >>>> With regards, >> >>>> >>>> Swogat Pradhan >> >>>> >>>> >> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >> >>>> >>>> >> >>>> >>>>> Hi, >> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am >> trying to >> >>>> >>>>> launch vm's. >> >>>> >>>>> When the VM is in spawning state the node goes down (openstack >> >>>> compute >> >>>> >>>>> service list), the node comes backup when i restart the nova >> >>>> compute >> >>>> >>>>> service but then the launch of the vm fails. >> >>>> >>>>> >> >>>> >>>>> nova-compute.log >> >>>> >>>>> >> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >> >>>> >>>>> instance usage >> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 >> 07:00:00 >> >>>> to >> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >> >>>> >>>>> dcn01-hci-0.bdxworld.com >> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >> >>>> name: >> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >> enabled >> >>>> with >> >>>> >>>>> backend dogpile.cache.null. >> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >> >>>> >>>>> privsep helper: >> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >> >>>> 'privsep-helper', >> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >> >>>> privsep >> >>>> >>>>> daemon via rootwrap >> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] >> privsep >> >>>> >>>>> daemon starting >> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] >> privsep >> >>>> >>>>> process running with uid/gid: 0/0 >> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >> privsep >> >>>> >>>>> process running with capabilities (eff/prm/inh): >> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >> privsep >> >>>> >>>>> daemon running as pid 2647 >> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >> >>>> os_brick.initiator.connectors.nvmeof >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >> >>>> >>>>> execution error >> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >> >>>> >>>>> Command: blkid overlay -s UUID -o value >> >>>> >>>>> Exit code: 2 >> >>>> >>>>> Stdout: '' >> >>>> >>>>> Stderr: '': >> oslo_concurrency.processutils.ProcessExecutionError: >> >>>> >>>>> Unexpected error while running command. 
>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >> >>>> >>>>> >> >>>> >>>>> Is there a way to solve this issue? >> >>>> >>>>> >> >>>> >>>>> >> >>>> >>>>> With regards, >> >>>> >>>>> >> >>>> >>>>> Swogat Pradhan >> >>>> >>>>> >> >>>> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Tue Mar 21 10:23:44 2023 From: johfulto at redhat.com (John Fulton) Date: Tue, 21 Mar 2023 06:23:44 -0400 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: in my last message under the line "On a DCN site if you run a command like this:" I suggested some steps you could try to confirm the image is a COW from the local glance as well as how to look at your cinder config. On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan wrote: > Update: > I uploaded an image directly to the dcn02 store, and it takes around 10,15 > minutes to create a volume with image in dcn02. > The image size is 389 MB. > > On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan > wrote: > >> Hi Jhon, >> I checked in the ceph od dcn02, I can see the images created after >> importing from the central site. >> But launching an instance normally fails as it takes a long time for the >> volume to get created. >> >> When launching an instance from volume the instance is getting created >> properly without any errors. >> >> I tried to cache images in nova using >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> but getting checksum failed error. >> >> With regards, >> Swogat Pradhan >> >> On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: >> >>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>> wrote: >>> > >>> > Update: After restarting the nova services on the controller and >>> running the deploy script on the edge site, I was able to launch the VM >>> from volume. >>> > >>> > Right now the instance creation is failing as the block device >>> creation is stuck in creating state, it is taking more than 10 mins for the >>> volume to be created, whereas the image has already been imported to the >>> edge glance. >>> >>> Try following this document and making the same observations in your >>> environment for AZs and their local ceph cluster. >>> >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>> >>> On a DCN site if you run a command like this: >>> >>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>> /etc/ceph/dcn0.client.admin.keyring >>> $ rbd --cluster dcn0 -p volumes ls -l >>> NAME SIZE PARENT >>> FMT PROT LOCK >>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>> $ >>> >>> Then, you should see the parent of the volume is the image which is on >>> the same local ceph cluster. 
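(To make those two checks concrete: the store names "central"/"dcn02", the $IMAGE_ID variable and the cinder_volume container name below are assumptions for a typical TripleO DCN layout, so treat this as a sketch and refer to the distributed_multibackend_storage document linked above for the exact procedure:

    $ glance image-show $IMAGE_ID | grep stores   # the image should list the local dcn02 store
    $ glance image-import $IMAGE_ID --stores dcn02 --import-method copy-image
    $ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_cluster_name|rbd_pool' /etc/cinder/cinder.conf

If the image exists only in the central store, or if cinder's RBD backend on the edge nodes points at the central cluster's conf/keyring instead of the local one, a volume created from that image is streamed over the WAN rather than COW-cloned locally, which would explain a 10-15 minute creation time for a 389 MB image.)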
>>> >>> I wonder if something is misconfigured and thus you're encountering >>> the streaming behavior described here: >>> >>> Ideally all images should reside in the central Glance and be copied >>> to DCN sites before instances of those images are booted on DCN sites. >>> If an image is not copied to a DCN site before it is booted, then the >>> image will be streamed to the DCN site and then the image will boot as >>> an instance. This happens because Glance at the DCN site has access to >>> the images store at the Central ceph cluster. Though the booting of >>> the image will take time because it has not been copied in advance, >>> this is still preferable to failing to boot the image. >>> >>> You can also exec into the cinder container at the DCN site and >>> confirm it's using it's local ceph cluster. >>> >>> John >>> >>> > >>> > I will try and create a new fresh image and test again then update. >>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >> >>> >> Update: >>> >> In the hypervisor list the compute node state is showing down. >>> >> >>> >> >>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>> >>> >>> Hi Brendan, >>> >>> Now i have deployed another site where i have used 2 linux bonds >>> network template for both 3 compute nodes and 3 ceph nodes. >>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>> >>> I used a cirros image to launch instance but the instance timed out >>> so i waited for the volume to be created. >>> >>> Once the volume was created i tried launching the instance from the >>> volume and still the instance is stuck in spawning state. >>> >>> >>> >>> Here is the nova-compute log: >>> >>> >>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep >>> daemon starting >>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep >>> process running with uid/gid: 0/0 >>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >>> process running with capabilities (eff/prm/inh): >>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >>> daemon running as pid 185437 >>> >>> 2023-03-15 17:35:47.974 8 WARNING >>> os_brick.initiator.connectors.nvmeof >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>> in _get_host_uuid: Unexpected error while running command. >>> >>> Command: blkid overlay -s UUID -o value >>> >>> Exit code: 2 >>> >>> Stdout: '' >>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>> Unexpected error while running command. >>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>> >>> >>> >>> It is stuck in creating image, do i need to run the template >>> mentioned here ?: >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> >>> >>> >>> The volume is already created and i do not understand why the >>> instance is stuck in spawning state. 
>>> >>> >>> >>> With regards, >>> >>> Swogat Pradhan >>> >>> >>> >>> >>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard >>> wrote: >>> >>>> >>> >>>> Does your environment use different network interfaces for each of >>> the networks? Or does it have a bond with everything on it? >>> >>>> >>> >>>> One issue I have seen before is that when launching instances, >>> there is a lot of network traffic between nodes as the hypervisor needs to >>> download the image from Glance. Along with various other services sending >>> normal network traffic, it can be enough to cause issues if everything is >>> running over a single 1Gbe interface. >>> >>>> >>> >>>> I have seen the same situation in fact when using a single >>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>> while you try to spawn the instance to see if you?re dropping packets. In >>> the situation I described, there were dropped packets which resulted in a >>> loss of communication between nova_compute and RMQ, so the node appeared >>> offline. You should also confirm that nova_compute is being disconnected in >>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>> instance. >>> >>>> >>> >>>> In my case, changing from active/backup to LACP helped. So, based >>> on that experience, from my perspective, is certainly sounds like some kind >>> of network issue. >>> >>>> >>> >>>> Regards, >>> >>>> >>> >>>> Brendan Shephard >>> >>>> Senior Software Engineer >>> >>>> Red Hat Australia >>> >>>> >>> >>>> >>> >>>> >>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>> >>>> >>> >>>> Hi, >>> >>>> >>> >>>> I tried to help someone with a similar issue some time ago in this >>> thread: >>> >>>> >>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>> >>>> >>> >>>> But apparently a neutron reinstallation fixed it for that user, not >>> sure if that could apply here. But is it possible that your nova and >>> neutron versions are different between central and edge site? Have you >>> restarted nova and neutron services on the compute nodes after >>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>> Maybe they can help narrow down the issue. >>> >>>> If there isn't any additional information in the debug logs I >>> probably would start "tearing down" rabbitmq. I didn't have to do that in a >>> production system yet so be careful. I can think of two routes: >>> >>>> >>> >>>> - Either remove queues, exchanges etc. while rabbit is running, >>> this will most likely impact client IO depending on your load. Check out >>> the rabbitmqctl commands. >>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all >>> nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>> >>>> >>> >>>> I can imagine that the failed reply "survives" while being >>> replicated across the rabbit nodes. But I don't really know the rabbit >>> internals too well, so maybe someone else can chime in here and give a >>> better advice. >>> >>>> >>> >>>> Regards, >>> >>>> Eugen >>> >>>> >>> >>>> Zitat von Swogat Pradhan : >>> >>>> >>> >>>> Hi, >>> >>>> Can someone please help me out on this issue? >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>> wrote: >>> >>>> >>> >>>> Hi >>> >>>> I don't see any major packet loss. >>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to >>> packet >>> >>>> loss. 
>>> >>>> >>> >>>> with regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>> wrote: >>> >>>> >>> >>>> Hi, >>> >>>> Yes the MTU is the same as the default '1500'. >>> >>>> Generally I haven't seen any packet loss, but never checked when >>> >>>> launching the instance. >>> >>>> I will check that and come back. >>> >>>> But everytime i launch an instance the instance gets stuck at >>> spawning >>> >>>> state and there the hypervisor becomes down, so not sure if packet >>> loss >>> >>>> causes this. >>> >>>> >>> >>>> With regards, >>> >>>> Swogat pradhan >>> >>>> >>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>> >>>> >>> >>>> One more thing coming to mind is MTU size. Are they identical >>> between >>> >>>> central and edge site? Do you see packet loss through the tunnel? >>> >>>> >>> >>>> Zitat von Swogat Pradhan : >>> >>>> >>> >>>> > Hi Eugen, >>> >>>> > Request you to please add my email either on 'to' or 'cc' as i am >>> not >>> >>>> > getting email's from you. >>> >>>> > Coming to the issue: >>> >>>> > >>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>> list_policies -p >>> >>>> / >>> >>>> > Listing policies for vhost "/" ... >>> >>>> > vhost name pattern apply-to definition priority >>> >>>> > / ha-all ^(?!amq\.).* queues >>> >>>> > >>> >>>> >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> >>>> > >>> >>>> > I have the edge site compute nodes up, it only goes down when i am >>> >>>> trying >>> >>>> > to launch an instance and the instance comes to a spawning state >>> and >>> >>>> then >>> >>>> > gets stuck. >>> >>>> > >>> >>>> > I have a tunnel setup between the central and the edge sites. >>> >>>> > >>> >>>> > With regards, >>> >>>> > Swogat Pradhan >>> >>>> > >>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> >>> >>>> > wrote: >>> >>>> > >>> >>>> >> Hi Eugen, >>> >>>> >> For some reason i am not getting your email to me directly, i am >>> >>>> checking >>> >>>> >> the email digest and there i am able to find your reply. >>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >>>> >> Yes, these logs are from the time when the issue occurred. >>> >>>> >> >>> >>>> >> *Note: i am able to create vm's and perform other activities in >>> the >>> >>>> >> central site, only facing this issue in the edge site.* >>> >>>> >> >>> >>>> >> With regards, >>> >>>> >> Swogat Pradhan >>> >>>> >> >>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> >>> >>>> >> wrote: >>> >>>> >> >>> >>>> >>> Hi Eugen, >>> >>>> >>> Thanks for your response. 
>>> >>>> >>> I have actually a 4 controller setup so here are the details: >>> >>>> >>> >>> >>>> >>> *PCS Status:* >>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest >>> ]: >>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> >>>> Started >>> >>>> >>> overcloud-controller-no-ceph-3 >>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> >>>> Started >>> >>>> >>> overcloud-controller-2 >>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> >>>> Started >>> >>>> >>> overcloud-controller-1 >>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> >>>> Started >>> >>>> >>> overcloud-controller-0 >>> >>>> >>> >>> >>>> >>> I have tried restarting the bundle multiple times but the issue >>> is >>> >>>> still >>> >>>> >>> present. >>> >>>> >>> >>> >>>> >>> *Cluster status:* >>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>> >>>> >>> Cluster status of node >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>> >>>> >>> Basics >>> >>>> >>> >>> >>>> >>> Cluster name: >>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>>> >>> >>> >>>> >>> Disk Nodes >>> >>>> >>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> >>> >>> >>>> >>> Running Nodes >>> >>>> >>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> >>> >>> >>>> >>> Versions >>> >>>> >>> >>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>> 3.8.3 >>> >>>> >>> on Erlang 22.3.4.1 >>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>> 3.8.3 >>> >>>> >>> on Erlang 22.3.4.1 >>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>> 3.8.3 >>> >>>> >>> on Erlang 22.3.4.1 >>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> >>>> RabbitMQ >>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>>> >>> >>> >>>> >>> Alarms >>> >>>> >>> >>> >>>> >>> (none) >>> >>>> >>> >>> >>>> >>> Network Partitions >>> >>>> >>> >>> >>>> >>> (none) >>> >>>> >>> >>> >>>> >>> Listeners >>> >>>> >>> >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>> and CLI >>> >>>> tool >>> >>>> >>> communication >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>>> >>> and AMQP 1.0 >>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>> and CLI >>> >>>> tool >>> >>>> >>> communication >>> >>>> >>> Node: rabbit at 
overcloud-controller-1.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>>> >>> and AMQP 1.0 >>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>> and CLI >>> >>>> tool >>> >>>> >>> communication >>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>> >>>> >>> and AMQP 1.0 >>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>> interface: >>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> , >>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>> >>>> inter-node and >>> >>>> >>> CLI tool communication >>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> , >>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: >>> AMQP >>> >>>> 0-9-1 >>> >>>> >>> and AMQP 1.0 >>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>> , >>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>> >>> >>> >>>> >>> Feature flags >>> >>>> >>> >>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>> >>>> >>> Flag: quorum_queue, state: enabled >>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>> >>>> >>> >>> >>>> >>> *Logs:* >>> >>>> >>> *(Attached)* >>> >>>> >>> >>> >>>> >>> With regards, >>> >>>> >>> Swogat Pradhan >>> >>>> >>> >>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> >>>> swogatpradhan22 at gmail.com> >>> >>>> >>> wrote: >>> >>>> >>> >>> >>>> >>>> Hi, >>> >>>> >>>> Please find the nova conductor as well as nova api log. 
>>> >>>> >>>> >>> >>>> >>>> nova-conuctor: >>> >>>> >>>> >>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>> reply to >>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >>> reply to >>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >>> reply to >>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >>> seconds >>> >>>> due to a >>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>> >>>> Abandoning...: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>> reply to >>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >>> seconds >>> >>>> due to a >>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>> Abandoning...: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>> reply to >>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >>> seconds >>> >>>> due to a >>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>> Abandoning...: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >>> enabled >>> >>>> with >>> >>>> >>>> backend dogpile.cache.null. 
>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>> reply to >>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >>> seconds >>> >>>> due to a >>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>> Abandoning...: >>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>> >>>> >>> >>>> >>>> With regards, >>> >>>> >>>> Swogat Pradhan >>> >>>> >>>> >>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>>> >>> >>>> >>>>> Hi, >>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am >>> trying to >>> >>>> >>>>> launch vm's. >>> >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>> >>>> compute >>> >>>> >>>>> service list), the node comes backup when i restart the nova >>> >>>> compute >>> >>>> >>>>> service but then the launch of the vm fails. >>> >>>> >>>>> >>> >>>> >>>>> nova-compute.log >>> >>>> >>>>> >>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>> >>>> >>>>> instance usage >>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 >>> 07:00:00 >>> >>>> to >>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>> >>>> name: >>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >>> enabled >>> >>>> with >>> >>>> >>>>> backend dogpile.cache.null. 
>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>> >>>> >>>>> privsep helper: >>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> >>>> 'privsep-helper', >>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned >>> new >>> >>>> privsep >>> >>>> >>>>> daemon via rootwrap >>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] >>> privsep >>> >>>> >>>>> daemon starting >>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] >>> privsep >>> >>>> >>>>> process running with uid/gid: 0/0 >>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >>> privsep >>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >>> privsep >>> >>>> >>>>> daemon running as pid 2647 >>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> >>>> os_brick.initiator.connectors.nvmeof >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>> >>>> >>>>> execution error >>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>> >>>>> Exit code: 2 >>> >>>> >>>>> Stdout: '' >>> >>>> >>>>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: >>> >>>> >>>>> Unexpected error while running command. >>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>> >>>>> >>> >>>> >>>>> Is there a way to solve this issue? >>> >>>> >>>>> >>> >>>> >>>>> >>> >>>> >>>>> With regards, >>> >>>> >>>>> >>> >>>> >>>>> Swogat Pradhan >>> >>>> >>>>> >>> >>>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 21 12:03:20 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 21 Mar 2023 17:33:20 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi, Seems like cinder is not using the local ceph. 
Ceph Output: [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l NAME SIZE PARENT FMT PROT LOCK 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 excl 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l NAME SIZE PARENT FMT PROT LOCK volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 [ceph: root at dcn02-ceph-all-0 /]# Attached the cinder config. Please let me know how I can solve this issue. With regards, Swogat Pradhan On Tue, Mar 21, 2023 at 3:53?PM John Fulton wrote: > in my last message under the line "On a DCN site if you run a command like > this:" I suggested some steps you could try to confirm the image is a COW > from the local glance as well as how to look at your cinder config. > > On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan > wrote: > >> Update: >> I uploaded an image directly to the dcn02 store, and it takes >> around 10,15 minutes to create a volume with image in dcn02. >> The image size is 389 MB. >> >> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>> Hi Jhon, >>> I checked in the ceph od dcn02, I can see the images created after >>> importing from the central site. >>> But launching an instance normally fails as it takes a long time for the >>> volume to get created. >>> >>> When launching an instance from volume the instance is getting created >>> properly without any errors. >>> >>> I tried to cache images in nova using >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> but getting checksum failed error. >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: >>> >>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>> wrote: >>>> > >>>> > Update: After restarting the nova services on the controller and >>>> running the deploy script on the edge site, I was able to launch the VM >>>> from volume. >>>> > >>>> > Right now the instance creation is failing as the block device >>>> creation is stuck in creating state, it is taking more than 10 mins for the >>>> volume to be created, whereas the image has already been imported to the >>>> edge glance. >>>> >>>> Try following this document and making the same observations in your >>>> environment for AZs and their local ceph cluster. 
>>>> >>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>> >>>> On a DCN site if you run a command like this: >>>> >>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>> /etc/ceph/dcn0.client.admin.keyring >>>> $ rbd --cluster dcn0 -p volumes ls -l >>>> NAME SIZE PARENT >>>> FMT PROT LOCK >>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>> $ >>>> >>>> Then, you should see the parent of the volume is the image which is on >>>> the same local ceph cluster. >>>> >>>> I wonder if something is misconfigured and thus you're encountering >>>> the streaming behavior described here: >>>> >>>> Ideally all images should reside in the central Glance and be copied >>>> to DCN sites before instances of those images are booted on DCN sites. >>>> If an image is not copied to a DCN site before it is booted, then the >>>> image will be streamed to the DCN site and then the image will boot as >>>> an instance. This happens because Glance at the DCN site has access to >>>> the images store at the Central ceph cluster. Though the booting of >>>> the image will take time because it has not been copied in advance, >>>> this is still preferable to failing to boot the image. >>>> >>>> You can also exec into the cinder container at the DCN site and >>>> confirm it's using it's local ceph cluster. >>>> >>>> John >>>> >>>> > >>>> > I will try and create a new fresh image and test again then update. >>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >> >>>> >> Update: >>>> >> In the hypervisor list the compute node state is showing down. >>>> >> >>>> >> >>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>> >>>> >>> Hi Brendan, >>>> >>> Now i have deployed another site where i have used 2 linux bonds >>>> network template for both 3 compute nodes and 3 ceph nodes. >>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>> >>> I used a cirros image to launch instance but the instance timed out >>>> so i waited for the volume to be created. >>>> >>> Once the volume was created i tried launching the instance from the >>>> volume and still the instance is stuck in spawning state. >>>> >>> >>>> >>> Here is the nova-compute log: >>>> >>> >>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep >>>> daemon starting >>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep >>>> process running with uid/gid: 0/0 >>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >>>> process running with capabilities (eff/prm/inh): >>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep >>>> daemon running as pid 185437 >>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>> os_brick.initiator.connectors.nvmeof >>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>> in _get_host_uuid: Unexpected error while running command. >>>> >>> Command: blkid overlay -s UUID -o value >>>> >>> Exit code: 2 >>>> >>> Stdout: '' >>>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> Unexpected error while running command. 
>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>> >>> >>>> >>> It is stuck in creating image, do i need to run the template >>>> mentioned here ?: >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>> >>> >>>> >>> The volume is already created and i do not understand why the >>>> instance is stuck in spawning state. >>>> >>> >>>> >>> With regards, >>>> >>> Swogat Pradhan >>>> >>> >>>> >>> >>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>> bshephar at redhat.com> wrote: >>>> >>>> >>>> >>>> Does your environment use different network interfaces for each of >>>> the networks? Or does it have a bond with everything on it? >>>> >>>> >>>> >>>> One issue I have seen before is that when launching instances, >>>> there is a lot of network traffic between nodes as the hypervisor needs to >>>> download the image from Glance. Along with various other services sending >>>> normal network traffic, it can be enough to cause issues if everything is >>>> running over a single 1Gbe interface. >>>> >>>> >>>> >>>> I have seen the same situation in fact when using a single >>>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>>> while you try to spawn the instance to see if you?re dropping packets. In >>>> the situation I described, there were dropped packets which resulted in a >>>> loss of communication between nova_compute and RMQ, so the node appeared >>>> offline. You should also confirm that nova_compute is being disconnected in >>>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>>> instance. >>>> >>>> >>>> >>>> In my case, changing from active/backup to LACP helped. So, based >>>> on that experience, from my perspective, is certainly sounds like some kind >>>> of network issue. >>>> >>>> >>>> >>>> Regards, >>>> >>>> >>>> >>>> Brendan Shephard >>>> >>>> Senior Software Engineer >>>> >>>> Red Hat Australia >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>>> >>>> >>>> >>>> Hi, >>>> >>>> >>>> >>>> I tried to help someone with a similar issue some time ago in this >>>> thread: >>>> >>>> >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>> >>>> >>>> >>>> But apparently a neutron reinstallation fixed it for that user, >>>> not sure if that could apply here. But is it possible that your nova and >>>> neutron versions are different between central and edge site? Have you >>>> restarted nova and neutron services on the compute nodes after >>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>> Maybe they can help narrow down the issue. >>>> >>>> If there isn't any additional information in the debug logs I >>>> probably would start "tearing down" rabbitmq. I didn't have to do that in a >>>> production system yet so be careful. I can think of two routes: >>>> >>>> >>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, >>>> this will most likely impact client IO depending on your load. Check out >>>> the rabbitmqctl commands. >>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all >>>> nodes and restart rabbitmq so the exchanges, queues etc. rebuild. 
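If RabbitMQ still needs to be ruled out, a rough sketch of the two routes Eugen describes above; these are standard rabbitmqctl/pcs commands rather than steps verified on this deployment, and the second route is destructive, so his warning applies:

# route 1: inspect queues/exchanges while rabbit is running
# (inside a rabbitmq-bundle container on a controller)
$ rabbitmqctl list_queues -p / name messages consumers | grep reply_
$ rabbitmqctl list_exchanges -p /
# stale queues can then be removed with the rabbitmqctl commands Eugen refers to
# (the exact subcommand depends on the RabbitMQ version)

# route 2 (destructive): stop the cluster, wipe mnesia on every controller, let it rebuild
$ sudo pcs resource disable rabbitmq-bundle
$ sudo rm -rf /var/lib/rabbitmq/mnesia/*
$ sudo pcs resource enable rabbitmq-bundle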
>>>> >>>> >>>> >>>> I can imagine that the failed reply "survives" while being >>>> replicated across the rabbit nodes. But I don't really know the rabbit >>>> internals too well, so maybe someone else can chime in here and give a >>>> better advice. >>>> >>>> >>>> >>>> Regards, >>>> >>>> Eugen >>>> >>>> >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> >>>> >>>> Hi, >>>> >>>> Can someone please help me out on this issue? >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>>> wrote: >>>> >>>> >>>> >>>> Hi >>>> >>>> I don't see any major packet loss. >>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to >>>> packet >>>> >>>> loss. >>>> >>>> >>>> >>>> with regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>>> wrote: >>>> >>>> >>>> >>>> Hi, >>>> >>>> Yes the MTU is the same as the default '1500'. >>>> >>>> Generally I haven't seen any packet loss, but never checked when >>>> >>>> launching the instance. >>>> >>>> I will check that and come back. >>>> >>>> But everytime i launch an instance the instance gets stuck at >>>> spawning >>>> >>>> state and there the hypervisor becomes down, so not sure if packet >>>> loss >>>> >>>> causes this. >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat pradhan >>>> >>>> >>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>>> >>>> >>>> >>>> One more thing coming to mind is MTU size. Are they identical >>>> between >>>> >>>> central and edge site? Do you see packet loss through the tunnel? >>>> >>>> >>>> >>>> Zitat von Swogat Pradhan : >>>> >>>> >>>> >>>> > Hi Eugen, >>>> >>>> > Request you to please add my email either on 'to' or 'cc' as i >>>> am not >>>> >>>> > getting email's from you. >>>> >>>> > Coming to the issue: >>>> >>>> > >>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>> list_policies -p >>>> >>>> / >>>> >>>> > Listing policies for vhost "/" ... >>>> >>>> > vhost name pattern apply-to definition priority >>>> >>>> > / ha-all ^(?!amq\.).* queues >>>> >>>> > >>>> >>>> >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> >>>> > >>>> >>>> > I have the edge site compute nodes up, it only goes down when i >>>> am >>>> >>>> trying >>>> >>>> > to launch an instance and the instance comes to a spawning state >>>> and >>>> >>>> then >>>> >>>> > gets stuck. >>>> >>>> > >>>> >>>> > I have a tunnel setup between the central and the edge sites. >>>> >>>> > >>>> >>>> > With regards, >>>> >>>> > Swogat Pradhan >>>> >>>> > >>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>> > wrote: >>>> >>>> > >>>> >>>> >> Hi Eugen, >>>> >>>> >> For some reason i am not getting your email to me directly, i am >>>> >>>> checking >>>> >>>> >> the email digest and there i am able to find your reply. >>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >>>> >> Yes, these logs are from the time when the issue occurred. 
>>>> >>>> >> >>>> >>>> >> *Note: i am able to create vm's and perform other activities in >>>> the >>>> >>>> >> central site, only facing this issue in the edge site.* >>>> >>>> >> >>>> >>>> >> With regards, >>>> >>>> >> Swogat Pradhan >>>> >>>> >> >>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>> >> wrote: >>>> >>>> >> >>>> >>>> >>> Hi Eugen, >>>> >>>> >>> Thanks for your response. >>>> >>>> >>> I have actually a 4 controller setup so here are the details: >>>> >>>> >>> >>>> >>>> >>> *PCS Status:* >>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest >>>> ]: >>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>> Started >>>> >>>> >>> overcloud-controller-no-ceph-3 >>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>> Started >>>> >>>> >>> overcloud-controller-2 >>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>> Started >>>> >>>> >>> overcloud-controller-1 >>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>> Started >>>> >>>> >>> overcloud-controller-0 >>>> >>>> >>> >>>> >>>> >>> I have tried restarting the bundle multiple times but the >>>> issue is >>>> >>>> still >>>> >>>> >>> present. >>>> >>>> >>> >>>> >>>> >>> *Cluster status:* >>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>> >>>> >>> Cluster status of node >>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>> >>>> >>> Basics >>>> >>>> >>> >>>> >>>> >>> Cluster name: >>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>>> >>> >>>> >>>> >>> Disk Nodes >>>> >>>> >>> >>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> >>> >>>> >>>> >>> Running Nodes >>>> >>>> >>> >>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> >>> >>>> >>>> >>> Versions >>>> >>>> >>> >>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>> 3.8.3 >>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>> 3.8.3 >>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>> 3.8.3 >>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> : >>>> >>>> RabbitMQ >>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>>> >>> >>>> >>>> >>> Alarms >>>> >>>> >>> >>>> >>>> >>> (none) >>>> >>>> >>> >>>> >>>> >>> Network Partitions >>>> >>>> >>> >>>> >>>> >>> (none) >>>> >>>> >>> >>>> >>>> >>> Listeners >>>> >>>> >>> >>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>>> and CLI >>>> >>>> tool >>>> >>>> >>> communication >>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>> interface: >>>> 
>>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>> >>> and AMQP 1.0 >>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>>> and CLI >>>> >>>> tool >>>> >>>> >>> communication >>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>> >>> and AMQP 1.0 >>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node >>>> and CLI >>>> >>>> tool >>>> >>>> >>> communication >>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>> >>> and AMQP 1.0 >>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>> interface: >>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> , >>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>> >>>> inter-node and >>>> >>>> >>> CLI tool communication >>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> , >>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>>> purpose: AMQP >>>> >>>> 0-9-1 >>>> >>>> >>> and AMQP 1.0 >>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>> , >>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>> >>> >>>> >>>> >>> Feature flags >>>> >>>> >>> >>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>>> >>> Flag: quorum_queue, state: enabled >>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>>> >>> >>>> >>>> >>> *Logs:* >>>> >>>> >>> *(Attached)* >>>> >>>> >>> >>>> >>>> >>> With regards, >>>> >>>> >>> Swogat Pradhan >>>> >>>> >>> >>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>> >>> wrote: >>>> >>>> >>> >>>> >>>> >>>> Hi, >>>> >>>> >>>> Please find the nova conductor as well as nova api log. 
>>>> >>>> >>>> >>>> >>>> >>>> nova-conuctor: >>>> >>>> >>>> >>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>>> reply to >>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >>>> reply to >>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop >>>> reply to >>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >>>> seconds >>>> >>>> due to a >>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>>> >>>> Abandoning...: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>>> reply to >>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >>>> seconds >>>> >>>> due to a >>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>> Abandoning...: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>>> reply to >>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >>>> seconds >>>> >>>> due to a >>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>> Abandoning...: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >>>> enabled >>>> >>>> with >>>> >>>> >>>> backend dogpile.cache.null. 
>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop >>>> reply to >>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >>>> seconds >>>> >>>> due to a >>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>> Abandoning...: >>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>> >>>> >>>> >>>> >>>> With regards, >>>> >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>> >>>> >>>> >>>> >>>>> Hi, >>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am >>>> trying to >>>> >>>> >>>>> launch vm's. >>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>> (openstack >>>> >>>> compute >>>> >>>> >>>>> service list), the node comes backup when i restart the nova >>>> >>>> compute >>>> >>>> >>>>> service but then the launch of the vm fails. >>>> >>>> >>>>> >>>> >>>> >>>>> nova-compute.log >>>> >>>> >>>>> >>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> >>>> >>>>> instance usage >>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 >>>> 07:00:00 >>>> >>>> to >>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on >>>> node >>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied >>>> device >>>> >>>> name: >>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >>>> enabled >>>> >>>> with >>>> >>>> >>>>> backend dogpile.cache.null. 
>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> >>>> >>>>> privsep helper: >>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> >>>> 'privsep-helper', >>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned >>>> new >>>> >>>> privsep >>>> >>>> >>>>> daemon via rootwrap >>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] >>>> privsep >>>> >>>> >>>>> daemon starting >>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] >>>> privsep >>>> >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >>>> privsep >>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] >>>> privsep >>>> >>>> >>>>> daemon running as pid 2647 >>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> >>>> os_brick.initiator.connectors.nvmeof >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> >>>> >>>>> execution error >>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>> >>>>> Exit code: 2 >>>> >>>> >>>>> Stdout: '' >>>> >>>> >>>>> Stderr: '': >>>> oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>> >>>>> Unexpected error while running command. >>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>> >>>>> >>>> >>>> >>>>> Is there a way to solve this issue? >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> With regards, >>>> >>>> >>>>> >>>> >>>> >>>>> Swogat Pradhan >>>> >>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cinder.conf Type: application/octet-stream Size: 2768 bytes Desc: not available URL: From johfulto at redhat.com Tue Mar 21 12:22:22 2023 From: johfulto at redhat.com (John Fulton) Date: Tue, 21 Mar 2023 08:22:22 -0400 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan wrote: > > Hi, > Seems like cinder is not using the local ceph. That explains the issue. It's a misconfiguration. I hope this is not a production system since the mailing list now has the cinder.conf which contains passwords. The section that looks like this: [tripleo_ceph] volume_backend_name=tripleo_ceph volume_driver=cinder.volume.drivers.rbd.RBDDriver rbd_ceph_conf=/etc/ceph/ceph.conf rbd_user=openstack rbd_pool=volumes rbd_flatten_volume_from_snapshot=False rbd_secret_uuid= report_discard_supported=True Should be updated to refer to the local DCN ceph cluster and not the central one. Use the ceph conf file for that cluster and ensure the rbd_secret_uuid corresponds to that one. TripleO?s convention is to set the rbd_secret_uuid to the FSID of the Ceph cluster. The FSID should be in the ceph.conf file. The tripleo_nova_libvirt role will use virsh secret-* commands so that libvirt can retrieve the cephx secret using the FSID as a key. This can be confirmed with `podman exec nova_virtsecretd virsh secret-get-value $FSID`. The documentation describes how to configure the central and DCN sites correctly but an error seems to have occurred while you were following it. https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html John > > Ceph Output: > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l > NAME SIZE PARENT FMT PROT LOCK > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 excl > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes > > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l > NAME SIZE PARENT FMT PROT LOCK > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 > [ceph: root at dcn02-ceph-all-0 /]# > > Attached the cinder config. > Please let me know how I can solve this issue. > > With regards, > Swogat Pradhan > > On Tue, Mar 21, 2023 at 3:53?PM John Fulton wrote: >> >> in my last message under the line "On a DCN site if you run a command like this:" I suggested some steps you could try to confirm the image is a COW from the local glance as well as how to look at your cinder config. >> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan wrote: >>> >>> Update: >>> I uploaded an image directly to the dcn02 store, and it takes around 10,15 minutes to create a volume with image in dcn02. 
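A quick way to verify the fix John describes above once the dcn02 backend has been corrected. The cinder_volume container name, the file name dcn02.conf and the FSID are assumptions for illustration, not values from this environment:

# on a dcn02 node running the cinder_volume container
$ sudo podman exec cinder_volume grep -E '^(rbd_ceph_conf|rbd_secret_uuid)' /etc/cinder/cinder.conf
# expected: rbd_ceph_conf=/etc/ceph/dcn02.conf  (the site-local cluster, not the central ceph.conf)
# expected: rbd_secret_uuid=<fsid of the dcn02 cluster>

# the FSID the secret should match
$ sudo grep fsid /etc/ceph/dcn02.conf

# confirm libvirt can return the cephx key under that FSID, as John notes
$ sudo podman exec nova_virtsecretd virsh secret-get-value <fsid of the dcn02 cluster>

Since TripleO renders cinder.conf from the deployment templates, the durable fix is to correct the dcn02 deployment per the document John links above rather than hand-editing the file inside the container.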
>>> The image size is 389 MB. >>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan wrote: >>>> >>>> Hi Jhon, >>>> I checked in the ceph od dcn02, I can see the images created after importing from the central site. >>>> But launching an instance normally fails as it takes a long time for the volume to get created. >>>> >>>> When launching an instance from volume the instance is getting created properly without any errors. >>>> >>>> I tried to cache images in nova using https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html but getting checksum failed error. >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: >>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>> wrote: >>>>> > >>>>> > Update: After restarting the nova services on the controller and running the deploy script on the edge site, I was able to launch the VM from volume. >>>>> > >>>>> > Right now the instance creation is failing as the block device creation is stuck in creating state, it is taking more than 10 mins for the volume to be created, whereas the image has already been imported to the edge glance. >>>>> >>>>> Try following this document and making the same observations in your >>>>> environment for AZs and their local ceph cluster. >>>>> >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>> >>>>> On a DCN site if you run a command like this: >>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>> NAME SIZE PARENT >>>>> FMT PROT LOCK >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>>> $ >>>>> >>>>> Then, you should see the parent of the volume is the image which is on >>>>> the same local ceph cluster. >>>>> >>>>> I wonder if something is misconfigured and thus you're encountering >>>>> the streaming behavior described here: >>>>> >>>>> Ideally all images should reside in the central Glance and be copied >>>>> to DCN sites before instances of those images are booted on DCN sites. >>>>> If an image is not copied to a DCN site before it is booted, then the >>>>> image will be streamed to the DCN site and then the image will boot as >>>>> an instance. This happens because Glance at the DCN site has access to >>>>> the images store at the Central ceph cluster. Though the booting of >>>>> the image will take time because it has not been copied in advance, >>>>> this is still preferable to failing to boot the image. >>>>> >>>>> You can also exec into the cinder container at the DCN site and >>>>> confirm it's using it's local ceph cluster. >>>>> >>>>> John >>>>> >>>>> > >>>>> > I will try and create a new fresh image and test again then update. >>>>> > >>>>> > With regards, >>>>> > Swogat Pradhan >>>>> > >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan wrote: >>>>> >> >>>>> >> Update: >>>>> >> In the hypervisor list the compute node state is showing down. >>>>> >> >>>>> >> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan wrote: >>>>> >>> >>>>> >>> Hi Brendan, >>>>> >>> Now i have deployed another site where i have used 2 linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). 
>>>>> >>> I used a cirros image to launch instance but the instance timed out so i waited for the volume to be created. >>>>> >>> Once the volume was created i tried launching the instance from the volume and still the instance is stuck in spawning state. >>>>> >>> >>>>> >>> Here is the nova-compute log: >>>>> >>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon starting >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command. >>>>> >>> Command: blkid overlay -s UUID -o value >>>>> >>> Exit code: 2 >>>>> >>> Stdout: '' >>>>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>> >>> >>>>> >>> It is stuck in creating image, do i need to run the template mentioned here ?: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>> >>> >>>>> >>> The volume is already created and i do not understand why the instance is stuck in spawning state. >>>>> >>> >>>>> >>> With regards, >>>>> >>> Swogat Pradhan >>>>> >>> >>>>> >>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard wrote: >>>>> >>>> >>>>> >>>> Does your environment use different network interfaces for each of the networks? Or does it have a bond with everything on it? >>>>> >>>> >>>>> >>>> One issue I have seen before is that when launching instances, there is a lot of network traffic between nodes as the hypervisor needs to download the image from Glance. Along with various other services sending normal network traffic, it can be enough to cause issues if everything is running over a single 1Gbe interface. >>>>> >>>> >>>>> >>>> I have seen the same situation in fact when using a single active/backup bond on 1Gbe nics. It?s worth checking the network traffic while you try to spawn the instance to see if you?re dropping packets. In the situation I described, there were dropped packets which resulted in a loss of communication between nova_compute and RMQ, so the node appeared offline. You should also confirm that nova_compute is being disconnected in the nova_compute logs if you tail them on the Hypervisor while spawning the instance. >>>>> >>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, based on that experience, from my perspective, is certainly sounds like some kind of network issue. 
>>>>> >>>> >>>>> >>>> Regards, >>>>> >>>> >>>>> >>>> Brendan Shephard >>>>> >>>> Senior Software Engineer >>>>> >>>> Red Hat Australia >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>>>> >>>> >>>>> >>>> Hi, >>>>> >>>> >>>>> >>>> I tried to help someone with a similar issue some time ago in this thread: >>>>> >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>> >>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. >>>>> >>>> If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. I didn't have to do that in a production system yet so be careful. I can think of two routes: >>>>> >>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>> >>>> >>>>> >>>> I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. >>>>> >>>> >>>>> >>>> Regards, >>>>> >>>> Eugen >>>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>> >>>> >>>>> >>>> Hi, >>>>> >>>> Can someone please help me out on this issue? >>>>> >>>> >>>>> >>>> With regards, >>>>> >>>> Swogat Pradhan >>>>> >>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >>>>> >>>> wrote: >>>>> >>>> >>>>> >>>> Hi >>>>> >>>> I don't see any major packet loss. >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>>>> >>>> loss. >>>>> >>>> >>>>> >>>> with regards, >>>>> >>>> Swogat Pradhan >>>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >>>>> >>>> wrote: >>>>> >>>> >>>>> >>>> Hi, >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>> >>>> Generally I haven't seen any packet loss, but never checked when >>>>> >>>> launching the instance. >>>>> >>>> I will check that and come back. >>>>> >>>> But everytime i launch an instance the instance gets stuck at spawning >>>>> >>>> state and there the hypervisor becomes down, so not sure if packet loss >>>>> >>>> causes this. >>>>> >>>> >>>>> >>>> With regards, >>>>> >>>> Swogat pradhan >>>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>>>> >>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical between >>>>> >>>> central and edge site? Do you see packet loss through the tunnel? >>>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>> >>>> >>>>> >>>> > Hi Eugen, >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>>> >>>> > getting email's from you. >>>>> >>>> > Coming to the issue: >>>>> >>>> > >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>>> >>>> / >>>>> >>>> > Listing policies for vhost "/" ... 
>>>>> >>>> > vhost name pattern apply-to definition priority >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>> >>>> > >>>>> >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>> >>>> > >>>>> >>>> > I have the edge site compute nodes up, it only goes down when i am >>>>> >>>> trying >>>>> >>>> > to launch an instance and the instance comes to a spawning state and >>>>> >>>> then >>>>> >>>> > gets stuck. >>>>> >>>> > >>>>> >>>> > I have a tunnel setup between the central and the edge sites. >>>>> >>>> > >>>>> >>>> > With regards, >>>>> >>>> > Swogat Pradhan >>>>> >>>> > >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>> > wrote: >>>>> >>>> > >>>>> >>>> >> Hi Eugen, >>>>> >>>> >> For some reason i am not getting your email to me directly, i am >>>>> >>>> checking >>>>> >>>> >> the email digest and there i am able to find your reply. >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >>>>> >>>> >> >>>>> >>>> >> *Note: i am able to create vm's and perform other activities in the >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>>> >>>> >> >>>>> >>>> >> With regards, >>>>> >>>> >> Swogat Pradhan >>>>> >>>> >> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>> >> wrote: >>>>> >>>> >> >>>>> >>>> >>> Hi Eugen, >>>>> >>>> >>> Thanks for your response. >>>>> >>>> >>> I have actually a 4 controller setup so here are the details: >>>>> >>>> >>> >>>>> >>>> >>> *PCS Status:* >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>> Started >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>> Started >>>>> >>>> >>> overcloud-controller-2 >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>> Started >>>>> >>>> >>> overcloud-controller-1 >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>> Started >>>>> >>>> >>> overcloud-controller-0 >>>>> >>>> >>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the issue is >>>>> >>>> still >>>>> >>>> >>> present. >>>>> >>>> >>> >>>>> >>>> >>> *Cluster status:* >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>>> >>>> >>> Cluster status of node >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>> >>>> >>> Basics >>>>> >>>> >>> >>>>> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>> >>>> >>> >>>>> >>>> >>> Disk Nodes >>>>> >>>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> >>> >>>>> >>>> >>> Running Nodes >>>>> >>>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> >>> >>>>> >>>> >>> Versions >>>>> >>>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>> >>>> 3.8.3 >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>> >>>> 3.8.3 >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>> >>>> 3.8.3 >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>> >>>> RabbitMQ >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>> >>>> >>> >>>>> >>>> >>> Alarms >>>>> >>>> >>> >>>>> >>>> >>> (none) >>>>> >>>> >>> >>>>> >>>> >>> Network Partitions >>>>> >>>> >>> >>>>> >>>> >>> (none) >>>>> >>>> >>> >>>>> >>>> >>> Listeners >>>>> >>>> >>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> >>>> tool >>>>> >>>> >>> communication >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> >>>> tool >>>>> >>>> >>> communication >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>>> >>>> tool >>>>> >>>> >>> communication >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>> interface: >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>> >>> Node: rabbit at 
overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> , >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>>> >>>> inter-node and >>>>> >>>> >>> CLI tool communication >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> , >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>>> >>>> 0-9-1 >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>> , >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>> >>> >>>>> >>>> >>> Feature flags >>>>> >>>> >>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>> >>>> >>> >>>>> >>>> >>> *Logs:* >>>>> >>>> >>> *(Attached)* >>>>> >>>> >>> >>>>> >>>> >>> With regards, >>>>> >>>> >>> Swogat Pradhan >>>>> >>>> >>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>> >>> wrote: >>>>> >>>> >>> >>>>> >>>> >>>> Hi, >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >>>>> >>>> >>>> >>>>> >>>> >>>> nova-conuctor: >>>>> >>>> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>>> >>>> due to a >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>>>> >>>> Abandoning...: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>>> >>>> due to a >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>>>> >>>> Abandoning...: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>>> >>>> due to a >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> >>>> Abandoning...: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>>> >>>> with >>>>> >>>> >>>> backend dogpile.cache.null. >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>>> >>>> due to a >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> >>>> Abandoning...: >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>> >>>> >>>>> >>>> >>>> With regards, >>>>> >>>> >>>> Swogat Pradhan >>>>> >>>> >>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>> >>>> >>>>> >>>> >>>>> Hi, >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>>> >>>> >>>>> launch vm's. >>>>> >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>>> >>>> compute >>>>> >>>> >>>>> service list), the node comes backup when i restart the nova >>>>> >>>> compute >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>> >>>> >>>>> >>>>> >>>> >>>>> nova-compute.log >>>>> >>>> >>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>>> >>>> >>>>> instance usage >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>>> >>>> to >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>>> >>>> name: >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>>> >>>> with >>>>> >>>> >>>>> backend dogpile.cache.null. >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>>> >>>> >>>>> privsep helper: >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>> >>>> 'privsep-helper', >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>>> >>>> privsep >>>>> >>>> >>>>> daemon via rootwrap >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>> >>>>> daemon starting >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>>> >>>> >>>>> daemon running as pid 2647 >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>>> >>>> >>>>> execution error >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. 
>>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>> >>>> >>>>> Exit code: 2 >>>>> >>>> >>>>> Stdout: '' >>>>> >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>>> >>>> >>>>> Unexpected error while running command. >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>> >>>> >>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>>> >>>> >>>>> >>>>> >>>> >>>>> >>>>> >>>> >>>>> With regards, >>>>> >>>> >>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>> >>>> >>>>> >>>>> >>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> From rafaelweingartner at gmail.com Tue Mar 21 12:39:23 2023 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Tue, 21 Mar 2023 09:39:23 -0300 Subject: [CloudKitty] Virtual PTG March 2023 Message-ID: Hello everyone, As you probably heard our next PTG will be held virtually in March, 2023. We've marked March 31, at 13:00-15:00 UTC [1]. However, if you guys would like some other dates and/or time, just let me know. The room we selected is called "Bexar". I've also created an etherpad [2] to collect ideas/topics for the PTG sessions. If you have anything to discuss, please don't hesitate to write it there. [1] https://ptg.opendev.org/ptg.html [2] https://etherpad.opendev.org/p/march2023-ptg-cloudkitty -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From pshchelokovskyy at mirantis.com Tue Mar 21 12:58:01 2023 From: pshchelokovskyy at mirantis.com (Pavlo Shchelokovskyy) Date: Tue, 21 Mar 2023 14:58:01 +0200 Subject: [barbican] database is growing and can not be purged In-Reply-To: References: Message-ID: Hi all, after having some thoughts, I came to another solution, that I think is the most appropriate here, kind of a variation of option 1: 4. Castellan should cleanup intermediate resources before returning secret ID(s) to the caller As I see it now, the root of the problem is in castellan's BarbicanKeyManager and the way it hides implementation details from the user. Since it returns only IDs of created secrets to the user, the api caller has no notion that something else has to be deleted once it is time for this. Since Barbican API is perfectly capable to delete orders and containers without deleting the secrets they reference, this is what castellan should do just before it returns IDs of generated secrets to the API caller. The only small trouble is that with default 'legacy' API policies in Barbican, not everybody who can create orders can delete them.. but this can be accounted for with try..except. 
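As a stop-gap until a fix along those lines lands, a project admin can at least find and remove the orders that have already accumulated by hand, assuming the python-barbicanclient plugin is installed and the policy in use lets that user delete orders:

$ openstack secret order list                  # each row shows the order href plus the secret/container it references
$ openstack secret order delete <order href>   # removes the order only; the referenced secret is left in place

Once the stale orders are gone, the database cleanup script should again be able to purge the soft-deleted secrets they were pinning.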
Please review the patch in this regard https://review.opendev.org/c/openstack/castellan/+/877423 Best regards, On Mon, Mar 6, 2023 at 7:32?PM Pavlo Shchelokovskyy < pshchelokovskyy at mirantis.com> wrote: > Hi all, > > we are observing the following behavior in Barbican: > - OpenStack environment is using both encrypted Cinder volumes and > encrypted local storage (lvm) for Nova instances > - over the time, the secrets and orders tables are growing > - many soft-deleted entries in secrets DB can not be purged by the db > cleanup script > > As I understand what is happening - both Cinder and Nova create secrets in > Barbican on behalf of the user when creating an encrypted volume or booting > an instance with encrypted local storage. They both do it via castellan > library, that under the hood creates orders in Barbican, waits for them to > become active and returns to the caller only the ID of the generated > secret. When time comes to delete the thing (volume or instance) > Cinder/Nova again use castellan, but only delete the secret, not the order > (they are not aware that there was any 'order' created anyway). As a > result, the orders are left in place, and DB cleanup procedure does not > delete soft-deleted secrets when there's an ACTIVE order referencing such > secret. > > This is troublesomes on many levels - users who use Cinder or Nova may not > even be aware that they are creating something in Barbican. Orders > accumulating like that may eventually result in cryptic errors when e.g. > when you run out of quota for orders. And what's more, default Barbican > policies do allow 'normal' (creator) users to create an order, but not > delete it (only project admin can do it), so even if the users are aware of > Barbican involvement, they can not delete those orders manually anyway. > Plus there's no good way in API to determine outright which orders are > referencing deleted secrets. > > I see several ways of dealing with that and would like to ask for your > opinion on what would be the best one: > 1. Amend Barbican API to allow filtering orders by the secrets, when > castellan deletes a secret - search for corresponding order and delete it > as well, change default policy to actually allow order deletion by the same > users who can create them. > 2. Cascade-delete orders when deleting secrets - this is easy but probably > violates that very policy that disallowed normal users to delete orders. > 3. improve the database cleanup so it first marks any order that > references a deleted secret also as deleted, so later when time comes both > could be purged (or something like that). This also has a similar downside > to the previous option by not being explicit enough. > > I've filed a bug for that > https://storyboard.openstack.org/#!/story/2010625 and proposed a patch > for option 2 (cascade delete), but would like to ask what would you see as > the most appropriate way or may be there's something else that I've missed. > > Btw, the problem is probably even more pronounced with keypairs - when > castellan is used to create those, under the hood both order and container > are created besides the actual secrets, and again only the secret ids are > returned to the caller. When time comes to delete things, the caller only > knows about secret IDs, and can only delete them, leaving both container > and order behind. > Luckily, I did not find any place across OpenStack that actually creates > keypairs using castellan... but the problem is definitely there. > > Best regards, > -- > Dr. 
Pavlo Shchelokovskyy > Principal Software Engineer > Mirantis Inc > www.mirantis.com > -- Dr. Pavlo Shchelokovskyy Principal Software Engineer Mirantis Inc www.mirantis.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 21 12:40:19 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 21 Mar 2023 18:10:19 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Jhon, This seems to be an issue. When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster parameter was specified to the respective cluster names but the config files were created in the name of ceph.conf and keyring was ceph.client.openstack.keyring. Which created issues in glance as well as the naming convention of the files didn't match the cluster names, so i had to manually rename the central ceph conf file as such: [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ [root at dcn02-compute-0 ceph]# ll total 16 -rw-------. 1 root root 257 Mar 13 13:56 ceph_central.client.openstack.keyring -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf [root at dcn02-compute-0 ceph]# ceph.conf and ceph.client.openstack.keyring contain the fsid of the respective clusters in both dcn01 and dcn02. In the above cli output, the ceph.conf and ceph.client... are the files used to access dcn02 ceph cluster and ceph_central* files are used in for accessing central ceph cluster. glance multistore config: [dcn02] rbd_store_ceph_conf=/etc/ceph/ceph.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=dcn02 rbd glance store [ceph_central] rbd_store_ceph_conf=/etc/ceph/ceph_central.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=Default glance store backend. With regards, Swogat Pradhan On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: > On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan > wrote: > > > > Hi, > > Seems like cinder is not using the local ceph. > > That explains the issue. It's a misconfiguration. > > I hope this is not a production system since the mailing list now has > the cinder.conf which contains passwords. > > The section that looks like this: > > [tripleo_ceph] > volume_backend_name=tripleo_ceph > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_ceph_conf=/etc/ceph/ceph.conf > rbd_user=openstack > rbd_pool=volumes > rbd_flatten_volume_from_snapshot=False > rbd_secret_uuid= > report_discard_supported=True > > Should be updated to refer to the local DCN ceph cluster and not the > central one. Use the ceph conf file for that cluster and ensure the > rbd_secret_uuid corresponds to that one. > > TripleO?s convention is to set the rbd_secret_uuid to the FSID of the > Ceph cluster. The FSID should be in the ceph.conf file. The > tripleo_nova_libvirt role will use virsh secret-* commands so that > libvirt can retrieve the cephx secret using the FSID as a key. This > can be confirmed with `podman exec nova_virtsecretd virsh > secret-get-value $FSID`. 
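As a rough sketch of where that ends up (the file names and FSID below are placeholders — in this deployment the local cluster's conf may simply be named ceph.conf on the DCN nodes; what matters is that the referenced file points at the dcn02 monitors and that rbd_secret_uuid is the dcn02 FSID), the backend section on the dcn02 cinder-volume host would look something like:

[tripleo_ceph]
volume_backend_name=tripleo_ceph
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf=/etc/ceph/dcn02.conf
rbd_user=openstack
rbd_pool=volumes
rbd_flatten_volume_from_snapshot=False
rbd_secret_uuid=<fsid taken from "grep fsid /etc/ceph/dcn02.conf">
report_discard_supported=True

and the libvirt side can be checked on a dcn02 compute/HCI node with:

$ sudo grep fsid /etc/ceph/dcn02.conf
$ sudo podman exec nova_virtsecretd virsh secret-list
$ sudo podman exec nova_virtsecretd virsh secret-get-value <that fsid>

If the only secret libvirt holds is the central cluster's FSID, attaching volumes from the local cluster is still likely to fail even once cinder.conf points at it.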
> > The documentation describes how to configure the central and DCN sites > correctly but an error seems to have occurred while you were following > it. > > > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html > > John > > > > > Ceph Output: > > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l > > NAME SIZE PARENT FMT PROT > LOCK > > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 > excl > > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 > > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes > > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 > > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes > > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 > > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes > > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 > > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes > > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 > > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes > > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 > > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes > > > > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l > > NAME SIZE PARENT FMT PROT > LOCK > > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 > > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 > > [ceph: root at dcn02-ceph-all-0 /]# > > > > Attached the cinder config. > > Please let me know how I can solve this issue. > > > > With regards, > > Swogat Pradhan > > > > On Tue, Mar 21, 2023 at 3:53?PM John Fulton wrote: > >> > >> in my last message under the line "On a DCN site if you run a command > like this:" I suggested some steps you could try to confirm the image is a > COW from the local glance as well as how to look at your cinder config. > >> > >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>> > >>> Update: > >>> I uploaded an image directly to the dcn02 store, and it takes around > 10,15 minutes to create a volume with image in dcn02. > >>> The image size is 389 MB. > >>> > >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> > >>>> Hi Jhon, > >>>> I checked in the ceph od dcn02, I can see the images created after > importing from the central site. > >>>> But launching an instance normally fails as it takes a long time for > the volume to get created. > >>>> > >>>> When launching an instance from volume the instance is getting > created properly without any errors. > >>>> > >>>> I tried to cache images in nova using > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > but getting checksum failed error. > >>>> > >>>> With regards, > >>>> Swogat Pradhan > >>>> > >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton > wrote: > >>>>> > >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan > >>>>> wrote: > >>>>> > > >>>>> > Update: After restarting the nova services on the controller and > running the deploy script on the edge site, I was able to launch the VM > from volume. > >>>>> > > >>>>> > Right now the instance creation is failing as the block device > creation is stuck in creating state, it is taking more than 10 mins for the > volume to be created, whereas the image has already been imported to the > edge glance. > >>>>> > >>>>> Try following this document and making the same observations in your > >>>>> environment for AZs and their local ceph cluster. 
> >>>>> > >>>>> > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites > >>>>> > >>>>> On a DCN site if you run a command like this: > >>>>> > >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring > >>>>> /etc/ceph/dcn0.client.admin.keyring > >>>>> $ rbd --cluster dcn0 -p volumes ls -l > >>>>> NAME SIZE PARENT > >>>>> FMT PROT LOCK > >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB > >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl > >>>>> $ > >>>>> > >>>>> Then, you should see the parent of the volume is the image which is > on > >>>>> the same local ceph cluster. > >>>>> > >>>>> I wonder if something is misconfigured and thus you're encountering > >>>>> the streaming behavior described here: > >>>>> > >>>>> Ideally all images should reside in the central Glance and be copied > >>>>> to DCN sites before instances of those images are booted on DCN > sites. > >>>>> If an image is not copied to a DCN site before it is booted, then the > >>>>> image will be streamed to the DCN site and then the image will boot > as > >>>>> an instance. This happens because Glance at the DCN site has access > to > >>>>> the images store at the Central ceph cluster. Though the booting of > >>>>> the image will take time because it has not been copied in advance, > >>>>> this is still preferable to failing to boot the image. > >>>>> > >>>>> You can also exec into the cinder container at the DCN site and > >>>>> confirm it's using it's local ceph cluster. > >>>>> > >>>>> John > >>>>> > >>>>> > > >>>>> > I will try and create a new fresh image and test again then update. > >>>>> > > >>>>> > With regards, > >>>>> > Swogat Pradhan > >>>>> > > >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>>> >> > >>>>> >> Update: > >>>>> >> In the hypervisor list the compute node state is showing down. > >>>>> >> > >>>>> >> > >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>>> >>> > >>>>> >>> Hi Brendan, > >>>>> >>> Now i have deployed another site where i have used 2 linux bonds > network template for both 3 compute nodes and 3 ceph nodes. > >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). > >>>>> >>> I used a cirros image to launch instance but the instance timed > out so i waited for the volume to be created. > >>>>> >>> Once the volume was created i tried launching the instance from > the volume and still the instance is stuck in spawning state. > >>>>> >>> > >>>>> >>> Here is the nova-compute log: > >>>>> >>> > >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] > privsep daemon starting > >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] > privsep process running with uid/gid: 0/0 > >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] > privsep process running with capabilities (eff/prm/inh): > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] > privsep daemon running as pid 185437 > >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING > os_brick.initiator.connectors.nvmeof > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error > in _get_host_uuid: Unexpected error while running command. 
> >>>>> >>> Command: blkid overlay -s UUID -o value > >>>>> >>> Exit code: 2 > >>>>> >>> Stdout: '' > >>>>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: > Unexpected error while running command. > >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > 450b749c-a10a-4308-80a9-3b8020fee758] Creating image > >>>>> >>> > >>>>> >>> It is stuck in creating image, do i need to run the template > mentioned here ?: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > >>>>> >>> > >>>>> >>> The volume is already created and i do not understand why the > instance is stuck in spawning state. > >>>>> >>> > >>>>> >>> With regards, > >>>>> >>> Swogat Pradhan > >>>>> >>> > >>>>> >>> > >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < > bshephar at redhat.com> wrote: > >>>>> >>>> > >>>>> >>>> Does your environment use different network interfaces for each > of the networks? Or does it have a bond with everything on it? > >>>>> >>>> > >>>>> >>>> One issue I have seen before is that when launching instances, > there is a lot of network traffic between nodes as the hypervisor needs to > download the image from Glance. Along with various other services sending > normal network traffic, it can be enough to cause issues if everything is > running over a single 1Gbe interface. > >>>>> >>>> > >>>>> >>>> I have seen the same situation in fact when using a single > active/backup bond on 1Gbe nics. It?s worth checking the network traffic > while you try to spawn the instance to see if you?re dropping packets. In > the situation I described, there were dropped packets which resulted in a > loss of communication between nova_compute and RMQ, so the node appeared > offline. You should also confirm that nova_compute is being disconnected in > the nova_compute logs if you tail them on the Hypervisor while spawning the > instance. > >>>>> >>>> > >>>>> >>>> In my case, changing from active/backup to LACP helped. So, > based on that experience, from my perspective, is certainly sounds like > some kind of network issue. > >>>>> >>>> > >>>>> >>>> Regards, > >>>>> >>>> > >>>>> >>>> Brendan Shephard > >>>>> >>>> Senior Software Engineer > >>>>> >>>> Red Hat Australia > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: > >>>>> >>>> > >>>>> >>>> Hi, > >>>>> >>>> > >>>>> >>>> I tried to help someone with a similar issue some time ago in > this thread: > >>>>> >>>> > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > >>>>> >>>> > >>>>> >>>> But apparently a neutron reinstallation fixed it for that user, > not sure if that could apply here. But is it possible that your nova and > neutron versions are different between central and edge site? Have you > restarted nova and neutron services on the compute nodes after > installation? Have you debug logs of nova-conductor and maybe nova-compute? > Maybe they can help narrow down the issue. > >>>>> >>>> If there isn't any additional information in the debug logs I > probably would start "tearing down" rabbitmq. I didn't have to do that in a > production system yet so be careful. I can think of two routes: > >>>>> >>>> > >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, > this will most likely impact client IO depending on your load. 
Check out > the rabbitmqctl commands. > >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from > all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. > >>>>> >>>> > >>>>> >>>> I can imagine that the failed reply "survives" while being > replicated across the rabbit nodes. But I don't really know the rabbit > internals too well, so maybe someone else can chime in here and give a > better advice. > >>>>> >>>> > >>>>> >>>> Regards, > >>>>> >>>> Eugen > >>>>> >>>> > >>>>> >>>> Zitat von Swogat Pradhan : > >>>>> >>>> > >>>>> >>>> Hi, > >>>>> >>>> Can someone please help me out on this issue? > >>>>> >>>> > >>>>> >>>> With regards, > >>>>> >>>> Swogat Pradhan > >>>>> >>>> > >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>>> >>>> wrote: > >>>>> >>>> > >>>>> >>>> Hi > >>>>> >>>> I don't see any major packet loss. > >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due > to packet > >>>>> >>>> loss. > >>>>> >>>> > >>>>> >>>> with regards, > >>>>> >>>> Swogat Pradhan > >>>>> >>>> > >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>>> >>>> wrote: > >>>>> >>>> > >>>>> >>>> Hi, > >>>>> >>>> Yes the MTU is the same as the default '1500'. > >>>>> >>>> Generally I haven't seen any packet loss, but never checked when > >>>>> >>>> launching the instance. > >>>>> >>>> I will check that and come back. > >>>>> >>>> But everytime i launch an instance the instance gets stuck at > spawning > >>>>> >>>> state and there the hypervisor becomes down, so not sure if > packet loss > >>>>> >>>> causes this. > >>>>> >>>> > >>>>> >>>> With regards, > >>>>> >>>> Swogat pradhan > >>>>> >>>> > >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block > wrote: > >>>>> >>>> > >>>>> >>>> One more thing coming to mind is MTU size. Are they identical > between > >>>>> >>>> central and edge site? Do you see packet loss through the > tunnel? > >>>>> >>>> > >>>>> >>>> Zitat von Swogat Pradhan : > >>>>> >>>> > >>>>> >>>> > Hi Eugen, > >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' as > i am not > >>>>> >>>> > getting email's from you. > >>>>> >>>> > Coming to the issue: > >>>>> >>>> > > >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl > list_policies -p > >>>>> >>>> / > >>>>> >>>> > Listing policies for vhost "/" ... > >>>>> >>>> > vhost name pattern apply-to definition > priority > >>>>> >>>> > / ha-all ^(?!amq\.).* queues > >>>>> >>>> > > >>>>> >>>> > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > >>>>> >>>> > > >>>>> >>>> > I have the edge site compute nodes up, it only goes down when > i am > >>>>> >>>> trying > >>>>> >>>> > to launch an instance and the instance comes to a spawning > state and > >>>>> >>>> then > >>>>> >>>> > gets stuck. > >>>>> >>>> > > >>>>> >>>> > I have a tunnel setup between the central and the edge sites. > >>>>> >>>> > > >>>>> >>>> > With regards, > >>>>> >>>> > Swogat Pradhan > >>>>> >>>> > > >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > >>>>> >>>> swogatpradhan22 at gmail.com> > >>>>> >>>> > wrote: > >>>>> >>>> > > >>>>> >>>> >> Hi Eugen, > >>>>> >>>> >> For some reason i am not getting your email to me directly, > i am > >>>>> >>>> checking > >>>>> >>>> >> the email digest and there i am able to find your reply. > >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq > >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. 
> >>>>> >>>> >> > >>>>> >>>> >> *Note: i am able to create vm's and perform other activities > in the > >>>>> >>>> >> central site, only facing this issue in the edge site.* > >>>>> >>>> >> > >>>>> >>>> >> With regards, > >>>>> >>>> >> Swogat Pradhan > >>>>> >>>> >> > >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > >>>>> >>>> swogatpradhan22 at gmail.com> > >>>>> >>>> >> wrote: > >>>>> >>>> >> > >>>>> >>>> >>> Hi Eugen, > >>>>> >>>> >>> Thanks for your response. > >>>>> >>>> >>> I have actually a 4 controller setup so here are the > details: > >>>>> >>>> >>> > >>>>> >>>> >>> *PCS Status:* > >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ > >>>>> >>>> >>> > 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): > >>>>> >>>> Started > >>>>> >>>> >>> overcloud-controller-no-ceph-3 > >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): > >>>>> >>>> Started > >>>>> >>>> >>> overcloud-controller-2 > >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): > >>>>> >>>> Started > >>>>> >>>> >>> overcloud-controller-1 > >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): > >>>>> >>>> Started > >>>>> >>>> >>> overcloud-controller-0 > >>>>> >>>> >>> > >>>>> >>>> >>> I have tried restarting the bundle multiple times but the > issue is > >>>>> >>>> still > >>>>> >>>> >>> present. > >>>>> >>>> >>> > >>>>> >>>> >>> *Cluster status:* > >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status > >>>>> >>>> >>> Cluster status of node > >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... > >>>>> >>>> >>> Basics > >>>>> >>>> >>> > >>>>> >>>> >>> Cluster name: > rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>>>> >>>> >>> > >>>>> >>>> >>> Disk Nodes > >>>>> >>>> >>> > >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> >>> > >>>>> >>>> >>> Running Nodes > >>>>> >>>> >>> > >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> >>> > >>>>> >>>> >>> Versions > >>>>> >>>> >>> > >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: > RabbitMQ > >>>>> >>>> 3.8.3 > >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: > RabbitMQ > >>>>> >>>> 3.8.3 > >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: > RabbitMQ > >>>>> >>>> 3.8.3 > >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > >>>>> >>>> RabbitMQ > >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 > >>>>> >>>> >>> > >>>>> >>>> >>> Alarms > >>>>> >>>> >>> > >>>>> >>>> >>> (none) > >>>>> >>>> >>> > >>>>> >>>> >>> Network Partitions > >>>>> >>>> >>> > >>>>> >>>> >>> (none) > >>>>> >>>> >>> > >>>>> >>>> >>> Listeners > >>>>> >>>> >>> > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>>> >>>> interface: > 
>>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>>> >>>> tool > >>>>> >>>> >>> communication > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>>>> >>>> >>> and AMQP 1.0 > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>>> >>>> tool > >>>>> >>>> >>> communication > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>>>> >>>> >>> and AMQP 1.0 > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>>> >>>> tool > >>>>> >>>> >>> communication > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP > 0-9-1 > >>>>> >>>> >>> and AMQP 1.0 > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>>> >>>> interface: > >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> , > >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: > >>>>> >>>> inter-node and > >>>>> >>>> >>> CLI tool communication > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> , > >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, > purpose: AMQP > >>>>> >>>> 0-9-1 > >>>>> >>>> >>> and AMQP 1.0 > >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>>> >>>> , > >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP > API > >>>>> >>>> >>> > >>>>> >>>> >>> Feature flags > >>>>> >>>> >>> > >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled > >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled > >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled > >>>>> >>>> >>> Flag: quorum_queue, state: enabled > >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled > >>>>> >>>> >>> > >>>>> >>>> >>> *Logs:* > >>>>> >>>> >>> *(Attached)* > >>>>> >>>> >>> > >>>>> >>>> >>> With regards, > >>>>> >>>> >>> Swogat Pradhan > >>>>> >>>> >>> > >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > >>>>> >>>> swogatpradhan22 at gmail.com> > >>>>> >>>> >>> wrote: > >>>>> >>>> >>> > >>>>> >>>> >>>> Hi, > >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. 
> >>>>> >>>> >>>> > >>>>> >>>> >>>> nova-conuctor: > >>>>> >>>> >>>> > >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop > reply to > >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b > >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop > reply to > >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa > >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop > reply to > >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR > oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The > reply > >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 > seconds > >>>>> >>>> due to a > >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). > >>>>> >>>> Abandoning...: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop > reply to > >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR > oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The > reply > >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 > seconds > >>>>> >>>> due to a > >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>>> >>>> Abandoning...: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop > reply to > >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR > oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The > reply > >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 > seconds > >>>>> >>>> due to a > >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>>> >>>> Abandoning...: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache > enabled > >>>>> >>>> with > >>>>> >>>> >>>> backend dogpile.cache.null. 
> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING > >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop > reply to > >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR > oslo_messaging._drivers.amqpdriver > >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The > reply > >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 > seconds > >>>>> >>>> due to a > >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). > >>>>> >>>> Abandoning...: > >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>>> >>>> >>>> > >>>>> >>>> >>>> With regards, > >>>>> >>>> >>>> Swogat Pradhan > >>>>> >>>> >>>> > >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: > >>>>> >>>> >>>> > >>>>> >>>> >>>>> Hi, > >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am > trying to > >>>>> >>>> >>>>> launch vm's. > >>>>> >>>> >>>>> When the VM is in spawning state the node goes down > (openstack > >>>>> >>>> compute > >>>>> >>>> >>>>> service list), the node comes backup when i restart the > nova > >>>>> >>>> compute > >>>>> >>>> >>>>> service but then the launch of the vm fails. > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> nova-compute.log > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] > Running > >>>>> >>>> >>>>> instance usage > >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 > 07:00:00 > >>>>> >>>> to > >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. > >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on > node > >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com > >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied > device > >>>>> >>>> name: > >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names > >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume > >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache > enabled > >>>>> >>>> with > >>>>> >>>> >>>>> backend dogpile.cache.null. 
> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Running > >>>>> >>>> >>>>> privsep helper: > >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > >>>>> >>>> 'privsep-helper', > >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', > >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Spawned new > >>>>> >>>> privsep > >>>>> >>>> >>>>> daemon via rootwrap > >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] > privsep > >>>>> >>>> >>>>> daemon starting > >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] > privsep > >>>>> >>>> >>>>> process running with uid/gid: 0/0 > >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] > privsep > >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): > >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] > privsep > >>>>> >>>> >>>>> daemon running as pid 2647 > >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING > >>>>> >>>> os_brick.initiator.connectors.nvmeof > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Process > >>>>> >>>> >>>>> execution error > >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. > >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value > >>>>> >>>> >>>>> Exit code: 2 > >>>>> >>>> >>>>> Stdout: '' > >>>>> >>>> >>>>> Stderr: '': > oslo_concurrency.processutils.ProcessExecutionError: > >>>>> >>>> >>>>> Unexpected error while running command. > >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver > >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> Is there a way to solve this issue? > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> With regards, > >>>>> >>>> >>>>> > >>>>> >>>> >>>>> Swogat Pradhan > >>>>> >>>> >>>>> > >>>>> >>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> > >>>>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bence.romsics at gmail.com Tue Mar 21 14:05:03 2023 From: bence.romsics at gmail.com (Bence Romsics) Date: Tue, 21 Mar 2023 15:05:03 +0100 Subject: [nova][cinder] future of rebuild without reimaging In-Reply-To: References: Message-ID: Hi, Thanks for all the answers! I went back to ask what our users are using this for. At the moment I'm not sure what they do is really supported. But you tell me. To me it makes some sense. 
Basically they have an additional and unusual compute host recovery process,
where a compute host after a failure is brought back by the same name. Then
they rebuild the servers on the same compute host where the servers were
running before. When the server's disk was backed by a volume, so its content
was not lost by the compute host failure, they don't want to lose it either in
the recovery process. The evacuate operation clearly would be a better fit to
do this, but that disallows evacuating to the "same" host. For a long time
rebuild just allowed "evacuating to the same host", so they went with it.

At the moment I did not find a prohibition in the documentation against
bringing back a failed compute host by the same name. If I missed it or this
is not recommended for any reason, please let me know.

Clearly in many clouds evacuating can fully replace what they do here. I
believe they may have chosen this unusual compute host recovery option to have
some kind of recovery process for very small deployments, where you don't
always have space to evacuate before you have rebuilt the failed compute host.
And this collided with a deployment system which reuses host names.

At this point I'm not sure if this really belongs in the rebuild operation. It
could easily be better addressed in evacuate, or in the deployment system not
reusing hostnames. Please let me know what you think!

Thanks in advance,
Bence

From dms at danplanet.com  Tue Mar 21 14:56:43 2023
From: dms at danplanet.com (Dan Smith)
Date: Tue, 21 Mar 2023 07:56:43 -0700
Subject: [nova][cinder] future of rebuild without reimaging
In-Reply-To: (Bence Romsics's message of "Tue, 21 Mar 2023 15:05:03 +0100")
References: 
Message-ID: 

> Basically they have an additional and unusual compute host recovery
> process, where a compute host after a failure is brought back by the
> same name. Then they rebuild the servers on the same compute host
> where the servers were running before. When the server's disk was
> backed by a volume, so its content was not lost by the compute host
> failure, they don't want to lose it either in the recovery process.
> The evacuate operation clearly would be a better fit to do this, but
> that disallows evacuating to the "same" host. For a long time rebuild
> just allowed "evacuating to the same host", so they went with it.

Aside from the "should this be possible" question, is rebuild even
required in this case? For the non-volume-backed instances, we need
rebuild to re-download the image and create the root disk. If it's
really required for volume-backed instances, I'm guessing there's just
some trivial amount of state that isn't in place on recovery that the
rebuild "solves". It is indeed a very odd fringe use-case that is an
obvious mis-use of the function.

> At the moment I did not find a prohibition in the documentation against
> bringing back a failed compute host by the same name. If I missed it or
> this is not recommended for any reason, please let me know.

I'm not sure why this would be specifically documented, but since
compute nodes are not fully stateless, your scenario is basically
"delete part of the state of the system and expect things to keep
working", which I don't think is reasonable (nor something we should
need to document). Your scenario is basically the same as one where
your /var/lib/nova is mounted on a disk that doesn't come up after
reboot, or on NFS that was unavailable at boot.
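The workflow described above boils down to something like the following
sketch. It is only an illustration: host and server names are placeholders,
the commands are standard openstackclient ones, and whether rebuild leaves a
boot volume's content untouched depends on the release and microversion in
use, which is exactly the open question of this thread.

  # the failed host is reinstalled and comes back under the same name
  $ openstack compute service list --service nova-compute --host compute-0.example.com
  # find the servers that were running there before the failure
  $ openstack server list --all-projects --host compute-0.example.com
  # rebuild a volume-backed server in place, reusing its current image;
  # with older microversions this has not reimaged the root volume
  $ openstack server rebuild <server-id>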
If nova were to say "meh, a bunch of state disappeared, I must be a rebuilt compute host" then it would potentially destroy (or desynchronize) actual state in other nodes (i.e. the database) for a transient/accidental situation. TBH, we might should even explicitly *block* rebuild on an instance that appears to be missing its on-disk state to avoid users, who don't know the state of the infra, from doing this to try to unblock their instances while ops are doing maintenance. I will point out that bringing back a compute node under the same name (without cleaning the residue first) is strikingly similar to renaming a compute host, which we do *not* support. As of Antelope, the compute node would detect your scenario as a potential rename and refuse to start, again because of state that has been lost in the system. So just FYI that an actual blocker to your scenario is coming :) > Clearly in many clouds evacuating can fully replace what they do here. > I believe they may have chosen this unusual compute host recovery > option to have some kind of recovery process for very small > deployments, where you don't always have space to evacuate before you > rebuilt the failed compute host. And this collided with a deployment > system which reuses host names. > > At this point I'm not sure if this really belongs to the rebuild > operation. Could easily be better addressed in evacuate. Or in the > deployment system not reusing hostnames. Evacuate can't work for this case either because it requires the compute node to be down to perform. As you note, bringing it back under a different name would solve that problem. However, neither "evacuate to same host" or "use rebuild for this recovery procedure" are reasonable, IMHO. --Dan From knikolla at bu.edu Tue Mar 21 15:42:22 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Tue, 21 Mar 2023 15:42:22 +0000 Subject: [tc][all] Technical Committee next weekly meeting on March 22 at 1600 UTC Message-ID: <70D4500C-29CA-48F6-890A-22A294CBE5D0@bu.edu> Hi all, This is a reminder that the next weekly Technical Committee meeting is to be held tomorrow (March 22) at 1600 UTC on #openstack-tc on OFTC IRC. The meeting will be chaired by Kristi Nikolla. A copy of the agenda can be found below. Items can still be proposed by editing the wiki page at https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting * Deciding on meeting time * Gate health check * 2023.2 cycle Leaderless projects ** https://etherpad.opendev.org/p/2023.2-leaderless * Virtual PTG Planning ** March 27-31, 2023, there's the Virtual PTG. ** https://etherpad.opendev.org/p/tc-2023-2-ptg * TC 2023.1 tracker status checks ** https://etherpad.opendev.org/p/tc-2023.1-tracker * Cleanup of PyPI maintainer list for OpenStack Projects ** Etherpad for audit and cleanup of additional PyPi maintainers *** https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup ** ML discussion *** https://lists.openstack.org/pipermail/openstack-discuss/2023-January/031848.html * Recurring tasks check ** Bare 'recheck' state *** https://etherpad.opendev.org/p/recheck-weekly-summary * Open Reviews ** https://review.opendev.org/q/projects:openstack/governance+is:open There are no noted absences. 
Thank you, Kristi Nikolla From hiromu.asahina.az at hco.ntt.co.jp Tue Mar 21 16:00:06 2023 From: hiromu.asahina.az at hco.ntt.co.jp (Hiromu Asahina) Date: Wed, 22 Mar 2023 01:00:06 +0900 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Message-ID: I apologize that I couldn't reply before the Ironic meeting on Monday. I need one slot to discuss this topic. I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, 27)[1,2] works for them. Does this work for Ironic? I understand not all Ironic members will join this discussion, so I hope we can arrange a convenient date for you two at least and, hopefully, for those interested in this topic. [1] https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z [2] https://ptg.opendev.org/ptg.html Thanks, Hiromu Asahina On 2023/03/17 23:29, Julia Kreger wrote: > I'm not sure how many Ironic contributors would be the ones to attend a > discussion, in part because this is disjointed from the items they need to > focus on. It is much more of a "big picture" item for those of us who are > leaders in the project. > > I think it would help to understand how much time you expect the discussion > to take to determine a path forward and how we can collaborate. Ironic has > a huge number of topics we want to discuss during the PTG, and I suspect > our team meeting on Monday next week should yield more interest/awareness > as well as an amount of time for each topic which will aid us in scheduling. > > If you can let us know how long, then I think we can figure out when the > best day/time will be. > > Thanks! > > -Julia > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > >> Thank you for your reply. >> >> I'd like to decide the time slot for this topic. >> I just checked PTG schedule [1]. >> >> We have the following time slots. Which one is convenient to gether? >> (I didn't get reply but I listed Barbican, as its cores are almost the >> same as Keystone) >> >> Mon, 27: >> >> - 14 (keystone) >> - 15 (keystone) >> >> Tue, 28 >> >> - 13 (barbican) >> - 14 (keystone, ironic) >> - 15 (keysonte, ironic) >> - 16 (ironic) >> >> Wed, 29 >> >> - 13 (ironic) >> - 14 (keystone, ironic) >> - 15 (keystone, ironic) >> - 21 (ironic) >> >> Thanks, >> >> [1] https://ptg.opendev.org/ptg.html >> >> Hiromu Asahina >> >> >> On 2023/02/11 1:41, Jay Faulkner wrote: >>> I think it's safe to say the Ironic community would be very invested in >>> such an effort. Let's make sure the time chosen for vPTG with this is >> such >>> that Ironic contributors can attend as well. >>> >>> Thanks, >>> Jay Faulkner >>> Ironic PTL >>> >>> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < >>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>> >>>> Hello Everyone, >>>> >>>> Recently, Tacker and Keystone have been working together on a new >> Keystone >>>> Middleware that can work with external authentication >>>> services, such as Keycloak. The code has already been submitted [1], but >>>> we want to make this middleware a generic plugin that works >>>> with as many OpenStack services as possible. To that end, we would like >> to >>>> hear from other projects with similar use cases >>>> (especially Ironic and Barbican, which run as standalone services). 
We >>>> will make a time slot to discuss this topic at the next vPTG. >>>> Please contact me if you are interested and available to participate. >>>> >>>> [1] https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >>>> >>>> -- >>>> Hiromu Asahina >>>> >>>> >>>> >>>> >>> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> > -- ?-------------------------------------? NTT Network Innovation Center Hiromu Asahina ------------------------------------- 3-9-11, Midori-cho, Musashino-shi Tokyo 180-8585, Japan ? Phone: +81-422-59-7008 ? Email: hiromu.asahina.az at hco.ntt.co.jp ?-------------------------------------? From jay at gr-oss.io Tue Mar 21 16:03:51 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 21 Mar 2023 09:03:51 -0700 Subject: [ptls] PyPI maintainer cleanup - Action needed: Contact extra maintainers In-Reply-To: References: Message-ID: Thanks to those who have already taken action! Fifty extra maintainers have already been removed, with around three hundred to go. Please reach out to me if you're having trouble finding current email addresses for anyone, or having trouble with the process at all. Thanks, Jay Faulkner TC Vice-Chair On Thu, Mar 16, 2023 at 3:22?PM Jay Faulkner wrote: > Hi PTLs, > > The TC recently voted[1] to require humans be removed from PyPI access for > OpenStack-managed projects. This helps ensure all releases are created via > releases team tooling and makes it less likely for a user account > compromise to impact OpenStack packages. > > Many projects have already updated > https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L33 > with a list of packages that contain extra maintainers. We'd like to > request that PTLs, or their designate, reach out to any extra maintainers > listed for projects you are responsible for and request they remove their > access in accordance with policy. An example email, and detailed steps to > follow have been provided at > https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup-email-template > . > > Thank you for your cooperation as we work to improve our security posture > and harden against supply chain attacks. > > Thank you, > Jay Faulkner > TC Vice-Chair > > 1: > https://opendev.org/openstack/governance/commit/979e339f899ef62d2a6871a99c99537744c5808d > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Tue Mar 21 16:05:58 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Tue, 21 Mar 2023 09:05:58 -0700 Subject: [ironic] No meeting Monday 3/27/2023 Message-ID: Hello, Monday March 27, 2023 is when the vPTG is set to begin. I'm cancelling the Ironic weekly meeting to ensure any Ironic contributors can participate in sessions occurring Monday. Thanks! Jay Faulkner Ironic PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Tue Mar 21 16:06:46 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Tue, 21 Mar 2023 17:06:46 +0100 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: <3840757.STTH5IQzZg@p1> Message-ID: Hello: I agree with having a single API meaning for all backends. 
We currently support stateless SGs in iptables and ML2/OVN and both backends provide the same behaviour: a rule won't create an opposite direction counterpart by default, the user needs to define it explicitly. The discussion here could be the default behaviour for standard services: * DHCP service is currently supported in iptables, native OVS and OVN. This should be supported even without any rule allowed (as is now). Of course, we need to explicitly document that. * DHCPv6 [1]: unlike Slawek, I'm in favor of allowing this traffic by default, as part of the DHCP protocol traffic allowance. * Metadata service: this is not a network protocol and we should not consider it. Actually this service is working now (with stateful SGs) because of the default SG egress rules we add. So I'm not in favor of [2] Regards. [1]https://review.opendev.org/c/openstack/neutron/+/877049 [2]https://review.opendev.org/c/openstack/neutron/+/876659 On Mon, Mar 20, 2023 at 10:19?PM Ihar Hrachyshka wrote: > On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski > wrote: > > > > Hi, > > > > > > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > > > > > Hi all, > > > > > > > > > > (I've tagged the thread with [ovn] because this question was raised in > > > > > the context of OVN, but it really is about the intent of neutron > > > > > stateless SG API.) > > > > > > > > > > Neutron API supports 'stateless' field for security groups: > > > > > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > > > > > > > > > > The API reference doesn't explain the intent of the API, merely > > > > > walking through the field mechanics, as in > > > > > > > > > > "The stateful security group extension (stateful-security-group) adds > > > > > the stateful field to security groups, allowing users to configure > > > > > stateful or stateless security groups for ports. The existing security > > > > > groups will all be considered as stateful. Update of the stateful > > > > > attribute is allowed when there is no port associated with the > > > > > security group." > > > > > > > > > > The meaning of the API is left for users to deduce. It's customary > > > > > understood as something like > > > > > > > > > > "allowing to bypass connection tracking in the firewall, potentially > > > > > providing performance and simplicity benefits" (while imposing > > > > > additional complexity onto rule definitions - the user now has to > > > > > explicitly define rules for both directions of a duplex connection.) > > > > > [This is not an official definition, nor it's quoted from a respected > > > > > source, please don't criticize it. I don't think this is an important > > > > > point here.] > > > > > > > > > > Either way, the definition doesn't explain what should happen with > > > > > basic network services that a user of Neutron SG API is used to rely > > > > > on. Specifically, what happens for a port related to a stateless SG > > > > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > > > > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > > > > > procedure to configure its IPv6 stack. > > > > > > > > > > As part of our testing of stateless SG implementation for OVN backend, > > > > > we've noticed that VMs fail to configure via metadata, or use SLAAC to > > > > > configure IPv6. 
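As a concrete sketch of the explicit-rule model described above, the
"returning" traffic for metadata replies and for IPv6 RA/NA has to be opened
by hand. This assumes a python-openstackclient recent enough to expose the
--stateless flag; the group name is a placeholder, and the rules are
deliberately broad because the SG API cannot match on source port.

  $ openstack security group create --stateless stateless-demo
  # replies from the metadata service (cannot be narrowed to source port 80)
  $ openstack security group rule create --ingress --protocol tcp \
        --remote-ip 169.254.169.254/32 stateless-demo
  # ICMPv6, so that the RA/NA packets used by SLAAC reach the port
  $ openstack security group rule create --ingress --ethertype IPv6 \
        --protocol ipv6-icmp stateless-demo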
> > > > > > > > > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 > > > > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 > > > > > > > > > > We've noticed that adding explicit SG rules to allow 'returning' > > > > > communication for 169.254.169.254:80 and RA / NA fixes the problem. > > > > > > > > > > I figured that these services are "base" / "basic" and should be > > > > > provided to ports regardless of the stateful-ness of SG. I proposed > > > > > patches for this here: > > > > > > > > > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 > > > > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 > > > > > > > > > > Discussion in the patch that adjusts the existing stateless SG test > > > > > scenarios to not create explicit SG rules for metadata and ICMP > > > > > replies suggests that it's not a given / common understanding that > > > > > these "base" services should work by default for stateless SGs. > > > > > > > > > > See discussion in comments here: > > > > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 > > > > > > > > > > While this discussion is happening in the context of OVN, I think it > > > > > should be resolved in a broader context. Specifically, a decision > > > > > should be made about what Neutron API "means" by stateless SGs, and > > > > > how "base" services are supposed to behave. Then backends can act > > > > > accordingly. > > > > > > > > > > There's also an open question of how this should be implemented. > > > > > Whether Neutron would like to create explicit SG rules visible in API > > > > > that would allow for the returning traffic and that could be deleted > > > > > as needed, or whether backends should do it implicitly. We already > > > > > have "default" egress rules, so there's a precedent here. On the other > > > > > hand, the egress rules are broad (allowing everything) and there's > > > > > more rationale to delete them and replace them with tighter filters. > > > > > In my OVN series, I implement ACLs directly in OVN database, without > > > > > creating SG rules in Neutron API. > > > > > > > > > > So, questions for the community to clarify: > > > > > - whether Neutron API should define behavior of stateless SGs in > general, > > > > > - if so, whether Neutron API should also define behavior of stateless > > > > > SGs in terms of "base" services like metadata and DHCP, > > > > > - if so, whether backends should implement the necessary filters > > > > > themselves, or Neutron will create default SG rules itself. > > > > > > I think that we should be transparent and if we need any SG rules like > that to allow some traffic, those rules should be be added in visible way > for user. > > > > We also have in progress RFE > https://bugs.launchpad.net/neutron/+bug/1983053 which may help > administrators to define set of default SG rules which will be in each new > SG. So if we will now make those additional ACLs to be visible as SG rules > in SG it may be later easier to customize it. > > > > If we will hard code ACLs to allow ingress traffic from metadata server > or RA/NA packets there will be IMO inconsistency in behaviour between > stateful and stateless SGs as for stateful user will be able to disallow > traffic between vm and metadata service (probably there's no real use case > for that but it's possible) and for stateless it will not be possible as > ingress rules will be always there. 
Also use who knows how stateless SG > works may even treat it as bug as from Neutron API PoV this traffic to/from > metadata server would work as stateful - there would be rule to allow > egress traffic but what actually allows ingress response there? > > > > Thanks for clarifying the rationale on picking SG rules and not > per-backend implementation. > > What would be your answer to the two other questions in the list > above, specifically, "whether Neutron API should define behavior of > stateless SGs in general" and "whether Neutron API should define > behavior of stateless SGs in relation to metadata / RA / NA". Once we > have agreement on these points, we can discuss the exact mechanism - > whether to implement in backend or in API. But these two questions are > first order in my view. > > (To give an idea of my thinking, I believe API definition should not > only define fields and their mechanics but also semantics, so > > - yes, api-ref should define the meaning ("behavior") of stateless SG > in general, and > - yes, api-ref should also define the meaning ("behavior") of > stateless SG in relation to "standard" services like ipv6 addressing > or metadata. > > As to the last question - whether it's up to ml2 backend to implement > the behavior, or up to the core SG database plugin - I don't have a > strong opinion. I lean to "backend" solution just because it allows > for more granular definition because SG rules may not express some > filter rules, e.g. source port for metadata replies (an unfortunate > limitation of SG API that we inherited from AWS?). But perhaps others > prefer paying the price for having neutron ml2 plugin enforcing the > behavior consistently across all backends. > > > > > > > > > > > I hope I laid the problem out clearly, let me know if anything needs > > > > > clarification or explanation. > > > > > > Yes :) At least for me. > > > > > > > > > > > > Yours, > > > > > Ihar > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Slawek Kaplonski > > > > Principal Software Engineer > > > > Red Hat > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Tue Mar 21 16:29:47 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Tue, 21 Mar 2023 09:29:47 -0700 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Message-ID: No worries! I think that time works for me. I'm not sure it will work for everyone, but I can proxy information back to the whole of the ironic project as we also have the question of this functionality listed for our Operator Hour in order to help ironic gauge interest. -Julia On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < hiromu.asahina.az at hco.ntt.co.jp> wrote: > I apologize that I couldn't reply before the Ironic meeting on Monday. > > I need one slot to discuss this topic. > > I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, > 27)[1,2] works for them. Does this work for Ironic? I understand not all > Ironic members will join this discussion, so I hope we can arrange a > convenient date for you two at least and, hopefully, for those > interested in this topic. 
> > [1] > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z > [2] https://ptg.opendev.org/ptg.html > > Thanks, > Hiromu Asahina > > On 2023/03/17 23:29, Julia Kreger wrote: > > I'm not sure how many Ironic contributors would be the ones to attend a > > discussion, in part because this is disjointed from the items they need > to > > focus on. It is much more of a "big picture" item for those of us who are > > leaders in the project. > > > > I think it would help to understand how much time you expect the > discussion > > to take to determine a path forward and how we can collaborate. Ironic > has > > a huge number of topics we want to discuss during the PTG, and I suspect > > our team meeting on Monday next week should yield more interest/awareness > > as well as an amount of time for each topic which will aid us in > scheduling. > > > > If you can let us know how long, then I think we can figure out when the > > best day/time will be. > > > > Thanks! > > > > -Julia > > > > > > > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > >> Thank you for your reply. > >> > >> I'd like to decide the time slot for this topic. > >> I just checked PTG schedule [1]. > >> > >> We have the following time slots. Which one is convenient to gether? > >> (I didn't get reply but I listed Barbican, as its cores are almost the > >> same as Keystone) > >> > >> Mon, 27: > >> > >> - 14 (keystone) > >> - 15 (keystone) > >> > >> Tue, 28 > >> > >> - 13 (barbican) > >> - 14 (keystone, ironic) > >> - 15 (keysonte, ironic) > >> - 16 (ironic) > >> > >> Wed, 29 > >> > >> - 13 (ironic) > >> - 14 (keystone, ironic) > >> - 15 (keystone, ironic) > >> - 21 (ironic) > >> > >> Thanks, > >> > >> [1] https://ptg.opendev.org/ptg.html > >> > >> Hiromu Asahina > >> > >> > >> On 2023/02/11 1:41, Jay Faulkner wrote: > >>> I think it's safe to say the Ironic community would be very invested in > >>> such an effort. Let's make sure the time chosen for vPTG with this is > >> such > >>> that Ironic contributors can attend as well. > >>> > >>> Thanks, > >>> Jay Faulkner > >>> Ironic PTL > >>> > >>> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > >>> hiromu.asahina.az at hco.ntt.co.jp> wrote: > >>> > >>>> Hello Everyone, > >>>> > >>>> Recently, Tacker and Keystone have been working together on a new > >> Keystone > >>>> Middleware that can work with external authentication > >>>> services, such as Keycloak. The code has already been submitted [1], > but > >>>> we want to make this middleware a generic plugin that works > >>>> with as many OpenStack services as possible. To that end, we would > like > >> to > >>>> hear from other projects with similar use cases > >>>> (especially Ironic and Barbican, which run as standalone services). We > >>>> will make a time slot to discuss this topic at the next vPTG. > >>>> Please contact me if you are interested and available to participate. > >>>> > >>>> [1] > https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > >>>> > >>>> -- > >>>> Hiromu Asahina > >>>> > >>>> > >>>> > >>>> > >>> > >> > >> -- > >> ?-------------------------------------? > >> NTT Network Innovation Center > >> Hiromu Asahina > >> ------------------------------------- > >> 3-9-11, Midori-cho, Musashino-shi > >> Tokyo 180-8585, Japan > >> Phone: +81-422-59-7008 > >> Email: hiromu.asahina.az at hco.ntt.co.jp > >> ?-------------------------------------? > >> > >> > > > > -- > ?-------------------------------------? 
> NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Tue Mar 21 17:10:33 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 21 Mar 2023 18:10:33 +0100 Subject: [vptg][ptg][openstack-ansible] Bobcat virtual Project Team Gathering and Operator Hours Message-ID: Hi everyone! I'm happy to inform you that the OpenStack-Ansible team is going to have a virtual PTG next Tuesday, on March 28 from 15:00 till 18:00 UTC in Kilo room [1]. Everyone who is interested in participating in further development or regarding project plans for the next releases are warmly welcome to join us. We're also continuing the tradition to have a project Operator Hours. So all operators or folks who are wondering about OpenStack-Ansible concept, designs or just want to share their experience with the project are warmly welcome to join us on Wednesday, March 29 from 17:00 till 18:00 UTC in Havana room [2] So add dates to your calendar and hope seeing/hearing everyone next week! [1] PTG room: https://www.openstack.org/ptg/rooms/kilo [2] Operator hours room: https://www.openstack.org/ptg/rooms/havana From gmann at ghanshyammann.com Tue Mar 21 17:36:44 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Tue, 21 Mar 2023 10:36:44 -0700 Subject: [ptl][tc][ops][ptg] Operator + Developers interaction (operator-hours) slots in 2023.2 Bobcat PTG In-Reply-To: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> References: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> Message-ID: <187053ea474.d7018e84897536.6821147710012863509@ghanshyammann.com> Hello Everyone, This is a gentle reminder to book your project's operator-hours asap. Till now we have only 5 projects booked it, -gmann ---- On Fri, 17 Mar 2023 14:19:22 -0700 Ghanshyam Mann wrote --- > Hello Everyone/PTL, > > To improve the interaction/feedback between operators and developers, one of the efforts is to schedule > the 'operator-hour' in developers' events. We scheduled the 'operator-hour' in the last vPTG, which had mixed > productivity feedback[1]. The TC discussed it and thinks we should continue the 'operator-hour' in March > vPTG also. > > TC will not book the placeholder this time so that slots can be booked in the project room itself, and operators > can join developers to have a joint discussion. But at the same time, we need to avoid slot conflict for operators. > Every project needs to make sure its 'operator-hour' does not overlap with the related projects (integrated projects > which might have common operators, for example. nova, cinder, neutron etc needs to avoid conflict) 'operator-hour'. > > Guidelines for the project team to book 'operator-hour' > --------------------------------------------------------------------------------------- > * Request in #openinfra-events IRC channel to register the new track 'operator-hour-'. > For example, 'operator-hour-nova' > > * Once the track is registered, find a spot in your project slots where no other project (which you think is related/integrated > project and might have common operators) has already booked their operator-hour. Accordingly, book with the newly > registered track 'operator-hour-'. 
For example, #operator-hour-nova book essex-WedB1 . > > * Do not book more than one slot (1 hour) so that other projects will have enough slots open to book. If more discussion is > needed on anything, it can be continued in project-specific slots. > > We request that every project book an 'operator hour' slot for operators to join your PTG session. > For any query/conflict, ping TC in #openstack-tc or #openinfra-events IRC channel. > > [1] https://etherpad.opendev.org/p/Oct2022_PTGFeedback#L32 > > -gmann > > From manchandavishal143 at gmail.com Tue Mar 21 18:00:59 2023 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Tue, 21 Mar 2023 23:30:59 +0530 Subject: [horizon] Cancelling next two weekly meetings Message-ID: Hello Team, As agreed, during the last weekly meeting, we are canceling our weekly meeting on 22nd March and 29th March. The next weekly meeting will be on 5th April. See you at the PTG! Also, Please add topics for PTG discussion [1]. Thanks & regards, Vishal Manchanda [1] https://etherpad.opendev.org/p/horizon-bobcat-ptg -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Tue Mar 21 18:30:54 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 21 Mar 2023 13:30:54 -0500 Subject: PTG March 2023 Registration & Schedule Message-ID: Hello Everyone! The March 2023 Project Teams Gathering is right around the corner (March 27-31) and the schedule is being setup by your team leads! Slots are going fast, so make sure to get your time booked ASAP if you haven't already! You can find the schedule and available slots on the PTGbot website [1]. The PTGbot site is the during-event website to keep track of what's being discussed and any last-minute schedule changes. It is driven via commands in the #openinfra-events IRC channel (on the OFTC network) where the PTGbot listens. If you have questions about the commands that you can give the bot, check out the documentation here[2]. Also, if you haven?t connected to IRC before, here are some docs on how to get setup![3] Lastly, please don't forget to register[4] (it is free after all!). Please let us know if you have any questions via email to ptg at openinfra.dev. Thanks! -Kendall (diablo_rojo) [1] PTGbot Site: https://ptg.opendev.org/ptg.html [2] PTGbot Documentation: https://github.com/openstack/ptgbot#open-infrastructure-ptg-bot [3] Setup IRC: https://docs.openstack.org/contributors/common/irc.html [4] PTG Registration: https://openinfra-ptg.eventbrite.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From felipe.reyes at canonical.com Tue Mar 21 21:40:42 2023 From: felipe.reyes at canonical.com (Felipe Reyes) Date: Tue, 21 Mar 2023 18:40:42 -0300 Subject: [ptl][tc][ops][ptg] Operator + Developers interaction (operator-hours) slots in 2023.2 Bobcat PTG In-Reply-To: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> References: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> Message-ID: <8e7b678c3d4e0aad8ab74436ed8ca6065cc1735f.camel@canonical.com> Hi Ghanshyam, On Fri, 2023-03-17 at 14:19 -0700, Ghanshyam Mann wrote: > Hello Everyone/PTL, > > To improve the interaction/feedback between operators and developers, one of the efforts is to > schedule > the 'operator-hour' in developers' events. We scheduled the 'operator-hour' in the last vPTG, > which had mixed > productivity feedback[1]. The TC discussed it and thinks we should continue the 'operator-hour' in > March > vPTG also. 
At the OpenStack-charms project we thought it was a good idea; can we get the
track 'operator-hour-openstackcharms' registered?

Thanks,
-- 
Felipe Reyes
Software Engineer @ Canonical
felipe.reyes at canonical.com (GPG:0x9B1FFF39)
Launchpad: ~freyes | IRC: freyes

From nguyenhuukhoinw at gmail.com  Wed Mar 22 00:07:49 2023
From: nguyenhuukhoinw at gmail.com (Nguyễn Hữu Khôi)
Date: Wed, 22 Mar 2023 07:07:49 +0700
Subject: [openstack][kolla-ansible]
Message-ID: 

Hello guys,
I am using Xena (Ubuntu 20.04) and want to upgrade to Zed.
From my reading of
https://docs.openstack.org/kolla-ansible/latest/user/operating-kolla.html
I plan to do it like this:
Upgrade from Xena (20.04 container base) to Yoga (22.04 container base), then
upgrade the host to 22.04 >> upgrade from Yoga to Zed.
*** Can you confirm that we need to run the kolla-ansible upgrade first and
then upgrade the host? Is that right?
I am also wondering about MariaDB: how can we verify the schema upgrade, and
will RabbitMQ crash with the new version?
What should we do if something goes wrong, for example if the database upgrade
or RabbitMQ fails?
Thank you.
Regards
Nguyen Huu Khoi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gmann at ghanshyammann.com  Wed Mar 22 00:36:41 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Tue, 21 Mar 2023 17:36:41 -0700
Subject: [all][tc][policy] Canceling policy popup team next week meeting
Message-ID: <18706bf1fab.d4504cb1909651.6328907432889493469@ghanshyammann.com>

Hello Everyone,

Due to vPTG weeks, I am cancelling the policy pop-up next meeting scheduled
for 28th Mar.

https://wiki.openstack.org/wiki/Consistent_and_Secure_Default_Policies_Popup_Team#Meeting

-gmann

From ricolin at ricolky.com  Wed Mar 22 01:58:52 2023
From: ricolin at ricolky.com (Rico Lin)
Date: Wed, 22 Mar 2023 09:58:52 +0800
Subject: [magnum] Secure RBAC implementation
Message-ID: 
[1] https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html *Rico Lin* -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 22 09:42:24 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 22 Mar 2023 10:42:24 +0100 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) Message-ID: Hey folks, As a reminder, the Nova community will discuss at the vPTG. You can see the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg Our agenda will be from Tuesday to Friday, everyday between 1300UTC and 1700UTC. Connection details are in the etherpad above, but you can also use PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo room for all the discussions) You can't stick around for 4 hours x 4 days ? Heh, no worries ! If you (as an operator or a developer) want to engage with us (and we'd love this honestly), you have two possibilities : - either you prefer to listen (and talk) to some topics you've seen in the agenda, and then add your IRC nick (details how to use IRC are explained by [1]) on the topics you want. Once we start to discuss about those topics, I'll ping the courtesy ping list of each topic on #openstack-nova. Just make sure you're around in the IRC channel. - or you prefer to engage with us about some pain points or some feature requests, and then the right time is the Nova Operator Hour that will be on *Tuesday 1500UTC*. We have a specific etherpad for this session : https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where you can preemptively add your thoughts or concerns. Anyway, we are eager to meet you all ! Oh, last point, given we will be at the vPTG, next week's weekly meeting on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk the #openstack-nova channel ;-) See you next week ! -Sylvain [1] https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Mar 22 10:54:58 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 22 Mar 2023 11:54:58 +0100 Subject: [neutron] Neutron PTG sessions schedule Message-ID: Hello all: Please check the Neutron sessions schedule for the PTG week [1]. All sessions will take place in Juno channel [2]. This is a summary of the sessions and days: * Tuesday (13UTC - 17UTC): retrospective, releases, migrations and project deprecations. * Wednesday (13UTC - 17UTC): new RFEs and operator hour. * Thursday (13UTC - 17UTC): Nova-Neutron sessions, ovn-bgp-agent roadmap and neutron-dynamic-routing RFE. * Friday (13UTC - 17UTC): open hour for core candidates!! If you have any questions, please reply to this email or ping me in IRC (#openstack-neutron, ). Regards. [1]https://etherpad.opendev.org/p/neutron-bobcat-ptg [2]https://ptg.opendev.org/ptg.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Mar 22 10:57:41 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 22 Mar 2023 11:57:41 +0100 Subject: [nova][neutron] Nova-Neutron PTG sessions Message-ID: Hello all: The Nova-Neutron PTG sessions will take place on Thursday, from 13UTC to 15UTC, in the Juno channel. Please check the agenda [1]. 
We have 3 topics: * (artom) delete_on_termination for Neutron ports * (dvo-plv) Blueprint: "Add support for Napatech LinkVirt SmartNICs" review * (ralonsoh, artom): https://bugs.launchpad.net/neutron/+bug/1986003 (How to handle the duplicated port binding activate request from Nova, both in Nova and Neutron) Regards. [1]https://etherpad.opendev.org/p/neutron-bobcat-ptg -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Mar 22 11:00:28 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 22 Mar 2023 12:00:28 +0100 Subject: [neutron] Neutron meetings cancelled next week Message-ID: Hello Neutrinos: The regular Neutron meetings (team, CI and drivers) will be cancelled next week because of the PTG. Join us during the next one! Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hiromu.asahina.az at hco.ntt.co.jp Wed Mar 22 11:01:05 2023 From: hiromu.asahina.az at hco.ntt.co.jp (Hiromu Asahina) Date: Wed, 22 Mar 2023 20:01:05 +0900 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> Message-ID: <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Thanks! I look forward to your reply. On 2023/03/22 1:29, Julia Kreger wrote: > No worries! > > I think that time works for me. I'm not sure it will work for everyone, but > I can proxy information back to the whole of the ironic project as we also > have the question of this functionality listed for our Operator Hour in > order to help ironic gauge interest. > > -Julia > > On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > >> I apologize that I couldn't reply before the Ironic meeting on Monday. >> >> I need one slot to discuss this topic. >> >> I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, >> 27)[1,2] works for them. Does this work for Ironic? I understand not all >> Ironic members will join this discussion, so I hope we can arrange a >> convenient date for you two at least and, hopefully, for those >> interested in this topic. >> >> [1] >> >> https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z >> [2] https://ptg.opendev.org/ptg.html >> >> Thanks, >> Hiromu Asahina >> >> On 2023/03/17 23:29, Julia Kreger wrote: >>> I'm not sure how many Ironic contributors would be the ones to attend a >>> discussion, in part because this is disjointed from the items they need >> to >>> focus on. It is much more of a "big picture" item for those of us who are >>> leaders in the project. >>> >>> I think it would help to understand how much time you expect the >> discussion >>> to take to determine a path forward and how we can collaborate. Ironic >> has >>> a huge number of topics we want to discuss during the PTG, and I suspect >>> our team meeting on Monday next week should yield more interest/awareness >>> as well as an amount of time for each topic which will aid us in >> scheduling. >>> >>> If you can let us know how long, then I think we can figure out when the >>> best day/time will be. >>> >>> Thanks! >>> >>> -Julia >>> >>> >>> >>> >>> >>> On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < >>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>> >>>> Thank you for your reply. >>>> >>>> I'd like to decide the time slot for this topic. 
>>>> I just checked PTG schedule [1]. >>>> >>>> We have the following time slots. Which one is convenient to gether? >>>> (I didn't get reply but I listed Barbican, as its cores are almost the >>>> same as Keystone) >>>> >>>> Mon, 27: >>>> >>>> - 14 (keystone) >>>> - 15 (keystone) >>>> >>>> Tue, 28 >>>> >>>> - 13 (barbican) >>>> - 14 (keystone, ironic) >>>> - 15 (keysonte, ironic) >>>> - 16 (ironic) >>>> >>>> Wed, 29 >>>> >>>> - 13 (ironic) >>>> - 14 (keystone, ironic) >>>> - 15 (keystone, ironic) >>>> - 21 (ironic) >>>> >>>> Thanks, >>>> >>>> [1] https://ptg.opendev.org/ptg.html >>>> >>>> Hiromu Asahina >>>> >>>> >>>> On 2023/02/11 1:41, Jay Faulkner wrote: >>>>> I think it's safe to say the Ironic community would be very invested in >>>>> such an effort. Let's make sure the time chosen for vPTG with this is >>>> such >>>>> that Ironic contributors can attend as well. >>>>> >>>>> Thanks, >>>>> Jay Faulkner >>>>> Ironic PTL >>>>> >>>>> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < >>>>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>>>> >>>>>> Hello Everyone, >>>>>> >>>>>> Recently, Tacker and Keystone have been working together on a new >>>> Keystone >>>>>> Middleware that can work with external authentication >>>>>> services, such as Keycloak. The code has already been submitted [1], >> but >>>>>> we want to make this middleware a generic plugin that works >>>>>> with as many OpenStack services as possible. To that end, we would >> like >>>> to >>>>>> hear from other projects with similar use cases >>>>>> (especially Ironic and Barbican, which run as standalone services). We >>>>>> will make a time slot to discuss this topic at the next vPTG. >>>>>> Please contact me if you are interested and available to participate. >>>>>> >>>>>> [1] >> https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >>>>>> >>>>>> -- >>>>>> Hiromu Asahina >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> ?-------------------------------------? >>>> NTT Network Innovation Center >>>> Hiromu Asahina >>>> ------------------------------------- >>>> 3-9-11, Midori-cho, Musashino-shi >>>> Tokyo 180-8585, Japan >>>> Phone: +81-422-59-7008 >>>> Email: hiromu.asahina.az at hco.ntt.co.jp >>>> ?-------------------------------------? >>>> >>>> >>> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> > -- ?-------------------------------------? NTT Network Innovation Center Hiromu Asahina ------------------------------------- 3-9-11, Midori-cho, Musashino-shi Tokyo 180-8585, Japan ? Phone: +81-422-59-7008 ? Email: hiromu.asahina.az at hco.ntt.co.jp ?-------------------------------------? From mnasiadka at gmail.com Wed Mar 22 11:09:33 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Wed, 22 Mar 2023 12:09:33 +0100 Subject: [kolla] next weekly meeting cancelled Message-ID: Hello Koalas, Next weekly meeting (29th March) is cancelled because of PTG on Mon/Tue/Thu - let?s meet there! 
Michal From swogatpradhan22 at gmail.com Wed Mar 22 11:25:08 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 22 Mar 2023 16:55:08 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Update: Here is the log when creating a volume using cirros image: 2023-03-22 11:04:38.449 109 INFO cinder.volume.flows.manager.create_volume [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s 2023-03-22 11:07:54.023 109 WARNING py.warnings [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: FutureWarning: The human format is deprecated and the format parameter will be removed. 
Use explicitly json instead in version 'xena' category=FutureWarning) 2023-03-22 11:11:12.161 109 WARNING py.warnings [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: FutureWarning: The human format is deprecated and the format parameter will be removed. Use explicitly json instead in version 'xena' category=FutureWarning) 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 MB/s 2023-03-22 11:11:14.998 109 INFO cinder.volume.flows.manager.create_volume [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. The image is present in dcn02 store but still it downloaded the image in 0.16 MB/s and then created the volume. With regards, Swogat Pradhan On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan wrote: > Hi Jhon, > This seems to be an issue. > When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster > parameter was specified to the respective cluster names but the config > files were created in the name of ceph.conf and keyring was > ceph.client.openstack.keyring. > > Which created issues in glance as well as the naming convention of the > files didn't match the cluster names, so i had to manually rename the > central ceph conf file as such: > > [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ > [root at dcn02-compute-0 ceph]# ll > total 16 > -rw-------. 1 root root 257 Mar 13 13:56 > ceph_central.client.openstack.keyring > -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf > -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring > -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf > [root at dcn02-compute-0 ceph]# > > ceph.conf and ceph.client.openstack.keyring contain the fsid of the > respective clusters in both dcn01 and dcn02. > In the above cli output, the ceph.conf and ceph.client... are the files > used to access dcn02 ceph cluster and ceph_central* files are used in for > accessing central ceph cluster. > > glance multistore config: > [dcn02] > rbd_store_ceph_conf=/etc/ceph/ceph.conf > rbd_store_user=openstack > rbd_store_pool=images > rbd_thin_provisioning=False > store_description=dcn02 rbd glance store > > [ceph_central] > rbd_store_ceph_conf=/etc/ceph/ceph_central.conf > rbd_store_user=openstack > rbd_store_pool=images > rbd_thin_provisioning=False > store_description=Default glance store backend. > > > With regards, > Swogat Pradhan > > On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: > >> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >> wrote: >> > >> > Hi, >> > Seems like cinder is not using the local ceph. >> >> That explains the issue. It's a misconfiguration. >> >> I hope this is not a production system since the mailing list now has >> the cinder.conf which contains passwords. 
>> >> The section that looks like this: >> >> [tripleo_ceph] >> volume_backend_name=tripleo_ceph >> volume_driver=cinder.volume.drivers.rbd.RBDDriver >> rbd_ceph_conf=/etc/ceph/ceph.conf >> rbd_user=openstack >> rbd_pool=volumes >> rbd_flatten_volume_from_snapshot=False >> rbd_secret_uuid= >> report_discard_supported=True >> >> Should be updated to refer to the local DCN ceph cluster and not the >> central one. Use the ceph conf file for that cluster and ensure the >> rbd_secret_uuid corresponds to that one. >> >> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >> Ceph cluster. The FSID should be in the ceph.conf file. The >> tripleo_nova_libvirt role will use virsh secret-* commands so that >> libvirt can retrieve the cephx secret using the FSID as a key. This >> can be confirmed with `podman exec nova_virtsecretd virsh >> secret-get-value $FSID`. >> >> The documentation describes how to configure the central and DCN sites >> correctly but an error seems to have occurred while you were following >> it. >> >> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >> >> John >> >> > >> > Ceph Output: >> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >> > NAME SIZE PARENT FMT PROT >> LOCK >> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >> excl >> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >> > >> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >> > NAME SIZE PARENT FMT >> PROT LOCK >> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >> > [ceph: root at dcn02-ceph-all-0 /]# >> > >> > Attached the cinder config. >> > Please let me know how I can solve this issue. >> > >> > With regards, >> > Swogat Pradhan >> > >> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >> wrote: >> >> >> >> in my last message under the line "On a DCN site if you run a command >> like this:" I suggested some steps you could try to confirm the image is a >> COW from the local glance as well as how to look at your cinder config. >> >> >> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>> >> >>> Update: >> >>> I uploaded an image directly to the dcn02 store, and it takes around >> 10,15 minutes to create a volume with image in dcn02. >> >>> The image size is 389 MB. >> >>> >> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>>> >> >>>> Hi Jhon, >> >>>> I checked in the ceph od dcn02, I can see the images created after >> importing from the central site. >> >>>> But launching an instance normally fails as it takes a long time for >> the volume to get created. >> >>>> >> >>>> When launching an instance from volume the instance is getting >> created properly without any errors. 
>> >>>> >> >>>> I tried to cache images in nova using >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> but getting checksum failed error. >> >>>> >> >>>> With regards, >> >>>> Swogat Pradhan >> >>>> >> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >> wrote: >> >>>>> >> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >> >>>>> wrote: >> >>>>> > >> >>>>> > Update: After restarting the nova services on the controller and >> running the deploy script on the edge site, I was able to launch the VM >> from volume. >> >>>>> > >> >>>>> > Right now the instance creation is failing as the block device >> creation is stuck in creating state, it is taking more than 10 mins for the >> volume to be created, whereas the image has already been imported to the >> edge glance. >> >>>>> >> >>>>> Try following this document and making the same observations in your >> >>>>> environment for AZs and their local ceph cluster. >> >>>>> >> >>>>> >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >> >>>>> >> >>>>> On a DCN site if you run a command like this: >> >>>>> >> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >> >>>>> /etc/ceph/dcn0.client.admin.keyring >> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >> >>>>> NAME SIZE PARENT >> >>>>> FMT PROT LOCK >> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >> >>>>> $ >> >>>>> >> >>>>> Then, you should see the parent of the volume is the image which is >> on >> >>>>> the same local ceph cluster. >> >>>>> >> >>>>> I wonder if something is misconfigured and thus you're encountering >> >>>>> the streaming behavior described here: >> >>>>> >> >>>>> Ideally all images should reside in the central Glance and be copied >> >>>>> to DCN sites before instances of those images are booted on DCN >> sites. >> >>>>> If an image is not copied to a DCN site before it is booted, then >> the >> >>>>> image will be streamed to the DCN site and then the image will boot >> as >> >>>>> an instance. This happens because Glance at the DCN site has access >> to >> >>>>> the images store at the Central ceph cluster. Though the booting of >> >>>>> the image will take time because it has not been copied in advance, >> >>>>> this is still preferable to failing to boot the image. >> >>>>> >> >>>>> You can also exec into the cinder container at the DCN site and >> >>>>> confirm it's using it's local ceph cluster. >> >>>>> >> >>>>> John >> >>>>> >> >>>>> > >> >>>>> > I will try and create a new fresh image and test again then >> update. >> >>>>> > >> >>>>> > With regards, >> >>>>> > Swogat Pradhan >> >>>>> > >> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>>>> >> >> >>>>> >> Update: >> >>>>> >> In the hypervisor list the compute node state is showing down. >> >>>>> >> >> >>>>> >> >> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> wrote: >> >>>>> >>> >> >>>>> >>> Hi Brendan, >> >>>>> >>> Now i have deployed another site where i have used 2 linux >> bonds network template for both 3 compute nodes and 3 ceph nodes. >> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >> >>>>> >>> I used a cirros image to launch instance but the instance timed >> out so i waited for the volume to be created. 
>> >>>>> >>> Once the volume was created i tried launching the instance from >> the volume and still the instance is stuck in spawning state. >> >>>>> >>> >> >>>>> >>> Here is the nova-compute log: >> >>>>> >>> >> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >> privsep daemon starting >> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >> privsep process running with uid/gid: 0/0 >> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >> privsep process running with capabilities (eff/prm/inh): >> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >> privsep daemon running as pid 185437 >> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >> os_brick.initiator.connectors.nvmeof >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >> in _get_host_uuid: Unexpected error while running command. >> >>>>> >>> Command: blkid overlay -s UUID -o value >> >>>>> >>> Exit code: 2 >> >>>>> >>> Stdout: '' >> >>>>> >>> Stderr: '': >> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >> running command. >> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >> >>>>> >>> >> >>>>> >>> It is stuck in creating image, do i need to run the template >> mentioned here ?: >> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >> >>>>> >>> >> >>>>> >>> The volume is already created and i do not understand why the >> instance is stuck in spawning state. >> >>>>> >>> >> >>>>> >>> With regards, >> >>>>> >>> Swogat Pradhan >> >>>>> >>> >> >>>>> >>> >> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >> bshephar at redhat.com> wrote: >> >>>>> >>>> >> >>>>> >>>> Does your environment use different network interfaces for >> each of the networks? Or does it have a bond with everything on it? >> >>>>> >>>> >> >>>>> >>>> One issue I have seen before is that when launching instances, >> there is a lot of network traffic between nodes as the hypervisor needs to >> download the image from Glance. Along with various other services sending >> normal network traffic, it can be enough to cause issues if everything is >> running over a single 1Gbe interface. >> >>>>> >>>> >> >>>>> >>>> I have seen the same situation in fact when using a single >> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >> while you try to spawn the instance to see if you?re dropping packets. In >> the situation I described, there were dropped packets which resulted in a >> loss of communication between nova_compute and RMQ, so the node appeared >> offline. You should also confirm that nova_compute is being disconnected in >> the nova_compute logs if you tail them on the Hypervisor while spawning the >> instance. >> >>>>> >>>> >> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >> based on that experience, from my perspective, is certainly sounds like >> some kind of network issue. 
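>> >>>>> >>>> As a rough sketch (adjust the bond and interface names to your
>> >>>>> >>>> environment), the bonding mode and the drop counters can be checked on
>> >>>>> >>>> the hypervisor while the instance is spawning:
>> >>>>> >>>>
>> >>>>> >>>> # current bonding mode and slave status
>> >>>>> >>>> cat /proc/net/bonding/bond0
>> >>>>> >>>> # watch RX/TX errors and drops while the image is being streamed
>> >>>>> >>>> watch -n 1 'ip -s link show bond0'
>> >>>>> >>>>
>> >>>>> >>>> If the drop counters only climb while Glance is transferring the image,
>> >>>>> >>>> that points to the same saturation problem described above.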
>> >>>>> >>>> >> >>>>> >>>> Regards, >> >>>>> >>>> >> >>>>> >>>> Brendan Shephard >> >>>>> >>>> Senior Software Engineer >> >>>>> >>>> Red Hat Australia >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >> >>>>> >>>> >> >>>>> >>>> Hi, >> >>>>> >>>> >> >>>>> >>>> I tried to help someone with a similar issue some time ago in >> this thread: >> >>>>> >>>> >> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >> >>>>> >>>> >> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >> user, not sure if that could apply here. But is it possible that your nova >> and neutron versions are different between central and edge site? Have you >> restarted nova and neutron services on the compute nodes after >> installation? Have you debug logs of nova-conductor and maybe nova-compute? >> Maybe they can help narrow down the issue. >> >>>>> >>>> If there isn't any additional information in the debug logs I >> probably would start "tearing down" rabbitmq. I didn't have to do that in a >> production system yet so be careful. I can think of two routes: >> >>>>> >>>> >> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >> running, this will most likely impact client IO depending on your load. >> Check out the rabbitmqctl commands. >> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from >> all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >> >>>>> >>>> >> >>>>> >>>> I can imagine that the failed reply "survives" while being >> replicated across the rabbit nodes. But I don't really know the rabbit >> internals too well, so maybe someone else can chime in here and give a >> better advice. >> >>>>> >>>> >> >>>>> >>>> Regards, >> >>>>> >>>> Eugen >> >>>>> >>>> >> >>>>> >>>> Zitat von Swogat Pradhan : >> >>>>> >>>> >> >>>>> >>>> Hi, >> >>>>> >>>> Can someone please help me out on this issue? >> >>>>> >>>> >> >>>>> >>>> With regards, >> >>>>> >>>> Swogat Pradhan >> >>>>> >>>> >> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>>>> >>>> wrote: >> >>>>> >>>> >> >>>>> >>>> Hi >> >>>>> >>>> I don't see any major packet loss. >> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >> due to packet >> >>>>> >>>> loss. >> >>>>> >>>> >> >>>>> >>>> with regards, >> >>>>> >>>> Swogat Pradhan >> >>>>> >>>> >> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >> swogatpradhan22 at gmail.com> >> >>>>> >>>> wrote: >> >>>>> >>>> >> >>>>> >>>> Hi, >> >>>>> >>>> Yes the MTU is the same as the default '1500'. >> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >> when >> >>>>> >>>> launching the instance. >> >>>>> >>>> I will check that and come back. >> >>>>> >>>> But everytime i launch an instance the instance gets stuck at >> spawning >> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >> packet loss >> >>>>> >>>> causes this. >> >>>>> >>>> >> >>>>> >>>> With regards, >> >>>>> >>>> Swogat pradhan >> >>>>> >>>> >> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >> wrote: >> >>>>> >>>> >> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical >> between >> >>>>> >>>> central and edge site? Do you see packet loss through the >> tunnel? 
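>> >>>>> >>>> A quick way to check this (a rough sketch, assuming ICMP is allowed
>> >>>>> >>>> through the tunnel) is to ping from the edge site towards the central
>> >>>>> >>>> site with the don't-fragment bit set:
>> >>>>> >>>>
>> >>>>> >>>> # 1472 = 1500 - 20 (IP header) - 8 (ICMP header); reduce it further if
>> >>>>> >>>> # the tunnel adds its own encapsulation overhead
>> >>>>> >>>> ping -M do -s 1472 -c 20 $CENTRAL_SITE_IP   # placeholder address
>> >>>>> >>>>
>> >>>>> >>>> If that fails while a smaller payload succeeds, the effective path MTU
>> >>>>> >>>> through the tunnel is lower than the interface MTU.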
>> >>>>> >>>> >> >>>>> >>>> Zitat von Swogat Pradhan : >> >>>>> >>>> >> >>>>> >>>> > Hi Eugen, >> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' as >> i am not >> >>>>> >>>> > getting email's from you. >> >>>>> >>>> > Coming to the issue: >> >>>>> >>>> > >> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >> list_policies -p >> >>>>> >>>> / >> >>>>> >>>> > Listing policies for vhost "/" ... >> >>>>> >>>> > vhost name pattern apply-to definition >> priority >> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >> >>>>> >>>> > >> >>>>> >>>> >> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >> >>>>> >>>> > >> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >> when i am >> >>>>> >>>> trying >> >>>>> >>>> > to launch an instance and the instance comes to a spawning >> state and >> >>>>> >>>> then >> >>>>> >>>> > gets stuck. >> >>>>> >>>> > >> >>>>> >>>> > I have a tunnel setup between the central and the edge sites. >> >>>>> >>>> > >> >>>>> >>>> > With regards, >> >>>>> >>>> > Swogat Pradhan >> >>>>> >>>> > >> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >> >>>>> >>>> swogatpradhan22 at gmail.com> >> >>>>> >>>> > wrote: >> >>>>> >>>> > >> >>>>> >>>> >> Hi Eugen, >> >>>>> >>>> >> For some reason i am not getting your email to me directly, >> i am >> >>>>> >>>> checking >> >>>>> >>>> >> the email digest and there i am able to find your reply. >> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >> >>>>> >>>> >> >> >>>>> >>>> >> *Note: i am able to create vm's and perform other >> activities in the >> >>>>> >>>> >> central site, only facing this issue in the edge site.* >> >>>>> >>>> >> >> >>>>> >>>> >> With regards, >> >>>>> >>>> >> Swogat Pradhan >> >>>>> >>>> >> >> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >> >>>>> >>>> swogatpradhan22 at gmail.com> >> >>>>> >>>> >> wrote: >> >>>>> >>>> >> >> >>>>> >>>> >>> Hi Eugen, >> >>>>> >>>> >>> Thanks for your response. >> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >> details: >> >>>>> >>>> >>> >> >>>>> >>>> >>> *PCS Status:* >> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >> >>>>> >>>> >>> >> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >> >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >> >>>>> >>>> Started >> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >> >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >> >>>>> >>>> Started >> >>>>> >>>> >>> overcloud-controller-2 >> >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >> >>>>> >>>> Started >> >>>>> >>>> >>> overcloud-controller-1 >> >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >> >>>>> >>>> Started >> >>>>> >>>> >>> overcloud-controller-0 >> >>>>> >>>> >>> >> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the >> issue is >> >>>>> >>>> still >> >>>>> >>>> >>> present. >> >>>>> >>>> >>> >> >>>>> >>>> >>> *Cluster status:* >> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >> cluster_status >> >>>>> >>>> >>> Cluster status of node >> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>> >>>>> >>>> >>> Basics >> >>>>> >>>> >>> >> >>>>> >>>> >>> Cluster name: >> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >> >>>>> >>>> >>> >> >>>>> >>>> >>> Disk Nodes >> >>>>> >>>> >>> >> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>>>> >>>> >>> >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> >>> >> >>>>> >>>> >>> Running Nodes >> >>>>> >>>> >>> >> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >> >>>>> >>>> >>> >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> >>> >> >>>>> >>>> >>> Versions >> >>>>> >>>> >>> >> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >> RabbitMQ >> >>>>> >>>> 3.8.3 >> >>>>> >>>> >>> on Erlang 22.3.4.1 >> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >> RabbitMQ >> >>>>> >>>> 3.8.3 >> >>>>> >>>> >>> on Erlang 22.3.4.1 >> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >> RabbitMQ >> >>>>> >>>> 3.8.3 >> >>>>> >>>> >>> on Erlang 22.3.4.1 >> >>>>> >>>> >>> >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >> >>>>> >>>> RabbitMQ >> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >> >>>>> >>>> >>> >> >>>>> >>>> >>> Alarms >> >>>>> >>>> >>> >> >>>>> >>>> >>> (none) >> >>>>> >>>> >>> >> >>>>> >>>> >>> Network Partitions >> >>>>> >>>> >>> >> >>>>> >>>> >>> (none) >> >>>>> >>>> >>> >> >>>>> >>>> >>> Listeners >> >>>>> >>>> >>> >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >> inter-node and CLI >> >>>>> >>>> tool >> >>>>> >>>> >>> communication >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP >> 0-9-1 >> >>>>> >>>> >>> and AMQP 1.0 >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >> inter-node and CLI >> >>>>> >>>> tool >> >>>>> >>>> >>> communication >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP >> 0-9-1 >> >>>>> >>>> >>> and AMQP 1.0 >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >> inter-node and CLI >> >>>>> >>>> tool >> >>>>> >>>> >>> communication >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: 
AMQP >> 0-9-1 >> >>>>> >>>> >>> and AMQP 1.0 >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >> >>>>> >>>> interface: >> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> , >> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >> purpose: >> >>>>> >>>> inter-node and >> >>>>> >>>> >>> CLI tool communication >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> , >> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >> purpose: AMQP >> >>>>> >>>> 0-9-1 >> >>>>> >>>> >>> and AMQP 1.0 >> >>>>> >>>> >>> Node: >> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >> >>>>> >>>> , >> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >> HTTP API >> >>>>> >>>> >>> >> >>>>> >>>> >>> Feature flags >> >>>>> >>>> >>> >> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >> >>>>> >>>> >>> >> >>>>> >>>> >>> *Logs:* >> >>>>> >>>> >>> *(Attached)* >> >>>>> >>>> >>> >> >>>>> >>>> >>> With regards, >> >>>>> >>>> >>> Swogat Pradhan >> >>>>> >>>> >>> >> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >> >>>>> >>>> swogatpradhan22 at gmail.com> >> >>>>> >>>> >>> wrote: >> >>>>> >>>> >>> >> >>>>> >>>> >>>> Hi, >> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >> >>>>> >>>> >>>> >> >>>>> >>>> >>>> nova-conuctor: >> >>>>> >>>> >>>> >> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The >> reply >> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >> seconds >> >>>>> >>>> due to a >> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>> >>>>> >>>> Abandoning...: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >> reply >> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >> seconds >> >>>>> >>>> due to a >> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>>> >>>> Abandoning...: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >> reply >> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >> seconds >> >>>>> >>>> due to a >> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>>> >>>> Abandoning...: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache >> enabled >> >>>>> >>>> with >> >>>>> >>>> >>>> backend dogpile.cache.null. >> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >> >>>>> >>>> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >> drop reply to >> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >> oslo_messaging._drivers.amqpdriver >> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >> reply >> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >> seconds >> >>>>> >>>> due to a >> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >> >>>>> >>>> Abandoning...: >> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >> >>>>> >>>> >>>> >> >>>>> >>>> >>>> With regards, >> >>>>> >>>> >>>> Swogat Pradhan >> >>>>> >>>> >>>> >> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >> >>>>> >>>> >>>> >> >>>>> >>>> >>>>> Hi, >> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i >> am trying to >> >>>>> >>>> >>>>> launch vm's. 
>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >> (openstack >> >>>>> >>>> compute >> >>>>> >>>> >>>>> service list), the node comes backup when i restart the >> nova >> >>>>> >>>> compute >> >>>>> >>>> >>>>> service but then the launch of the vm fails. >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> nova-compute.log >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >> Running >> >>>>> >>>> >>>>> instance usage >> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 >> 07:00:00 >> >>>>> >>>> to >> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> [instance: >> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful >> on node >> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> [instance: >> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied >> device >> >>>>> >>>> name: >> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> [instance: >> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> Cache enabled >> >>>>> >>>> with >> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> Running >> >>>>> >>>> >>>>> privsep helper: >> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >> >>>>> >>>> 'privsep-helper', >> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> Spawned new >> >>>>> >>>> privsep >> >>>>> >>>> >>>>> daemon via rootwrap >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >> [-] privsep >> >>>>> >>>> >>>>> daemon starting >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >> [-] privsep >> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >> [-] privsep >> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >> [-] privsep >> >>>>> >>>> >>>>> daemon running as pid 2647 >> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >> >>>>> >>>> os_brick.initiator.connectors.nvmeof >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> Process >> >>>>> >>>> >>>>> execution error >> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >> command. >> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >> >>>>> >>>> >>>>> Exit code: 2 >> >>>>> >>>> >>>>> Stdout: '' >> >>>>> >>>> >>>>> Stderr: '': >> oslo_concurrency.processutils.ProcessExecutionError: >> >>>>> >>>> >>>>> Unexpected error while running command. >> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >> [instance: >> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> Is there a way to solve this issue? >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> With regards, >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>>> Swogat Pradhan >> >>>>> >>>> >>>>> >> >>>>> >>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From asimkon at otenet.gr Wed Mar 22 12:40:05 2023 From: asimkon at otenet.gr (Konstantinos Asimakopoulos) Date: Wed, 22 Mar 2023 14:40:05 +0200 Subject: App Deployment to OpenStack (Alfresco Community) Message-ID: <7013fdcfdeb17cbf174585edb769f5de@otenet.gr> Hello! I am new to OpenStack technology (cloud) and of course willing to dive into this interesting infrastructure. 
I would like to get information (a step-by-step wiki) on how to deploy open
source applications, especially Alfresco Community Edition

https://www.alfresco.com/ecm-software/alfresco-community-editions

to an OpenStack cloud and operate it globally. Is that possible, and how?

That would help me a lot!

Regards
Kostas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nguyenhuukhoinw at gmail.com  Wed Mar 22 13:45:00 2023
From: nguyenhuukhoinw at gmail.com (Nguyễn Hữu Khôi)
Date: Wed, 22 Mar 2023 20:45:00 +0700
Subject: App Deployment to OpenStack (Alfresco Community)
In-Reply-To: <7013fdcfdeb17cbf174585edb769f5de@otenet.gr>
References: <7013fdcfdeb17cbf174585edb769f5de@otenet.gr>
Message-ID: 

I think it is just like installing and configuring the application on a VM.

On Wed, Mar 22, 2023, 8:41 PM Konstantinos Asimakopoulos wrote:

> Hello!
>
> I am new to OpenStack technology (cloud) and of course willing to dive
> into this interesting infrastructure. I would like to get information (step
> by step wiki) on how to deploy open source applications especially
> Alfresco Community Edition
>
> https://www.alfresco.com/ecm-software/alfresco-community-editions
>
> to openstack cloud and operate it globally. Is that possible and how?
>
> That would help me a lot!
>
> Regards
> Kostas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From swogatpradhan22 at gmail.com  Wed Mar 22 13:41:50 2023
From: swogatpradhan22 at gmail.com (Swogat Pradhan)
Date: Wed, 22 Mar 2023 19:11:50 +0530
Subject: DCN compute service goes down when a instance is scheduled to
 launch | wallaby | tripleo
In-Reply-To: 
References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag>
 <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag>
 <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com>
Message-ID: 

Hi John,
After some changes I think cinder is now trying to pull the image from the
local glance, as I am getting the following error in the cinder-volume log:

2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server
cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error
finding address for
http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538:
Unable to establish connection to
http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538:
HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded
with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by
NewConnectionError(': Failed to establish a new connection: [Errno 111]
ECONNREFUSED',))

The endpoint it is trying to reach is the dcn02 IP address.
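As a quick sanity check (a rough sketch, assuming TripleO's usual container
names), it is worth confirming which Glance endpoint cinder-volume is
configured to use and whether anything answers on it:

# which glance endpoint cinder is pointed at
sudo podman exec cinder_volume grep -i glance /etc/cinder/cinder.conf
# does anything answer on the local glance API port?
curl -s http://172.25.228.253:9292/ | head

If the curl gets connection refused, nothing is listening on 9292 on the
dcn02 side, which matches the ECONNREFUSED above.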
But when i check the ports i don't find the port 9292 running: [root at dcn02-compute-2 ceph]# netstat -nultp Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 0.0.0.0:2022 0.0.0.0:* LISTEN 656800/sshd tcp 0 0 127.0.0.1:199 0.0.0.0:* LISTEN 4878/snmpd tcp 0 0 172.25.228.253:2379 0.0.0.0:* LISTEN 6232/etcd tcp 0 0 172.25.228.253:2380 0.0.0.0:* LISTEN 6232/etcd tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1/systemd tcp 0 0 127.0.0.1:6640 0.0.0.0:* LISTEN 2779/ovsdb-server tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 4918/sshd tcp6 0 0 :::2022 :::* LISTEN 656800/sshd tcp6 0 0 :::111 :::* LISTEN 1/systemd tcp6 0 0 :::22 :::* LISTEN 4918/sshd udp 0 0 0.0.0.0:111 0.0.0.0:* 1/systemd udp 0 0 0.0.0.0:161 0.0.0.0:* 4878/snmpd udp 0 0 127.0.0.1:323 0.0.0.0:* 2609/chronyd udp 0 0 0.0.0.0:6081 0.0.0.0:* - udp6 0 0 :::111 :::* 1/systemd udp6 0 0 ::1:161 :::* 4878/snmpd udp6 0 0 ::1:323 :::* 2609/chronyd udp6 0 0 :::6081 :::* - I see in the glance-api.conf that bind port parameter is set to 9292 but the port is not listed in netstat command. Can you please guide me in getting this port up and running as i feel like this would solve the issue i am facing right now. With regards, Swogat Pradhan On Wed, Mar 22, 2023 at 4:55?PM Swogat Pradhan wrote: > Update: > Here is the log when creating a volume using cirros image: > > 2023-03-22 11:04:38.449 109 INFO cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, > 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': > datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 
'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s > 2023-03-22 11:07:54.023 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > category=FutureWarning) > > 2023-03-22 11:11:12.161 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > category=FutureWarning) > > 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 > MB/s > 2023-03-22 11:11:14.998 109 INFO cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f > (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully > 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. > > The image is present in dcn02 store but still it downloaded the image in > 0.16 MB/s and then created the volume. > > With regards, > Swogat Pradhan > > On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan > wrote: > >> Hi Jhon, >> This seems to be an issue. >> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >> parameter was specified to the respective cluster names but the config >> files were created in the name of ceph.conf and keyring was >> ceph.client.openstack.keyring. >> >> Which created issues in glance as well as the naming convention of the >> files didn't match the cluster names, so i had to manually rename the >> central ceph conf file as such: >> >> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >> [root at dcn02-compute-0 ceph]# ll >> total 16 >> -rw-------. 1 root root 257 Mar 13 13:56 >> ceph_central.client.openstack.keyring >> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >> [root at dcn02-compute-0 ceph]# >> >> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >> respective clusters in both dcn01 and dcn02. >> In the above cli output, the ceph.conf and ceph.client... are the files >> used to access dcn02 ceph cluster and ceph_central* files are used in for >> accessing central ceph cluster. 
>> >> glance multistore config: >> [dcn02] >> rbd_store_ceph_conf=/etc/ceph/ceph.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=dcn02 rbd glance store >> >> [ceph_central] >> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=Default glance store backend. >> >> >> With regards, >> Swogat Pradhan >> >> On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: >> >>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>> wrote: >>> > >>> > Hi, >>> > Seems like cinder is not using the local ceph. >>> >>> That explains the issue. It's a misconfiguration. >>> >>> I hope this is not a production system since the mailing list now has >>> the cinder.conf which contains passwords. >>> >>> The section that looks like this: >>> >>> [tripleo_ceph] >>> volume_backend_name=tripleo_ceph >>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>> rbd_ceph_conf=/etc/ceph/ceph.conf >>> rbd_user=openstack >>> rbd_pool=volumes >>> rbd_flatten_volume_from_snapshot=False >>> rbd_secret_uuid= >>> report_discard_supported=True >>> >>> Should be updated to refer to the local DCN ceph cluster and not the >>> central one. Use the ceph conf file for that cluster and ensure the >>> rbd_secret_uuid corresponds to that one. >>> >>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>> Ceph cluster. The FSID should be in the ceph.conf file. The >>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>> libvirt can retrieve the cephx secret using the FSID as a key. This >>> can be confirmed with `podman exec nova_virtsecretd virsh >>> secret-get-value $FSID`. >>> >>> The documentation describes how to configure the central and DCN sites >>> correctly but an error seems to have occurred while you were following >>> it. >>> >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>> >>> John >>> >>> > >>> > Ceph Output: >>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>> > NAME SIZE PARENT FMT PROT >>> LOCK >>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>> excl >>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>> > >>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>> > NAME SIZE PARENT FMT >>> PROT LOCK >>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>> > [ceph: root at dcn02-ceph-all-0 /]# >>> > >>> > Attached the cinder config. >>> > Please let me know how I can solve this issue. 
>>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>> wrote: >>> >> >>> >> in my last message under the line "On a DCN site if you run a command >>> like this:" I suggested some steps you could try to confirm the image is a >>> COW from the local glance as well as how to look at your cinder config. >>> >> >>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>> >>> >>> Update: >>> >>> I uploaded an image directly to the dcn02 store, and it takes around >>> 10,15 minutes to create a volume with image in dcn02. >>> >>> The image size is 389 MB. >>> >>> >>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>> >>>> Hi Jhon, >>> >>>> I checked in the ceph od dcn02, I can see the images created after >>> importing from the central site. >>> >>>> But launching an instance normally fails as it takes a long time >>> for the volume to get created. >>> >>>> >>> >>>> When launching an instance from volume the instance is getting >>> created properly without any errors. >>> >>>> >>> >>>> I tried to cache images in nova using >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> but getting checksum failed error. >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>> wrote: >>> >>>>> >>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>> >>>>> wrote: >>> >>>>> > >>> >>>>> > Update: After restarting the nova services on the controller and >>> running the deploy script on the edge site, I was able to launch the VM >>> from volume. >>> >>>>> > >>> >>>>> > Right now the instance creation is failing as the block device >>> creation is stuck in creating state, it is taking more than 10 mins for the >>> volume to be created, whereas the image has already been imported to the >>> edge glance. >>> >>>>> >>> >>>>> Try following this document and making the same observations in >>> your >>> >>>>> environment for AZs and their local ceph cluster. >>> >>>>> >>> >>>>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>> >>>>> >>> >>>>> On a DCN site if you run a command like this: >>> >>>>> >>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>> >>>>> NAME SIZE PARENT >>> >>>>> FMT PROT LOCK >>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>> >>>>> $ >>> >>>>> >>> >>>>> Then, you should see the parent of the volume is the image which >>> is on >>> >>>>> the same local ceph cluster. >>> >>>>> >>> >>>>> I wonder if something is misconfigured and thus you're encountering >>> >>>>> the streaming behavior described here: >>> >>>>> >>> >>>>> Ideally all images should reside in the central Glance and be >>> copied >>> >>>>> to DCN sites before instances of those images are booted on DCN >>> sites. >>> >>>>> If an image is not copied to a DCN site before it is booted, then >>> the >>> >>>>> image will be streamed to the DCN site and then the image will >>> boot as >>> >>>>> an instance. This happens because Glance at the DCN site has >>> access to >>> >>>>> the images store at the Central ceph cluster. 
Though the booting of >>> >>>>> the image will take time because it has not been copied in advance, >>> >>>>> this is still preferable to failing to boot the image. >>> >>>>> >>> >>>>> You can also exec into the cinder container at the DCN site and >>> >>>>> confirm it's using it's local ceph cluster. >>> >>>>> >>> >>>>> John >>> >>>>> >>> >>>>> > >>> >>>>> > I will try and create a new fresh image and test again then >>> update. >>> >>>>> > >>> >>>>> > With regards, >>> >>>>> > Swogat Pradhan >>> >>>>> > >>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >> >>> >>>>> >> Update: >>> >>>>> >> In the hypervisor list the compute node state is showing down. >>> >>>>> >> >>> >>>>> >> >>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >>> >>> >>>>> >>> Hi Brendan, >>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>> >>>>> >>> I used a cirros image to launch instance but the instance >>> timed out so i waited for the volume to be created. >>> >>>>> >>> Once the volume was created i tried launching the instance >>> from the volume and still the instance is stuck in spawning state. >>> >>>>> >>> >>> >>>>> >>> Here is the nova-compute log: >>> >>>>> >>> >>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>> privsep daemon starting >>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>> privsep process running with uid/gid: 0/0 >>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>> privsep process running with capabilities (eff/prm/inh): >>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>> privsep daemon running as pid 185437 >>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>> os_brick.initiator.connectors.nvmeof >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>> in _get_host_uuid: Unexpected error while running command. >>> >>>>> >>> Command: blkid overlay -s UUID -o value >>> >>>>> >>> Exit code: 2 >>> >>>>> >>> Stdout: '' >>> >>>>> >>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>> running command. >>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>> >>>>> >>> >>> >>>>> >>> It is stuck in creating image, do i need to run the template >>> mentioned here ?: >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> >>>>> >>> >>> >>>>> >>> The volume is already created and i do not understand why the >>> instance is stuck in spawning state. >>> >>>>> >>> >>> >>>>> >>> With regards, >>> >>>>> >>> Swogat Pradhan >>> >>>>> >>> >>> >>>>> >>> >>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>> bshephar at redhat.com> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Does your environment use different network interfaces for >>> each of the networks? Or does it have a bond with everything on it? 
>>> >>>>> >>>> >>> >>>>> >>>> One issue I have seen before is that when launching >>> instances, there is a lot of network traffic between nodes as the >>> hypervisor needs to download the image from Glance. Along with various >>> other services sending normal network traffic, it can be enough to cause >>> issues if everything is running over a single 1Gbe interface. >>> >>>>> >>>> >>> >>>>> >>>> I have seen the same situation in fact when using a single >>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>> while you try to spawn the instance to see if you?re dropping packets. In >>> the situation I described, there were dropped packets which resulted in a >>> loss of communication between nova_compute and RMQ, so the node appeared >>> offline. You should also confirm that nova_compute is being disconnected in >>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>> instance. >>> >>>>> >>>> >>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >>> based on that experience, from my perspective, is certainly sounds like >>> some kind of network issue. >>> >>>>> >>>> >>> >>>>> >>>> Regards, >>> >>>>> >>>> >>> >>>>> >>>> Brendan Shephard >>> >>>>> >>>> Senior Software Engineer >>> >>>>> >>>> Red Hat Australia >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> >>> >>>>> >>>> I tried to help someone with a similar issue some time ago in >>> this thread: >>> >>>>> >>>> >>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>> >>>>> >>>> >>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>> user, not sure if that could apply here. But is it possible that your nova >>> and neutron versions are different between central and edge site? Have you >>> restarted nova and neutron services on the compute nodes after >>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>> Maybe they can help narrow down the issue. >>> >>>>> >>>> If there isn't any additional information in the debug logs I >>> probably would start "tearing down" rabbitmq. I didn't have to do that in a >>> production system yet so be careful. I can think of two routes: >>> >>>>> >>>> >>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>> running, this will most likely impact client IO depending on your load. >>> Check out the rabbitmqctl commands. >>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from >>> all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>> >>>>> >>>> >>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>> replicated across the rabbit nodes. But I don't really know the rabbit >>> internals too well, so maybe someone else can chime in here and give a >>> better advice. >>> >>>>> >>>> >>> >>>>> >>>> Regards, >>> >>>>> >>>> Eugen >>> >>>>> >>>> >>> >>>>> >>>> Zitat von Swogat Pradhan : >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> Can someone please help me out on this issue? >>> >>>>> >>>> >>> >>>>> >>>> With regards, >>> >>>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi >>> >>>>> >>>> I don't see any major packet loss. >>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >>> due to packet >>> >>>>> >>>> loss. 
>>> >>>>> >>>> >>> >>>>> >>>> with regards, >>> >>>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >>> when >>> >>>>> >>>> launching the instance. >>> >>>>> >>>> I will check that and come back. >>> >>>>> >>>> But everytime i launch an instance the instance gets stuck at >>> spawning >>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >>> packet loss >>> >>>>> >>>> causes this. >>> >>>>> >>>> >>> >>>>> >>>> With regards, >>> >>>>> >>>> Swogat pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical >>> between >>> >>>>> >>>> central and edge site? Do you see packet loss through the >>> tunnel? >>> >>>>> >>>> >>> >>>>> >>>> Zitat von Swogat Pradhan : >>> >>>>> >>>> >>> >>>>> >>>> > Hi Eugen, >>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' >>> as i am not >>> >>>>> >>>> > getting email's from you. >>> >>>>> >>>> > Coming to the issue: >>> >>>>> >>>> > >>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>> list_policies -p >>> >>>>> >>>> / >>> >>>>> >>>> > Listing policies for vhost "/" ... >>> >>>>> >>>> > vhost name pattern apply-to definition >>> priority >>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>> >>>>> >>>> > >>> >>>>> >>>> >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> >>>>> >>>> > >>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>> when i am >>> >>>>> >>>> trying >>> >>>>> >>>> > to launch an instance and the instance comes to a spawning >>> state and >>> >>>>> >>>> then >>> >>>>> >>>> > gets stuck. >>> >>>>> >>>> > >>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>> sites. >>> >>>>> >>>> > >>> >>>>> >>>> > With regards, >>> >>>>> >>>> > Swogat Pradhan >>> >>>>> >>>> > >>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> > wrote: >>> >>>>> >>>> > >>> >>>>> >>>> >> Hi Eugen, >>> >>>>> >>>> >> For some reason i am not getting your email to me >>> directly, i am >>> >>>>> >>>> checking >>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >>> >>>>> >>>> >> >>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>> activities in the >>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>> >>>>> >>>> >> >>> >>>>> >>>> >> With regards, >>> >>>>> >>>> >> Swogat Pradhan >>> >>>>> >>>> >> >>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> >> wrote: >>> >>>>> >>>> >> >>> >>>>> >>>> >>> Hi Eugen, >>> >>>>> >>>> >>> Thanks for your response. 
>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>> details: >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *PCS Status:* >>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>>>> >>>> >>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>> >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-2 >>> >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-1 >>> >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-0 >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the >>> issue is >>> >>>>> >>>> still >>> >>>>> >>>> >>> present. >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *Cluster status:* >>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>> cluster_status >>> >>>>> >>>> >>> Cluster status of node >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> ... >>> >>>>> >>>> >>> Basics >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Cluster name: >>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Disk Nodes >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Running Nodes >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Versions >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> >>>>> >>>> RabbitMQ >>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Alarms >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> (none) >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Network Partitions >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> (none) >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Listeners >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: 
amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>> purpose: >>> >>>>> >>>> inter-node and >>> >>>>> >>>> >>> CLI tool communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>> purpose: AMQP >>> >>>>> >>>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>> HTTP API >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Feature flags >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *Logs:* >>> >>>>> >>>> >>> *(Attached)* >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> With regards, >>> >>>>> >>>> >>> Swogat Pradhan >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> >>> wrote: >>> >>>>> >>>> >>> >>> >>>>> >>>> >>>> Hi, >>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. 
>>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> nova-conuctor: >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The >>> reply >>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Cache enabled >>> >>>>> >>>> with >>> >>>>> >>>> >>>> backend dogpile.cache.null. >>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> With regards, >>> >>>>> >>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>>> Hi, >>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i >>> am trying to >>> >>>>> >>>> >>>>> launch vm's. >>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>> (openstack >>> >>>>> >>>> compute >>> >>>>> >>>> >>>>> service list), the node comes backup when i restart the >>> nova >>> >>>>> >>>> compute >>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> nova-compute.log >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>> Running >>> >>>>> >>>> >>>>> instance usage >>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>> 2023-02-26 07:00:00 >>> >>>>> >>>> to >>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful >>> on node >>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied >>> device >>> >>>>> >>>> name: >>> >>>>> >>>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names >>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>> volume >>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Cache enabled >>> >>>>> >>>> with >>> >>>>> >>>> >>>>> backend dogpile.cache.null. >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Running >>> >>>>> >>>> >>>>> privsep helper: >>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> >>>>> >>>> 'privsep-helper', >>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Spawned new >>> >>>>> >>>> privsep >>> >>>>> >>>> >>>>> daemon via rootwrap >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> daemon starting >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> daemon running as pid 2647 >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Process >>> >>>>> >>>> >>>>> execution error >>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>> command. >>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>>> >>>> >>>>> Exit code: 2 >>> >>>>> >>>> >>>>> Stdout: '' >>> >>>>> >>>> >>>>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: >>> >>>>> >>>> >>>>> Unexpected error while running command. 
>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> With regards, >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> Swogat Pradhan >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Wed Mar 22 13:46:05 2023 From: johfulto at redhat.com (John Fulton) Date: Wed, 22 Mar 2023 09:46:05 -0400 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Wed, Mar 22, 2023 at 9:42?AM Swogat Pradhan wrote: > > Hi Jhon, > After some changes i feel like the cinder is now trying to pull the image from local glance as i am getting the following error in cinder-colume log: > > 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error finding address for http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: Unable to establish connection to http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] ECONNREFUSED',)) > > As the endpoint it is trying to reach is the dcn02 IP address. > > But when i check the ports i don't find the port 9292 running: > [root at dcn02-compute-2 ceph]# netstat -nultp > Active Internet connections (only servers) > Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name > tcp 0 0 0.0.0.0:2022 0.0.0.0:* LISTEN 656800/sshd > tcp 0 0 127.0.0.1:199 0.0.0.0:* LISTEN 4878/snmpd > tcp 0 0 172.25.228.253:2379 0.0.0.0:* LISTEN 6232/etcd > tcp 0 0 172.25.228.253:2380 0.0.0.0:* LISTEN 6232/etcd > tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1/systemd > tcp 0 0 127.0.0.1:6640 0.0.0.0:* LISTEN 2779/ovsdb-server > tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 4918/sshd > tcp6 0 0 :::2022 :::* LISTEN 656800/sshd > tcp6 0 0 :::111 :::* LISTEN 1/systemd > tcp6 0 0 :::22 :::* LISTEN 4918/sshd > udp 0 0 0.0.0.0:111 0.0.0.0:* 1/systemd > udp 0 0 0.0.0.0:161 0.0.0.0:* 4878/snmpd > udp 0 0 127.0.0.1:323 0.0.0.0:* 2609/chronyd > udp 0 0 0.0.0.0:6081 0.0.0.0:* - > udp6 0 0 :::111 :::* 1/systemd > udp6 0 0 ::1:161 :::* 4878/snmpd > udp6 0 0 ::1:323 :::* 2609/chronyd > udp6 0 0 :::6081 :::* - > > I see in the glance-api.conf that bind port parameter is set to 9292 but the port is not listed in netstat command. > Can you please guide me in getting this port up and running as i feel like this would solve the issue i am facing right now. Looks like your glance container stopped running. 
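On a Wallaby TripleO node something like the following usually shows what happened to it (glance_api is the container name TripleO uses; the log path and systemd unit name are the TripleO defaults, so adjust if your deployment differs):

$ sudo podman ps -a --filter name=glance
$ sudo podman inspect --format '{{.State.Status}} exit={{.State.ExitCode}}' glance_api
$ sudo podman logs --tail 200 glance_api
$ sudo systemctl status tripleo_glance_api    # the unit TripleO normally wraps the container in
$ sudo tail -n 200 /var/log/containers/glance/api.log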
Ask podman to show you all containers (including stopped ones) and investigate why the glance container stopped. > > With regards, > Swogat Pradhan > > On Wed, Mar 22, 2023 at 4:55?PM Swogat Pradhan wrote: >> >> Update: >> Here is the log when creating a volume using cirros image: >> >> 2023-03-22 11:04:38.449 109 INFO cinder.volume.flows.manager.create_volume [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } >> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >> 2023-03-22 11:07:54.023 109 WARNING py.warnings [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: FutureWarning: The human format is deprecated and the format parameter will be removed. Use explicitly json instead in version 'xena' >> category=FutureWarning) >> >> 2023-03-22 11:11:12.161 109 WARNING py.warnings [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: FutureWarning: The human format is deprecated and the format parameter will be removed. 
Use explicitly json instead in version 'xena' >> category=FutureWarning) >> >> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 MB/s >> 2023-03-22 11:11:14.998 109 INFO cinder.volume.flows.manager.create_volume [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >> >> The image is present in dcn02 store but still it downloaded the image in 0.16 MB/s and then created the volume. >> >> With regards, >> Swogat Pradhan >> >> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan wrote: >>> >>> Hi Jhon, >>> This seems to be an issue. >>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster parameter was specified to the respective cluster names but the config files were created in the name of ceph.conf and keyring was ceph.client.openstack.keyring. >>> >>> Which created issues in glance as well as the naming convention of the files didn't match the cluster names, so i had to manually rename the central ceph conf file as such: >>> >>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>> [root at dcn02-compute-0 ceph]# ll >>> total 16 >>> -rw-------. 1 root root 257 Mar 13 13:56 ceph_central.client.openstack.keyring >>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>> [root at dcn02-compute-0 ceph]# >>> >>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the respective clusters in both dcn01 and dcn02. >>> In the above cli output, the ceph.conf and ceph.client... are the files used to access dcn02 ceph cluster and ceph_central* files are used in for accessing central ceph cluster. >>> >>> glance multistore config: >>> [dcn02] >>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=dcn02 rbd glance store >>> >>> [ceph_central] >>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=Default glance store backend. >>> >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: >>>> >>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>> wrote: >>>> > >>>> > Hi, >>>> > Seems like cinder is not using the local ceph. >>>> >>>> That explains the issue. It's a misconfiguration. >>>> >>>> I hope this is not a production system since the mailing list now has >>>> the cinder.conf which contains passwords. >>>> >>>> The section that looks like this: >>>> >>>> [tripleo_ceph] >>>> volume_backend_name=tripleo_ceph >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_user=openstack >>>> rbd_pool=volumes >>>> rbd_flatten_volume_from_snapshot=False >>>> rbd_secret_uuid= >>>> report_discard_supported=True >>>> >>>> Should be updated to refer to the local DCN ceph cluster and not the >>>> central one. 
Use the ceph conf file for that cluster and ensure the >>>> rbd_secret_uuid corresponds to that one. >>>> >>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>> secret-get-value $FSID`. >>>> >>>> The documentation describes how to configure the central and DCN sites >>>> correctly but an error seems to have occurred while you were following >>>> it. >>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>> >>>> John >>>> >>>> > >>>> > Ceph Output: >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>> > NAME SIZE PARENT FMT PROT LOCK >>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 excl >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>>> > >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>> > NAME SIZE PARENT FMT PROT LOCK >>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>> > >>>> > Attached the cinder config. >>>> > Please let me know how I can solve this issue. >>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton wrote: >>>> >> >>>> >> in my last message under the line "On a DCN site if you run a command like this:" I suggested some steps you could try to confirm the image is a COW from the local glance as well as how to look at your cinder config. >>>> >> >>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan wrote: >>>> >>> >>>> >>> Update: >>>> >>> I uploaded an image directly to the dcn02 store, and it takes around 10,15 minutes to create a volume with image in dcn02. >>>> >>> The image size is 389 MB. >>>> >>> >>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan wrote: >>>> >>>> >>>> >>>> Hi Jhon, >>>> >>>> I checked in the ceph od dcn02, I can see the images created after importing from the central site. >>>> >>>> But launching an instance normally fails as it takes a long time for the volume to get created. >>>> >>>> >>>> >>>> When launching an instance from volume the instance is getting created properly without any errors. >>>> >>>> >>>> >>>> I tried to cache images in nova using https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html but getting checksum failed error. 
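Going back to the rbd_secret_uuid / FSID point above, a quick sanity check on the DCN node running cinder-volume could look like the following. The ceph.conf path is the one shown earlier in this thread, the cinder_volume and nova_virtsecretd container names assume TripleO defaults, and <local FSID> is a placeholder for the fsid printed by the first command:

$ sudo grep fsid /var/lib/tripleo-config/ceph/ceph.conf
$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_secret_uuid' /etc/cinder/cinder.conf
# both should point at the local DCN cluster, not the central one
$ sudo podman exec nova_virtsecretd virsh secret-list
$ sudo podman exec nova_virtsecretd virsh secret-get-value <local FSID>
# the secret should resolve for the local FSID, per the convention described above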
>>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton wrote: >>>> >>>>> >>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>> >>>>> wrote: >>>> >>>>> > >>>> >>>>> > Update: After restarting the nova services on the controller and running the deploy script on the edge site, I was able to launch the VM from volume. >>>> >>>>> > >>>> >>>>> > Right now the instance creation is failing as the block device creation is stuck in creating state, it is taking more than 10 mins for the volume to be created, whereas the image has already been imported to the edge glance. >>>> >>>>> >>>> >>>>> Try following this document and making the same observations in your >>>> >>>>> environment for AZs and their local ceph cluster. >>>> >>>>> >>>> >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>> >>>>> >>>> >>>>> On a DCN site if you run a command like this: >>>> >>>>> >>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>> >>>>> NAME SIZE PARENT >>>> >>>>> FMT PROT LOCK >>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>> >>>>> $ >>>> >>>>> >>>> >>>>> Then, you should see the parent of the volume is the image which is on >>>> >>>>> the same local ceph cluster. >>>> >>>>> >>>> >>>>> I wonder if something is misconfigured and thus you're encountering >>>> >>>>> the streaming behavior described here: >>>> >>>>> >>>> >>>>> Ideally all images should reside in the central Glance and be copied >>>> >>>>> to DCN sites before instances of those images are booted on DCN sites. >>>> >>>>> If an image is not copied to a DCN site before it is booted, then the >>>> >>>>> image will be streamed to the DCN site and then the image will boot as >>>> >>>>> an instance. This happens because Glance at the DCN site has access to >>>> >>>>> the images store at the Central ceph cluster. Though the booting of >>>> >>>>> the image will take time because it has not been copied in advance, >>>> >>>>> this is still preferable to failing to boot the image. >>>> >>>>> >>>> >>>>> You can also exec into the cinder container at the DCN site and >>>> >>>>> confirm it's using it's local ceph cluster. >>>> >>>>> >>>> >>>>> John >>>> >>>>> >>>> >>>>> > >>>> >>>>> > I will try and create a new fresh image and test again then update. >>>> >>>>> > >>>> >>>>> > With regards, >>>> >>>>> > Swogat Pradhan >>>> >>>>> > >>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan wrote: >>>> >>>>> >> >>>> >>>>> >> Update: >>>> >>>>> >> In the hypervisor list the compute node state is showing down. >>>> >>>>> >> >>>> >>>>> >> >>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan wrote: >>>> >>>>> >>> >>>> >>>>> >>> Hi Brendan, >>>> >>>>> >>> Now i have deployed another site where i have used 2 linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>> >>>>> >>> I used a cirros image to launch instance but the instance timed out so i waited for the volume to be created. >>>> >>>>> >>> Once the volume was created i tried launching the instance from the volume and still the instance is stuck in spawning state. 
>>>> >>>>> >>> >>>> >>>>> >>> Here is the nova-compute log: >>>> >>>>> >>> >>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] privsep daemon starting >>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING os_brick.initiator.connectors.nvmeof [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command. >>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>> >>>>> >>> Exit code: 2 >>>> >>>>> >>> Stdout: '' >>>> >>>>> >>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command. >>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>> >>>>> >>> >>>> >>>>> >>> It is stuck in creating image, do i need to run the template mentioned here ?: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>> >>>>> >>> >>>> >>>>> >>> The volume is already created and i do not understand why the instance is stuck in spawning state. >>>> >>>>> >>> >>>> >>>>> >>> With regards, >>>> >>>>> >>> Swogat Pradhan >>>> >>>>> >>> >>>> >>>>> >>> >>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Does your environment use different network interfaces for each of the networks? Or does it have a bond with everything on it? >>>> >>>>> >>>> >>>> >>>>> >>>> One issue I have seen before is that when launching instances, there is a lot of network traffic between nodes as the hypervisor needs to download the image from Glance. Along with various other services sending normal network traffic, it can be enough to cause issues if everything is running over a single 1Gbe interface. >>>> >>>>> >>>> >>>> >>>>> >>>> I have seen the same situation in fact when using a single active/backup bond on 1Gbe nics. It?s worth checking the network traffic while you try to spawn the instance to see if you?re dropping packets. In the situation I described, there were dropped packets which resulted in a loss of communication between nova_compute and RMQ, so the node appeared offline. You should also confirm that nova_compute is being disconnected in the nova_compute logs if you tail them on the Hypervisor while spawning the instance. >>>> >>>>> >>>> >>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, based on that experience, from my perspective, is certainly sounds like some kind of network issue. 
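A quick way to watch for exactly that while an instance is spawning would be something like the following on the hypervisor, where bond0 is a placeholder for whatever bond or NIC carries the internal API and storage traffic:

$ cat /proc/net/bonding/bond0    # bonding mode and, for LACP, partner/aggregator state
$ ip -s link show bond0          # RX/TX errors and dropped counters; re-run during the spawn and compare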
>>>> >>>>> >>>> >>>> >>>>> >>>> Regards, >>>> >>>>> >>>> >>>> >>>>> >>>> Brendan Shephard >>>> >>>>> >>>> Senior Software Engineer >>>> >>>>> >>>> Red Hat Australia >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> >>>> >>>>> >>>> I tried to help someone with a similar issue some time ago in this thread: >>>> >>>>> >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>> >>>>> >>>> >>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that user, not sure if that could apply here. But is it possible that your nova and neutron versions are different between central and edge site? Have you restarted nova and neutron services on the compute nodes after installation? Have you debug logs of nova-conductor and maybe nova-compute? Maybe they can help narrow down the issue. >>>> >>>>> >>>> If there isn't any additional information in the debug logs I probably would start "tearing down" rabbitmq. I didn't have to do that in a production system yet so be careful. I can think of two routes: >>>> >>>>> >>>> >>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is running, this will most likely impact client IO depending on your load. Check out the rabbitmqctl commands. >>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>> >>>>> >>>> >>>> >>>>> >>>> I can imagine that the failed reply "survives" while being replicated across the rabbit nodes. But I don't really know the rabbit internals too well, so maybe someone else can chime in here and give a better advice. >>>> >>>>> >>>> >>>> >>>>> >>>> Regards, >>>> >>>>> >>>> Eugen >>>> >>>>> >>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> Can someone please help me out on this issue? >>>> >>>>> >>>> >>>> >>>>> >>>> With regards, >>>> >>>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan >>>> >>>>> >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi >>>> >>>>> >>>> I don't see any major packet loss. >>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not due to packet >>>> >>>>> >>>> loss. >>>> >>>>> >>>> >>>> >>>>> >>>> with regards, >>>> >>>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan >>>> >>>>> >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked when >>>> >>>>> >>>> launching the instance. >>>> >>>>> >>>> I will check that and come back. >>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck at spawning >>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if packet loss >>>> >>>>> >>>> causes this. >>>> >>>>> >>>> >>>> >>>>> >>>> With regards, >>>> >>>>> >>>> Swogat pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical between >>>> >>>>> >>>> central and edge site? Do you see packet loss through the tunnel? 
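One way to answer both questions from an edge compute node is a do-not-fragment ping sized for a 1500-byte MTU. The interface name and address below are placeholders (a central controller's internal API IP); any loss or "message too long" errors point at the tunnel:

$ ip link show bond0 | grep -o 'mtu [0-9]*'    # repeat on a central node and compare
$ ping -M do -s 1472 -c 100 <central-internal-api-ip>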
>>>> >>>>> >>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>> >>>>> >>>> >>>> >>>>> >>>> > Hi Eugen, >>>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' as i am not >>>> >>>>> >>>> > getting email's from you. >>>> >>>>> >>>> > Coming to the issue: >>>> >>>>> >>>> > >>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p >>>> >>>>> >>>> / >>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>> >>>>> >>>> > vhost name pattern apply-to definition priority >>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>> >>>>> >>>> > >>>> >>>>> >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> >>>>> >>>> > >>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down when i am >>>> >>>>> >>>> trying >>>> >>>>> >>>> > to launch an instance and the instance comes to a spawning state and >>>> >>>>> >>>> then >>>> >>>>> >>>> > gets stuck. >>>> >>>>> >>>> > >>>> >>>>> >>>> > I have a tunnel setup between the central and the edge sites. >>>> >>>>> >>>> > >>>> >>>>> >>>> > With regards, >>>> >>>>> >>>> > Swogat Pradhan >>>> >>>>> >>>> > >>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> > wrote: >>>> >>>>> >>>> > >>>> >>>>> >>>> >> Hi Eugen, >>>> >>>>> >>>> >> For some reason i am not getting your email to me directly, i am >>>> >>>>> >>>> checking >>>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other activities in the >>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> With regards, >>>> >>>>> >>>> >> Swogat Pradhan >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> >> wrote: >>>> >>>>> >>>> >> >>>> >>>>> >>>> >>> Hi Eugen, >>>> >>>>> >>>> >>> Thanks for your response. >>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the details: >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *PCS Status:* >>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>>>> >>>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>> >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>> >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-2 >>>> >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-1 >>>> >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-0 >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the issue is >>>> >>>>> >>>> still >>>> >>>>> >>>> >>> present. >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *Cluster status:* >>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status >>>> >>>>> >>>> >>> Cluster status of node >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>> >>>>> >>>> >>> Basics >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Disk Nodes >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Running Nodes >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Versions >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>> >>>>> >>>> RabbitMQ >>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Alarms >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> (none) >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Network Partitions >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> (none) >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Listeners >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> 
>>>> interface: >>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, purpose: >>>> >>>>> >>>> inter-node and >>>> >>>>> >>>> >>> CLI tool communication >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP >>>> >>>>> >>>> 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Feature flags >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *Logs:* >>>> >>>>> >>>> >>> *(Attached)* >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> With regards, >>>> >>>>> >>>> >>> Swogat Pradhan >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> >>> wrote: >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>>> Hi, >>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> nova-conuctor: >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> >>>>> >>>> with >>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> With regards, >>>> >>>>> >>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>>> Hi, >>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i am trying to >>>> >>>>> >>>> >>>>> launch vm's. 
>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down (openstack >>>> >>>>> >>>> compute >>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart the nova >>>> >>>>> >>>> compute >>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> nova-compute.log >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running >>>> >>>>> >>>> >>>>> instance usage >>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 >>>> >>>>> >>>> to >>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node >>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device >>>> >>>>> >>>> name: >>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume >>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled >>>> >>>>> >>>> with >>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running >>>> >>>>> >>>> >>>>> privsep helper: >>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> >>>>> >>>> 'privsep-helper', >>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new >>>> >>>>> >>>> privsep >>>> >>>>> >>>> >>>>> daemon via rootwrap >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> >>>> >>>>> daemon starting >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep >>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process >>>> >>>>> >>>> >>>>> execution error >>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>>> >>>> >>>>> Exit code: 2 >>>> >>>>> >>>> >>>>> Stdout: '' >>>> >>>>> >>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
>>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> With regards, >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> Swogat Pradhan >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> From swogatpradhan22 at gmail.com Wed Mar 22 13:54:32 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 22 Mar 2023 19:24:32 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: My glance container is running but is in an unhealthy state. I don't see any errors in podman logs glance_api or anywhere. [root at dcn02-compute-0 ~]# podman ps --all | grep glance 03a07452704a 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo 9 days ago Exited (0) 41 minutes ago container-puppet-glance_api b61e96e9f504 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo /bin/bash -c chow... 9 days ago Exited (0) 36 minutes ago glance_init_logs ec1734dfb072 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo /usr/bin/bootstra... 34 minutes ago Exited (0) 34 minutes ago glance_api_db_sync a8eb5d18b8d6 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo kolla_start 31 minutes ago Up 32 minutes ago (healthy) glance_api_cron 74a92f45a4a2 172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo kolla_start 31 minutes ago Up 32 minutes ago (unhealthy) glance_api With regards, Swogat Pradhan On Wed, Mar 22, 2023 at 7:16?PM John Fulton wrote: > On Wed, Mar 22, 2023 at 9:42?AM Swogat Pradhan > wrote: > > > > Hi Jhon, > > After some changes i feel like the cinder is now trying to pull the > image from local glance as i am getting the following error in > cinder-colume log: > > > > 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server > cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error > finding address for > http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: > Unable to establish connection to > http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: > HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded > with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by > NewConnectionError(' 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] > ECONNREFUSED',)) > > > > As the endpoint it is trying to reach is the dcn02 IP address. 
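A quick way to narrow this down is to ask podman why it flags glance_api as unhealthy and to probe port 9292 directly. A rough sketch, assuming the container is named glance_api as in the podman ps output above and that a healthcheck is defined for it:

# Run the container's own healthcheck and show its exit code (0 = healthy)
$ sudo podman healthcheck run glance_api; echo $?
# Recent service output, in case glance-api died after kolla_start
$ sudo podman logs --tail 50 glance_api
# Is anything listening on the glance port on this node?
$ sudo ss -tlnp | grep 9292
# Probe the endpoint cinder is failing to reach; a running glance-api answers with a version document
$ curl -s http://172.25.228.253:9292/

If nothing is listening on 9292 even though the container shows Up, the glance-api process inside it has most likely died, which would explain both the unhealthy flag and the ECONNREFUSED seen by cinder.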
> > > > But when i check the ports i don't find the port 9292 running: > > [root at dcn02-compute-2 ceph]# netstat -nultp > > Active Internet connections (only servers) > > Proto Recv-Q Send-Q Local Address Foreign Address > State PID/Program name > > tcp 0 0 0.0.0.0:2022 0.0.0.0:* > LISTEN 656800/sshd > > tcp 0 0 127.0.0.1:199 0.0.0.0:* > LISTEN 4878/snmpd > > tcp 0 0 172.25.228.253:2379 0.0.0.0:* > LISTEN 6232/etcd > > tcp 0 0 172.25.228.253:2380 0.0.0.0:* > LISTEN 6232/etcd > > tcp 0 0 0.0.0.0:111 0.0.0.0:* > LISTEN 1/systemd > > tcp 0 0 127.0.0.1:6640 0.0.0.0:* > LISTEN 2779/ovsdb-server > > tcp 0 0 0.0.0.0:22 0.0.0.0:* > LISTEN 4918/sshd > > tcp6 0 0 :::2022 :::* > LISTEN 656800/sshd > > tcp6 0 0 :::111 :::* > LISTEN 1/systemd > > tcp6 0 0 :::22 :::* > LISTEN 4918/sshd > > udp 0 0 0.0.0.0:111 0.0.0.0:* > 1/systemd > > udp 0 0 0.0.0.0:161 0.0.0.0:* > 4878/snmpd > > udp 0 0 127.0.0.1:323 0.0.0.0:* > 2609/chronyd > > udp 0 0 0.0.0.0:6081 0.0.0.0:* > - > > udp6 0 0 :::111 :::* > 1/systemd > > udp6 0 0 ::1:161 :::* > 4878/snmpd > > udp6 0 0 ::1:323 :::* > 2609/chronyd > > udp6 0 0 :::6081 :::* > - > > > > I see in the glance-api.conf that bind port parameter is set to 9292 but > the port is not listed in netstat command. > > Can you please guide me in getting this port up and running as i feel > like this would solve the issue i am facing right now. > > Looks like your glance container stopped running. Ask podman to show > you all containers (including stopped ones) and investigate why the > glance container stopped. > > > > > With regards, > > Swogat Pradhan > > > > On Wed, Mar 22, 2023 at 4:55?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >> > >> Update: > >> Here is the log when creating a volume using cirros image: > >> > >> 2023-03-22 11:04:38.449 109 INFO > cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, > 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': > datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 
'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > >> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s > >> 2023-03-22 11:07:54.023 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > >> category=FutureWarning) > >> > >> 2023-03-22 11:11:12.161 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > >> category=FutureWarning) > >> > >> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 > MB/s > >> 2023-03-22 11:11:14.998 109 INFO > cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f > (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully > >> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. > >> > >> The image is present in dcn02 store but still it downloaded the image > in 0.16 MB/s and then created the volume. > >> > >> With regards, > >> Swogat Pradhan > >> > >> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>> > >>> Hi Jhon, > >>> This seems to be an issue. > >>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster > parameter was specified to the respective cluster names but the config > files were created in the name of ceph.conf and keyring was > ceph.client.openstack.keyring. > >>> > >>> Which created issues in glance as well as the naming convention of the > files didn't match the cluster names, so i had to manually rename the > central ceph conf file as such: > >>> > >>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ > >>> [root at dcn02-compute-0 ceph]# ll > >>> total 16 > >>> -rw-------. 1 root root 257 Mar 13 13:56 > ceph_central.client.openstack.keyring > >>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf > >>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring > >>> -rw-r--r--. 
1 root root 362 Mar 15 18:45 ceph.conf > >>> [root at dcn02-compute-0 ceph]# > >>> > >>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the > respective clusters in both dcn01 and dcn02. > >>> In the above cli output, the ceph.conf and ceph.client... are the > files used to access dcn02 ceph cluster and ceph_central* files are used in > for accessing central ceph cluster. > >>> > >>> glance multistore config: > >>> [dcn02] > >>> rbd_store_ceph_conf=/etc/ceph/ceph.conf > >>> rbd_store_user=openstack > >>> rbd_store_pool=images > >>> rbd_thin_provisioning=False > >>> store_description=dcn02 rbd glance store > >>> > >>> [ceph_central] > >>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf > >>> rbd_store_user=openstack > >>> rbd_store_pool=images > >>> rbd_thin_provisioning=False > >>> store_description=Default glance store backend. > >>> > >>> > >>> With regards, > >>> Swogat Pradhan > >>> > >>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton > wrote: > >>>> > >>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan > >>>> wrote: > >>>> > > >>>> > Hi, > >>>> > Seems like cinder is not using the local ceph. > >>>> > >>>> That explains the issue. It's a misconfiguration. > >>>> > >>>> I hope this is not a production system since the mailing list now has > >>>> the cinder.conf which contains passwords. > >>>> > >>>> The section that looks like this: > >>>> > >>>> [tripleo_ceph] > >>>> volume_backend_name=tripleo_ceph > >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver > >>>> rbd_ceph_conf=/etc/ceph/ceph.conf > >>>> rbd_user=openstack > >>>> rbd_pool=volumes > >>>> rbd_flatten_volume_from_snapshot=False > >>>> rbd_secret_uuid= > >>>> report_discard_supported=True > >>>> > >>>> Should be updated to refer to the local DCN ceph cluster and not the > >>>> central one. Use the ceph conf file for that cluster and ensure the > >>>> rbd_secret_uuid corresponds to that one. > >>>> > >>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the > >>>> Ceph cluster. The FSID should be in the ceph.conf file. The > >>>> tripleo_nova_libvirt role will use virsh secret-* commands so that > >>>> libvirt can retrieve the cephx secret using the FSID as a key. This > >>>> can be confirmed with `podman exec nova_virtsecretd virsh > >>>> secret-get-value $FSID`. > >>>> > >>>> The documentation describes how to configure the central and DCN sites > >>>> correctly but an error seems to have occurred while you were following > >>>> it. 
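As a concrete way to apply that convention, the FSID, the rbd_secret_uuid and the secret libvirt holds can be cross-checked by hand. This is only a sketch: the cinder_volume container name and the /etc/ceph paths inside the containers are assumptions based on the cinder.conf snippet above, and <fsid> is a placeholder for the value returned by the first grep.

# What cinder-volume is actually pointed at
$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_secret_uuid' /etc/cinder/cinder.conf
$ sudo podman exec cinder_volume grep fsid /etc/ceph/ceph.conf
# Secrets known to libvirt; TripleO keys them by cluster FSID
$ sudo podman exec nova_virtsecretd virsh secret-list
# The value stored under the local FSID should match the key in the dcn02 openstack keyring
$ sudo podman exec nova_virtsecretd virsh secret-get-value <fsid>

If rbd_secret_uuid (or the libvirt secret) still carries the central cluster's FSID while ceph.conf points at dcn02, that mismatch alone would be enough to break volume attach and boot-from-volume at the edge site.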
> >>>> > >>>> > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html > >>>> > >>>> John > >>>> > >>>> > > >>>> > Ceph Output: > >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l > >>>> > NAME SIZE PARENT FMT > PROT LOCK > >>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 > excl > >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 > >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 > yes > >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 > >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 > yes > >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 > >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 > yes > >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 > >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 > yes > >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 > >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 > yes > >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 > >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 > yes > >>>> > > >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l > >>>> > NAME SIZE PARENT FMT > PROT LOCK > >>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 > >>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 > >>>> > [ceph: root at dcn02-ceph-all-0 /]# > >>>> > > >>>> > Attached the cinder config. > >>>> > Please let me know how I can solve this issue. > >>>> > > >>>> > With regards, > >>>> > Swogat Pradhan > >>>> > > >>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton > wrote: > >>>> >> > >>>> >> in my last message under the line "On a DCN site if you run a > command like this:" I suggested some steps you could try to confirm the > image is a COW from the local glance as well as how to look at your cinder > config. > >>>> >> > >>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> >>> > >>>> >>> Update: > >>>> >>> I uploaded an image directly to the dcn02 store, and it takes > around 10,15 minutes to create a volume with image in dcn02. > >>>> >>> The image size is 389 MB. > >>>> >>> > >>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> >>>> > >>>> >>>> Hi Jhon, > >>>> >>>> I checked in the ceph od dcn02, I can see the images created > after importing from the central site. > >>>> >>>> But launching an instance normally fails as it takes a long time > for the volume to get created. > >>>> >>>> > >>>> >>>> When launching an instance from volume the instance is getting > created properly without any errors. > >>>> >>>> > >>>> >>>> I tried to cache images in nova using > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > but getting checksum failed error. > >>>> >>>> > >>>> >>>> With regards, > >>>> >>>> Swogat Pradhan > >>>> >>>> > >>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton > wrote: > >>>> >>>>> > >>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan > >>>> >>>>> wrote: > >>>> >>>>> > > >>>> >>>>> > Update: After restarting the nova services on the controller > and running the deploy script on the edge site, I was able to launch the VM > from volume. > >>>> >>>>> > > >>>> >>>>> > Right now the instance creation is failing as the block > device creation is stuck in creating state, it is taking more than 10 mins > for the volume to be created, whereas the image has already been imported > to the edge glance. 
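For what it's worth, the rbd -p volumes ls -l output above shows an empty PARENT column for both volumes, which is consistent with the image having been downloaded and written out rather than COW-cloned from the local images pool. A single volume can be double-checked from the same cephadm shell (volume name taken from the listing above):

[ceph: root at dcn02-ceph-all-0 /]# rbd info volumes/volume-c644086f-d3cf-406d-b0f1-7691bde5981d | grep parent

A COW clone would show a parent: images/<image-id>@snap line here. Note also that the cirros image in the volume-creation log elsewhere in the thread is qcow2; as far as I know the cinder RBD driver only COW-clones raw images, so a qcow2 image would be downloaded and converted even with a correctly configured local glance.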
> >>>> >>>>> > >>>> >>>>> Try following this document and making the same observations in > your > >>>> >>>>> environment for AZs and their local ceph cluster. > >>>> >>>>> > >>>> >>>>> > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites > >>>> >>>>> > >>>> >>>>> On a DCN site if you run a command like this: > >>>> >>>>> > >>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring > >>>> >>>>> /etc/ceph/dcn0.client.admin.keyring > >>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l > >>>> >>>>> NAME SIZE PARENT > >>>> >>>>> FMT PROT LOCK > >>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB > >>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl > >>>> >>>>> $ > >>>> >>>>> > >>>> >>>>> Then, you should see the parent of the volume is the image > which is on > >>>> >>>>> the same local ceph cluster. > >>>> >>>>> > >>>> >>>>> I wonder if something is misconfigured and thus you're > encountering > >>>> >>>>> the streaming behavior described here: > >>>> >>>>> > >>>> >>>>> Ideally all images should reside in the central Glance and be > copied > >>>> >>>>> to DCN sites before instances of those images are booted on DCN > sites. > >>>> >>>>> If an image is not copied to a DCN site before it is booted, > then the > >>>> >>>>> image will be streamed to the DCN site and then the image will > boot as > >>>> >>>>> an instance. This happens because Glance at the DCN site has > access to > >>>> >>>>> the images store at the Central ceph cluster. Though the > booting of > >>>> >>>>> the image will take time because it has not been copied in > advance, > >>>> >>>>> this is still preferable to failing to boot the image. > >>>> >>>>> > >>>> >>>>> You can also exec into the cinder container at the DCN site and > >>>> >>>>> confirm it's using it's local ceph cluster. > >>>> >>>>> > >>>> >>>>> John > >>>> >>>>> > >>>> >>>>> > > >>>> >>>>> > I will try and create a new fresh image and test again then > update. > >>>> >>>>> > > >>>> >>>>> > With regards, > >>>> >>>>> > Swogat Pradhan > >>>> >>>>> > > >>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> >>>>> >> > >>>> >>>>> >> Update: > >>>> >>>>> >> In the hypervisor list the compute node state is showing > down. > >>>> >>>>> >> > >>>> >>>>> >> > >>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> wrote: > >>>> >>>>> >>> > >>>> >>>>> >>> Hi Brendan, > >>>> >>>>> >>> Now i have deployed another site where i have used 2 linux > bonds network template for both 3 compute nodes and 3 ceph nodes. > >>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). > >>>> >>>>> >>> I used a cirros image to launch instance but the instance > timed out so i waited for the volume to be created. > >>>> >>>>> >>> Once the volume was created i tried launching the instance > from the volume and still the instance is stuck in spawning state. 
> >>>> >>>>> >>> > >>>> >>>>> >>> Here is the nova-compute log: > >>>> >>>>> >>> > >>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] > privsep daemon starting > >>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] > privsep process running with uid/gid: 0/0 > >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] > privsep process running with capabilities (eff/prm/inh): > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] > privsep daemon running as pid 185437 > >>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING > os_brick.initiator.connectors.nvmeof > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error > in _get_host_uuid: Unexpected error while running command. > >>>> >>>>> >>> Command: blkid overlay -s UUID -o value > >>>> >>>>> >>> Exit code: 2 > >>>> >>>>> >>> Stdout: '' > >>>> >>>>> >>> Stderr: '': > oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while > running command. > >>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver > [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - default default] [instance: > 450b749c-a10a-4308-80a9-3b8020fee758] Creating image > >>>> >>>>> >>> > >>>> >>>>> >>> It is stuck in creating image, do i need to run the > template mentioned here ?: > https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html > >>>> >>>>> >>> > >>>> >>>>> >>> The volume is already created and i do not understand why > the instance is stuck in spawning state. > >>>> >>>>> >>> > >>>> >>>>> >>> With regards, > >>>> >>>>> >>> Swogat Pradhan > >>>> >>>>> >>> > >>>> >>>>> >>> > >>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < > bshephar at redhat.com> wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> Does your environment use different network interfaces for > each of the networks? Or does it have a bond with everything on it? > >>>> >>>>> >>>> > >>>> >>>>> >>>> One issue I have seen before is that when launching > instances, there is a lot of network traffic between nodes as the > hypervisor needs to download the image from Glance. Along with various > other services sending normal network traffic, it can be enough to cause > issues if everything is running over a single 1Gbe interface. > >>>> >>>>> >>>> > >>>> >>>>> >>>> I have seen the same situation in fact when using a single > active/backup bond on 1Gbe nics. It?s worth checking the network traffic > while you try to spawn the instance to see if you?re dropping packets. In > the situation I described, there were dropped packets which resulted in a > loss of communication between nova_compute and RMQ, so the node appeared > offline. You should also confirm that nova_compute is being disconnected in > the nova_compute logs if you tail them on the Hypervisor while spawning the > instance. > >>>> >>>>> >>>> > >>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. > So, based on that experience, from my perspective, is certainly sounds like > some kind of network issue. 
> >>>> >>>>> >>>> > >>>> >>>>> >>>> Regards, > >>>> >>>>> >>>> > >>>> >>>>> >>>> Brendan Shephard > >>>> >>>>> >>>> Senior Software Engineer > >>>> >>>>> >>>> Red Hat Australia > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block > wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> Hi, > >>>> >>>>> >>>> > >>>> >>>>> >>>> I tried to help someone with a similar issue some time ago > in this thread: > >>>> >>>>> >>>> > https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor > >>>> >>>>> >>>> > >>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that > user, not sure if that could apply here. But is it possible that your nova > and neutron versions are different between central and edge site? Have you > restarted nova and neutron services on the compute nodes after > installation? Have you debug logs of nova-conductor and maybe nova-compute? > Maybe they can help narrow down the issue. > >>>> >>>>> >>>> If there isn't any additional information in the debug > logs I probably would start "tearing down" rabbitmq. I didn't have to do > that in a production system yet so be careful. I can think of two routes: > >>>> >>>>> >>>> > >>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is > running, this will most likely impact client IO depending on your load. > Check out the rabbitmqctl commands. > >>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables > from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. > >>>> >>>>> >>>> > >>>> >>>>> >>>> I can imagine that the failed reply "survives" while being > replicated across the rabbit nodes. But I don't really know the rabbit > internals too well, so maybe someone else can chime in here and give a > better advice. > >>>> >>>>> >>>> > >>>> >>>>> >>>> Regards, > >>>> >>>>> >>>> Eugen > >>>> >>>>> >>>> > >>>> >>>>> >>>> Zitat von Swogat Pradhan : > >>>> >>>>> >>>> > >>>> >>>>> >>>> Hi, > >>>> >>>>> >>>> Can someone please help me out on this issue? > >>>> >>>>> >>>> > >>>> >>>>> >>>> With regards, > >>>> >>>>> >>>> Swogat Pradhan > >>>> >>>>> >>>> > >>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> Hi > >>>> >>>>> >>>> I don't see any major packet loss. > >>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but > not due to packet > >>>> >>>>> >>>> loss. > >>>> >>>>> >>>> > >>>> >>>>> >>>> with regards, > >>>> >>>>> >>>> Swogat Pradhan > >>>> >>>>> >>>> > >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < > swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> Hi, > >>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. > >>>> >>>>> >>>> Generally I haven't seen any packet loss, but never > checked when > >>>> >>>>> >>>> launching the instance. > >>>> >>>>> >>>> I will check that and come back. > >>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck > at spawning > >>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure > if packet loss > >>>> >>>>> >>>> causes this. > >>>> >>>>> >>>> > >>>> >>>>> >>>> With regards, > >>>> >>>>> >>>> Swogat pradhan > >>>> >>>>> >>>> > >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block > wrote: > >>>> >>>>> >>>> > >>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they > identical between > >>>> >>>>> >>>> central and edge site? 
Do you see packet loss through the > tunnel? > >>>> >>>>> >>>> > >>>> >>>>> >>>> Zitat von Swogat Pradhan : > >>>> >>>>> >>>> > >>>> >>>>> >>>> > Hi Eugen, > >>>> >>>>> >>>> > Request you to please add my email either on 'to' or > 'cc' as i am not > >>>> >>>>> >>>> > getting email's from you. > >>>> >>>>> >>>> > Coming to the issue: > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl > list_policies -p > >>>> >>>>> >>>> / > >>>> >>>>> >>>> > Listing policies for vhost "/" ... > >>>> >>>>> >>>> > vhost name pattern apply-to definition > priority > >>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down > when i am > >>>> >>>>> >>>> trying > >>>> >>>>> >>>> > to launch an instance and the instance comes to a > spawning state and > >>>> >>>>> >>>> then > >>>> >>>>> >>>> > gets stuck. > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > I have a tunnel setup between the central and the edge > sites. > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > With regards, > >>>> >>>>> >>>> > Swogat Pradhan > >>>> >>>>> >>>> > > >>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < > >>>> >>>>> >>>> swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> > wrote: > >>>> >>>>> >>>> > > >>>> >>>>> >>>> >> Hi Eugen, > >>>> >>>>> >>>> >> For some reason i am not getting your email to me > directly, i am > >>>> >>>>> >>>> checking > >>>> >>>>> >>>> >> the email digest and there i am able to find your reply. > >>>> >>>>> >>>> >> Here is the log for download: > https://we.tl/t-L8FEkGZFSq > >>>> >>>>> >>>> >> Yes, these logs are from the time when the issue > occurred. > >>>> >>>>> >>>> >> > >>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other > activities in the > >>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* > >>>> >>>>> >>>> >> > >>>> >>>>> >>>> >> With regards, > >>>> >>>>> >>>> >> Swogat Pradhan > >>>> >>>>> >>>> >> > >>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < > >>>> >>>>> >>>> swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> >> wrote: > >>>> >>>>> >>>> >> > >>>> >>>>> >>>> >>> Hi Eugen, > >>>> >>>>> >>>> >>> Thanks for your response. > >>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the > details: > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> *PCS Status:* > >>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ > >>>> >>>>> >>>> >>> > 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: > >>>> >>>>> >>>> >>> * rabbitmq-bundle-0 > (ocf::heartbeat:rabbitmq-cluster): > >>>> >>>>> >>>> Started > >>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 > >>>> >>>>> >>>> >>> * rabbitmq-bundle-1 > (ocf::heartbeat:rabbitmq-cluster): > >>>> >>>>> >>>> Started > >>>> >>>>> >>>> >>> overcloud-controller-2 > >>>> >>>>> >>>> >>> * rabbitmq-bundle-2 > (ocf::heartbeat:rabbitmq-cluster): > >>>> >>>>> >>>> Started > >>>> >>>>> >>>> >>> overcloud-controller-1 > >>>> >>>>> >>>> >>> * rabbitmq-bundle-3 > (ocf::heartbeat:rabbitmq-cluster): > >>>> >>>>> >>>> Started > >>>> >>>>> >>>> >>> overcloud-controller-0 > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but > the issue is > >>>> >>>>> >>>> still > >>>> >>>>> >>>> >>> present. 
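Before tearing anything down, it may also be worth checking, from the same place cluster_status is run below, whether the reply_* queues named in the nova-conductor errors further down the thread exist at all and whether the edge compute nodes still hold AMQP connections. A rough sketch (the edge compute IP is a placeholder):

# Do the reply queues exist, and do they have consumers?
rabbitmqctl list_queues name messages consumers | grep reply_
# Any connections from the edge compute nodes?
rabbitmqctl list_connections user peer_host state | grep <edge-compute-ip>

If the reply queue disappears the moment the compute side drops its connection, that points back at the network between the sites rather than at RabbitMQ itself.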
> >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> *Cluster status:* > >>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl > cluster_status > >>>> >>>>> >>>> >>> Cluster status of node > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > ... > >>>> >>>>> >>>> >>> Basics > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Cluster name: > rabbit at overcloud-controller-no-ceph-3.bdxworld.com > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Disk Nodes > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Running Nodes > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Versions > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: > RabbitMQ > >>>> >>>>> >>>> 3.8.3 > >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: > RabbitMQ > >>>> >>>>> >>>> 3.8.3 > >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: > RabbitMQ > >>>> >>>>> >>>> 3.8.3 > >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 > >>>> >>>>> >>>> >>> > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: > >>>> >>>>> >>>> RabbitMQ > >>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Alarms > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> (none) > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Network Partitions > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> (none) > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Listeners > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>> >>>>> >>>> tool > >>>> >>>>> >>>> >>> communication > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: > AMQP 0-9-1 > >>>> >>>>> >>>> >>> and AMQP 1.0 > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-0.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>> >>>>> >>>> tool > >>>> >>>>> >>>> >>> communication > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: > AMQP 0-9-1 > >>>> >>>>> >>>> >>> and AMQP 1.0 > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-1.internalapi.bdxworld.com, > >>>> 
>>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: > inter-node and CLI > >>>> >>>>> >>>> tool > >>>> >>>>> >>>> >>> communication > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: > AMQP 0-9-1 > >>>> >>>>> >>>> >>> and AMQP 1.0 > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-2.internalapi.bdxworld.com, > >>>> >>>>> >>>> interface: > >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> , > >>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, > purpose: > >>>> >>>>> >>>> inter-node and > >>>> >>>>> >>>> >>> CLI tool communication > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> , > >>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, > purpose: AMQP > >>>> >>>>> >>>> 0-9-1 > >>>> >>>>> >>>> >>> and AMQP 1.0 > >>>> >>>>> >>>> >>> Node: > rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com > >>>> >>>>> >>>> , > >>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: > HTTP API > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Feature flags > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled > >>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled > >>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled > >>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled > >>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> *Logs:* > >>>> >>>>> >>>> >>> *(Attached)* > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> With regards, > >>>> >>>>> >>>> >>> Swogat Pradhan > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < > >>>> >>>>> >>>> swogatpradhan22 at gmail.com> > >>>> >>>>> >>>> >>> wrote: > >>>> >>>>> >>>> >>> > >>>> >>>>> >>>> >>>> Hi, > >>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api > log. 
> >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>> nova-conuctor: > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b > >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa > >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] > The reply > >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after > 60 seconds > >>>> >>>>> >>>> due to a > >>>> >>>>> >>>> >>>> missing queue > (reply_276049ec36a84486a8a406911d9802f4). > >>>> >>>>> >>>> Abandoning...: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > The reply > >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after > 60 seconds > >>>> >>>>> >>>> due to a > >>>> >>>>> >>>> >>>> missing queue > (reply_349bcb075f8c49329435a0f884b33066). > >>>> >>>>> >>>> Abandoning...: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > The reply > >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after > 60 seconds > >>>> >>>>> >>>> due to a > >>>> >>>>> >>>> >>>> missing queue > (reply_349bcb075f8c49329435a0f884b33066). 
> >>>> >>>>> >>>> Abandoning...: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils > >>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Cache enabled > >>>> >>>>> >>>> with > >>>> >>>>> >>>> >>>> backend dogpile.cache.null. > >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING > >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, > drop reply to > >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR > oslo_messaging._drivers.amqpdriver > >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] > The reply > >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after > 60 seconds > >>>> >>>>> >>>> due to a > >>>> >>>>> >>>> >>>> missing queue > (reply_349bcb075f8c49329435a0f884b33066). > >>>> >>>>> >>>> Abandoning...: > >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>> With regards, > >>>> >>>>> >>>> >>>> Swogat Pradhan > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < > >>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> >>>>> Hi, > >>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where > i am trying to > >>>> >>>>> >>>> >>>>> launch vm's. > >>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down > (openstack > >>>> >>>>> >>>> compute > >>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart > the nova > >>>> >>>>> >>>> compute > >>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> nova-compute.log > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager > >>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] > Running > >>>> >>>>> >>>> >>>>> instance usage > >>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from > 2023-02-26 07:00:00 > >>>> >>>>> >>>> to > >>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim > successful on node > >>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO > nova.virt.libvirt.driver > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring > supplied device > >>>> >>>>> >>>> name: > >>>> >>>>> >>>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev > names > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with > volume > >>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Cache enabled > >>>> >>>>> >>>> with > >>>> >>>>> >>>> >>>>> backend dogpile.cache.null. > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Running > >>>> >>>>> >>>> >>>>> privsep helper: > >>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', > >>>> >>>>> >>>> 'privsep-helper', > >>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', > '--config-file', > >>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', > >>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', > >>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Spawned new > >>>> >>>>> >>>> privsep > >>>> >>>>> >>>> >>>>> daemon via rootwrap > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO > oslo.privsep.daemon [-] privsep > >>>> >>>>> >>>> >>>>> daemon starting > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO > oslo.privsep.daemon [-] privsep > >>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO > oslo.privsep.daemon [-] privsep > >>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): > >>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO > oslo.privsep.daemon [-] privsep > >>>> >>>>> >>>> >>>>> daemon running as pid 2647 > >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING > >>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > Process > >>>> >>>>> >>>> >>>>> execution error > >>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running > command. > >>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value > >>>> >>>>> >>>> >>>>> Exit code: 2 > >>>> >>>>> >>>> >>>>> Stdout: '' > >>>> >>>>> >>>> >>>>> Stderr: '': > oslo_concurrency.processutils.ProcessExecutionError: > >>>> >>>>> >>>> >>>>> Unexpected error while running command. 
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO > nova.virt.libvirt.driver > >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 > >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db > >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] > [instance: > >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> With regards, > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>>> Swogat Pradhan > >>>> >>>>> >>>> >>>>> > >>>> >>>>> >>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> >>>> > >>>> >>>>> > >>>> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abishop at redhat.com Wed Mar 22 14:41:27 2023 From: abishop at redhat.com (Alan Bishop) Date: Wed, 22 Mar 2023 07:41:27 -0700 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan wrote: > Update: > Here is the log when creating a volume using cirros image: > > 2023-03-22 11:04:38.449 109 INFO cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, > 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': > datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', > 'tags': [], 'file': 
'/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s > As Adam Savage would say, well there's your problem ^^ (Image download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s suggests you have a network issue. John Fulton previously stated your cinder-volume service at the edge site is not using the local ceph image store. Assuming you are deploying GlanceApiEdge service [1], then the cinder-volume service should be configured to use the local glance service [2]. You should check cinder's glance_api_servers to confirm it's the edge site's glance service. [1] https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 [2] https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 Alan > 2023-03-22 11:07:54.023 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > category=FutureWarning) > > 2023-03-22 11:11:12.161 109 WARNING py.warnings > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] > /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: > FutureWarning: The human format is deprecated and the format parameter will > be removed. Use explicitly json instead in version 'xena' > category=FutureWarning) > > 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 > MB/s > 2023-03-22 11:11:14.998 109 INFO cinder.volume.flows.manager.create_volume > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f > (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully > 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager > [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. > > The image is present in dcn02 store but still it downloaded the image in > 0.16 MB/s and then created the volume. > > With regards, > Swogat Pradhan > > On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan > wrote: > >> Hi Jhon, >> This seems to be an issue. >> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >> parameter was specified to the respective cluster names but the config >> files were created in the name of ceph.conf and keyring was >> ceph.client.openstack.keyring. 
>> >> Which created issues in glance as well as the naming convention of the >> files didn't match the cluster names, so i had to manually rename the >> central ceph conf file as such: >> >> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >> [root at dcn02-compute-0 ceph]# ll >> total 16 >> -rw-------. 1 root root 257 Mar 13 13:56 >> ceph_central.client.openstack.keyring >> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >> [root at dcn02-compute-0 ceph]# >> >> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >> respective clusters in both dcn01 and dcn02. >> In the above cli output, the ceph.conf and ceph.client... are the files >> used to access dcn02 ceph cluster and ceph_central* files are used in for >> accessing central ceph cluster. >> >> glance multistore config: >> [dcn02] >> rbd_store_ceph_conf=/etc/ceph/ceph.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=dcn02 rbd glance store >> >> [ceph_central] >> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=Default glance store backend. >> >> >> With regards, >> Swogat Pradhan >> >> On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: >> >>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>> wrote: >>> > >>> > Hi, >>> > Seems like cinder is not using the local ceph. >>> >>> That explains the issue. It's a misconfiguration. >>> >>> I hope this is not a production system since the mailing list now has >>> the cinder.conf which contains passwords. >>> >>> The section that looks like this: >>> >>> [tripleo_ceph] >>> volume_backend_name=tripleo_ceph >>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>> rbd_ceph_conf=/etc/ceph/ceph.conf >>> rbd_user=openstack >>> rbd_pool=volumes >>> rbd_flatten_volume_from_snapshot=False >>> rbd_secret_uuid= >>> report_discard_supported=True >>> >>> Should be updated to refer to the local DCN ceph cluster and not the >>> central one. Use the ceph conf file for that cluster and ensure the >>> rbd_secret_uuid corresponds to that one. >>> >>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>> Ceph cluster. The FSID should be in the ceph.conf file. The >>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>> libvirt can retrieve the cephx secret using the FSID as a key. This >>> can be confirmed with `podman exec nova_virtsecretd virsh >>> secret-get-value $FSID`. >>> >>> The documentation describes how to configure the central and DCN sites >>> correctly but an error seems to have occurred while you were following >>> it. 
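To act on Alan's suggestion above, the value cinder-volume is actually using can be read out of the running container (again assuming the container is named cinder_volume):

$ sudo podman exec cinder_volume grep glance_api_servers /etc/cinder/cinder.conf

If this still points at the central glance VIP rather than the dcn02 internal API address, every volume-from-image request will pull the image across the WAN, which matches the 0.16 MB/s download figure above.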
>>> >>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>> >>> John >>> >>> > >>> > Ceph Output: >>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>> > NAME SIZE PARENT FMT PROT >>> LOCK >>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>> excl >>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>> > >>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>> > NAME SIZE PARENT FMT >>> PROT LOCK >>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>> > [ceph: root at dcn02-ceph-all-0 /]# >>> > >>> > Attached the cinder config. >>> > Please let me know how I can solve this issue. >>> > >>> > With regards, >>> > Swogat Pradhan >>> > >>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>> wrote: >>> >> >>> >> in my last message under the line "On a DCN site if you run a command >>> like this:" I suggested some steps you could try to confirm the image is a >>> COW from the local glance as well as how to look at your cinder config. >>> >> >>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>> >>> >>> Update: >>> >>> I uploaded an image directly to the dcn02 store, and it takes around >>> 10,15 minutes to create a volume with image in dcn02. >>> >>> The image size is 389 MB. >>> >>> >>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> >>> >>>> Hi Jhon, >>> >>>> I checked in the ceph od dcn02, I can see the images created after >>> importing from the central site. >>> >>>> But launching an instance normally fails as it takes a long time >>> for the volume to get created. >>> >>>> >>> >>>> When launching an instance from volume the instance is getting >>> created properly without any errors. >>> >>>> >>> >>>> I tried to cache images in nova using >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> but getting checksum failed error. >>> >>>> >>> >>>> With regards, >>> >>>> Swogat Pradhan >>> >>>> >>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>> wrote: >>> >>>>> >>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>> >>>>> wrote: >>> >>>>> > >>> >>>>> > Update: After restarting the nova services on the controller and >>> running the deploy script on the edge site, I was able to launch the VM >>> from volume. >>> >>>>> > >>> >>>>> > Right now the instance creation is failing as the block device >>> creation is stuck in creating state, it is taking more than 10 mins for the >>> volume to be created, whereas the image has already been imported to the >>> edge glance. >>> >>>>> >>> >>>>> Try following this document and making the same observations in >>> your >>> >>>>> environment for AZs and their local ceph cluster. 
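One more observation that goes with the AZ check mentioned here: the backend and availability zone that actually served a volume can be read from the volume itself (admin credentials are needed for the host attribute; the volume ID below is the cirros test volume from the log earlier in the thread):

$ openstack volume show bf341343-6609-4b8c-b9e0-93e2a89c8c8f -c status -c availability_zone -c "os-vol-host-attr:host"

The host field should reference the dcn02 cinder backend (something like ...@tripleo_ceph); if it names a central backend instead, that would suggest the scheduler is not honouring the edge availability zone.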
>>> >>>>> >>> >>>>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>> >>>>> >>> >>>>> On a DCN site if you run a command like this: >>> >>>>> >>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>> >>>>> NAME SIZE PARENT >>> >>>>> FMT PROT LOCK >>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>> >>>>> $ >>> >>>>> >>> >>>>> Then, you should see the parent of the volume is the image which >>> is on >>> >>>>> the same local ceph cluster. >>> >>>>> >>> >>>>> I wonder if something is misconfigured and thus you're encountering >>> >>>>> the streaming behavior described here: >>> >>>>> >>> >>>>> Ideally all images should reside in the central Glance and be >>> copied >>> >>>>> to DCN sites before instances of those images are booted on DCN >>> sites. >>> >>>>> If an image is not copied to a DCN site before it is booted, then >>> the >>> >>>>> image will be streamed to the DCN site and then the image will >>> boot as >>> >>>>> an instance. This happens because Glance at the DCN site has >>> access to >>> >>>>> the images store at the Central ceph cluster. Though the booting of >>> >>>>> the image will take time because it has not been copied in advance, >>> >>>>> this is still preferable to failing to boot the image. >>> >>>>> >>> >>>>> You can also exec into the cinder container at the DCN site and >>> >>>>> confirm it's using it's local ceph cluster. >>> >>>>> >>> >>>>> John >>> >>>>> >>> >>>>> > >>> >>>>> > I will try and create a new fresh image and test again then >>> update. >>> >>>>> > >>> >>>>> > With regards, >>> >>>>> > Swogat Pradhan >>> >>>>> > >>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >> >>> >>>>> >> Update: >>> >>>>> >> In the hypervisor list the compute node state is showing down. >>> >>>>> >> >>> >>>>> >> >>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >>> >>> >>>>> >>> Hi Brendan, >>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>> >>>>> >>> I used a cirros image to launch instance but the instance >>> timed out so i waited for the volume to be created. >>> >>>>> >>> Once the volume was created i tried launching the instance >>> from the volume and still the instance is stuck in spawning state. 
>>> >>>>> >>> >>> >>>>> >>> Here is the nova-compute log: >>> >>>>> >>> >>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>> privsep daemon starting >>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>> privsep process running with uid/gid: 0/0 >>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>> privsep process running with capabilities (eff/prm/inh): >>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>> privsep daemon running as pid 185437 >>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>> os_brick.initiator.connectors.nvmeof >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>> in _get_host_uuid: Unexpected error while running command. >>> >>>>> >>> Command: blkid overlay -s UUID -o value >>> >>>>> >>> Exit code: 2 >>> >>>>> >>> Stdout: '' >>> >>>>> >>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>> running command. >>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>> >>>>> >>> >>> >>>>> >>> It is stuck in creating image, do i need to run the template >>> mentioned here ?: >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>> >>>>> >>> >>> >>>>> >>> The volume is already created and i do not understand why the >>> instance is stuck in spawning state. >>> >>>>> >>> >>> >>>>> >>> With regards, >>> >>>>> >>> Swogat Pradhan >>> >>>>> >>> >>> >>>>> >>> >>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>> bshephar at redhat.com> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Does your environment use different network interfaces for >>> each of the networks? Or does it have a bond with everything on it? >>> >>>>> >>>> >>> >>>>> >>>> One issue I have seen before is that when launching >>> instances, there is a lot of network traffic between nodes as the >>> hypervisor needs to download the image from Glance. Along with various >>> other services sending normal network traffic, it can be enough to cause >>> issues if everything is running over a single 1Gbe interface. >>> >>>>> >>>> >>> >>>>> >>>> I have seen the same situation in fact when using a single >>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>> while you try to spawn the instance to see if you?re dropping packets. In >>> the situation I described, there were dropped packets which resulted in a >>> loss of communication between nova_compute and RMQ, so the node appeared >>> offline. You should also confirm that nova_compute is being disconnected in >>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>> instance. >>> >>>>> >>>> >>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >>> based on that experience, from my perspective, is certainly sounds like >>> some kind of network issue. 
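One way to watch for that while reproducing the problem (a sketch; bond1 is a placeholder, substitute the bond/interface that carries the internal API and storage traffic):

$ cat /proc/net/bonding/bond1        # should report "Bonding Mode: IEEE 802.3ad Dynamic link aggregation" with all slaves up
$ ip -s link show bond1              # note the RX/TX "dropped" and "errors" counters
$ watch -n1 "ip -s link show bond1"  # re-check the counters while the instance is spawning

If the drop/error counters climb during the spawn, that points to the same kind of saturation or bonding problem described above.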
>>> >>>>> >>>> >>> >>>>> >>>> Regards, >>> >>>>> >>>> >>> >>>>> >>>> Brendan Shephard >>> >>>>> >>>> Senior Software Engineer >>> >>>>> >>>> Red Hat Australia >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> >>> >>>>> >>>> I tried to help someone with a similar issue some time ago in >>> this thread: >>> >>>>> >>>> >>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>> >>>>> >>>> >>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>> user, not sure if that could apply here. But is it possible that your nova >>> and neutron versions are different between central and edge site? Have you >>> restarted nova and neutron services on the compute nodes after >>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>> Maybe they can help narrow down the issue. >>> >>>>> >>>> If there isn't any additional information in the debug logs I >>> probably would start "tearing down" rabbitmq. I didn't have to do that in a >>> production system yet so be careful. I can think of two routes: >>> >>>>> >>>> >>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>> running, this will most likely impact client IO depending on your load. >>> Check out the rabbitmqctl commands. >>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables from >>> all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>> >>>>> >>>> >>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>> replicated across the rabbit nodes. But I don't really know the rabbit >>> internals too well, so maybe someone else can chime in here and give a >>> better advice. >>> >>>>> >>>> >>> >>>>> >>>> Regards, >>> >>>>> >>>> Eugen >>> >>>>> >>>> >>> >>>>> >>>> Zitat von Swogat Pradhan : >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> Can someone please help me out on this issue? >>> >>>>> >>>> >>> >>>>> >>>> With regards, >>> >>>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi >>> >>>>> >>>> I don't see any major packet loss. >>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >>> due to packet >>> >>>>> >>>> loss. >>> >>>>> >>>> >>> >>>>> >>>> with regards, >>> >>>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> Hi, >>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >>> when >>> >>>>> >>>> launching the instance. >>> >>>>> >>>> I will check that and come back. >>> >>>>> >>>> But everytime i launch an instance the instance gets stuck at >>> spawning >>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >>> packet loss >>> >>>>> >>>> causes this. >>> >>>>> >>>> >>> >>>>> >>>> With regards, >>> >>>>> >>>> Swogat pradhan >>> >>>>> >>>> >>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>> wrote: >>> >>>>> >>>> >>> >>>>> >>>> One more thing coming to mind is MTU size. Are they identical >>> between >>> >>>>> >>>> central and edge site? Do you see packet loss through the >>> tunnel? 
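A quick way to check both (a sketch; <central-internal-ip> is a placeholder for an address at the central site reachable through the tunnel, and bond1 for the local interface):

$ ip link show bond1 | grep -o 'mtu [0-9]*'            # run on both central and edge nodes and compare
$ ping -c 5 -M do -s 1472 <central-internal-ip>        # 1472 bytes + 28 bytes of headers = 1500; "message too long" replies mean the tunnel path MTU is smaller
$ ping -c 100 -i 0.2 <central-internal-ip> | tail -2   # packet loss summary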
>>> >>>>> >>>> >>> >>>>> >>>> Zitat von Swogat Pradhan : >>> >>>>> >>>> >>> >>>>> >>>> > Hi Eugen, >>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' >>> as i am not >>> >>>>> >>>> > getting email's from you. >>> >>>>> >>>> > Coming to the issue: >>> >>>>> >>>> > >>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>> list_policies -p >>> >>>>> >>>> / >>> >>>>> >>>> > Listing policies for vhost "/" ... >>> >>>>> >>>> > vhost name pattern apply-to definition >>> priority >>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>> >>>>> >>>> > >>> >>>>> >>>> >>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>> >>>>> >>>> > >>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>> when i am >>> >>>>> >>>> trying >>> >>>>> >>>> > to launch an instance and the instance comes to a spawning >>> state and >>> >>>>> >>>> then >>> >>>>> >>>> > gets stuck. >>> >>>>> >>>> > >>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>> sites. >>> >>>>> >>>> > >>> >>>>> >>>> > With regards, >>> >>>>> >>>> > Swogat Pradhan >>> >>>>> >>>> > >>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> > wrote: >>> >>>>> >>>> > >>> >>>>> >>>> >> Hi Eugen, >>> >>>>> >>>> >> For some reason i am not getting your email to me >>> directly, i am >>> >>>>> >>>> checking >>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. >>> >>>>> >>>> >> >>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>> activities in the >>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>> >>>>> >>>> >> >>> >>>>> >>>> >> With regards, >>> >>>>> >>>> >> Swogat Pradhan >>> >>>>> >>>> >> >>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> >> wrote: >>> >>>>> >>>> >> >>> >>>>> >>>> >>> Hi Eugen, >>> >>>>> >>>> >>> Thanks for your response. >>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>> details: >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *PCS Status:* >>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>> >>>>> >>>> >>> >>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>> >>>>> >>>> >>> * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>> >>>>> >>>> >>> * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-2 >>> >>>>> >>>> >>> * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-1 >>> >>>>> >>>> >>> * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): >>> >>>>> >>>> Started >>> >>>>> >>>> >>> overcloud-controller-0 >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but the >>> issue is >>> >>>>> >>>> still >>> >>>>> >>>> >>> present. >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *Cluster status:* >>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>> cluster_status >>> >>>>> >>>> >>> Cluster status of node >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> ... 
>>> >>>>> >>>> >>> Basics >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Cluster name: >>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Disk Nodes >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Running Nodes >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Versions >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>> RabbitMQ >>> >>>>> >>>> 3.8.3 >>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>> >>>>> >>>> >>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>> >>>>> >>>> RabbitMQ >>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Alarms >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> (none) >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Network Partitions >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> (none) >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Listeners >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>> inter-node and CLI >>> >>>>> >>>> tool >>> >>>>> >>>> >>> communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> 
>>>>> >>>> interface: >>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP >>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>> >>>>> >>>> interface: >>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>> purpose: >>> >>>>> >>>> inter-node and >>> >>>>> >>>> >>> CLI tool communication >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>> purpose: AMQP >>> >>>>> >>>> 0-9-1 >>> >>>>> >>>> >>> and AMQP 1.0 >>> >>>>> >>>> >>> Node: >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>> >>>>> >>>> , >>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>> HTTP API >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Feature flags >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> *Logs:* >>> >>>>> >>>> >>> *(Attached)* >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> With regards, >>> >>>>> >>>> >>> Swogat Pradhan >>> >>>>> >>>> >>> >>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>> >>>>> >>>> swogatpradhan22 at gmail.com> >>> >>>>> >>>> >>> wrote: >>> >>>>> >>>> >>> >>> >>>>> >>>> >>>> Hi, >>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> nova-conuctor: >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The >>> reply >>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Cache enabled >>> >>>>> >>>> with >>> >>>>> >>>> >>>> backend dogpile.cache.null. >>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>> drop reply to >>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>> oslo_messaging._drivers.amqpdriver >>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The >>> reply >>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 >>> seconds >>> >>>>> >>>> due to a >>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>> >>>>> >>>> Abandoning...: >>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> With regards, >>> >>>>> >>>> >>>> Swogat Pradhan >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>>>> Hi, >>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i >>> am trying to >>> >>>>> >>>> >>>>> launch vm's. 
>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>> (openstack >>> >>>>> >>>> compute >>> >>>>> >>>> >>>>> service list), the node comes backup when i restart the >>> nova >>> >>>>> >>>> compute >>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> nova-compute.log >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>> Running >>> >>>>> >>>> >>>>> instance usage >>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>> 2023-02-26 07:00:00 >>> >>>>> >>>> to >>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful >>> on node >>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied >>> device >>> >>>>> >>>> name: >>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>> volume >>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Cache enabled >>> >>>>> >>>> with >>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Running >>> >>>>> >>>> >>>>> privsep helper: >>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>> >>>>> >>>> 'privsep-helper', >>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', '--config-file', >>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Spawned new >>> >>>>> >>>> privsep >>> >>>>> >>>> >>>>> daemon via rootwrap >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> daemon starting >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>> [-] privsep >>> >>>>> >>>> >>>>> daemon running as pid 2647 >>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> Process >>> >>>>> >>>> >>>>> execution error >>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>> command. >>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>> >>>>> >>>> >>>>> Exit code: 2 >>> >>>>> >>>> >>>>> Stdout: '' >>> >>>>> >>>> >>>>> Stderr: '': >>> oslo_concurrency.processutils.ProcessExecutionError: >>> >>>>> >>>> >>>>> Unexpected error while running command. >>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>> [instance: >>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> With regards, >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>>> Swogat Pradhan >>> >>>>> >>>> >>>>> >>> >>>>> >>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>>> >>> >>>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Wed Mar 22 15:14:21 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Wed, 22 Mar 2023 08:14:21 -0700 Subject: [ironic] Meeting cancelled April 10; PTL availabliity Message-ID: After a quick poll of some Ironic contributors, it looks like most of us will be out April 10 for holiday or vacation. 
I'm cancelling the team meeting as we likely won't have quorum. Along those lines; I will be out the entire week of April 10. If you have anything that will need my attention specifically, please reach out before then. I trust any one of the PTLs-emeritus that are Ironic contributors to handle any event requiring PTL approval while I'm gone. Thanks, Jay Faulkner Ironic PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From hberaud at redhat.com Wed Mar 22 15:18:47 2023 From: hberaud at redhat.com (Herve Beraud) Date: Wed, 22 Mar 2023 16:18:47 +0100 Subject: OpenStack 2023.1 Antelope is officially released! Message-ID: Hello OpenStack community, I'm excited to announce the final releases for the components of OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope development cycle. You will find a complete list of all components, their latest versions, and links to individual project release notes documents listed on the new release site. https://releases.openstack.org/antelope/ Congratulations to all of the teams who have contributed to this release! Our next production cycle, 2023.2 Bobcat, has already started. We will meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan the work for the upcoming cycle. I hope to see you there! Thanks, OpenStack Release Management team -- Herv? Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Mar 22 15:26:36 2023 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 22 Mar 2023 16:26:36 +0100 Subject: [largescale-sig] Next meeting: March 22, 15utc In-Reply-To: <2afa2e24-b2b9-4954-ad89-9112a7714f1b@openstack.org> References: <2afa2e24-b2b9-4954-ad89-9112a7714f1b@openstack.org> Message-ID: Here is the summary of our SIG meeting today. We discussed our next OpenInfra Live episode on April 6, featuring Societe Generale. We also decided to alter IRC meeting times to account for DST and make them slightly friendlier to our APAC friends. You can read the detailed meeting logs at: https://meetings.opendev.org/meetings/large_scale_sig/2023/large_scale_sig.2023-03-22-15.01.html Our next IRC meeting will be April 19, 14:00UTC on #openstack-operators on OFTC. Regards, -- Thierry Carrez (ttx) From gmann at ghanshyammann.com Wed Mar 22 15:45:49 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 08:45:49 -0700 Subject: [ptl][tc] OpenStack packages PyPi additional external maintainers audit & cleanup In-Reply-To: <185d18a20aa.1206b91ad115363.5205111285046207324@ghanshyammann.com> References: <185d18a20aa.1206b91ad115363.5205111285046207324@ghanshyammann.com> Message-ID: <18709ff76be.10ad4bda1984477.2001967889741209449@ghanshyammann.com> ---- On Fri, 20 Jan 2023 15:36:08 -0800 Ghanshyam Mann wrote --- > Hi PTLs, > > As you might know or have seen for your project package on PyPi, OpenStack deliverables on PyPi have > additional maintainers, For example, https://pypi.org/project/murano/, https://pypi.org/project/glance/ > > We should keep only 'openstackci' as a maintainer in PyPi so that releases of OpenStack deliverables > can be managed in a single place. Otherwise, we might face the two sets of maintainers' places and > packages might get released in PyPi by additional maintainers without the OpenStack project team > knowing about it. 
One such case is in Horizon repo 'xstatic-font-awesome' where a new maintainer is > added by an existing additional maintainer and this package was released without the Horizon team > knowing about the changes and release. > - https://github.com/openstack/xstatic-font-awesome/pull/2 > > To avoid the 'xstatic-font-awesome' case for other packages, TC discussed it in their weekly meetings[1] > and agreed to audit all the OpenStack packages and then clean up the additional maintainers in PyPi > (keep only 'openstackci' as maintainers). > > To help in this task, TC requests project PTL to perform the audit for their project's repo and add comments > in the below etherpad. > > - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup Hello Everyone, To update, there is an extra step for project PTLs in this task: * Step 1.1: Project PTL/team needs to communicate to the additional maintainers about removing themselves and transferring ownership to 'openstackci' - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L23 Initially, TC thought we could do a cleanup with the help of openstackci admin for all repo. But, to avoid any issue or misunderstanding/panic among additional maintainers on removal, it is better that projects communicate with additional maintainers and ask them to remove themself. JayF sent the email format to communicate to additional maintainers[1]. Please use that and let TC know if any queries/issues you are facing. [1] https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032780.html -gmann > > Thanks to knikolla to automate the listing of the OpenStack packages with additional maintainers in PyPi which > you can find the result in output.txt at the bottom of this link. I have added the project list of who needs to check > their repo in etherpad. > > - https://gist.github.com/knikolla/7303a65a5ddaa2be553fc6e54619a7a1 > > Please complete the audit for your project before March 15 so that TC can discuss the next step in vPTG. > > [1] https://meetings.opendev.org/meetings/tc/2023/tc.2023-01-11-16.00.log.html#l-41 > > > -gmann > > From gmann at ghanshyammann.com Wed Mar 22 15:57:45 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 08:57:45 -0700 Subject: [ptl][tc][ops][ptg] Operator + Developers interaction (operator-hours) slots in 2023.2 Bobcat PTG In-Reply-To: <8e7b678c3d4e0aad8ab74436ed8ca6065cc1735f.camel@canonical.com> References: <186f171095b.d9075d4e658691.6614784213130492110@ghanshyammann.com> <8e7b678c3d4e0aad8ab74436ed8ca6065cc1735f.camel@canonical.com> Message-ID: <1870a0a60f6.10ebbdd99985795.2857132943834492770@ghanshyammann.com> ---- On Tue, 21 Mar 2023 14:40:42 -0700 Felipe Reyes wrote --- > Hi Ghanshyam, > > On Fri, 2023-03-17 at 14:19 -0700, Ghanshyam Mann wrote: > > Hello Everyone/PTL, > > > > To improve the interaction/feedback between operators and developers, one of the efforts is to > > schedule > > the 'operator-hour' in developers' events. We scheduled the 'operator-hour' in the last vPTG, > > which had mixed > > productivity feedback[1]. The TC discussed it and thinks we should continue the 'operator-hour' in > > March > > vPTG also. > > At OpenStack-charms project we thought it was a good idea, can we get the track 'operator-hour- > openstackcharms' registered? Hi Felipe, Just in case you have noticed in IRC, 'operator-hour-openstackcharms' track is now registered (thanks fungi), you can book the slot. 
-gmann > > Thanks, > > -- > Felipe Reyes > Software Engineer @ Canonical > felipe.reyes at canonical.com (GPG:0x9B1FFF39) > Launchpad: ~freyes | IRC: freyes > > > From elod.illes at est.tech Wed Mar 22 15:58:16 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Wed, 22 Mar 2023 15:58:16 +0000 Subject: OpenStack 2023.1 Antelope is officially released! In-Reply-To: References: Message-ID: Let me join and thank to all who were part of the 2023.1 Antelope development cycle! Also note, that this marks the official opening of the openstack/releases repository for 2023.2 Bobcat releases, and freezes are now lifted. stable/2023.1 is now a fully normal stable branch, and the normal stable policy applies from now on. Thanks, El?d Ill?s ________________________________ From: Herve Beraud Sent: Wednesday, March 22, 2023 4:18 PM To: openstack-discuss Subject: OpenStack 2023.1 Antelope is officially released! Hello OpenStack community, I'm excited to announce the final releases for the components of OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope development cycle. You will find a complete list of all components, their latest versions, and links to individual project release notes documents listed on the new release site. https://releases.openstack.org/antelope/ Congratulations to all of the teams who have contributed to this release! Our next production cycle, 2023.2 Bobcat, has already started. We will meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan the work for the upcoming cycle. I hope to see you there! Thanks, OpenStack Release Management team -- Herv? Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Wed Mar 22 16:03:47 2023 From: thierry at openstack.org (Thierry Carrez) Date: Wed, 22 Mar 2023 17:03:47 +0100 Subject: OpenStack 2023.1 Antelope is officially released! In-Reply-To: References: Message-ID: <2814547e-1463-22c9-8c22-61b52ca69d2d@openstack.org> Woohoo! El?d Ill?s wrote: > Let me join and thank to all who were part of the 2023.1 Antelope > development cycle! Also note, that this marks the official opening of > the openstack/releases repository for 2023.2 Bobcat releases, and > freezes are now lifted. stable/2023.1 is now a fully normal stable > branch, and the normal stable policy applies from now on. Thanks, El?d > Ill?s > > > ------------------------------------------------------------------------ > *From:* Herve Beraud > *Sent:* Wednesday, March 22, 2023 4:18 PM > *To:* openstack-discuss > *Subject:* OpenStack 2023.1 Antelope is officially released! > > Hello OpenStack community, > > I'm excited to announce the final releases for the components of > OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope > development cycle. > > You will find a complete list of all components, their latest > versions, and links to individual project release notes documents > listed on the new release site. > > https://releases.openstack.org/antelope/ > > Congratulations to all of the teams who have contributed to this > release! > > Our next production cycle, 2023.2 Bobcat, has already started. We will > meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan > the work for the upcoming cycle. I hope to see you there! > > Thanks, > > OpenStack Release Management team > > -- > Herv? 
Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > -- Thierry Carrez (ttx) From amy at demarco.com Wed Mar 22 16:19:14 2023 From: amy at demarco.com (Amy Marrich) Date: Wed, 22 Mar 2023 11:19:14 -0500 Subject: OpenStack 2023.1 Antelope is officially released! In-Reply-To: References: Message-ID: Congrats everyone!! Amy(spotz) On Wed, Mar 22, 2023 at 10:25?AM Herve Beraud wrote: > > Hello OpenStack community, > > I'm excited to announce the final releases for the components of > OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope > development cycle. > > You will find a complete list of all components, their latest > versions, and links to individual project release notes documents > listed on the new release site. > > https://releases.openstack.org/antelope/ > > Congratulations to all of the teams who have contributed to this > release! > > Our next production cycle, 2023.2 Bobcat, has already started. We will > meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan > the work for the upcoming cycle. I hope to see you there! > > Thanks, > > OpenStack Release Management team > > -- > Herv? Beraud > Senior Software Engineer at Red Hat > irc: hberaud > https://github.com/4383/ > From jay at gr-oss.io Wed Mar 22 16:19:38 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Wed, 22 Mar 2023 09:19:38 -0700 Subject: [ptls] PyPI maintainer cleanup - Action needed: Contact extra maintainers In-Reply-To: References: Message-ID: Hey all, Wanted to remind you all: vPTG is a great time to address this issue! Even if the PyPI maintainers you would need to contact are emeritus contributors; you may have someone still on the project team who has contact with them. I strongly recommend you utilize this time to help clean your projects up. Thanks, Jay Faulkner TC Vice-Chair On Tue, Mar 21, 2023 at 9:03?AM Jay Faulkner wrote: > Thanks to those who have already taken action! Fifty extra maintainers > have already been removed, with around three hundred to go. > > Please reach out to me if you're having trouble finding current email > addresses for anyone, or having trouble with the process at all. > > Thanks, > Jay Faulkner > TC Vice-Chair > > > On Thu, Mar 16, 2023 at 3:22?PM Jay Faulkner wrote: > >> Hi PTLs, >> >> The TC recently voted[1] to require humans be removed from PyPI access >> for OpenStack-managed projects. This helps ensure all releases are created >> via releases team tooling and makes it less likely for a user account >> compromise to impact OpenStack packages. >> >> Many projects have already updated >> https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L33 >> with a list of packages that contain extra maintainers. We'd like to >> request that PTLs, or their designate, reach out to any extra maintainers >> listed for projects you are responsible for and request they remove their >> access in accordance with policy. An example email, and detailed steps to >> follow have been provided at >> https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup-email-template >> . >> >> Thank you for your cooperation as we work to improve our security posture >> and harden against supply chain attacks. >> >> Thank you, >> Jay Faulkner >> TC Vice-Chair >> >> 1: >> https://opendev.org/openstack/governance/commit/979e339f899ef62d2a6871a99c99537744c5808d >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kristin at openinfra.dev Wed Mar 22 16:33:23 2023 From: kristin at openinfra.dev (Kristin Barrientos) Date: Wed, 22 Mar 2023 11:33:23 -0500 Subject: OpenInfra Live episode: March 23, 2023, at 9 a.m. CT (14:00 UTC) Message-ID: Hi everyone, This week?s OpenInfra Live episode is brought to you by the OpenStack community! Episode: OpenStack Antelope: A New Era The OpenStack community released Antelope, the 27th version of the world?s most widely deployed open source cloud infrastructure software, this week. Join us to learn about the latest from community leaders about what was delivered in Antelope and what we can expect in Bobcat, OpenStack's 28th release targeting October 2023. Speakers: Carlos Silva, Rajat Dhasmana, Sylvain Bauza, Jay Faulkner, Kendall Nelson Date and time: March 23, 2023, at 9 a.m. CT (14:00 UTC) You can watch us live on: YouTube: https://www.youtube.com/watch?v=YdLTUTyJ1eU LinkedIn: https://www.linkedin.com/events/7042534262494941185/comments/ WeChat: recording will be posted on OpenStack WeChat after the live stream Have an idea for a future episode? Share it now at ideas.openinfra.live. Thanks, Kristin Barrientos Marketing Coordinator OpenInfra Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephenfin at redhat.com Wed Mar 22 16:38:06 2023 From: stephenfin at redhat.com (Stephen Finucane) Date: Wed, 22 Mar 2023 16:38:06 +0000 Subject: [oslo][heat][masakari][senlin][venus][all] oslo.db 13.0.0 will remove sqlalchemy-migrate support Message-ID: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> tl;dr: Projects still relying on sqlalchemy-migrate for migrations need to start their switch to alembic immediately. Projects with "legacy" sqlalchemy-migrated based migrations need to drop them. A quick heads up that oslo.db 13.0.0 will be release in the next month or so and will remove sqlalchemy-migrate support and formally add support for sqlalchemy 2.x. The removal of sqlalchemy-migrate support should only affect projects using oslo.db's sqlalchemy-migrate wrappers, as opposed to using sqlalchemy-migrate directly. For any projects that rely on this functionality, a short-term fix is to vendor the removed code [1] in your project. However, I must emphasise that we're not removing sqlalchemy-migrate integration for the fun of it: it's not compatible with sqlalchemy 2.x and is no longer maintained. If your project uses sqlalchemy-migrate and you haven't migrated to alembic yet, you need to start doing so immediately. If you have migrated to alembic but still have sqlalchemy- migrate "legacy" migrations in-tree, you need to look at dropping these asap. Anything less will result in broken master when we bump upper-constraints to allow sqlalchemy 2.x in Bobcat. I've listed projects in $subject that appear to be using the removed modules. For more advice on migrating to sqlalchemy 2.x and alembic, please look at my previous post on the matter [2]. Cheers, Stephen [1] https://review.opendev.org/c/openstack/oslo.db/+/853025 [2] https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html From allison at openinfra.dev Wed Mar 22 16:40:54 2023 From: allison at openinfra.dev (Allison Price) Date: Wed, 22 Mar 2023 11:40:54 -0500 Subject: OpenStack 2023.1 Antelope is officially released! In-Reply-To: References: Message-ID: <2F187FB1-D47C-4288-BD04-8EEFED614FF2@openinfra.dev> Congratulations everyone! Thank you for all of your contributions! 
I hope everyone has a chance in your part of the world to celebrate for a minute or two today :) > On Mar 22, 2023, at 11:19 AM, Amy Marrich wrote: > > Congrats everyone!! > > Amy(spotz) > > On Wed, Mar 22, 2023 at 10:25?AM Herve Beraud wrote: >> >> Hello OpenStack community, >> >> I'm excited to announce the final releases for the components of >> OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope >> development cycle. >> >> You will find a complete list of all components, their latest >> versions, and links to individual project release notes documents >> listed on the new release site. >> >> https://releases.openstack.org/antelope/ >> >> Congratulations to all of the teams who have contributed to this >> release! >> >> Our next production cycle, 2023.2 Bobcat, has already started. We will >> meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan >> the work for the upcoming cycle. I hope to see you there! >> >> Thanks, >> >> OpenStack Release Management team >> >> -- >> Herv? Beraud >> Senior Software Engineer at Red Hat >> irc: hberaud >> https://github.com/4383/ >> > From ihrachys at redhat.com Wed Mar 22 16:55:05 2023 From: ihrachys at redhat.com (Ihar Hrachyshka) Date: Wed, 22 Mar 2023 12:55:05 -0400 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: <3840757.STTH5IQzZg@p1> Message-ID: On Tue, Mar 21, 2023 at 12:07?PM Rodolfo Alonso Hernandez wrote: > > Hello: > > I agree with having a single API meaning for all backends. We currently support stateless SGs in iptables and ML2/OVN and both backends provide the same behaviour: a rule won't create an opposite direction counterpart by default, the user needs to define it explicitly. Thanks for this, I didn't realize that iptables may be considered prior art. > > The discussion here could be the default behaviour for standard services: > * DHCP service is currently supported in iptables, native OVS and OVN. This should be supported even without any rule allowed (as is now). Of course, we need to explicitly document that. > * DHCPv6 [1]: unlike Slawek, I'm in favor of allowing this traffic by default, as part of the DHCP protocol traffic allowance. Agreed DHCPv6 rules are closer to "base" and that the argument for RA / NA flows is stronger because of the parallel to DHCPv4 operation. > * Metadata service: this is not a network protocol and we should not consider it. Actually this service is working now (with stateful SGs) because of the default SG egress rules we add. So I'm not in favor of [2] At this point I am more ambivalent to the decision of whether to include metadata into the list of "base" services, as long as we define the list (behavior) in api-ref. But to address the point, since Slawek leans to creating SG rules in Neutron API to handle ICMP traffic necessary for RA / NA (which seems to have a merit and internal logic) anyway, we could as well at this point create another "default" rule for metadata replies. But - I will repeat - as long as a decision on what the list of "base" services enabled for any SG by default is, I can live with metadata out of the list. It may not be as convenient to users (which is my concern), but that's probably a matter of taste in API design. BTW Rodolfo, thanks for allocating a time slot for this discussion at vPTG. I hope we get to the bottom of it then. See you all next Wed @13:00. (As per https://etherpad.opendev.org/p/neutron-bobcat-ptg) Ihar > > Regards. 
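For reference, the explicit per-group workaround a user has to apply today so that these flows keep working with a stateless SG looks roughly like this (illustrative only: my-stateless-sg is a placeholder, and because SG rules cannot match on source port, the metadata rule has to admit all TCP from 169.254.169.254):

$ openstack security group rule create --ingress --ethertype IPv4 \
    --protocol tcp --remote-ip 169.254.169.254/32 my-stateless-sg   # metadata replies
$ openstack security group rule create --ingress --ethertype IPv6 \
    --protocol ipv6-icmp --icmp-type 134 my-stateless-sg            # router advertisements
$ openstack security group rule create --ingress --ethertype IPv6 \
    --protocol ipv6-icmp --icmp-type 136 my-stateless-sg            # neighbor advertisements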
> > [1]https://review.opendev.org/c/openstack/neutron/+/877049 > [2]https://review.opendev.org/c/openstack/neutron/+/876659 > > On Mon, Mar 20, 2023 at 10:19?PM Ihar Hrachyshka wrote: >> >> On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski wrote: >> > >> > Hi, >> > >> > >> > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: >> > >> > > Hi all, >> > >> > > >> > >> > > (I've tagged the thread with [ovn] because this question was raised in >> > >> > > the context of OVN, but it really is about the intent of neutron >> > >> > > stateless SG API.) >> > >> > > >> > >> > > Neutron API supports 'stateless' field for security groups: >> > >> > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group >> > >> > > >> > >> > > The API reference doesn't explain the intent of the API, merely >> > >> > > walking through the field mechanics, as in >> > >> > > >> > >> > > "The stateful security group extension (stateful-security-group) adds >> > >> > > the stateful field to security groups, allowing users to configure >> > >> > > stateful or stateless security groups for ports. The existing security >> > >> > > groups will all be considered as stateful. Update of the stateful >> > >> > > attribute is allowed when there is no port associated with the >> > >> > > security group." >> > >> > > >> > >> > > The meaning of the API is left for users to deduce. It's customary >> > >> > > understood as something like >> > >> > > >> > >> > > "allowing to bypass connection tracking in the firewall, potentially >> > >> > > providing performance and simplicity benefits" (while imposing >> > >> > > additional complexity onto rule definitions - the user now has to >> > >> > > explicitly define rules for both directions of a duplex connection.) >> > >> > > [This is not an official definition, nor it's quoted from a respected >> > >> > > source, please don't criticize it. I don't think this is an important >> > >> > > point here.] >> > >> > > >> > >> > > Either way, the definition doesn't explain what should happen with >> > >> > > basic network services that a user of Neutron SG API is used to rely >> > >> > > on. Specifically, what happens for a port related to a stateless SG >> > >> > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 >> > >> > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 >> > >> > > procedure to configure its IPv6 stack. >> > >> > > >> > >> > > As part of our testing of stateless SG implementation for OVN backend, >> > >> > > we've noticed that VMs fail to configure via metadata, or use SLAAC to >> > >> > > configure IPv6. >> > >> > > >> > >> > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 >> > >> > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 >> > >> > > >> > >> > > We've noticed that adding explicit SG rules to allow 'returning' >> > >> > > communication for 169.254.169.254:80 and RA / NA fixes the problem. >> > >> > > >> > >> > > I figured that these services are "base" / "basic" and should be >> > >> > > provided to ports regardless of the stateful-ness of SG. 
I proposed >> > >> > > patches for this here: >> > >> > > >> > >> > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 >> > >> > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 >> > >> > > >> > >> > > Discussion in the patch that adjusts the existing stateless SG test >> > >> > > scenarios to not create explicit SG rules for metadata and ICMP >> > >> > > replies suggests that it's not a given / common understanding that >> > >> > > these "base" services should work by default for stateless SGs. >> > >> > > >> > >> > > See discussion in comments here: >> > >> > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 >> > >> > > >> > >> > > While this discussion is happening in the context of OVN, I think it >> > >> > > should be resolved in a broader context. Specifically, a decision >> > >> > > should be made about what Neutron API "means" by stateless SGs, and >> > >> > > how "base" services are supposed to behave. Then backends can act >> > >> > > accordingly. >> > >> > > >> > >> > > There's also an open question of how this should be implemented. >> > >> > > Whether Neutron would like to create explicit SG rules visible in API >> > >> > > that would allow for the returning traffic and that could be deleted >> > >> > > as needed, or whether backends should do it implicitly. We already >> > >> > > have "default" egress rules, so there's a precedent here. On the other >> > >> > > hand, the egress rules are broad (allowing everything) and there's >> > >> > > more rationale to delete them and replace them with tighter filters. >> > >> > > In my OVN series, I implement ACLs directly in OVN database, without >> > >> > > creating SG rules in Neutron API. >> > >> > > >> > >> > > So, questions for the community to clarify: >> > >> > > - whether Neutron API should define behavior of stateless SGs in general, >> > >> > > - if so, whether Neutron API should also define behavior of stateless >> > >> > > SGs in terms of "base" services like metadata and DHCP, >> > >> > > - if so, whether backends should implement the necessary filters >> > >> > > themselves, or Neutron will create default SG rules itself. >> > >> > >> > I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user. >> > >> > We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it. >> > >> > If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there? >> > >> >> Thanks for clarifying the rationale on picking SG rules and not >> per-backend implementation. 
>> >> What would be your answer to the two other questions in the list >> above, specifically, "whether Neutron API should define behavior of >> stateless SGs in general" and "whether Neutron API should define >> behavior of stateless SGs in relation to metadata / RA / NA". Once we >> have agreement on these points, we can discuss the exact mechanism - >> whether to implement in backend or in API. But these two questions are >> first order in my view. >> >> (To give an idea of my thinking, I believe API definition should not >> only define fields and their mechanics but also semantics, so >> >> - yes, api-ref should define the meaning ("behavior") of stateless SG >> in general, and >> - yes, api-ref should also define the meaning ("behavior") of >> stateless SG in relation to "standard" services like ipv6 addressing >> or metadata. >> >> As to the last question - whether it's up to ml2 backend to implement >> the behavior, or up to the core SG database plugin - I don't have a >> strong opinion. I lean to "backend" solution just because it allows >> for more granular definition because SG rules may not express some >> filter rules, e.g. source port for metadata replies (an unfortunate >> limitation of SG API that we inherited from AWS?). But perhaps others >> prefer paying the price for having neutron ml2 plugin enforcing the >> behavior consistently across all backends. >> >> > >> > > >> > >> > > I hope I laid the problem out clearly, let me know if anything needs >> > >> > > clarification or explanation. >> > >> > >> > Yes :) At least for me. >> > >> > >> > > >> > >> > > Yours, >> > >> > > Ihar >> > >> > > >> > >> > > >> > >> > > >> > >> > >> > >> > -- >> > >> > Slawek Kaplonski >> > >> > Principal Software Engineer >> > >> > Red Hat >> >> From gmann at ghanshyammann.com Wed Mar 22 17:09:03 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 10:09:03 -0700 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> Message-ID: <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> ---- On Fri, 10 Mar 2023 12:55:49 -0800 Ghanshyam Mann wrote --- > ---- On Wed, 22 Feb 2023 10:13:32 -0800 James Slagle wrote --- > > On Wed, Feb 22, 2023 at 12:43 PM Ghanshyam Mann gmann at ghanshyammann.com> wrote: > > > Hi James, > > > > > > Just checking if you got a chance to discuss this with the TripleO team? > > > > Yes, I asked folks to reply here if there are any volunteers for > > stable/zed maintenance, or any other feedback about the approach. I do > > not personally know of any volunteers. > > Ok. We discussed the stable/zed case in the TC meeting and decided[1] to keep stable/zed as 'supported > but no maintainers' (will update this information in stable/zed README.rst file). > > For the master branch, you can follow the normal deprecation process mentioned in the project-team-guide[2]. > I have proposed step 1 in governance to mark it deprecated, please check and we need PTL +1 > on that. 
> > - https://review.opendev.org/c/openstack/governance/+/877132 > > NOTE: As this is deprecated and not retired yet, we still need PTL nomination for TrilpeO[3] Hi James, TripleO team, Is there anyone volunteering to be PTL for train and wallaby maintenance? Please note we need PTL as it is deprecated (wallaby is maintained), and we have tripleo in leaderless projects - https://etherpad.opendev.org/p/2023.2-leaderless -gmann > > [1] https://meetings.opendev.org/meetings/tc/2023/tc.2023-03-08-15.59.log.html#l-256 > [2] https://docs.openstack.org/project-team-guide/repository.html#deprecating-a-repository > [3] https://etherpad.opendev.org/p/2023.2-leaderless#L26 > > -gmann' > > > > > -- > > -- James Slagle > > -- > > > > From swogatpradhan22 at gmail.com Wed Mar 22 15:37:28 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Wed, 22 Mar 2023 21:07:28 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Adam, The systems are in same LAN, in this case it seemed like the image was getting pulled from the central site which was caused due to an misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ directory, which seems to have been resolved after the changes i made to fix it. Right now the glance api podman is running in unhealthy state and the podman logs don't show any error whatsoever and when issued the command netstat -nultp i do not see any entry for glance port i.e. 9292 in the dcn site, which is why cinder is throwing an error stating: 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error finding address for http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: Unable to establish connection to http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] ECONNREFUSED',)) Now i need to find out why the port is not listed as the glance service is running, which i am not sure how to find out. 
With regards, Swogat Pradhan On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: > > > On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan > wrote: > >> Update: >> Here is the log when creating a volume using cirros image: >> >> 2023-03-22 11:04:38.449 109 INFO >> cinder.volume.flows.manager.create_volume >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >> specification: {'status': 'creating', 'volume_name': >> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >> tzinfo=datetime.timezone.utc), 'locations': [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'metadata': {'store': 'dcn02'}}], 'direct_url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >> 'owner_specified.openstack.object': 'images/cirros', >> 'owner_specified.openstack.sha256': ''}}, 'image_service': >> } >> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >> > > As Adam Savage would say, well there's your problem ^^ (Image download > 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s > suggests you have a network issue. > > John Fulton previously stated your cinder-volume service at the edge site > is not using the local ceph image store. Assuming you are deploying > GlanceApiEdge service [1], then the cinder-volume service should be > configured to use the local glance service [2]. You should check cinder's > glance_api_servers to confirm it's the edge site's glance service. 
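For reference, that setting can usually be checked on a TripleO node without entering the container by grepping the rendered config that gets bind mounted into it; the path below follows the usual config-data layout and may differ between releases:

$ sudo grep -n glance_api_servers \
    /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf
# expected: the edge site's internal glance endpoint, not the central VIP

If that points at the central site, it would explain the slow streaming download seen above.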
> > [1] > https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 > [2] > https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 > > Alan > > >> 2023-03-22 11:07:54.023 109 WARNING py.warnings >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] >> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >> FutureWarning: The human format is deprecated and the format parameter will >> be removed. Use explicitly json instead in version 'xena' >> category=FutureWarning) >> >> 2023-03-22 11:11:12.161 109 WARNING py.warnings >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] >> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >> FutureWarning: The human format is deprecated and the format parameter will >> be removed. Use explicitly json instead in version 'xena' >> category=FutureWarning) >> >> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >> MB/s >> 2023-03-22 11:11:14.998 109 INFO >> cinder.volume.flows.manager.create_volume >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >> >> The image is present in dcn02 store but still it downloaded the image in >> 0.16 MB/s and then created the volume. >> >> With regards, >> Swogat Pradhan >> >> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan >> wrote: >> >>> Hi Jhon, >>> This seems to be an issue. >>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>> parameter was specified to the respective cluster names but the config >>> files were created in the name of ceph.conf and keyring was >>> ceph.client.openstack.keyring. >>> >>> Which created issues in glance as well as the naming convention of the >>> files didn't match the cluster names, so i had to manually rename the >>> central ceph conf file as such: >>> >>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>> [root at dcn02-compute-0 ceph]# ll >>> total 16 >>> -rw-------. 1 root root 257 Mar 13 13:56 >>> ceph_central.client.openstack.keyring >>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>> [root at dcn02-compute-0 ceph]# >>> >>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>> respective clusters in both dcn01 and dcn02. >>> In the above cli output, the ceph.conf and ceph.client... are the files >>> used to access dcn02 ceph cluster and ceph_central* files are used in for >>> accessing central ceph cluster. 
>>> >>> glance multistore config: >>> [dcn02] >>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=dcn02 rbd glance store >>> >>> [ceph_central] >>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=Default glance store backend. >>> >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton wrote: >>> >>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>> wrote: >>>> > >>>> > Hi, >>>> > Seems like cinder is not using the local ceph. >>>> >>>> That explains the issue. It's a misconfiguration. >>>> >>>> I hope this is not a production system since the mailing list now has >>>> the cinder.conf which contains passwords. >>>> >>>> The section that looks like this: >>>> >>>> [tripleo_ceph] >>>> volume_backend_name=tripleo_ceph >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_user=openstack >>>> rbd_pool=volumes >>>> rbd_flatten_volume_from_snapshot=False >>>> rbd_secret_uuid= >>>> report_discard_supported=True >>>> >>>> Should be updated to refer to the local DCN ceph cluster and not the >>>> central one. Use the ceph conf file for that cluster and ensure the >>>> rbd_secret_uuid corresponds to that one. >>>> >>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>> secret-get-value $FSID`. >>>> >>>> The documentation describes how to configure the central and DCN sites >>>> correctly but an error seems to have occurred while you were following >>>> it. >>>> >>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>> >>>> John >>>> >>>> > >>>> > Ceph Output: >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>> > NAME SIZE PARENT FMT >>>> PROT LOCK >>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>> excl >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>>> > >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>> > NAME SIZE PARENT FMT >>>> PROT LOCK >>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>> > >>>> > Attached the cinder config. >>>> > Please let me know how I can solve this issue. 
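Putting John's checks together, a short cross-check on the DCN node could look something like this; the ceph.conf path follows the listing shown above and the cinder config-data location is an assumption, so adjust as needed:

$ FSID=$(sudo awk -F'[ =]+' '/^fsid/ {print $2}' /var/lib/tripleo-config/ceph/ceph.conf)
$ echo $FSID
# libvirt should hold a cephx secret keyed by that FSID
$ sudo podman exec nova_virtsecretd virsh secret-get-value $FSID
# cinder's tripleo_ceph backend should reference the same FSID and conf file
$ sudo grep -E 'rbd_secret_uuid|rbd_ceph_conf' \
    /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf

If the rbd_secret_uuid printed by the last command is the central cluster's FSID rather than the local one, cinder-volume is still pointed at the central Ceph.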
>>>> > >>>> > With regards, >>>> > Swogat Pradhan >>>> > >>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>> wrote: >>>> >> >>>> >> in my last message under the line "On a DCN site if you run a >>>> command like this:" I suggested some steps you could try to confirm the >>>> image is a COW from the local glance as well as how to look at your cinder >>>> config. >>>> >> >>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>> >>>> >>> Update: >>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>> around 10,15 minutes to create a volume with image in dcn02. >>>> >>> The image size is 389 MB. >>>> >>> >>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>> >>>> >>>> Hi Jhon, >>>> >>>> I checked in the ceph od dcn02, I can see the images created after >>>> importing from the central site. >>>> >>>> But launching an instance normally fails as it takes a long time >>>> for the volume to get created. >>>> >>>> >>>> >>>> When launching an instance from volume the instance is getting >>>> created properly without any errors. >>>> >>>> >>>> >>>> I tried to cache images in nova using >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>> but getting checksum failed error. >>>> >>>> >>>> >>>> With regards, >>>> >>>> Swogat Pradhan >>>> >>>> >>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>>> wrote: >>>> >>>>> >>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>> >>>>> wrote: >>>> >>>>> > >>>> >>>>> > Update: After restarting the nova services on the controller >>>> and running the deploy script on the edge site, I was able to launch the VM >>>> from volume. >>>> >>>>> > >>>> >>>>> > Right now the instance creation is failing as the block device >>>> creation is stuck in creating state, it is taking more than 10 mins for the >>>> volume to be created, whereas the image has already been imported to the >>>> edge glance. >>>> >>>>> >>>> >>>>> Try following this document and making the same observations in >>>> your >>>> >>>>> environment for AZs and their local ceph cluster. >>>> >>>>> >>>> >>>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>> >>>>> >>>> >>>>> On a DCN site if you run a command like this: >>>> >>>>> >>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>> >>>>> NAME SIZE PARENT >>>> >>>>> FMT PROT LOCK >>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>> >>>>> $ >>>> >>>>> >>>> >>>>> Then, you should see the parent of the volume is the image which >>>> is on >>>> >>>>> the same local ceph cluster. >>>> >>>>> >>>> >>>>> I wonder if something is misconfigured and thus you're >>>> encountering >>>> >>>>> the streaming behavior described here: >>>> >>>>> >>>> >>>>> Ideally all images should reside in the central Glance and be >>>> copied >>>> >>>>> to DCN sites before instances of those images are booted on DCN >>>> sites. >>>> >>>>> If an image is not copied to a DCN site before it is booted, then >>>> the >>>> >>>>> image will be streamed to the DCN site and then the image will >>>> boot as >>>> >>>>> an instance. 
This happens because Glance at the DCN site has >>>> access to >>>> >>>>> the images store at the Central ceph cluster. Though the booting >>>> of >>>> >>>>> the image will take time because it has not been copied in >>>> advance, >>>> >>>>> this is still preferable to failing to boot the image. >>>> >>>>> >>>> >>>>> You can also exec into the cinder container at the DCN site and >>>> >>>>> confirm it's using it's local ceph cluster. >>>> >>>>> >>>> >>>>> John >>>> >>>>> >>>> >>>>> > >>>> >>>>> > I will try and create a new fresh image and test again then >>>> update. >>>> >>>>> > >>>> >>>>> > With regards, >>>> >>>>> > Swogat Pradhan >>>> >>>>> > >>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> >> >>>> >>>>> >> Update: >>>> >>>>> >> In the hypervisor list the compute node state is showing down. >>>> >>>>> >> >>>> >>>>> >> >>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> >>> >>>> >>>>> >>> Hi Brendan, >>>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>> timed out so i waited for the volume to be created. >>>> >>>>> >>> Once the volume was created i tried launching the instance >>>> from the volume and still the instance is stuck in spawning state. >>>> >>>>> >>> >>>> >>>>> >>> Here is the nova-compute log: >>>> >>>>> >>> >>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>>> privsep daemon starting >>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>>> privsep process running with uid/gid: 0/0 >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>> privsep process running with capabilities (eff/prm/inh): >>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>> privsep daemon running as pid 185437 >>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>> os_brick.initiator.connectors.nvmeof >>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>> in _get_host_uuid: Unexpected error while running command. >>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>> >>>>> >>> Exit code: 2 >>>> >>>>> >>> Stdout: '' >>>> >>>>> >>> Stderr: '': >>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>> running command. >>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>> >>>>> >>> >>>> >>>>> >>> It is stuck in creating image, do i need to run the template >>>> mentioned here ?: >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>> >>>>> >>> >>>> >>>>> >>> The volume is already created and i do not understand why the >>>> instance is stuck in spawning state. 
>>>> >>>>> >>> >>>> >>>>> >>> With regards, >>>> >>>>> >>> Swogat Pradhan >>>> >>>>> >>> >>>> >>>>> >>> >>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>> bshephar at redhat.com> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Does your environment use different network interfaces for >>>> each of the networks? Or does it have a bond with everything on it? >>>> >>>>> >>>> >>>> >>>>> >>>> One issue I have seen before is that when launching >>>> instances, there is a lot of network traffic between nodes as the >>>> hypervisor needs to download the image from Glance. Along with various >>>> other services sending normal network traffic, it can be enough to cause >>>> issues if everything is running over a single 1Gbe interface. >>>> >>>>> >>>> >>>> >>>>> >>>> I have seen the same situation in fact when using a single >>>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>>> while you try to spawn the instance to see if you?re dropping packets. In >>>> the situation I described, there were dropped packets which resulted in a >>>> loss of communication between nova_compute and RMQ, so the node appeared >>>> offline. You should also confirm that nova_compute is being disconnected in >>>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>>> instance. >>>> >>>>> >>>> >>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >>>> based on that experience, from my perspective, is certainly sounds like >>>> some kind of network issue. >>>> >>>>> >>>> >>>> >>>>> >>>> Regards, >>>> >>>>> >>>> >>>> >>>>> >>>> Brendan Shephard >>>> >>>>> >>>> Senior Software Engineer >>>> >>>>> >>>> Red Hat Australia >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> >>>> >>>>> >>>> I tried to help someone with a similar issue some time ago >>>> in this thread: >>>> >>>>> >>>> >>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>> >>>>> >>>> >>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>>> user, not sure if that could apply here. But is it possible that your nova >>>> and neutron versions are different between central and edge site? Have you >>>> restarted nova and neutron services on the compute nodes after >>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>> Maybe they can help narrow down the issue. >>>> >>>>> >>>> If there isn't any additional information in the debug logs >>>> I probably would start "tearing down" rabbitmq. I didn't have to do that in >>>> a production system yet so be careful. I can think of two routes: >>>> >>>>> >>>> >>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>> running, this will most likely impact client IO depending on your load. >>>> Check out the rabbitmqctl commands. >>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>> >>>>> >>>> >>>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>>> replicated across the rabbit nodes. But I don't really know the rabbit >>>> internals too well, so maybe someone else can chime in here and give a >>>> better advice. 
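For reference, the packet-drop check Brendan describes can be as simple as the following on the hypervisor while an instance is spawning; the bond name and remote IP are placeholders, and the log path follows the usual TripleO layout:

# watch interface error/drop counters before and after a spawn attempt
$ ip -s link show bond1
# send don't-fragment pings sized for a 1500 MTU across the tunnel
$ ping -M do -s 1472 -c 20 <central-internal-api-ip>
# follow nova-compute during the spawn to catch any RMQ disconnect
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'error|amqp|rabbit'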
>>>> >>>>> >>>> >>>> >>>>> >>>> Regards, >>>> >>>>> >>>> Eugen >>>> >>>>> >>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> Can someone please help me out on this issue? >>>> >>>>> >>>> >>>> >>>>> >>>> With regards, >>>> >>>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi >>>> >>>>> >>>> I don't see any major packet loss. >>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >>>> due to packet >>>> >>>>> >>>> loss. >>>> >>>>> >>>> >>>> >>>>> >>>> with regards, >>>> >>>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> Hi, >>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >>>> when >>>> >>>>> >>>> launching the instance. >>>> >>>>> >>>> I will check that and come back. >>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck >>>> at spawning >>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >>>> packet loss >>>> >>>>> >>>> causes this. >>>> >>>>> >>>> >>>> >>>>> >>>> With regards, >>>> >>>>> >>>> Swogat pradhan >>>> >>>>> >>>> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>>> wrote: >>>> >>>>> >>>> >>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>> identical between >>>> >>>>> >>>> central and edge site? Do you see packet loss through the >>>> tunnel? >>>> >>>>> >>>> >>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>> >>>>> >>>> >>>> >>>>> >>>> > Hi Eugen, >>>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' >>>> as i am not >>>> >>>>> >>>> > getting email's from you. >>>> >>>>> >>>> > Coming to the issue: >>>> >>>>> >>>> > >>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>> list_policies -p >>>> >>>>> >>>> / >>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>> >>>>> >>>> > vhost name pattern apply-to definition >>>> priority >>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>> >>>>> >>>> > >>>> >>>>> >>>> >>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>> >>>>> >>>> > >>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>>> when i am >>>> >>>>> >>>> trying >>>> >>>>> >>>> > to launch an instance and the instance comes to a spawning >>>> state and >>>> >>>>> >>>> then >>>> >>>>> >>>> > gets stuck. >>>> >>>>> >>>> > >>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>> sites. >>>> >>>>> >>>> > >>>> >>>>> >>>> > With regards, >>>> >>>>> >>>> > Swogat Pradhan >>>> >>>>> >>>> > >>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> > wrote: >>>> >>>>> >>>> > >>>> >>>>> >>>> >> Hi Eugen, >>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>> directly, i am >>>> >>>>> >>>> checking >>>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>> >>>>> >>>> >> Yes, these logs are from the time when the issue occurred. 
>>>> >>>>> >>>> >> >>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>> activities in the >>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> With regards, >>>> >>>>> >>>> >> Swogat Pradhan >>>> >>>>> >>>> >> >>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> >> wrote: >>>> >>>>> >>>> >> >>>> >>>>> >>>> >>> Hi Eugen, >>>> >>>>> >>>> >>> Thanks for your response. >>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>> details: >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *PCS Status:* >>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>> >>>>> >>>> >>> >>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>> (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>> (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-2 >>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>> (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-1 >>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>> (ocf::heartbeat:rabbitmq-cluster): >>>> >>>>> >>>> Started >>>> >>>>> >>>> >>> overcloud-controller-0 >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but >>>> the issue is >>>> >>>>> >>>> still >>>> >>>>> >>>> >>> present. >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *Cluster status:* >>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>> cluster_status >>>> >>>>> >>>> >>> Cluster status of node >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> ... 
>>>> >>>>> >>>> >>> Basics >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Cluster name: >>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Disk Nodes >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Running Nodes >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Versions >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>> RabbitMQ >>>> >>>>> >>>> 3.8.3 >>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>> >>>>> >>>> RabbitMQ >>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Alarms >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> (none) >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Network Partitions >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> (none) >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Listeners >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>> AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>> AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>> inter-node and CLI >>>> >>>>> >>>> tool >>>> >>>>> >>>> >>> communication 
>>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>> AMQP 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>> >>>>> >>>> interface: >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>> purpose: >>>> >>>>> >>>> inter-node and >>>> >>>>> >>>> >>> CLI tool communication >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>>> purpose: AMQP >>>> >>>>> >>>> 0-9-1 >>>> >>>>> >>>> >>> and AMQP 1.0 >>>> >>>>> >>>> >>> Node: >>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>> >>>>> >>>> , >>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>>> HTTP API >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Feature flags >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> *Logs:* >>>> >>>>> >>>> >>> *(Attached)* >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> With regards, >>>> >>>>> >>>> >>> Swogat Pradhan >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>> >>>>> >>>> >>> wrote: >>>> >>>>> >>>> >>> >>>> >>>>> >>>> >>>> Hi, >>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> nova-conuctor: >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>> The reply >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after >>>> 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). 
>>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> The reply >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after >>>> 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> The reply >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after >>>> 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Cache enabled >>>> >>>>> >>>> with >>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>> drop reply to >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>> oslo_messaging._drivers.amqpdriver >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>> The reply >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after >>>> 60 seconds >>>> >>>>> >>>> due to a >>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>> >>>>> >>>> Abandoning...: >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> With regards, >>>> >>>>> >>>> >>>> Swogat Pradhan >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>>> Hi, >>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where i >>>> am trying to >>>> >>>>> >>>> >>>>> launch vm's. 
>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>> (openstack >>>> >>>>> >>>> compute >>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart >>>> the nova >>>> >>>>> >>>> compute >>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> nova-compute.log >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>>> Running >>>> >>>>> >>>> >>>>> instance usage >>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>> 2023-02-26 07:00:00 >>>> >>>>> >>>> to >>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful >>>> on node >>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>> supplied device >>>> >>>>> >>>> name: >>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev names >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>> volume >>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Cache enabled >>>> >>>>> >>>> with >>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Running >>>> >>>>> >>>> >>>>> privsep helper: >>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>> >>>>> >>>> 'privsep-helper', >>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>> '--config-file', >>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Spawned new >>>> >>>>> >>>> privsep >>>> >>>>> >>>> >>>>> daemon via rootwrap >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >>>> [-] privsep >>>> >>>>> >>>> >>>>> daemon starting >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >>>> [-] privsep >>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>>> [-] privsep >>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>>> [-] privsep >>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> Process >>>> >>>>> >>>> >>>>> execution error >>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>> command. >>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>> >>>>> >>>> >>>>> Exit code: 2 >>>> >>>>> >>>> >>>>> Stdout: '' >>>> >>>>> >>>> >>>>> Stderr: '': >>>> oslo_concurrency.processutils.ProcessExecutionError: >>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>> [instance: >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> With regards, >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> Swogat Pradhan >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> >>>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abishop at redhat.com Wed Mar 22 15:49:37 2023 From: abishop at redhat.com (Alan Bishop) Date: Wed, 22 Mar 2023 08:49:37 -0700 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan wrote: > Hi Adam, > The systems are in same LAN, in this case it seemed like the image was > getting pulled from the central site which was caused due to an > misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ > directory, which seems to have been resolved after the changes i made to > fix it. > > Right now the glance api podman is running in unhealthy state and the > podman logs don't show any error whatsoever and when issued the command > netstat -nultp i do not see any entry for glance port i.e. 9292 in the dcn > site, which is why cinder is throwing an error stating: > > 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server > cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error > finding address for > http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: > Unable to establish connection to > http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: > HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded > with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by > NewConnectionError(' 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] > ECONNREFUSED',)) > > Now i need to find out why the port is not listed as the glance service is > running, which i am not sure how to find out. > One other thing to investigate is whether your deployment includes this patch [1]. If it does, then bear in mind the glance-api service running at the edge site will be an "internal" (non public facing) instance that uses port 9293 instead of 9292. You should familiarize yourself with the release note [2]. 
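A quick way to see which port the edge glance is actually binding, and whether the unhealthy container is the culprit, is to check both ports at once on the DCN node (the IP is a placeholder):

$ sudo podman ps --format '{{.Names}} {{.Status}}' | grep -i glance
$ sudo ss -lntp | grep -E ':(9292|9293)'
# if 9293 is listening, try it directly, e.g.:
$ curl -s http://<edge-internal-api-ip>:9293/ | head

If only 9293 shows up, that matches the internal-only edge glance described in the release note referenced below.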
[1] https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 [2] https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml Alan > With regards, > Swogat Pradhan > > On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: > >> >> >> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan >> wrote: >> >>> Update: >>> Here is the log when creating a volume using cirros image: >>> >>> 2023-03-22 11:04:38.449 109 INFO >>> cinder.volume.flows.manager.create_volume >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>> specification: {'status': 'creating', 'volume_name': >>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>> 'owner_specified.openstack.object': 'images/cirros', >>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>> } >>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>> >> >> As Adam Savage would say, well there's your problem ^^ (Image download >> 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s >> suggests you have a network issue. >> >> John Fulton previously stated your cinder-volume service at the edge site >> is not using the local ceph image store. 
Assuming you are deploying >> GlanceApiEdge service [1], then the cinder-volume service should be >> configured to use the local glance service [2]. You should check cinder's >> glance_api_servers to confirm it's the edge site's glance service. >> >> [1] >> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >> [2] >> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >> >> Alan >> >> >>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] >>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>> FutureWarning: The human format is deprecated and the format parameter will >>> be removed. Use explicitly json instead in version 'xena' >>> category=FutureWarning) >>> >>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] >>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>> FutureWarning: The human format is deprecated and the format parameter will >>> be removed. Use explicitly json instead in version 'xena' >>> category=FutureWarning) >>> >>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>> MB/s >>> 2023-03-22 11:11:14.998 109 INFO >>> cinder.volume.flows.manager.create_volume >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>> >>> The image is present in dcn02 store but still it downloaded the image in >>> 0.16 MB/s and then created the volume. >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Hi Jhon, >>>> This seems to be an issue. >>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>> parameter was specified to the respective cluster names but the config >>>> files were created in the name of ceph.conf and keyring was >>>> ceph.client.openstack.keyring. >>>> >>>> Which created issues in glance as well as the naming convention of the >>>> files didn't match the cluster names, so i had to manually rename the >>>> central ceph conf file as such: >>>> >>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>> [root at dcn02-compute-0 ceph]# ll >>>> total 16 >>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>> ceph_central.client.openstack.keyring >>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>> [root at dcn02-compute-0 ceph]# >>>> >>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>> respective clusters in both dcn01 and dcn02. >>>> In the above cli output, the ceph.conf and ceph.client... 
are the files >>>> used to access dcn02 ceph cluster and ceph_central* files are used in for >>>> accessing central ceph cluster. >>>> >>>> glance multistore config: >>>> [dcn02] >>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=dcn02 rbd glance store >>>> >>>> [ceph_central] >>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=Default glance store backend. >>>> >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>> wrote: >>>> >>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>> wrote: >>>>> > >>>>> > Hi, >>>>> > Seems like cinder is not using the local ceph. >>>>> >>>>> That explains the issue. It's a misconfiguration. >>>>> >>>>> I hope this is not a production system since the mailing list now has >>>>> the cinder.conf which contains passwords. >>>>> >>>>> The section that looks like this: >>>>> >>>>> [tripleo_ceph] >>>>> volume_backend_name=tripleo_ceph >>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>> rbd_user=openstack >>>>> rbd_pool=volumes >>>>> rbd_flatten_volume_from_snapshot=False >>>>> rbd_secret_uuid= >>>>> report_discard_supported=True >>>>> >>>>> Should be updated to refer to the local DCN ceph cluster and not the >>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>> rbd_secret_uuid corresponds to that one. >>>>> >>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>> secret-get-value $FSID`. >>>>> >>>>> The documentation describes how to configure the central and DCN sites >>>>> correctly but an error seems to have occurred while you were following >>>>> it. 
>>>>> >>>>> >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>> >>>>> John >>>>> >>>>> > >>>>> > Ceph Output: >>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>> > NAME SIZE PARENT FMT >>>>> PROT LOCK >>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>> excl >>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 yes >>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 yes >>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 yes >>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 yes >>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 yes >>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 yes >>>>> > >>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>> > NAME SIZE PARENT FMT >>>>> PROT LOCK >>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>> > >>>>> > Attached the cinder config. >>>>> > Please let me know how I can solve this issue. >>>>> > >>>>> > With regards, >>>>> > Swogat Pradhan >>>>> > >>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>> wrote: >>>>> >> >>>>> >> in my last message under the line "On a DCN site if you run a >>>>> command like this:" I suggested some steps you could try to confirm the >>>>> image is a COW from the local glance as well as how to look at your cinder >>>>> config. >>>>> >> >>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>> >>>>> >>> Update: >>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>> >>> The image size is 389 MB. >>>>> >>> >>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>> >>>>> >>>> Hi Jhon, >>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>> after importing from the central site. >>>>> >>>> But launching an instance normally fails as it takes a long time >>>>> for the volume to get created. >>>>> >>>> >>>>> >>>> When launching an instance from volume the instance is getting >>>>> created properly without any errors. >>>>> >>>> >>>>> >>>> I tried to cache images in nova using >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>> but getting checksum failed error. >>>>> >>>> >>>>> >>>> With regards, >>>>> >>>> Swogat Pradhan >>>>> >>>> >>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>>>> wrote: >>>>> >>>>> >>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>> >>>>> wrote: >>>>> >>>>> > >>>>> >>>>> > Update: After restarting the nova services on the controller >>>>> and running the deploy script on the edge site, I was able to launch the VM >>>>> from volume. 
>>>>> >>>>> > >>>>> >>>>> > Right now the instance creation is failing as the block device >>>>> creation is stuck in creating state, it is taking more than 10 mins for the >>>>> volume to be created, whereas the image has already been imported to the >>>>> edge glance. >>>>> >>>>> >>>>> >>>>> Try following this document and making the same observations in >>>>> your >>>>> >>>>> environment for AZs and their local ceph cluster. >>>>> >>>>> >>>>> >>>>> >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>> >>>>> >>>>> >>>>> On a DCN site if you run a command like this: >>>>> >>>>> >>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>> >>>>> NAME SIZE PARENT >>>>> >>>>> FMT PROT LOCK >>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>>> >>>>> $ >>>>> >>>>> >>>>> >>>>> Then, you should see the parent of the volume is the image which >>>>> is on >>>>> >>>>> the same local ceph cluster. >>>>> >>>>> >>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>> encountering >>>>> >>>>> the streaming behavior described here: >>>>> >>>>> >>>>> >>>>> Ideally all images should reside in the central Glance and be >>>>> copied >>>>> >>>>> to DCN sites before instances of those images are booted on DCN >>>>> sites. >>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>> then the >>>>> >>>>> image will be streamed to the DCN site and then the image will >>>>> boot as >>>>> >>>>> an instance. This happens because Glance at the DCN site has >>>>> access to >>>>> >>>>> the images store at the Central ceph cluster. Though the booting >>>>> of >>>>> >>>>> the image will take time because it has not been copied in >>>>> advance, >>>>> >>>>> this is still preferable to failing to boot the image. >>>>> >>>>> >>>>> >>>>> You can also exec into the cinder container at the DCN site and >>>>> >>>>> confirm it's using it's local ceph cluster. >>>>> >>>>> >>>>> >>>>> John >>>>> >>>>> >>>>> >>>>> > >>>>> >>>>> > I will try and create a new fresh image and test again then >>>>> update. >>>>> >>>>> > >>>>> >>>>> > With regards, >>>>> >>>>> > Swogat Pradhan >>>>> >>>>> > >>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>> >> >>>>> >>>>> >> Update: >>>>> >>>>> >> In the hypervisor list the compute node state is showing down. >>>>> >>>>> >> >>>>> >>>>> >> >>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>> >>> >>>>> >>>>> >>> Hi Brendan, >>>>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>>>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>>> timed out so i waited for the volume to be created. >>>>> >>>>> >>> Once the volume was created i tried launching the instance >>>>> from the volume and still the instance is stuck in spawning state. 
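Picking up John's earlier suggestion to exec into the cinder container at the DCN site
and confirm it uses the local ceph cluster, a rough check from a dcn02 node might look
like this (the container name, config paths and the FSID-as-secret convention are
assumptions based on a typical TripleO deployment):

$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_secret_uuid|rbd_user' /etc/cinder/cinder.conf
# the FSID of the local DCN cluster, for comparison with rbd_secret_uuid
$ sudo grep fsid /etc/ceph/ceph.conf
# libvirt on the same node should hold a cephx secret keyed by that FSID
$ sudo podman exec nova_virtsecretd virsh secret-get-value $FSID
# rbd_secret_uuid should match the local DCN FSID, not the central cluster's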
>>>>> >>>>> >>> >>>>> >>>>> >>> Here is the nova-compute log: >>>>> >>>>> >>> >>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>>>> privsep daemon starting >>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>>>> privsep process running with uid/gid: 0/0 >>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>>> privsep process running with capabilities (eff/prm/inh): >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>>> privsep daemon running as pid 185437 >>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>> os_brick.initiator.connectors.nvmeof >>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>> in _get_host_uuid: Unexpected error while running command. >>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>> >>>>> >>> Exit code: 2 >>>>> >>>>> >>> Stdout: '' >>>>> >>>>> >>> Stderr: '': >>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>> running command. >>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>> >>>>> >>> >>>>> >>>>> >>> It is stuck in creating image, do i need to run the template >>>>> mentioned here ?: >>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>> >>>>> >>> >>>>> >>>>> >>> The volume is already created and i do not understand why >>>>> the instance is stuck in spawning state. >>>>> >>>>> >>> >>>>> >>>>> >>> With regards, >>>>> >>>>> >>> Swogat Pradhan >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>> bshephar at redhat.com> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> Does your environment use different network interfaces for >>>>> each of the networks? Or does it have a bond with everything on it? >>>>> >>>>> >>>> >>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>> instances, there is a lot of network traffic between nodes as the >>>>> hypervisor needs to download the image from Glance. Along with various >>>>> other services sending normal network traffic, it can be enough to cause >>>>> issues if everything is running over a single 1Gbe interface. >>>>> >>>>> >>>> >>>>> >>>>> >>>> I have seen the same situation in fact when using a single >>>>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>>>> while you try to spawn the instance to see if you?re dropping packets. In >>>>> the situation I described, there were dropped packets which resulted in a >>>>> loss of communication between nova_compute and RMQ, so the node appeared >>>>> offline. You should also confirm that nova_compute is being disconnected in >>>>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>>>> instance. >>>>> >>>>> >>>> >>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, >>>>> based on that experience, from my perspective, is certainly sounds like >>>>> some kind of network issue. 
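A rough way to check both of those things while re-trying the spawn could be something
like this (the log path and the address are assumptions for a typical TripleO edge
node; adjust to the actual environment):

# tail the compute log and watch for AMQP/RabbitMQ disconnects during the spawn
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'amqp|rabbit|heartbeat|error'
# in a second terminal, check for packet loss and for a too-small path MTU
# towards the central internal API VIP (1472 bytes payload + 28 bytes headers = 1500)
$ ping -c 100 -i 0.2 <central internal-api VIP>
$ ping -M do -s 1472 -c 5 <central internal-api VIP>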
>>>>> >>>>> >>>> >>>>> >>>>> >>>> Regards, >>>>> >>>>> >>>> >>>>> >>>>> >>>> Brendan Shephard >>>>> >>>>> >>>> Senior Software Engineer >>>>> >>>>> >>>> Red Hat Australia >>>>> >>>>> >>>> >>>>> >>>>> >>>> >>>>> >>>>> >>>> >>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> Hi, >>>>> >>>>> >>>> >>>>> >>>>> >>>> I tried to help someone with a similar issue some time ago >>>>> in this thread: >>>>> >>>>> >>>> >>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>> >>>>> >>>> >>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>>>> user, not sure if that could apply here. But is it possible that your nova >>>>> and neutron versions are different between central and edge site? Have you >>>>> restarted nova and neutron services on the compute nodes after >>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>> Maybe they can help narrow down the issue. >>>>> >>>>> >>>> If there isn't any additional information in the debug logs >>>>> I probably would start "tearing down" rabbitmq. I didn't have to do that in >>>>> a production system yet so be careful. I can think of two routes: >>>>> >>>>> >>>> >>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>> running, this will most likely impact client IO depending on your load. >>>>> Check out the rabbitmqctl commands. >>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>> >>>>> >>>> >>>>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>>>> replicated across the rabbit nodes. But I don't really know the rabbit >>>>> internals too well, so maybe someone else can chime in here and give a >>>>> better advice. >>>>> >>>>> >>>> >>>>> >>>>> >>>> Regards, >>>>> >>>>> >>>> Eugen >>>>> >>>>> >>>> >>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>> >>>>> >>>> >>>>> >>>>> >>>> Hi, >>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>> >>>>> >>>> >>>>> >>>>> >>>> With regards, >>>>> >>>>> >>>> Swogat Pradhan >>>>> >>>>> >>>> >>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> Hi >>>>> >>>>> >>>> I don't see any major packet loss. >>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but not >>>>> due to packet >>>>> >>>>> >>>> loss. >>>>> >>>>> >>>> >>>>> >>>>> >>>> with regards, >>>>> >>>>> >>>> Swogat Pradhan >>>>> >>>>> >>>> >>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> Hi, >>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never checked >>>>> when >>>>> >>>>> >>>> launching the instance. >>>>> >>>>> >>>> I will check that and come back. >>>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck >>>>> at spawning >>>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure if >>>>> packet loss >>>>> >>>>> >>>> causes this. >>>>> >>>>> >>>> >>>>> >>>>> >>>> With regards, >>>>> >>>>> >>>> Swogat pradhan >>>>> >>>>> >>>> >>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>>>> wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> One more thing coming to mind is MTU size. 
Are they >>>>> identical between >>>>> >>>>> >>>> central and edge site? Do you see packet loss through the >>>>> tunnel? >>>>> >>>>> >>>> >>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>> >>>>> >>>> >>>>> >>>>> >>>> > Hi Eugen, >>>>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc' >>>>> as i am not >>>>> >>>>> >>>> > getting email's from you. >>>>> >>>>> >>>> > Coming to the issue: >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>> list_policies -p >>>>> >>>>> >>>> / >>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>> priority >>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>> >>>>> >>>> > >>>>> >>>>> >>>> >>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>>>> when i am >>>>> >>>>> >>>> trying >>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>> spawning state and >>>>> >>>>> >>>> then >>>>> >>>>> >>>> > gets stuck. >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>>> sites. >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > With regards, >>>>> >>>>> >>>> > Swogat Pradhan >>>>> >>>>> >>>> > >>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> > wrote: >>>>> >>>>> >>>> > >>>>> >>>>> >>>> >> Hi Eugen, >>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>> directly, i am >>>>> >>>>> >>>> checking >>>>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>>>> >>>>> >>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq >>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>> occurred. >>>>> >>>>> >>>> >> >>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>> activities in the >>>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>>> >>>>> >>>> >> >>>>> >>>>> >>>> >> With regards, >>>>> >>>>> >>>> >> Swogat Pradhan >>>>> >>>>> >>>> >> >>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> >> wrote: >>>>> >>>>> >>>> >> >>>>> >>>>> >>>> >>> Hi Eugen, >>>>> >>>>> >>>> >>> Thanks for your response. >>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>>> details: >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> *PCS Status:* >>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>> >>>>> >>>> >>> >>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>>> >>>> Started >>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>>> >>>> Started >>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>>> >>>> Started >>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>> >>>>> >>>> Started >>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but >>>>> the issue is >>>>> >>>>> >>>> still >>>>> >>>>> >>>> >>> present. 
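Besides restarting the bundle, it may also be worth checking whether the stale reply
queues named in the nova-conductor errors (quoted further below) still exist; a sketch,
assuming the pacemaker bundle container is named rabbitmq-bundle-podman-0:

$ sudo podman exec rabbitmq-bundle-podman-0 rabbitmqctl list_queues -p / name messages consumers | grep '^reply_'
# a reply_... queue that appears in the "missing queue" errors but shows up here
# with zero consumers (or not at all) points at a stale RPC reply queue rather
# than a general connectivity problem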
>>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> *Cluster status:* >>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>> cluster_status >>>>> >>>>> >>>> >>> Cluster status of node >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> ... >>>>> >>>>> >>>> >>> Basics >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Cluster name: >>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Disk Nodes >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Running Nodes >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Versions >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>>> RabbitMQ >>>>> >>>>> >>>> 3.8.3 >>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>>> RabbitMQ >>>>> >>>>> >>>> 3.8.3 >>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>>> RabbitMQ >>>>> >>>>> >>>> 3.8.3 >>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>> >>>>> >>>> >>> >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>> >>>>> >>>> RabbitMQ >>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Alarms >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> (none) >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Network Partitions >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> (none) >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Listeners >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>> inter-node and CLI >>>>> >>>>> >>>> tool >>>>> >>>>> >>>> >>> communication >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>>> AMQP 0-9-1 >>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>> inter-node and CLI >>>>> >>>>> >>>> tool >>>>> >>>>> >>>> >>> communication >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>>> AMQP 0-9-1 >>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, 
>>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>> inter-node and CLI >>>>> >>>>> >>>> tool >>>>> >>>>> >>>> >>> communication >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>>> AMQP 0-9-1 >>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>> >>>>> >>>> interface: >>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> , >>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>> purpose: >>>>> >>>>> >>>> inter-node and >>>>> >>>>> >>>> >>> CLI tool communication >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> , >>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>>>> purpose: AMQP >>>>> >>>>> >>>> 0-9-1 >>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>> >>>>> >>>> >>> Node: >>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>> >>>>> >>>> , >>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>>>> HTTP API >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Feature flags >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> *Logs:* >>>>> >>>>> >>>> >>> *(Attached)* >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> With regards, >>>>> >>>>> >>>> >>> Swogat Pradhan >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>> >>>>> >>>> >>> wrote: >>>>> >>>>> >>>> >>> >>>>> >>>>> >>>> >>>> Hi, >>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api log. 
>>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>> nova-conuctor: >>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>> The reply >>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after >>>>> 60 seconds >>>>> >>>>> >>>> due to a >>>>> >>>>> >>>> >>>> missing queue (reply_276049ec36a84486a8a406911d9802f4). >>>>> >>>>> >>>> Abandoning...: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> The reply >>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after >>>>> 60 seconds >>>>> >>>>> >>>> due to a >>>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> >>>>> >>>> Abandoning...: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> The reply >>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after >>>>> 60 seconds >>>>> >>>>> >>>> due to a >>>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). 
>>>>> >>>>> >>>> Abandoning...: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Cache enabled >>>>> >>>>> >>>> with >>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>> drop reply to >>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>> oslo_messaging._drivers.amqpdriver >>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>> The reply >>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after >>>>> 60 seconds >>>>> >>>>> >>>> due to a >>>>> >>>>> >>>> >>>> missing queue (reply_349bcb075f8c49329435a0f884b33066). >>>>> >>>>> >>>> Abandoning...: >>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>> With regards, >>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>> >>>> >>>> >>>>> >>>>> >>>> >>>>> Hi, >>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where >>>>> i am trying to >>>>> >>>>> >>>> >>>>> launch vm's. >>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>>> (openstack >>>>> >>>>> >>>> compute >>>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart >>>>> the nova >>>>> >>>>> >>>> compute >>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>> >>>>> >>>> >>>>> >>>>> >>>>> >>>> >>>>> nova-compute.log >>>>> >>>>> >>>> >>>>> >>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>>>> Running >>>>> >>>>> >>>> >>>>> instance usage >>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>> 2023-02-26 07:00:00 >>>>> >>>>> >>>> to >>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> [instance: >>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>> successful on node >>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>> nova.virt.libvirt.driver >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> [instance: >>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>> supplied device >>>>> >>>>> >>>> name: >>>>> >>>>> >>>> >>>>> /dev/vda. 
Libvirt can't honour user-supplied dev names >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> [instance: >>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>>> volume >>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Cache enabled >>>>> >>>>> >>>> with >>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Running >>>>> >>>>> >>>> >>>>> privsep helper: >>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>> >>>>> >>>> 'privsep-helper', >>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>> '--config-file', >>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Spawned new >>>>> >>>>> >>>> privsep >>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon >>>>> [-] privsep >>>>> >>>>> >>>> >>>>> daemon starting >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon >>>>> [-] privsep >>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>>>> [-] privsep >>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon >>>>> [-] privsep >>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>> Process >>>>> >>>>> >>>> >>>>> execution error >>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>> command. >>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>> >>>>> >>>> >>>>> Stdout: '' >>>>> >>>>> >>>> >>>>> Stderr: '': >>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>> >>>>> >>>> >>>>> Unexpected error while running command. 
>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver
>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance:
>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image
>>>>> >>>>> >>>> >>>>>
>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue?
>>>>> >>>>> >>>> >>>>>
>>>>> >>>>> >>>> >>>>> With regards,
>>>>> >>>>> >>>> >>>>> Swogat Pradhan
>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gmann at ghanshyammann.com Wed Mar 22 17:20:28 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Wed, 22 Mar 2023 10:20:28 -0700
Subject: OpenStack 2023.1 Antelope is officially released!
In-Reply-To:
References:
Message-ID: <1870a561cc9.c41d7834993030.554598293953461081@ghanshyammann.com>

Congratulations and thanks, everyone. A special thanks to the release team for
another *on-time* release and great work.

-gmann

 ---- On Wed, 22 Mar 2023 08:58:16 -0700 Előd Illés wrote ---
 > Let me join and thank all who were part of the 2023.1 Antelope
 > development cycle!
 > Also note that this marks the official opening of the openstack/releases
 > repository for 2023.2 Bobcat releases, and freezes are now lifted.
 > stable/2023.1 is now a fully normal stable branch, and the normal stable
 > policy applies from now on.
 > Thanks,
 > Előd Illés
 >
 > From: Herve Beraud hberaud at redhat.com>
 > Sent: Wednesday, March 22, 2023 4:18 PM
 > To: openstack-discuss openstack-discuss at lists.openstack.org>
 > Subject: OpenStack 2023.1 Antelope is officially released!
 >
 > Hello OpenStack community,
 > I'm excited to announce the final releases for the components of
 > OpenStack 2023.1 Antelope, which conclude the 2023.1 Antelope
 > development cycle.
 > You will find a complete list of all components, their latest
 > versions, and links to individual project release notes documents
 > listed on the new release site. https://releases.openstack.org/antelope/
 > Congratulations to all of the teams who have contributed to this
 > release!
 > Our next production cycle, 2023.2 Bobcat, has already started. We will
 > meet at the Virtual Project Team Gathering, March 27-31, 2023, to plan
 > the work for the upcoming cycle. I hope to see you there!
 > Thanks,
 > OpenStack Release Management team
 > --
 > Hervé Beraud
 > Senior Software Engineer at Red Hat
 > irc: hberaud
 > https://github.com/4383/

From gmann at ghanshyammann.com Wed Mar 22 17:43:48 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Wed, 22 Mar 2023 10:43:48 -0700
Subject: [ptl] Need PTL volunteer for OpenStack Winstackers
Message-ID: <1870a6b7a1d.114e70a2d994244.3514791188773000084@ghanshyammann.com>

Hi Lukas,

I am reaching out to you as you were PTL for the OpenStack Winstackers project in the
last cycle. There is no PTL candidate for the next cycle (2023.2), and it is on the
leaderless project list. Please check if you or anyone you know would like to lead
this project.

- https://etherpad.opendev.org/p/2023.2-leaderless

Also, if anyone else would like to help lead this project, this is the time to let
the TC know.
-gmann From gmann at ghanshyammann.com Wed Mar 22 17:43:55 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 10:43:55 -0700 Subject: [ptl] Need PTL volunteer for OpenStack Vitrage Message-ID: <1870a6b95c5.105ea64ba994248.8904640311035076666@ghanshyammann.com> Hi Eyal, I am reaching out to you as you were PTL for OpenStack Vitrage project in the last cycle. There is no PTL candidate for the next cycle (2023.2), and it is on the leaderless project list. Please check if you or anyone you know would like to lead this project. - https://etherpad.opendev.org/p/2023.2-leaderless Also, if anyone else would like to help leading this project, this is time to let TC knows. -gmann From rdhasman at redhat.com Wed Mar 22 18:14:06 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Wed, 22 Mar 2023 23:44:06 +0530 Subject: [cinder] Error when creating backups from iscsi volume In-Reply-To: <20230316121023.tdzgu6zinm7spvjp@localhost> References: <20230306113543.a57aywefbn4cgsu3@localhost> <20230309095514.l3i67tys2ujaq6dp@localhost> <20230313163251.xpnzyvzb65b6zaal@localhost> <20230314084601.t2ez24gcljnu5plq@localhost> <20230316121023.tdzgu6zinm7spvjp@localhost> Message-ID: Hi Gorka and Rishat, As discussed with Gorka, I will be working on the issues reported. I've reported 2 bugs for case 1) and 3) since we aren't sure on case 2) yet. *Bug 1*: https://bugs.launchpad.net/os-brick/+bug/2012251 *Fix 1*: https://review.opendev.org/c/openstack/os-brick/+/878045 *Bug 2*: https://bugs.launchpad.net/os-brick/+bug/2012352 *Fix 2*: https://review.opendev.org/c/openstack/os-brick/+/878242 I'm not 100% sure that the approach in *Fix 2* is the best way to do it but it works with my test scenarios and reviews are always appreciated. Thanks Rajat Dhasmana On Thu, Mar 16, 2023 at 5:45?PM Gorka Eguileor wrote: > On 16/03, Rishat Azizov wrote: > > Hi Gorka, > > > > Thanks! > > I fixed issue by adding to multipathd config uxsock_timeout directive: > > uxsock_timeout 10000 > > > > Because in multipathd logs I saw this error: > > 3624a93705842cfae35d7483200015fd8: map flushed > > cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after > > 4.858561 secs > > > > Now large disk backups work fine. > > > > 2. This happens because despite the timeout of the first attempt and exit > > code 1, the multipath device was disconnected, so the next attempts ended > > with an error "is not a multipath device", since the multipath device had > > already disconnected. > > > > Hi, > > That's a nice workaround until we fix it upstream!! > > Thanks for confirming my suspicions were right. This is the 3rd thing I > mentioned was happening, flush call failed but it actually removed the > device. > > We'll proceed to fix the flushing code in master. > > Cheers, > Gorka. > > > > > ??, 14 ???. 2023??. ? 14:46, Gorka Eguileor : > > > > > [Sending the email again as it seems it didn't reach the ML] > > > > > > > > > On 13/03, Gorka Eguileor wrote: > > > > On 11/03, Rishat Azizov wrote: > > > > > Hi, Gorka, > > > > > > > > > > Thanks. I see multiple "multipath -f" calls. Logs in attachments. > > > > > > > > > > > > > > > > > Hi, > > > > > > There are multiple things going on here: > > > > > > 1. There is a bug in os-brick, because the disconnect_volume should not > > > fail, since it is being called with force=True and > > > ignore_errors=True. 
> > > > > > The issues is that this call [1] is not wrapped in the > > > ExceptionChainer context manager, and it should not even be a flush > > > call, it should be a call to "multipathd remove map $map" instead. > > > > > > 2. The way multipath code is written [2][3], the error we see about > > > "3624a93705842cfae35d7483200015fce is not a multipath device" means > 2 > > > different things: it is not a multipath or an error happened. > > > > > > So we don't really know what happened without enabling more verbose > > > multipathd log levels. > > > > > > 3. The "multipath -f" call should not be failing in the first place, > > > because the failure is happening on disconnecting the source volume, > > > which has no data buffered to be written and therefore no reason to > > > fail the flush (unless it's using a friendly name). > > > > > > I don't know if it could be happening that the first flush fails > with > > > a timeout (maybe because there is an extend operation happening), > but > > > multipathd keeps trying to flush it in the background and when it > > > succeeds it removes the multipath device, which makes following > calls > > > fail. > > > > > > If that's the case we would need to change the retry from automatic > > > [4] to manual and check in-between to see if the device has been > > > removed in-between calls. > > > > > > The first issue is definitely a bug, the 2nd one is something that > could > > > be changed in the deployment to try to get additional information on > the > > > failure, and the 3rd one could be a bug. > > > > > > I'll see if I can find someone who wants to work on the 1st and 3rd > > > points. > > > > > > Cheers, > > > Gorka. > > > > > > [1]: > > > > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952 > > > [2]: > > > > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064 > > > [3]: > > > > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872 > > > [4]: > > > > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384 > > > > > > > > > > > > > > > > > > > ??, 9 ???. 2023??. ? 15:55, Gorka Eguileor : > > > > > > > > > > > On 06/03, Rishat Azizov wrote: > > > > > > > Hi, > > > > > > > > > > > > > > It works with smaller volumes. > > > > > > > > > > > > > > multipath.conf attached to thist email. > > > > > > > > > > > > > > Cinder version - 18.2.0 Wallaby > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Mar 22 18:35:29 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 22 Mar 2023 11:35:29 -0700 Subject: [ptl] Need PTL volunteer for OpenStack Sahara Message-ID: <1870a9accaa.ca4d3931996653.2888367201531485088@ghanshyammann.com> Hi Qiu, I am reaching out to you as you were PTL for OpenStack Sahara project in the last cycle. There is no PTL candidate for the next cycle (2023.2), and it is on the leaderless project list. Please check if you or anyone you know would like to lead this project. - https://etherpad.opendev.org/p/2023.2-leaderless Also, if anyone else would like to help leading this project, this is time to let TC knows. 
-gmann From mnaser at vexxhost.com Wed Mar 22 19:27:20 2023 From: mnaser at vexxhost.com (Mohammed Naser) Date: Wed, 22 Mar 2023 19:27:20 +0000 Subject: [neutron] detecting l3-agent readiness In-Reply-To: References: <2315188.ElGaqSPkdT@p1> <14af9155a882030464f4adce1bf71f8ffac74d0f.camel@mittwald.de> Message-ID: From: Rodolfo Alonso Hernandez Date: Monday, March 20, 2023 at 12:09 PM To: Jan Horstmann Cc: Mohammed Naser , felix.huettner at mail.schwarz , skaplons at redhat.com , openstack-discuss at lists.openstack.org Subject: Re: [neutron] detecting l3-agent readiness Hello: I think I'm repeating myself here but we have two approaches to solve this problem: * The easiest one, at least for the L3 agent, is to report an INFO level log before and after the full sync. That could be parsed by any tool to detect that. You can propose a patch to the Neutron repository. I?ve kicked this off with this: https://review.opendev.org/c/openstack/neutron/+/878248 fix: add log message for periodic_sync_routers_task fullsync [NEW] * https://bugs.launchpad.net/neutron/+bug/2011422: a more elaborated way to report the agent status. That could provide the start flag, the revived flag, the sync processing flag and many other ones that could be defined only for this specific agent. Regards. On Mon, Mar 20, 2023 at 4:33?PM Jan Horstmann > wrote: On Wed, 2023-03-15 at 16:10 +0000, Felix H?ttner wrote: > Hi, > > > Subject: Re: [neutron] detecting l3-agent readiness > > > > Hi, > > > > Dnia poniedzia?ek, 13 marca 2023 16:35:43 CET Felix H?ttner pisze: > > > Hi Mohammed, > > > > > > > Subject: [neutron] detecting l3-agent readiness > > > > > > > > Hi folks, > > > > > > > > I'm working on improving the stability of rollouts when using Kubernetes as a control > > plane, specifically around the L3 agent, it seems that I have not found a clear way to > > detect in the code path where the L3 agent has finished it's initial sync.. > > > > > > > > > > We build such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/- > > /blob/devel/files/startup_wait_for_ns.py > > > Basically we are checking against the neutron api what routers should be on the node and > > then validate that all keepalived processes are up and running. > > > > That would work only for HA routers. If You would also have routers which aren't "ha" this > > method may fail. > > > > Yep, since we only have HA routers that works fine for us. But I guess it should also work for non-ha routers without too much adoption (maybe just check for namespaces instead of keepalived). > Instead of counting processes I have been using the l3 agent's `configurations.routers` field to determine its readiness. From my understanding comparing this number with the number of active routers hosted by the agent should be a good indicator of its sync status. Using two api calls for this is inherently racy, but could be a sufficient workaround for environments with a moderate number of router events. So a simple test snippet for the sync status of all agents could be: ``` import sys import openstack client = openstack.connection.Connection( ... 
) l3_agent_synced = [ len([ router for router in client.network.agent_hosted_routers(agent) if router.is_admin_state_up ]) <= client.network.get_agent(agent).configuration["routers"] for agent in client.network.agents() if agent.agent_type == "L3 agent" and (agent.configuration["agent_mode"] == "dvr_snat" or agent.configuration["agent_mode"] == "legacy") ] if not all(l3_agent_synced): sys.exit(1) ``` Please let me know if I am way off with this approach :) > > > > > > > Am I missing it somewhere or is the architecture built in a way that doesn't really > > answer that question? > > > > > > > > > > Adding a option in the neutron api would be a lot nicer. But i guess that also counts > > for l2 and dhcp agents. > > > > > > > > > > Thanks > > > > Mohammed > > > > > > > > > > > > -- > > > > Mohammed Naser > > > > VEXXHOST, Inc. > > > > > > -- > > > Felix Huettner > > > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung > > durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger > > sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. > > Hinweise zum Datenschutz finden Sie hier>. > > > > > > > > > -- > > Slawek Kaplonski > > Principal Software Engineer > > Red Hat > > -- > Felix Huettner > Diese E Mail enth?lt m?glicherweise vertrauliche Inhalte und ist nur f?r die Verwertung durch den vorgesehenen Empf?nger bestimmt. Sollten Sie nicht der vorgesehene Empf?nger sein, setzen Sie den Absender bitte unverz?glich in Kenntnis und l?schen diese E Mail. Hinweise zum Datenschutz finden Sie hier>. -- Jan Horstmann -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Wed Mar 22 20:59:18 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 02:29:18 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: I still have the same issue, I'm not sure what's left to try. All the pods are now in a healthy state, I am getting log entries 3 mins after I hit the create volume button in cinder-volume when I try to create a volume with an image. And the volumes are just stuck in creating state for more than 20 mins now. Cinder logs: 2023-03-22 20:32:44.010 108 INFO cinder.rpc [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected cinder-volume RPC version 3.17 as minimum service version. 
2023-03-22 20:34:59.166 108 INFO cinder.volume.flows.manager.create_volume [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } With regards, Swogat Pradhan On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: > > > On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan > wrote: > >> Hi Adam, >> The systems are in same LAN, in this case it seemed like the image was >> getting pulled from the central site which was caused due to an >> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >> directory, which seems to have been resolved after the changes i made to >> fix it. >> >> Right now the glance api podman is running in unhealthy state and the >> podman logs don't show any error whatsoever and when issued the command >> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >> site, which is why cinder is throwing an error stating: >> >> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >> finding address for >> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >> Unable to establish connection to >> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >> NewConnectionError('> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >> ECONNREFUSED',)) >> >> Now i need to find out why the port is not listed as the glance service >> is running, which i am not sure how to find out. >> > > One other thing to investigate is whether your deployment includes this > patch [1]. If it does, then bear in mind > the glance-api service running at the edge site will be an "internal" (non > public facing) instance that uses port 9293 > instead of 9292. You should familiarize yourself with the release note [2]. > > [1] > https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 > [2] > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml > > Alan > > >> With regards, >> Swogat Pradhan >> >> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: >> >>> >>> >>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Update: >>>> Here is the log when creating a volume using cirros image: >>>> >>>> 2023-03-22 11:04:38.449 109 INFO >>>> cinder.volume.flows.manager.create_volume >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>> specification: {'status': 'creating', 'volume_name': >>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> [{'url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'metadata': {'store': 'ceph'}}, {'url': >>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'metadata': {'store': 'ceph'}}, {'url': >>>> 
'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>> 'owner_specified.openstack.object': 'images/cirros', >>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>> } >>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>> >>> >>> As Adam Savage would say, well there's your problem ^^ (Image download >>> 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s >>> suggests you have a network issue. >>> >>> John Fulton previously stated your cinder-volume service at the edge >>> site is not using the local ceph image store. Assuming you are deploying >>> GlanceApiEdge service [1], then the cinder-volume service should be >>> configured to use the local glance service [2]. You should check cinder's >>> glance_api_servers to confirm it's the edge site's glance service. >>> >>> [1] >>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>> [2] >>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>> >>> Alan >>> >>> >>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>> FutureWarning: The human format is deprecated and the format parameter will >>>> be removed. Use explicitly json instead in version 'xena' >>>> category=FutureWarning) >>>> >>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>> FutureWarning: The human format is deprecated and the format parameter will >>>> be removed. Use explicitly json instead in version 'xena' >>>> category=FutureWarning) >>>> >>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>> MB/s >>>> 2023-03-22 11:11:14.998 109 INFO >>>> cinder.volume.flows.manager.create_volume >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>> >>>> The image is present in dcn02 store but still it downloaded the image >>>> in 0.16 MB/s and then created the volume. 
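Given Alan's pointer about GlanceApiEdge and the internal glance instance, two quick
checks on the dcn02 node might help narrow this down (the container name is an
assumption for a typical TripleO deployment; the ports come from the discussion above):

# which glance endpoint cinder-volume is actually configured to talk to
$ sudo podman exec cinder_volume grep glance_api_servers /etc/cinder/cinder.conf
# is anything listening locally on the glance ports (9292, or 9293 for the
# internal GlanceApiEdge instance) on the edge node?
$ sudo ss -tlnp | grep -E ':(9292|9293)'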
>>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Hi Jhon, >>>>> This seems to be an issue. >>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>> parameter was specified to the respective cluster names but the config >>>>> files were created in the name of ceph.conf and keyring was >>>>> ceph.client.openstack.keyring. >>>>> >>>>> Which created issues in glance as well as the naming convention of the >>>>> files didn't match the cluster names, so i had to manually rename the >>>>> central ceph conf file as such: >>>>> >>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>> [root at dcn02-compute-0 ceph]# ll >>>>> total 16 >>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>> ceph_central.client.openstack.keyring >>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>> [root at dcn02-compute-0 ceph]# >>>>> >>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>>> respective clusters in both dcn01 and dcn02. >>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>> for accessing central ceph cluster. >>>>> >>>>> glance multistore config: >>>>> [dcn02] >>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>> rbd_store_user=openstack >>>>> rbd_store_pool=images >>>>> rbd_thin_provisioning=False >>>>> store_description=dcn02 rbd glance store >>>>> >>>>> [ceph_central] >>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>> rbd_store_user=openstack >>>>> rbd_store_pool=images >>>>> rbd_thin_provisioning=False >>>>> store_description=Default glance store backend. >>>>> >>>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>> wrote: >>>>> >>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>> wrote: >>>>>> > >>>>>> > Hi, >>>>>> > Seems like cinder is not using the local ceph. >>>>>> >>>>>> That explains the issue. It's a misconfiguration. >>>>>> >>>>>> I hope this is not a production system since the mailing list now has >>>>>> the cinder.conf which contains passwords. >>>>>> >>>>>> The section that looks like this: >>>>>> >>>>>> [tripleo_ceph] >>>>>> volume_backend_name=tripleo_ceph >>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>> rbd_user=openstack >>>>>> rbd_pool=volumes >>>>>> rbd_flatten_volume_from_snapshot=False >>>>>> rbd_secret_uuid= >>>>>> report_discard_supported=True >>>>>> >>>>>> Should be updated to refer to the local DCN ceph cluster and not the >>>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>>> rbd_secret_uuid corresponds to that one. >>>>>> >>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>> secret-get-value $FSID`. >>>>>> >>>>>> The documentation describes how to configure the central and DCN sites >>>>>> correctly but an error seems to have occurred while you were following >>>>>> it. 
>>>>>> >>>>>> >>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>> >>>>>> John >>>>>> >>>>>> > >>>>>> > Ceph Output: >>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>> > NAME SIZE PARENT FMT >>>>>> PROT LOCK >>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>>> excl >>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 >>>>>> yes >>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 >>>>>> yes >>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 >>>>>> yes >>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 >>>>>> yes >>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 >>>>>> yes >>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 >>>>>> yes >>>>>> > >>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>> > NAME SIZE PARENT FMT >>>>>> PROT LOCK >>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>> > >>>>>> > Attached the cinder config. >>>>>> > Please let me know how I can solve this issue. >>>>>> > >>>>>> > With regards, >>>>>> > Swogat Pradhan >>>>>> > >>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>>> wrote: >>>>>> >> >>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>> config. >>>>>> >> >>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>> >>>>>> >>> Update: >>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>> >>> The image size is 389 MB. >>>>>> >>> >>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>> >>>>>> >>>> Hi Jhon, >>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>> after importing from the central site. >>>>>> >>>> But launching an instance normally fails as it takes a long time >>>>>> for the volume to get created. >>>>>> >>>> >>>>>> >>>> When launching an instance from volume the instance is getting >>>>>> created properly without any errors. >>>>>> >>>> >>>>>> >>>> I tried to cache images in nova using >>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>> but getting checksum failed error. >>>>>> >>>> >>>>>> >>>> With regards, >>>>>> >>>> Swogat Pradhan >>>>>> >>>> >>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton >>>>>> wrote: >>>>>> >>>>> >>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>> >>>>> wrote: >>>>>> >>>>> > >>>>>> >>>>> > Update: After restarting the nova services on the controller >>>>>> and running the deploy script on the edge site, I was able to launch the VM >>>>>> from volume. 
>>>>>> >>>>> > >>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>> for the volume to be created, whereas the image has already been imported >>>>>> to the edge glance. >>>>>> >>>>> >>>>>> >>>>> Try following this document and making the same observations in >>>>>> your >>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>> >>>>> >>>>>> >>>>> >>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>> >>>>> >>>>>> >>>>> On a DCN site if you run a command like this: >>>>>> >>>>> >>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>> >>>>> NAME SIZE PARENT >>>>>> >>>>> FMT PROT LOCK >>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>>>> >>>>> $ >>>>>> >>>>> >>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>> which is on >>>>>> >>>>> the same local ceph cluster. >>>>>> >>>>> >>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>> encountering >>>>>> >>>>> the streaming behavior described here: >>>>>> >>>>> >>>>>> >>>>> Ideally all images should reside in the central Glance and be >>>>>> copied >>>>>> >>>>> to DCN sites before instances of those images are booted on DCN >>>>>> sites. >>>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>>> then the >>>>>> >>>>> image will be streamed to the DCN site and then the image will >>>>>> boot as >>>>>> >>>>> an instance. This happens because Glance at the DCN site has >>>>>> access to >>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>> booting of >>>>>> >>>>> the image will take time because it has not been copied in >>>>>> advance, >>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>> >>>>> >>>>>> >>>>> You can also exec into the cinder container at the DCN site and >>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>> >>>>> >>>>>> >>>>> John >>>>>> >>>>> >>>>>> >>>>> > >>>>>> >>>>> > I will try and create a new fresh image and test again then >>>>>> update. >>>>>> >>>>> > >>>>>> >>>>> > With regards, >>>>>> >>>>> > Swogat Pradhan >>>>>> >>>>> > >>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>> >> >>>>>> >>>>> >> Update: >>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>> down. >>>>>> >>>>> >> >>>>>> >>>>> >> >>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>> >>> >>>>>> >>>>> >>> Hi Brendan, >>>>>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>>>>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>>>> timed out so i waited for the volume to be created. >>>>>> >>>>> >>> Once the volume was created i tried launching the instance >>>>>> from the volume and still the instance is stuck in spawning state. 
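Building on the rbd listing John suggests: in the `rbd -p volumes ls -l` output pasted earlier the PARENT column is empty, which is consistent with the streaming/full-copy behaviour described above rather than a COW clone of the local image. Two quick checks, with the edge cluster conf/keyring names given only as examples (use whatever your site actually ships):

    $ openstack image show <image-id> | grep stores
    # the edge store (e.g. dcn02) should already be listed before the volume is created

    $ sudo cephadm shell --config /etc/ceph/dcn02.conf \
        --keyring /etc/ceph/dcn02.client.admin.keyring
    $ rbd -p volumes info volume-<volume-uuid> | grep -E 'parent|size'
    # a COW clone shows a line like "parent: images/<image-id>@snap";
    # no parent line means the image was downloaded and converted instead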
>>>>>> >>>>> >>> >>>>>> >>>>> >>> Here is the nova-compute log: >>>>>> >>>>> >>> >>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-] >>>>>> privsep daemon starting >>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-] >>>>>> privsep process running with uid/gid: 0/0 >>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>>>> privsep process running with capabilities (eff/prm/inh): >>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-] >>>>>> privsep daemon running as pid 185437 >>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>> os_brick.initiator.connectors.nvmeof >>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>> >>>>> >>> Exit code: 2 >>>>>> >>>>> >>> Stdout: '' >>>>>> >>>>> >>> Stderr: '': >>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>> running command. >>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>> >>>>> >>> >>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>> template mentioned here ?: >>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>> >>>>> >>> >>>>>> >>>>> >>> The volume is already created and i do not understand why >>>>>> the instance is stuck in spawning state. >>>>>> >>>>> >>> >>>>>> >>>>> >>> With regards, >>>>>> >>>>> >>> Swogat Pradhan >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>> bshephar at redhat.com> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Does your environment use different network interfaces for >>>>>> each of the networks? Or does it have a bond with everything on it? >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>> instances, there is a lot of network traffic between nodes as the >>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>> other services sending normal network traffic, it can be enough to cause >>>>>> issues if everything is running over a single 1Gbe interface. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> I have seen the same situation in fact when using a single >>>>>> active/backup bond on 1Gbe nics. It?s worth checking the network traffic >>>>>> while you try to spawn the instance to see if you?re dropping packets. In >>>>>> the situation I described, there were dropped packets which resulted in a >>>>>> loss of communication between nova_compute and RMQ, so the node appeared >>>>>> offline. You should also confirm that nova_compute is being disconnected in >>>>>> the nova_compute logs if you tail them on the Hypervisor while spawning the >>>>>> instance. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. >>>>>> So, based on that experience, from my perspective, is certainly sounds like >>>>>> some kind of network issue. 
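To follow up on the dropped-packets theory in practice, the checks below can be run on the hypervisor while the instance is spawning. The bond name and log path are assumptions based on a typical TripleO layout; adjust them to your node:

    $ watch -n1 'ip -s link show bond1'        # the RX/TX "dropped" counters should stay flat
    $ cat /proc/net/bonding/bond1              # confirm 802.3ad actually negotiated on both slaves
    $ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'amqp|rabbit|heartbeat'

Drop counters climbing, or AMQP heartbeat/connection errors appearing at the same moment the hypervisor flips to "down", would point at the same network-level cause Brendan describes.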
>>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Regards, >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Brendan Shephard >>>>>> >>>>> >>>> Senior Software Engineer >>>>>> >>>>> >>>> Red Hat Australia >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Hi, >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> I tried to help someone with a similar issue some time ago >>>>>> in this thread: >>>>>> >>>>> >>>> >>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>>>>> user, not sure if that could apply here. But is it possible that your nova >>>>>> and neutron versions are different between central and edge site? Have you >>>>>> restarted nova and neutron services on the compute nodes after >>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>> Maybe they can help narrow down the issue. >>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>> logs I probably would start "tearing down" rabbitmq. I didn't have to do >>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>> running, this will most likely impact client IO depending on your load. >>>>>> Check out the rabbitmqctl commands. >>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while being >>>>>> replicated across the rabbit nodes. But I don't really know the rabbit >>>>>> internals too well, so maybe someone else can chime in here and give a >>>>>> better advice. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Regards, >>>>>> >>>>> >>>> Eugen >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Hi, >>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> With regards, >>>>>> >>>>> >>>> Swogat Pradhan >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> >>>> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Hi >>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but >>>>>> not due to packet >>>>>> >>>>> >>>> loss. >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> with regards, >>>>>> >>>>> >>>> Swogat Pradhan >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> >>>> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Hi, >>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>> checked when >>>>>> >>>>> >>>> launching the instance. >>>>>> >>>>> >>>> I will check that and come back. >>>>>> >>>>> >>>> But everytime i launch an instance the instance gets stuck >>>>>> at spawning >>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure >>>>>> if packet loss >>>>>> >>>>> >>>> causes this. 
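Since the sites are linked by a tunnel, one simple way to separate an MTU/fragmentation problem from plain packet loss is a do-not-fragment ping from the edge compute node towards the central internal API network. 1472 bytes of ICMP payload corresponds to a full 1500-byte frame, so if the tunnel adds encapsulation overhead these pings can fail even though both ends are configured with the default MTU of 1500. The target address is a placeholder:

    $ ping -M do -s 1472 -c 50 <central-internal-api-ip>    # full-size 1500-byte frames
    $ ping -M do -s 1400 -c 50 <central-internal-api-ip>    # smaller probe to bracket the real path MTU

Loss or "message too long" errors only at the larger size would explain why large RPC replies and image traffic suffer while ordinary pings look clean.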
>>>>>> >>>>> >>>> >>>>>> >>>>> >>>> With regards, >>>>>> >>>>> >>>> Swogat pradhan >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>>>>> wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>> identical between >>>>>> >>>>> >>>> central and edge site? Do you see packet loss through the >>>>>> tunnel? >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> > Hi Eugen, >>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>> 'cc' as i am not >>>>>> >>>>> >>>> > getting email's from you. >>>>>> >>>>> >>>> > Coming to the issue: >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>>> list_policies -p >>>>>> >>>>> >>>> / >>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>> priority >>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> >>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes down >>>>>> when i am >>>>>> >>>>> >>>> trying >>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>> spawning state and >>>>>> >>>>> >>>> then >>>>>> >>>>> >>>> > gets stuck. >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>>>> sites. >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > With regards, >>>>>> >>>>> >>>> > Swogat Pradhan >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> >>>> > wrote: >>>>>> >>>>> >>>> > >>>>>> >>>>> >>>> >> Hi Eugen, >>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>> directly, i am >>>>>> >>>>> >>>> checking >>>>>> >>>>> >>>> >> the email digest and there i am able to find your reply. >>>>>> >>>>> >>>> >> Here is the log for download: >>>>>> https://we.tl/t-L8FEkGZFSq >>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>> occurred. >>>>>> >>>>> >>>> >> >>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>> activities in the >>>>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>>>> >>>>> >>>> >> >>>>>> >>>>> >>>> >> With regards, >>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>> >>>>> >>>> >> >>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> >>>> >> wrote: >>>>>> >>>>> >>>> >> >>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>> >>>>> >>>> >>> Thanks for your response. 
>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>>>> details: >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>> >>>>> >>>> >>> >>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>> >>>>> >>>> Started >>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>> >>>>> >>>> Started >>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>> >>>>> >>>> Started >>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>> >>>>> >>>> Started >>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but >>>>>> the issue is >>>>>> >>>>> >>>> still >>>>>> >>>>> >>>> >>> present. >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>> cluster_status >>>>>> >>>>> >>>> >>> Cluster status of node >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>> ... >>>>>> >>>>> >>>> >>> Basics >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Cluster name: >>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Disk Nodes >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Running Nodes >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Versions >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>>>> RabbitMQ >>>>>> >>>>> >>>> 3.8.3 >>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>>>> RabbitMQ >>>>>> >>>>> >>>> 3.8.3 >>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>>>> RabbitMQ >>>>>> >>>>> >>>> 3.8.3 >>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>> >>>>> >>>> >>> >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>> >>>>> >>>> RabbitMQ >>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Alarms >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> (none) >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Network Partitions >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> (none) >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Listeners >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 
25672, protocol: clustering, purpose: >>>>>> inter-node and CLI >>>>>> >>>>> >>>> tool >>>>>> >>>>> >>>> >>> communication >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>>>> AMQP 0-9-1 >>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>> inter-node and CLI >>>>>> >>>>> >>>> tool >>>>>> >>>>> >>>> >>> communication >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>>>> AMQP 0-9-1 >>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>> inter-node and CLI >>>>>> >>>>> >>>> tool >>>>>> >>>>> >>>> >>> communication >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>>>> AMQP 0-9-1 >>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>> >>>>> >>>> interface: >>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> , >>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>>> purpose: >>>>>> >>>>> >>>> inter-node and >>>>>> >>>>> >>>> >>> CLI tool communication >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> , >>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp, >>>>>> purpose: AMQP >>>>>> >>>>> >>>> 0-9-1 >>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>> >>>>> >>>> >>> Node: >>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>> >>>>> >>>> , >>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose: >>>>>> HTTP API >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Feature flags >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> *Logs:* >>>>>> >>>>> >>>> >>> *(Attached)* >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> With regards, >>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>> >>>>> 
>>>> >>> wrote: >>>>>> >>>>> >>>> >>> >>>>>> >>>>> >>>> >>>> Hi, >>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api >>>>>> log. >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>> The reply >>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after >>>>>> 60 seconds >>>>>> >>>>> >>>> due to a >>>>>> >>>>> >>>> >>>> missing queue >>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>> >>>>> >>>> Abandoning...: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> The reply >>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after >>>>>> 60 seconds >>>>>> >>>>> >>>> due to a >>>>>> >>>>> >>>> >>>> missing queue >>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>> >>>>> >>>> Abandoning...: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> The reply >>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after >>>>>> 60 seconds >>>>>> >>>>> >>>> due to a >>>>>> >>>>> >>>> >>>> missing queue >>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>> >>>>> >>>> Abandoning...: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Cache enabled >>>>>> >>>>> >>>> with >>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, >>>>>> drop reply to >>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>> oslo_messaging._drivers.amqpdriver >>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>> The reply >>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after >>>>>> 60 seconds >>>>>> >>>>> >>>> due to a >>>>>> >>>>> >>>> >>>> missing queue >>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>> >>>>> >>>> Abandoning...: >>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>> With regards, >>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>>> Hi, >>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where >>>>>> i am trying to >>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>>>> (openstack >>>>>> >>>>> >>>> compute >>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart >>>>>> the nova >>>>>> >>>>> >>>> compute >>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] >>>>>> Running >>>>>> >>>>> >>>> >>>>> instance usage >>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>> 2023-02-26 07:00:00 >>>>>> >>>>> >>>> to >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> [instance: >>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>> successful on node >>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>> nova.virt.libvirt.driver >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> [instance: >>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>> supplied device >>>>>> >>>>> >>>> name: >>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>> names >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> [instance: >>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>>>> volume >>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Cache enabled >>>>>> >>>>> >>>> with >>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Running >>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>>> >>>>> >>>> 'privsep-helper', >>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>> '--config-file', >>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path', >>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Spawned new >>>>>> >>>>> >>>> privsep >>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>> oslo.privsep.daemon [-] privsep >>>>>> >>>>> >>>> >>>>> daemon starting >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>> oslo.privsep.daemon [-] privsep >>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>> oslo.privsep.daemon [-] privsep >>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>> oslo.privsep.daemon [-] privsep >>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>> >>>>> 
>>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> Process >>>>>> >>>>> >>>> >>>>> execution error >>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>>> command. >>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>> nova.virt.libvirt.driver >>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>> [instance: >>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> With regards, >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>> >>>>> >>>> >>>>> >>>>>> >>>>> >>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Wed Mar 22 21:07:32 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 02:37:32 +0530 Subject: Unable to create volume with image in edge site | Glance-Cinder | Openstack DCN | Wallaby Message-ID: Hi, I am creating a fresh thread for this glance issue. I have setup glance multistore for my infra. glance.yaml for dcn site: (overcloud) [stack at hkg2director ~]$ cat dcn02/glance_dcn02.yaml parameter_defaults: #GlanceShowMultipleLocations: true GlanceEnabledImportMethods: web-download,copy-image GlanceBackend: rbd GlanceStoreDescription: 'dcn02 rbd glance store' GlanceBackendID: dcn02 CephClusterName: dcn02 GlanceMultistoreConfig: ceph: GlanceBackend: rbd GlanceStoreDescription: 'Default glance store backend.' CephClusterName: ceph Now i have created a cirros image and have imported it to dcn store using copy-image method. When I create an empty volume in the DCN site the volume gets created without any issues. But when I create a volume with image (volume source) the volume gets stuck in the creating state forever. I get logs in cinder-volume 3-4 mins after I have hit the create volume button. 
Cinder logs: 2023-03-22 20:34:59.166 108 INFO cinder.volume.flows.manager.create_volume [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } I checked both glance and cinder containers are running in a healthy state. I see no errors or whatsoever. I am not sure how to fix the cinder volume stuck in the creating state in the DCN edge site. With regards, Swogat Pradhan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From swogatpradhan22 at gmail.com Wed Mar 22 21:15:55 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 02:45:55 +0530 Subject: Unable to create volume with image in edge site | Glance-Cinder | Openstack DCN | Wallaby In-Reply-To: References: Message-ID: Cinder volume config: [tripleo_ceph] volume_backend_name=tripleo_ceph volume_driver=cinder.volume.drivers.rbd.RBDDriver rbd_user=openstack rbd_pool=volumes rbd_flatten_volume_from_snapshot=False rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b report_discard_supported=True rbd_ceph_conf=/etc/ceph/dcn02.conf rbd_cluster_name=dcn02 Glance api config: [dcn02] rbd_store_ceph_conf=/etc/ceph/dcn02.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=dcn02 rbd glance store [ceph] rbd_store_ceph_conf=/etc/ceph/ceph.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=Default glance store backend. On Thu, Mar 23, 2023 at 2:37?AM Swogat Pradhan wrote: > Hi, > I am creating a fresh thread for this glance issue. > I have setup glance multistore for my infra. > > glance.yaml for dcn site: > > (overcloud) [stack at hkg2director ~]$ cat dcn02/glance_dcn02.yaml > parameter_defaults: > #GlanceShowMultipleLocations: true > GlanceEnabledImportMethods: web-download,copy-image > GlanceBackend: rbd > GlanceStoreDescription: 'dcn02 rbd glance store' > GlanceBackendID: dcn02 > CephClusterName: dcn02 > GlanceMultistoreConfig: > ceph: > GlanceBackend: rbd > GlanceStoreDescription: 'Default glance store backend.' > CephClusterName: ceph > > Now i have created a cirros image and have imported it to dcn store using > copy-image method. When I create an empty volume in the DCN site the volume > gets created without any issues. > But when I create a volume with image (volume source) the volume gets > stuck in the creating state forever. I get logs in cinder-volume 3-4 mins > after I have hit the create volume button. 
> > Cinder logs: > 2023-03-22 20:34:59.166 108 INFO cinder.volume.flows.manager.create_volume > [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, > 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': > datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > > I checked both glance and cinder containers are running in a healthy state. > I see no errors or whatsoever. I am not sure how to fix the cinder volume > stuck in the creating state in the DCN edge site. > > With regards, > Swogat Pradhan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 23 07:40:16 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 23 Mar 2023 16:40:16 +0900 Subject: [heat][PTG] 2023.2 (Bobcat) PTG Planning In-Reply-To: References: Message-ID: Hello, It seems all of the attendees who have signed up are based around APAC so I allocated the bexer room for 4 UTC ~ 7 UTC slot on Wednesday. I updated the schedule etherpad based on the items added to the planning etherpad but in case you have anything you want to add then please let me know. https://etherpad.opendev.org/p/march2023-ptg-heat Thank you, Takashi On Mon, Mar 13, 2023 at 4:39?PM Takashi Kajinami wrote: > Hello, > > > I've signed up for the upcoming virtual PTG so that we can have some slots > for Heat discussion. 
> In case you are interested in attending the sessions or have any topics > you want to discuss, > please put your name and the proposed topics in the etherpad. > https://etherpad.opendev.org/p/march2023-ptg-heat-planning > > It'd be nice if we can update the planning etherpad this week so that I'll > fix our slots and topics > early next week. > > Thank you, > Takashi Kajinami > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Thu Mar 23 07:44:44 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 14:44:44 +0700 Subject: [nova]host cpu reserve Message-ID: Hello guys. I am trying google for nova host cpu reserve to prevent host overload but I cannot find any resource about it. Could you give me some information? Thanks. Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From yasufum.o at gmail.com Thu Mar 23 07:58:55 2023 From: yasufum.o at gmail.com (Yasufumi Ogawa) Date: Thu, 23 Mar 2023 16:58:55 +0900 Subject: [tacker][ptg] Bobcat vPTG Planning In-Reply-To: <79abe530-5ce0-1ad1-d3f6-4cb61cc970cf@gmail.com> References: <79abe530-5ce0-1ad1-d3f6-4cb61cc970cf@gmail.com> Message-ID: <1d53b57b-6de0-91db-fcc5-756b4a43b2ab@gmail.com> Hi all, As hiromu proposed a cross-project session for a new feature in keystone middleware [1][2], we've setup a etherpad for the discussion[3]. Please everyone add your name to attendees if you're going to join the session. The time slot will be fixed soon. [1] https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032138.html [2] https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032791.html [3] https://etherpad.opendev.org/p/bobcat-ptg-ext_oauth2_server Thanks, Yasufumi On 2023/03/21 4:38, Yasufumi Ogawa wrote: > Hi team, > > We are going to have the Bobcat vPTG through three days, 28-30 Mar > 6am-8am UTC as agreed at the IRC meeting last week. I've booked rooms > for the sessions and uploaded etherpad [1]. Please feel free to add your > proposal on the etherpad. > > [1] https://etherpad.opendev.org/p/tacker-bobcat-ptg > > Thanks, > Yasufumi From mkopec at redhat.com Thu Mar 23 08:51:19 2023 From: mkopec at redhat.com (Martin Kopec) Date: Thu, 23 Mar 2023 09:51:19 +0100 Subject: [qa][ptg] Virtual Bobcat vPTG Planning In-Reply-To: References: Message-ID: Based on the responses in the pool [2], I booked 2 one hour sessions: * Wed 14-15 UTC @ kilo * Wed 17-18 UTC @ kilo On Fri, 17 Mar 2023 at 14:20, Martin Kopec wrote: > Hello everyone, > > here is [1] our etherpad for the 2023.2 Bobcat PTG. Please, add your > topics there if there is anything you would like to discuss / propose ... > You can also vote for time slots for our sessions so that they fit your > schedule at [2]. > > We will go most likely with 1-hour slot per day, as they usually fit > easier into everyone's schedule. The number of slots will depend on the > number of topics proposed in [1]. > > [1] https://etherpad.opendev.org/p/qa-bobcat-ptg > [2] https://framadate.org/sLZppMVkFw2FcEhO > > Thanks, > -- > Martin Kopec > Senior Software Quality Engineer > Red Hat EMEA > IM: kopecmartin > > > > -- Martin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tkajinam at redhat.com Thu Mar 23 09:15:16 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 23 Mar 2023 18:15:16 +0900 Subject: [oslo][heat][masakari][senlin][venus][all] oslo.db 13.0.0 will remove sqlalchemy-migrate support In-Reply-To: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> References: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> Message-ID: Thank you for the heads up, Stephen. Today I spent some time attempting to remove the dependency on sqlalchemy-migrate from heat. I've pushed current patch sets but so far these seem to be working (according to CI). https://review.opendev.org/q/topic:alembic+project:openstack/heat We'll try to get these merged ASAP so that we can bump oslo.db timely after the new version without sqlalchemy support is released. If you have time to help reviewing these, that would be much appreciated. Thank you, Takashi On Thu, Mar 23, 2023 at 1:43?AM Stephen Finucane wrote: > tl;dr: Projects still relying on sqlalchemy-migrate for migrations need to > start > their switch to alembic immediately. Projects with "legacy" > sqlalchemy-migrated > based migrations need to drop them. > > A quick heads up that oslo.db 13.0.0 will be release in the next month or > so and > will remove sqlalchemy-migrate support and formally add support for > sqlalchemy > 2.x. The removal of sqlalchemy-migrate support should only affect projects > using > oslo.db's sqlalchemy-migrate wrappers, as opposed to using > sqlalchemy-migrate > directly. For any projects that rely on this functionality, a short-term > fix is > to vendor the removed code [1] in your project. However, I must emphasise > that > we're not removing sqlalchemy-migrate integration for the fun of it: it's > not > compatible with sqlalchemy 2.x and is no longer maintained. If your > project uses > sqlalchemy-migrate and you haven't migrated to alembic yet, you need to > start > doing so immediately. If you have migrated to alembic but still have > sqlalchemy- > migrate "legacy" migrations in-tree, you need to look at dropping these > asap. > Anything less will result in broken master when we bump upper-constraints > to > allow sqlalchemy 2.x in Bobcat. I've listed projects in $subject that > appear to > be using the removed modules. > > For more advice on migrating to sqlalchemy 2.x and alembic, please look at > my > previous post on the matter [2]. > > Cheers, > Stephen > > [1] https://review.opendev.org/c/openstack/oslo.db/+/853025 > [2] > https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 23 09:16:11 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 23 Mar 2023 18:16:11 +0900 Subject: [oslo][heat][masakari][senlin][venus][all] oslo.db 13.0.0 will remove sqlalchemy-migrate support In-Reply-To: References: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> Message-ID: On Thu, Mar 23, 2023 at 6:15?PM Takashi Kajinami wrote: > Thank you for the heads up, Stephen. > > Today I spent some time attempting to remove the dependency on > sqlalchemy-migrate from heat. > I've pushed current patch sets but so far these seem to be working > (according to CI). > https://review.opendev.org/q/topic:alembic+project:openstack/heat > > We'll try to get these merged ASAP so that we can bump oslo.db timely > after the new version without > sqlalchemy support is released. 
If you have time to help reviewing these, > that would be much appreciated. > tiny but important correction the new version without "sqlalchemy-migrate" support > > Thank you, > Takashi > > > > On Thu, Mar 23, 2023 at 1:43?AM Stephen Finucane > wrote: > >> tl;dr: Projects still relying on sqlalchemy-migrate for migrations need >> to start >> their switch to alembic immediately. Projects with "legacy" >> sqlalchemy-migrated >> based migrations need to drop them. >> >> A quick heads up that oslo.db 13.0.0 will be release in the next month or >> so and >> will remove sqlalchemy-migrate support and formally add support for >> sqlalchemy >> 2.x. The removal of sqlalchemy-migrate support should only affect >> projects using >> oslo.db's sqlalchemy-migrate wrappers, as opposed to using >> sqlalchemy-migrate >> directly. For any projects that rely on this functionality, a short-term >> fix is >> to vendor the removed code [1] in your project. However, I must emphasise >> that >> we're not removing sqlalchemy-migrate integration for the fun of it: it's >> not >> compatible with sqlalchemy 2.x and is no longer maintained. If your >> project uses >> sqlalchemy-migrate and you haven't migrated to alembic yet, you need to >> start >> doing so immediately. If you have migrated to alembic but still have >> sqlalchemy- >> migrate "legacy" migrations in-tree, you need to look at dropping these >> asap. >> Anything less will result in broken master when we bump upper-constraints >> to >> allow sqlalchemy 2.x in Bobcat. I've listed projects in $subject that >> appear to >> be using the removed modules. >> >> For more advice on migrating to sqlalchemy 2.x and alembic, please look >> at my >> previous post on the matter [2]. >> >> Cheers, >> Stephen >> >> [1] https://review.opendev.org/c/openstack/oslo.db/+/853025 >> [2] >> https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 23 12:09:55 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 23 Mar 2023 12:09:55 +0000 Subject: [nova]host cpu reserve In-Reply-To: References: Message-ID: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> generally you should not you can use it but the preferd way to do this is use cpu_shared_set and cpu_dedicated_set (in old releases you would have used vcpu_pin_set) https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set if you dont need cpu pinning just use cpu_share_set to spcify the cores that can be sued for floatign vms when you use cpu_shared_set and cpu_dedicated_set any cpu not specified are reseved for host use. 
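As a rough illustration of the cpu_shared_set approach described above (the thread numbers are only an example for one 8-core, 16-thread socket; check `lscpu -e` or /sys/devices/system/cpu/cpu*/topology/thread_siblings_list for your real sibling layout):

    [compute]
    # threads 0 and 8 are core 0 and its hyperthread sibling in this example; leaving them
    # out of both sets keeps them reserved for the host OS, with no reserved_host_cpus needed
    cpu_shared_set = 1-7,9-15
    # only set this as well if you also run pinned instances (hw:cpu_policy=dedicated),
    # e.g. cpu_dedicated_set = 4-7,12-15 with cpu_shared_set shrunk so the two sets don't overlap

With a layout like this, any instance without a dedicated CPU policy floats over the shared set, and nothing nova schedules can land on the host-reserved threads.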
https://that.guru/blog/cpu-resources/ and https://that.guru/blog/cpu-resources-redux/ have some useful info but that mostly looking at it form a cpu pinning angel althoguh the secon one covers cpu_shared_set, the issue with usein https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus is that you have to multiple the number of cores that are resverved by the https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio which means if you decide to manage that via placement api by using https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio instead then you need to update your nova.conf to modify the reservationfi you change the allocation ratio. if instead you use cpu_shared_set and cpu_dedicated_set you are specifying exactly which cpus nova can use and the allocation ration nolonger needs to be conisderd. in general you shoudl reserve the first core on each cpu socket for the host os. if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted form the cpu_shared_set and cpu_dedicated_set On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > Hello guys. > I am trying google for nova host cpu reserve to prevent host overload but I > cannot find any resource about it. Could you give me some information? > Thanks. > Nguyen Huu Khoi From swogatpradhan22 at gmail.com Wed Mar 22 21:16:04 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 02:46:04 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Cinder volume config: [tripleo_ceph] volume_backend_name=tripleo_ceph volume_driver=cinder.volume.drivers.rbd.RBDDriver rbd_user=openstack rbd_pool=volumes rbd_flatten_volume_from_snapshot=False rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b report_discard_supported=True rbd_ceph_conf=/etc/ceph/dcn02.conf rbd_cluster_name=dcn02 Glance api config: [dcn02] rbd_store_ceph_conf=/etc/ceph/dcn02.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=dcn02 rbd glance store [ceph] rbd_store_ceph_conf=/etc/ceph/ceph.conf rbd_store_user=openstack rbd_store_pool=images rbd_thin_provisioning=False store_description=Default glance store backend. On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan wrote: > I still have the same issue, I'm not sure what's left to try. > All the pods are now in a healthy state, I am getting log entries 3 mins > after I hit the create volume button in cinder-volume when I try to create > a volume with an image. > And the volumes are just stuck in creating state for more than 20 mins now. > > Cinder logs: > 2023-03-22 20:32:44.010 108 INFO cinder.rpc > [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected > cinder-volume RPC version 3.17 as minimum service version. 
> 2023-03-22 20:34:59.166 108 INFO cinder.volume.flows.manager.create_volume > [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, > 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': > datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', > 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > > With regards, > Swogat Pradhan > > On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: > >> >> >> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan >> wrote: >> >>> Hi Adam, >>> The systems are in same LAN, in this case it seemed like the image was >>> getting pulled from the central site which was caused due to an >>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>> directory, which seems to have been resolved after the changes i made to >>> fix it. >>> >>> Right now the glance api podman is running in unhealthy state and the >>> podman logs don't show any error whatsoever and when issued the command >>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>> site, which is why cinder is throwing an error stating: >>> >>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>> finding address for >>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>> Unable to establish connection to >>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>> NewConnectionError('>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>> ECONNREFUSED',)) >>> >>> Now i need to find out why the port is not listed as the glance service >>> is running, which i am not sure how to find out. >>> >> >> One other thing to investigate is whether your deployment includes this >> patch [1]. If it does, then bear in mind >> the glance-api service running at the edge site will be an "internal" >> (non public facing) instance that uses port 9293 >> instead of 9292. You should familiarize yourself with the release note >> [2]. >> >> [1] >> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >> [2] >> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >> >> Alan >> >> >>> With regards, >>> Swogat Pradhan >>> >>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: >>> >>>> >>>> >>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Update: >>>>> Here is the log when creating a volume using cirros image: >>>>> >>>>> 2023-03-22 11:04:38.449 109 INFO >>>>> cinder.volume.flows.manager.create_volume >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>> specification: {'status': 'creating', 'volume_name': >>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>> 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>> } >>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>> >>>> >>>> As Adam Savage would say, well there's your problem ^^ (Image download >>>> 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s >>>> suggests you have a network issue. >>>> >>>> John Fulton previously stated your cinder-volume service at the edge >>>> site is not using the local ceph image store. Assuming you are deploying >>>> GlanceApiEdge service [1], then the cinder-volume service should be >>>> configured to use the local glance service [2]. You should check cinder's >>>> glance_api_servers to confirm it's the edge site's glance service. >>>> >>>> [1] >>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>> [2] >>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>> >>>> Alan >>>> >>>> >>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>> be removed. Use explicitly json instead in version 'xena' >>>>> category=FutureWarning) >>>>> >>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>> be removed. Use explicitly json instead in version 'xena' >>>>> category=FutureWarning) >>>>> >>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>> MB/s >>>>> 2023-03-22 11:11:14.998 109 INFO >>>>> cinder.volume.flows.manager.create_volume >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. 
>>>>> >>>>> The image is present in dcn02 store but still it downloaded the image >>>>> in 0.16 MB/s and then created the volume. >>>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> Hi Jhon, >>>>>> This seems to be an issue. >>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>>> parameter was specified to the respective cluster names but the config >>>>>> files were created in the name of ceph.conf and keyring was >>>>>> ceph.client.openstack.keyring. >>>>>> >>>>>> Which created issues in glance as well as the naming convention of >>>>>> the files didn't match the cluster names, so i had to manually rename the >>>>>> central ceph conf file as such: >>>>>> >>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>> total 16 >>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>> ceph_central.client.openstack.keyring >>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring >>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>> [root at dcn02-compute-0 ceph]# >>>>>> >>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>>>> respective clusters in both dcn01 and dcn02. >>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>> for accessing central ceph cluster. >>>>>> >>>>>> glance multistore config: >>>>>> [dcn02] >>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>> rbd_store_user=openstack >>>>>> rbd_store_pool=images >>>>>> rbd_thin_provisioning=False >>>>>> store_description=dcn02 rbd glance store >>>>>> >>>>>> [ceph_central] >>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>> rbd_store_user=openstack >>>>>> rbd_store_pool=images >>>>>> rbd_thin_provisioning=False >>>>>> store_description=Default glance store backend. >>>>>> >>>>>> >>>>>> With regards, >>>>>> Swogat Pradhan >>>>>> >>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>> wrote: >>>>>> >>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>> wrote: >>>>>>> > >>>>>>> > Hi, >>>>>>> > Seems like cinder is not using the local ceph. >>>>>>> >>>>>>> That explains the issue. It's a misconfiguration. >>>>>>> >>>>>>> I hope this is not a production system since the mailing list now has >>>>>>> the cinder.conf which contains passwords. >>>>>>> >>>>>>> The section that looks like this: >>>>>>> >>>>>>> [tripleo_ceph] >>>>>>> volume_backend_name=tripleo_ceph >>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>> rbd_user=openstack >>>>>>> rbd_pool=volumes >>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>> rbd_secret_uuid= >>>>>>> report_discard_supported=True >>>>>>> >>>>>>> Should be updated to refer to the local DCN ceph cluster and not the >>>>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>> >>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of the >>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. 
This >>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>> secret-get-value $FSID`. >>>>>>> >>>>>>> The documentation describes how to configure the central and DCN >>>>>>> sites >>>>>>> correctly but an error seems to have occurred while you were >>>>>>> following >>>>>>> it. >>>>>>> >>>>>>> >>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>> >>>>>>> John >>>>>>> >>>>>>> > >>>>>>> > Ceph Output: >>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>> > NAME SIZE PARENT FMT >>>>>>> PROT LOCK >>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>>>> excl >>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 >>>>>>> yes >>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 >>>>>>> yes >>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 >>>>>>> yes >>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 >>>>>>> yes >>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 >>>>>>> yes >>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 >>>>>>> yes >>>>>>> > >>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>> > NAME SIZE PARENT FMT >>>>>>> PROT LOCK >>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>> > >>>>>>> > Attached the cinder config. >>>>>>> > Please let me know how I can solve this issue. >>>>>>> > >>>>>>> > With regards, >>>>>>> > Swogat Pradhan >>>>>>> > >>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>>>> wrote: >>>>>>> >> >>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>> config. >>>>>>> >> >>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>> >>>>>>> >>> Update: >>>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>>> >>> The image size is 389 MB. >>>>>>> >>> >>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>> >>>>>>> >>>> Hi Jhon, >>>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>>> after importing from the central site. >>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>> time for the volume to get created. >>>>>>> >>>> >>>>>>> >>>> When launching an instance from volume the instance is getting >>>>>>> created properly without any errors. >>>>>>> >>>> >>>>>>> >>>> I tried to cache images in nova using >>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>> but getting checksum failed error. 
>>>>>>> >>>> >>>>>>> >>>> With regards, >>>>>>> >>>> Swogat Pradhan >>>>>>> >>>> >>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>> johfulto at redhat.com> wrote: >>>>>>> >>>>> >>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>> >>>>> wrote: >>>>>>> >>>>> > >>>>>>> >>>>> > Update: After restarting the nova services on the controller >>>>>>> and running the deploy script on the edge site, I was able to launch the VM >>>>>>> from volume. >>>>>>> >>>>> > >>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>> to the edge glance. >>>>>>> >>>>> >>>>>>> >>>>> Try following this document and making the same observations >>>>>>> in your >>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>> >>>>> >>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>> >>>>> >>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>> >>>>> NAME SIZE PARENT >>>>>>> >>>>> FMT PROT LOCK >>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 excl >>>>>>> >>>>> $ >>>>>>> >>>>> >>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>> which is on >>>>>>> >>>>> the same local ceph cluster. >>>>>>> >>>>> >>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>> encountering >>>>>>> >>>>> the streaming behavior described here: >>>>>>> >>>>> >>>>>>> >>>>> Ideally all images should reside in the central Glance and be >>>>>>> copied >>>>>>> >>>>> to DCN sites before instances of those images are booted on >>>>>>> DCN sites. >>>>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>>>> then the >>>>>>> >>>>> image will be streamed to the DCN site and then the image will >>>>>>> boot as >>>>>>> >>>>> an instance. This happens because Glance at the DCN site has >>>>>>> access to >>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>> booting of >>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>> advance, >>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>> >>>>> >>>>>>> >>>>> You can also exec into the cinder container at the DCN site and >>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>> >>>>> >>>>>>> >>>>> John >>>>>>> >>>>> >>>>>>> >>>>> > >>>>>>> >>>>> > I will try and create a new fresh image and test again then >>>>>>> update. >>>>>>> >>>>> > >>>>>>> >>>>> > With regards, >>>>>>> >>>>> > Swogat Pradhan >>>>>>> >>>>> > >>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>> >> >>>>>>> >>>>> >> Update: >>>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>>> down. 
>>>>>>> >>>>> >> >>>>>>> >>>>> >> >>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Hi Brendan, >>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 linux >>>>>>> bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>>>>> timed out so i waited for the volume to be created. >>>>>>> >>>>> >>> Once the volume was created i tried launching the instance >>>>>>> from the volume and still the instance is stuck in spawning state. >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon >>>>>>> [-] privsep daemon starting >>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon >>>>>>> [-] privsep process running with uid/gid: 0/0 >>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>> [-] privsep process running with capabilities (eff/prm/inh): >>>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>> [-] privsep daemon running as pid 185437 >>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>> >>>>> >>> Exit code: 2 >>>>>>> >>>>> >>> Stdout: '' >>>>>>> >>>>> >>> Stderr: '': >>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>> running command. >>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>> template mentioned here ?: >>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> The volume is already created and i do not understand why >>>>>>> the instance is stuck in spawning state. >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> With regards, >>>>>>> >>>>> >>> Swogat Pradhan >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>> bshephar at redhat.com> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Does your environment use different network interfaces >>>>>>> for each of the networks? Or does it have a bond with everything on it? >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>> single active/backup bond on 1Gbe nics. 
It?s worth checking the network >>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>> while spawning the instance. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. >>>>>>> So, based on that experience, from my perspective, is certainly sounds like >>>>>>> some kind of network issue. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Regards, >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Brendan Shephard >>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>> >>>>> >>>> Red Hat Australia >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Hi, >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> I tried to help someone with a similar issue some time >>>>>>> ago in this thread: >>>>>>> >>>>> >>>> >>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that >>>>>>> user, not sure if that could apply here. But is it possible that your nova >>>>>>> and neutron versions are different between central and edge site? Have you >>>>>>> restarted nova and neutron services on the compute nodes after >>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>> Maybe they can help narrow down the issue. >>>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>>> logs I probably would start "tearing down" rabbitmq. I didn't have to do >>>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>>> running, this will most likely impact client IO depending on your load. >>>>>>> Check out the rabbitmqctl commands. >>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>>>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>> a better advice. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Regards, >>>>>>> >>>>> >>>> Eugen >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Hi, >>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> With regards, >>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Hi >>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but >>>>>>> not due to packet >>>>>>> >>>>> >>>> loss. 
>>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> with regards, >>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Hi, >>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>> checked when >>>>>>> >>>>> >>>> launching the instance. >>>>>>> >>>>> >>>> I will check that and come back. >>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>> stuck at spawning >>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure >>>>>>> if packet loss >>>>>>> >>>>> >>>> causes this. >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> With regards, >>>>>>> >>>>> >>>> Swogat pradhan >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block >>>>>>> wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>> identical between >>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through the >>>>>>> tunnel? >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>>> 'cc' as i am not >>>>>>> >>>>> >>>> > getting email's from you. >>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>>>> list_policies -p >>>>>>> >>>>> >>>> / >>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>> priority >>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> >>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>> down when i am >>>>>>> >>>>> >>>> trying >>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>> spawning state and >>>>>>> >>>>> >>>> then >>>>>>> >>>>> >>>> > gets stuck. >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>>>>> sites. >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > With regards, >>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> > wrote: >>>>>>> >>>>> >>>> > >>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>> directly, i am >>>>>>> >>>>> >>>> checking >>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>> reply. >>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>> occurred. 
>>>>>>> >>>>> >>>> >> >>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>> activities in the >>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge site.* >>>>>>> >>>>> >>>> >> >>>>>>> >>>>> >>>> >> With regards, >>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>> >>>>> >>>> >> >>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> >> wrote: >>>>>>> >>>>> >>>> >> >>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>>>>> details: >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>> >>>>> >>>> >>> >>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>> >>>>> >>>> Started >>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>> >>>>> >>>> Started >>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>> >>>>> >>>> Started >>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>> >>>>> >>>> Started >>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but >>>>>>> the issue is >>>>>>> >>>>> >>>> still >>>>>>> >>>>> >>>> >>> present. >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>> cluster_status >>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>> >>>>> >>>> >>> Basics >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Versions >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>> >>>>> >>>> 3.8.3 >>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>> >>>>> >>>> 3.8.3 >>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>> >>>>> >>>> 3.8.3 >>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>> >>>>> >>>> >>> >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>> >>>>> >>>> RabbitMQ >>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Alarms >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> (none) >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> (none) >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Listeners >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>> inter-node and CLI >>>>>>> >>>>> >>>> tool >>>>>>> >>>>> >>>> >>> communication >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>>>>> AMQP 0-9-1 >>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>> inter-node and CLI >>>>>>> >>>>> >>>> tool >>>>>>> >>>>> >>>> >>> communication >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>>>>> AMQP 0-9-1 >>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 15672, 
protocol: http, purpose: HTTP API >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>> inter-node and CLI >>>>>>> >>>>> >>>> tool >>>>>>> >>>>> >>>> >>> communication >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>>>>> AMQP 0-9-1 >>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>> >>>>> >>>> interface: >>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> , >>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>>>> purpose: >>>>>>> >>>>> >>>> inter-node and >>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> , >>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>> amqp, purpose: AMQP >>>>>>> >>>>> >>>> 0-9-1 >>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>> >>>>> >>>> >>> Node: >>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>> >>>>> >>>> , >>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>> purpose: HTTP API >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Feature flags >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> With regards, >>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>> >>>>> >>>> >>> wrote: >>>>>>> >>>>> >>>> >>> >>>>>>> >>>>> >>>> >>>> Hi, >>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api >>>>>>> log. 
>>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>> The reply >>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>> after 60 seconds >>>>>>> >>>>> >>>> due to a >>>>>>> >>>>> >>>> >>>> missing queue >>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>> >>>>> >>>> Abandoning...: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> The reply >>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>> after 60 seconds >>>>>>> >>>>> >>>> due to a >>>>>>> >>>>> >>>> >>>> missing queue >>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>> >>>>> >>>> Abandoning...: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> The reply >>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>> after 60 seconds >>>>>>> >>>>> >>>> due to a >>>>>>> >>>>> >>>> >>>> missing queue >>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>> >>>>> >>>> Abandoning...: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Cache enabled >>>>>>> >>>>> >>>> with >>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>> exist, drop reply to >>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>> The reply >>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>> after 60 seconds >>>>>>> >>>>> >>>> due to a >>>>>>> >>>>> >>>> >>>> missing queue >>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>> >>>>> >>>> Abandoning...: >>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>> With regards, >>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>> where i am trying to >>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down >>>>>>> (openstack >>>>>>> >>>>> >>>> compute >>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart >>>>>>> the nova >>>>>>> >>>>> >>>> compute >>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - >>>>>>> -] Running >>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>> 2023-02-26 07:00:00 >>>>>>> >>>>> >>>> to >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> [instance: >>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>> successful on node >>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>> nova.virt.libvirt.driver >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> [instance: >>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>> supplied device >>>>>>> >>>>> >>>> name: >>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>>> names >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>> nova.virt.block_device >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> [instance: >>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>>>>> volume >>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Cache enabled >>>>>>> >>>>> >>>> with >>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Running >>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', >>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>> '--config-file', >>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>> '--privsep_sock_path', >>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Spawned new >>>>>>> >>>>> >>>> privsep >>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>> oslo.privsep.daemon [-] privsep >>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>> oslo.privsep.daemon [-] privsep >>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>> oslo.privsep.daemon [-] privsep >>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>> oslo.privsep.daemon [-] privsep >>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> Process >>>>>>> >>>>> >>>> >>>>> execution error >>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>>>> command. >>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>> nova.virt.libvirt.driver >>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>> [instance: >>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>> >>>>> >>>> >>>>> >>>>>>> >>>>> >>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From renliang at uniontech.com Thu Mar 23 07:01:05 2023
From: renliang at uniontech.com (任亮)
Date: Thu, 23 Mar 2023 15:01:05 +0800
Subject: [ironic]Questions about the use of the build image tool
Message-ID: 

Hi,
We are using diskimage-builder to make a custom image and there is a problem in extract_image. The error is: chroot: failed to run command 'bin/tar': No such file or directory. I found the /bin/tar file in the working directory /tmp/tmp.00yldwe1xt, but the error is still reported. It's not clear whether this is a problem with our custom image, or whether there are requirements that a custom image has to meet.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.log
Type: application/octet-stream
Size: 21357 bytes
Desc: not available
URL: 

From renliang at uniontech.com Thu Mar 23 08:41:00 2023
From: renliang at uniontech.com (任亮)
Date: Thu, 23 Mar 2023 16:41:00 +0800
Subject: Re: [ironic]Questions about the use of the build image tool
Message-ID: 

I have found the cause of the problem: tar does not exist in the working directory. I wonder whether there are requirements on the base image when building from a base image. Thank you.

From swogatpradhan22 at gmail.com Thu Mar 23 12:20:35 2023
From: swogatpradhan22 at gmail.com (Swogat Pradhan)
Date: Thu, 23 Mar 2023 17:50:35 +0530
Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo
In-Reply-To: 
References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com>
Message-ID: 

Hi,
Is this bind not required for the cinder_scheduler container?
"/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind",
I do not see this particular bind on the cinder scheduler containers on my controller nodes.

With regards,
Swogat Pradhan

On Thu, Mar 23, 2023 at 2:46 AM Swogat Pradhan wrote: > Cinder volume config: > > [tripleo_ceph] > volume_backend_name=tripleo_ceph > volume_driver=cinder.volume.drivers.rbd.RBDDriver > rbd_user=openstack > rbd_pool=volumes > rbd_flatten_volume_from_snapshot=False > rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b > report_discard_supported=True > rbd_ceph_conf=/etc/ceph/dcn02.conf > rbd_cluster_name=dcn02 > > Glance api config: > > [dcn02] > rbd_store_ceph_conf=/etc/ceph/dcn02.conf > rbd_store_user=openstack > rbd_store_pool=images > rbd_thin_provisioning=False > store_description=dcn02 rbd glance store > [ceph] > rbd_store_ceph_conf=/etc/ceph/ceph.conf > rbd_store_user=openstack > rbd_store_pool=images > rbd_thin_provisioning=False > store_description=Default glance store backend. > > On Thu, Mar 23, 2023 at 2:29 AM Swogat Pradhan > wrote: > >> I still have the same issue, I'm not sure what's left to try. >> All the pods are now in a healthy state, I am getting log entries 3 mins >> after I hit the create volume button in cinder-volume when I try to create >> a volume with an image. >> And the volumes are just stuck in creating state for more than 20 mins >> now.
>> >> Cinder logs: >> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >> cinder-volume RPC version 3.17 as minimum service version. >> 2023-03-22 20:34:59.166 108 INFO >> cinder.volume.flows.manager.create_volume >> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >> specification: {'status': 'creating', 'volume_name': >> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >> tzinfo=datetime.timezone.utc), 'locations': [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'metadata': {'store': 'dcn02'}}], 'direct_url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >> 'owner_specified.openstack.object': 'images/cirros', >> 'owner_specified.openstack.sha256': ''}}, 'image_service': >> } >> >> With regards, >> Swogat Pradhan >> >> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: >> >>> >>> >>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Hi Adam, >>>> The systems are in same LAN, in this case it seemed like the image was >>>> getting pulled from the central site which was caused due to an >>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>> directory, which seems to have been resolved after the changes i made to >>>> fix it. >>>> >>>> Right now the glance api podman is running in unhealthy state and the >>>> podman logs don't show any error whatsoever and when issued the command >>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>> site, which is why cinder is throwing an error stating: >>>> >>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>> finding address for >>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>> Unable to establish connection to >>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>> NewConnectionError('>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>> ECONNREFUSED',)) >>>> >>>> Now i need to find out why the port is not listed as the glance service >>>> is running, which i am not sure how to find out. >>>> >>> >>> One other thing to investigate is whether your deployment includes this >>> patch [1]. If it does, then bear in mind >>> the glance-api service running at the edge site will be an "internal" >>> (non public facing) instance that uses port 9293 >>> instead of 9292. You should familiarize yourself with the release note >>> [2]. >>> >>> [1] >>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>> [2] >>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>> >>> Alan >>> >>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop wrote: >>>> >>>>> >>>>> >>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> Update: >>>>>> Here is the log when creating a volume using cirros image: >>>>>> >>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>> cinder.volume.flows.manager.create_volume >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>> specification: {'status': 'creating', 'volume_name': >>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> [{'url': >>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>> 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>> } >>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>> >>>>> >>>>> As Adam Savage would say, well there's your problem ^^ (Image download >>>>> 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and 0.16 MB/s >>>>> suggests you have a network issue. >>>>> >>>>> John Fulton previously stated your cinder-volume service at the edge >>>>> site is not using the local ceph image store. Assuming you are deploying >>>>> GlanceApiEdge service [1], then the cinder-volume service should be >>>>> configured to use the local glance service [2]. You should check cinder's >>>>> glance_api_servers to confirm it's the edge site's glance service. >>>>> >>>>> [1] >>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>> [2] >>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>> >>>>> Alan >>>>> >>>>> >>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>> category=FutureWarning) >>>>>> >>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>> category=FutureWarning) >>>>>> >>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>> MB/s >>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>> cinder.volume.flows.manager.create_volume >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>> >>>>>> The image is present in dcn02 store but still it downloaded the image >>>>>> in 0.16 MB/s and then created the volume. >>>>>> >>>>>> With regards, >>>>>> Swogat Pradhan >>>>>> >>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> Hi Jhon, >>>>>>> This seems to be an issue. >>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>>>> parameter was specified to the respective cluster names but the config >>>>>>> files were created in the name of ceph.conf and keyring was >>>>>>> ceph.client.openstack.keyring. >>>>>>> >>>>>>> Which created issues in glance as well as the naming convention of >>>>>>> the files didn't match the cluster names, so i had to manually rename the >>>>>>> central ceph conf file as such: >>>>>>> >>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>> total 16 >>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>> ceph_central.client.openstack.keyring >>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>> ceph.client.openstack.keyring >>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>> >>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>>>>> respective clusters in both dcn01 and dcn02. >>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>> for accessing central ceph cluster. >>>>>>> >>>>>>> glance multistore config: >>>>>>> [dcn02] >>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>> rbd_store_user=openstack >>>>>>> rbd_store_pool=images >>>>>>> rbd_thin_provisioning=False >>>>>>> store_description=dcn02 rbd glance store >>>>>>> >>>>>>> [ceph_central] >>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>> rbd_store_user=openstack >>>>>>> rbd_store_pool=images >>>>>>> rbd_thin_provisioning=False >>>>>>> store_description=Default glance store backend. >>>>>>> >>>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>> wrote: >>>>>>> >>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>> wrote: >>>>>>>> > >>>>>>>> > Hi, >>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>> >>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>> >>>>>>>> I hope this is not a production system since the mailing list now >>>>>>>> has >>>>>>>> the cinder.conf which contains passwords. 
>>>>>>>> >>>>>>>> The section that looks like this: >>>>>>>> >>>>>>>> [tripleo_ceph] >>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>> rbd_user=openstack >>>>>>>> rbd_pool=volumes >>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>> rbd_secret_uuid= >>>>>>>> report_discard_supported=True >>>>>>>> >>>>>>>> Should be updated to refer to the local DCN ceph cluster and not the >>>>>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>> >>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of >>>>>>>> the >>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>> secret-get-value $FSID`. >>>>>>>> >>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>> sites >>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>> following >>>>>>>> it. >>>>>>>> >>>>>>>> >>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>> >>>>>>>> John >>>>>>>> >>>>>>>> > >>>>>>>> > Ceph Output: >>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>> > NAME SIZE PARENT FMT >>>>>>>> PROT LOCK >>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>>>>> excl >>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB 2 >>>>>>>> yes >>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB 2 >>>>>>>> yes >>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB 2 >>>>>>>> yes >>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB 2 >>>>>>>> yes >>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB 2 >>>>>>>> yes >>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB 2 >>>>>>>> yes >>>>>>>> > >>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>> > NAME SIZE PARENT >>>>>>>> FMT PROT LOCK >>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>> > >>>>>>>> > Attached the cinder config. >>>>>>>> > Please let me know how I can solve this issue. >>>>>>>> > >>>>>>>> > With regards, >>>>>>>> > Swogat Pradhan >>>>>>>> > >>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>>>>> wrote: >>>>>>>> >> >>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>> config. 
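As an illustration of the fix described above -- a minimal sketch only, with every value a placeholder to be checked against what your own deployment actually renders -- the corrected edge backend would be expected to look something like:

[tripleo_ceph]
volume_backend_name=tripleo_ceph
volume_driver=cinder.volume.drivers.rbd.RBDDriver
# the local dcn02 cluster's conf on these nodes, not the central one
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=openstack
rbd_pool=volumes
rbd_flatten_volume_from_snapshot=False
# TripleO convention: the FSID of that same dcn02 cluster; the value here is
# simply the fsid that appears in the dcn02 image location in the log further up
rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b
report_discard_supported=True

The conf/FSID/libvirt-secret pairing can be cross-checked on a dcn02 node with:

$ sudo grep fsid /var/lib/tripleo-config/ceph/ceph.conf   (or /etc/ceph/ceph.conf inside the cinder_volume container)
$ sudo podman exec nova_virtsecretd virsh secret-get-value a8d5f1f5-48e7-5ede-89ab-8aca59b6397b

Once that is in place (and glance_api_servers in the same cinder.conf points at the edge glance endpoint rather than the central 172.25.228.253 one -- port 9293 if the internal-instance patch mentioned earlier is deployed), a volume freshly created from a pre-imported image should show an images/<id>@snap entry in the PARENT column of `rbd -p volumes ls -l`, instead of the empty PARENT column in the output above, which is the signature of streaming rather than COW cloning.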
>>>>>>>> >> >>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>> >>>>>>>> >>> Update: >>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>>>> >>> The image size is 389 MB. >>>>>>>> >>> >>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>> >>>>>>>> >>>> Hi Jhon, >>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>>>> after importing from the central site. >>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>> time for the volume to get created. >>>>>>>> >>>> >>>>>>>> >>>> When launching an instance from volume the instance is getting >>>>>>>> created properly without any errors. >>>>>>>> >>>> >>>>>>>> >>>> I tried to cache images in nova using >>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>> but getting checksum failed error. >>>>>>>> >>>> >>>>>>>> >>>> With regards, >>>>>>>> >>>> Swogat Pradhan >>>>>>>> >>>> >>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>> johfulto at redhat.com> wrote: >>>>>>>> >>>>> >>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>> >>>>> wrote: >>>>>>>> >>>>> > >>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>> launch the VM from volume. >>>>>>>> >>>>> > >>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>> to the edge glance. >>>>>>>> >>>>> >>>>>>>> >>>>> Try following this document and making the same observations >>>>>>>> in your >>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>> >>>>> >>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>> >>>>> >>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>> >>>>> FMT PROT LOCK >>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>> excl >>>>>>>> >>>>> $ >>>>>>>> >>>>> >>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>> which is on >>>>>>>> >>>>> the same local ceph cluster. >>>>>>>> >>>>> >>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>> encountering >>>>>>>> >>>>> the streaming behavior described here: >>>>>>>> >>>>> >>>>>>>> >>>>> Ideally all images should reside in the central Glance and be >>>>>>>> copied >>>>>>>> >>>>> to DCN sites before instances of those images are booted on >>>>>>>> DCN sites. >>>>>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>>>>> then the >>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>> will boot as >>>>>>>> >>>>> an instance. 
This happens because Glance at the DCN site has >>>>>>>> access to >>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>> booting of >>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>> advance, >>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>> >>>>> >>>>>>>> >>>>> You can also exec into the cinder container at the DCN site >>>>>>>> and >>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>> >>>>> >>>>>>>> >>>>> John >>>>>>>> >>>>> >>>>>>>> >>>>> > >>>>>>>> >>>>> > I will try and create a new fresh image and test again then >>>>>>>> update. >>>>>>>> >>>>> > >>>>>>>> >>>>> > With regards, >>>>>>>> >>>>> > Swogat Pradhan >>>>>>>> >>>>> > >>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> Update: >>>>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>>>> down. >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>>>>> >>>>> >>> I used a cirros image to launch instance but the instance >>>>>>>> timed out so i waited for the volume to be created. >>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon >>>>>>>> [-] privsep daemon starting >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon >>>>>>>> [-] privsep process running with uid/gid: 0/0 >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>> [-] privsep process running with capabilities (eff/prm/inh): >>>>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>> [-] privsep daemon running as pid 185437 >>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>> >>>>> >>> Stdout: '' >>>>>>>> >>>>> >>> Stderr: '': >>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>> running command. 
>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>> template mentioned here ?: >>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> The volume is already created and i do not understand why >>>>>>>> the instance is stuck in spawning state. >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> With regards, >>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>> bshephar at redhat.com> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Does your environment use different network interfaces >>>>>>>> for each of the networks? Or does it have a bond with everything on it? >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>> while spawning the instance. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. >>>>>>>> So, based on that experience, from my perspective, is certainly sounds like >>>>>>>> some kind of network issue. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Regards, >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Hi, >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some time >>>>>>>> ago in this thread: >>>>>>>> >>>>> >>>> >>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>> Maybe they can help narrow down the issue. >>>>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>>>> logs I probably would start "tearing down" rabbitmq. 
I didn't have to do >>>>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>>>> running, this will most likely impact client IO depending on your load. >>>>>>>> Check out the rabbitmqctl commands. >>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables >>>>>>>> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>> a better advice. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Regards, >>>>>>>> >>>>> >>>> Eugen >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Hi, >>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> With regards, >>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Hi >>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but >>>>>>>> not due to packet >>>>>>>> >>>>> >>>> loss. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> with regards, >>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Hi, >>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>> checked when >>>>>>>> >>>>> >>>> launching the instance. >>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>> stuck at spawning >>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not sure >>>>>>>> if packet loss >>>>>>>> >>>>> >>>> causes this. >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> With regards, >>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>> eblock at nde.ag> wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>> identical between >>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>> the tunnel? >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>>>> 'cc' as i am not >>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>>>>> list_policies -p >>>>>>>> >>>>> >>>> / >>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... 
>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>> priority >>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> >>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>> down when i am >>>>>>>> >>>>> >>>> trying >>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>> spawning state and >>>>>>>> >>>>> >>>> then >>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the edge >>>>>>>> sites. >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > With regards, >>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> > wrote: >>>>>>>> >>>>> >>>> > >>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>> directly, i am >>>>>>>> >>>>> >>>> checking >>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>> reply. >>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>> occurred. >>>>>>>> >>>>> >>>> >> >>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>> activities in the >>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>> site.* >>>>>>>> >>>>> >>>> >> >>>>>>>> >>>>> >>>> >> With regards, >>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>> >>>>> >>>> >> >>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> >> wrote: >>>>>>>> >>>>> >>>> >> >>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the >>>>>>>> details: >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>> >>>>> >>>> >>> >>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>> >>>>> >>>> Started >>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>> >>>>> >>>> Started >>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>> >>>>> >>>> Started >>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>> >>>>> >>>> Started >>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>> but the issue is >>>>>>>> >>>>> >>>> still >>>>>>>> >>>>> >>>> >>> present. >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>> cluster_status >>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Versions >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>> >>>>> >>>> >>> >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> (none) >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> (none) >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>> inter-node and CLI >>>>>>>> >>>>> >>>> tool >>>>>>>> >>>>> >>>> >>> communication >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose: >>>>>>>> AMQP 0-9-1 >>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>> inter-node and CLI >>>>>>>> >>>>> >>>> tool >>>>>>>> >>>>> >>>> >>> communication >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose: >>>>>>>> AMQP 0-9-1 >>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at 
overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>> inter-node and CLI >>>>>>>> >>>>> >>>> tool >>>>>>>> >>>>> >>>> >>> communication >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose: >>>>>>>> AMQP 0-9-1 >>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>> >>>>> >>>> interface: >>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> , >>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>>>>> purpose: >>>>>>>> >>>>> >>>> inter-node and >>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> , >>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>> amqp, purpose: AMQP >>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>> >>>>> >>>> >>> Node: >>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>> >>>>> >>>> , >>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>> purpose: HTTP API >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>> >>>>> >>>> >>> >>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api >>>>>>>> log. 
>>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>> -] The reply >>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>> after 60 seconds >>>>>>>> >>>>> >>>> due to a >>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>> -] The reply >>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>> after 60 seconds >>>>>>>> >>>>> >>>> due to a >>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>> -] The reply >>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>> after 60 seconds >>>>>>>> >>>>> >>>> due to a >>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default] >>>>>>>> Cache enabled >>>>>>>> >>>>> >>>> with >>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] >>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>> exist, drop reply to >>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>> -] The reply >>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>> after 60 seconds >>>>>>>> >>>>> >>>> due to a >>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>> where i am trying to >>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>> down (openstack >>>>>>>> >>>>> >>>> compute >>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>> restart the nova >>>>>>>> >>>>> >>>> compute >>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager >>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - >>>>>>>> -] Running >>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>> 2023-02-26 07:00:00 >>>>>>>> >>>>> >>>> to >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. 
>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] [instance: >>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>> successful on node >>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>> nova.virt.libvirt.driver >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] [instance: >>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>> supplied device >>>>>>>> >>>>> >>>> name: >>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>>>> names >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>> nova.virt.block_device >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] [instance: >>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with >>>>>>>> volume >>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] Cache enabled >>>>>>>> >>>>> >>>> with >>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] Running >>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>> '--config-file', >>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context', >>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>> '--privsep_sock_path', >>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] Spawned new >>>>>>>> >>>>> >>>> privsep >>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] Process >>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>>>>> command. >>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>> nova.virt.libvirt.driver >>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>> default] [instance: >>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>> image >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
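(A rough checklist distilled from the suggestions above -- paths and container names assume a stock containerized TripleO deployment, so adjust them to whatever your nodes actually run -- for narrowing this down while reproducing a spawn:

$ openstack compute service list --service nova-compute
    # does the edge hypervisor only flap to "down" while an instance is spawning?
$ ping -c 100 -M do -s 1472 <controller internal_api IP>
    # packet-loss / path-MTU check through the tunnel, assuming 1500 MTU end to end
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -iE 'amqp|rabbit|heartbeat'
    # on the edge compute: look for RabbitMQ disconnects during the spawn
$ sudo podman exec $(sudo podman ps -qf name=rabbitmq) rabbitmqctl list_queues name messages | grep reply_
    # on a controller: do the reply_* queues named in the nova-conductor errors still exist?

None of these commands change anything; they only help tell apart plain packet loss, an MTU mismatch on the tunnel, and the stale reply queues seen in the conductor log.)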
>>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>> >>>>> >>>> >>>>> >>>>>>>> >>>>> >>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>>>>>> >>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 23 12:35:02 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 23 Mar 2023 13:35:02 +0100 Subject: [nova]host cpu reserve In-Reply-To: References: Message-ID: Hey, It's a config option for nova-compute: https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus Also related ones are regarding ram and disk. You might also find a good idea to apply a cgroups rule to ensure you _really_ have CPU reserved, like this: https://gist.github.com/noonedeadpunk/a4e691e64da031084c071b554a5b40cd ??, 23 ???. 2023 ?., 08:48 Nguy?n H?u Kh?i : > Hello guys. > I am trying google for nova host cpu reserve to prevent host overload but > I cannot find any resource about it. Could you give me some information? > Thanks. > Nguyen Huu Khoi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 23 12:36:53 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 23 Mar 2023 13:36:53 +0100 Subject: [nova]host cpu reserve In-Reply-To: References: Message-ID: Forget my reply, Sean's proposal is way better and the correct one. ??, 23 ???. 2023 ?., 13:35 Dmitriy Rabotyagov : > Hey, > > It's a config option for nova-compute: > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > Also related ones are regarding ram and disk. > > You might also find a good idea to apply a cgroups rule to ensure you > _really_ have CPU reserved, like this: > https://gist.github.com/noonedeadpunk/a4e691e64da031084c071b554a5b40cd > > > > ??, 23 ???. 2023 ?., 08:48 Nguy?n H?u Kh?i : > >> Hello guys. >> I am trying google for nova host cpu reserve to prevent host overload but >> I cannot find any resource about it. Could you give me some information? >> Thanks. >> Nguyen Huu Khoi >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Thu Mar 23 12:55:52 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 19:55:52 +0700 Subject: [nova]host cpu reserve In-Reply-To: References: Message-ID: Thank both of you much. I am using cpu allocation ratio but I dont understand how host cpu can work if all vm using 100% cpu. Vmware have cpu and ram reserve for host. On Thu, Mar 23, 2023, 7:44 PM Dmitriy Rabotyagov wrote: > Forget my reply, Sean's proposal is way better and the correct one. > > ??, 23 ???. 2023 ?., 13:35 Dmitriy Rabotyagov : > >> Hey, >> >> It's a config option for nova-compute: >> >> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus >> >> Also related ones are regarding ram and disk. >> >> You might also find a good idea to apply a cgroups rule to ensure you >> _really_ have CPU reserved, like this: >> https://gist.github.com/noonedeadpunk/a4e691e64da031084c071b554a5b40cd >> >> >> >> ??, 23 ???. 2023 ?., 08:48 Nguy?n H?u Kh?i : >> >>> Hello guys. 
>>> I am trying google for nova host cpu reserve to prevent host overload >>> but I cannot find any resource about it. Could you give me some information? >>> Thanks. >>> Nguyen Huu Khoi >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 23 13:13:52 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 23 Mar 2023 14:13:52 +0100 Subject: [nova]host cpu reserve In-Reply-To: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Just to double check with you, given that you have cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 physical cores, then it should be defined like: [compute] cpu_shared_set="2-32,34-64,66-96,98-128"? > in general you shoudl reserve the first core on each cpu socket for the host os. > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted > form the cpu_shared_set and cpu_dedicated_set ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > > generally you should not > you can use it but the preferd way to do this is use > cpu_shared_set and cpu_dedicated_set (in old releases you would have used vcpu_pin_set) > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > if you dont need cpu pinning just use cpu_share_set to spcify the cores that can be sued for floatign vms > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified are reseved for host use. > > https://that.guru/blog/cpu-resources/ and https://that.guru/blog/cpu-resources-redux/ > > have some useful info but that mostly looking at it form a cpu pinning angel althoguh the secon one covers cpu_shared_set, > > the issue with usein > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > is that you have to multiple the number of cores that are resverved by the > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > which means if you decide to manage that via placement api by using > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio instead > then you need to update your nova.conf to modify the reservationfi you change the allocation ratio. > > if instead you use cpu_shared_set and cpu_dedicated_set > you are specifying exactly which cpus nova can use and the allocation ration nolonger needs to be conisderd. > > in general you shoudl reserve the first core on each cpu socket for the host os. > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted > form the cpu_shared_set and cpu_dedicated_set > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > Hello guys. > > I am trying google for nova host cpu reserve to prevent host overload but I > > cannot find any resource about it. Could you give me some information? > > Thanks. > > Nguyen Huu Khoi > > From nguyenhuukhoinw at gmail.com Thu Mar 23 13:35:02 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 20:35:02 +0700 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Ok. I will try to understand it. I will let you know when I get it. 
Many thanks for your help. :) On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov wrote: > Just to double check with you, given that you have > cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 > physical cores, then it should be defined like: > > [compute] > cpu_shared_set="2-32,34-64,66-96,98-128"? > > > in general you shoudl reserve the first core on each cpu socket for the > host os. > > if you use hyperthreading then both hyperthread of the first cpu core on > each socket shoudl be omitted > > form the cpu_shared_set and cpu_dedicated_set > > ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > > > > generally you should not > > you can use it but the preferd way to do this is use > > cpu_shared_set and cpu_dedicated_set (in old releases you would have > used vcpu_pin_set) > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > > > if you dont need cpu pinning just use cpu_share_set to spcify the cores > that can be sued for floatign vms > > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified > are reseved for host use. > > > > https://that.guru/blog/cpu-resources/ and > https://that.guru/blog/cpu-resources-redux/ > > > > have some useful info but that mostly looking at it form a cpu pinning > angel althoguh the secon one covers cpu_shared_set, > > > > the issue with usein > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > > > is that you have to multiple the number of cores that are resverved by > the > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > > > which means if you decide to manage that via placement api by using > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio > instead > > then you need to update your nova.conf to modify the reservationfi you > change the allocation ratio. > > > > if instead you use cpu_shared_set and cpu_dedicated_set > > you are specifying exactly which cpus nova can use and the allocation > ration nolonger needs to be conisderd. > > > > in general you shoudl reserve the first core on each cpu socket for the > host os. > > if you use hyperthreading then both hyperthread of the first cpu core on > each socket shoudl be omitted > > form the cpu_shared_set and cpu_dedicated_set > > > > > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > > Hello guys. > > > I am trying google for nova host cpu reserve to prevent host overload > but I > > > cannot find any resource about it. Could you give me some information? > > > Thanks. > > > Nguyen Huu Khoi > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From senrique at redhat.com Thu Mar 23 13:50:48 2023 From: senrique at redhat.com (Sofia Enriquez) Date: Thu, 23 Mar 2023 13:50:48 +0000 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) In-Reply-To: References: Message-ID: Hi Sylvain, I hope you're doing well. Apologies for the delay in responding to your previous message. I wanted to suggest a cross-project topic with Cinder that involves adding support for NFS encryption. To complement the work on Cinder[3], I've proposed two patches [1][2] and would appreciate any feedback you may have. 
I believe it would be beneficial to discuss my approach during the PTG, but I'm open to discussing it during a weekly meeting as well. I've looked at the etherpad and am considering the first slot on Wednesday or the last slot on Thursday or Friday. Please let me know your thoughts. Thank you, Sofia [1] https://review.opendev.org/c/openstack/nova/+/854030 [2] https://review.opendev.org/c/openstack/nova/+/870012 [3] https://review.opendev.org/q/topic:bp%252Fnfs-volume-encryption On Wed, Mar 22, 2023 at 9:46?AM Sylvain Bauza wrote: > Hey folks, > > As a reminder, the Nova community will discuss at the vPTG. You can see > the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg > > Our agenda will be from Tuesday to Friday, everyday between 1300UTC and > 1700UTC. Connection details are in the etherpad above, but you can also use > PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo > room for all the discussions) > > You can't stick around for 4 hours x 4 days ? Heh, no worries ! > If you (as an operator or a developer) want to engage with us (and we'd > love this honestly), you have two possibilities : > - either you prefer to listen (and talk) to some topics you've seen in > the agenda, and then add your IRC nick (details how to use IRC are > explained by [1]) on the topics you want. Once we start to discuss about > those topics, I'll ping the courtesy ping list of each topic on > #openstack-nova. Just make sure you're around in the IRC channel. > - or you prefer to engage with us about some pain points or some feature > requests, and then the right time is the Nova Operator Hour that will be on > *Tuesday 1500UTC*. We have a specific etherpad for this session : > https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where you > can preemptively add your thoughts or concerns. > > Anyway, we are eager to meet you all ! > > Oh, last point, given we will be at the vPTG, next week's weekly meeting > on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk > the #openstack-nova channel ;-) > > See you next week ! > -Sylvain > > [1] > https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html > > > -- Sof?a Enriquez she/her Software Engineer Red Hat PnT IRC: @enriquetaso @RedHat Red Hat Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Thu Mar 23 13:50:59 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 20:50:59 +0700 Subject: [nova]host cpu reserve In-Reply-To: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Could you help me to explain how host cpu handle with cpu ratio? On Thu, Mar 23, 2023, 7:10 PM Sean Mooney wrote: > generally you should not > you can use it but the preferd way to do this is use > cpu_shared_set and cpu_dedicated_set (in old releases you would have used > vcpu_pin_set) > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > if you dont need cpu pinning just use cpu_share_set to spcify the cores > that can be sued for floatign vms > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified > are reseved for host use. 
> > https://that.guru/blog/cpu-resources/ and > https://that.guru/blog/cpu-resources-redux/ > > have some useful info but that mostly looking at it form a cpu pinning > angel althoguh the secon one covers cpu_shared_set, > > the issue with usein > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > is that you have to multiple the number of cores that are resverved by the > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > which means if you decide to manage that via placement api by using > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio > instead > then you need to update your nova.conf to modify the reservationfi you > change the allocation ratio. > > if instead you use cpu_shared_set and cpu_dedicated_set > you are specifying exactly which cpus nova can use and the allocation > ration nolonger needs to be conisderd. > > in general you shoudl reserve the first core on each cpu socket for the > host os. > if you use hyperthreading then both hyperthread of the first cpu core on > each socket shoudl be omitted > form the cpu_shared_set and cpu_dedicated_set > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > Hello guys. > > I am trying google for nova host cpu reserve to prevent host overload > but I > > cannot find any resource about it. Could you give me some information? > > Thanks. > > Nguyen Huu Khoi > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 23 13:51:14 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 23 Mar 2023 14:51:14 +0100 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Just in case, you DO have options to control cpu and ram reservation for the hypervisor. It's just more about that, that it's not the best way to do it, especially if you're overcommitting, as things in real life are more complicated then just defining the amount of reserved CPUs. For example, if you have cpu_allocation_ratio set to 3, then you're getting 3 times more CPUs to signup VMs then you actually have (cores*sockets*threads*cpu_allocation_ratio). With that you really can't set any decent amount of reserved CPUs that will 100% ensure that hypervisor will be able to gain required resources at any given time. So with that approach the only option is to disable cpu overcommit, but even then you might get CPU in socket 1 fully utilized which might have negative side-effects for the hypervisor. And based on that, as Sean has mentioned, you can tell nova to explicitly exclude specific cores from being utilized, which will make them reserved for the hypervisor. ??, 23 ???. 2023??. ? 14:35, Nguy?n H?u Kh?i : > > Ok. I will try to understand it. I will let you know when I get it. > Many thanks for your help. :) > > On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov wrote: >> >> Just to double check with you, given that you have >> cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 >> physical cores, then it should be defined like: >> >> [compute] >> cpu_shared_set="2-32,34-64,66-96,98-128"? >> >> > in general you shoudl reserve the first core on each cpu socket for the host os. >> > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted >> > form the cpu_shared_set and cpu_dedicated_set >> >> ??, 23 ???. 
2023??. ? 13:12, Sean Mooney : >> > >> > generally you should not >> > you can use it but the preferd way to do this is use >> > cpu_shared_set and cpu_dedicated_set (in old releases you would have used vcpu_pin_set) >> > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set >> > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set >> > >> > if you dont need cpu pinning just use cpu_share_set to spcify the cores that can be sued for floatign vms >> > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified are reseved for host use. >> > >> > https://that.guru/blog/cpu-resources/ and https://that.guru/blog/cpu-resources-redux/ >> > >> > have some useful info but that mostly looking at it form a cpu pinning angel althoguh the secon one covers cpu_shared_set, >> > >> > the issue with usein >> > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus >> > >> > is that you have to multiple the number of cores that are resverved by the >> > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio >> > >> > which means if you decide to manage that via placement api by using >> > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio instead >> > then you need to update your nova.conf to modify the reservationfi you change the allocation ratio. >> > >> > if instead you use cpu_shared_set and cpu_dedicated_set >> > you are specifying exactly which cpus nova can use and the allocation ration nolonger needs to be conisderd. >> > >> > in general you shoudl reserve the first core on each cpu socket for the host os. >> > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted >> > form the cpu_shared_set and cpu_dedicated_set >> > >> > >> > >> > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: >> > > Hello guys. >> > > I am trying google for nova host cpu reserve to prevent host overload but I >> > > cannot find any resource about it. Could you give me some information? >> > > Thanks. >> > > Nguyen Huu Khoi >> > >> > From nguyenhuukhoinw at gmail.com Thu Mar 23 13:57:42 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 20:57:42 +0700 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Hello Dmitriy Rabotyagov and Sean Mooney, very thank you for your sharing. Nguyen Huu Khoi On Thu, Mar 23, 2023 at 8:50?PM Nguy?n H?u Kh?i wrote: > Could you help me to explain how host cpu handle with cpu ratio? > > On Thu, Mar 23, 2023, 7:10 PM Sean Mooney wrote: > >> generally you should not >> you can use it but the preferd way to do this is use >> cpu_shared_set and cpu_dedicated_set (in old releases you would have used >> vcpu_pin_set) >> >> https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set >> >> https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set >> >> if you dont need cpu pinning just use cpu_share_set to spcify the cores >> that can be sued for floatign vms >> when you use cpu_shared_set and cpu_dedicated_set any cpu not specified >> are reseved for host use. 
>> >> https://that.guru/blog/cpu-resources/ and >> https://that.guru/blog/cpu-resources-redux/ >> >> have some useful info but that mostly looking at it form a cpu pinning >> angel althoguh the secon one covers cpu_shared_set, >> >> the issue with usein >> >> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus >> >> is that you have to multiple the number of cores that are resverved by >> the >> >> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio >> >> which means if you decide to manage that via placement api by using >> >> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio >> instead >> then you need to update your nova.conf to modify the reservationfi you >> change the allocation ratio. >> >> if instead you use cpu_shared_set and cpu_dedicated_set >> you are specifying exactly which cpus nova can use and the allocation >> ration nolonger needs to be conisderd. >> >> in general you shoudl reserve the first core on each cpu socket for the >> host os. >> if you use hyperthreading then both hyperthread of the first cpu core on >> each socket shoudl be omitted >> form the cpu_shared_set and cpu_dedicated_set >> >> >> >> On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: >> > Hello guys. >> > I am trying google for nova host cpu reserve to prevent host overload >> but I >> > cannot find any resource about it. Could you give me some information? >> > Thanks. >> > Nguyen Huu Khoi >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Thu Mar 23 14:26:18 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Thu, 23 Mar 2023 21:26:18 +0700 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: Hi. Too many new things for me. It is interesting. I will read more. Thank you Dmitriy Rabotyagov Nice to meet you! Nguyen Huu Khoi On Thu, Mar 23, 2023 at 8:58?PM Dmitriy Rabotyagov wrote: > Just in case, you DO have options to control cpu and ram reservation > for the hypervisor. It's just more about that, that it's not the best > way to do it, especially if you're overcommitting, as things in real > life are more complicated then just defining the amount of reserved > CPUs. > > For example, if you have cpu_allocation_ratio set to 3, then you're > getting 3 times more CPUs to signup VMs then you actually have > (cores*sockets*threads*cpu_allocation_ratio). With that you really > can't set any decent amount of reserved CPUs that will 100% ensure > that hypervisor will be able to gain required resources at any given > time. So with that approach the only option is to disable cpu > overcommit, but even then you might get CPU in socket 1 fully utilized > which might have negative side-effects for the hypervisor. > > And based on that, as Sean has mentioned, you can tell nova to > explicitly exclude specific cores from being utilized, which will make > them reserved for the hypervisor. > > ??, 23 ???. 2023??. ? 14:35, Nguy?n H?u Kh?i : > > > > Ok. I will try to understand it. I will let you know when I get it. > > Many thanks for your help. 
:) > > > > On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov < > noonedeadpunk at gmail.com> wrote: > >> > >> Just to double check with you, given that you have > >> cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 > >> physical cores, then it should be defined like: > >> > >> [compute] > >> cpu_shared_set="2-32,34-64,66-96,98-128"? > >> > >> > in general you shoudl reserve the first core on each cpu socket for > the host os. > >> > if you use hyperthreading then both hyperthread of the first cpu core > on each socket shoudl be omitted > >> > form the cpu_shared_set and cpu_dedicated_set > >> > >> ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > >> > > >> > generally you should not > >> > you can use it but the preferd way to do this is use > >> > cpu_shared_set and cpu_dedicated_set (in old releases you would have > used vcpu_pin_set) > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > >> > > >> > if you dont need cpu pinning just use cpu_share_set to spcify the > cores that can be sued for floatign vms > >> > when you use cpu_shared_set and cpu_dedicated_set any cpu not > specified are reseved for host use. > >> > > >> > https://that.guru/blog/cpu-resources/ and > https://that.guru/blog/cpu-resources-redux/ > >> > > >> > have some useful info but that mostly looking at it form a cpu > pinning angel althoguh the secon one covers cpu_shared_set, > >> > > >> > the issue with usein > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > >> > > >> > is that you have to multiple the number of cores that are resverved > by the > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > >> > > >> > which means if you decide to manage that via placement api by using > >> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio > instead > >> > then you need to update your nova.conf to modify the reservationfi > you change the allocation ratio. > >> > > >> > if instead you use cpu_shared_set and cpu_dedicated_set > >> > you are specifying exactly which cpus nova can use and the allocation > ration nolonger needs to be conisderd. > >> > > >> > in general you shoudl reserve the first core on each cpu socket for > the host os. > >> > if you use hyperthreading then both hyperthread of the first cpu core > on each socket shoudl be omitted > >> > form the cpu_shared_set and cpu_dedicated_set > >> > > >> > > >> > > >> > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > >> > > Hello guys. > >> > > I am trying google for nova host cpu reserve to prevent host > overload but I > >> > > cannot find any resource about it. Could you give me some > information? > >> > > Thanks. > >> > > Nguyen Huu Khoi > >> > > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 23 14:29:49 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 23 Mar 2023 14:29:49 +0000 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) In-Reply-To: References: Message-ID: On Thu, 2023-03-23 at 13:50 +0000, Sofia Enriquez wrote: > Hi Sylvain, > > I hope you're doing well. Apologies for the delay in responding to your > previous message. 
I wanted to suggest a cross-project topic with Cinder > that involves adding support for NFS encryption. > > To complement the work on Cinder[3], I've proposed two patches [1][2] and > would appreciate any feedback you may have. I believe it would be > beneficial to discuss my approach during the PTG, but I'm open to > discussing it during a weekly meeting as well. i have seen those patches come in over the last while but since you did not create a nova bluepirnt or spec i was assuming that you were waiting for the ptg to talk about them. i have not atcully reviewed them but we can defeintly discuss it next week if the changes are small we can proably proceed with a specless bluepirnt but if they have any api impact or upgrade consideration or move operatoion considertioant we will need an actul spec even if its short. > > I've looked at the etherpad and am considering the first slot on Wednesday > or the last slot on Thursday or Friday. > > Please let me know your thoughts. > > Thank you, > Sofia > > [1] https://review.opendev.org/c/openstack/nova/+/854030 > [2] https://review.opendev.org/c/openstack/nova/+/870012 > [3] https://review.opendev.org/q/topic:bp%252Fnfs-volume-encryption > > On Wed, Mar 22, 2023 at 9:46?AM Sylvain Bauza wrote: > > > Hey folks, > > > > As a reminder, the Nova community will discuss at the vPTG. You can see > > the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg > > > > Our agenda will be from Tuesday to Friday, everyday between 1300UTC and > > 1700UTC. Connection details are in the etherpad above, but you can also use > > PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo > > room for all the discussions) > > > > You can't stick around for 4 hours x 4 days ? Heh, no worries ! > > If you (as an operator or a developer) want to engage with us (and we'd > > love this honestly), you have two possibilities : > > - either you prefer to listen (and talk) to some topics you've seen in > > the agenda, and then add your IRC nick (details how to use IRC are > > explained by [1]) on the topics you want. Once we start to discuss about > > those topics, I'll ping the courtesy ping list of each topic on > > #openstack-nova. Just make sure you're around in the IRC channel. > > - or you prefer to engage with us about some pain points or some feature > > requests, and then the right time is the Nova Operator Hour that will be on > > *Tuesday 1500UTC*. We have a specific etherpad for this session : > > https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where you > > can preemptively add your thoughts or concerns. > > > > Anyway, we are eager to meet you all ! > > > > Oh, last point, given we will be at the vPTG, next week's weekly meeting > > on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk > > the #openstack-nova channel ;-) > > > > See you next week ! > > -Sylvain > > > > [1] > > https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html > > > > > > > From sbauza at redhat.com Thu Mar 23 14:32:44 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 23 Mar 2023 15:32:44 +0100 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) In-Reply-To: References: Message-ID: Le jeu. 23 mars 2023 ? 14:51, Sofia Enriquez a ?crit : > > Hi Sylvain, > > I hope you're doing well. Apologies for the delay in responding to your > previous message. I wanted to suggest a cross-project topic with Cinder > that involves adding support for NFS encryption. 
> > To complement the work on Cinder[3], I've proposed two patches [1][2] and > would appreciate any feedback you may have. I believe it would be > beneficial to discuss my approach during the PTG, but I'm open to > discussing it during a weekly meeting as well. > > Hey Sofia, I'm indeed well, thank you. I was actually considering to ping Rajat over IRC since we weren't having any cinder-related topics yet in the etherpad but I was almost sure that we would have some last-minute thoughts during the week. I'm quite OK with discussing your item into some cross-project session. I've looked at the etherpad and am considering the first slot on Wednesday > or the last slot on Thursday or Friday. > Cool, I'll arrange some common timeslot between teams with Rajat once he's done with the OpenInfra Live presentation, like me :-) -Sylvain > Please let me know your thoughts. > > Thank you, > Sofia > > [1] https://review.opendev.org/c/openstack/nova/+/854030 > [2] https://review.opendev.org/c/openstack/nova/+/870012 > [3] https://review.opendev.org/q/topic:bp%252Fnfs-volume-encryption > > On Wed, Mar 22, 2023 at 9:46?AM Sylvain Bauza wrote: > >> Hey folks, >> >> As a reminder, the Nova community will discuss at the vPTG. You can see >> the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg >> >> Our agenda will be from Tuesday to Friday, everyday between 1300UTC and >> 1700UTC. Connection details are in the etherpad above, but you can also use >> PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo >> room for all the discussions) >> >> You can't stick around for 4 hours x 4 days ? Heh, no worries ! >> If you (as an operator or a developer) want to engage with us (and we'd >> love this honestly), you have two possibilities : >> - either you prefer to listen (and talk) to some topics you've seen in >> the agenda, and then add your IRC nick (details how to use IRC are >> explained by [1]) on the topics you want. Once we start to discuss about >> those topics, I'll ping the courtesy ping list of each topic on >> #openstack-nova. Just make sure you're around in the IRC channel. >> - or you prefer to engage with us about some pain points or some feature >> requests, and then the right time is the Nova Operator Hour that will be on >> *Tuesday 1500UTC*. We have a specific etherpad for this session : >> https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where >> you can preemptively add your thoughts or concerns. >> >> Anyway, we are eager to meet you all ! >> >> Oh, last point, given we will be at the vPTG, next week's weekly meeting >> on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk >> the #openstack-nova channel ;-) >> >> See you next week ! >> -Sylvain >> >> [1] >> https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html >> >> >> > > -- > > Sof?a Enriquez > > she/her > > Software Engineer > > Red Hat PnT > > IRC: @enriquetaso > @RedHat Red Hat > Red Hat > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 23 14:47:44 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 23 Mar 2023 14:47:44 +0000 Subject: [nova]host cpu reserve In-Reply-To: References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> Message-ID: <1cc331bf24541e551d6ad87d407a4c9a90a23665.camel@redhat.com> On Thu, 2023-03-23 at 14:51 +0100, Dmitriy Rabotyagov wrote: > Just in case, you DO have options to control cpu and ram reservation > for the hypervisor. 
It's just more about that, that it's not the best > way to do it, especially if you're overcommitting, as things in real > life are more complicated then just defining the amount of reserved > CPUs. > > For example, if you have cpu_allocation_ratio set to 3, then you're > getting 3 times more CPUs to signup VMs then you actually have > (cores*sockets*threads*cpu_allocation_ratio). With that you really > can't set any decent amount of reserved CPUs that will 100% ensure > that hypervisor will be able to gain required resources at any given > time. So with that approach the only option is to disable cpu > overcommit, but even then you might get CPU in socket 1 fully utilized > which might have negative side-effects for the hypervisor. > > And based on that, as Sean has mentioned, you can tell nova to > explicitly exclude specific cores from being utilized, which will make > them reserved for the hypervisor. Yep, exactly. Without getting into all the details: the host reserved CPU option was added in the really early days, and then vcpu_pin_set was added to address the fact that the existing option didn't really work the way people wanted. It was later used for CPU pinning, and we realised we wanted to have 2 separate pools of CPUs: cpu_shared_set for the shared cores used by floating VMs (anything without hw:cpu_policy=dedicated) and cpu_dedicated_set for explicitly pinned VMs. In general, using cpu_shared_set and cpu_dedicated_set is a much more intuitive way to reserve cores, since you get to select exactly which cores can be used for nova VMs. That also allows you to use systemd or other tools like taskset to affinitize nova-cpu or libvirtd or sshd to run on cores that won't have VMs, which prevents the VMs from starving those host processes of CPU resources. > > ??, 23 ???. 2023??. ? 14:35, Nguy?n H?u Kh?i : > > > Ok. I will try to understand it. I will let you know when I get it. > > Many thanks for your help. :) > > > On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov wrote: > > > > > > Just to double check with you, given that you have > > > cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 > > > physical cores, then it should be defined like: > > > > > > [compute] > > > cpu_shared_set="2-32,34-64,66-96,98-128"? > > > > > > > in general you shoudl reserve the first core on each cpu socket for the host os. > > > > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted > > > > form the cpu_shared_set and cpu_dedicated_set > > > > > > ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > > > > > > > > generally you should not > > > > you can use it but the preferd way to do this is use > > > > cpu_shared_set and cpu_dedicated_set (in old releases you would have used vcpu_pin_set) > > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > > > > > > > if you dont need cpu pinning just use cpu_share_set to spcify the cores that can be sued for floatign vms > > > > when you use cpu_shared_set and cpu_dedicated_set any cpu not specified are reseved for host use.
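To make that concrete, here is a minimal nova.conf sketch for a compute node. The CPU numbers are only an assumption for illustration (2 sockets x 32 cores with HT, where logical CPUs 0/64 are the two threads of the first core on socket 0 and 32/96 the first core on socket 1); check the real sibling pairs with "lscpu -e" or "virsh capabilities" before reusing the ranges.

[compute]
# anything listed in neither option below is never handed to guests, so it is
# effectively reserved for the host OS, libvirtd, nova-compute, OVS, sshd, ...
cpu_shared_set = 1-31,33-63,65-95,97-127
# only needed if you also want pinned (hw:cpu_policy=dedicated) guests; the
# cores listed here have to be carved out of cpu_shared_set first, since the
# two sets must not overlap
#cpu_dedicated_set = 16-31,80-95

With this in place the VCPU inventory reported to placement is derived from the size of cpu_shared_set (with cpu_allocation_ratio applied on top), so the host reservation no longer has to be multiplied by the allocation ratio by hand. Restart nova-compute after changing either option.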
> > > > > > > > https://that.guru/blog/cpu-resources/ and https://that.guru/blog/cpu-resources-redux/ > > > > > > > > have some useful info but that mostly looking at it form a cpu pinning angel althoguh the secon one covers cpu_shared_set, > > > > > > > > the issue with usein > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > > > > > > > is that you have to multiple the number of cores that are resverved by the > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > > > > > > > which means if you decide to manage that via placement api by using > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio instead > > > > then you need to update your nova.conf to modify the reservationfi you change the allocation ratio. > > > > > > > > if instead you use cpu_shared_set and cpu_dedicated_set > > > > you are specifying exactly which cpus nova can use and the allocation ration nolonger needs to be conisderd. > > > > > > > > in general you shoudl reserve the first core on each cpu socket for the host os. > > > > if you use hyperthreading then both hyperthread of the first cpu core on each socket shoudl be omitted > > > > form the cpu_shared_set and cpu_dedicated_set > > > > > > > > > > > > > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > > > > Hello guys. > > > > > I am trying google for nova host cpu reserve to prevent host overload but I > > > > > cannot find any resource about it. Could you give me some information? > > > > > Thanks. > > > > > Nguyen Huu Khoi > > > > > > > > > From sbauza at redhat.com Thu Mar 23 16:54:13 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 23 Mar 2023 17:54:13 +0100 Subject: [nova][ptg][ops] Nova at the vPTG (+ skipping next weekly meeting) In-Reply-To: References: Message-ID: Le jeu. 23 mars 2023 ? 15:32, Sylvain Bauza a ?crit : > > > Le jeu. 23 mars 2023 ? 14:51, Sofia Enriquez a > ?crit : > >> >> Hi Sylvain, >> >> I hope you're doing well. Apologies for the delay in responding to your >> previous message. I wanted to suggest a cross-project topic with Cinder >> that involves adding support for NFS encryption. >> >> To complement the work on Cinder[3], I've proposed two patches [1][2] and >> would appreciate any feedback you may have. I believe it would be >> beneficial to discuss my approach during the PTG, but I'm open to >> discussing it during a weekly meeting as well. >> >> > Hey Sofia, I'm indeed well, thank you. I was actually considering to ping > Rajat over IRC since we weren't having any cinder-related topics yet in the > etherpad but I was almost sure that we would have some last-minute thoughts > during the week. > I'm quite OK with discussing your item into some cross-project session. > > I've looked at the etherpad and am considering the first slot on Wednesday >> or the last slot on Thursday or Friday. >> > > Cool, I'll arrange some common timeslot between teams with Rajat once he's > done with the OpenInfra Live presentation, like me :-) > -Sylvain > > Just a quick wrap-up : Rajat and I agreed on a cross-project session between Cinder and Nova on Thursday Mar30 1600UTC in the nova (diablo) room. -Sylvain >> Please let me know your thoughts. 
>> >> Thank you, >> Sofia >> >> [1] https://review.opendev.org/c/openstack/nova/+/854030 >> [2] https://review.opendev.org/c/openstack/nova/+/870012 >> [3] https://review.opendev.org/q/topic:bp%252Fnfs-volume-encryption >> >> On Wed, Mar 22, 2023 at 9:46?AM Sylvain Bauza wrote: >> >>> Hey folks, >>> >>> As a reminder, the Nova community will discuss at the vPTG. You can see >>> the topics we'll talk in https://etherpad.opendev.org/p/nova-bobcat-ptg >>> >>> Our agenda will be from Tuesday to Friday, everyday between 1300UTC and >>> 1700UTC. Connection details are in the etherpad above, but you can also use >>> PTGbot website : https://ptg.opendev.org/ptg.html (we'll use the diablo >>> room for all the discussions) >>> >>> You can't stick around for 4 hours x 4 days ? Heh, no worries ! >>> If you (as an operator or a developer) want to engage with us (and we'd >>> love this honestly), you have two possibilities : >>> - either you prefer to listen (and talk) to some topics you've seen in >>> the agenda, and then add your IRC nick (details how to use IRC are >>> explained by [1]) on the topics you want. Once we start to discuss about >>> those topics, I'll ping the courtesy ping list of each topic on >>> #openstack-nova. Just make sure you're around in the IRC channel. >>> - or you prefer to engage with us about some pain points or some >>> feature requests, and then the right time is the Nova Operator Hour that >>> will be on *Tuesday 1500UTC*. We have a specific etherpad for this session >>> : https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova where >>> you can preemptively add your thoughts or concerns. >>> >>> Anyway, we are eager to meet you all ! >>> >>> Oh, last point, given we will be at the vPTG, next week's weekly meeting >>> on Tuesday is CANCELLED. But I guess you'll see it either way if you lurk >>> the #openstack-nova channel ;-) >>> >>> See you next week ! >>> -Sylvain >>> >>> [1] >>> https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032853.html >>> >>> >>> >> >> -- >> >> Sof?a Enriquez >> >> she/her >> >> Software Engineer >> >> Red Hat PnT >> >> IRC: @enriquetaso >> @RedHat Red Hat >> Red Hat >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kennelson11 at gmail.com Thu Mar 23 18:38:08 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Thu, 23 Mar 2023 13:38:08 -0500 Subject: Fwd: [PTG] Environmental Sustainability WG goes to the vPTG next week! In-Reply-To: References: Message-ID: [Cross posting from the foundation ML] Hello Everyone! I have finally begun setting up the etherpad for our time during the PTG. I have the Austin room reserved on Tuesday from 16-18 UTC. I hope to see you all there and please add to the etherpad if there is a related topic you think we should discuss! Here is the etherpad: https://etherpad.opendev.org/p/march2023-ptg-env-sus -Kendall Nelson -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Mar 23 19:07:25 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 23 Mar 2023 12:07:25 -0700 Subject: [all][tc][goal][policy] RBAC goal discussion in 2023.2 PTG Message-ID: <1870fde64e3.dd20d8991092719.4320346159123288024@ghanshyammann.com> Hello Everyone, I have booked the Tuesday 17-18 UTC slot bexar room for the RBAC goal discussion. You can add the topics/queries to be discussed in vPTG in the below etherpad. 
- https://etherpad.opendev.org/p/rbac-2023.2-ptg -gmann From knikolla at bu.edu Thu Mar 23 19:49:03 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Thu, 23 Mar 2023 19:49:03 +0000 Subject: [tc][ptl][ptg] TC + Community Leaders Interaction for 2023.2 vPTG Message-ID: Hello everyone, On Monday, March 27, 2023 16.00UTC to 18.00UTC, the TC is organizing the Technical Committee & Community Leaders Interaction. While this meeting is open to all, we would like to invite participation especially from PTL, TC, and SIG Chairs with the goal of gathering feedback and promoting collaboration. This meeting has been quite successful in the last 2 PTG and I'm hoping we can continue this tradition. If you have an item you'd like to propose for discussion please add it to the purposefully quite empty etherpad[0], where you can also find information and details on how to join. 0. https://etherpad.opendev.org/p/tc-leaders-interaction-2023-2 Hope to see you there, Kristi Nikolla -------------- next part -------------- An HTML attachment was scrubbed... URL: From knikolla at bu.edu Thu Mar 23 21:00:08 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Thu, 23 Mar 2023 21:00:08 +0000 Subject: [tc] No TC weekly meeting next week + meeting time change Message-ID: <823CC989-9363-4A9C-8FF6-38D860E6F806@bu.edu> Hi all, Due to the vPTG being held next week, the TC will not hold its regular weekly meeting that was scheduled for Wednesday, March 29, 2023. Please note, that after the PTG, the new meeting time will be Tuesdays 18.00 UTC. More information and an ICS file can be found here https://meetings.opendev.org/#Technical_Committee_Meeting Thank you, Kristi Nikolla -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Fri Mar 24 01:21:03 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Thu, 23 Mar 2023 22:21:03 -0300 Subject: [manila] Bobcat vPTG slots and topics In-Reply-To: References: Message-ID: Hello everyone! Just a quick update on this: Time slots are assigned to the topics, please check it out in the official PTG etherpad [3]. Please let me know if you would like to have a session being moved around and I can work on accommodating it if feasible. We will also have a cross-project discussion with Nova on Wednesday, 15 UTC to talk about preventing shares deletion while it's attached to an instance. I have also scheduled an operator hour, on Thursday at 16 UTC, in the same room we will be meeting for the other sessions (Austin). We would like to gather some feedback from operators and hear more from you on what we can improve. Please join us! [3] https://etherpad.opendev.org/p/manila-bobcat-ptg Looking forward to the great discussions! carloss Em qui., 16 de mar. de 2023 ?s 11:26, Carlos Silva escreveu: > Hello, Zorillas! > > PTG is right around the corner and I would like to remind you to please > add the topics you would like to bring up during our sessions to the > planning etherpad [1] until next Tuesday (Mar 21st). > > I have already allocated some slots for our sessions: > > - Monday: 16:00 to 17:00 UTC > - Wednesday: 14:00 to 16:00 UTC > - Thursday: 14:00 to 16:00 UTC > - Friday: 14:00 to 17:00 UTC > > > We will be meeting in the Austin room, you can access the meeting room > through the PTG page [2]. > > If you have a preference of date/time for your topic to be discussed, > please let me know and I will try to accommodate it. > > Looking forward to meeting you! 
> > [1] https://etherpad.opendev.org/p/manila-bobcat-ptg-planning > [2] https://ptg.opendev.org/ptg.html > > Thanks, > carloss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ces.eduardo98 at gmail.com Fri Mar 24 01:25:31 2023 From: ces.eduardo98 at gmail.com (Carlos Silva) Date: Thu, 23 Mar 2023 22:25:31 -0300 Subject: [manila] Cancelling March 30th IRC weekly meeting Message-ID: Hello Zorillas! As mentioned in today's IRC meeting, we will be having several meetings at the PTG next week, so we will not have our usual IRC meeting on Thursday March 30th 15 UTC. See you at the PTG! Regards, carloss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbaker at redhat.com Fri Mar 24 02:16:23 2023 From: sbaker at redhat.com (Steve Baker) Date: Fri, 24 Mar 2023 15:16:23 +1300 Subject: =?UTF-8?B?UmU6IOWbnuWkje+8miBbaXJvbmljXVF1ZXN0aW9ucyBhYm91dCB0aGUg?= =?UTF-8?Q?use_of_the_build_image_tool?= In-Reply-To: References: Message-ID: We are likely assuming the source image is compliant enough with the LSB[1] which references the Filesystem Hierarchy Standard[2] that specifies a /bin directory which includes the tar command[3]. Any improvement in LSB compliance would be beneficial for the UOS distribution. [1] https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/normativerefs.html#STD.FHS [2] https://refspecs.linuxbase.org/fhs [3] https://refspecs.linuxbase.org/FHS_3.0/fhs/ch03s04.html On 23/03/23 21:41, ?? wrote: > > I have found the cause of the problem, which is because tar does not > exist in the working directory. > I wonder if there are requirements on the base image when building > from the base image. > > ?Thank you. > > ------------------------------------------------------------------------ > ???????? > > > > ----------???????---------- > ????2023-03-23 ?? 15:01??? > > > Hi > We are using diskimage-builder to make a custom image, There is a > problem in extract_image, The question is chroot: failed to run > command 'bin/tar': No such file or directory, I found the /bin/tar > file in the working directory /tmp/ tmp.00yldwe1xt. But errors are > still being reported. It's not clear if this is a custom image > problem, There are also requirements for custom images. > > ------------------------------------------------------------------------ > ???????? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Fri Mar 24 07:47:08 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Fri, 24 Mar 2023 14:47:08 +0700 Subject: [nova]host cpu reserve In-Reply-To: <1cc331bf24541e551d6ad87d407a4c9a90a23665.camel@redhat.com> References: <84c63ca4564b9d17285e81e1b722278db66a2803.camel@redhat.com> <1cc331bf24541e551d6ad87d407a4c9a90a23665.camel@redhat.com> Message-ID: Hello/ After chasing links and your examples, I found this example is good for beginners like me, I want to show that for previous people. https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/configuring_the_compute_service_for_instance_creation/index#proc_configuring-compute-nodes-for-cpu-pinning_cpu-pinning Thank you much. Nguyen Huu Khoi On Thu, Mar 23, 2023 at 9:54?PM Sean Mooney wrote: > On Thu, 2023-03-23 at 14:51 +0100, Dmitriy Rabotyagov wrote: > > Just in case, you DO have options to control cpu and ram reservation > > for the hypervisor. 
It's just more about that, that it's not the best > > way to do it, especially if you're overcommitting, as things in real > > life are more complicated then just defining the amount of reserved > > CPUs. > > > > For example, if you have cpu_allocation_ratio set to 3, then you're > > getting 3 times more CPUs to signup VMs then you actually have > > (cores*sockets*threads*cpu_allocation_ratio). With that you really > > can't set any decent amount of reserved CPUs that will 100% ensure > > that hypervisor will be able to gain required resources at any given > > time. So with that approach the only option is to disable cpu > > overcommit, but even then you might get CPU in socket 1 fully utilized > > which might have negative side-effects for the hypervisor. > > > > And based on that, as Sean has mentioned, you can tell nova to > > explicitly exclude specific cores from being utilized, which will make > > them reserved for the hypervisor. > yep exactly. > without geting into all the details the host reserved cpu option was added > in the really early days and then vcpu_pin_set was > added to adress the fact that the existing option didnt really work the > way peopel wanted. > it was later used for cpu pinning and we realise we wanted to have 2 > sepreate pools of cpus > > cpu_shared_set for shared core useed by floating vms (anything with out > hw:cpu_policy=dedicated) and > cpu_dedicated_set for explictlly pinned vms. > > in general using cpu_shared_set and cpu_dedicated_set is a much more > intitive way to resver cores since you get to select exaction which > cores can be used for nova vms. > > that allows you do the use systemd or other tools like taskset to > affiites nova-cpu or libvirtd or sshd to run on core that wont have vms > that prevents the vms form staving those host process of cpu resouces. > > > > ??, 23 ???. 2023??. ? 14:35, Nguy?n H?u Kh?i >: > > > > > > Ok. I will try to understand it. I will let you know when I get it. > > > Many thanks for your help. :) > > > > > > On Thu, Mar 23, 2023, 8:14 PM Dmitriy Rabotyagov < > noonedeadpunk at gmail.com> wrote: > > > > > > > > Just to double check with you, given that you have > > > > cpu_overcommit_ratio>1, 2 sockets and HT enabled, and each CPU has 32 > > > > physical cores, then it should be defined like: > > > > > > > > [compute] > > > > cpu_shared_set="2-32,34-64,66-96,98-128"? > > > > > > > > > in general you shoudl reserve the first core on each cpu socket > for the host os. > > > > > if you use hyperthreading then both hyperthread of the first cpu > core on each socket shoudl be omitted > > > > > form the cpu_shared_set and cpu_dedicated_set > > > > > > > > ??, 23 ???. 2023??. ? 13:12, Sean Mooney : > > > > > > > > > > generally you should not > > > > > you can use it but the preferd way to do this is use > > > > > cpu_shared_set and cpu_dedicated_set (in old releases you would > have used vcpu_pin_set) > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set > > > > > > > > > > if you dont need cpu pinning just use cpu_share_set to spcify the > cores that can be sued for floatign vms > > > > > when you use cpu_shared_set and cpu_dedicated_set any cpu not > specified are reseved for host use. 
> > > > > > > > > > https://that.guru/blog/cpu-resources/ and > https://that.guru/blog/cpu-resources-redux/ > > > > > > > > > > have some useful info but that mostly looking at it form a cpu > pinning angel althoguh the secon one covers cpu_shared_set, > > > > > > > > > > the issue with usein > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_cpus > > > > > > > > > > is that you have to multiple the number of cores that are > resverved by the > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.cpu_allocation_ratio > > > > > > > > > > which means if you decide to manage that via placement api by using > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.initial_cpu_allocation_ratio > instead > > > > > then you need to update your nova.conf to modify the reservationfi > you change the allocation ratio. > > > > > > > > > > if instead you use cpu_shared_set and cpu_dedicated_set > > > > > you are specifying exactly which cpus nova can use and the > allocation ration nolonger needs to be conisderd. > > > > > > > > > > in general you shoudl reserve the first core on each cpu socket > for the host os. > > > > > if you use hyperthreading then both hyperthread of the first cpu > core on each socket shoudl be omitted > > > > > form the cpu_shared_set and cpu_dedicated_set > > > > > > > > > > > > > > > > > > > > On Thu, 2023-03-23 at 14:44 +0700, Nguy?n H?u Kh?i wrote: > > > > > > Hello guys. > > > > > > I am trying google for nova host cpu reserve to prevent host > overload but I > > > > > > cannot find any resource about it. Could you give me some > information? > > > > > > Thanks. > > > > > > Nguyen Huu Khoi > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Fri Mar 24 09:46:57 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 24 Mar 2023 10:46:57 +0100 Subject: [neutron] Neutron driver's meeting cancelled Message-ID: Hello Neutrinos: Today's drivers meeting is cancelled. The only topic in the agenda [1] was agreed to be discussed during the PTG. Join us next week in the PTG sessions! Here is the Neutron agenda [2] and the PTG website [3]. We'll be on the Juno channel. Have a nice weekend. [1]https://wiki.openstack.org/wiki/Meetings/NeutronDrivers [2]https://etherpad.opendev.org/p/neutron-bobcat-ptg [3]https://ptg.opendev.org/ptg.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From artem.goncharov at gmail.com Fri Mar 24 10:35:58 2023 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Fri, 24 Mar 2023 11:35:58 +0100 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked Message-ID: Hi all, A bit late, but still - I have booked a 3 hours slot during PTG on Friday 14:00-17:00 UTC. This will follow publiccloud room discussion so I think some people and outcomes will follow directly into our room. Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli Feel free to feel in topics you want to discuss Cheers, Artem -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre at stackhpc.com Fri Mar 24 12:08:22 2023 From: pierre at stackhpc.com (Pierre Riteau) Date: Fri, 24 Mar 2023 13:08:22 +0100 Subject: [blazar][ptg] Bobcat PTG scheduling In-Reply-To: References: Message-ID: Due to a conflict with another PTG session, we have decided to start the Blazar session one hour later. The new time is 1500 UTC to 1700 UTC. On Fri, 10 Mar 2023 at 18:07, Pierre Riteau wrote: > Hello, > > The Bobcat PTG will happen online during the week starting March 27. > > As the Blazar project has done in the past, I suggest we meet on Thursday, > but starting 1400 UTC rather than the usual 1500 of our biweekly meeting. I > have booked two hours in the Bexar room. If you want to join, please let me > know if this works for you. > > To summarise, the Blazar project will meet on Thursday March 30 from 1400 > UTC to 1600 UTC. > > We will prepare discussion topics on Etherpad. > > Cheers, > Pierre Riteau (priteau) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Mar 24 15:17:55 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 24 Mar 2023 15:17:55 +0000 Subject: [security-sig] vPTG sessions 16:00 UTC Tuesday and Wednesday Message-ID: <20230324151755.ke35cniiaiyx5ekm@yuggoth.org> I've booked two hours on the vPTG schedule, Tuesday and Wednesday 16:00-17:00 UTC, in the hopes that interested parties will be able to make at least one of those if not both. We'll use OpenDev's Meetpad service: https://meetpad.opendev.org/march2023-ptg-os-security I tried to avoid booking conflicts with Barbican and Keystone since those are the two projects our participants traditionally have obligatory conflicts from (also worked around the TC, Release and Diversity WG sessions). I know folks from Ironic wanted to talk about VMT topics, but our times overlap with some of theirs so we can either try to talk about that in one of the non-overlapping Ironic sessions or they can join us during ours, whichever works better. I've started adding some proposed discussion topics to the corresponding pad, but anyone can feel free to throw ideas in there, or just bring them up once we're on the call... I'm not one to stand on ceremony: https://etherpad.opendev.org/p/march2023-ptg-os-security Hopefully some people will be able to make it, but if you want other times on the schedule too then let me know and I'll try to work something out. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From christian.rohmann at inovex.de Fri Mar 24 15:28:47 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Fri, 24 Mar 2023 16:28:47 +0100 Subject: [nova][cinder] Providing ephemeral storage to instances - Cinder or Nova Message-ID: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de> Hello OpenStack-discuss, I am currently looking into how one can provide fast ephemeral storage (backed by local NVME drives) to instances. There seem to be two approaches and I would love to double-check my thoughts and assumptions. 1) *Via Nova* instance storage and the configurable "ephemeral" volume for a flavor a) We currently use Ceph RBD als image_type (https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type), so instance images are stored in Ceph, not locally on disk. 
I believe this setting will also cause ephemeral volumes (destination_local) to be placed on an RBD and not /var/lib/nova/instances? Or is there a setting to set a different backend for local block devices providing "ephemeral" storage? So RBD for the root disk and a local LVM VG for ephemeral? b) Will an ephemeral volume also be migrated when the instance is shut off, as with live-migration? Or will there be a new volume created on the target host? I am asking because I want to avoid syncing 500G or 1T when it's only "ephemeral" and the instance will not expect any data on it on the next boot. c) Is the size of the ephemeral storage for flavors a fixed size or just the upper bound for users? So if I limit this to 1T, will such a flavor always provision a block device with this size? I suppose using LVM this will be thin provisioned anyways? 2) *Via Cinder*, running cinder-volume on each compute node to provide a volume type "ephemeral", using e.g. the LVM driver a) While not really "ephemeral" and bound to the instance lifecycle, this would allow users to provision ephemeral volumes just as they need them. I suppose I could use backend specific quotas (https://docs.openstack.org/cinder/latest/cli/cli-cinder-quotas.html#view-block-storage-quotas) to limit the number and size of such volumes? b) Do I need to use the instance locality filter (https://docs.openstack.org/cinder/latest/contributor/api/cinder.scheduler.filters.instance_locality_filter.html) then? c) Since a volume will always be bound to a certain host, I suppose this will cause side-effects to instance scheduling? With the volume remaining after an instance has been destroyed (defeating the purpose of it being "ephemeral") I suppose any other instance attaching this volume will be scheduled on this very machine? Is there any way around this? Maybe a driver setting to have such volumes "self-destroy" if they are not attached anymore? d) Same question as with Nova: What happens when an instance is live-migrated? Maybe others also have this use case and you can share your solution(s)? Thanks and with regards Christian From abishop at redhat.com Thu Mar 23 12:36:35 2023 From: abishop at redhat.com (Alan Bishop) Date: Thu, 23 Mar 2023 05:36:35 -0700 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan wrote: > Hi, > Is this bind not required for cinder_scheduler container? > "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", > I do not see this particular bind on the cinder scheduler containers on my > controller nodes. > That is correct, because the scheduler does not access the ceph cluster.
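If you want to double-check that on a controller, something like the following will list which cinder containers actually get the ceph config bind mount. This is only a sketch: the container names below are the usual TripleO ones, and in an HA deployment the volume service may instead run as a pacemaker-managed container with a different name.

$ sudo podman ps --format '{{.Names}}' | grep cinder
$ sudo podman inspect cinder_volume --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' | grep ceph
$ sudo podman inspect cinder_scheduler --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' | grep ceph

The first inspect should show /var/lib/tripleo-config/ceph mounted at /var/lib/kolla/config_files/src-ceph, while the second is expected to return nothing.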
Alan > With regards, > Swogat Pradhan > > On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan > wrote: > >> Cinder volume config: >> >> [tripleo_ceph] >> volume_backend_name=tripleo_ceph >> volume_driver=cinder.volume.drivers.rbd.RBDDriver >> rbd_user=openstack >> rbd_pool=volumes >> rbd_flatten_volume_from_snapshot=False >> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >> report_discard_supported=True >> rbd_ceph_conf=/etc/ceph/dcn02.conf >> rbd_cluster_name=dcn02 >> >> Glance api config: >> >> [dcn02] >> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=dcn02 rbd glance store >> [ceph] >> rbd_store_ceph_conf=/etc/ceph/ceph.conf >> rbd_store_user=openstack >> rbd_store_pool=images >> rbd_thin_provisioning=False >> store_description=Default glance store backend. >> >> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan >> wrote: >> >>> I still have the same issue, I'm not sure what's left to try. >>> All the pods are now in a healthy state, I am getting log entries 3 mins >>> after I hit the create volume button in cinder-volume when I try to create >>> a volume with an image. >>> And the volumes are just stuck in creating state for more than 20 mins >>> now. >>> >>> Cinder logs: >>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>> cinder-volume RPC version 3.17 as minimum service version. >>> 2023-03-22 20:34:59.166 108 INFO >>> cinder.volume.flows.manager.create_volume >>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>> specification: {'status': 'creating', 'volume_name': >>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>> 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>> 'owner_specified.openstack.object': 'images/cirros', >>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>> } >>> >>> With regards, >>> Swogat Pradhan >>> >>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: >>> >>>> >>>> >>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Hi Adam, >>>>> The systems are in same LAN, in this case it seemed like the image was >>>>> getting pulled from the central site which was caused due to an >>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>> directory, which seems to have been resolved after the changes i made to >>>>> fix it. >>>>> >>>>> Right now the glance api podman is running in unhealthy state and the >>>>> podman logs don't show any error whatsoever and when issued the command >>>>> netstat -nultp i do not see any entry for glance port i.e. 9292 in the dcn >>>>> site, which is why cinder is throwing an error stating: >>>>> >>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>> finding address for >>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>> Unable to establish connection to >>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>> NewConnectionError('>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>> ECONNREFUSED',)) >>>>> >>>>> Now i need to find out why the port is not listed as the glance >>>>> service is running, which i am not sure how to find out. >>>>> >>>> >>>> One other thing to investigate is whether your deployment includes this >>>> patch [1]. If it does, then bear in mind >>>> the glance-api service running at the edge site will be an "internal" >>>> (non public facing) instance that uses port 9293 >>>> instead of 9292. You should familiarize yourself with the release note >>>> [2]. 
>>>> >>>> [1] >>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>> [2] >>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>> >>>> Alan >>>> >>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> Update: >>>>>>> Here is the log when creating a volume using cirros image: >>>>>>> >>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>> cinder.volume.flows.manager.create_volume >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> [{'url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>> } >>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>> >>>>>> >>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>> download 15.58 MB at 0.16 MB/s). 
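If you want to quantify that from the affected edge node, one rough check (a sketch only -- it assumes the openstack CLI is available there and ends up talking to the same glance endpoint cinder is configured to use; the image ID is the one from the log above) is to time a download and compare it against the ~16 MB image size:

$ time openstack image save --file /tmp/cirros-test.img 736d8779-07cd-4510-bab2-adcb653cc538

If that is similarly slow, the bottleneck is the network path or endpoint selection between this node and glance rather than anything cinder-specific.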
Downloading the image takes too long, and >>>>>> 0.16 MB/s suggests you have a network issue. >>>>>> >>>>>> John Fulton previously stated your cinder-volume service at the edge >>>>>> site is not using the local ceph image store. Assuming you are deploying >>>>>> GlanceApiEdge service [1], then the cinder-volume service should be >>>>>> configured to use the local glance service [2]. You should check cinder's >>>>>> glance_api_servers to confirm it's the edge site's glance service. >>>>>> >>>>>> [1] >>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>> [2] >>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>> >>>>>> Alan >>>>>> >>>>>> >>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>> category=FutureWarning) >>>>>>> >>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>> category=FutureWarning) >>>>>>> >>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>> MB/s >>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>> cinder.volume.flows.manager.create_volume >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>> >>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Jhon, >>>>>>>> This seems to be an issue. >>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>>>>> parameter was specified to the respective cluster names but the config >>>>>>>> files were created in the name of ceph.conf and keyring was >>>>>>>> ceph.client.openstack.keyring. >>>>>>>> >>>>>>>> Which created issues in glance as well as the naming convention of >>>>>>>> the files didn't match the cluster names, so i had to manually rename the >>>>>>>> central ceph conf file as such: >>>>>>>> >>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>> total 16 >>>>>>>> -rw-------. 
1 root root 257 Mar 13 13:56 >>>>>>>> ceph_central.client.openstack.keyring >>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>> ceph.client.openstack.keyring >>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>> >>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the >>>>>>>> respective clusters in both dcn01 and dcn02. >>>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>>> for accessing central ceph cluster. >>>>>>>> >>>>>>>> glance multistore config: >>>>>>>> [dcn02] >>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>> rbd_store_user=openstack >>>>>>>> rbd_store_pool=images >>>>>>>> rbd_thin_provisioning=False >>>>>>>> store_description=dcn02 rbd glance store >>>>>>>> >>>>>>>> [ceph_central] >>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>> rbd_store_user=openstack >>>>>>>> rbd_store_pool=images >>>>>>>> rbd_thin_provisioning=False >>>>>>>> store_description=Default glance store backend. >>>>>>>> >>>>>>>> >>>>>>>> With regards, >>>>>>>> Swogat Pradhan >>>>>>>> >>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>> wrote: >>>>>>>>> > >>>>>>>>> > Hi, >>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>> >>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>> >>>>>>>>> I hope this is not a production system since the mailing list now >>>>>>>>> has >>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>> >>>>>>>>> The section that looks like this: >>>>>>>>> >>>>>>>>> [tripleo_ceph] >>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>> rbd_user=openstack >>>>>>>>> rbd_pool=volumes >>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>> rbd_secret_uuid= >>>>>>>>> report_discard_supported=True >>>>>>>>> >>>>>>>>> Should be updated to refer to the local DCN ceph cluster and not >>>>>>>>> the >>>>>>>>> central one. Use the ceph conf file for that cluster and ensure the >>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>> >>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of >>>>>>>>> the >>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. This >>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>> secret-get-value $FSID`. >>>>>>>>> >>>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>>> sites >>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>> following >>>>>>>>> it. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>> >>>>>>>>> John >>>>>>>>> >>>>>>>>> > >>>>>>>>> > Ceph Output: >>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>> > NAME SIZE PARENT FMT >>>>>>>>> PROT LOCK >>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB 2 >>>>>>>>> excl >>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>> 2 yes >>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>> 2 yes >>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>> 2 yes >>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>> 2 yes >>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>> 2 yes >>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>> 2 yes >>>>>>>>> > >>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>> > NAME SIZE PARENT >>>>>>>>> FMT PROT LOCK >>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB 2 >>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB 2 >>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>> > >>>>>>>>> > Attached the cinder config. >>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>> > >>>>>>>>> > With regards, >>>>>>>>> > Swogat Pradhan >>>>>>>>> > >>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton >>>>>>>>> wrote: >>>>>>>>> >> >>>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>> config. >>>>>>>>> >> >>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>> >>>>>>>>> >>> Update: >>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>> >>> The image size is 389 MB. >>>>>>>>> >>> >>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>> >>>>>>>>> >>>> Hi Jhon, >>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>>>>> after importing from the central site. >>>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>>> time for the volume to get created. >>>>>>>>> >>>> >>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>> getting created properly without any errors. >>>>>>>>> >>>> >>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>> but getting checksum failed error. 
>>>>>>>>> >>>> >>>>>>>>> >>>> With regards, >>>>>>>>> >>>> Swogat Pradhan >>>>>>>>> >>>> >>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>> >>>>> >>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>> >>>>> wrote: >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>> launch the VM from volume. >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>> to the edge glance. >>>>>>>>> >>>>> >>>>>>>>> >>>>> Try following this document and making the same observations >>>>>>>>> in your >>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>> >>>>> >>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>> >>>>> >>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>> excl >>>>>>>>> >>>>> $ >>>>>>>>> >>>>> >>>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>>> which is on >>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>> >>>>> >>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>> encountering >>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>> >>>>> >>>>>>>>> >>>>> Ideally all images should reside in the central Glance and >>>>>>>>> be copied >>>>>>>>> >>>>> to DCN sites before instances of those images are booted on >>>>>>>>> DCN sites. >>>>>>>>> >>>>> If an image is not copied to a DCN site before it is booted, >>>>>>>>> then the >>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>> will boot as >>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site has >>>>>>>>> access to >>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>> booting of >>>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>>> advance, >>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>> >>>>> >>>>>>>>> >>>>> You can also exec into the cinder container at the DCN site >>>>>>>>> and >>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>> >>>>> >>>>>>>>> >>>>> John >>>>>>>>> >>>>> >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>> then update. >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > With regards, >>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>> >>>>> > >>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>> >> >>>>>>>>> >>>>> >> Update: >>>>>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>>>>> down. 
>>>>>>>>> >>>>> >> >>>>>>>>> >>>>> >> >>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad (lacp=active). >>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon >>>>>>>>> [-] privsep daemon starting >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon >>>>>>>>> [-] privsep process running with uid/gid: 0/0 >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>>> [-] privsep process running with capabilities (eff/prm/inh): >>>>>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>>> [-] privsep daemon running as pid 185437 >>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>> running command. >>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>> template mentioned here ?: >>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>> why the instance is stuck in spawning state. >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> With regards, >>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> >>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Does your environment use different network interfaces >>>>>>>>> for each of the networks? Or does it have a bond with everything on it? >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>> issues if everything is running over a single 1Gbe interface. 
>>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>> while spawning the instance. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. >>>>>>>>> So, based on that experience, from my perspective, is certainly sounds like >>>>>>>>> some kind of network issue. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Regards, >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>>> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Hi, >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some time >>>>>>>>> ago in this thread: >>>>>>>>> >>>>> >>>> >>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>>>>> logs I probably would start "tearing down" rabbitmq. I didn't have to do >>>>>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>>>>> running, this will most likely impact client IO depending on your load. >>>>>>>>> Check out the rabbitmqctl commands. >>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. >>>>>>>>> rebuild. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>> a better advice. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Regards, >>>>>>>>> >>>>> >>>> Eugen >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Hi, >>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> With regards, >>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Hi >>>>>>>>> >>>>> >>>> I don't see any major packet loss. 
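If it helps to double-check that, interface-level counters on the bond and its slave NICs are a more direct measure than ping alone (interface names below are placeholders for whatever your bond and physical NICs are called):

# Before and after a spawn attempt, compare error/drop counters
$ ip -s link show bond1
$ ethtool -S eno1 | grep -iE 'drop|err'    # repeat for each slave NIC

# Throughput sample while the image copy / spawn is running (needs the sysstat package)
$ sar -n DEV 1 30
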
>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe but >>>>>>>>> not due to packet >>>>>>>>> >>>>> >>>> loss. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> with regards, >>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Hi, >>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>> checked when >>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>> stuck at spawning >>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>> sure if packet loss >>>>>>>>> >>>>> >>>> causes this. >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> With regards, >>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>> identical between >>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>>> the tunnel? >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>>>>> 'cc' as i am not >>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl >>>>>>>>> list_policies -p >>>>>>>>> >>>>> >>>> / >>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>>> priority >>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> >>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>>> down when i am >>>>>>>>> >>>>> >>>> trying >>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>> spawning state and >>>>>>>>> >>>>> >>>> then >>>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>> edge sites. >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>> >>>>> >>>> > >>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>> directly, i am >>>>>>>>> >>>>> >>>> checking >>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>> reply. >>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>> occurred. 
>>>>>>>>> >>>>> >>>> >> >>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>> activities in the >>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>> site.* >>>>>>>>> >>>>> >>>> >> >>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>> >>>>> >>>> >> >>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>> >>>>> >>>> >> >>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>> the details: >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>> >>>>> >>>> Started >>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>> >>>>> >>>> Started >>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>> >>>>> >>>> Started >>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>> >>>>> >>>> Started >>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>>> but the issue is >>>>>>>>> >>>>> >>>> still >>>>>>>>> >>>>> >>>> >>> present. >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>> cluster_status >>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>> inter-node and CLI >>>>>>>>> >>>>> >>>> tool >>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>> inter-node and CLI >>>>>>>>> >>>>> >>>> tool >>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>> >>>>> >>>> >>> and AMQP 
1.0 >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>> inter-node and CLI >>>>>>>>> >>>>> >>>> tool >>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>> >>>>> >>>> interface: >>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> , >>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering, >>>>>>>>> purpose: >>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> , >>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>> amqp, purpose: AMQP >>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>> >>>>> >>>> , >>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>> purpose: HTTP API >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>> >>>>> >>>> >>> >>>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api >>>>>>>>> log. 
>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>> -] The reply >>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>> after 60 seconds >>>>>>>>> >>>>> >>>> due to a >>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] The reply >>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>> after 60 seconds >>>>>>>>> >>>>> >>>> due to a >>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] The reply >>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>> after 60 seconds >>>>>>>>> >>>>> >>>> due to a >>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils >>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Cache enabled >>>>>>>>> >>>>> >>>> with >>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] >>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>> exist, drop reply to >>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>> -] The reply >>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>> after 60 seconds >>>>>>>>> >>>>> >>>> due to a >>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>> where i am trying to >>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>> down (openstack >>>>>>>>> >>>>> >>>> compute >>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>> restart the nova >>>>>>>>> >>>>> >>>> compute >>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. 
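Regarding the missing reply_* queues in the conductor log above, one way to see whether those queues exist at all and have consumers is to query rabbitmq directly on a controller (the bundle container name follows the usual pacemaker bundle convention and may differ; check `sudo podman ps | grep rabbitmq` first):

$ sudo podman exec rabbitmq-bundle-podman-0 rabbitmqctl list_queues name messages consumers | grep reply_

A reply_* queue that is missing, or present with zero consumers, matches the "failed to send after 60 seconds due to a missing queue" errors quoted above.
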
>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>> nova.compute.manager >>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - >>>>>>>>> -] Running >>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>> >>>>> >>>> to >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] [instance: >>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>> successful on node >>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>> nova.virt.libvirt.driver >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] [instance: >>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>> supplied device >>>>>>>>> >>>>> >>>> name: >>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>>>>> names >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>> nova.virt.block_device >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] [instance: >>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>> with volume >>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Cache enabled >>>>>>>>> >>>>> >>>> with >>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Running >>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>> '--config-file', >>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>> '--privsep_context', >>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>> '--privsep_sock_path', >>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Spawned new >>>>>>>>> >>>>> >>>> privsep >>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] Process >>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running >>>>>>>>> command. >>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>> nova.virt.libvirt.driver >>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>> default] [instance: >>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>> image >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
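For what it's worth, the _get_host_uuid / blkid warning above shows up in plenty of otherwise healthy container-based deployments and is generally not what blocks the spawn. A more telling check may be to follow the request id across services at the moment "Creating image" appears (paths are the usual TripleO log locations):

# On the edge compute node, note the req- id of the stuck build
$ sudo grep 'Creating image' /var/log/containers/nova/nova-compute.log | tail -1

# On a central controller, follow the same request through conductor and scheduler
$ sudo grep 'req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45' /var/log/containers/nova/nova-conductor.log
$ sudo grep 'req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45' /var/log/containers/nova/nova-scheduler.log

If the conductor side only shows the reply_* queue errors for that request, the spawn is stalling on the RPC reply path rather than on image or volume handling.
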
>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>> >>>>> >>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>> >>>>>>>>> >>>>> >>>>>>>>> >>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 23 12:42:36 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 18:12:36 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Jhon, Thank you for clarifying that. Right now the cinder volume is stuck in *creating *state when adding image as volume source. But when creating an empty volume the volumes are getting created successfully without any errors. We are getting volume creation request in cinder-volume.log as such: 2023-03-23 12:34:40.152 108 INFO cinder.volume.flows.manager.create_volume [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 
'image_service': } But there is nothing else after that and the volume doesn't even timeout, it just gets stuck in creating state. Can you advise what might be the issue here? All the containers are in a healthy state now. With regards, Swogat Pradhan On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: > > > On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan > wrote: > >> Hi, >> Is this bind not required for cinder_scheduler container? >> >> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >> I do not see this particular bind on the cinder scheduler containers on >> my controller nodes. >> > > That is correct, because the scheduler does not access the ceph cluster. > > Alan > > >> With regards, >> Swogat Pradhan >> >> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan >> wrote: >> >>> Cinder volume config: >>> >>> [tripleo_ceph] >>> volume_backend_name=tripleo_ceph >>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>> rbd_user=openstack >>> rbd_pool=volumes >>> rbd_flatten_volume_from_snapshot=False >>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>> report_discard_supported=True >>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>> rbd_cluster_name=dcn02 >>> >>> Glance api config: >>> >>> [dcn02] >>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=dcn02 rbd glance store >>> [ceph] >>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>> rbd_store_user=openstack >>> rbd_store_pool=images >>> rbd_thin_provisioning=False >>> store_description=Default glance store backend. >>> >>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> I still have the same issue, I'm not sure what's left to try. >>>> All the pods are now in a healthy state, I am getting log entries 3 >>>> mins after I hit the create volume button in cinder-volume when I try to >>>> create a volume with an image. >>>> And the volumes are just stuck in creating state for more than 20 mins >>>> now. >>>> >>>> Cinder logs: >>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>> cinder-volume RPC version 3.17 as minimum service version. 
>>>> 2023-03-22 20:34:59.166 108 INFO >>>> cinder.volume.flows.manager.create_volume >>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>> specification: {'status': 'creating', 'volume_name': >>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> [{'url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'metadata': {'store': 'ceph'}}, {'url': >>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'metadata': {'store': 'ceph'}}, {'url': >>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>> 'owner_specified.openstack.object': 'images/cirros', >>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>> } >>>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop wrote: >>>> >>>>> >>>>> >>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> Hi Adam, >>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>> was getting pulled from the central site which was caused due to an >>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>> directory, which seems to have been resolved after the changes i made to >>>>>> fix it. >>>>>> >>>>>> Right now the glance api podman is running in unhealthy state and the >>>>>> podman logs don't show any error whatsoever and when issued the command >>>>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>>>> site, which is why cinder is throwing an error stating: >>>>>> >>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>> finding address for >>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>> Unable to establish connection to >>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>> NewConnectionError('>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>> ECONNREFUSED',)) >>>>>> >>>>>> Now i need to find out why the port is not listed as the glance >>>>>> service is running, which i am not sure how to find out. >>>>>> >>>>> >>>>> One other thing to investigate is whether your deployment includes >>>>> this patch [1]. If it does, then bear in mind >>>>> the glance-api service running at the edge site will be an "internal" >>>>> (non public facing) instance that uses port 9293 >>>>> instead of 9292. You should familiarize yourself with the release note >>>>> [2]. >>>>> >>>>> [1] >>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>> [2] >>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>> >>>>> Alan >>>>> >>>>> >>>>>> With regards, >>>>>> Swogat Pradhan >>>>>> >>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>>>>> Update: >>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>> >>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> [{'url': >>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, 
tzinfo=datetime.timezone.utc), >>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>> } >>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>> >>>>>>> >>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>> >>>>>>> John Fulton previously stated your cinder-volume service at the edge >>>>>>> site is not using the local ceph image store. Assuming you are deploying >>>>>>> GlanceApiEdge service [1], then the cinder-volume service should be >>>>>>> configured to use the local glance service [2]. You should check cinder's >>>>>>> glance_api_servers to confirm it's the edge site's glance service. >>>>>>> >>>>>>> [1] >>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>> [2] >>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>> >>>>>>> Alan >>>>>>> >>>>>>> >>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>> category=FutureWarning) >>>>>>>> >>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>>>> category=FutureWarning) >>>>>>>> >>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>> MB/s >>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>> >>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>> >>>>>>>> With regards, >>>>>>>> Swogat Pradhan >>>>>>>> >>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Jhon, >>>>>>>>> This seems to be an issue. >>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the --cluster >>>>>>>>> parameter was specified to the respective cluster names but the config >>>>>>>>> files were created in the name of ceph.conf and keyring was >>>>>>>>> ceph.client.openstack.keyring. >>>>>>>>> >>>>>>>>> Which created issues in glance as well as the naming convention of >>>>>>>>> the files didn't match the cluster names, so i had to manually rename the >>>>>>>>> central ceph conf file as such: >>>>>>>>> >>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>> total 16 >>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>> ceph.client.openstack.keyring >>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>> >>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>>>> for accessing central ceph cluster. >>>>>>>>> >>>>>>>>> glance multistore config: >>>>>>>>> [dcn02] >>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>> rbd_store_user=openstack >>>>>>>>> rbd_store_pool=images >>>>>>>>> rbd_thin_provisioning=False >>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>> >>>>>>>>> [ceph_central] >>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>> rbd_store_user=openstack >>>>>>>>> rbd_store_pool=images >>>>>>>>> rbd_thin_provisioning=False >>>>>>>>> store_description=Default glance store backend. >>>>>>>>> >>>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Swogat Pradhan >>>>>>>>> >>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>> wrote: >>>>>>>>>> > >>>>>>>>>> > Hi, >>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>> >>>>>>>>>> That explains the issue. 
It's a misconfiguration. >>>>>>>>>> >>>>>>>>>> I hope this is not a production system since the mailing list now >>>>>>>>>> has >>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>> >>>>>>>>>> The section that looks like this: >>>>>>>>>> >>>>>>>>>> [tripleo_ceph] >>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>> rbd_user=openstack >>>>>>>>>> rbd_pool=volumes >>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>> rbd_secret_uuid= >>>>>>>>>> report_discard_supported=True >>>>>>>>>> >>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and not >>>>>>>>>> the >>>>>>>>>> central one. Use the ceph conf file for that cluster and ensure >>>>>>>>>> the >>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>> >>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID of >>>>>>>>>> the >>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so that >>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>> This >>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>> secret-get-value $FSID`. >>>>>>>>>> >>>>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>>>> sites >>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>> following >>>>>>>>>> it. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>> >>>>>>>>>> John >>>>>>>>>> >>>>>>>>>> > >>>>>>>>>> > Ceph Output: >>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>> FMT PROT LOCK >>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>> 2 excl >>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>> 2 yes >>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>> 2 yes >>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>> 2 yes >>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>> 2 yes >>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>> 2 yes >>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>> 2 yes >>>>>>>>>> > >>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>> FMT PROT LOCK >>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>> 2 >>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>> 2 >>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>> > >>>>>>>>>> > Attached the cinder config. >>>>>>>>>> > Please let me know how I can solve this issue. 
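One way to cross-check the FSID convention John describes above, as a rough sketch only (the conf file name under /var/lib/tripleo-config/ceph/ and the nova_virtsecretd container name are taken from this thread and may differ in other deployments):

  # On the dcn02 compute/HCI node, read the local cluster FSID:
  sudo grep fsid /var/lib/tripleo-config/ceph/ceph.conf
  # Confirm libvirt holds a cephx secret keyed by that FSID (<local-fsid> is a placeholder):
  sudo podman exec nova_virtsecretd virsh secret-get-value <local-fsid>
  # The same value should appear as rbd_secret_uuid in the [tripleo_ceph] section shown above.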
>>>>>>>>>> > >>>>>>>>>> > With regards, >>>>>>>>>> > Swogat Pradhan >>>>>>>>>> > >>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>> >> >>>>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>> config. >>>>>>>>>> >> >>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>> >>>>>>>>>> >>> Update: >>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it takes >>>>>>>>>> around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>> >>> >>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>> >>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images created >>>>>>>>>> after importing from the central site. >>>>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>>>> time for the volume to get created. >>>>>>>>>> >>>> >>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>> getting created properly without any errors. >>>>>>>>>> >>>> >>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>> but getting checksum failed error. >>>>>>>>>> >>>> >>>>>>>>>> >>>> With regards, >>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>> >>>> >>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>> >>>>> wrote: >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>> launch the VM from volume. >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>>> to the edge glance. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>> observations in your >>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> >>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>> excl >>>>>>>>>> >>>>> $ >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>>>> which is on >>>>>>>>>> >>>>> the same local ceph cluster. 
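For a single existing volume, the same check can be done with rbd info, following the cephadm shell pattern John shows above (sketch only; the dcn02 file names and the volume UUID are placeholders):

  $ sudo cephadm shell --config /etc/ceph/dcn02.conf --keyring /etc/ceph/dcn02.client.admin.keyring
  $ rbd --cluster dcn02 info volumes/volume-<uuid> | grep -A1 parent
  # With rbd_flatten_volume_from_snapshot=False (as in the config in this thread), a COW clone
  # prints a parent in the local images pool; no parent line means the volume was filled by a
  # full image download/convert instead of a clone from the local glance store.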
>>>>>>>>>> >>>>> >>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>> encountering >>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Ideally all images should reside in the central Glance and >>>>>>>>>> be copied >>>>>>>>>> >>>>> to DCN sites before instances of those images are booted on >>>>>>>>>> DCN sites. >>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>> booted, then the >>>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>>> will boot as >>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>> has access to >>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>> booting of >>>>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>>>> advance, >>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN site >>>>>>>>>> and >>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> John >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>> then update. >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > With regards, >>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>> >> >>>>>>>>>> >>>>> >> Update: >>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is showing >>>>>>>>>> down. >>>>>>>>>> >>>>> >> >>>>>>>>>> >>>>> >> >>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>> (lacp=active). >>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon >>>>>>>>>> [-] privsep daemon starting >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon >>>>>>>>>> [-] privsep process running with uid/gid: 0/0 >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>>>> [-] privsep process running with capabilities (eff/prm/inh): >>>>>>>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon >>>>>>>>>> [-] privsep daemon running as pid 185437 >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>> in _get_host_uuid: Unexpected error while running command. 
>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>> running command. >>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver >>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance: >>>>>>>>>> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>> template mentioned here ?: >>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>>> why the instance is stuck in spawning state. >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> >>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Does your environment use different network interfaces >>>>>>>>>> for each of the networks? Or does it have a bond with everything on it? >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>> while spawning the instance. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>>>> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some time >>>>>>>>>> ago in this thread: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>>> that user, not sure if that could apply here. 
But is it possible that your >>>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>> >>>>> >>>> If there isn't any additional information in the debug >>>>>>>>>> logs I probably would start "tearing down" rabbitmq. I didn't have to do >>>>>>>>>> that in a production system yet so be careful. I can think of two routes: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is >>>>>>>>>> running, this will most likely impact client IO depending on your load. >>>>>>>>>> Check out the rabbitmqctl commands. >>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. >>>>>>>>>> rebuild. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>> a better advice. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>> but not due to packet >>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>> checked when >>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>>> stuck at spawning >>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>> sure if packet loss >>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>> identical between >>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>>>> the tunnel? 
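A rough way to answer both questions from either site (the interface names and the remote internal API address below are placeholders, not values from this thread):

  # Compare MTUs of the relevant NICs/bonds/VLANs on central and edge nodes:
  ip link show | grep mtu
  # Probe the tunnel with don't-fragment set; 1472 assumes a 1500 MTU path
  # (1500 minus 28 bytes of IP+ICMP headers):
  ping -c 5 -M do -s 1472 <remote-internal-api-ip>
  # Watch for drops/errors on the uplink while an instance is spawning:
  ip -s link show <bond-or-nic>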
>>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' or >>>>>>>>>> 'cc' as i am not >>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>> >>>>> >>>> / >>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>>>> priority >>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> >>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>>>> down when i am >>>>>>>>>> >>>>> >>>> trying >>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>> spawning state and >>>>>>>>>> >>>>> >>>> then >>>>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>> edge sites. >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>> >>>>> >>>> > >>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>>> directly, i am >>>>>>>>>> >>>>> >>>> checking >>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>>> reply. >>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>> occurred. >>>>>>>>>> >>>>> >>>> >> >>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>>> activities in the >>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>> site.* >>>>>>>>>> >>>>> >>>> >> >>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >> >>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>> >>>>> >>>> >> >>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>> >>>>> >>>> >>> Thanks for your response. 
>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>>> the details: >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>> >>>>> >>>> Started >>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>> >>>>> >>>> Started >>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>> >>>>> >>>> Started >>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>> >>>>> >>>> Started >>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>>>> but the issue is >>>>>>>>>> >>>>> >>>> still >>>>>>>>>> >>>>> >>>> >>> present. >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>> cluster_status >>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>> >>>>> >>>> >>> 
>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>> inter-node and CLI >>>>>>>>>> >>>>> >>>> tool >>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>> API >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>> inter-node and CLI >>>>>>>>>> >>>>> >>>> tool >>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>> API >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>> inter-node and CLI >>>>>>>>>> >>>>> >>>> tool >>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>> API >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> , >>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>> clustering, purpose: >>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> , >>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>>> amqp, purpose: AMQP >>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>> >>>>> >>>> , >>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>> purpose: HTTP API >>>>>>>>>> >>>>> >>>> 
>>> >>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova >>>>>>>>>> api log. >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - >>>>>>>>>> -] The reply >>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>>> after 60 seconds >>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). 
>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] The reply >>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>>> after 60 seconds >>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] The reply >>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>>> after 60 seconds >>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING >>>>>>>>>> nova.cache_utils >>>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Cache enabled >>>>>>>>>> >>>>> >>>> with >>>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] >>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>> exist, drop reply to >>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - >>>>>>>>>> -] The reply >>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>>> after 60 seconds >>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>>> where i am trying to >>>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>>> down (openstack >>>>>>>>>> >>>>> >>>> compute >>>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>>> restart the nova >>>>>>>>>> >>>>> >>>> compute >>>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>>> nova.compute.manager >>>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - >>>>>>>>>> - -] Running >>>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>>> >>>>> >>>> to >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO >>>>>>>>>> nova.compute.claims >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] [instance: >>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>>> successful on node >>>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] [instance: >>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>>> supplied device >>>>>>>>>> >>>>> >>>> name: >>>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev >>>>>>>>>> names >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>>> nova.virt.block_device >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] [instance: >>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>>> with volume >>>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING >>>>>>>>>> nova.cache_utils >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Cache enabled >>>>>>>>>> >>>>> >>>> with >>>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO >>>>>>>>>> oslo.privsep.daemon >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Running >>>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>>> '--config-file', >>>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>>> '--privsep_context', >>>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>>> '--privsep_sock_path', >>>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO >>>>>>>>>> oslo.privsep.daemon >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Spawned new >>>>>>>>>> >>>>> >>>> privsep >>>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] Process >>>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while >>>>>>>>>> running command. >>>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>> default] [instance: >>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>>> image >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
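One way to narrow this down from a controller: the reply_... queues named in the conductor errors above are created by the RPC caller (here most likely the edge nova-compute), so if they keep going missing it usually means that node's AMQP connection over the tunnel is dropping, which would also explain the compute service flapping to "down" during spawn. A sketch for watching this while reproducing the problem (the edge compute IP is a placeholder):

  # Inside one of the rabbitmq bundle containers:
  rabbitmqctl list_queues name messages consumers | grep reply_
  rabbitmqctl list_connections peer_host state | grep <edge-compute-internal-ip>
  # If the compute's connection and its reply_ queue vanish while the VM is spawning,
  # the problem is connectivity between the edge node and RabbitMQ rather than RabbitMQ itself.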
>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>> >>>>>>>>>> >>>>> >>>>>>>>>> >>>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 23 12:43:31 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 18:13:31 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi Alan, My bad i didn't see it was you who replied. Thanks for clarifying my doubt. On Thu, Mar 23, 2023 at 6:12?PM Swogat Pradhan wrote: > Hi Jhon, > Thank you for clarifying that. > Right now the cinder volume is stuck in *creating *state when adding > image as volume source. > But when creating an empty volume the volumes are getting created > successfully without any errors. > > We are getting volume creation request in cinder-volume.log as such: > 2023-03-23 12:34:40.152 108 INFO cinder.volume.flows.manager.create_volume > [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, > 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': > datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', > 'stores': 
'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > > But there is nothing else after that and the volume doesn't even timeout, > it just gets stuck in creating state. > Can you advise what might be the issue here? > All the containers are in a healthy state now. > > With regards, > Swogat Pradhan > > > On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: > >> >> >> On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan >> wrote: >> >>> Hi, >>> Is this bind not required for cinder_scheduler container? >>> >>> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >>> I do not see this particular bind on the cinder scheduler containers on >>> my controller nodes. >>> >> >> That is correct, because the scheduler does not access the ceph cluster. >> >> Alan >> >> >>> With regards, >>> Swogat Pradhan >>> >>> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Cinder volume config: >>>> >>>> [tripleo_ceph] >>>> volume_backend_name=tripleo_ceph >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>> rbd_user=openstack >>>> rbd_pool=volumes >>>> rbd_flatten_volume_from_snapshot=False >>>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>>> report_discard_supported=True >>>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>>> rbd_cluster_name=dcn02 >>>> >>>> Glance api config: >>>> >>>> [dcn02] >>>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=dcn02 rbd glance store >>>> [ceph] >>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=Default glance store backend. >>>> >>>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> I still have the same issue, I'm not sure what's left to try. >>>>> All the pods are now in a healthy state, I am getting log entries 3 >>>>> mins after I hit the create volume button in cinder-volume when I try to >>>>> create a volume with an image. >>>>> And the volumes are just stuck in creating state for more than 20 mins >>>>> now. >>>>> >>>>> Cinder logs: >>>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>>> cinder-volume RPC version 3.17 as minimum service version. 
>>>>> 2023-03-22 20:34:59.166 108 INFO >>>>> cinder.volume.flows.manager.create_volume >>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>>> specification: {'status': 'creating', 'volume_name': >>>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>> } >>>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> Hi Adam, >>>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>>> was getting pulled from the central site which was caused due to an >>>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>>> directory, which seems to have been resolved after the changes i made to >>>>>>> fix it. >>>>>>> >>>>>>> Right now the glance api podman is running in unhealthy state and >>>>>>> the podman logs don't show any error whatsoever and when issued the command >>>>>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>>>>> site, which is why cinder is throwing an error stating: >>>>>>> >>>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>>> finding address for >>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>> Unable to establish connection to >>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>>> NewConnectionError('>>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>>> ECONNREFUSED',)) >>>>>>> >>>>>>> Now i need to find out why the port is not listed as the glance >>>>>>> service is running, which i am not sure how to find out. >>>>>>> >>>>>> >>>>>> One other thing to investigate is whether your deployment includes >>>>>> this patch [1]. If it does, then bear in mind >>>>>> the glance-api service running at the edge site will be an "internal" >>>>>> (non public facing) instance that uses port 9293 >>>>>> instead of 9292. You should familiarize yourself with the release >>>>>> note [2]. >>>>>> >>>>>> [1] >>>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>>> [2] >>>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>>> >>>>>> Alan >>>>>> >>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Update: >>>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>>> >>>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> [{'url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': 
>>>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>>> } >>>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>>> >>>>>>>> >>>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>>> >>>>>>>> John Fulton previously stated your cinder-volume service at the >>>>>>>> edge site is not using the local ceph image store. Assuming you are >>>>>>>> deploying GlanceApiEdge service [1], then the cinder-volume service should >>>>>>>> be configured to use the local glance service [2]. You should check >>>>>>>> cinder's glance_api_servers to confirm it's the edge site's glance service. >>>>>>>> >>>>>>>> [1] >>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>>> [2] >>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>>> >>>>>>>> Alan >>>>>>>> >>>>>>>> >>>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>> category=FutureWarning) >>>>>>>>> >>>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>>>>> category=FutureWarning) >>>>>>>>> >>>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>>> MB/s >>>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>>> >>>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Swogat Pradhan >>>>>>>>> >>>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Jhon, >>>>>>>>>> This seems to be an issue. >>>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the >>>>>>>>>> --cluster parameter was specified to the respective cluster names but the >>>>>>>>>> config files were created in the name of ceph.conf and keyring was >>>>>>>>>> ceph.client.openstack.keyring. >>>>>>>>>> >>>>>>>>>> Which created issues in glance as well as the naming convention >>>>>>>>>> of the files didn't match the cluster names, so i had to manually rename >>>>>>>>>> the central ceph conf file as such: >>>>>>>>>> >>>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>>> total 16 >>>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>>> ceph.client.openstack.keyring >>>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>>> >>>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>>>>> for accessing central ceph cluster. >>>>>>>>>> >>>>>>>>>> glance multistore config: >>>>>>>>>> [dcn02] >>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>> rbd_store_user=openstack >>>>>>>>>> rbd_store_pool=images >>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>>> >>>>>>>>>> [ceph_central] >>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>>> rbd_store_user=openstack >>>>>>>>>> rbd_store_pool=images >>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>> store_description=Default glance store backend. 
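A quick way to confirm that each glance store really points at the cluster it is meant to is to compare the fsid in the two conf files the stores reference against the fsid reported by each cluster. This is only a sketch; the glance_api container name and the conf paths are TripleO defaults and may differ in your deployment:

$ sudo podman exec glance_api grep fsid /etc/ceph/ceph.conf /etc/ceph/ceph_central.conf
$ sudo cephadm shell -- ceph fsid    # run on a dcn02 ceph node
$ sudo cephadm shell -- ceph fsid    # run on a central ceph node

The fsid taken from ceph.conf should match the dcn02 cluster and the one from ceph_central.conf should match the central cluster; if they are swapped or identical, images tagged for one store are actually landing in the other cluster.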
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> With regards, >>>>>>>>>> Swogat Pradhan >>>>>>>>>> >>>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>>> wrote: >>>>>>>>>>> > >>>>>>>>>>> > Hi, >>>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>>> >>>>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>>>> >>>>>>>>>>> I hope this is not a production system since the mailing list >>>>>>>>>>> now has >>>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>>> >>>>>>>>>>> The section that looks like this: >>>>>>>>>>> >>>>>>>>>>> [tripleo_ceph] >>>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>> rbd_user=openstack >>>>>>>>>>> rbd_pool=volumes >>>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>>> rbd_secret_uuid= >>>>>>>>>>> report_discard_supported=True >>>>>>>>>>> >>>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and not >>>>>>>>>>> the >>>>>>>>>>> central one. Use the ceph conf file for that cluster and ensure >>>>>>>>>>> the >>>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>>> >>>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID >>>>>>>>>>> of the >>>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so >>>>>>>>>>> that >>>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>>> This >>>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>>> secret-get-value $FSID`. >>>>>>>>>>> >>>>>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>>>>> sites >>>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>>> following >>>>>>>>>>> it. 
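A minimal cross-check of the pieces described above, run on a DCN compute/HCI node, is to compare the local cluster's fsid with what cinder and libvirt were given. The container names and file locations below are assumptions based on TripleO defaults, and <local-fsid> is a placeholder:

$ sudo grep fsid /var/lib/tripleo-config/ceph/*.conf
$ sudo podman exec cinder_volume grep -A 10 '^\[tripleo_ceph\]' /etc/cinder/cinder.conf | grep rbd_secret_uuid
$ sudo podman exec nova_virtsecretd virsh secret-list
$ sudo podman exec nova_virtsecretd virsh secret-get-value <local-fsid>

The first two commands should agree on the local cluster's fsid, and the last one should return the cephx key libvirt will use; an error there usually means the secret for that cluster was never defined on the node.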
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>>> >>>>>>>>>>> John >>>>>>>>>>> >>>>>>>>>>> > >>>>>>>>>>> > Ceph Output: >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>>> 2 excl >>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>>> 2 >>>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>>> 2 >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>>> > >>>>>>>>>>> > Attached the cinder config. >>>>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>>>> > >>>>>>>>>>> > With regards, >>>>>>>>>>> > Swogat Pradhan >>>>>>>>>>> > >>>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>> >> >>>>>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>>> config. >>>>>>>>>>> >> >>>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>> >>>>>>>>>>> >>> Update: >>>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it >>>>>>>>>>> takes around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>>> >>> >>>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images >>>>>>>>>>> created after importing from the central site. >>>>>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>>>>> time for the volume to get created. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>>> getting created properly without any errors. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>> but getting checksum failed error. 
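Before going further it may also be worth confirming from the API side that the image really is registered in the dcn02 store. A rough check, assuming a client recent enough to expose the glance multistore fields:

$ openstack image show <image-id> -f json | grep -E '"stores"|"checksum"|"status"'

If dcn02 is missing from the stores list the image was never actually copied to the edge store, which would explain both the slow volume creation and the checksum failure seen during pre-caching.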
>>>>>>>>>>> >>>> >>>>>>>>>>> >>>> With regards, >>>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>>> >>>>> wrote: >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>>> launch the VM from volume. >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>>>> to the edge glance. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>>> observations in your >>>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>>> excl >>>>>>>>>>> >>>>> $ >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>>>>> which is on >>>>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>>> encountering >>>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Ideally all images should reside in the central Glance and >>>>>>>>>>> be copied >>>>>>>>>>> >>>>> to DCN sites before instances of those images are booted >>>>>>>>>>> on DCN sites. >>>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>>> booted, then the >>>>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>>>> will boot as >>>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>>> has access to >>>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>>> booting of >>>>>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>>>>> advance, >>>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN >>>>>>>>>>> site and >>>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> John >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>>> then update. 
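To make John's parent check concrete for dcn02, something like the following (only a sketch, reusing the cephadm shell pattern above and assuming the dcn02 conf and keyring names) should show whether a freshly created volume is a COW clone of the local image or a full copy:

$ sudo cephadm shell --config /etc/ceph/dcn02.conf --keyring /etc/ceph/dcn02.client.admin.keyring
$ rbd --cluster dcn02 -p volumes ls -l
$ rbd --cluster dcn02 info volumes/volume-<uuid> | grep parent

A clone shows a parent of images/<image-id>@snap on the same cluster; no parent line at all means the image was downloaded and converted, which is the slow path.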
>>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > With regards, >>>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> Update: >>>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is >>>>>>>>>>> showing down. >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>>> (lacp=active). >>>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon starting >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with capabilities >>>>>>>>>>> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>> running command. >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO >>>>>>>>>>> nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 >>>>>>>>>>> b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>>> template mentioned here ?: >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>>>> why the instance is stuck in spawning state. 
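Since the spawn also hangs when booting directly from an image, it is worth confirming that nova-compute on the edge node is set to build ephemeral disks in the local ceph cluster at all. A rough check, with the nova_compute container name and option names being the usual TripleO/libvirt ones (adjust if your templates differ):

$ sudo podman exec nova_compute grep -E 'images_type|images_rbd_pool|images_rbd_ceph_conf' /etc/nova/nova.conf

If images_type is not rbd, or images_rbd_ceph_conf points at the central cluster's conf, the "Creating image" step has to fetch the image over the network instead of cloning it locally, which would fit the long spawning times described here.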
>>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Does your environment use different network >>>>>>>>>>> interfaces for each of the networks? Or does it have a bond with everything >>>>>>>>>>> on it? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>>> while spawning the instance. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some >>>>>>>>>>> time ago in this thread: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>>> >>>>> >>>> If there isn't any additional information in the >>>>>>>>>>> debug logs I probably would start "tearing down" rabbitmq. I didn't have to >>>>>>>>>>> do that in a production system yet so be careful. I can think of two routes: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit >>>>>>>>>>> is running, this will most likely impact client IO depending on your load. >>>>>>>>>>> Check out the rabbitmqctl commands. >>>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. 
>>>>>>>>>>> rebuild. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>>> a better advice. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>>> but not due to packet >>>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>>> checked when >>>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>>>> stuck at spawning >>>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>>> sure if packet loss >>>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>>> identical between >>>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>>>>> the tunnel? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' >>>>>>>>>>> or 'cc' as i am not >>>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>>> >>>>> >>>> / >>>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>>>>> priority >>>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>>>>> down when i am >>>>>>>>>>> >>>>> >>>> trying >>>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>>> spawning state and >>>>>>>>>>> >>>>> >>>> then >>>>>>>>>>> >>>>> >>>> > gets stuck. 
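A few quick checks for the MTU and packet-loss theory, run on an edge compute node while an instance is spawning; the interface name and the remote address are placeholders:

$ ip link show <bond-or-tunnel-interface> | grep mtu
$ ping -c 5 -M do -s 1472 <central-internal-api-address>    # 1472 payload + 28 bytes of headers = 1500
$ ip -s link show <bond-or-tunnel-interface>    # repeat during the spawn and compare the dropped/error counters

If the ping fails with "message too long" the path over the tunnel carries less than a 1500 byte MTU and the overlay or tenant MTU needs lowering; steadily climbing drop counters during the spawn would point back at the bond configuration discussed above.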
>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>>> edge sites. >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>>>> directly, i am >>>>>>>>>>> >>>>> >>>> checking >>>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>>>> reply. >>>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>>> occurred. >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>>>> activities in the >>>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>>> site.* >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>>>> the details: >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>>>>> but the issue is >>>>>>>>>>> >>>>> >>>> still >>>>>>>>>>> >>>>> >>>> >>> present. >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>>> cluster_status >>>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at 
overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>>> clustering, purpose: >>>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>>>> amqp, purpose: AMQP >>>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>>> purpose: HTTP API >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova >>>>>>>>>>> api log. 
>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING >>>>>>>>>>> nova.cache_utils >>>>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Cache enabled >>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>>>> where i am trying to >>>>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>>>> down (openstack >>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>>>> restart the nova >>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. 
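For the MessageUndeliverable errors above it can help to look from the RabbitMQ side and see whether the reply queue named in the log exists and has a consumer. A sketch, using one of the reply queue ids from the conductor log; the container name is an assumption based on the pacemaker bundle and may differ:

$ sudo podman ps --format '{{.Names}}' | grep rabbitmq
$ sudo podman exec <rabbitmq-container> rabbitmqctl list_queues name messages consumers | grep reply_349bcb075f8c49329435a0f884b33066

If the queue is gone or shows no consumer while nova-compute on the edge node still believes it is connected, the reply path died because the caller's AMQP connection over the tunnel dropped, which fits the pattern described earlier in this thread.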
>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>>>> nova.compute.manager >>>>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - >>>>>>>>>>> - -] Running >>>>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>>>> >>>>> >>>> to >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO >>>>>>>>>>> nova.compute.claims >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>>>> successful on node >>>>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>>>> supplied device >>>>>>>>>>> >>>>> >>>> name: >>>>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied >>>>>>>>>>> dev names >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>>>> nova.virt.block_device >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>>>> with volume >>>>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING >>>>>>>>>>> nova.cache_utils >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Cache enabled >>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO >>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Running >>>>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>>>> '--config-file', >>>>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>>>> '--privsep_context', >>>>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>>>> '--privsep_sock_path', >>>>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO >>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Spawned new >>>>>>>>>>> >>>>> >>>> privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Process >>>>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while >>>>>>>>>>> running command. >>>>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>>>> image >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
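When a node drops out like this it is usually revealing to watch the service state and the AMQP connection from the edge node at the exact moment the instance enters spawning; roughly, with the log path being the usual TripleO location:

$ watch -n 5 "openstack compute service list --service nova-compute"
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -iE 'amqp|rabbit|heartbeat'

If the service flips to down at the same moment the compute log reports missed heartbeats or a broken connection to RabbitMQ, the spawn is only the trigger and the real problem is the control-plane link between the edge site and the controllers.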
>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 23 16:01:16 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 23 Mar 2023 21:31:16 +0530 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: Hi, Can someone please help me identify the issue here? Latest cinder-volume logs from dcn02: (ATTACHED) The volume is stuck in creating state. With regards, Swogat Pradhan On Thu, Mar 23, 2023 at 6:12?PM Swogat Pradhan wrote: > Hi Jhon, > Thank you for clarifying that. > Right now the cinder volume is stuck in *creating *state when adding > image as volume source. > But when creating an empty volume the volumes are getting created > successfully without any errors. > > We are getting volume creation request in cinder-volume.log as such: > 2023-03-23 12:34:40.152 108 INFO cinder.volume.flows.manager.create_volume > [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db > 4160ce999a31485fa643aed0936dfef0 - - -] Volume > 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with > specification: {'status': 'creating', 'volume_name': > 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, > 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': > ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', > 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', > 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', > 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, > 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', > 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': > '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', > 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': > datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), > 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, > tzinfo=datetime.timezone.utc), 'locations': [{'url': > 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'ceph'}}, {'url': > 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'metadata': {'store': 'dcn02'}}], 'direct_url': > 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', > 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', > 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', > 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', > 'owner_specified.openstack.object': 'images/cirros', > 'owner_specified.openstack.sha256': ''}}, 'image_service': > } > > But there is nothing else after that and the volume doesn't even timeout, > it just gets stuck in creating state. > Can you advise what might be the issue here? > All the containers are in a healthy state now. > > With regards, > Swogat Pradhan > > > On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: > >> >> >> On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan >> wrote: >> >>> Hi, >>> Is this bind not required for cinder_scheduler container? >>> >>> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >>> I do not see this particular bind on the cinder scheduler containers on >>> my controller nodes. >>> >> >> That is correct, because the scheduler does not access the ceph cluster. >> >> Alan >> >> >>> With regards, >>> Swogat Pradhan >>> >>> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Cinder volume config: >>>> >>>> [tripleo_ceph] >>>> volume_backend_name=tripleo_ceph >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>> rbd_user=openstack >>>> rbd_pool=volumes >>>> rbd_flatten_volume_from_snapshot=False >>>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>>> report_discard_supported=True >>>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>>> rbd_cluster_name=dcn02 >>>> >>>> Glance api config: >>>> >>>> [dcn02] >>>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=dcn02 rbd glance store >>>> [ceph] >>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>> rbd_store_user=openstack >>>> rbd_store_pool=images >>>> rbd_thin_provisioning=False >>>> store_description=Default glance store backend. >>>> >>>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> I still have the same issue, I'm not sure what's left to try. >>>>> All the pods are now in a healthy state, I am getting log entries 3 >>>>> mins after I hit the create volume button in cinder-volume when I try to >>>>> create a volume with an image. >>>>> And the volumes are just stuck in creating state for more than 20 mins >>>>> now. >>>>> >>>>> Cinder logs: >>>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>>> cinder-volume RPC version 3.17 as minimum service version. 
>>>>> 2023-03-22 20:34:59.166 108 INFO >>>>> cinder.volume.flows.manager.create_volume >>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>>> specification: {'status': 'creating', 'volume_name': >>>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>> } >>>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> Hi Adam, >>>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>>> was getting pulled from the central site which was caused due to an >>>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>>> directory, which seems to have been resolved after the changes i made to >>>>>>> fix it. >>>>>>> >>>>>>> Right now the glance api podman is running in unhealthy state and >>>>>>> the podman logs don't show any error whatsoever and when issued the command >>>>>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>>>>> site, which is why cinder is throwing an error stating: >>>>>>> >>>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>>> finding address for >>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>> Unable to establish connection to >>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>>> NewConnectionError('>>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>>> ECONNREFUSED',)) >>>>>>> >>>>>>> Now i need to find out why the port is not listed as the glance >>>>>>> service is running, which i am not sure how to find out. >>>>>>> >>>>>> >>>>>> One other thing to investigate is whether your deployment includes >>>>>> this patch [1]. If it does, then bear in mind >>>>>> the glance-api service running at the edge site will be an "internal" >>>>>> (non public facing) instance that uses port 9293 >>>>>> instead of 9292. You should familiarize yourself with the release >>>>>> note [2]. >>>>>> >>>>>> [1] >>>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>>> [2] >>>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>>> >>>>>> Alan >>>>>> >>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Update: >>>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>>> >>>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> [{'url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': 
>>>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>>> } >>>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>>> >>>>>>>> >>>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>>> >>>>>>>> John Fulton previously stated your cinder-volume service at the >>>>>>>> edge site is not using the local ceph image store. Assuming you are >>>>>>>> deploying GlanceApiEdge service [1], then the cinder-volume service should >>>>>>>> be configured to use the local glance service [2]. You should check >>>>>>>> cinder's glance_api_servers to confirm it's the edge site's glance service. >>>>>>>> >>>>>>>> [1] >>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>>> [2] >>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>>> >>>>>>>> Alan >>>>>>>> >>>>>>>> >>>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>> category=FutureWarning) >>>>>>>>> >>>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>>>>> category=FutureWarning) >>>>>>>>> >>>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>>> MB/s >>>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>>> >>>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Swogat Pradhan >>>>>>>>> >>>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Jhon, >>>>>>>>>> This seems to be an issue. >>>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the >>>>>>>>>> --cluster parameter was specified to the respective cluster names but the >>>>>>>>>> config files were created in the name of ceph.conf and keyring was >>>>>>>>>> ceph.client.openstack.keyring. >>>>>>>>>> >>>>>>>>>> Which created issues in glance as well as the naming convention >>>>>>>>>> of the files didn't match the cluster names, so i had to manually rename >>>>>>>>>> the central ceph conf file as such: >>>>>>>>>> >>>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>>> total 16 >>>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>>> ceph.client.openstack.keyring >>>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>>> >>>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are the >>>>>>>>>> files used to access dcn02 ceph cluster and ceph_central* files are used in >>>>>>>>>> for accessing central ceph cluster. >>>>>>>>>> >>>>>>>>>> glance multistore config: >>>>>>>>>> [dcn02] >>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>> rbd_store_user=openstack >>>>>>>>>> rbd_store_pool=images >>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>>> >>>>>>>>>> [ceph_central] >>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>>> rbd_store_user=openstack >>>>>>>>>> rbd_store_pool=images >>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>> store_description=Default glance store backend. 
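A quick way to confirm which cluster each conf file points at (paths taken from the directory listing above) is to compare their fsid values; ceph.conf should carry the dcn02 FSID and ceph_central.conf the central one:

$ sudo grep fsid /var/lib/tripleo-config/ceph/ceph.conf /var/lib/tripleo-config/ceph/ceph_central.conf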
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> With regards, >>>>>>>>>> Swogat Pradhan >>>>>>>>>> >>>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>>> wrote: >>>>>>>>>>> > >>>>>>>>>>> > Hi, >>>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>>> >>>>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>>>> >>>>>>>>>>> I hope this is not a production system since the mailing list >>>>>>>>>>> now has >>>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>>> >>>>>>>>>>> The section that looks like this: >>>>>>>>>>> >>>>>>>>>>> [tripleo_ceph] >>>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>> rbd_user=openstack >>>>>>>>>>> rbd_pool=volumes >>>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>>> rbd_secret_uuid= >>>>>>>>>>> report_discard_supported=True >>>>>>>>>>> >>>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and not >>>>>>>>>>> the >>>>>>>>>>> central one. Use the ceph conf file for that cluster and ensure >>>>>>>>>>> the >>>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>>> >>>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID >>>>>>>>>>> of the >>>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so >>>>>>>>>>> that >>>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>>> This >>>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>>> secret-get-value $FSID`. >>>>>>>>>>> >>>>>>>>>>> The documentation describes how to configure the central and DCN >>>>>>>>>>> sites >>>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>>> following >>>>>>>>>>> it. 
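As a concrete check on a DCN node (container names below are the TripleO defaults, adjust if yours differ), the conf file named by rbd_ceph_conf, the rbd_secret_uuid in cinder.conf and the secrets libvirt holds should all agree on the edge cluster's FSID:

$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_secret_uuid' /etc/cinder/cinder.conf
$ sudo podman exec cinder_volume grep fsid /etc/ceph/ceph.conf
$ sudo podman exec nova_virtsecretd virsh secret-list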
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>>> >>>>>>>>>>> John >>>>>>>>>>> >>>>>>>>>>> > >>>>>>>>>>> > Ceph Output: >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>>> 2 excl >>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB 2 >>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB 2 >>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB 2 >>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB 2 >>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB 2 >>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB 2 >>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>>> 2 yes >>>>>>>>>>> > >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>>> 2 >>>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>>> 2 >>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>>> > >>>>>>>>>>> > Attached the cinder config. >>>>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>>>> > >>>>>>>>>>> > With regards, >>>>>>>>>>> > Swogat Pradhan >>>>>>>>>>> > >>>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>> >> >>>>>>>>>>> >> in my last message under the line "On a DCN site if you run a >>>>>>>>>>> command like this:" I suggested some steps you could try to confirm the >>>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>>> config. >>>>>>>>>>> >> >>>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>> >>>>>>>>>>> >>> Update: >>>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it >>>>>>>>>>> takes around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>>> >>> >>>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images >>>>>>>>>>> created after importing from the central site. >>>>>>>>>>> >>>> But launching an instance normally fails as it takes a long >>>>>>>>>>> time for the volume to get created. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>>> getting created properly without any errors. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>> but getting checksum failed error. 
>>>>>>>>>>> >>>> >>>>>>>>>>> >>>> With regards, >>>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>>> >>>>> wrote: >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>>> launch the VM from volume. >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>>>> to the edge glance. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>>> observations in your >>>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring >>>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>>> excl >>>>>>>>>>> >>>>> $ >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Then, you should see the parent of the volume is the image >>>>>>>>>>> which is on >>>>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>>> encountering >>>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Ideally all images should reside in the central Glance and >>>>>>>>>>> be copied >>>>>>>>>>> >>>>> to DCN sites before instances of those images are booted >>>>>>>>>>> on DCN sites. >>>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>>> booted, then the >>>>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>>>> will boot as >>>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>>> has access to >>>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>>> booting of >>>>>>>>>>> >>>>> the image will take time because it has not been copied in >>>>>>>>>>> advance, >>>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN >>>>>>>>>>> site and >>>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> John >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>>> then update. 
>>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > With regards, >>>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>>> >>>>> > >>>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> Update: >>>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is >>>>>>>>>>> showing down. >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> >>>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>>> (lacp=active). >>>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon starting >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with capabilities >>>>>>>>>>> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>> running command. >>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO >>>>>>>>>>> nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 >>>>>>>>>>> b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>>> template mentioned here ?: >>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>>>> why the instance is stuck in spawning state. 
>>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> >>>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Does your environment use different network >>>>>>>>>>> interfaces for each of the networks? Or does it have a bond with everything >>>>>>>>>>> on it? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>>> while spawning the instance. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some >>>>>>>>>>> time ago in this thread: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>>> >>>>> >>>> If there isn't any additional information in the >>>>>>>>>>> debug logs I probably would start "tearing down" rabbitmq. I didn't have to >>>>>>>>>>> do that in a production system yet so be careful. I can think of two routes: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit >>>>>>>>>>> is running, this will most likely impact client IO depending on your load. >>>>>>>>>>> Check out the rabbitmqctl commands. >>>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. 
>>>>>>>>>>> rebuild. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>>> a better advice. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>>> but not due to packet >>>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>>> checked when >>>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>>>> stuck at spawning >>>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>>> sure if packet loss >>>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>>> identical between >>>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss through >>>>>>>>>>> the tunnel? >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan : >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' >>>>>>>>>>> or 'cc' as i am not >>>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>>> >>>>> >>>> / >>>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... >>>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to definition >>>>>>>>>>> priority >>>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only goes >>>>>>>>>>> down when i am >>>>>>>>>>> >>>>> >>>> trying >>>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>>> spawning state and >>>>>>>>>>> >>>>> >>>> then >>>>>>>>>>> >>>>> >>>> > gets stuck. 
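One way to check the MTU and packet-loss questions raised above from the edge compute node while an instance is spawning (the interface name and controller IP below are placeholders) is to watch the error/drop counters on the bond and to confirm a full 1500-byte path through the tunnel, i.e. a 1472-byte payload plus the 28-byte ICMP/IP header:

$ ip -s link show bond1
$ ping -M do -s 1472 -c 20 <central-controller-internal-api-ip>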
>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>>> edge sites. >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>>>> directly, i am >>>>>>>>>>> >>>>> >>>> checking >>>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>>>> reply. >>>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>>> occurred. >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>>>> activities in the >>>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>>> site.* >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>>>> the details: >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]: >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times >>>>>>>>>>> but the issue is >>>>>>>>>>> >>>>> >>>> still >>>>>>>>>>> >>>>> >>>> >>> present. >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>>> cluster_status >>>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... 
>>>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ >>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at 
overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose: >>>>>>>>>>> inter-node and CLI >>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>> API >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>>> clustering, purpose: >>>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>>>> amqp, purpose: AMQP >>>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>>> purpose: HTTP API >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>> >>>>> >>>> >>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova >>>>>>>>>>> api log. 
>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING >>>>>>>>>>> nova.cache_utils >>>>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Cache enabled >>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] >>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>> exist, drop reply to >>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>> - -] The reply >>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>>>> after 60 seconds >>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>>>> where i am trying to >>>>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>>>> down (openstack >>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>>>> restart the nova >>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. 
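Given the "missing queue (reply_...)" errors above, one thing worth checking on each controller, run the same way as the rabbitmqctl list_policies command earlier (i.e. from inside the rabbitmq container), is whether those reply queues actually exist and have a consumer:

$ rabbitmqctl list_queues name messages consumers | grep reply_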
>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>>>> nova.compute.manager >>>>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - >>>>>>>>>>> - -] Running >>>>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>>>> >>>>> >>>> to >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO >>>>>>>>>>> nova.compute.claims >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>>>> successful on node >>>>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>>>> supplied device >>>>>>>>>>> >>>>> >>>> name: >>>>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied >>>>>>>>>>> dev names >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>>>> nova.virt.block_device >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>>>> with volume >>>>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING >>>>>>>>>>> nova.cache_utils >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Cache enabled >>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO >>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Running >>>>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>>>> '--config-file', >>>>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>>>> '--privsep_context', >>>>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>>>> '--privsep_sock_path', >>>>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO >>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Spawned new >>>>>>>>>>> >>>>> >>>> privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh): >>>>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] Process >>>>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while >>>>>>>>>>> running command. >>>>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>> default] [instance: >>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>>>> image >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
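A low-effort way to narrow this down is to watch the compute service state and the compute log at the same time while reproducing the launch (the log path is the TripleO default, adjust if needed):

$ openstack compute service list --service nova-compute
$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -Ei 'error|disconnect|amqp'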
>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> With regards, >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan >>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- 2023-03-23 12:36:23.852 108 INFO cinder.volume.flows.manager.create_volume [req-e196679a-cf81-447d-9dc9-0b1b397b0849 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume d714560e-aca9-4fac-8d2d-f8be86d58c2e: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-d714560e-aca9-4fac-8d2d-f8be86d58c2e', 'volume_size': 1, 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } 2023-03-23 15:49:45.182 108 INFO cinder.volume.flows.manager.create_volume [req-c7be83db-2ddc-413b-a176-cf36adc9435e b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - - -] Volume 72a23a26-1b04-49fc-86d1-453d9b021b68: being created as image with specification: {'status': 'creating', 'volume_name': 'volume-72a23a26-1b04-49fc-86d1-453d9b021b68', 'volume_size': 50, 'image_id': '2735662a-ad29-497b-b7c8-edb235594769', 'image_location': ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/2735662a-ad29-497b-b7c8-edb235594769/snap', [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': 
{'store': 'ceph'}}, {'url': 'rbd://cec7cdfd-3667-57f1-afaf-5dfca9b0e975/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'dcn01'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', 'id': '2735662a-ad29-497b-b7c8-edb235594769', 'created_at': datetime.datetime(2023, 3, 23, 15, 28, 41, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 3, 23, 15, 34, 44, tzinfo=datetime.timezone.utc), 'locations': [{'url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'ceph'}}, {'url': 'rbd://cec7cdfd-3667-57f1-afaf-5dfca9b0e975/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'dcn01'}}, {'url': 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'metadata': {'store': 'dcn02'}}], 'direct_url': 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/2735662a-ad29-497b-b7c8-edb235594769/snap', 'tags': [], 'file': '/v2/images/2735662a-ad29-497b-b7c8-edb235594769/file', 'stores': 'ceph,dcn01,dcn02', 'properties': {'os_glance_failed_import': '', 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', 'owner_specified.openstack.object': 'images/cirros', 'owner_specified.openstack.sha256': ''}}, 'image_service': } From abishop at redhat.com Thu Mar 23 17:05:30 2023 From: abishop at redhat.com (Alan Bishop) Date: Thu, 23 Mar 2023 10:05:30 -0700 Subject: DCN compute service goes down when a instance is scheduled to launch | wallaby | tripleo In-Reply-To: References: <20230301095953.Horde.2lvkFjt1j-QzRJRplLfTan3@webmail.nde.ag> <20230304204745.Horde.k4XZrTgWPZWL-tw8eTzWIJs@webmail.nde.ag> <819227B8-566B-4696-B045-BBAB8751CBFC@redhat.com> Message-ID: On Thu, Mar 23, 2023 at 9:01?AM Swogat Pradhan wrote: > Hi, > Can someone please help me identify the issue here? > Latest cinder-volume logs from dcn02: > (ATTACHED) > It's really not possible to analyze what's happening with just one or two log entries. Do you have debug logs enabled? One thing I noticed is the glance image's disk_format is qcow2. You should use "raw" images with ceph RBD. Alan > > The volume is stuck in creating state. > > With regards, > Swogat Pradhan > > On Thu, Mar 23, 2023 at 6:12?PM Swogat Pradhan > wrote: > >> Hi Jhon, >> Thank you for clarifying that. >> Right now the cinder volume is stuck in *creating *state when adding >> image as volume source. >> But when creating an empty volume the volumes are getting created >> successfully without any errors. 
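For reference, converting the cirros image to raw before uploading (the file names below are just examples) avoids the qcow2 download/convert step that showed up in the earlier cinder logs:

$ qemu-img convert -f qcow2 -O raw cirros-0.5.2-x86_64-disk.img cirros-0.5.2-x86_64-disk.raw
$ openstack image create --disk-format raw --container-format bare --file cirros-0.5.2-x86_64-disk.raw cirros-raw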
>> >> We are getting volume creation request in cinder-volume.log as such: >> 2023-03-23 12:34:40.152 108 INFO >> cinder.volume.flows.manager.create_volume >> [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db >> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >> 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with >> specification: {'status': 'creating', 'volume_name': >> 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, >> 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': >> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >> 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': >> datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), >> 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, >> tzinfo=datetime.timezone.utc), 'locations': [{'url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'metadata': {'store': 'ceph'}}, {'url': >> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'metadata': {'store': 'dcn02'}}], 'direct_url': >> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >> 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', >> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >> 'owner_specified.openstack.object': 'images/cirros', >> 'owner_specified.openstack.sha256': ''}}, 'image_service': >> } >> >> But there is nothing else after that and the volume doesn't even timeout, >> it just gets stuck in creating state. >> Can you advise what might be the issue here? >> All the containers are in a healthy state now. >> >> With regards, >> Swogat Pradhan >> >> >> On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: >> >>> >>> >>> On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan < >>> swogatpradhan22 at gmail.com> wrote: >>> >>>> Hi, >>>> Is this bind not required for cinder_scheduler container? >>>> >>>> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >>>> I do not see this particular bind on the cinder scheduler containers on >>>> my controller nodes. >>>> >>> >>> That is correct, because the scheduler does not access the ceph cluster. 
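A quick way to see which cinder containers actually get the ceph bind (container names are the TripleO defaults; in this layout cinder_volume runs on the DCN nodes and cinder_scheduler on the controllers, so run each command where that container lives):

$ sudo podman inspect cinder_volume | grep src-ceph
$ sudo podman inspect cinder_scheduler | grep src-ceph   # expected to match nothing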
>>> >>> Alan >>> >>> >>>> With regards, >>>> Swogat Pradhan >>>> >>>> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Cinder volume config: >>>>> >>>>> [tripleo_ceph] >>>>> volume_backend_name=tripleo_ceph >>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>> rbd_user=openstack >>>>> rbd_pool=volumes >>>>> rbd_flatten_volume_from_snapshot=False >>>>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>>>> report_discard_supported=True >>>>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>>>> rbd_cluster_name=dcn02 >>>>> >>>>> Glance api config: >>>>> >>>>> [dcn02] >>>>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>>>> rbd_store_user=openstack >>>>> rbd_store_pool=images >>>>> rbd_thin_provisioning=False >>>>> store_description=dcn02 rbd glance store >>>>> [ceph] >>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>> rbd_store_user=openstack >>>>> rbd_store_pool=images >>>>> rbd_thin_provisioning=False >>>>> store_description=Default glance store backend. >>>>> >>>>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> I still have the same issue, I'm not sure what's left to try. >>>>>> All the pods are now in a healthy state, I am getting log entries 3 >>>>>> mins after I hit the create volume button in cinder-volume when I try to >>>>>> create a volume with an image. >>>>>> And the volumes are just stuck in creating state for more than 20 >>>>>> mins now. >>>>>> >>>>>> Cinder logs: >>>>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>>>> cinder-volume RPC version 3.17 as minimum service version. >>>>>> 2023-03-22 20:34:59.166 108 INFO >>>>>> cinder.volume.flows.manager.create_volume >>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>>>> specification: {'status': 'creating', 'volume_name': >>>>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> [{'url': >>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>> 
'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>> } >>>>>> >>>>>> With regards, >>>>>> Swogat Pradhan >>>>>> >>>>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Adam, >>>>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>>>> was getting pulled from the central site which was caused due to an >>>>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>>>> directory, which seems to have been resolved after the changes i made to >>>>>>>> fix it. >>>>>>>> >>>>>>>> Right now the glance api podman is running in unhealthy state and >>>>>>>> the podman logs don't show any error whatsoever and when issued the command >>>>>>>> netstat -nultp i do not see any entry for glance port i.e. 9292 in the dcn >>>>>>>> site, which is why cinder is throwing an error stating: >>>>>>>> >>>>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>>>> finding address for >>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>>> Unable to establish connection to >>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>>>> NewConnectionError('>>>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>>>> ECONNREFUSED',)) >>>>>>>> >>>>>>>> Now i need to find out why the port is not listed as the glance >>>>>>>> service is running, which i am not sure how to find out. >>>>>>>> >>>>>>> >>>>>>> One other thing to investigate is whether your deployment includes >>>>>>> this patch [1]. If it does, then bear in mind >>>>>>> the glance-api service running at the edge site will be an >>>>>>> "internal" (non public facing) instance that uses port 9293 >>>>>>> instead of 9292. You should familiarize yourself with the release >>>>>>> note [2]. 
>>>>>>> >>>>>>> [1] >>>>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>>>> [2] >>>>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>>>> >>>>>>> Alan >>>>>>> >>>>>>> >>>>>>>> With regards, >>>>>>>> Swogat Pradhan >>>>>>>> >>>>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Update: >>>>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>>>> >>>>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> [{'url': >>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>>>> } >>>>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 
4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>>>> >>>>>>>>> >>>>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>>>> >>>>>>>>> John Fulton previously stated your cinder-volume service at the >>>>>>>>> edge site is not using the local ceph image store. Assuming you are >>>>>>>>> deploying GlanceApiEdge service [1], then the cinder-volume service should >>>>>>>>> be configured to use the local glance service [2]. You should check >>>>>>>>> cinder's glance_api_servers to confirm it's the edge site's glance service. >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>>>> [2] >>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>>>> >>>>>>>>> Alan >>>>>>>>> >>>>>>>>> >>>>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>>> category=FutureWarning) >>>>>>>>>> >>>>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>>> category=FutureWarning) >>>>>>>>>> >>>>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>>>> MB/s >>>>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>>>> >>>>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>>>> >>>>>>>>>> With regards, >>>>>>>>>> Swogat Pradhan >>>>>>>>>> >>>>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Jhon, >>>>>>>>>>> This seems to be an issue. >>>>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the >>>>>>>>>>> --cluster parameter was specified to the respective cluster names but the >>>>>>>>>>> config files were created in the name of ceph.conf and keyring was >>>>>>>>>>> ceph.client.openstack.keyring. 
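Related to Alan's point above about glance_api_servers and the internal (port 9293) edge glance, a rough way to check what cinder-volume is pointed at and whether the edge glance is listening — the IP below is the one from the ECONNREFUSED error earlier in the thread and is only an example:

$ sudo podman exec cinder_volume grep -r glance_api_servers /etc/cinder/
$ sudo ss -lntp | grep -E ':(9292|9293)'    # run on the edge node that runs glance-api
$ curl -s -o /dev/null -w '%{http_code}\n' http://172.25.228.253:9292/
$ curl -s -o /dev/null -w '%{http_code}\n' http://172.25.228.253:9293/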
>>>>>>>>>>> >>>>>>>>>>> Which created issues in glance as well as the naming convention >>>>>>>>>>> of the files didn't match the cluster names, so i had to manually rename >>>>>>>>>>> the central ceph conf file as such: >>>>>>>>>>> >>>>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>>>> total 16 >>>>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>>>> ceph.client.openstack.keyring >>>>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>>>> >>>>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are >>>>>>>>>>> the files used to access dcn02 ceph cluster and ceph_central* files are >>>>>>>>>>> used in for accessing central ceph cluster. >>>>>>>>>>> >>>>>>>>>>> glance multistore config: >>>>>>>>>>> [dcn02] >>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>> rbd_store_user=openstack >>>>>>>>>>> rbd_store_pool=images >>>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>>>> >>>>>>>>>>> [ceph_central] >>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>>>> rbd_store_user=openstack >>>>>>>>>>> rbd_store_pool=images >>>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>>> store_description=Default glance store backend. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> With regards, >>>>>>>>>>> Swogat Pradhan >>>>>>>>>>> >>>>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>>>> wrote: >>>>>>>>>>>> > >>>>>>>>>>>> > Hi, >>>>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>>>> >>>>>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>>>>> >>>>>>>>>>>> I hope this is not a production system since the mailing list >>>>>>>>>>>> now has >>>>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>>>> >>>>>>>>>>>> The section that looks like this: >>>>>>>>>>>> >>>>>>>>>>>> [tripleo_ceph] >>>>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>>> rbd_user=openstack >>>>>>>>>>>> rbd_pool=volumes >>>>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>>>> rbd_secret_uuid= >>>>>>>>>>>> report_discard_supported=True >>>>>>>>>>>> >>>>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and >>>>>>>>>>>> not the >>>>>>>>>>>> central one. Use the ceph conf file for that cluster and ensure >>>>>>>>>>>> the >>>>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>>>> >>>>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID >>>>>>>>>>>> of the >>>>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so >>>>>>>>>>>> that >>>>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>>>> This >>>>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>>>> secret-get-value $FSID`. 
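Putting that together, an edge backend pointed at the local dcn02 cluster would end up looking roughly like this (a sketch assembled from the values shared elsewhere in this thread; substitute the FSID of your own dcn02 cluster):

[tripleo_ceph]
volume_backend_name=tripleo_ceph
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_ceph_conf=/etc/ceph/dcn02.conf
rbd_cluster_name=dcn02
rbd_user=openstack
rbd_pool=volumes
rbd_flatten_volume_from_snapshot=False
report_discard_supported=True
# must match the fsid in /etc/ceph/dcn02.conf
rbd_secret_uuid=<dcn02 fsid>

$ grep fsid /etc/ceph/dcn02.conf
$ sudo podman exec nova_virtsecretd virsh secret-get-value <dcn02 fsid>   # libvirt should return the cephx key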
>>>>>>>>>>>> >>>>>>>>>>>> The documentation describes how to configure the central and >>>>>>>>>>>> DCN sites >>>>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>>>> following >>>>>>>>>>>> it. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>>>> >>>>>>>>>>>> John >>>>>>>>>>>> >>>>>>>>>>>> > >>>>>>>>>>>> > Ceph Output: >>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>>>> 2 excl >>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>>>> 2 yes >>>>>>>>>>>> > >>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>>>> 2 >>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>>>> > >>>>>>>>>>>> > Attached the cinder config. >>>>>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>>>>> > >>>>>>>>>>>> > With regards, >>>>>>>>>>>> > Swogat Pradhan >>>>>>>>>>>> > >>>>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>> >> >>>>>>>>>>>> >> in my last message under the line "On a DCN site if you run >>>>>>>>>>>> a command like this:" I suggested some steps you could try to confirm the >>>>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>>>> config. >>>>>>>>>>>> >> >>>>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>> >>>>>>>>>>>> >>> Update: >>>>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it >>>>>>>>>>>> takes around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>>>> >>> >>>>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images >>>>>>>>>>>> created after importing from the central site. >>>>>>>>>>>> >>>> But launching an instance normally fails as it takes a >>>>>>>>>>>> long time for the volume to get created. >>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>>>> getting created properly without any errors. 
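Note that in the `rbd -p volumes ls -l` output above the PARENT column is empty, which suggests those volumes were filled by streaming/conversion rather than cloned from the local image. The same can be checked per volume from the same cephadm shell used for the listings above (the admin keyring path here is an assumption and may differ in your setup):

$ sudo cephadm shell --config /etc/ceph/dcn02.conf --keyring /etc/ceph/dcn02.client.admin.keyring
$ rbd -p volumes info volume-c644086f-d3cf-406d-b0f1-7691bde5981d   # a COW clone shows a "parent: images/<id>@snap" line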
>>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>>> but getting checksum failed error. >>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> With regards, >>>>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>>>> >>>> >>>>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>>>> >>>>> wrote: >>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>>>> launch the VM from volume. >>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > Right now the instance creation is failing as the block >>>>>>>>>>>> device creation is stuck in creating state, it is taking more than 10 mins >>>>>>>>>>>> for the volume to be created, whereas the image has already been imported >>>>>>>>>>>> to the edge glance. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>>>> observations in your >>>>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> >>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf >>>>>>>>>>>> --keyring >>>>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>>>> excl >>>>>>>>>>>> >>>>> $ >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> Then, you should see the parent of the volume is the >>>>>>>>>>>> image which is on >>>>>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>>>> encountering >>>>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> Ideally all images should reside in the central Glance >>>>>>>>>>>> and be copied >>>>>>>>>>>> >>>>> to DCN sites before instances of those images are booted >>>>>>>>>>>> on DCN sites. >>>>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>>>> booted, then the >>>>>>>>>>>> >>>>> image will be streamed to the DCN site and then the image >>>>>>>>>>>> will boot as >>>>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>>>> has access to >>>>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>>>> booting of >>>>>>>>>>>> >>>>> the image will take time because it has not been copied >>>>>>>>>>>> in advance, >>>>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN >>>>>>>>>>>> site and >>>>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> John >>>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>>>> then update. 
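For the last check John mentions (exec into the cinder container and confirm it uses the local cluster), something along these lines works on the edge node running cinder-volume (container name and paths per the tripleo defaults used in this thread):

$ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_cluster_name|rbd_secret_uuid' /etc/cinder/cinder.conf
$ sudo podman exec cinder_volume grep fsid /etc/ceph/dcn02.conf   # should match rbd_secret_uuid above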
>>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > With regards, >>>>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>>>> >>>>> > >>>>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>>>> >> >>>>>>>>>>>> >>>>> >> Update: >>>>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is >>>>>>>>>>>> showing down. >>>>>>>>>>>> >>>>> >> >>>>>>>>>>>> >>>>> >> >>>>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>>>> (lacp=active). >>>>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon starting >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with capabilities >>>>>>>>>>>> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>>> running command. >>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO >>>>>>>>>>>> nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 >>>>>>>>>>>> b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>>>> template mentioned here ?: >>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> The volume is already created and i do not understand >>>>>>>>>>>> why the instance is stuck in spawning state. 
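While an instance is stuck in spawning, these are worth watching side by side on the edge compute node (log path assumes the usual tripleo layout; the instance UUID is a placeholder):

$ openstack compute service list --service nova-compute
$ openstack server show <instance uuid> -c status -c "OS-EXT-STS:task_state"
$ sudo tail -f /var/log/containers/nova/nova-compute.log

If the compute service flips to "down" while the server stays in spawning, that lines up with the RabbitMQ connectivity angle discussed elsewhere in this thread.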
>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Does your environment use different network >>>>>>>>>>>> interfaces for each of the networks? Or does it have a bond with everything >>>>>>>>>>>> on it? >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>>>> while spawning the instance. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block < >>>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some >>>>>>>>>>>> time ago in this thread: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for >>>>>>>>>>>> that user, not sure if that could apply here. But is it possible that your >>>>>>>>>>>> nova and neutron versions are different between central and edge site? Have >>>>>>>>>>>> you restarted nova and neutron services on the compute nodes after >>>>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>>>> >>>>> >>>> If there isn't any additional information in the >>>>>>>>>>>> debug logs I probably would start "tearing down" rabbitmq. I didn't have to >>>>>>>>>>>> do that in a production system yet so be careful. I can think of two routes: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit >>>>>>>>>>>> is running, this will most likely impact client IO depending on your load. >>>>>>>>>>>> Check out the rabbitmqctl commands. 
>>>>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. >>>>>>>>>>>> rebuild. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while >>>>>>>>>>>> being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>>>> a better advice. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan >>>>>>>>>>> >: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>>>> but not due to packet >>>>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>>>> checked when >>>>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance gets >>>>>>>>>>>> stuck at spawning >>>>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>>>> sure if packet loss >>>>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>>>> identical between >>>>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss >>>>>>>>>>>> through the tunnel? >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan >>>>>>>>>>> >: >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' >>>>>>>>>>>> or 'cc' as i am not >>>>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>>>> >>>>> >>>> / >>>>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... 
>>>>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to >>>>>>>>>>>> definition priority >>>>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only >>>>>>>>>>>> goes down when i am >>>>>>>>>>>> >>>>> >>>> trying >>>>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>>>> spawning state and >>>>>>>>>>>> >>>>> >>>> then >>>>>>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>>>> edge sites. >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to me >>>>>>>>>>>> directly, i am >>>>>>>>>>>> >>>>> >>>> checking >>>>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find your >>>>>>>>>>>> reply. >>>>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>>>> occurred. >>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other >>>>>>>>>>>> activities in the >>>>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>>>> site.* >>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are >>>>>>>>>>>> the details: >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest >>>>>>>>>>>> ]: >>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple >>>>>>>>>>>> times but the issue is >>>>>>>>>>>> >>>>> >>>> still >>>>>>>>>>>> >>>>> >>>> >>> present. 
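If it does come to the rabbitmqctl route Eugen describes, the reply queues named in the nova-conductor errors can be listed from any controller before removing anything (the container name filter is approximate):

$ sudo podman exec $(sudo podman ps -q -f name=rabbitmq) \
      rabbitmqctl list_queues name messages consumers | grep '^reply'

This shows whether the reply_* queues referenced in the conductor errors quoted further down actually exist and have consumers.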
>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>>>> cluster_status >>>>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>>>>>>>>>> RabbitMQ >>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>>>>>>>>>> RabbitMQ >>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>>>>>>>>>> RabbitMQ >>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: >>>>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: 
http, purpose: HTTP >>>>>>>>>>>> API >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>>> API >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP >>>>>>>>>>>> API >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>>>> clustering, purpose: >>>>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: >>>>>>>>>>>> amqp, purpose: AMQP >>>>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>>>> purpose: HTTP API >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34?PM Swogat Pradhan < >>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>> >>>>> >>>> >>> wrote: >>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>> >>>>> >>>> >>>> Hi, 
>>>>>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova >>>>>>>>>>>> api log. >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>> nova-conuctor: >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR >>>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - >>>>>>>>>>>> - -] The reply >>>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send >>>>>>>>>>>> after 60 seconds >>>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4). >>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR >>>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] The reply >>>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send >>>>>>>>>>>> after 60 seconds >>>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). 
>>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR >>>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] The reply >>>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send >>>>>>>>>>>> after 60 seconds >>>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING >>>>>>>>>>>> nova.cache_utils >>>>>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Cache enabled >>>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null. >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING >>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] >>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't >>>>>>>>>>>> exist, drop reply to >>>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR >>>>>>>>>>>> oslo_messaging._drivers.amqpdriver >>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - >>>>>>>>>>>> - -] The reply >>>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send >>>>>>>>>>>> after 60 seconds >>>>>>>>>>>> >>>>> >>>> due to a >>>>>>>>>>>> >>>>> >>>> >>>> missing queue >>>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066). >>>>>>>>>>>> >>>>> >>>> Abandoning...: >>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>> With regards, >>>>>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan < >>>>>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>> >>>>> >>>> >>>> >>>>>>>>>>>> >>>>> >>>> >>>>> Hi, >>>>>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 >>>>>>>>>>>> where i am trying to >>>>>>>>>>>> >>>>> >>>> >>>>> launch vm's. >>>>>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes >>>>>>>>>>>> down (openstack >>>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>>> >>>>> >>>> >>>>> service list), the node comes backup when i >>>>>>>>>>>> restart the nova >>>>>>>>>>>> >>>>> >>>> compute >>>>>>>>>>>> >>>>> >>>> >>>>> service but then the launch of the vm fails. 
>>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>>> >>>>> >>>> >>>>> nova-compute.log >>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO >>>>>>>>>>>> nova.compute.manager >>>>>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - >>>>>>>>>>>> - - -] Running >>>>>>>>>>>> >>>>> >>>> >>>>> instance usage >>>>>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from >>>>>>>>>>>> 2023-02-26 07:00:00 >>>>>>>>>>>> >>>>> >>>> to >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances. >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO >>>>>>>>>>>> nova.compute.claims >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: >>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim >>>>>>>>>>>> successful on node >>>>>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO >>>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: >>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring >>>>>>>>>>>> supplied device >>>>>>>>>>>> >>>>> >>>> name: >>>>>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied >>>>>>>>>>>> dev names >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO >>>>>>>>>>>> nova.virt.block_device >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: >>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting >>>>>>>>>>>> with volume >>>>>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at >>>>>>>>>>>> /dev/vda >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING >>>>>>>>>>>> nova.cache_utils >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Cache enabled >>>>>>>>>>>> >>>>> >>>> with >>>>>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null. 
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO >>>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Running >>>>>>>>>>>> >>>>> >>>> >>>>> privsep helper: >>>>>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', >>>>>>>>>>>> '/etc/nova/rootwrap.conf', >>>>>>>>>>>> >>>>> >>>> 'privsep-helper', >>>>>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf', >>>>>>>>>>>> '--config-file', >>>>>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', >>>>>>>>>>>> '--privsep_context', >>>>>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', >>>>>>>>>>>> '--privsep_sock_path', >>>>>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock'] >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO >>>>>>>>>>>> oslo.privsep.daemon >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Spawned new >>>>>>>>>>>> >>>>> >>>> privsep >>>>>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>>> >>>>> >>>> >>>>> daemon starting >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0 >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>>> >>>>> >>>> >>>>> process running with capabilities >>>>>>>>>>>> (eff/prm/inh): >>>>>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO >>>>>>>>>>>> oslo.privsep.daemon [-] privsep >>>>>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647 >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING >>>>>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] Process >>>>>>>>>>>> >>>>> >>>> >>>>> execution error >>>>>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while >>>>>>>>>>>> running command. >>>>>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value >>>>>>>>>>>> >>>>> >>>> >>>>> Exit code: 2 >>>>>>>>>>>> >>>>> >>>> >>>>> Stdout: '' >>>>>>>>>>>> >>>>> >>>> >>>>> Stderr: '': >>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: >>>>>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command. >>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO >>>>>>>>>>>> nova.virt.libvirt.driver >>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 >>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>> default] [instance: >>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating >>>>>>>>>>>> image >>>>>>>>>>>> >>>>> >>>> >>>>> >>>>>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue? 
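Two quick checks that tie into the MTU/packet-loss questions in this thread, run on the edge compute node while reproducing (the VIP address is a placeholder for the central internal_api endpoint; log path assumes the usual tripleo layout):

$ sudo tail -f /var/log/containers/nova/nova-compute.log | grep -iE 'amqp|heartbeat|errno'
$ ping -M do -s 1472 <central internal_api VIP>   # 1472 + 28 bytes of headers = 1500, so this fails if the tunnel MTU is lower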
> With regards,
>
> Swogat Pradhan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From adetoyeanointing15 at gmail.com  Fri Mar 24 00:06:29 2023
From: adetoyeanointing15 at gmail.com (Anointing Adetoye)
Date: Fri, 24 Mar 2023 01:06:29 +0100
Subject: OUTREACHY INITIAL APPLICANT
Message-ID: 

Good day,

I have been unable to make relevant contributions because I don't know how to start contributing or how to converse on the channel. Also, I am a Golang developer who is just picking up C, and I would really like to use this opportunity to build my knowledge in C.

I would appreciate being pointed to how to make a contribution, so that I can use the few weeks remaining to make relevant contributions to the project.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lsofia.enriquez at gmail.com  Fri Mar 24 10:42:53 2023
From: lsofia.enriquez at gmail.com (Sofia Enriquez)
Date: Fri, 24 Mar 2023 10:42:53 +0000
Subject: OUTREACHY INITIAL APPLICANT
In-Reply-To: 
References: 
Message-ID: 

Hi Anointing,

I hope this email finds you well. There are projects for Cinder, Manila, and Glance. Could you please let me know which project you are interested in?

I wanted to address the concern you mentioned about being unable to make relevant contributions. I understand that you are struggling with knowing how to start contributing or chatting on the channel. I recommend reviewing the "how to contribute" section on the Outreachy portal, where you will find a detailed explanation that may be helpful.

Additionally, I appreciate your interest in using this opportunity to build your knowledge in C, but I wanted to clarify that if you are referring to the Extending Automated Validation of API-ref project, it explicitly requires Python development. However, if you are interested in other projects, please feel free to contact the mentor as soon as possible so they can provide further assistance.

Thank you for your time and consideration. I look forward to hearing back from you soon.

Best regards,
Sofia

On Fri, 24 Mar 2023 at 00:06, Anointing Adetoye (adetoyeanointing15 at gmail.com) wrote:
> Good day,
> I have been unable to make relevant contributions because I don't know how
> to start contributing or how to converse on the channel. Also, I am a Golang
> developer who is just picking up C, and I would really like to use this
> opportunity to build my knowledge in C.
>
> I would appreciate being pointed to how to make a contribution, so that I can
> use the few weeks remaining to make relevant contributions to the project.

-- 
Sofia Enriquez
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From hiromu.asahina.az at hco.ntt.co.jp Fri Mar 24 15:38:21 2023 From: hiromu.asahina.az at hco.ntt.co.jp (Hiromu Asahina) Date: Sat, 25 Mar 2023 00:38:21 +0900 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Message-ID: As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic members, please kindly reply convenient timeslots. [1] https://ptg.opendev.org/ptg.html Thanks, Hiromu Asahina On 2023/03/22 20:01, Hiromu Asahina wrote: > Thanks! > > I look forward to your reply. > > On 2023/03/22 1:29, Julia Kreger wrote: >> No worries! >> >> I think that time works for me. I'm not sure it will work for >> everyone, but >> I can proxy information back to the whole of the ironic project as we >> also >> have the question of this functionality listed for our Operator Hour in >> order to help ironic gauge interest. >> >> -Julia >> >> On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp> wrote: >> >>> I apologize that I couldn't reply before the Ironic meeting on Monday. >>> >>> I need one slot to discuss this topic. >>> >>> I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, >>> 27)[1,2] works for them. Does this work for Ironic? I understand not all >>> Ironic members will join this discussion, so I hope we can arrange a >>> convenient date for you two at least and, hopefully, for those >>> interested in this topic. >>> >>> [1] >>> >>> https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z >>> [2] https://ptg.opendev.org/ptg.html >>> >>> Thanks, >>> Hiromu Asahina >>> >>> On 2023/03/17 23:29, Julia Kreger wrote: >>>> I'm not sure how many Ironic contributors would be the ones to attend a >>>> discussion, in part because this is disjointed from the items they need >>> to >>>> focus on. It is much more of a "big picture" item for those of us >>>> who are >>>> leaders in the project. >>>> >>>> I think it would help to understand how much time you expect the >>> discussion >>>> to take to determine a path forward and how we can collaborate. Ironic >>> has >>>> a huge number of topics we want to discuss during the PTG, and I >>>> suspect >>>> our team meeting on Monday next week should yield more >>>> interest/awareness >>>> as well as an amount of time for each topic which will aid us in >>> scheduling. >>>> >>>> If you can let us know how long, then I think we can figure out when >>>> the >>>> best day/time will be. >>>> >>>> Thanks! >>>> >>>> -Julia >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < >>>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>>> >>>>> Thank you for your reply. >>>>> >>>>> I'd like to decide the time slot for this topic. >>>>> I just checked PTG schedule [1]. >>>>> >>>>> We have the following time slots. Which one is convenient to gether? 
>>>>> (I didn't get reply but I listed Barbican, as its cores are almost the >>>>> same as Keystone) >>>>> >>>>> Mon, 27: >>>>> >>>>> - 14 (keystone) >>>>> - 15 (keystone) >>>>> >>>>> Tue, 28 >>>>> >>>>> - 13 (barbican) >>>>> - 14 (keystone, ironic) >>>>> - 15 (keysonte, ironic) >>>>> - 16 (ironic) >>>>> >>>>> Wed, 29 >>>>> >>>>> - 13 (ironic) >>>>> - 14 (keystone, ironic) >>>>> - 15 (keystone, ironic) >>>>> - 21 (ironic) >>>>> >>>>> Thanks, >>>>> >>>>> [1] https://ptg.opendev.org/ptg.html >>>>> >>>>> Hiromu Asahina >>>>> >>>>> >>>>> On 2023/02/11 1:41, Jay Faulkner wrote: >>>>>> I think it's safe to say the Ironic community would be very >>>>>> invested in >>>>>> such an effort. Let's make sure the time chosen for vPTG with this is >>>>> such >>>>>> that Ironic contributors can attend as well. >>>>>> >>>>>> Thanks, >>>>>> Jay Faulkner >>>>>> Ironic PTL >>>>>> >>>>>> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < >>>>>> hiromu.asahina.az at hco.ntt.co.jp> wrote: >>>>>> >>>>>>> Hello Everyone, >>>>>>> >>>>>>> Recently, Tacker and Keystone have been working together on a new >>>>> Keystone >>>>>>> Middleware that can work with external authentication >>>>>>> services, such as Keycloak. The code has already been submitted [1], >>> but >>>>>>> we want to make this middleware a generic plugin that works >>>>>>> with as many OpenStack services as possible. To that end, we would >>> like >>>>> to >>>>>>> hear from other projects with similar use cases >>>>>>> (especially Ironic and Barbican, which run as standalone >>>>>>> services). We >>>>>>> will make a time slot to discuss this topic at the next vPTG. >>>>>>> Please contact me if you are interested and available to >>>>>>> participate. >>>>>>> >>>>>>> [1] >>> https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >>>>>>> >>>>>>> -- >>>>>>> Hiromu Asahina >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> ?-------------------------------------? >>>>> ????? NTT Network Innovation Center >>>>> ??????? Hiromu Asahina >>>>> ???? ------------------------------------- >>>>> ????? 3-9-11, Midori-cho, Musashino-shi >>>>> ??????? Tokyo 180-8585, Japan >>>>> Phone: +81-422-59-7008 >>>>> Email: hiromu.asahina.az at hco.ntt.co.jp >>>>> ?-------------------------------------? >>>>> >>>>> >>>> >>> >>> -- >>> ?-------------------------------------? >>> ???? NTT Network Innovation Center >>> ?????? Hiromu Asahina >>> ??? ------------------------------------- >>> ???? 3-9-11, Midori-cho, Musashino-shi >>> ?????? Tokyo 180-8585, Japan >>> Phone: +81-422-59-7008 >>> Email: hiromu.asahina.az at hco.ntt.co.jp >>> ?-------------------------------------? >>> >>> >> > -- ?-------------------------------------? NTT Network Innovation Center Hiromu Asahina ------------------------------------- 3-9-11, Midori-cho, Musashino-shi Tokyo 180-8585, Japan ? Phone: +81-422-59-7008 ? Email: hiromu.asahina.az at hco.ntt.co.jp ?-------------------------------------? 
From james.slagle at gmail.com Fri Mar 24 16:48:14 2023 From: james.slagle at gmail.com (James Slagle) Date: Fri, 24 Mar 2023 12:48:14 -0400 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> Message-ID: On Wed, Mar 22, 2023 at 1:09?PM Ghanshyam Mann wrote: > > Hi James, TripleO team, > > Is there anyone volunteering to be PTL for train and wallaby maintenance? Please note we need PTL > as it is deprecated (wallaby is maintained), and we have tripleo in leaderless projects > - https://etherpad.opendev.org/p/2023.2-leaderless It doesn't look like we have any other volunteers, so I'm willing to do it. At the last PTG, we discussed and it was agreed that we would switch TripleO to the distributed project leadership model. However, given the drastic change in our focus, I personally think it makes more sense to continue with the PTL model for train/wallaby stable maintenance. I would ask any project members to reply here with +1/-1 to indicate agreement. [1] https://governance.openstack.org/tc/resolutions/20200803-distributed-project-leadership.html -- -- James Slagle -- From dwilde at redhat.com Fri Mar 24 16:54:50 2023 From: dwilde at redhat.com (Dave Wilde) Date: Fri, 24 Mar 2023 11:54:50 -0500 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Message-ID: I?m happy to book an additional time slot(s) specifically for this discussion if something other than what we currently have works better for everyone. Please let me know. /Dave On Mar 24, 2023 at 10:49 AM -0500, Hiromu Asahina , wrote: > As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this > discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic > members, please kindly reply convenient timeslots. > > [1] https://ptg.opendev.org/ptg.html > > Thanks, > > Hiromu Asahina > > On 2023/03/22 20:01, Hiromu Asahina wrote: > > Thanks! > > > > I look forward to your reply. > > > > On 2023/03/22 1:29, Julia Kreger wrote: > > > No worries! > > > > > > I think that time works for me. I'm not sure it will work for > > > everyone, but > > > I can proxy information back to the whole of the ironic project as we > > > also > > > have the question of this functionality listed for our Operator Hour in > > > order to help ironic gauge interest. > > > > > > -Julia > > > > > > On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > I apologize that I couldn't reply before the Ironic meeting on Monday. > > > > > > > > I need one slot to discuss this topic. > > > > > > > > I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, > > > > 27)[1,2] works for them. Does this work for Ironic? 
I understand not all > > > > Ironic members will join this discussion, so I hope we can arrange a > > > > convenient date for you two at least and, hopefully, for those > > > > interested in this topic. > > > > > > > > [1] > > > > > > > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z > > > > [2] https://ptg.opendev.org/ptg.html > > > > > > > > Thanks, > > > > Hiromu Asahina > > > > > > > > On 2023/03/17 23:29, Julia Kreger wrote: > > > > > I'm not sure how many Ironic contributors would be the ones to attend a > > > > > discussion, in part because this is disjointed from the items they need > > > > to > > > > > focus on. It is much more of a "big picture" item for those of us > > > > > who are > > > > > leaders in the project. > > > > > > > > > > I think it would help to understand how much time you expect the > > > > discussion > > > > > to take to determine a path forward and how we can collaborate. Ironic > > > > has > > > > > a huge number of topics we want to discuss during the PTG, and I > > > > > suspect > > > > > our team meeting on Monday next week should yield more > > > > > interest/awareness > > > > > as well as an amount of time for each topic which will aid us in > > > > scheduling. > > > > > > > > > > If you can let us know how long, then I think we can figure out when > > > > > the > > > > > best day/time will be. > > > > > > > > > > Thanks! > > > > > > > > > > -Julia > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > Thank you for your reply. > > > > > > > > > > > > I'd like to decide the time slot for this topic. > > > > > > I just checked PTG schedule [1]. > > > > > > > > > > > > We have the following time slots. Which one is convenient to gether? > > > > > > (I didn't get reply but I listed Barbican, as its cores are almost the > > > > > > same as Keystone) > > > > > > > > > > > > Mon, 27: > > > > > > > > > > > > - 14 (keystone) > > > > > > - 15 (keystone) > > > > > > > > > > > > Tue, 28 > > > > > > > > > > > > - 13 (barbican) > > > > > > - 14 (keystone, ironic) > > > > > > - 15 (keysonte, ironic) > > > > > > - 16 (ironic) > > > > > > > > > > > > Wed, 29 > > > > > > > > > > > > - 13 (ironic) > > > > > > - 14 (keystone, ironic) > > > > > > - 15 (keystone, ironic) > > > > > > - 21 (ironic) > > > > > > > > > > > > Thanks, > > > > > > > > > > > > [1] https://ptg.opendev.org/ptg.html > > > > > > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > > > > > On 2023/02/11 1:41, Jay Faulkner wrote: > > > > > > > I think it's safe to say the Ironic community would be very > > > > > > > invested in > > > > > > > such an effort. Let's make sure the time chosen for vPTG with this is > > > > > > such > > > > > > > that Ironic contributors can attend as well. > > > > > > > > > > > > > > Thanks, > > > > > > > Jay Faulkner > > > > > > > Ironic PTL > > > > > > > > > > > > > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > > > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > > > > > Hello Everyone, > > > > > > > > > > > > > > > > Recently, Tacker and Keystone have been working together on a new > > > > > > Keystone > > > > > > > > Middleware that can work with external authentication > > > > > > > > services, such as Keycloak. 
The code has already been submitted [1], > > > > but > > > > > > > > we want to make this middleware a generic plugin that works > > > > > > > > with as many OpenStack services as possible. To that end, we would > > > > like > > > > > > to > > > > > > > > hear from other projects with similar use cases > > > > > > > > (especially Ironic and Barbican, which run as standalone > > > > > > > > services). We > > > > > > > > will make a time slot to discuss this topic at the next vPTG. > > > > > > > > Please contact me if you are interested and available to > > > > > > > > participate. > > > > > > > > > > > > > > > > [1] > > > > https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > > > > > > > > > > > > > > > > -- > > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > ?-------------------------------------? > > > > > > ????? NTT Network Innovation Center > > > > > > ??????? Hiromu Asahina > > > > > > ???? ------------------------------------- > > > > > > ????? 3-9-11, Midori-cho, Musashino-shi > > > > > > ??????? Tokyo 180-8585, Japan > > > > > > Phone: +81-422-59-7008 > > > > > > Email: hiromu.asahina.az at hco.ntt.co.jp > > > > > > ?-------------------------------------? > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > ?-------------------------------------? > > > > ???? NTT Network Innovation Center > > > > ?????? Hiromu Asahina > > > > ??? ------------------------------------- > > > > ???? 3-9-11, Midori-cho, Musashino-shi > > > > ?????? Tokyo 180-8585, Japan > > > > Phone: +81-422-59-7008 > > > > Email: hiromu.asahina.az at hco.ntt.co.jp > > > > ?-------------------------------------? > > > > > > > > > > > > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > ? Phone: +81-422-59-7008 > ? Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Mar 24 17:00:17 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 24 Mar 2023 10:00:17 -0700 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> Message-ID: <18714905bb1.d1bc0fd014378.3061357046190419249@ghanshyammann.com> ---- On Fri, 24 Mar 2023 09:48:14 -0700 James Slagle wrote --- > On Wed, Mar 22, 2023 at 1:09?PM Ghanshyam Mann gmann at ghanshyammann.com> wrote: > > > > Hi James, TripleO team, > > > > Is there anyone volunteering to be PTL for train and wallaby maintenance? Please note we need PTL > > as it is deprecated (wallaby is maintained), and we have tripleo in leaderless projects > > - https://etherpad.opendev.org/p/2023.2-leaderless > > It doesn't look like we have any other volunteers, so I'm willing to > do it. At the last PTG, we discussed and it was agreed that we would > switch TripleO to the distributed project leadership model. 
However, > given the drastic change in our focus, I personally think it makes > more sense to continue with the PTL model for train/wallaby stable > maintenance. I would ask any project members to reply here with +1/-1 > to indicate agreement. Thanks, James, for volunteering. I think if you were thinking of the DPL model, then it will work better than PTL here. 1. You might get more people helping you with a distributed amount of work 2. we do not need to have PTL nomination/appointment work in every cycle until you want to maintain train/wallaby. If it is ok, let's move it to the DPL model, which satisfies the governance requirement. -gmann > > [1] https://governance.openstack.org/tc/resolutions/20200803-distributed-project-leadership.html > > -- > -- James Slagle > -- > > From kennelson11 at gmail.com Fri Mar 24 17:40:24 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Fri, 24 Mar 2023 12:40:24 -0500 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: References: Message-ID: Super annoying request, but can we do earlier in the week? The sessions for sdk have 100% overlap with the TC which I was planning on attending :/ And I am very very sorry if I missed sharing an opinion on when would be good to meet. -Kendall On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov wrote: > Hi all, > > A bit late, but still - I have booked a 3 hours slot during PTG on Friday > 14:00-17:00 UTC. This will follow publiccloud room discussion so I think > some people and outcomes will follow directly into our room. > > Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli > > Feel free to feel in topics you want to discuss > > Cheers, > Artem > -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.slagle at gmail.com Fri Mar 24 17:51:10 2023 From: james.slagle at gmail.com (James Slagle) Date: Fri, 24 Mar 2023 13:51:10 -0400 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: <18714905bb1.d1bc0fd014378.3061357046190419249@ghanshyammann.com> References: <1863235f907.129908e6f91780.6498006605997562838@ghanshyammann.com> <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> <18714905bb1.d1bc0fd014378.3061357046190419249@ghanshyammann.com> Message-ID: On Fri, Mar 24, 2023 at 1:00?PM Ghanshyam Mann wrote: > > ---- On Fri, 24 Mar 2023 09:48:14 -0700 James Slagle wrote --- > > On Wed, Mar 22, 2023 at 1:09?PM Ghanshyam Mann gmann at ghanshyammann.com> wrote: > > > > > > Hi James, TripleO team, > > > > > > Is there anyone volunteering to be PTL for train and wallaby maintenance? Please note we need PTL > > > as it is deprecated (wallaby is maintained), and we have tripleo in leaderless projects > > > - https://etherpad.opendev.org/p/2023.2-leaderless > > > > It doesn't look like we have any other volunteers, so I'm willing to > > do it. At the last PTG, we discussed and it was agreed that we would > > switch TripleO to the distributed project leadership model. However, > > given the drastic change in our focus, I personally think it makes > > more sense to continue with the PTL model for train/wallaby stable > > maintenance. 
I would ask any project members to reply here with +1/-1 > > to indicate agreement. > > Thanks, James, for volunteering. I think if you were thinking of the DPL model, then it will > work better than PTL here. 1. You might get more people helping you with a distributed amount > of work 2. we do not need to have PTL nomination/appointment work in every cycle until you > want to maintain train/wallaby. > > If it is ok, let's move it to the DPL model, which satisfies the governance requirement. That WFM, and I think we can move forward with DPL since that is what the team previously agreed upon. I'll work on some governance patches. -- -- James Slagle -- From artem.goncharov at gmail.com Fri Mar 24 18:12:52 2023 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Fri, 24 Mar 2023 19:12:52 +0100 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: References: Message-ID: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com> Well, there was actually no pool, since I was not even sure anybody is that interested, but glad to hear. What about Wed somewhere from 13:00 to 17:00? There is however overlap with Nova (pretty much like on any other day) Ideas? I just want to avoid overlap with public cloud, but maybe even 1h is enough. So far there are not much topics anyway. > On 24. Mar 2023, at 18:40, Kendall Nelson wrote: > > Super annoying request, but can we do earlier in the week? The sessions for sdk have 100% overlap with the TC which I was planning on attending :/ > > And I am very very sorry if I missed sharing an opinion on when would be good to meet. > > -Kendall > > On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov > wrote: >> Hi all, >> >> A bit late, but still - I have booked a 3 hours slot during PTG on Friday 14:00-17:00 UTC. This will follow publiccloud room discussion so I think some people and outcomes will follow directly into our room. >> >> Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli >> >> Feel free to feel in topics you want to discuss >> >> Cheers, >> Artem -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Fri Mar 24 18:29:28 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 24 Mar 2023 18:29:28 +0000 Subject: [TripleO] Last maintained release of TripleO is Wallaby In-Reply-To: References: <18632eaeb95.dd9a848198332.5696118532504201240@ghanshyammann.com> <186566e5712.11ccb8961578219.1604377158557956676@ghanshyammann.com> <1867a38ae8c.10fd1fc731059880.6373796653920277020@ghanshyammann.com> <186cd4ef50b.11d7db1bb135166.9097393815439653484@ghanshyammann.com> <1870a4ba83f.d9b070a6992321.8690096551273849522@ghanshyammann.com> <18714905bb1.d1bc0fd014378.3061357046190419249@ghanshyammann.com> Message-ID: <20230324182927.iwr3usifxvxhogen@yuggoth.org> On 2023-03-24 13:51:10 -0400 (-0400), James Slagle wrote: [...] > That WFM, and I think we can move forward with DPL since that is what > the team previously agreed upon. I'll work on some governance patches. If it helps at all, you can probably even list the same person for all the liaison positions, the main difference from PTL being that liaisons don't have terms that expire. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: 

From gmann at ghanshyammann.com  Fri Mar 24 18:37:09 2023
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Fri, 24 Mar 2023 11:37:09 -0700
Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked
In-Reply-To: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com>
References: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com>
Message-ID: <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com>

Just to clarify: the TC slots on Friday are from 15-19 UTC, so the SDK 14-15 UTC slot does not overlap with the TC.

- https://etherpad.opendev.org/p/tc-2023-2-ptg#L18

-gmann

 ---- On Fri, 24 Mar 2023 11:12:52 -0700  Artem Goncharov  wrote ---
 > Well, there was actually no poll, since I was not even sure anybody was that interested, but glad to hear it.
 > What about Wednesday, somewhere from 13:00 to 17:00? There is, however, an overlap with Nova (pretty much like on any other day).
 > Ideas? I just want to avoid overlapping with public cloud, but maybe even 1h is enough. So far there are not many topics anyway.
 >
 > > On 24. Mar 2023, at 18:40, Kendall Nelson kennelson11 at gmail.com> wrote:
 > > Super annoying request, but can we do earlier in the week? The sessions for sdk have 100% overlap with the TC, which I was planning on attending :/
 > >
 > > And I am very, very sorry if I missed sharing an opinion on when would be good to meet.
 > > -Kendall
 > > On Fri, Mar 24, 2023 at 5:37 AM Artem Goncharov artem.goncharov at gmail.com> wrote:
 > > > Hi all,
 > > > A bit late, but still - I have booked a 3-hour slot during the PTG on Friday 14:00-17:00 UTC. This will follow the publiccloud room discussion, so I think some people and outcomes will flow directly into our room.
 > > > Etherpad is here: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli
 > > > Feel free to fill in topics you want to discuss
 > > > Cheers, Artem
 >

From smooney at redhat.com  Fri Mar 24 18:50:56 2023
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 24 Mar 2023 18:50:56 +0000
Subject: [nova][cinder] Providing ephemeral storage to instances - Cinder or Nova
In-Reply-To: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de>
References: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de>
Message-ID: 

I responded inline, but just a warning: this is a use case we have heard about before.
There is no simple option, I'm afraid, and there are many sharp edges
and several little-known features/limitations that your question puts you right in the middle of.

On Fri, 2023-03-24 at 16:28 +0100, Christian Rohmann wrote:
> Hello OpenStack-discuss,
>
> I am currently looking into how one can provide fast ephemeral storage
> (backed by local NVME drives) to instances.
>
> There seem to be two approaches and I would love to double-check my
> thoughts and assumptions.
>
> 1) *Via Nova* instance storage and the configurable "ephemeral" volume
> for a flavor
>
> a) We currently use Ceph RBD as image_type
> (https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type),
> so instance images are stored in Ceph, not locally on disk. I believe
> this setting will also cause ephemeral volumes (destination_local) to be
> placed on a RBD and not /var/lib/nova/instances?

It should be in Ceph, yes. We do not support having the root/swap/ephemeral
disks use different storage locations.

> Or is there a setting to set a different backend for local block devices
> providing "ephemeral" storage? So RBD for the root disk and a local LVM
> VG for ephemeral?

No, that would be a new feature, and not a trivial one, as you would have to make
sure it works for live migration and cold migration.

> b) Will an ephemeral volume also be migrated when the instance is
> shutoff as with live-migration?

It should be. It is not included in snapshots, so it is not preserved
when shelving. That means cross-cell cold migration will not preserve the disk.

For a normal cold migration it should be scp'd or rsynced along with the root disk
if you are using the raw/qcow/flat images type, if I remember correctly.
With RBD or other shared storage like NFS it really should be preserved.

One other thing to note: Ironic, and only Ironic, supports the
preserve_ephemeral option in the rebuild API.

libvirt will wipe the ephemeral disk if you rebuild or evacuate.

> Or will there be a new volume created on the target host? I am asking
> because I want to avoid syncing 500G or 1T when it's only "ephemeral"
> and the instance will not expect any data on it on the next boot.

I would personally consider it a bug if it was not transferred.
That does not mean that could not change in the future.
This is a very virt-driver-specific behaviour, by the way, and not one that is particularly well documented.
The ephemeral disk should mostly exist for the lifetime of an instance, not the lifetime of a VM.

For example, it should not get recreated via a simple reboot or live migration,
and it should not get recreated for cold migration or resize,
but it will get wiped for shelve_offload, cross-cell resize and evacuate.

> c) Is the size of the ephemeral storage for flavors a fixed size or just
> the upper bound for users? So if I limit this to 1T, will such a flavor
> always provision a block device with this size?

flavor.ephemeral_gb is an upper bound, and end users can divide that between multiple ephemeral disks
on the same instance. So if it is 100G, you can ask for two 50G ephemeral disks.

You specify the topology of the ephemeral disks using the block_device_mapping_v2 parameter on the server
create. This has been automated in recent versions of the openstack client,

so you can do

openstack server create --ephemeral size=50,format=ext4 --ephemeral size=50,format=vfat ...

https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-ephemeral

This is limited by
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.max_local_block_devices

> I suppose using LVM this will be thin provisioned anyways?

To use the LVM backend with libvirt you set
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_volume_group
to identify which LVM VG to use.

https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.sparse_logical_volumes might enable thin provisioning, or it might
work without it, but see the note:

"""
Warning

This option is deprecated for removal since 18.0.0. Its value may be silently ignored in the future.

Reason

Sparse logical volumes is a feature that is not tested hence not supported. LVM logical volumes are preallocated by default. If you want thin
provisioning, use Cinder thin-provisioned volumes.
"""

The Nova LVM support has been in maintenance mode for many years.

I am not opposed to improving it, just calling out that it has bugs and no one has really
worked on addressing them in 4 or 5 years, which is sad because it outperforms raw for local
storage performance, and if thin provisioning still works it should outperform qcow too for a similar use case.

You are well into undefined-behaviour land at this point:
we do not test it, so we assume until told otherwise that it is broken.

> 2) *Via Cinder*, running cinder-volume on each compute node to provide a
> volume type "ephemeral", using e.g. the LVM driver
>
> a) While not really "ephemeral" and bound to the instance lifecycle,
> this would allow users to provision ephemeral volumes just as they need them.
> I suppose I could use backend-specific quotas
> (https://docs.openstack.org/cinder/latest/cli/cli-cinder-quotas.html#view-block-storage-quotas)
> to limit the number or size of such volumes?
>
> b) Do I need to use the instance locality filter
> (https://docs.openstack.org/cinder/latest/contributor/api/cinder.scheduler.filters.instance_locality_filter.html)
> then?

That is an option, but not an ideal one, since it still means connecting to the volume via iSCSI or NVMe-oF even if it is effectively via localhost,
so you still have the network-layer overhead.

When I last brought up this topic in a different context, the alternative to Cinder and Nova was to add an LVM Cyborg driver
so that it could partition local NVMe devices and expose them to a guest, but I never wrote that and I don't think anyone else has.
If you had a slightly different use case, such as providing an entire NVMe or SATA device to a guest, then Cyborg would be how you would do
that. Nova PCI passthrough is not an option, as it is not multi-tenant safe: it is exclusively for stateless devices, not disks, so we do not
have a way to erase the data when done. Cyborg, with its driver model, can fulfil the multi-tenancy requirement.

We have previously rejected adding this capability into Nova, so I don't expect us to add it any time in the near to medium term.
We are trying to keep Nova device management to stateless devices only. That said, we added Intel PMEM/NVDIMM support to Nova and did handle both
optional data transfer and multi-tenancy, but that was a non-trivial amount of work.
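To make the two local-storage approaches discussed above a bit more concrete, here is a minimal, untested sketch of the knobs involved. All names (volume groups, backend names, volume types, instance UUIDs) are illustrative assumptions, and the Cinder part assumes a client new enough to pass scheduler hints:

    # Approach 1: Nova libvirt LVM image backend (nova.conf on the compute node),
    # assuming a pre-created volume group "nova-local". Note that this moves the
    # root/swap/ephemeral disks to LVM as a whole; mixing RBD root + LVM ephemeral
    # is not supported, as noted above.
    [libvirt]
    images_type = lvm
    images_volume_group = nova-local

    # Approach 2: cinder-volume on each compute node with the LVM driver
    # (cinder.conf on that node), assuming a local VG "cinder-local":
    [DEFAULT]
    enabled_backends = lvm-local
    scheduler_default_filters = AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter

    [lvm-local]
    volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
    volume_group = cinder-local
    volume_backend_name = lvm-local

    # Map a volume type to that backend and ask the scheduler to place the
    # volume on the same host as an existing instance:
    openstack volume type create lvm-local
    openstack volume type set --property volume_backend_name=lvm-local lvm-local
    openstack volume create --size 500 --type lvm-local \
        --hint local_to_instance=<instance-uuid> scratch-vol

Even with the locality hint, the data path in approach 2 still goes through iSCSI/NVMe-oF on localhost, which is exactly the overhead mentioned above.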
its an invasive change but might be more natural then teh resouce tabel approch. you coudl reuse more fo the code and inherit much fo the exiting fucntionality btu makeing sure you dont break anything in the process woudl take a lot of testing. > Thanks and with regards > > > Christian > > From rdhasman at redhat.com Fri Mar 24 18:57:41 2023 From: rdhasman at redhat.com (Rajat Dhasmana) Date: Sat, 25 Mar 2023 00:27:41 +0530 Subject: [cinder][PTG] Cinder 2023.2 (Bobcat) Virtual PTG Message-ID: Hello Argonauts, We will be conducting cinder virtual PTG for 2023.2 (Bobcat) cycle from 28th March to 31st March, 2023. I've prepared the PTG etherpad[1] with topics day wise. I haven't kept the schedule time bound since some discussions take less and some take more time and in both cases reaching a conclusion is important. There are some events that need to be done on their respective time are as follows: *1) Operator Hour: *We encourage operators to join and tell us about the pain points of cinder so we can improve upon it. Date: Wednesday, 29 March, 2023 Time: 1400-1500 UTC Link: https://bluejeans.com/556681290 *2) Glance Cross Project* Date: Thursday, 30 March, 2023 Time: 1430-1500 UTC Link: https://bluejeans.com/556681290 *3) Nova Cross Project* Date: Thursday, 30 March, 2023 Time: 1600-1700 UTC Link: https://zoom.us/j/96494117185?pwd=NGhya0NpeWppMEc1OUNKdlFPbDNYdz09 (Diablo room) The general information about the PTG is as follows: Date: 28th March to 31st March, 2023 Time: 1300-1700 UTC everyday Link to PTG: https://bluejeans.com/556681290 You can also follow the schedule at https://ptg.opendev.org/ptg.html Note that we have allocated 4 hours each day but it also depends on the number and duration of topics as to how long the PTG will run. If you want to be reminded about a particular topic, please add your IRC nick in the *Courtesy ping list *mentioned after every topic. If you still have topics, please add them to the Planning etherpad[2] and I will see if we can accommodate that into our current schedule. Lastly, we will be cancelling our cinder upstream meeting (Wednesday, 29 March, 1400-1500 UTC) since it overlaps with the PTG. If you still have topics please bring it to the PTG. See you all at the PTG! [1] https://etherpad.opendev.org/p/bobcat-ptg-cinder [2] https://etherpad.opendev.org/p/bobcat-ptg-cinder-planning Thanks Rajat Dhasmana -------------- next part -------------- An HTML attachment was scrubbed... URL: From ianyrchoi at gmail.com Fri Mar 24 19:49:05 2023 From: ianyrchoi at gmail.com (Ian Y. Choi) Date: Sat, 25 Mar 2023 04:49:05 +0900 Subject: [i18n][PTG] I18n SIG 2023.2 (Bobcat) PTG Planning Message-ID: Hi, PTG is coming next week. I have blocked the following schedule - anyone can check via https://ptg.opendev.org/ptg.html and please let me know if more discussion is needed during PTG. Those are topics what I am thinking currently but it is open - feel free to suggest any topic to be discussed: - Weblate migration - Translation artifacts release management - Review of i18n statistics & ATC/AC - Discussion on translation target - Overall OpenStack/OpenInfra I18n process/progress review - Check-in on each language team Discussions will be shared via Etherpad: https://etherpad.opendev.org/p/march2023-ptg-i18n . Lastly, please don't forget to register if you have forgotten: https://openinfra-ptg.eventbrite.com :) Looking forward to a productive PTG with I18n. 
Thank you, /Ian From kozhukalov at gmail.com Fri Mar 24 20:26:34 2023 From: kozhukalov at gmail.com (Vladimir Kozhukalov) Date: Fri, 24 Mar 2023 23:26:34 +0300 Subject: [openstack-helm] PTG March 27-31 2023 Message-ID: Dear openstack-helmers, As you know PTG is going to happen next week and I booked slots at the end of the week on Thursday 03/30 and on Friday 03/31 from 14:00 UTC till 17:00 UTC [1] I believe this should be enough. If you feel other time slots are gonna work better for any reason there are still some free slots that can be booked. Please also pay some attention to the etherpad where I listed some of the points for our discussions. [2] Please feel free to add other points that you think worth it to be discussed. [1] https://ptg.opendev.org/ptg.html [2] https://etherpad.opendev.org/p/march2023-ptg-openstack-helm -- Best regards, Kozhukalov Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Fri Mar 24 21:03:02 2023 From: amy at demarco.com (Amy Marrich) Date: Fri, 24 Mar 2023 16:03:02 -0500 Subject: [Diversity] Diversity and Inclusion at the PTG Message-ID: I have blocked off three hours on Monday for the D&I WG to discuss the upcoming Diversity Survey(14:00 UTC) and then ongoing changes to the Code of Conduct(15:00- 16:00 UTC) and then the Summit if there is time. The agenda can be found here[0]. All projects are encouraged to attend these sessions as the WG is at the Foundation level. Thanks, Amy (spotz) 0 - https://etherpad.opendev.org/p/march2023-ptg-diversity From kennelson11 at gmail.com Sat Mar 25 00:39:51 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Fri, 24 Mar 2023 19:39:51 -0500 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com> References: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com> <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com> Message-ID: Heh, okay well not complete overlap, but there is still a 3 hour overlap as sdk things are currently scheduled go from 14 - 17 UTC. Either way, I would rather not try to squeeze it down on Friday, when we can just move it to Wednesday. -Kendall On Fri, Mar 24, 2023 at 1:37?PM Ghanshyam Mann wrote: > Just to clarify the TC slots on Friday, is from 15 - 19 UTC and sdk 14-15 > UTC slot does not overlap with TC. > > - https://etherpad.opendev.org/p/tc-2023-2-ptg#L18 > > -gmann > > ---- On Fri, 24 Mar 2023 11:12:52 -0700 Artem Goncharov wrote --- > > Well, there was actually no pool, since I was not even sure anybody is > that interested, but glad to hear. > > What about Wed somewhere from 13:00 to 17:00? There is however overlap > with Nova (pretty much like on any other day) > > Ideas? I just want to avoid overlap with public cloud, but maybe even > 1h is enough. So far there are not much topics anyway. > > > > > > On 24. Mar 2023, at 18:40, Kendall Nelson kennelson11 at gmail.com> wrote: > > Super annoying request, but can we do earlier in the week? The sessions > for sdk have 100% overlap with the TC which I was planning on attending :/ > > > > And I am very very sorry if I missed sharing an opinion on when would > be good to meet. > > -Kendall > > On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov > artem.goncharov at gmail.com> wrote: > > Hi all, > > A bit late, but still - I have booked a 3 hours slot during PTG on > Friday 14:00-17:00 UTC. 
This will follow publiccloud room discussion so I > think some people and outcomes will follow directly into our room. > > Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli > > Feel free to feel in topics you want to discuss > > Cheers,Artem > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 26 11:11:17 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 26 Mar 2023 18:11:17 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem Message-ID: Hello guys. I playing with Nova AZ and Masakari https://docs.openstack.org/nova/latest/admin/availability-zones.html Masakari will move server by nova scheduler. Openstack Docs describe that: If the server was not created in a specific zone then it is free to be moved to other zones, i.e. the AvailabilityZoneFilter is a no-op. I see that everyone usually creates instances with "Any Availability Zone" on Horzion and also we don't specify AZ when creating instances by cli. By this way, when we use Masakari or we miragrated instances( or evacuate) so our instance will be moved to other zones. Can we attach AZ to server create requests API based on Any Availability Zone to limit instances moved to other zones? Thank you. Regards Nguyen Huu Khoi -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Sun Mar 26 12:24:16 2023 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Sun, 26 Mar 2023 09:24:16 -0300 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: Hello Nguy?n H?u Kh?i, You might want to take a look at: https://review.opendev.org/c/openstack/nova/+/864760. We created a patch to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an AZ that has cross zone attache equals to false. On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i wrote: > Hello guys. > I playing with Nova AZ and Masakari > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > Masakari will move server by nova scheduler. > > Openstack Docs describe that: > > If the server was not created in a specific zone then it is free to be > moved to other zones, i.e. the AvailabilityZoneFilter > is > a no-op. > > I see that everyone usually creates instances with "Any Availability Zone" > on Horzion and also we don't specify AZ when creating instances by cli. > > By this way, when we use Masakari or we miragrated instances( or evacuate) > so our instance will be moved to other zones. > > Can we attach AZ to server create requests API based on Any > Availability Zone to limit instances moved to other zones? > > Thank you. Regards > > Nguyen Huu Khoi > -- Rafael Weing?rtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 26 12:52:00 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 26 Mar 2023 19:52:00 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: Hello. Many thanks for your information. It's very helpful for me. :) Nguyen Huu Khoi On Sun, Mar 26, 2023 at 7:24?PM Rafael Weing?rtner < rafaelweingartner at gmail.com> wrote: > Hello Nguy?n H?u Kh?i, > You might want to take a look at: > https://review.opendev.org/c/openstack/nova/+/864760. 
We created a patch > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an > AZ that has cross zone attache equals to false. > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i > wrote: > >> Hello guys. >> I playing with Nova AZ and Masakari >> >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> >> Masakari will move server by nova scheduler. >> >> Openstack Docs describe that: >> >> If the server was not created in a specific zone then it is free to be >> moved to other zones, i.e. the AvailabilityZoneFilter >> is >> a no-op. >> >> I see that everyone usually creates instances with "Any Availability >> Zone" on Horzion and also we don't specify AZ when creating instances by >> cli. >> >> By this way, when we use Masakari or we miragrated instances( or >> evacuate) so our instance will be moved to other zones. >> >> Can we attach AZ to server create requests API based on Any >> Availability Zone to limit instances moved to other zones? >> >> Thank you. Regards >> >> Nguyen Huu Khoi >> > > > -- > Rafael Weing?rtner > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Sun Mar 26 13:04:58 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 26 Mar 2023 20:04:58 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: I don't know why this is not merged to github. It is a problem with the system. Nguyen Huu Khoi On Sun, Mar 26, 2023 at 7:52?PM Nguy?n H?u Kh?i wrote: > Hello. > Many thanks for your information. It's very helpful for me. :) > Nguyen Huu Khoi > > > On Sun, Mar 26, 2023 at 7:24?PM Rafael Weing?rtner < > rafaelweingartner at gmail.com> wrote: > >> Hello Nguy?n H?u Kh?i, >> You might want to take a look at: >> https://review.opendev.org/c/openstack/nova/+/864760. We created a patch >> to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an >> AZ that has cross zone attache equals to false. >> >> On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >> nguyenhuukhoinw at gmail.com> wrote: >> >>> Hello guys. >>> I playing with Nova AZ and Masakari >>> >>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>> >>> Masakari will move server by nova scheduler. >>> >>> Openstack Docs describe that: >>> >>> If the server was not created in a specific zone then it is free to be >>> moved to other zones, i.e. the AvailabilityZoneFilter >>> is >>> a no-op. >>> >>> I see that everyone usually creates instances with "Any Availability >>> Zone" on Horzion and also we don't specify AZ when creating instances by >>> cli. >>> >>> By this way, when we use Masakari or we miragrated instances( or >>> evacuate) so our instance will be moved to other zones. >>> >>> Can we attach AZ to server create requests API based on Any >>> Availability Zone to limit instances moved to other zones? >>> >>> Thank you. Regards >>> >>> Nguyen Huu Khoi >>> >> >> >> -- >> Rafael Weing?rtner >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Sun Mar 26 13:50:08 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Sun, 26 Mar 2023 13:50:08 +0000 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: <20230326135007.4yttykkpeidqcijl@yuggoth.org> On 2023-03-26 20:04:58 +0700 (+0700), Nguy?n H?u Kh?i wrote: > I don't know why this is not merged to github. [...] 
The change is only a few months old, and Nova (like many teams) receives more patches than they have time to review. It's probably worth trying to get the attention of some reviewers in the #openstack-nova IRC channel if this mailing list thread hasn't already. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From nguyenhuukhoinw at gmail.com Sun Mar 26 14:08:48 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Sun, 26 Mar 2023 21:08:48 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: <20230326135007.4yttykkpeidqcijl@yuggoth.org> References: <20230326135007.4yttykkpeidqcijl@yuggoth.org> Message-ID: Ok. I got it. Thank you very much On Sun, Mar 26, 2023, 8:56 PM Jeremy Stanley wrote: > On 2023-03-26 20:04:58 +0700 (+0700), Nguy?n H?u Kh?i wrote: > > I don't know why this is not merged to github. > [...] > > The change is only a few months old, and Nova (like many teams) > receives more patches than they have time to review. It's probably > worth trying to get the attention of some reviewers in the > #openstack-nova IRC channel if this mailing list thread hasn't > already. > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hanguangyu2 at gmail.com Sun Mar 26 18:50:22 2023 From: hanguangyu2 at gmail.com (=?UTF-8?B?6Z+p5YWJ5a6H?=) Date: Mon, 27 Mar 2023 02:50:22 +0800 Subject: [nova] Can OpenStack support snapshot rollback (not creating a new instance)? Message-ID: Hello, I use Ceph as the storage backend for Nova, Glance, and Cinder. If I create a snapshot for a instance, It create a new image in glance. And I can use the image to create a new instance. This feels to me more like creating an image based on the current state of the VM rather than creating a VM snapshot. I want to ask: 1?Can I create and revert a VM snapshot like I would in virtual machine software? 2?When a VM uses multiple disks/volumes, does OpenStack support taking a snapshot of all disks/volumes of the VM as a whole? 3?Can OpenStack snapshot and save the memory state of a VM? If it is not currently supported, are there any simple customization implementation ideas that can be recommended? Thank you for any help and suggestions. Best wishes. Han From nguyenhuukhoinw at gmail.com Sun Mar 26 23:46:27 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 27 Mar 2023 06:46:27 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <20230326135007.4yttykkpeidqcijl@yuggoth.org> Message-ID: Hello. I want to update you in this case, I think we need adjust for cross zone attache = true also. Nguyen Huu Khoi On Sun, Mar 26, 2023 at 9:08?PM Nguy?n H?u Kh?i wrote: > Ok. I got it. Thank you very much > > On Sun, Mar 26, 2023, 8:56 PM Jeremy Stanley wrote: > >> On 2023-03-26 20:04:58 +0700 (+0700), Nguy?n H?u Kh?i wrote: >> > I don't know why this is not merged to github. >> [...] >> >> The change is only a few months old, and Nova (like many teams) >> receives more patches than they have time to review. It's probably >> worth trying to get the attention of some reviewers in the >> #openstack-nova IRC channel if this mailing list thread hasn't >> already. >> -- >> Jeremy Stanley >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sbauza at redhat.com Mon Mar 27 08:19:20 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 27 Mar 2023 10:19:20 +0200 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < rafaelweingartner at gmail.com> a ?crit : > Hello Nguy?n H?u Kh?i, > You might want to take a look at: > https://review.opendev.org/c/openstack/nova/+/864760. We created a patch > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an > AZ that has cross zone attache equals to false. > > Well, I'll provide some comments in the change, but I'm afraid we can't just modify the request spec like you would want. Anyway, if you want to discuss about it in the vPTG, just add it in the etherpad and add your IRC nick so we could try to find a time where we could be discussing it : https://etherpad.opendev.org/p/nova-bobcat-ptg Also, this kind of behaviour modification is more a new feature than a bugfix, so fwiw you should create a launchpad blueprint so we could better see it. -Sylvain > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i > wrote: > >> Hello guys. >> I playing with Nova AZ and Masakari >> >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> >> Masakari will move server by nova scheduler. >> >> Openstack Docs describe that: >> >> If the server was not created in a specific zone then it is free to be >> moved to other zones, i.e. the AvailabilityZoneFilter >> is >> a no-op. >> >> I see that everyone usually creates instances with "Any Availability >> Zone" on Horzion and also we don't specify AZ when creating instances by >> cli. >> >> By this way, when we use Masakari or we miragrated instances( or >> evacuate) so our instance will be moved to other zones. >> >> Can we attach AZ to server create requests API based on Any >> Availability Zone to limit instances moved to other zones? >> >> Thank you. Regards >> >> Nguyen Huu Khoi >> > > > -- > Rafael Weing?rtner > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.rohmann at inovex.de Mon Mar 27 08:47:58 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Mon, 27 Mar 2023 10:47:58 +0200 Subject: [nova][cinder] Providing ephemeral storage to instances - Cinder or Nova In-Reply-To: References: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de> Message-ID: <30eb0918-b5d2-6ae8-bf61-0b509d8c4e33@inovex.de> Thanks for your extensive reply Sean! I also replied inline and would love to continue the conversation with you and other with this use case to find the best / most suitable approach. On 24/03/2023 19:50, Sean Mooney wrote: > i responed in line but just a waring this is a usecase we ahve heard before. > there is no simple option im afraid and there are many many sharp edges > and severl littel know features/limitatiosn that your question puts you right in the > middel of. > > On Fri, 2023-03-24 at 16:28 +0100, Christian Rohmann wrote: >> 1) *Via Nova* instance storage and the configurable "ephemeral" volume >> for a flavor >> >> a) We currently use Ceph RBD als image_type >> (https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type), >> so instance images are stored in Ceph, not locally on disk. I believe >> this setting will also cause ephemeral volumes (destination_local) to be >> placed on a RBD and not /var/lib/nova/instances? 
> it should be in ceph, yes. we do not support having the root/swap/ephemeral
> disks use different storage locations
>> Or is there a setting to set a different backend for local block devices
>> providing "ephemeral" storage? So RBD for the root disk and a local LVM
>> VG for ephemeral?
> no, that would be a new feature, and not a trivial one, as you would have to
> make sure it works for live migration and cold migration.

While having the root disk on resilient storage, using local storage for
swap / ephemeral actually seems quite obvious.
Do you happen to know if there ever was a spec / push to implement this?

>> b) Will an ephemeral volume also be migrated when the instance is
>> shut off, as with live migration?
> it should be. it is not included in snapshots, so it is not preserved
> when shelving. that means cross-cell cold migration will not preserve the disk.
>
> but for a normal cold migration it should be scp'd or rsynced with the root disk
> if you are using the raw/qcow/flat images type, if i remember correctly.
> with RBD or other shared storage like NFS it really should be preserved.
>
> one other thing to note is that ironic, and only ironic, supports the
> preserve_ephemeral option in the rebuild API.
>
> libvirt will wipe the ephemeral disk if you rebuild or evacuate.

Could I somehow configure a flavor to "require" a rebuild / evacuate, or
to disable live migration for it?

>> Or will there be a new volume created on the target host? I am asking
>> because I want to avoid syncing 500G or 1T when it's only "ephemeral"
>> and the instance will not expect any data on it on the next boot.
> i would personally consider it a bug if it was not transferred.
> that does not mean that could not change in the future.
> this is a very virt-driver-specific behaviour, by the way, and not one that is
> particularly well documented.
> the ephemeral disk should mostly exist for the lifetime of an instance, not the
> lifetime of a vm.
>
> for example, it should not get recreated via a simple reboot or live migration,
> and it should not get recreated for cold migration or resize,
> but it will get wiped for shelve_offload, cross-cell resize and evacuate.

So even for cold migration it would be preserved then? So my only option
would be to shelve such instances when trying to "move" instances off a
certain hypervisor while NOT syncing ephemeral storage?

>> c) Is the size of the ephemeral storage for flavors a fixed size or just
>> the upper bound for users? So if I limit this to 1T, will such a flavor
>> always provision a block device with this size?
> flavor.ephemeral_gb is an upper bound and end users can divide that between
> multiple ephemeral disks on the same instance. so if it's 100G you can ask
> for 2 50G ephemeral disks.
>
> you specify the topology of the ephemeral disks using the
> block_device_mapping_v2 parameter on the server create.
> this has been automated in recent versions of the openstack client
>
> so you can do
>
> openstack server create --ephemeral size=50,format=ext4 --ephemeral size=50,format=vfat ...
>
> https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-ephemeral
> this is limited by
> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.max_local_block_devices
>
>> I suppose using LVM this will be thin provisioned anyways?
> to use the lvm backend with libvirt you set
> https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_volume_group
> to identify which lvm VG to use.
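As a concrete illustration of the backend just described, a nova-compute
configured for the LVM image backend would carry roughly the following
options (the volume group name here is only a placeholder, not a value
taken from this thread):

    [libvirt]
    # store root/swap/ephemeral disks as LVM logical volumes
    # instead of files under /var/lib/nova/instances
    images_type = lvm
    # pre-existing volume group the logical volumes are allocated from
    images_volume_group = nova-local-vg

Both options are the ones referenced above; as noted further down in the
thread, this backend sees little testing, so treat the snippet as a sketch
rather than a recommendation.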
> > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.sparse_logical_volumes might enable thin provsion or it might > work without it but see the note > > """ > Warning > > This option is deprecated for removal since 18.0.0. Its value may be silently ignored in the future. > > Reason > > Sparse logical volumes is a feature that is not tested hence not supported. LVM logical volumes are preallocated by default. If you want thin > provisioning, use Cinder thin-provisioned volumes. > """ > > the nova lvm supprot has been in maintance mode for many years. > > im not opposed to improving it just calling out that it has bugs and noone has really > worked on adressing them in 4 or 5 years which is sad becasue it out performnce raw for local > storage perfroamce and if thin provisioning still work it shoudl outperform qcow too for a simialr usecase. > > you are well into undefined behavior land however at this point > > we do not test it so we assume untile told otherwise that its broken. Thanks for the heads up. I looked at LVM for cinder and there LVM volumes are thin provisioned, so I figured this might be the case for Nova as well. >> 2) *Via Cinder*, running cinder-volume on each compute node to provide a >> volume type "ephemeral", using e.g. the LVM driver >> >> a) While not really "ephemeral" and bound to the instance lifecycle, >> this would allow users to provision ephemeral volume just as they need them. >> I suppose I could use backend specific quotas >> (https://docs.openstack.org/cinder/latest/cli/cli-cinder-quotas.html#view-block-storage-quotas) >> to >> limit the number of size of such volumes? >> >> b) Do I need to use the instance locality filter >> (https://docs.openstack.org/cinder/latest/contributor/api/cinder.scheduler.filters.instance_locality_filter.html) >> then? > That is an option but not ideally since it stilll means conencting to the volume via iscsi or nvmeof even if its effectlvy via localhost > so you still have the the network layer overhead. Thanks for the hint - one can easily be confused when reading "LVM" ... I actually thought there was a way to have "host-only" style volumes which are simply local block devices with no iscsi / NVME in between which are used by Nova then. These kind of volumes could maybe be built into cinder as a "taget_protocol: local" together with the instance_locality_filter - but apparently now the only way is through iSCSI or NVME. > when i alas brought up this topic in a diffent context the alternitive to cinder and nova was to add a lvm cyborg driver > so that it could parttion local nvme devices and expose that to a guest. but i never wrote that and i dotn think anyone else has. > if you had a slightly diffent usecase such as providing an entire nvme or sata device to a guest the cyborge would be how you would do > that. nova pci passhtough is not an option as it is not multi tenant safe. its expclsively for stateless device not disk so we do not > have a way to rease the data when done. cyborg with htere driver modle can fullfile the multi tenancy requirement. > we have previously rejected adding this capabliyt into nova so i dont expect us to add it any tiem in teh near to medium term. This sounds like a "3rd" approach: Using Cyborg to provide local storage (via LVM). >> c)? Since a volume will always be bound to a certain host, I suppose >> this will cause side-effects to instance scheduling? 
>> With the volume remaining after an instance has been destroyed (beating >> the purpose of it being "ephemeral") I suppose any other instance >> attaching this volume will >> be scheduling on this very machine? >> > no nova would have no knowage about the volume locality out of the box >> Is there any way around this? Maybe >> a driver setting to have such volumes "self-destroy" if they are not >> attached anymore? > we hate those kind of config options nova would not know that its bound to the host at the schduler level and > we would nto really want to add orcstration logic like that for "something its oke to delete our tenatns data" > by default today if you cold/live migrated the vm would move but the voluem vould not and you would end up accessing it remotely. > > you woudl have to then do a volume migration sepreately in cinder i think. >> d) Same question as with Nova: What happens when an instance is >> live-migrated? >> > i think i anser this above? Yes, these questions where all due to my misconception that cinder-volume backend "LVM" did not have any networking layer and was host-local. >> >> Maybe others also have this use case and you can share your solution(s)? > adding a cyborg driver for lvm storage and integrateing that with nova would like be the simpelt option > > you coudl extend nova but as i said we have rejected that in the past. > that said the generic resouce table we added for pemem was made generic so that future resocues like local block > device could be tracked there without db changes. > > supproting differnt image_type backend for root,swap and ephmeral would be possibel. > its an invasive change but might be more natural then teh resouce tabel approch. > you coudl reuse more fo the code and inherit much fo the exiting fucntionality btu makeing sure you dont break > anything in the process woudl take a lot of testing. Thanks for the sum up! Regards Christian From artem.goncharov at gmail.com Mon Mar 27 09:18:52 2023 From: artem.goncharov at gmail.com (Artem Goncharov) Date: Mon, 27 Mar 2023 11:18:52 +0200 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: References: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com> <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com> Message-ID: <1C528238-6437-46B7-8F3D-F7A72D82DEC3@gmail.com> Okay, I have not received any other feedback, so I went and booked 2 slots Wed 15:00-17:00 and left also 1h slot on Fri 14:00 just for ?safety?. Looking forward seeing you there. Artem > On 25. Mar 2023, at 01:39, Kendall Nelson wrote: > > Heh, okay well not complete overlap, but there is still a 3 hour overlap as sdk things are currently scheduled go from 14 - 17 UTC. > > Either way, I would rather not try to squeeze it down on Friday, when we can just move it to Wednesday. > > -Kendall > > On Fri, Mar 24, 2023 at 1:37?PM Ghanshyam Mann > wrote: >> Just to clarify the TC slots on Friday, is from 15 - 19 UTC and sdk 14-15 UTC slot does not overlap with TC. >> >> - https://etherpad.opendev.org/p/tc-2023-2-ptg#L18 >> >> -gmann >> >> ---- On Fri, 24 Mar 2023 11:12:52 -0700 Artem Goncharov wrote --- >> > Well, there was actually no pool, since I was not even sure anybody is that interested, but glad to hear. >> > What about Wed somewhere from 13:00 to 17:00? There is however overlap with Nova (pretty much like on any other day) >> > Ideas? I just want to avoid overlap with public cloud, but maybe even 1h is enough. So far there are not much topics anyway. 
>> > >> > >> > On 24. Mar 2023, at 18:40, Kendall Nelson kennelson11 at gmail.com > wrote: >> > Super annoying request, but can we do earlier in the week? The sessions for sdk have 100% overlap with the TC which I was planning on attending :/ >> > >> > And I am very very sorry if I missed sharing an opinion on when would be good to meet. >> > -Kendall >> > On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov artem.goncharov at gmail.com > wrote: >> > Hi all, >> > A bit late, but still - I have booked a 3 hours slot during PTG on Friday 14:00-17:00 UTC. This will follow publiccloud room discussion so I think some people and outcomes will follow directly into our room. >> > Etherpad is there: https://etherpad.opendev.org/p/march2023-ptg-sdk-cli >> > Feel free to feel in topics you want to discuss >> > Cheers,Artem >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adivya1.singh at gmail.com Mon Mar 27 11:36:30 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Mon, 27 Mar 2023 17:06:30 +0530 Subject: (Open Stack )Image Upload in Open Stack in a Bulk Message-ID: Hi Team, Any hints, if i want to upload images in a bulk in a Open Stack , because it takes some time for the image to copy if we go one by one, or even of we go with script Also if there is a scenario where glance mount point fails and we can create the same Share path and Copy the Image from the source , Will the OpenStack glance Service will start detecting those images upload in a share Regards Adivya Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Mar 27 11:51:11 2023 From: smooney at redhat.com (Sean Mooney) Date: Mon, 27 Mar 2023 12:51:11 +0100 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: Message-ID: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < > rafaelweingartner at gmail.com> a ?crit : > > > Hello Nguy?n H?u Kh?i, > > You might want to take a look at: > > https://review.opendev.org/c/openstack/nova/+/864760. We created a patch > > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in an > > AZ that has cross zone attache equals to false. > > > > > Well, I'll provide some comments in the change, but I'm afraid we can't > just modify the request spec like you would want. > > Anyway, if you want to discuss about it in the vPTG, just add it in the > etherpad and add your IRC nick so we could try to find a time where we > could be discussing it : https://etherpad.opendev.org/p/nova-bobcat-ptg > Also, this kind of behaviour modification is more a new feature than a > bugfix, so fwiw you should create a launchpad blueprint so we could better > see it. i tought i left review feedback on that too that the approch was not correct. i guess i did not in the end. modifying the request spec as sylvain menthioned is not correct. i disucssed this topic on irc a few weeks back with mohomad for vxhost. what can be done is as follows. we can add a current_az field to the Destination object https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 The conductor can read the instance.AZ and populate it in that new field. We can then add a new weigher to prefer hosts that are in the same az. 
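For illustration, a minimal sketch of such a weigher could look like the
following. It is untested, and "preferred_az" stands in for the new field
proposed in this thread (it does not exist on the Destination object
today); the rest uses the standard scheduler weigher plumbing:

    # sketch only: prefer hosts in the AZ the instance already lives in,
    # but only when the user did not explicitly request an AZ
    from nova.scheduler import weights


    class PreferCurrentAZWeigher(weights.BaseHostWeigher):

        def _weigh_object(self, host_state, request_spec):
            # an explicitly requested AZ is already enforced elsewhere
            if ('availability_zone' in request_spec
                    and request_spec.availability_zone):
                return 0.0
            dest = (request_spec.requested_destination
                    if 'requested_destination' in request_spec else None)
            # "preferred_az" is the proposed (hypothetical) field
            preferred = getattr(dest, 'preferred_az', None) if dest else None
            if not preferred:
                return 0.0
            # a host's AZ is exposed through its aggregate metadata
            host_azs = {agg.metadata.get('availability_zone')
                        for agg in host_state.aggregates}
            return 1.0 if preferred in host_azs else 0.0

Whether this stays a weigher (soft preference) or becomes a placement
prefilter (hard constraint) is the operator choice discussed next.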
This will provide soft AZ affinity for the vm and preserve the fact that if a vm is created without sepcifying An AZ the expectaiton at the api level woudl be that it can migrate to any AZ. To provide hard AZ affintiy we could also add prefileter that would use the same data but instead include it in the placement query so that only the current AZ is considered. This would have to be disabled by default. That woudl allow operators to choose the desired behavior. curret behavior (disable weigher and dont enabel prefilter) new default, prefer current AZ (weigher enabeld prefilter disabled) hard affintiy(prefilter enabled.) there are other ways to approch this but updating the request spec is not one of them. we have to maintain the fact the enduser did not request an AZ. > > -Sylvain > > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i > > wrote: > > > > > Hello guys. > > > I playing with Nova AZ and Masakari > > > > > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > > > > Masakari will move server by nova scheduler. > > > > > > Openstack Docs describe that: > > > > > > If the server was not created in a specific zone then it is free to be > > > moved to other zones, i.e. the AvailabilityZoneFilter > > > is > > > a no-op. > > > > > > I see that everyone usually creates instances with "Any Availability > > > Zone" on Horzion and also we don't specify AZ when creating instances by > > > cli. > > > > > > By this way, when we use Masakari or we miragrated instances( or > > > evacuate) so our instance will be moved to other zones. > > > > > > Can we attach AZ to server create requests API based on Any > > > Availability Zone to limit instances moved to other zones? > > > > > > Thank you. Regards > > > > > > Nguyen Huu Khoi > > > > > > > > > -- > > Rafael Weing?rtner > > From sbauza at redhat.com Mon Mar 27 12:06:56 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 27 Mar 2023 14:06:56 +0200 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Le lun. 27 mars 2023 ? 13:51, Sean Mooney a ?crit : > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < > > rafaelweingartner at gmail.com> a ?crit : > > > > > Hello Nguy?n H?u Kh?i, > > > You might want to take a look at: > > > https://review.opendev.org/c/openstack/nova/+/864760. We created a > patch > > > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in > an > > > AZ that has cross zone attache equals to false. > > > > > > > > Well, I'll provide some comments in the change, but I'm afraid we can't > > just modify the request spec like you would want. > > > > Anyway, if you want to discuss about it in the vPTG, just add it in the > > etherpad and add your IRC nick so we could try to find a time where we > > could be discussing it : https://etherpad.opendev.org/p/nova-bobcat-ptg > > Also, this kind of behaviour modification is more a new feature than a > > bugfix, so fwiw you should create a launchpad blueprint so we could > better > > see it. > > i tought i left review feedback on that too that the approch was not > correct. > i guess i did not in the end. > > modifying the request spec as sylvain menthioned is not correct. > i disucssed this topic on irc a few weeks back with mohomad for vxhost. > what can be done is as follows. 
> > we can add a current_az field to the Destination object > > https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 > The conductor can read the instance.AZ and populate it in that new field. > We can then add a new weigher to prefer hosts that are in the same az. > > I tend to disagree this approach as people would think that the Destination.az field would be related to the current AZ for an instance, while we only look at the original AZ. That being said, we could have a weigher that would look at whether the host is in the same AZ than the instance.host. This will provide soft AZ affinity for the vm and preserve the fact that if > a vm is created without sepcifying > An AZ the expectaiton at the api level woudl be that it can migrate to any > AZ. > > To provide hard AZ affintiy we could also add prefileter that would use > the same data but instead include it in the > placement query so that only the current AZ is considered. This would have > to be disabled by default. > > Sure, we could create a new prefilter so we could then deprecate the AZFilter if we want. > That woudl allow operators to choose the desired behavior. > curret behavior (disable weigher and dont enabel prefilter) > new default, prefer current AZ (weigher enabeld prefilter disabled) > hard affintiy(prefilter enabled.) > > there are other ways to approch this but updating the request spec is not > one of them. > we have to maintain the fact the enduser did not request an AZ. > > Anyway, if folks want to discuss about AZs, this week is the good time :-) > > > > -Sylvain > > > > > > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < > nguyenhuukhoinw at gmail.com> > > > wrote: > > > > > > > Hello guys. > > > > I playing with Nova AZ and Masakari > > > > > > > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > > > > > > Masakari will move server by nova scheduler. > > > > > > > > Openstack Docs describe that: > > > > > > > > If the server was not created in a specific zone then it is free to > be > > > > moved to other zones, i.e. the AvailabilityZoneFilter > > > > < > https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter> > is > > > > a no-op. > > > > > > > > I see that everyone usually creates instances with "Any Availability > > > > Zone" on Horzion and also we don't specify AZ when creating > instances by > > > > cli. > > > > > > > > By this way, when we use Masakari or we miragrated instances( or > > > > evacuate) so our instance will be moved to other zones. > > > > > > > > Can we attach AZ to server create requests API based on Any > > > > Availability Zone to limit instances moved to other zones? > > > > > > > > Thank you. Regards > > > > > > > > Nguyen Huu Khoi > > > > > > > > > > > > > -- > > > Rafael Weing?rtner > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Mon Mar 27 12:20:04 2023 From: smooney at redhat.com (Sean Mooney) Date: Mon, 27 Mar 2023 13:20:04 +0100 Subject: [nova][cinder] Providing ephemeral storage to instances - Cinder or Nova In-Reply-To: <30eb0918-b5d2-6ae8-bf61-0b509d8c4e33@inovex.de> References: <9d7f3d0a-5e99-7880-f573-6ccd53be47b0@inovex.de> <30eb0918-b5d2-6ae8-bf61-0b509d8c4e33@inovex.de> Message-ID: On Mon, 2023-03-27 at 10:47 +0200, Christian Rohmann wrote: > Thanks for your extensive reply Sean! 
> > I also replied inline and would love to continue the conversation with > you and other with this use case > to find the best / most suitable approach. > > > On 24/03/2023 19:50, Sean Mooney wrote: > > i responed in line but just a waring this is a usecase we ahve heard before. > > there is no simple option im afraid and there are many many sharp edges > > and severl littel know features/limitatiosn that your question puts you right in the > > middel of. > > > > On Fri, 2023-03-24 at 16:28 +0100, Christian Rohmann wrote: > > > 1) *Via Nova* instance storage and the configurable "ephemeral" volume > > > for a flavor > > > > > > a) We currently use Ceph RBD als image_type > > > (https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type), > > > so instance images are stored in Ceph, not locally on disk. I believe > > > this setting will also cause ephemeral volumes (destination_local) to be > > > placed on a RBD and not /var/lib/nova/instances? > > it should be in ceph yes we do not support havign the root/swap/ephemral > > disk use diffent storage locatiosn > > > Or is there a setting to set a different backend for local block devices > > > providing "ephemeral" storage? So RBD for the root disk and a local LVM > > > VG for ephemeral? > > no that would be a new feature and not a trivial one as yo uwould have to make > > sure it works for live migration and cold migration. > > While having the root disk on resilient storage, using local storage > swap / ephemeral actually seems quite obvious. > Do you happen to know if there ever was a spec / push to implement this? as far as i am aware no. but if we were to have one i would do it as an api option basically the inverse of https://specs.openstack.org/openstack/nova-specs/specs/xena/approved/allow-migrate-pmem-data.html For PMEM instance we defautl to not copying the possibel multiple TB of PMEM over the network on cold migrate. later we added that option as an api paramter. Swap is not coppied for cold migration today but is for live for obvious reasons. like the ?copy_pmem_devices?: ?true? option i woudl be fine wiht adding ?copy_ephmeral_devices?: ?true|false?. We woudl proably need to default to copying the data but we coudl discuss that in the spec. > > > > > b) Will an ephemeral volume also be migrated when the instance is > > > shutoff as with live-migration? > > its hsoudl be. its not included in snapshots so its not presergved > > when shelving. that means corss cell cold migration will not preserve the disk. > > > > but for a normal cold migration it shoudl be scp'd or rsynced with the root disk > > if you are using the raw/qcow/flat images type if i remember correctly. > > with RBD or other shared storage like nfs it really sould be preserved. > > > > one other thing to note is ironic and only ironic support the > > preserve_ephemeral option in the rebuild api. > > > > libvirt will wipte the ephmeral disk if you rebuild or evacuate. > > Could I somehow configure a flavor to "require" a rebuild / evacuate or? > to disable live migration for it? rebuild is not a move operation so that wont help you move the instance and evacuate is admin only and required you to ensure teh instance is not runnign before its used. disabling live migration is something that you can do via custom policy but its admin only by default as well. > > > > > Or will there be an new volume created on the target host? 
I am asking > > > because I want to avoid syncing 500G or 1T when it's only "ephemeral" > > > and the instance will not expect any data on it on the next boot. > > i would perssonally consider it a bug if it was not transfered. > > that does not mean that could not change in the future. > > this is a very virt driver specific behaivor by the way and nto one that is partically well docuemnted. > > the ephemeral shoudl mostly exist for the lifetime of an instance. not the lifetime of a vm > > > > for exmple it should nto get recreate vai a simple reboot or live migration > > it should not get created for cold migration or rezise. > > but it will get wipted for shelve_offload, cross cell resize and evacuate. > So even for cold migration it would be preserved then? So my only option > would be to shelve such instances when trying to > "move" instances off a certain hypervisor while NOT syncing ephemeral > storage? yes shelve woudl be your only option today. > > > > > c) Is the size of the ephemeral storage for flavors a fixed size or just > > > the upper bound for users? So if I limit this to 1T, will such a flavor > > > always provision a block device with his size? > > flavor.ephemeral_gb is an upper bound and end users can devide that between multipel ephermal disks > > on the same instance. so if its 100G you can ask for 2 50G epmeeral disks > > > > you specify the toplogy of the epmermeral disk using the block_device_mapping_v2 parmater on the server > > create. > > this has been automated in recent version of the openstack client > > > > so you can do > > > > openstack server creeate --ephemeral size=50,format=ext4 --ephemeral size=50,format=vfat ... > > > > https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-ephemeral > > this is limted by > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.max_local_block_devices > > > > > I suppose using LVM this will be thin provisioned anyways? > > to use the lvm backend with libvirt you set > > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_volume_group > > to identify which lvm VG to use. > > > > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.sparse_logical_volumes might enable thin provsion or it might > > work without it but see the note > > > > """ > > Warning > > > > This option is deprecated for removal since 18.0.0. Its value may be silently ignored in the future. > > > > Reason > > > > Sparse logical volumes is a feature that is not tested hence not supported. LVM logical volumes are preallocated by default. If you want thin > > provisioning, use Cinder thin-provisioned volumes. > > """ > > > > the nova lvm supprot has been in maintance mode for many years. > > > > im not opposed to improving it just calling out that it has bugs and noone has really > > worked on adressing them in 4 or 5 years which is sad becasue it out performnce raw for local > > storage perfroamce and if thin provisioning still work it shoudl outperform qcow too for a simialr usecase. > > > > you are well into undefined behavior land however at this point > > > > we do not test it so we assume untile told otherwise that its broken. > > Thanks for the heads up. I looked at LVM for cinder and there LVM > volumes are thin provisioned, > so I figured this might be the case for Nova as well. > > > > > 2) *Via Cinder*, running cinder-volume on each compute node to provide a > > > volume type "ephemeral", using e.g. 
the LVM driver > > > > > > a) While not really "ephemeral" and bound to the instance lifecycle, > > > this would allow users to provision ephemeral volume just as they need them. > > > I suppose I could use backend specific quotas > > > (https://docs.openstack.org/cinder/latest/cli/cli-cinder-quotas.html#view-block-storage-quotas) > > > to > > > limit the number of size of such volumes? > > > > > > b) Do I need to use the instance locality filter > > > (https://docs.openstack.org/cinder/latest/contributor/api/cinder.scheduler.filters.instance_locality_filter.html) > > > then? > > That is an option but not ideally since it stilll means conencting to the volume via iscsi or nvmeof even if its effectlvy via localhost > > so you still have the the network layer overhead. > > Thanks for the hint - one can easily be confused when reading "LVM" ... > I actually thought there was a way to have "host-only" style volumes which > are simply local block devices with no iscsi / NVME in between which are > used by Nova then. > > These kind of volumes could maybe be built into cinder as a > "taget_protocol: local" together with the instance_locality_filter - > but apparently now the only way is through iSCSI or NVME. > > > > when i alas brought up this topic in a diffent context the alternitive to cinder and nova was to add a lvm cyborg driver > > so that it could parttion local nvme devices and expose that to a guest. but i never wrote that and i dotn think anyone else has. > > if you had a slightly diffent usecase such as providing an entire nvme or sata device to a guest the cyborge would be how you would do > > that. nova pci passhtough is not an option as it is not multi tenant safe. its expclsively for stateless device not disk so we do not > > have a way to rease the data when done. cyborg with htere driver modle can fullfile the multi tenancy requirement. > > we have previously rejected adding this capabliyt into nova so i dont expect us to add it any tiem in teh near to medium term. > > This sounds like a "3rd" approach: Using Cyborg to provide local storage > (via LVM). yes cyborg woudl be a third approch. i was going to enable this in a new project i was calling Arbiterd but that proposal was rejected in the last ptg so i currenlty have no planns to enabel local block device managment. > > > > > c)? Since a volume will always be bound to a certain host, I suppose > > > this will cause side-effects to instance scheduling? > > > With the volume remaining after an instance has been destroyed (beating > > > the purpose of it being "ephemeral") I suppose any other instance > > > attaching this volume will > > > be scheduling on this very machine? > > > > > no nova would have no knowage about the volume locality out of the box > > > Is there any way around this? Maybe > > > a driver setting to have such volumes "self-destroy" if they are not > > > attached anymore? > > we hate those kind of config options nova would not know that its bound to the host at the schduler level and > > we would nto really want to add orcstration logic like that for "something its oke to delete our tenatns data" > > by default today if you cold/live migrated the vm would move but the voluem vould not and you would end up accessing it remotely. > > > > you woudl have to then do a volume migration sepreately in cinder i think. > > > d) Same question as with Nova: What happens when an instance is > > > live-migrated? > > > > > i think i anser this above? 
> > Yes, these questions where all due to my misconception that > cinder-volume backend "LVM" did not have any networking layer > and was host-local. > > > > > > > > Maybe others also have this use case and you can share your solution(s)? > > adding a cyborg driver for lvm storage and integrateing that with nova would like be the simpelt option > > > > you coudl extend nova but as i said we have rejected that in the past. > > that said the generic resouce table we added for pemem was made generic so that future resocues like local block > > device could be tracked there without db changes. > > > > supproting differnt image_type backend for root,swap and ephmeral would be possibel. > > its an invasive change but might be more natural then teh resouce tabel approch. > > you coudl reuse more fo the code and inherit much fo the exiting fucntionality btu makeing sure you dont break > > anything in the process woudl take a lot of testing. > > Thanks for the sum up! i think your two best options are add teh parmater to the migrat/resize apis to skip copying the ephmeral disks. and second propose a replacement for https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.images_type these should be seperate specs. that woudl work like the how we supprot generic mdevs using dynimc config sections i.e.? [libvirt] storage_profiles=swap:swap_storage,ephmeral:ephmeral_storage,root:root_storage [swap_stroage} driver=raw driver_data:/mnt/nvme-swap/nova/ [ephmeral_stroage} driver=lvm driver_data:vg_ephmeral [root_storage] driver=rbd driver_data:vms we woudl have to work this out in a spec but if nova was every to support something like this in the futrue i think we would to model it somethign along those lines. im not sure how popular this woudl be however so we would need to get input form teh wirder nova team. i do see value in being ablt ot have differnt storage profiles for root_gb, ephmeral_gb and swap_gb in the falvor. but the last time somethign like this was discussed was the creation of a cinder images_type backend to allow for automatic BootFormVolume. i actully think that would be a nice feature too but its complex and because both were disucssed aroudn the same tiem neithre got done. > > > > > Regards > > > Christian > From smooney at redhat.com Mon Mar 27 12:28:31 2023 From: smooney at redhat.com (Sean Mooney) Date: Mon, 27 Mar 2023 13:28:31 +0100 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a ?crit : > > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < > > > rafaelweingartner at gmail.com> a ?crit : > > > > > > > Hello Nguy?n H?u Kh?i, > > > > You might want to take a look at: > > > > https://review.opendev.org/c/openstack/nova/+/864760. We created a > > patch > > > > to avoid migrating VMs to any AZ, once the VM has been bootstrapped in > > an > > > > AZ that has cross zone attache equals to false. > > > > > > > > > > > Well, I'll provide some comments in the change, but I'm afraid we can't > > > just modify the request spec like you would want. 
> > > > > > Anyway, if you want to discuss about it in the vPTG, just add it in the > > > etherpad and add your IRC nick so we could try to find a time where we > > > could be discussing it : https://etherpad.opendev.org/p/nova-bobcat-ptg > > > Also, this kind of behaviour modification is more a new feature than a > > > bugfix, so fwiw you should create a launchpad blueprint so we could > > better > > > see it. > > > > i tought i left review feedback on that too that the approch was not > > correct. > > i guess i did not in the end. > > > > modifying the request spec as sylvain menthioned is not correct. > > i disucssed this topic on irc a few weeks back with mohomad for vxhost. > > what can be done is as follows. > > > > we can add a current_az field to the Destination object > > > > https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 > > The conductor can read the instance.AZ and populate it in that new field. > > We can then add a new weigher to prefer hosts that are in the same az. > > > > > > I tend to disagree this approach as people would think that the > Destination.az field would be related to the current AZ for an instance, > while we only look at the original AZ. > That being said, we could have a weigher that would look at whether the > host is in the same AZ than the instance.host. you miss understood what i wrote i suggested addint Destination.current_az to store teh curernt AZ of the instance before scheduling. so my proposal is if RequestSpec.AZ is not set and Destination.current_az is set then the new weigher would prefer hosts that are in the same az as Destination.current_az we coudl also call Destination.current_az Destination.prefered_az > > > This will provide soft AZ affinity for the vm and preserve the fact that if > > a vm is created without sepcifying > > An AZ the expectaiton at the api level woudl be that it can migrate to any > > AZ. > > > > To provide hard AZ affintiy we could also add prefileter that would use > > the same data but instead include it in the > > placement query so that only the current AZ is considered. This would have > > to be disabled by default. > > > > > Sure, we could create a new prefilter so we could then deprecate the > AZFilter if we want. we already have an AZ prefilter and the AZFilter is deprecate for removal i ment to delete it in zed but did not have time to do it in zed of Antielope i deprecated the AZ| filter in https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 xena when i enabeld the az prefilter by default. i will try an delete teh AZ filter before m1 if others dont. > > > > That woudl allow operators to choose the desired behavior. > > curret behavior (disable weigher and dont enabel prefilter) > > new default, prefer current AZ (weigher enabeld prefilter disabled) > > hard affintiy(prefilter enabled.) > > > > there are other ways to approch this but updating the request spec is not > > one of them. > > we have to maintain the fact the enduser did not request an AZ. > > > > > Anyway, if folks want to discuss about AZs, this week is the good time :-) > > > > > > > > -Sylvain > > > > > > > > > > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < > > nguyenhuukhoinw at gmail.com> > > > > wrote: > > > > > > > > > Hello guys. > > > > > I playing with Nova AZ and Masakari > > > > > > > > > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > > > > > > > > Masakari will move server by nova scheduler. 
> > > > > > > > > > Openstack Docs describe that: > > > > > > > > > > If the server was not created in a specific zone then it is free to > > be > > > > > moved to other zones, i.e. the AvailabilityZoneFilter > > > > > < > > https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter> > > is > > > > > a no-op. > > > > > > > > > > I see that everyone usually creates instances with "Any Availability > > > > > Zone" on Horzion and also we don't specify AZ when creating > > instances by > > > > > cli. > > > > > > > > > > By this way, when we use Masakari or we miragrated instances( or > > > > > evacuate) so our instance will be moved to other zones. > > > > > > > > > > Can we attach AZ to server create requests API based on Any > > > > > Availability Zone to limit instances moved to other zones? > > > > > > > > > > Thank you. Regards > > > > > > > > > > Nguyen Huu Khoi > > > > > > > > > > > > > > > > > -- > > > > Rafael Weing?rtner > > > > > > > > From sbauza at redhat.com Mon Mar 27 12:43:16 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 27 Mar 2023 14:43:16 +0200 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Le lun. 27 mars 2023 ? 14:28, Sean Mooney a ?crit : > On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: > > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a ?crit : > > > > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: > > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < > > > > rafaelweingartner at gmail.com> a ?crit : > > > > > > > > > Hello Nguy?n H?u Kh?i, > > > > > You might want to take a look at: > > > > > https://review.opendev.org/c/openstack/nova/+/864760. We created a > > > patch > > > > > to avoid migrating VMs to any AZ, once the VM has been > bootstrapped in > > > an > > > > > AZ that has cross zone attache equals to false. > > > > > > > > > > > > > > Well, I'll provide some comments in the change, but I'm afraid we > can't > > > > just modify the request spec like you would want. > > > > > > > > Anyway, if you want to discuss about it in the vPTG, just add it in > the > > > > etherpad and add your IRC nick so we could try to find a time where > we > > > > could be discussing it : > https://etherpad.opendev.org/p/nova-bobcat-ptg > > > > Also, this kind of behaviour modification is more a new feature than > a > > > > bugfix, so fwiw you should create a launchpad blueprint so we could > > > better > > > > see it. > > > > > > i tought i left review feedback on that too that the approch was not > > > correct. > > > i guess i did not in the end. > > > > > > modifying the request spec as sylvain menthioned is not correct. > > > i disucssed this topic on irc a few weeks back with mohomad for vxhost. > > > what can be done is as follows. > > > > > > we can add a current_az field to the Destination object > > > > > > > https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 > > > The conductor can read the instance.AZ and populate it in that new > field. > > > We can then add a new weigher to prefer hosts that are in the same az. > > > > > > > > > > I tend to disagree this approach as people would think that the > > Destination.az field would be related to the current AZ for an instance, > > while we only look at the original AZ. > > That being said, we could have a weigher that would look at whether the > > host is in the same AZ than the instance.host. 
> you miss understood what i wrote > > i suggested addint Destination.current_az to store teh curernt AZ of the > instance before scheduling. > > so my proposal is if RequestSpec.AZ is not set and Destination.current_az > is set then the new > weigher would prefer hosts that are in the same az as > Destination.current_az > > we coudl also call Destination.current_az Destination.prefered_az > > I meant, I think we don't need to provide a new field, we can already know about what host an existing instance uses if we want (using [1]) Anyway, let's stop to discuss about it here, we should rather review that for a Launchpad blueprint or more a spec. -Sylvain [1] https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 > > > > > > This will provide soft AZ affinity for the vm and preserve the fact that > if > > > a vm is created without sepcifying > > > An AZ the expectaiton at the api level woudl be that it can migrate to > any > > > AZ. > > > > > > To provide hard AZ affintiy we could also add prefileter that would use > > > the same data but instead include it in the > > > placement query so that only the current AZ is considered. This would > have > > > to be disabled by default. > > > > > > > > Sure, we could create a new prefilter so we could then deprecate the > > AZFilter if we want. > we already have an AZ prefilter and the AZFilter is deprecate for removal > i ment to delete it in zed but did not have time to do it in zed of > Antielope > i deprecated the AZ| filter in > https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 > xena when i enabeld the az prefilter by default. > > Ah whoops, indeed I forgot the fact we already have the prefilter, so the hard support for AZ is already existing. > i will try an delete teh AZ filter before m1 if others dont. > OK. > > > > > > > That woudl allow operators to choose the desired behavior. > > > curret behavior (disable weigher and dont enabel prefilter) > > > new default, prefer current AZ (weigher enabeld prefilter disabled) > > > hard affintiy(prefilter enabled.) > > > > > > there are other ways to approch this but updating the request spec is > not > > > one of them. > > > we have to maintain the fact the enduser did not request an AZ. > > > > > > > > Anyway, if folks want to discuss about AZs, this week is the good time > :-) > > > > > > > > > > > > -Sylvain > > > > > > > > > > > > > > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < > > > nguyenhuukhoinw at gmail.com> > > > > > wrote: > > > > > > > > > > > Hello guys. > > > > > > I playing with Nova AZ and Masakari > > > > > > > > > > > > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > > > > > > > > > > Masakari will move server by nova scheduler. > > > > > > > > > > > > Openstack Docs describe that: > > > > > > > > > > > > If the server was not created in a specific zone then it is free > to > > > be > > > > > > moved to other zones, i.e. the AvailabilityZoneFilter > > > > > > < > > > > https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter > > > > > is > > > > > > a no-op. > > > > > > > > > > > > I see that everyone usually creates instances with "Any > Availability > > > > > > Zone" on Horzion and also we don't specify AZ when creating > > > instances by > > > > > > cli. > > > > > > > > > > > > By this way, when we use Masakari or we miragrated instances( or > > > > > > evacuate) so our instance will be moved to other zones. 
> > > > > > > > > > > > Can we attach AZ to server create requests API based on Any > > > > > > Availability Zone to limit instances moved to other zones? > > > > > > > > > > > > Thank you. Regards > > > > > > > > > > > > Nguyen Huu Khoi > > > > > > > > > > > > > > > > > > > > > -- > > > > > Rafael Weing?rtner > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Mon Mar 27 13:25:41 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Mon, 27 Mar 2023 13:25:41 +0000 Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) In-Reply-To: References: Message-ID: Hi, (First of all, I'm writing this as stable maintainer, someone who was there when the 'Extended Maintenance' process was formulated in the first place) As far as I understand, neutron's stable/train gate is still fully operational. I also know that backporting every bug fix to stable branches is time and resource consuming, and the team does not have / want to spend time on this anymore. Between EOL'ing and backporting every single bug fix, there are another levels of engagement. What I want to say is: what if stable/train of neutron is kept open as long as the gate is functional, to give people the possibility for cooperation, give the opportunity to test backports, bug fixes on upstream CI for stable/train. There are two extremity in opinions about how far back we should maintain things: 1) we should keep only open the most recent stable release to free up resources, and minimize maintenance cost 2) we should keep everything open, even the very old stable branches, where even the gate jobs are not functional anymore, to give space for collaboration in fixing important bugs (like security bugs) I think the right way is somewhere in the middle: as long as the gate is functional we can keep a branch open, for *collaboration*. I understand if most active neutron team members do not propose backports to stable/train anymore. Some way, this is acceptable according to Extended Maintenance process: it is not "fully maintained", rather there is still the possibility to do *some* maintenance. (Note, that I'm mostly talking about neutron. Stadium projects, that have broken gates (even on master branch), I support the EOL'ing) What do you think about the above suggestion? Thanks, El?d irc: elodilles ________________________________ From: Rodolfo Alonso Hernandez Sent: Thursday, March 16, 2023 5:15 PM To: openstack-discuss Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) Hello: I'm sending this mail in advance to propose transitioning Neutron and all related projects to EOL. I'll propose this topic too during the next Neutron meeting. The announcement is the first step [1] to transition a stable branch to EOL. The patch to mark these branches as EOL will be pushed in two weeks. If you have any inconvenience, please let me know in this mail chain or in IRC (ralonsoh, #openstack-neutron channel). You can also contact any Neutron core reviewer in the IRC channel. Regards. [1]https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nguyenhuukhoinw at gmail.com Mon Mar 27 13:37:28 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Mon, 27 Mar 2023 20:37:28 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Hello guys. I just suggest to openstack nova works better. My story because 1. The server was created in a specific zone with the POST /servers request containing the availability_zone parameter. It will be nice when we attach randow zone when we create instances then It will only move to the same zone when migrating or masakari ha. Currently we can force it to zone by default zone shedule in nova.conf. Sorry because I am new to Openstack and I am just an operator. I try to verify some real cases. Nguyen Huu Khoi On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza wrote: > > > Le lun. 27 mars 2023 ? 14:28, Sean Mooney a ?crit : > >> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a ?crit >> : >> > >> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >> > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < >> > > > rafaelweingartner at gmail.com> a ?crit : >> > > > >> > > > > Hello Nguy?n H?u Kh?i, >> > > > > You might want to take a look at: >> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We created >> a >> > > patch >> > > > > to avoid migrating VMs to any AZ, once the VM has been >> bootstrapped in >> > > an >> > > > > AZ that has cross zone attache equals to false. >> > > > > >> > > > > >> > > > Well, I'll provide some comments in the change, but I'm afraid we >> can't >> > > > just modify the request spec like you would want. >> > > > >> > > > Anyway, if you want to discuss about it in the vPTG, just add it in >> the >> > > > etherpad and add your IRC nick so we could try to find a time where >> we >> > > > could be discussing it : >> https://etherpad.opendev.org/p/nova-bobcat-ptg >> > > > Also, this kind of behaviour modification is more a new feature >> than a >> > > > bugfix, so fwiw you should create a launchpad blueprint so we could >> > > better >> > > > see it. >> > > >> > > i tought i left review feedback on that too that the approch was not >> > > correct. >> > > i guess i did not in the end. >> > > >> > > modifying the request spec as sylvain menthioned is not correct. >> > > i disucssed this topic on irc a few weeks back with mohomad for >> vxhost. >> > > what can be done is as follows. >> > > >> > > we can add a current_az field to the Destination object >> > > >> > > >> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >> > > The conductor can read the instance.AZ and populate it in that new >> field. >> > > We can then add a new weigher to prefer hosts that are in the same az. >> > > >> > > >> > >> > I tend to disagree this approach as people would think that the >> > Destination.az field would be related to the current AZ for an instance, >> > while we only look at the original AZ. >> > That being said, we could have a weigher that would look at whether the >> > host is in the same AZ than the instance.host. >> you miss understood what i wrote >> >> i suggested addint Destination.current_az to store teh curernt AZ of the >> instance before scheduling. 
>> >> so my proposal is if RequestSpec.AZ is not set and Destination.current_az >> is set then the new >> weigher would prefer hosts that are in the same az as >> Destination.current_az >> >> we coudl also call Destination.current_az Destination.prefered_az >> >> > I meant, I think we don't need to provide a new field, we can already know > about what host an existing instance uses if we want (using [1]) > Anyway, let's stop to discuss about it here, we should rather review that > for a Launchpad blueprint or more a spec. > > -Sylvain > > [1] > https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 > >> > >> > >> > This will provide soft AZ affinity for the vm and preserve the fact >> that if >> > > a vm is created without sepcifying >> > > An AZ the expectaiton at the api level woudl be that it can migrate >> to any >> > > AZ. >> > > >> > > To provide hard AZ affintiy we could also add prefileter that would >> use >> > > the same data but instead include it in the >> > > placement query so that only the current AZ is considered. This would >> have >> > > to be disabled by default. >> > > >> > > >> > Sure, we could create a new prefilter so we could then deprecate the >> > AZFilter if we want. >> we already have an AZ prefilter and the AZFilter is deprecate for removal >> i ment to delete it in zed but did not have time to do it in zed of >> Antielope >> i deprecated the AZ| filter in >> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >> xena when i enabeld the az prefilter by default. >> >> > Ah whoops, indeed I forgot the fact we already have the prefilter, so the > hard support for AZ is already existing. > > >> i will try an delete teh AZ filter before m1 if others dont. >> > > OK. > > >> > >> > >> > > That woudl allow operators to choose the desired behavior. >> > > curret behavior (disable weigher and dont enabel prefilter) >> > > new default, prefer current AZ (weigher enabeld prefilter disabled) >> > > hard affintiy(prefilter enabled.) >> > > >> > > there are other ways to approch this but updating the request spec is >> not >> > > one of them. >> > > we have to maintain the fact the enduser did not request an AZ. >> > > >> > > >> > Anyway, if folks want to discuss about AZs, this week is the good time >> :-) >> > >> > >> > > > >> > > > -Sylvain >> > > > >> > > > >> > > > >> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >> > > nguyenhuukhoinw at gmail.com> >> > > > > wrote: >> > > > > >> > > > > > Hello guys. >> > > > > > I playing with Nova AZ and Masakari >> > > > > > >> > > > > > >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> > > > > > >> > > > > > Masakari will move server by nova scheduler. >> > > > > > >> > > > > > Openstack Docs describe that: >> > > > > > >> > > > > > If the server was not created in a specific zone then it is >> free to >> > > be >> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >> > > > > > < >> > > >> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >> > >> > > is >> > > > > > a no-op. >> > > > > > >> > > > > > I see that everyone usually creates instances with "Any >> Availability >> > > > > > Zone" on Horzion and also we don't specify AZ when creating >> > > instances by >> > > > > > cli. >> > > > > > >> > > > > > By this way, when we use Masakari or we miragrated instances( or >> > > > > > evacuate) so our instance will be moved to other zones. 
>> > > > > > >> > > > > > Can we attach AZ to server create requests API based on Any >> > > > > > Availability Zone to limit instances moved to other zones? >> > > > > > >> > > > > > Thank you. Regards >> > > > > > >> > > > > > Nguyen Huu Khoi >> > > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > Rafael Weing?rtner >> > > > > >> > > >> > > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From juliaashleykreger at gmail.com Mon Mar 27 13:44:18 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 27 Mar 2023 06:44:18 -0700 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Message-ID: On Fri, Mar 24, 2023 at 9:55?AM Dave Wilde wrote: > I?m happy to book an additional time slot(s) specifically for this > discussion if something other than what we currently have works better for > everyone. Please let me know. > > /Dave > On Mar 24, 2023 at 10:49 AM -0500, Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp>, wrote: > > As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this > discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic > members, please kindly reply convenient timeslots. > > Unfortunately, I took the last few days off and I'm only seeing this now. My morning is booked up aside from the original time slot which was discussed. Maybe there is a time later in the week which could work? > > [1] https://ptg.opendev.org/ptg.html > > Thanks, > > Hiromu Asahina > > On 2023/03/22 20:01, Hiromu Asahina wrote: > > Thanks! > > I look forward to your reply. > > On 2023/03/22 1:29, Julia Kreger wrote: > > No worries! > > I think that time works for me. I'm not sure it will work for > everyone, but > I can proxy information back to the whole of the ironic project as we > also > have the question of this functionality listed for our Operator Hour in > order to help ironic gauge interest. > > -Julia > > On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > I apologize that I couldn't reply before the Ironic meeting on Monday. > > I need one slot to discuss this topic. > > I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, > 27)[1,2] works for them. Does this work for Ironic? I understand not all > Ironic members will join this discussion, so I hope we can arrange a > convenient date for you two at least and, hopefully, for those > interested in this topic. > > [1] > > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z > [2] https://ptg.opendev.org/ptg.html > > Thanks, > Hiromu Asahina > > On 2023/03/17 23:29, Julia Kreger wrote: > > I'm not sure how many Ironic contributors would be the ones to attend a > discussion, in part because this is disjointed from the items they need > > to > > focus on. It is much more of a "big picture" item for those of us > who are > leaders in the project. > > I think it would help to understand how much time you expect the > > discussion > > to take to determine a path forward and how we can collaborate. 
Ironic > > has > > a huge number of topics we want to discuss during the PTG, and I > suspect > our team meeting on Monday next week should yield more > interest/awareness > as well as an amount of time for each topic which will aid us in > > scheduling. > > > If you can let us know how long, then I think we can figure out when > the > best day/time will be. > > Thanks! > > -Julia > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > Thank you for your reply. > > I'd like to decide the time slot for this topic. > I just checked PTG schedule [1]. > > We have the following time slots. Which one is convenient to gether? > (I didn't get reply but I listed Barbican, as its cores are almost the > same as Keystone) > > Mon, 27: > > - 14 (keystone) > - 15 (keystone) > > Tue, 28 > > - 13 (barbican) > - 14 (keystone, ironic) > - 15 (keysonte, ironic) > - 16 (ironic) > > Wed, 29 > > - 13 (ironic) > - 14 (keystone, ironic) > - 15 (keystone, ironic) > - 21 (ironic) > > Thanks, > > [1] https://ptg.opendev.org/ptg.html > > Hiromu Asahina > > > On 2023/02/11 1:41, Jay Faulkner wrote: > > I think it's safe to say the Ironic community would be very > invested in > such an effort. Let's make sure the time chosen for vPTG with this is > > such > > that Ironic contributors can attend as well. > > Thanks, > Jay Faulkner > Ironic PTL > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > Hello Everyone, > > Recently, Tacker and Keystone have been working together on a new > > Keystone > > Middleware that can work with external authentication > services, such as Keycloak. The code has already been submitted [1], > > but > > we want to make this middleware a generic plugin that works > with as many OpenStack services as possible. To that end, we would > > like > > to > > hear from other projects with similar use cases > (especially Ironic and Barbican, which run as standalone > services). We > will make a time slot to discuss this topic at the next vPTG. > Please contact me if you are interested and available to > participate. > > [1] > > https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > > > -- > Hiromu Asahina > > > > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > > > > > -- > ?-------------------------------------? > NTT Network Innovation Center > Hiromu Asahina > ------------------------------------- > 3-9-11, Midori-cho, Musashino-shi > Tokyo 180-8585, Japan > Phone: +81-422-59-7008 > Email: hiromu.asahina.az at hco.ntt.co.jp > ?-------------------------------------? > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kennelson11 at gmail.com Mon Mar 27 14:00:40 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 27 Mar 2023 09:00:40 -0500 Subject: [ptg][sdk][cli][ansible] PTG Slot for SDK, CLI, Ansible collection OpenStack is now booked In-Reply-To: <1C528238-6437-46B7-8F3D-F7A72D82DEC3@gmail.com> References: <4EC7F595-9BBF-40F0-9CEC-FC390429192D@gmail.com> <18714e90ba0.118f197f918210.656438679534707790@ghanshyammann.com> <1C528238-6437-46B7-8F3D-F7A72D82DEC3@gmail.com> Message-ID: Perfect. Thank you! -Kendall On Mon, Mar 27, 2023 at 4:19?AM Artem Goncharov wrote: > Okay, I have not received any other feedback, so I went and booked 2 slots > Wed 15:00-17:00 and left also 1h slot on Fri 14:00 just for ?safety?. > > Looking forward seeing you there. > > Artem > > On 25. Mar 2023, at 01:39, Kendall Nelson wrote: > > Heh, okay well not complete overlap, but there is still a 3 hour overlap > as sdk things are currently scheduled go from 14 - 17 UTC. > > Either way, I would rather not try to squeeze it down on Friday, when we > can just move it to Wednesday. > > -Kendall > > On Fri, Mar 24, 2023 at 1:37?PM Ghanshyam Mann > wrote: > >> Just to clarify the TC slots on Friday, is from 15 - 19 UTC and sdk 14-15 >> UTC slot does not overlap with TC. >> >> - https://etherpad.opendev.org/p/tc-2023-2-ptg#L18 >> >> -gmann >> >> ---- On Fri, 24 Mar 2023 11:12:52 -0700 Artem Goncharov wrote --- >> > Well, there was actually no pool, since I was not even sure anybody is >> that interested, but glad to hear. >> > What about Wed somewhere from 13:00 to 17:00? There is however overlap >> with Nova (pretty much like on any other day) >> > Ideas? I just want to avoid overlap with public cloud, but maybe even >> 1h is enough. So far there are not much topics anyway. >> > >> > >> > On 24. Mar 2023, at 18:40, Kendall Nelson kennelson11 at gmail.com> >> wrote: >> > Super annoying request, but can we do earlier in the week? The >> sessions for sdk have 100% overlap with the TC which I was planning on >> attending :/ >> > >> > And I am very very sorry if I missed sharing an opinion on when would >> be good to meet. >> > -Kendall >> > On Fri, Mar 24, 2023 at 5:37?AM Artem Goncharov >> artem.goncharov at gmail.com> wrote: >> > Hi all, >> > A bit late, but still - I have booked a 3 hours slot during PTG on >> Friday 14:00-17:00 UTC. This will follow publiccloud room discussion so I >> think some people and outcomes will follow directly into our room. >> > Etherpad is there: >> https://etherpad.opendev.org/p/march2023-ptg-sdk-cli >> > Feel free to feel in topics you want to discuss >> > Cheers,Artem >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwilde at redhat.com Mon Mar 27 14:07:36 2023 From: dwilde at redhat.com (Dave Wilde) Date: Mon, 27 Mar 2023 09:07:36 -0500 Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service In-Reply-To: References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp> Message-ID: Hi Julia, No worries! I see that several of our sessions are overlapping, perhaps we could combine the 15:00 UTC session tomorrow to discuss this topic? 
/Dave On Mar 27, 2023 at 8:44 AM -0500, Julia Kreger , wrote: > > > > On Fri, Mar 24, 2023 at 9:55?AM Dave Wilde wrote: > > > I?m happy to book an additional time slot(s) specifically for this discussion if something other than what we currently have works better for everyone. Please let me know. > > > > > > /Dave > > > On Mar 24, 2023 at 10:49 AM -0500, Hiromu Asahina , wrote: > > > > As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this > > > > discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic > > > > members, please kindly reply convenient timeslots. > > > > Unfortunately, I took the last few days off and I'm only seeing this now. My morning is booked up aside from the original time slot which was discussed. > > > > Maybe there is a time later in the week which could work? > > > > > > > > > > > > [1] https://ptg.opendev.org/ptg.html > > > > > > > > Thanks, > > > > > > > > Hiromu Asahina > > > > > > > > On 2023/03/22 20:01, Hiromu Asahina wrote: > > > > > Thanks! > > > > > > > > > > I look forward to your reply. > > > > > > > > > > On 2023/03/22 1:29, Julia Kreger wrote: > > > > > > No worries! > > > > > > > > > > > > I think that time works for me. I'm not sure it will work for > > > > > > everyone, but > > > > > > I can proxy information back to the whole of the ironic project as we > > > > > > also > > > > > > have the question of this functionality listed for our Operator Hour in > > > > > > order to help ironic gauge interest. > > > > > > > > > > > > -Julia > > > > > > > > > > > > On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < > > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > > > I apologize that I couldn't reply before the Ironic meeting on Monday. > > > > > > > > > > > > > > I need one slot to discuss this topic. > > > > > > > > > > > > > > I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, > > > > > > > 27)[1,2] works for them. Does this work for Ironic? I understand not all > > > > > > > Ironic members will join this discussion, so I hope we can arrange a > > > > > > > convenient date for you two at least and, hopefully, for those > > > > > > > interested in this topic. > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z > > > > > > > [2] https://ptg.opendev.org/ptg.html > > > > > > > > > > > > > > Thanks, > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > On 2023/03/17 23:29, Julia Kreger wrote: > > > > > > > > I'm not sure how many Ironic contributors would be the ones to attend a > > > > > > > > discussion, in part because this is disjointed from the items they need > > > > > > > to > > > > > > > > focus on. It is much more of a "big picture" item for those of us > > > > > > > > who are > > > > > > > > leaders in the project. > > > > > > > > > > > > > > > > I think it would help to understand how much time you expect the > > > > > > > discussion > > > > > > > > to take to determine a path forward and how we can collaborate. Ironic > > > > > > > has > > > > > > > > a huge number of topics we want to discuss during the PTG, and I > > > > > > > > suspect > > > > > > > > our team meeting on Monday next week should yield more > > > > > > > > interest/awareness > > > > > > > > as well as an amount of time for each topic which will aid us in > > > > > > > scheduling. 
> > > > > > > > > > > > > > > > If you can let us know how long, then I think we can figure out when > > > > > > > > the > > > > > > > > best day/time will be. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > -Julia > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < > > > > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > > > > > > > Thank you for your reply. > > > > > > > > > > > > > > > > > > I'd like to decide the time slot for this topic. > > > > > > > > > I just checked PTG schedule [1]. > > > > > > > > > > > > > > > > > > We have the following time slots. Which one is convenient to gether? > > > > > > > > > (I didn't get reply but I listed Barbican, as its cores are almost the > > > > > > > > > same as Keystone) > > > > > > > > > > > > > > > > > > Mon, 27: > > > > > > > > > > > > > > > > > > - 14 (keystone) > > > > > > > > > - 15 (keystone) > > > > > > > > > > > > > > > > > > Tue, 28 > > > > > > > > > > > > > > > > > > - 13 (barbican) > > > > > > > > > - 14 (keystone, ironic) > > > > > > > > > - 15 (keysonte, ironic) > > > > > > > > > - 16 (ironic) > > > > > > > > > > > > > > > > > > Wed, 29 > > > > > > > > > > > > > > > > > > - 13 (ironic) > > > > > > > > > - 14 (keystone, ironic) > > > > > > > > > - 15 (keystone, ironic) > > > > > > > > > - 21 (ironic) > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > [1] https://ptg.opendev.org/ptg.html > > > > > > > > > > > > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > > > > > > > > > > > > > > On 2023/02/11 1:41, Jay Faulkner wrote: > > > > > > > > > > I think it's safe to say the Ironic community would be very > > > > > > > > > > invested in > > > > > > > > > > such an effort. Let's make sure the time chosen for vPTG with this is > > > > > > > > > such > > > > > > > > > > that Ironic contributors can attend as well. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Jay Faulkner > > > > > > > > > > Ironic PTL > > > > > > > > > > > > > > > > > > > > On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < > > > > > > > > > > hiromu.asahina.az at hco.ntt.co.jp> wrote: > > > > > > > > > > > > > > > > > > > > > Hello Everyone, > > > > > > > > > > > > > > > > > > > > > > Recently, Tacker and Keystone have been working together on a new > > > > > > > > > Keystone > > > > > > > > > > > Middleware that can work with external authentication > > > > > > > > > > > services, such as Keycloak. The code has already been submitted [1], > > > > > > > but > > > > > > > > > > > we want to make this middleware a generic plugin that works > > > > > > > > > > > with as many OpenStack services as possible. To that end, we would > > > > > > > like > > > > > > > > > to > > > > > > > > > > > hear from other projects with similar use cases > > > > > > > > > > > (especially Ironic and Barbican, which run as standalone > > > > > > > > > > > services). We > > > > > > > > > > > will make a time slot to discuss this topic at the next vPTG. > > > > > > > > > > > Please contact me if you are interested and available to > > > > > > > > > > > participate. 
> > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > Hiromu Asahina > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > ?-------------------------------------? > > > > > > > > > ????? NTT Network Innovation Center > > > > > > > > > ??????? Hiromu Asahina > > > > > > > > > ???? ------------------------------------- > > > > > > > > > ????? 3-9-11, Midori-cho, Musashino-shi > > > > > > > > > ??????? Tokyo 180-8585, Japan > > > > > > > > > Phone: +81-422-59-7008 > > > > > > > > > Email: hiromu.asahina.az at hco.ntt.co.jp > > > > > > > > > ?-------------------------------------? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > ?-------------------------------------? > > > > > > > ???? NTT Network Innovation Center > > > > > > > ?????? Hiromu Asahina > > > > > > > ??? ------------------------------------- > > > > > > > ???? 3-9-11, Midori-cho, Musashino-shi > > > > > > > ?????? Tokyo 180-8585, Japan > > > > > > > Phone: +81-422-59-7008 > > > > > > > Email: hiromu.asahina.az at hco.ntt.co.jp > > > > > > > ?-------------------------------------? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > ?-------------------------------------? > > > > NTT Network Innovation Center > > > > Hiromu Asahina > > > > ------------------------------------- > > > > 3-9-11, Midori-cho, Musashino-shi > > > > Tokyo 180-8585, Japan > > > > ? Phone: +81-422-59-7008 > > > > ? Email: hiromu.asahina.az at hco.ntt.co.jp > > > > ?-------------------------------------? > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Mon Mar 27 14:40:00 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Mon, 27 Mar 2023 16:40:00 +0200 Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) In-Reply-To: References: Message-ID: Hello El?d: As you said, we are no longer sending patches for Train. In the last four months, we have sent only two patches changing the code (apart from other testing patches). I proposed the EOL of Train because of this and the extra cost involved in maintaining older versions, regardless of this CI status. In any case, I'll propose this topic in the PTG tomorrow, considering leaving only the Neutron Train branch as EM and closing the rest of the projects. Regards. On Mon, Mar 27, 2023 at 3:25?PM El?d Ill?s wrote: > Hi, > > (First of all, I'm writing this as stable maintainer, someone who > was there when the 'Extended Maintenance' process was formulated > in the first place) > > As far as I understand, neutron's stable/train gate is still fully > operational. I also know that backporting every bug fix to stable > branches is time and resource consuming, and the team does not have / > want to spend time on this anymore. Between EOL'ing and backporting > every single bug fix, there are another levels of engagement. > > What I want to say is: what if stable/train of neutron is kept open as > long as the gate is functional, to give people the possibility for > cooperation, give the opportunity to test backports, bug fixes on > upstream CI for stable/train. 
> > There are two extremity in opinions about how far back we should > maintain things: > 1) we should keep only open the most recent stable release to free up > resources, and minimize maintenance cost > 2) we should keep everything open, even the very old stable branches, > where even the gate jobs are not functional anymore, to give space > for collaboration in fixing important bugs (like security bugs) > > I think the right way is somewhere in the middle: as long as the gate > is functional we can keep a branch open, for *collaboration*. > I understand if most active neutron team members do not propose > backports to stable/train anymore. Some way, this is acceptable > according to Extended Maintenance process: it is not "fully maintained", > rather there is still the possibility to do *some* maintenance. > > (Note, that I'm mostly talking about neutron. Stadium projects, that > have broken gates (even on master branch), I support the EOL'ing) > > What do you think about the above suggestion? > > Thanks, > > El?d > irc: elodilles > ------------------------------ > *From:* Rodolfo Alonso Hernandez > *Sent:* Thursday, March 16, 2023 5:15 PM > *To:* openstack-discuss > *Subject:* [neutron][release] Proposing transition to EOL Train (all > Neutron related projects) > > Hello: > > I'm sending this mail in advance to propose transitioning Neutron and all > related projects to EOL. I'll propose this topic too during the next > Neutron meeting. > > The announcement is the first step [1] to transition a stable branch to > EOL. > > The patch to mark these branches as EOL will be pushed in two weeks. If > you have any inconvenience, please let me know in this mail chain or in IRC > (ralonsoh, #openstack-neutron channel). You can also contact any Neutron > core reviewer in the IRC channel. > > Regards. > > [1] > https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Mon Mar 27 15:28:20 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 27 Mar 2023 17:28:20 +0200 Subject: [nova] Hold your rechecks Message-ID: Hey, Due to the recent merge of https://review.opendev.org/c/openstack/requirements/+/872065/10/upper-constraints.txt#298 we now use mypy==1.1.1 which includes a breaking behavioural change against our code : https://07de6a0c9e6ec0c6835f-ccccbfab26b1456f69293167016566bc.ssl.cf2.rackcdn.com/875621/10/gate/openstack-tox-pep8/e50f9f0/job-output.txt Thanks to Eric (kudos to him, he was quickier than me), we have a fix https://review.opendev.org/c/openstack/nova/+/878693 Please accordingly hold your rechecks until that fix is merged. -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Mon Mar 27 16:07:20 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Mon, 27 Mar 2023 16:07:20 +0000 Subject: [neutron][release] Proposing transition to EOL Train (all Neutron related projects) In-Reply-To: References: Message-ID: <20230327160720.hmf53vm5czvzntbh@yuggoth.org> On 2023-03-27 13:25:41 +0000 (+0000), El?d Ill?s wrote: [...] > I also know that backporting every bug fix to stable branches is > time and resource consuming, and the team does not have / want to > spend time on this anymore. [...] Note that this was actually the point of Extended Maintenance. Team members aren't expected to backport fixes to EM phase branches. 
They exist so interested members of the community can propose, review, and otherwise collaborate on backports even if the core review team for the project is no longer interested in paying attention to them.

> what if stable/train of neutron is kept open as long as the gate
> is functional, to give people the possibility for cooperation,
> give the opportunity to test backports, bug fixes on upstream CI
> for stable/train.
[...]

And this can be accomplished by removing jobs which will no longer work without significant effort; we included provisions for exactly that in the original EM resolution:

"[...] these older branches might, at some point, just be running pep8 and unit tests but those are required at a minimum."

https://governance.openstack.org/tc/resolutions/20180301-stable-branch-eol.html#testing

So dropping "integration" (e.g. DevStack/Tempest) and "functional testing" jobs from EM branches is fine, even expected. If the unit testing and static analysis jobs required by the PTI don't pass any longer, then the branch and all branches older than it have to switch to unmaintained or end of life.
--
Jeremy Stanley
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 963 bytes
Desc: not available
URL:

From gilles.mocellin at nuagelibre.org Mon Mar 27 18:01:00 2023
From: gilles.mocellin at nuagelibre.org (Gilles Mocellin)
Date: Mon, 27 Mar 2023 20:01:00 +0200
Subject: [nova] Can OpenStack support snapshot rollback (not creating a new instance)?
In-Reply-To:
References:
Message-ID: <2275345.ElGaqSPkdT@guitare>

Le dimanche 26 mars 2023, 20:50:22 CEST ??? a écrit :
> Hello,

Hello,

> I use Ceph as the storage backend for Nova, Glance, and Cinder.
>
> If I create a snapshot for an instance, it creates a new image in
> Glance, and I can use that image to create a new instance.
>
> This feels to me more like creating an image based on the current
> state of the VM rather than creating a VM snapshot.

The term "snapshot" is not ideal, as it misleads users coming from every virtualization platform (VMware, Hyper-V...).

> I want to ask:
> 1) Can I create and revert a VM snapshot like I would in virtual
> machine software?

In fact, if you don't use boot-from-volume instances, you can get something similar to snapshots by rebuilding your VM with the image created by the snapshot.

> 2) When a VM uses multiple disks/volumes, does OpenStack support taking
> a snapshot of all disks/volumes of the VM as a whole?

No, but you can take snapshots of the individual volumes. They won't be coherent, as in a single transaction.

> 3) Can OpenStack snapshot and save the memory state of a VM?

I think not. The memory is saved when the instance is suspended, but even if you then snapshot the instance, that memory state is not captured in the image.

> If it is not currently supported, are there any simple customization
> implementation ideas that can be recommended?

You really need to think differently.
OpenStack is a cloud platform, made to consume Infrastructure as a Service, with infrastructure-as-code tools (like Terraform).

You should have disposable instances that can be destroyed and rebuilt easily whenever you want.
Use clusters for your middleware (MariaDB, Redis...), use a load balancer (Octavia, or your own HAProxy) in front of your several web frontends / backends...
Keep your data on additional volumes, and also do backups in object storage (Swift/S3, handled by Ceph).
Make restoration easy and test it.
Deploy different environments of your projects in different OpenStack projects, to test changes.

That way of thinking will make it easier for you when you begin to think about containers and Kubernetes.

> Thank you for any help and suggestions.
> Best wishes.
>
> Han

From smooney at redhat.com Mon Mar 27 19:03:12 2023
From: smooney at redhat.com (Sean Mooney)
Date: Mon, 27 Mar 2023 20:03:12 +0100
Subject: [nova] Can OpenStack support snapshot rollback (not creating a new instance)?
In-Reply-To: <2275345.ElGaqSPkdT@guitare>
References: <2275345.ElGaqSPkdT@guitare>
Message-ID:

On Mon, 2023-03-27 at 20:01 +0200, Gilles Mocellin wrote:
> Le dimanche 26 mars 2023, 20:50:22 CEST ??? a écrit :
> > Hello,
>
> Hello,
>
> > I use Ceph as the storage backend for Nova, Glance, and Cinder.
> >
> > If I create a snapshot for an instance, it creates a new image in
> > Glance, and I can use that image to create a new instance.
> >
> > This feels to me more like creating an image based on the current
> > state of the VM rather than creating a VM snapshot.
>
> The term "snapshot" is not ideal, as it misleads users coming from every
> virtualization platform (VMware, Hyper-V...).

Nova snapshots are snapshots of the root disk, not of the disk and memory.

> > I want to ask:
> > 1) Can I create and revert a VM snapshot like I would in virtual
> > machine software?
>
> In fact, if you don't use boot-from-volume instances, you can get
> something similar to snapshots by rebuilding your VM with the image created by
> the snapshot.

If it's just the root disk state, you can do that now for boot-from-volume or non-boot-from-volume instances, regardless of the storage backend you use. Nova snapshots are only of the VM's root disk; Cinder snapshots are only of the volume content. Neither supports capturing the RAM state. One of the main reasons rebuild exists is to allow rolling back the VM's root disk state, and Cinder volume snapshots have the same use case. It's better to think of it as backup and restore than VirtualBox-style snapshots.

> > 2) When a VM uses multiple disks/volumes, does OpenStack support taking
> > a snapshot of all disks/volumes of the VM as a whole?
>
> No, but you can take snapshots of the individual volumes. They won't be
> coherent, as in a single transaction.

Well, yes and no. If you have the QEMU guest agent you can quiesce all writes to all file systems during the snapshot. Cinder also supports volume groups, I believe, and I think they allow you to take a consistent snapshot of all volumes in a group at once. You can't, as far as I am aware, take a consistent snapshot of all volumes and the root disk at the same time, however.

> > 3) Can OpenStack snapshot and save the memory state of a VM?
>
> I think not. The memory is saved when the instance is suspended, but even if
> you then snapshot the instance, that memory state is not captured in the image.

No, but we almost can. For the libvirt driver we implement suspend as managedsave, which dumps the guest RAM to disk like VirtualBox does for its snapshots, but we don't have a way to then snapshot that and use it to restore later. To get what you are asking for would actually be an extension to shelve or snapshot that would require the guest to be stopped while the disk and memory snapshot is done. It would require us to associate two images with the snapshot, the RAM and disk images. There would be security considerations with saving the RAM like this too. It's a feature that might be doable, but I don't know if it could be done for other hypervisors like PowerVM, Hyper-V or VMware. It obviously would not work with Ironic.
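For the root-disk rollback and per-volume snapshot workflow described above, a minimal sketch with the openstack CLI (the instance and volume names are placeholders, and this captures disks only, never RAM):

  # capture the current root disk as a Glance image (a Nova "snapshot")
  openstack server image create --name my-vm-rollback-point my-vm

  # later, roll the root disk back by rebuilding from that image;
  # RAM state and attached volumes are untouched
  openstack server rebuild --image my-vm-rollback-point my-vm

  # attached volumes are snapshotted separately through Cinder
  # (--force is needed while the volume is attached to a running server)
  openstack volume snapshot create --volume my-data-vol --force my-data-vol-snap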
> > If it is not currently supported, are there any simple customization
> > implementation ideas that can be recommended?

If we were to do this I would see it as an extension to shelve, I think. I think this is not really in line with the normal cloud usage model and definitely feels more like classic virtualization. In general I'm not sure it would be an acceptable change to the Nova API, but it would be a new feature.

> You really need to think differently.
> OpenStack is a cloud platform, made to consume Infrastructure as a Service,
> with infrastructure-as-code tools (like Terraform).
>
> You should have disposable instances that can be destroyed and rebuilt
> easily whenever you want.
> Use clusters for your middleware (MariaDB, Redis...), use a load balancer
> (Octavia, or your own HAProxy) in front of your several web frontends /
> backends...
> Keep your data on additional volumes, and also do backups in object storage
> (Swift/S3, handled by Ceph).
> Make restoration easy and test it.
>
> Deploy different environments of your projects in different OpenStack projects,
> to test changes.
>
> That way of thinking will make it easier for you when you begin to think
> about containers and Kubernetes.

This is often referred to as the pets vs. cattle view. We support backup-and-restore type functionality in Nova and Cinder and that is unlikely to go away, so this request is not entirely out of scope, but it would require a lot of work and testing to enable. On the Nova side it would require extensions to the rebuild, shelve/unshelve and backup/create-image APIs. It would be a pretty large change to implement and I'm not sure it would reach a quorum of agreement to accept. However, this week is the upstream vPTG. If you want to ask for feedback synchronously you could add it as a topic to the Nova agenda https://etherpad.opendev.org/p/nova-bobcat-ptg, the operator pain point agenda https://etherpad.opendev.org/p/march2023-ptg-operator-hour-nova, or continue async on the mailing list or via a Nova spec.

> > Thank you for any help and suggestions.
> > Best wishes.
> >
> > Han

From jay at gr-oss.io Mon Mar 27 19:10:28 2023
From: jay at gr-oss.io (Jay Faulkner)
Date: Mon, 27 Mar 2023 12:10:28 -0700
Subject: [all][ptg] Pre-PTG discussion: New Keystone Middleware Feature Supporting OAuth2.0 with External Authorization Service
In-Reply-To:
References: <000001d93d64$9ea1fb60$dbe5f220$@hco.ntt.co.jp> <7c9e78de-884e-35d4-ea94-2196047150c3@hco.ntt.co.jp> <1f42eac2-3e08-acf1-91f9-14f9c438dfb5@hco.ntt.co.jp>
Message-ID:

So, looking over the Ironic PTG schedule, I appear to have booked the Firmware Upgrade interface in two places -- tomorrow and Wednesday 2200 UTC. This is fortuitous: I can move the firmware upgrade conversation entirely into 2200 UTC, and give the time we had set aside to this topic. Dave, Julia and I consulted on IRC, and decided to take this action. We'll be adding an item to Ironic's PTG for tomorrow, Tuesday March 28 at 1500 UTC - 1525 UTC to discuss KeystoneMiddleware OAUTH support. I will perform the following changes to the Ironic schedule to accommodate:
- Remove firmware upgrades from Ironic Tues 1630-1700 UTC, move all discussion of it to Weds 2200 UTC - 2300 UTC (should be plenty of time).
- Move everything from Service Steps and later (after the first break) forward 30 minutes - Add new item for KeystoneMiddleware/OAUTH discussion into Ironic's schedule at Wednesday, 1500 UTC - 1525 UTC (30 minutes with room for a break) Ironic will host the discussion in the Folsom room, and Dave will ensure interested keystone contributors are redirected to our room for this period. - Jay Faulkner Ironic PTL On Mon, Mar 27, 2023 at 7:07?AM Dave Wilde wrote: > Hi Julia, > > No worries! > > I see that several of our sessions are overlapping, perhaps we could > combine the 15:00 UTC session tomorrow to discuss this topic? > > /Dave > On Mar 27, 2023 at 8:44 AM -0500, Julia Kreger < > juliaashleykreger at gmail.com>, wrote: > > > > On Fri, Mar 24, 2023 at 9:55?AM Dave Wilde wrote: > >> I?m happy to book an additional time slot(s) specifically for this >> discussion if something other than what we currently have works better for >> everyone. Please let me know. >> >> /Dave >> On Mar 24, 2023 at 10:49 AM -0500, Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp>, wrote: >> >> As Keystone canceled Monday 14 UTC timeslot [1], I'd like to hold this >> discussion on Monday 15 UTC timeslot. If it doesn't work for Ironic >> members, please kindly reply convenient timeslots. >> >> > Unfortunately, I took the last few days off and I'm only seeing this now. > My morning is booked up aside from the original time slot which was > discussed. > > Maybe there is a time later in the week which could work? > > > >> >> [1] https://ptg.opendev.org/ptg.html >> >> Thanks, >> >> Hiromu Asahina >> >> On 2023/03/22 20:01, Hiromu Asahina wrote: >> >> Thanks! >> >> I look forward to your reply. >> >> On 2023/03/22 1:29, Julia Kreger wrote: >> >> No worries! >> >> I think that time works for me. I'm not sure it will work for >> everyone, but >> I can proxy information back to the whole of the ironic project as we >> also >> have the question of this functionality listed for our Operator Hour in >> order to help ironic gauge interest. >> >> -Julia >> >> On Tue, Mar 21, 2023 at 9:00?AM Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp> wrote: >> >> I apologize that I couldn't reply before the Ironic meeting on Monday. >> >> I need one slot to discuss this topic. >> >> I asked Keystone today and Monday's first Keystone slot (14 UTC Mon, >> 27)[1,2] works for them. Does this work for Ironic? I understand not all >> Ironic members will join this discussion, so I hope we can arrange a >> convenient date for you two at least and, hopefully, for those >> interested in this topic. >> >> [1] >> >> >> https://www.timeanddate.com/worldclock/fixedtime.html?iso=2023-03-27T14:00:00Z >> [2] https://ptg.opendev.org/ptg.html >> >> Thanks, >> Hiromu Asahina >> >> On 2023/03/17 23:29, Julia Kreger wrote: >> >> I'm not sure how many Ironic contributors would be the ones to attend a >> discussion, in part because this is disjointed from the items they need >> >> to >> >> focus on. It is much more of a "big picture" item for those of us >> who are >> leaders in the project. >> >> I think it would help to understand how much time you expect the >> >> discussion >> >> to take to determine a path forward and how we can collaborate. Ironic >> >> has >> >> a huge number of topics we want to discuss during the PTG, and I >> suspect >> our team meeting on Monday next week should yield more >> interest/awareness >> as well as an amount of time for each topic which will aid us in >> >> scheduling. 
>> >> >> If you can let us know how long, then I think we can figure out when >> the >> best day/time will be. >> >> Thanks! >> >> -Julia >> >> >> >> >> >> On Fri, Mar 17, 2023 at 2:57?AM Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp> wrote: >> >> Thank you for your reply. >> >> I'd like to decide the time slot for this topic. >> I just checked PTG schedule [1]. >> >> We have the following time slots. Which one is convenient to gether? >> (I didn't get reply but I listed Barbican, as its cores are almost the >> same as Keystone) >> >> Mon, 27: >> >> - 14 (keystone) >> - 15 (keystone) >> >> Tue, 28 >> >> - 13 (barbican) >> - 14 (keystone, ironic) >> - 15 (keysonte, ironic) >> - 16 (ironic) >> >> Wed, 29 >> >> - 13 (ironic) >> - 14 (keystone, ironic) >> - 15 (keystone, ironic) >> - 21 (ironic) >> >> Thanks, >> >> [1] https://ptg.opendev.org/ptg.html >> >> Hiromu Asahina >> >> >> On 2023/02/11 1:41, Jay Faulkner wrote: >> >> I think it's safe to say the Ironic community would be very >> invested in >> such an effort. Let's make sure the time chosen for vPTG with this is >> >> such >> >> that Ironic contributors can attend as well. >> >> Thanks, >> Jay Faulkner >> Ironic PTL >> >> On Fri, Feb 10, 2023 at 7:40 AM Hiromu Asahina < >> hiromu.asahina.az at hco.ntt.co.jp> wrote: >> >> Hello Everyone, >> >> Recently, Tacker and Keystone have been working together on a new >> >> Keystone >> >> Middleware that can work with external authentication >> services, such as Keycloak. The code has already been submitted [1], >> >> but >> >> we want to make this middleware a generic plugin that works >> with as many OpenStack services as possible. To that end, we would >> >> like >> >> to >> >> hear from other projects with similar use cases >> (especially Ironic and Barbican, which run as standalone >> services). We >> will make a time slot to discuss this topic at the next vPTG. >> Please contact me if you are interested and available to >> participate. >> >> [1] >> >> https://review.opendev.org/c/openstack/keystonemiddleware/+/868734 >> >> >> -- >> Hiromu Asahina >> >> >> >> >> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> >> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> >> >> >> >> -- >> ?-------------------------------------? >> NTT Network Innovation Center >> Hiromu Asahina >> ------------------------------------- >> 3-9-11, Midori-cho, Musashino-shi >> Tokyo 180-8585, Japan >> Phone: +81-422-59-7008 >> Email: hiromu.asahina.az at hco.ntt.co.jp >> ?-------------------------------------? >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Mon Mar 27 19:16:35 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Mon, 27 Mar 2023 12:16:35 -0700 Subject: [ironic] Slight PTG Schedule change Message-ID: Take notice: tomorrow's topic schedule has been changed slightly. We have moved Service Steps and DPU Orchestration conversations up 30 minutes. 
As mentioned in the other thread ( https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032983.html), I accidentally booked Firmware Upgrades to two times. This was fortuitous because we needed to add a cross-team item with Keystone. As always, the up to date schedule and notes are here: https://etherpad.opendev.org/p/ironic-bobcat-ptg Thanks, Jay Faulkner Ironic PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 28 00:49:57 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 28 Mar 2023 06:19:57 +0530 Subject: Nova undefine secret | openstack | wallaby Message-ID: Hi, For some reason, i had to redeploy ceph for my hci nodes and then found that the deployment command is giving out the following error: 2023-03-28 01:49:46.709605 | | WARNING | ERROR: Can't run container nova_libvirt_init_secret stderr: error: Failed to set attributes from /etc/nova/secret.xml error: internal error: a secret with UUID bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with client.openstack secret 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | error={"changed": false, "msg": "Failed containers: nova_libvirt_init_secret"} Can you please tell me how I can undefine the existing secret? With regards, Swogat Pradhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 28 00:54:49 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 28 Mar 2023 06:24:49 +0530 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: Update podman logs: [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 ------------------------------------------------ Initializing virsh secrets for: dcn01:openstack -------- Initializing the virsh secret for 'dcn01' cluster (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client The /etc/nova/secret.xml file already exists error: Failed to set attributes from /etc/nova/secret.xml error: internal error: a secret with UUID bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with client.openstack secret On Tue, Mar 28, 2023 at 6:19?AM Swogat Pradhan wrote: > Hi, > For some reason, i had to redeploy ceph for my hci nodes and then found > that the deployment command is giving out the following error: > 2023-03-28 01:49:46.709605 | | > WARNING | ERROR: Can't run container nova_libvirt_init_secret > stderr: error: Failed to set attributes from /etc/nova/secret.xml > error: internal error: a secret with UUID > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > client.openstack secret > 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | > FATAL | Create containers managed by Podman for > /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | > error={"changed": false, "msg": "Failed containers: > nova_libvirt_init_secret"} > > Can you please tell me how I can undefine the existing secret? > > With regards, > Swogat Pradhan > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jake.yip at ardc.edu.au Tue Mar 28 01:21:06 2023 From: jake.yip at ardc.edu.au (Jake Yip) Date: Tue, 28 Mar 2023 12:21:06 +1100 Subject: [Magnum] vPTG Message-ID: <92954613-d892-ba47-0fbc-51d3adc864b5@ardc.edu.au> Dear all, Magnum vPTG will be held at Wed 0900 UTC in the Havana Room. Please see etherpad https://etherpad.opendev.org/p/march2023-ptg-magnum for updates Regards, Jake (sorry for duplicates) From adivya1.singh at gmail.com Tue Mar 28 02:50:27 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Tue, 28 Mar 2023 08:20:27 +0530 Subject: (Open Stack-glance )Image Upload in Open Stack in a Bulk In-Reply-To: References: Message-ID: Hi Team, Any thoughts on this ? Regards Adivya Singh On Mon, Mar 27, 2023 at 5:06?PM Adivya Singh wrote: > Hi Team, > > Any hints, if i want to upload images in a bulk in a Open Stack , because > it takes some time for the image to copy if we go one by one, or even of we > go with script > > > Also if there is a scenario where glance mount point fails and we can > create the same Share path and Copy the Image from the source , Will the > OpenStack glance Service will start detecting those images upload in a share > > Regards > Adivya Singh > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt at oliver.net.au Tue Mar 28 03:30:56 2023 From: matt at oliver.net.au (Matthew Oliver) Date: Tue, 28 Mar 2023 14:30:56 +1100 Subject: [swift][ptg] Ops Feedback Session - 29th March at 13:00 UTC Message-ID: As we've done in PTGs past, we're getting devs and ops together to talk about Swift: what's working, what isn't, and what would be most helpful to improve. We're meeting in Ocata (https://www.openstack.org/ptg/rooms/ocata) at 13:00UTC -- if you run a Swift cluster, we hope to see you there! Even if you can't make it, We'd appreciate it if you can offer some feedback on the feedback etherpad (https://etherpad.opendev.org/p/swift-bobcat-ops-feedback ). This has always been a highlight at every PTG for us swift devs. Have your say and help make Swift even better! Matt -------------- next part -------------- An HTML attachment was scrubbed... 
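On the Glance bulk-upload question above: there is no server-side bulk API, but a client-side loop parallelises well. A rough sketch, assuming the images are local *.qcow2 files and the openstack CLI is already authenticated (adjust the formats and the -P concurrency to taste):

  # upload every local qcow2 file as a Glance image, four uploads in parallel
  ls *.qcow2 | xargs -P 4 -I {} sh -c '
    openstack image create \
      --disk-format qcow2 --container-format bare \
      --file "$1" "$(basename "$1" .qcow2)"
  ' _ {}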
URL: From abishop at redhat.com Tue Mar 28 03:56:08 2023 From: abishop at redhat.com (Alan Bishop) Date: Mon, 27 Mar 2023 20:56:08 -0700 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: On Mon, Mar 27, 2023 at 5:56?PM Swogat Pradhan wrote: > Update podman logs: > [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 > ------------------------------------------------ > Initializing virsh secrets for: dcn01:openstack > -------- > Initializing the virsh secret for 'dcn01' cluster > (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client > The /etc/nova/secret.xml file already exists > error: Failed to set attributes from /etc/nova/secret.xml > error: internal error: a secret with UUID > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > client.openstack secret > > > On Tue, Mar 28, 2023 at 6:19?AM Swogat Pradhan > wrote: > >> Hi, >> For some reason, i had to redeploy ceph for my hci nodes and then found >> that the deployment command is giving out the following error: >> 2023-03-28 01:49:46.709605 | | >> WARNING | ERROR: Can't run container nova_libvirt_init_secret >> stderr: error: Failed to set attributes from /etc/nova/secret.xml >> error: internal error: a secret with UUID >> bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with >> client.openstack secret >> 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | >> FATAL | Create containers managed by Podman for >> /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | >> error={"changed": false, "msg": "Failed containers: >> nova_libvirt_init_secret"} >> >> Can you please tell me how I can undefine the existing secret? >> > Use "podman exec -ti bash" to open a shell within the nova_libvirt container, then you can use virsh commands to examine and delete any extraneous secrets. This command might be all that you need: [root at dcn01-hci-1 ~]# podman exec -ti 3e5e6c1a7864 virsh secret-undefine bd136bb0-fd78-5429-ab80-80b8c571d821 You should also delete the /etc/nova/secret.xml file, and let it be recreated when you re-run the deployment command. Alan >> With regards, >> Swogat Pradhan >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Tue Mar 28 05:28:12 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Tue, 28 Mar 2023 10:58:12 +0530 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: Hi Alan, Thank you for your response. We cannot run that particular command as the container itself doesn't run. That container is only used to set the secret and stays in exited state if i am correct. 
[root at dcn01-hci-1 ~]# podman exec -ti 3e5e6c1a7864 virsh secret-undefine bd136bb0-fd78-5429-ab80-80b8c571d821 Error: can only create exec sessions on running containers: container state improper With regards, Swogat Pradhan On Tue, Mar 28, 2023 at 9:26?AM Alan Bishop wrote: > > > On Mon, Mar 27, 2023 at 5:56?PM Swogat Pradhan > wrote: > >> Update podman logs: >> [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 >> ------------------------------------------------ >> Initializing virsh secrets for: dcn01:openstack >> -------- >> Initializing the virsh secret for 'dcn01' cluster >> (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client >> The /etc/nova/secret.xml file already exists >> error: Failed to set attributes from /etc/nova/secret.xml >> error: internal error: a secret with UUID >> bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with >> client.openstack secret >> >> >> On Tue, Mar 28, 2023 at 6:19?AM Swogat Pradhan >> wrote: >> >>> Hi, >>> For some reason, i had to redeploy ceph for my hci nodes and then found >>> that the deployment command is giving out the following error: >>> 2023-03-28 01:49:46.709605 | | >>> WARNING | ERROR: Can't run container nova_libvirt_init_secret >>> stderr: error: Failed to set attributes from /etc/nova/secret.xml >>> error: internal error: a secret with UUID >>> bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with >>> client.openstack secret >>> 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | >>> FATAL | Create containers managed by Podman for >>> /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | >>> error={"changed": false, "msg": "Failed containers: >>> nova_libvirt_init_secret"} >>> >>> Can you please tell me how I can undefine the existing secret? >>> >> > Use "podman exec -ti bash" to open a shell within > the nova_libvirt container, then you can use virsh commands to examine and > delete any extraneous secrets. This command might be all that you need: > > [root at dcn01-hci-1 ~]# podman exec -ti 3e5e6c1a7864 virsh secret-undefine > bd136bb0-fd78-5429-ab80-80b8c571d821 > > You should also delete the /etc/nova/secret.xml file, and let it be > recreated when you re-run the deployment command. > > Alan > > >>> With regards, >>> Swogat Pradhan >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Tue Mar 28 05:59:05 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 28 Mar 2023 07:59:05 +0200 Subject: (Open Stack-glance )Image Upload in Open Stack in a Bulk In-Reply-To: References: Message-ID: There's no server-side support for bulk upload of images in glance API. But I see no reason why client-side tooling would work for that. Simplest thing would be using xargs in some bash one-liner. If talking about python and sdk, should be also quite trivial to implement that leveraging multiprocessing or joblib libraries. > On Mon, Mar 27, 2023 at 5:06?PM Adivya Singh > wrote: > >> Hi Team, >> >> Any hints, if i want to upload images in a bulk in a Open Stack , >> because it takes some time for the image to copy if we go one by one, or >> even of we go with script >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
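Pulling the answers in this thread together, one possible cleanup sequence on the affected HCI node (a sketch only: <secret-uuid> is the UUID reported in the error, it assumes the long-running nova_libvirt container is up even though the one-shot init container has exited, and the secret.xml path may be bind-mounted from elsewhere on the host):

  # list and remove the stale Ceph client secret known to libvirt
  sudo podman exec -it nova_libvirt virsh secret-list
  sudo podman exec -it nova_libvirt virsh secret-undefine <secret-uuid>

  # per the advice above, also remove the pre-existing secret.xml so it is
  # recreated, then re-run the deployment command
  sudo rm -f /etc/nova/secret.xml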
URL: From noonedeadpunk at gmail.com Tue Mar 28 06:04:24 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Tue, 28 Mar 2023 08:04:24 +0200 Subject: (Open Stack-glance )Image Upload in Open Stack in a Bulk In-Reply-To: References: Message-ID: Sorry, made a confusing typo in my reply, what I meant was that some client-side script will work just nicely for this purpose if it's written in a way to execute multiple processes simultaneously. ??, 28 ???. 2023 ?., 07:59 Dmitriy Rabotyagov : > There's no server-side support for bulk upload of images in glance API. > But I see no reason why client-side tooling would work for that. Simplest > thing would be using xargs in some bash one-liner. > > If talking about python and sdk, should be also quite trivial to implement > that leveraging multiprocessing or joblib libraries. > > > >> On Mon, Mar 27, 2023 at 5:06?PM Adivya Singh >> wrote: >> >>> Hi Team, >>> >>> Any hints, if i want to upload images in a bulk in a Open Stack , >>> because it takes some time for the image to copy if we go one by one, or >>> even of we go with script >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Tue Mar 28 06:34:09 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Tue, 28 Mar 2023 06:34:09 +0000 Subject: [all][stable][ptl] Propose to EOL Rocky series In-Reply-To: References: Message-ID: Hi, this thread was bit of forgotten, sorry for that. A bit more than two weeks ago we had a discussion on #openstack-release about this [1]. So, to summerize, there are the issues: - stable/rocky's gate is mostly broken - more than one third of the repositories have transitioned their stable/rocky branch to EOL (including multiple core component) - old, unmaintained CI jobs, testing environments, hinders refactoring of Zuul jobs and other configurations On the other hand, as Thomas mentioned, there is the need for some to be able to cooperate (as an example: recent security issue [2], mentioned in previous mail or in our IRC discussion) on a common place, namely in gerrit. This was originally the intention with Extended Maintenance. We just haven't thought about eternity :) It seems that teams feel that if a branch is 'open' and in 'Extended Maintenance' then it still means it is 'fully supported', thus cannot let the gate failing AND don't want to merge patches without gate tests, that's one reason why teams rather EOL their branches. We might need to think more about what is the best way forward. [1] https://meetings.opendev.org/irclogs/%23openstack-release/%23openstack-release.2023-03-08.log.html#t2023-03-08T13:54:34 [2] https://security.openstack.org/ossa/OSSA-2023-002.html El?d irc: elodilles ________________________________ From: Thomas Goirand Sent: Tuesday, February 14, 2023 7:31 PM To: El?d Ill?s Cc: openstack-discuss at lists.openstack.org Subject: Re: [all][stable][ptl] Propose to EOL Rocky series On 2/10/23 18:26, El?d Ill?s wrote: > Hi, > > thanks for all the feedbacks from teams so far! > > @Zigo: Extended Maintenance process was created just for the same > situation: to give space to interested parties to cooperate and keep > things maintained even when stable releases are over their 'supported' > lifetime. So it's good to see that there is interest in it! 
> Unfortunately, with very old branches we've reached the state where > gates can't be maintained and without a functional gate it's not safe to > merge patches (yes, even security fixes) and they are just using > resources (CI & maintainers' time). When gate is broken in such extent, > then i think the community have to accept that it is not possible to > merge patches confidently and needs to EOL that release. That's where I don't agree. There are ways, outside of the OpenStack gate, to test things, in such ways that merging patches there can be a thing. > Another aspect is that code cannot be cleaned up until those old > branches are still present (CI jobs, project configurations, etc) which > gives pain for developers. Just disable gating completely then. > So, however some vendors would appreciate probably to keep things open > forever, for the community this is not beneficial and doable I think. I don't agree. We need a place to share patches between distros. The official Git feels like the natural place to do so, even without any type of gating. BTW, my Nova patches for CVE-2022-47951 in Rocky, Stein & Train are currently wrong and need another approach. I was thinking about simply disabling .vmdk altogether (rather than having a complicated code to check for the VMDK subtype). I wonder what other distros did. Where do I disucss this? Cheers, Thomas Goirand (zigo) -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Tue Mar 28 09:16:16 2023 From: amotoki at gmail.com (Akihiro Motoki) Date: Tue, 28 Mar 2023 18:16:16 +0900 Subject: [neutron] Bug deputy report (week of Mar 20) Message-ID: Hi, Here is the bug deputy report last week. Hopefully, someone familiar with DNS integration can check the first bug. # Needs assignee * FQDN inside guest VM is not the same as dns_assignment on network port https://bugs.launchpad.net/neutron/+bug/2012391 It would be nice if someone familiar with DNS integration can look into it. * [CI] "neutron-ovs-grenade-multinode-skip-level" and "neutron-ovn-grenade-multinode-skip-level" failing always https://bugs.launchpad.net/neutron/+bug/2012731 # New but assigned * [OVN] Define the OVS port in the LSP to allow OVN to set the QoS rules https://bugs.launchpad.net/neutron/+bug/2012613 Assigned to ralonsoh # In Progress * [ovn] N/S traffic for VMs without FIPs not working https://bugs.launchpad.net/neutron/+bug/2012712 Assigned to ltomasbo [1] was proposed as a fix of bug 2012712 which is caused by [2]. In parallel, [3] was proposed to revert [2]. Reverting [2] first sounds reasonable to avoid the regression but it is better to clarify the priority and the relationship in the review comments. ltomasbo is involved in both, so I think there is no confusion though. 
[1] https://review.opendev.org/c/openstack/neutron/+/878450 [2] https://review.opendev.org/c/openstack/neutron/+/875644 [3] https://review.opendev.org/c/openstack/neutron/+/878441 * neutron-ovn-agent fails on do_commit aborted due to error: 'Chassis_Private' https://bugs.launchpad.net/neutron/+bug/2012385 * Intermittent failures of test_agent_metadata_port_ip_update_event https://bugs.launchpad.net/neutron/+bug/2012754 * [sqlalchemy-20] sqlalchemy.exc.InvalidRequestError: No 'on clause' argument may be passed when joining to a relationship path as a target https://bugs.launchpad.net/neutron/+bug/2012643 * [sqlalchemy-20] Strings are not accepted for attribute names in loader options https://bugs.launchpad.net/neutron/+bug/2012662 * [sqlalchemy-20] Unexpected keyword argument "when" in "sqlalchemy.case" method https://bugs.launchpad.net/neutron/+bug/2012705 # RFE * [rfe] Add one api support CRUD allowed_address_pairs https://bugs.launchpad.net/neutron/+bug/2012332 Thanks, Akihiro Motoki (amotoki) From elod.illes at est.tech Tue Mar 28 09:56:23 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Tue, 28 Mar 2023 09:56:23 +0000 Subject: [networking-mlnx] release job failure - missing openstackci as maintainer in pypi Message-ID: Hi networking-mlnx maintainers, latest networking-mlnx releases caused release job failures [1][2][3], with the following error: The user 'openstackci' isn't allowed to upload to project 'networking-mlnx'. (note, that networking-mlnx is not under openstack namespace (x/networking-mlnx)) [1] https://lists.openstack.org/pipermail/release-job-failures/2023-March/001654.html [2] https://lists.openstack.org/pipermail/release-job-failures/2023-March/001655.html [3] https://lists.openstack.org/pipermail/release-job-failures/2023-March/001656.html Thanks, El?d irc: elodilles @ #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From smooney at redhat.com Tue Mar 28 11:30:29 2023 From: smooney at redhat.com (Sean Mooney) Date: Tue, 28 Mar 2023 12:30:29 +0100 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: On Tue, 2023-03-28 at 06:24 +0530, Swogat Pradhan wrote: > Update podman logs: > [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 > ------------------------------------------------ > Initializing virsh secrets for: dcn01:openstack > -------- > Initializing the virsh secret for 'dcn01' cluster > (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client > The /etc/nova/secret.xml file already exists > error: Failed to set attributes from /etc/nova/secret.xml > error: internal error: a secret with UUID > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > client.openstack secret

you just do "virsh secret-undefine <uuid>"

> > > On Tue, Mar 28, 2023 at 6:19 AM Swogat Pradhan > wrote: > > > Hi, > > For some reason, i had to redeploy ceph for my hci nodes and then found > > that the deployment command is giving out the following error: > > 2023-03-28 01:49:46.709605 | | > > WARNING | ERROR: Can't run container nova_libvirt_init_secret > > stderr: error: Failed to set attributes from /etc/nova/secret.xml > > error: internal error: a secret with UUID > > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > > client.openstack secret > > 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | > > FATAL | Create containers managed by Podman for > > /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | > > error={"changed": false, "msg": "Failed containers: > > nova_libvirt_init_secret"} > > > > Can you please tell me how I can undefine the existing secret? > > > > With regards, > > Swogat Pradhan > > 
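For example, a minimal sketch of that clean-up on the affected HCI node, assuming the stale secret lives in the nova_virtsecretd container (container name and use of sudo may differ per deployment; the UUID is the one from the error output above):

$ sudo podman exec -it nova_virtsecretd virsh secret-list
$ sudo podman exec -it nova_virtsecretd virsh secret-undefine bd136bb0-fd78-5429-ab80-80b8c571d821

Once the stale secret is removed, re-running the deploy should allow nova_libvirt_init_secret to define the secret again from /etc/nova/secret.xml.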
>>> Right now the cinder volume is stuck in *creating *state when adding >>> image as volume source. >>> But when creating an empty volume the volumes are getting created >>> successfully without any errors. >>> >>> We are getting volume creation request in cinder-volume.log as such: >>> 2023-03-23 12:34:40.152 108 INFO >>> cinder.volume.flows.manager.create_volume >>> [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db >>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>> 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with >>> specification: {'status': 'creating', 'volume_name': >>> 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1, >>> 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location': >>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>> 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at': >>> datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc), >>> 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37, >>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'metadata': {'store': 'ceph'}}, {'url': >>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap', >>> 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file', >>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>> 'owner_specified.openstack.object': 'images/cirros', >>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>> } >>> >>> But there is nothing else after that and the volume doesn't even >>> timeout, it just gets stuck in creating state. >>> Can you advise what might be the issue here? >>> All the containers are in a healthy state now. >>> >>> With regards, >>> Swogat Pradhan >>> >>> >>> On Thu, Mar 23, 2023 at 6:06?PM Alan Bishop wrote: >>> >>>> >>>> >>>> On Thu, Mar 23, 2023 at 5:20?AM Swogat Pradhan < >>>> swogatpradhan22 at gmail.com> wrote: >>>> >>>>> Hi, >>>>> Is this bind not required for cinder_scheduler container? >>>>> >>>>> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind", >>>>> I do not see this particular bind on the cinder scheduler containers >>>>> on my controller nodes. >>>>> >>>> >>>> That is correct, because the scheduler does not access the ceph >>>> cluster. 
>>>> >>>> Alan >>>> >>>> >>>>> With regards, >>>>> Swogat Pradhan >>>>> >>>>> On Thu, Mar 23, 2023 at 2:46?AM Swogat Pradhan < >>>>> swogatpradhan22 at gmail.com> wrote: >>>>> >>>>>> Cinder volume config: >>>>>> >>>>>> [tripleo_ceph] >>>>>> volume_backend_name=tripleo_ceph >>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>> rbd_user=openstack >>>>>> rbd_pool=volumes >>>>>> rbd_flatten_volume_from_snapshot=False >>>>>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b >>>>>> report_discard_supported=True >>>>>> rbd_ceph_conf=/etc/ceph/dcn02.conf >>>>>> rbd_cluster_name=dcn02 >>>>>> >>>>>> Glance api config: >>>>>> >>>>>> [dcn02] >>>>>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf >>>>>> rbd_store_user=openstack >>>>>> rbd_store_pool=images >>>>>> rbd_thin_provisioning=False >>>>>> store_description=dcn02 rbd glance store >>>>>> [ceph] >>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>> rbd_store_user=openstack >>>>>> rbd_store_pool=images >>>>>> rbd_thin_provisioning=False >>>>>> store_description=Default glance store backend. >>>>>> >>>>>> On Thu, Mar 23, 2023 at 2:29?AM Swogat Pradhan < >>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>> >>>>>>> I still have the same issue, I'm not sure what's left to try. >>>>>>> All the pods are now in a healthy state, I am getting log entries 3 >>>>>>> mins after I hit the create volume button in cinder-volume when I try to >>>>>>> create a volume with an image. >>>>>>> And the volumes are just stuck in creating state for more than 20 >>>>>>> mins now. >>>>>>> >>>>>>> Cinder logs: >>>>>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc >>>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected >>>>>>> cinder-volume RPC version 3.17 as minimum service version. 
>>>>>>> 2023-03-22 20:34:59.166 108 INFO >>>>>>> cinder.volume.flows.manager.create_volume >>>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db >>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with >>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2, >>>>>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location': >>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> [{'url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at': >>>>>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc), >>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54, >>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap', >>>>>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file', >>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>> } >>>>>>> >>>>>>> With regards, >>>>>>> Swogat Pradhan >>>>>>> >>>>>>> On Wed, Mar 22, 2023 at 9:19?PM Alan Bishop >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Mar 22, 2023 at 8:38?AM Swogat Pradhan < >>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Adam, >>>>>>>>> The systems are in same LAN, in this case it seemed like the image >>>>>>>>> was getting pulled from the central site which was caused due to an >>>>>>>>> misconfiguration in ceph.conf file in /var/lib/tripleo-config/ceph/ >>>>>>>>> directory, which seems to have been resolved after the changes i made to >>>>>>>>> fix it. >>>>>>>>> >>>>>>>>> Right now the glance api podman is running in unhealthy state and >>>>>>>>> the podman logs don't show any error whatsoever and when issued the command >>>>>>>>> netstat -nultp i do not see any entry for glance port i.e. 
9292 in the dcn >>>>>>>>> site, which is why cinder is throwing an error stating: >>>>>>>>> >>>>>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server >>>>>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error >>>>>>>>> finding address for >>>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>>>> Unable to establish connection to >>>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538: >>>>>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded >>>>>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by >>>>>>>>> NewConnectionError('>>>>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111] >>>>>>>>> ECONNREFUSED',)) >>>>>>>>> >>>>>>>>> Now i need to find out why the port is not listed as the glance >>>>>>>>> service is running, which i am not sure how to find out. >>>>>>>>> >>>>>>>> >>>>>>>> One other thing to investigate is whether your deployment includes >>>>>>>> this patch [1]. If it does, then bear in mind >>>>>>>> the glance-api service running at the edge site will be an >>>>>>>> "internal" (non public facing) instance that uses port 9293 >>>>>>>> instead of 9292. You should familiarize yourself with the release >>>>>>>> note [2]. >>>>>>>> >>>>>>>> [1] >>>>>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6 >>>>>>>> [2] >>>>>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml >>>>>>>> >>>>>>>> Alan >>>>>>>> >>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Swogat Pradhan >>>>>>>>> >>>>>>>>> On Wed, Mar 22, 2023 at 8:11?PM Alan Bishop >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Mar 22, 2023 at 6:37?AM Swogat Pradhan < >>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Update: >>>>>>>>>>> Here is the log when creating a volume using cirros image: >>>>>>>>>>> >>>>>>>>>>> 2023-03-22 11:04:38.449 109 INFO >>>>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with >>>>>>>>>>> specification: {'status': 'creating', 'volume_name': >>>>>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4, >>>>>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location': >>>>>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> [{'url': >>>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros', >>>>>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public', >>>>>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active', >>>>>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False, >>>>>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0', >>>>>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value': >>>>>>>>>>> 
'553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46', >>>>>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at': >>>>>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc), >>>>>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1, >>>>>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url': >>>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url': >>>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url': >>>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap', >>>>>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file', >>>>>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '', >>>>>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '', >>>>>>>>>>> 'owner_specified.openstack.object': 'images/cirros', >>>>>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service': >>>>>>>>>>> } >>>>>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> As Adam Savage would say, well there's your problem ^^ (Image >>>>>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and >>>>>>>>>> 0.16 MB/s suggests you have a network issue. >>>>>>>>>> >>>>>>>>>> John Fulton previously stated your cinder-volume service at the >>>>>>>>>> edge site is not using the local ceph image store. Assuming you are >>>>>>>>>> deploying GlanceApiEdge service [1], then the cinder-volume service should >>>>>>>>>> be configured to use the local glance service [2]. You should check >>>>>>>>>> cinder's glance_api_servers to confirm it's the edge site's glance service. >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29 >>>>>>>>>> [2] >>>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80 >>>>>>>>>> >>>>>>>>>> Alan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>>>> be removed. Use explicitly json instead in version 'xena' >>>>>>>>>>> category=FutureWarning) >>>>>>>>>>> >>>>>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] >>>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75: >>>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will >>>>>>>>>>> be removed. 
Use explicitly json instead in version 'xena' >>>>>>>>>>> category=FutureWarning) >>>>>>>>>>> >>>>>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00 >>>>>>>>>>> MB/s >>>>>>>>>>> 2023-03-22 11:11:14.998 109 INFO >>>>>>>>>>> cinder.volume.flows.manager.create_volume >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume >>>>>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f >>>>>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully >>>>>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager >>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully. >>>>>>>>>>> >>>>>>>>>>> The image is present in dcn02 store but still it downloaded the >>>>>>>>>>> image in 0.16 MB/s and then created the volume. >>>>>>>>>>> >>>>>>>>>>> With regards, >>>>>>>>>>> Swogat Pradhan >>>>>>>>>>> >>>>>>>>>>> On Tue, Mar 21, 2023 at 6:10?PM Swogat Pradhan < >>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Jhon, >>>>>>>>>>>> This seems to be an issue. >>>>>>>>>>>> When i deployed the dcn ceph in both dcn01 and dcn02 the >>>>>>>>>>>> --cluster parameter was specified to the respective cluster names but the >>>>>>>>>>>> config files were created in the name of ceph.conf and keyring was >>>>>>>>>>>> ceph.client.openstack.keyring. >>>>>>>>>>>> >>>>>>>>>>>> Which created issues in glance as well as the naming convention >>>>>>>>>>>> of the files didn't match the cluster names, so i had to manually rename >>>>>>>>>>>> the central ceph conf file as such: >>>>>>>>>>>> >>>>>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/ >>>>>>>>>>>> [root at dcn02-compute-0 ceph]# ll >>>>>>>>>>>> total 16 >>>>>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56 >>>>>>>>>>>> ceph_central.client.openstack.keyring >>>>>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf >>>>>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45 >>>>>>>>>>>> ceph.client.openstack.keyring >>>>>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf >>>>>>>>>>>> [root at dcn02-compute-0 ceph]# >>>>>>>>>>>> >>>>>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of >>>>>>>>>>>> the respective clusters in both dcn01 and dcn02. >>>>>>>>>>>> In the above cli output, the ceph.conf and ceph.client... are >>>>>>>>>>>> the files used to access dcn02 ceph cluster and ceph_central* files are >>>>>>>>>>>> used in for accessing central ceph cluster. >>>>>>>>>>>> >>>>>>>>>>>> glance multistore config: >>>>>>>>>>>> [dcn02] >>>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>>> rbd_store_user=openstack >>>>>>>>>>>> rbd_store_pool=images >>>>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>>>> store_description=dcn02 rbd glance store >>>>>>>>>>>> >>>>>>>>>>>> [ceph_central] >>>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf >>>>>>>>>>>> rbd_store_user=openstack >>>>>>>>>>>> rbd_store_pool=images >>>>>>>>>>>> rbd_thin_provisioning=False >>>>>>>>>>>> store_description=Default glance store backend. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> With regards, >>>>>>>>>>>> Swogat Pradhan >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Mar 21, 2023 at 5:52?PM John Fulton < >>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Mar 21, 2023 at 8:03?AM Swogat Pradhan >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> > >>>>>>>>>>>>> > Hi, >>>>>>>>>>>>> > Seems like cinder is not using the local ceph. >>>>>>>>>>>>> >>>>>>>>>>>>> That explains the issue. It's a misconfiguration. >>>>>>>>>>>>> >>>>>>>>>>>>> I hope this is not a production system since the mailing list >>>>>>>>>>>>> now has >>>>>>>>>>>>> the cinder.conf which contains passwords. >>>>>>>>>>>>> >>>>>>>>>>>>> The section that looks like this: >>>>>>>>>>>>> >>>>>>>>>>>>> [tripleo_ceph] >>>>>>>>>>>>> volume_backend_name=tripleo_ceph >>>>>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver >>>>>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf >>>>>>>>>>>>> rbd_user=openstack >>>>>>>>>>>>> rbd_pool=volumes >>>>>>>>>>>>> rbd_flatten_volume_from_snapshot=False >>>>>>>>>>>>> rbd_secret_uuid= >>>>>>>>>>>>> report_discard_supported=True >>>>>>>>>>>>> >>>>>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and >>>>>>>>>>>>> not the >>>>>>>>>>>>> central one. Use the ceph conf file for that cluster and >>>>>>>>>>>>> ensure the >>>>>>>>>>>>> rbd_secret_uuid corresponds to that one. >>>>>>>>>>>>> >>>>>>>>>>>>> TripleO?s convention is to set the rbd_secret_uuid to the FSID >>>>>>>>>>>>> of the >>>>>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The >>>>>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so >>>>>>>>>>>>> that >>>>>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key. >>>>>>>>>>>>> This >>>>>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh >>>>>>>>>>>>> secret-get-value $FSID`. >>>>>>>>>>>>> >>>>>>>>>>>>> The documentation describes how to configure the central and >>>>>>>>>>>>> DCN sites >>>>>>>>>>>>> correctly but an error seems to have occurred while you were >>>>>>>>>>>>> following >>>>>>>>>>>>> it. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html >>>>>>>>>>>>> >>>>>>>>>>>>> John >>>>>>>>>>>>> >>>>>>>>>>>>> > >>>>>>>>>>>>> > Ceph Output: >>>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l >>>>>>>>>>>>> > NAME SIZE PARENT >>>>>>>>>>>>> FMT PROT LOCK >>>>>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65 8 MiB >>>>>>>>>>>>> 2 excl >>>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 16 MiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap 16 MiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d 321 MiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap 321 MiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 386 MiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap 386 MiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a 15 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap 15 GiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b 15 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap 15 GiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 15 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap 15 GiB >>>>>>>>>>>>> 2 yes >>>>>>>>>>>>> > >>>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l >>>>>>>>>>>>> > NAME SIZE >>>>>>>>>>>>> PARENT FMT PROT LOCK >>>>>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d 100 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63 10 GiB >>>>>>>>>>>>> 2 >>>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# >>>>>>>>>>>>> > >>>>>>>>>>>>> > Attached the cinder config. >>>>>>>>>>>>> > Please let me know how I can solve this issue. >>>>>>>>>>>>> > >>>>>>>>>>>>> > With regards, >>>>>>>>>>>>> > Swogat Pradhan >>>>>>>>>>>>> > >>>>>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53?PM John Fulton < >>>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> in my last message under the line "On a DCN site if you run >>>>>>>>>>>>> a command like this:" I suggested some steps you could try to confirm the >>>>>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder >>>>>>>>>>>>> config. >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>> >>> >>>>>>>>>>>>> >>> Update: >>>>>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it >>>>>>>>>>>>> takes around 10,15 minutes to create a volume with image in dcn02. >>>>>>>>>>>>> >>> The image size is 389 MB. >>>>>>>>>>>>> >>> >>>>>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> Hi Jhon, >>>>>>>>>>>>> >>>> I checked in the ceph od dcn02, I can see the images >>>>>>>>>>>>> created after importing from the central site. >>>>>>>>>>>>> >>>> But launching an instance normally fails as it takes a >>>>>>>>>>>>> long time for the volume to get created. >>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> When launching an instance from volume the instance is >>>>>>>>>>>>> getting created properly without any errors. 
>>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> I tried to cache images in nova using >>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>>>> but getting checksum failed error. >>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> With regards, >>>>>>>>>>>>> >>>> Swogat Pradhan >>>>>>>>>>>>> >>>> >>>>>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24?PM John Fulton < >>>>>>>>>>>>> johfulto at redhat.com> wrote: >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05?PM Swogat Pradhan >>>>>>>>>>>>> >>>>> wrote: >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > Update: After restarting the nova services on the >>>>>>>>>>>>> controller and running the deploy script on the edge site, I was able to >>>>>>>>>>>>> launch the VM from volume. >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > Right now the instance creation is failing as the >>>>>>>>>>>>> block device creation is stuck in creating state, it is taking more than 10 >>>>>>>>>>>>> mins for the volume to be created, whereas the image has already been >>>>>>>>>>>>> imported to the edge glance. >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> Try following this document and making the same >>>>>>>>>>>>> observations in your >>>>>>>>>>>>> >>>>> environment for AZs and their local ceph cluster. >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> On a DCN site if you run a command like this: >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf >>>>>>>>>>>>> --keyring >>>>>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring >>>>>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l >>>>>>>>>>>>> >>>>> NAME SIZE PARENT >>>>>>>>>>>>> >>>>> FMT PROT LOCK >>>>>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB >>>>>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap 2 >>>>>>>>>>>>> excl >>>>>>>>>>>>> >>>>> $ >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> Then, you should see the parent of the volume is the >>>>>>>>>>>>> image which is on >>>>>>>>>>>>> >>>>> the same local ceph cluster. >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're >>>>>>>>>>>>> encountering >>>>>>>>>>>>> >>>>> the streaming behavior described here: >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> Ideally all images should reside in the central Glance >>>>>>>>>>>>> and be copied >>>>>>>>>>>>> >>>>> to DCN sites before instances of those images are booted >>>>>>>>>>>>> on DCN sites. >>>>>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is >>>>>>>>>>>>> booted, then the >>>>>>>>>>>>> >>>>> image will be streamed to the DCN site and then the >>>>>>>>>>>>> image will boot as >>>>>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site >>>>>>>>>>>>> has access to >>>>>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the >>>>>>>>>>>>> booting of >>>>>>>>>>>>> >>>>> the image will take time because it has not been copied >>>>>>>>>>>>> in advance, >>>>>>>>>>>>> >>>>> this is still preferable to failing to boot the image. >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN >>>>>>>>>>>>> site and >>>>>>>>>>>>> >>>>> confirm it's using it's local ceph cluster. 
>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> John >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > I will try and create a new fresh image and test again >>>>>>>>>>>>> then update. >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > With regards, >>>>>>>>>>>>> >>>>> > Swogat Pradhan >>>>>>>>>>>>> >>>>> > >>>>>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>> >>>>> >> >>>>>>>>>>>>> >>>>> >> Update: >>>>>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is >>>>>>>>>>>>> showing down. >>>>>>>>>>>>> >>>>> >> >>>>>>>>>>>>> >>>>> >> >>>>>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote: >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> Hi Brendan, >>>>>>>>>>>>> >>>>> >>> Now i have deployed another site where i have used 2 >>>>>>>>>>>>> linux bonds network template for both 3 compute nodes and 3 ceph nodes. >>>>>>>>>>>>> >>>>> >>> The bonding options is set to mode=802.3ad >>>>>>>>>>>>> (lacp=active). >>>>>>>>>>>>> >>>>> >>> I used a cirros image to launch instance but the >>>>>>>>>>>>> instance timed out so i waited for the volume to be created. >>>>>>>>>>>>> >>>>> >>> Once the volume was created i tried launching the >>>>>>>>>>>>> instance from the volume and still the instance is stuck in spawning state. >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> Here is the nova-compute log: >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO >>>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon starting >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO >>>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with capabilities >>>>>>>>>>>>> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO >>>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon running as pid 185437 >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING >>>>>>>>>>>>> os_brick.initiator.connectors.nvmeof >>>>>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db >>>>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error >>>>>>>>>>>>> in _get_host_uuid: Unexpected error while running command. >>>>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value >>>>>>>>>>>>> >>>>> >>> Exit code: 2 >>>>>>>>>>>>> >>>>> >>> Stdout: '' >>>>>>>>>>>>> >>>>> >>> Stderr: '': >>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while >>>>>>>>>>>>> running command. >>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO >>>>>>>>>>>>> nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 >>>>>>>>>>>>> b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default >>>>>>>>>>>>> default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> It is stuck in creating image, do i need to run the >>>>>>>>>>>>> template mentioned here ?: >>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> The volume is already created and i do not >>>>>>>>>>>>> understand why the instance is stuck in spawning state. 
>>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> With regards, >>>>>>>>>>>>> >>>>> >>> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> >>>>>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02?PM Brendan Shephard < >>>>>>>>>>>>> bshephar at redhat.com> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Does your environment use different network >>>>>>>>>>>>> interfaces for each of the networks? Or does it have a bond with everything >>>>>>>>>>>>> on it? >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching >>>>>>>>>>>>> instances, there is a lot of network traffic between nodes as the >>>>>>>>>>>>> hypervisor needs to download the image from Glance. Along with various >>>>>>>>>>>>> other services sending normal network traffic, it can be enough to cause >>>>>>>>>>>>> issues if everything is running over a single 1Gbe interface. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a >>>>>>>>>>>>> single active/backup bond on 1Gbe nics. It?s worth checking the network >>>>>>>>>>>>> traffic while you try to spawn the instance to see if you?re dropping >>>>>>>>>>>>> packets. In the situation I described, there were dropped packets which >>>>>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the >>>>>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being >>>>>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor >>>>>>>>>>>>> while spawning the instance. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP >>>>>>>>>>>>> helped. So, based on that experience, from my perspective, is certainly >>>>>>>>>>>>> sounds like some kind of network issue. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Brendan Shephard >>>>>>>>>>>>> >>>>> >>>> Senior Software Engineer >>>>>>>>>>>>> >>>>> >>>> Red Hat Australia >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block < >>>>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some >>>>>>>>>>>>> time ago in this thread: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it >>>>>>>>>>>>> for that user, not sure if that could apply here. But is it possible that >>>>>>>>>>>>> your nova and neutron versions are different between central and edge site? >>>>>>>>>>>>> Have you restarted nova and neutron services on the compute nodes after >>>>>>>>>>>>> installation? Have you debug logs of nova-conductor and maybe nova-compute? >>>>>>>>>>>>> Maybe they can help narrow down the issue. >>>>>>>>>>>>> >>>>> >>>> If there isn't any additional information in the >>>>>>>>>>>>> debug logs I probably would start "tearing down" rabbitmq. I didn't have to >>>>>>>>>>>>> do that in a production system yet so be careful. I can think of two routes: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit >>>>>>>>>>>>> is running, this will most likely impact client IO depending on your load. >>>>>>>>>>>>> Check out the rabbitmqctl commands. 
>>>>>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia >>>>>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc. >>>>>>>>>>>>> rebuild. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" >>>>>>>>>>>>> while being replicated across the rabbit nodes. But I don't really know the >>>>>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give >>>>>>>>>>>>> a better advice. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Regards, >>>>>>>>>>>>> >>>>> >>>> Eugen >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan >>>>>>>>>>>> >: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue? >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Hi >>>>>>>>>>>>> >>>>> >>>> I don't see any major packet loss. >>>>>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe >>>>>>>>>>>>> but not due to packet >>>>>>>>>>>>> >>>>> >>>> loss. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> with regards, >>>>>>>>>>>>> >>>>> >>>> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34?PM Swogat Pradhan < >>>>>>>>>>>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>>> >>>>> >>>> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Hi, >>>>>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'. >>>>>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never >>>>>>>>>>>>> checked when >>>>>>>>>>>>> >>>>> >>>> launching the instance. >>>>>>>>>>>>> >>>>> >>>> I will check that and come back. >>>>>>>>>>>>> >>>>> >>>> But everytime i launch an instance the instance >>>>>>>>>>>>> gets stuck at spawning >>>>>>>>>>>>> >>>>> >>>> state and there the hypervisor becomes down, so not >>>>>>>>>>>>> sure if packet loss >>>>>>>>>>>>> >>>>> >>>> causes this. >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> With regards, >>>>>>>>>>>>> >>>>> >>>> Swogat pradhan >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30?PM Eugen Block < >>>>>>>>>>>>> eblock at nde.ag> wrote: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they >>>>>>>>>>>>> identical between >>>>>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss >>>>>>>>>>>>> through the tunnel? >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan >>>>>>>>>>>> >: >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> >>>>> >>>> > Hi Eugen, >>>>>>>>>>>>> >>>>> >>>> > Request you to please add my email either on 'to' >>>>>>>>>>>>> or 'cc' as i am not >>>>>>>>>>>>> >>>>> >>>> > getting email's from you. >>>>>>>>>>>>> >>>>> >>>> > Coming to the issue: >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# >>>>>>>>>>>>> rabbitmqctl list_policies -p >>>>>>>>>>>>> >>>>> >>>> / >>>>>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ... 
>>>>>>>>>>>>> >>>>> >>>> > vhost name pattern apply-to >>>>>>>>>>>>> definition priority >>>>>>>>>>>>> >>>>> >>>> > / ha-all ^(?!amq\.).* queues >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> >>>>>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"} 0 >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up, it only >>>>>>>>>>>>> goes down when i am >>>>>>>>>>>>> >>>>> >>>> trying >>>>>>>>>>>>> >>>>> >>>> > to launch an instance and the instance comes to a >>>>>>>>>>>>> spawning state and >>>>>>>>>>>>> >>>>> >>>> then >>>>>>>>>>>>> >>>>> >>>> > gets stuck. >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the >>>>>>>>>>>>> edge sites. >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > With regards, >>>>>>>>>>>>> >>>>> >>>> > Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11?PM Swogat Pradhan < >>>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>>> >>>>> >>>> > wrote: >>>>>>>>>>>>> >>>>> >>>> > >>>>>>>>>>>>> >>>>> >>>> >> Hi Eugen, >>>>>>>>>>>>> >>>>> >>>> >> For some reason i am not getting your email to >>>>>>>>>>>>> me directly, i am >>>>>>>>>>>>> >>>>> >>>> checking >>>>>>>>>>>>> >>>>> >>>> >> the email digest and there i am able to find >>>>>>>>>>>>> your reply. >>>>>>>>>>>>> >>>>> >>>> >> Here is the log for download: >>>>>>>>>>>>> https://we.tl/t-L8FEkGZFSq >>>>>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue >>>>>>>>>>>>> occurred. >>>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>>> >>>>> >>>> >> *Note: i am able to create vm's and perform >>>>>>>>>>>>> other activities in the >>>>>>>>>>>>> >>>>> >>>> >> central site, only facing this issue in the edge >>>>>>>>>>>>> site.* >>>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>>> >>>>> >>>> >> With regards, >>>>>>>>>>>>> >>>>> >>>> >> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12?PM Swogat Pradhan < >>>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com> >>>>>>>>>>>>> >>>>> >>>> >> wrote: >>>>>>>>>>>>> >>>>> >>>> >> >>>>>>>>>>>>> >>>>> >>>> >>> Hi Eugen, >>>>>>>>>>>>> >>>>> >>>> >>> Thanks for your response. >>>>>>>>>>>>> >>>>> >>>> >>> I have actually a 4 controller setup so here >>>>>>>>>>>>> are the details: >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> *PCS Status:* >>>>>>>>>>>>> >>>>> >>>> >>> * Container bundle set: rabbitmq-bundle [ >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest >>>>>>>>>>>>> ]: >>>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-0 >>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3 >>>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-1 >>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2 >>>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-2 >>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1 >>>>>>>>>>>>> >>>>> >>>> >>> * rabbitmq-bundle-3 >>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster): >>>>>>>>>>>>> >>>>> >>>> Started >>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple >>>>>>>>>>>>> times but the issue is >>>>>>>>>>>>> >>>>> >>>> still >>>>>>>>>>>>> >>>>> >>>> >>> present. 
>>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> *Cluster status:* >>>>>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl >>>>>>>>>>>>> cluster_status >>>>>>>>>>>>> >>>>> >>>> >>> Cluster status of node >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ... >>>>>>>>>>>>> >>>>> >>>> >>> Basics >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Cluster name: >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Disk Nodes >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Running Nodes >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Versions >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: >>>>>>>>>>>>> RabbitMQ >>>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: >>>>>>>>>>>>> RabbitMQ >>>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: >>>>>>>>>>>>> RabbitMQ >>>>>>>>>>>>> >>>>> >>>> 3.8.3 >>>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> : >>>>>>>>>>>>> >>>>> >>>> RabbitMQ >>>>>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1 >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Alarms >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Network Partitions >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> (none) >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Listeners >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, >>>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at 
overcloud-controller-0.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: >>>>>>>>>>>>> HTTP API >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, >>>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: >>>>>>>>>>>>> HTTP API >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, >>>>>>>>>>>>> purpose: inter-node and CLI >>>>>>>>>>>>> >>>>> >>>> tool >>>>>>>>>>>>> >>>>> >>>> >>> communication >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, >>>>>>>>>>>>> purpose: AMQP 0-9-1 >>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com, >>>>>>>>>>>>> >>>>> >>>> interface: >>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: >>>>>>>>>>>>> HTTP API >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: >>>>>>>>>>>>> clustering, purpose: >>>>>>>>>>>>> >>>>> >>>> inter-node and >>>>>>>>>>>>> >>>>> >>>> >>> CLI tool communication >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, >>>>>>>>>>>>> protocol: amqp, purpose: AMQP >>>>>>>>>>>>> >>>>> >>>> 0-9-1 >>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0 >>>>>>>>>>>>> >>>>> >>>> >>> Node: >>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com >>>>>>>>>>>>> >>>>> >>>> , >>>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, >>>>>>>>>>>>> purpose: HTTP API >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Feature flags >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> *Logs:* >>>>>>>>>>>>> >>>>> >>>> >>> *(Attached)* >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> >>>> >>> With regards, >>>>>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan >>>>>>>>>>>>> >>>>> >>>> >>> >>>>>>>>>>>>> >>>>> 
> On Sun, Feb 26, 2023 at 2:34 PM Swogat Pradhan <swogatpradhan22 at gmail.com> wrote:
>
> Hi,
> Please find the nova conductor as well as nova api log.
>
> nova-conuctor:
>
> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 16152921c1eb45c2b1f562087140168b
> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to 83dbe5f567a940b698acfe986f6194fa
> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to f3bfd7f65bd542b18d84cea3033abb43: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to d4b9180f91a94f9a82c3c9c4b7595566: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 897911a234a445d8a0d8af02ece40f6f: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with backend dogpile.cache.null.
> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 8f723ceb10c3472db9a9f324861df2bb: oslo_messaging.exceptions.MessageUndeliverable
> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
>
> With regards,
> Swogat Pradhan
>
> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan <swogatpradhan22 at gmail.com> wrote:
>
>> Hi,
>> I currently have 3 compute nodes on edge site1 where i am trying to launch vm's.
>> When the VM is in spawning state the node goes down (openstack compute service list), the node comes backup when i restart the nova compute service but then the launch of the vm fails.
>>
>> nova-compute.log
>>
>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running instance usage audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to 2023-02-26 08:00:00. 0 instances.
>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node dcn01-hci-0.bdxworld.com
>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: /dev/vda. Libvirt can't honour user-supplied dev names
>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda
>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with backend dogpile.cache.null.
>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Running privsep helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/nova/nova.conf', '--config-file', '/etc/nova/nova-compute.conf', '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpin40tah6/privsep.sock']
>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep daemon via rootwrap
>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep daemon starting
>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep daemon running as pid 2647
>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command.
>> Command: blkid overlay -s UUID -o value
>> Exit code: 2
>> Stdout: ''
>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image
>>
>> Is there a way to solve this issue?
>>
>> With regards,
>> Swogat Pradhan
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From AnnieLiu at zhaoxin.com Tue Mar 28 09:21:16 2023
From: AnnieLiu at zhaoxin.com (Annie Liu(BJ-RD))
Date: Tue, 28 Mar 2023 09:21:16 +0000
Subject: [Freezer] Restore action report ERROR can't delete temporary Image
Message-ID:

Hi All,

My Cinder backend is Ceph, and so is Glance. For backup and restore, I chose Cinder for the mode and local for the storage. When I restore a Cinder volume with Freezer, an ERROR is reported about a failure to delete a temporary Image, which prevents the whole restore flow from completing.

2023-03-23 16:53:50.555 361 ERROR freezer.main [-] HTTP 409 Conflict: Image 8af4be3f-2e7a-4965-8b34-74e610a89a3e could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance.: HTTPConflict: HTTP 409 Conflict: Image 8af4be3f-2e7a-4965-8b34-74e610a89a3e could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance.

According to the Cinder source code, a volume created from an Image is a child of that Image: since it is created by a clone operation, it does not copy the volume data at first. In other words, the Image is its parent and still holds the real data, and that is why the delete fails.

My question is: is this a general issue for other storage solutions? Is there any opportunity to fix it, for example by doing a flatten immediately after creating the volume from the Image?

Thanks.

Best Regards,
Annie Liu

CONFIDENTIAL NOTE: This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From kennelson11 at gmail.com Tue Mar 28 14:48:01 2023
From: kennelson11 at gmail.com (Kendall Nelson)
Date: Tue, 28 Mar 2023 09:48:01 -0500
Subject: [Magnum] vPTG
In-Reply-To:
References:
Message-ID:

I can't say I will be awake at that time, but I look forward to reading the notes/summary! I may use some of the conversations in my forum proposal wrt k8s certification.

-Kendall

On Tue, Mar 28, 2023 at 9:19 AM Jake Yip wrote:

> Dear all,
>
> The Magnum vPTG will be held on Wednesday at 0900 UTC in the Havana Room.
>
> Please see the etherpad https://etherpad.opendev.org/p/march2023-ptg-magnum
> for updates.
>
> Regards,
> Jake
>
> --
> Jake Yip
> DevOps Engineer, ARDC Nectar Research Cloud

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From stig.openstack at telfer.org Tue Mar 28 16:35:56 2023
From: stig.openstack at telfer.org (Stig Telfer)
Date: Tue, 28 Mar 2023 17:35:56 +0100
Subject: [scientific-sig] PTG sessions for Scientific SIG
Message-ID:

Hi all -

The Scientific SIG has two sessions on Wednesday (tomorrow) at the PTG, at 1400 UTC and 2100 UTC.
Everyone is welcome. There is an etherpad for the sessions here: https://etherpad.opendev.org/p/march2023-ptg-scientific-sig

We'll begin with an introduction to the SIG and some discussion of problems, challenges and new features specific to research computing use cases. We'd also like participants to contribute lightning talks about anything relevant that is of interest to them.

Cheers,
Stig
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ralonsoh at redhat.com Tue Mar 28 16:47:31 2023
From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez)
Date: Tue, 28 Mar 2023 18:47:31 +0200
Subject: [neutron] Deprecate networking-odl project
Message-ID:

Hello all:

Over the last few releases, support for the "networking-odl" project has decreased, and currently there is no active developer or maintainer in the community. This project depends on https://www.opendaylight.org/; the latest version released is Sulfur (16), while the version still used in the CI is Sodium (11) [1].

I would first like to make a call for developers to update this project. But if this is not possible, I will then start the procedure to deprecate it [2] (**not to retire it**).

Regards.

[1] https://github.com/openstack/networking-odl/blob/db5c79b3ee5054feb8a17df130e4ce3a95ec64c2/.zuul.d/jobs.yaml#L172
[2] https://docs.openstack.org/project-team-guide/repository.html#deprecating-a-repository
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From adivya1.singh at gmail.com Tue Mar 28 18:02:47 2023
From: adivya1.singh at gmail.com (Adivya Singh)
Date: Tue, 28 Mar 2023 23:32:47 +0530
Subject: (Openstack-Nova)
Message-ID:

Hi Team,

I see these errors in my syslog, related to my nova-compute service getting hung while communicating with the rabbitmq service:

"A recoverable connection/channel error occurred, trying to reconnect: [Errno 24] Too many open files"

Is this an OS-related error, or is there something I can change to get rid of this error?

Regards,
Adivya Singh
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From michael at knox.net.nz Tue Mar 28 19:55:07 2023
From: michael at knox.net.nz (Michael Knox)
Date: Tue, 28 Mar 2023 15:55:07 -0400
Subject: (Openstack-Nova)
In-Reply-To:
References:
Message-ID:

Hi,

This will be on the OS you have RabbitMQ running on. You will need to increase the ulimit: "ulimit -n" will show the current limit for the installed OS and configuration, and you will need more than what's there. There could also be other configuration issues; a normal default of 1024 isn't low for most uses, but you will need to consider that as part of the increase.

Cheers

On Tue, Mar 28, 2023 at 2:16 PM Adivya Singh wrote:

> Hi Team,
>
> I see these errors in my syslog, related to my nova-compute service getting
> hung while communicating with the rabbitmq service:
>
> "A recoverable connection/channel error occurred, trying to reconnect:
> [Errno 24] Too many open files"
>
> Is this an OS-related error, or is there something I can change to get rid
> of this error?
>
> Regards,
> Adivya Singh
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jamesleong123098 at gmail.com Tue Mar 28 20:30:46 2023
From: jamesleong123098 at gmail.com (James Leong)
Date: Tue, 28 Mar 2023 15:30:46 -0500
Subject: [Horizon][policies][keystone] allow _member_ role to add user
Message-ID:

Hi all,

I am using kolla-ansible for OpenStack deployment in the yoga version.
Would it be possible to allow a user with a "_member_" role to add a user to the respective project? In OpenStack, an admin role allows users to add, delete, and edit a user profile. I would like to have the same privilege added to the "_member_" role. I have tried to modify the file "user.py" in the keystone container at "keystone/common/policies" and restarted the container. However, the button on my horizon dashboard did not appear. How could I integrate the code in order to allow the create button to appear on my dashboard with the "_member_" role? Is there a way to do that? THanks for your help James -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Tue Mar 28 21:05:04 2023 From: melwittt at gmail.com (melanie witt) Date: Tue, 28 Mar 2023 14:05:04 -0700 Subject: [nova] openstack-tox-pep8 job broken, hold your rechecks Message-ID: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> Hey all, Sending this to the ML to try to help others who may not have heard about it yet. I didn't know about myself until I saw the job fail on a random nova patch and I looked around to figure out why. It looks like openstack-tox-pep8 is failing 100% in nova due to a newer version of the mypy library being pulled in since a recent upper-constraints bump: https://review.opendev.org/c/openstack/requirements/+/872065 And the job isn't going to pass until the following fix merges: https://review.opendev.org/c/openstack/nova/+/878693 Finally, to help prevent a breakage for this in the future, a cross-nova-pep8 job has been approved: https://review.opendev.org/c/openstack/requirements/+/878748 and will be on its way to the gate once the aforementioned fix merges. I'll post a reply to this email when it's OK to do rechecks again. Cheers, -melwitt From adivya1.singh at gmail.com Wed Mar 29 04:55:45 2023 From: adivya1.singh at gmail.com (Adivya Singh) Date: Wed, 29 Mar 2023 10:25:45 +0530 Subject: (Openstack-Designate) rndc key not getting generated in /etc/designate Message-ID: Hi Team, My DNS Server located outside the Open Stack, and i am using below variables in my user_variables.yaml File. But When i ' m running os-desigante-install.yml Playbook, rndc key are not generating in /etc/designate Folder and the playbook fail at the below Task {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'file'\n\nThe error appears to be in '/etc/ansible/roles/os_designate/tasks/designate_post_install.yml': line 89, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create Designate rndc key file\n ^ here\n"} - name: Create Designate rndc key file template: src: rndc.key.j2 dest: "{{ item.file }}" owner: "{{ item.owner | default('root') }}" group: "{{ item.group | default('root') }}" mode: "{{ item.mode | default('0600') }}" with_items: "{{ designate_rndc_keys }}" when: designate_rndc_keys is defined and the post-install.yml File looks like this Any idea on this, Where i am missing ## rndc keys for authenticating with bind9 # define this to create as many key files as are required # designate_rndc_keys # - name: "rndc-key" # file: /etc/designate/rndc.key # algorithm: "hmac-md5" # secret: "" -------------- next part -------------- An HTML attachment was scrubbed... 
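The Ansible failure above ("'dict object' has no attribute 'file'") suggests that designate_rndc_keys is defined in user_variables.yml, but its entries do not carry the "file" attribute that the "Create Designate rndc key file" task expects (or the variable is not a list of mappings). A minimal sketch of what the override could look like, based only on the commented example shipped with the os_designate role; the algorithm and secret shown here are placeholders and must match the key configured on the external BIND9 servers (for example one generated there with rndc-confgen):

  designate_rndc_keys:
    - name: "rndc-key"
      file: /etc/designate/rndc.key
      algorithm: "hmac-md5"            # placeholder - use the algorithm of your BIND key
      secret: "BASE64-SECRET-HERE"     # placeholder - the shared secret from the BIND side

With a list shaped like this, the template task can resolve item.file and should write /etc/designate/rndc.key on the Designate hosts.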
URL: From nguyenhuukhoinw at gmail.com Wed Mar 29 06:06:13 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 29 Mar 2023 13:06:13 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Hello. I have one question. Follow this https://docs.openstack.org/nova/latest/admin/availability-zones.html If the server was not created in a specific zone then it is free to be moved to other zones. but when I use openstack server show [server id] I still see the "OS-EXT-AZ:availability_zone" value belonging to my instance. Could you tell the difference which causes "if the server was not created in a specific zone then it is free to be moved to other zones." Nguyen Huu Khoi On Mon, Mar 27, 2023 at 8:37?PM Nguy?n H?u Kh?i wrote: > Hello guys. > > I just suggest to openstack nova works better. My story because > > > 1. > > The server was created in a specific zone with the POST /servers request > containing the availability_zone parameter. > > It will be nice when we attach randow zone when we create instances then > It will only move to the same zone when migrating or masakari ha. > > Currently we can force it to zone by default zone shedule in nova.conf. > > Sorry because I am new to Openstack and I am just an operator. I try to > verify some real cases. > > > > Nguyen Huu Khoi > > > On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza wrote: > >> >> >> Le lun. 27 mars 2023 ? 14:28, Sean Mooney a ?crit : >> >>> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >>> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a >>> ?crit : >>> > >>> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >>> > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < >>> > > > rafaelweingartner at gmail.com> a ?crit : >>> > > > >>> > > > > Hello Nguy?n H?u Kh?i, >>> > > > > You might want to take a look at: >>> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We >>> created a >>> > > patch >>> > > > > to avoid migrating VMs to any AZ, once the VM has been >>> bootstrapped in >>> > > an >>> > > > > AZ that has cross zone attache equals to false. >>> > > > > >>> > > > > >>> > > > Well, I'll provide some comments in the change, but I'm afraid we >>> can't >>> > > > just modify the request spec like you would want. >>> > > > >>> > > > Anyway, if you want to discuss about it in the vPTG, just add it >>> in the >>> > > > etherpad and add your IRC nick so we could try to find a time >>> where we >>> > > > could be discussing it : >>> https://etherpad.opendev.org/p/nova-bobcat-ptg >>> > > > Also, this kind of behaviour modification is more a new feature >>> than a >>> > > > bugfix, so fwiw you should create a launchpad blueprint so we could >>> > > better >>> > > > see it. >>> > > >>> > > i tought i left review feedback on that too that the approch was not >>> > > correct. >>> > > i guess i did not in the end. >>> > > >>> > > modifying the request spec as sylvain menthioned is not correct. >>> > > i disucssed this topic on irc a few weeks back with mohomad for >>> vxhost. >>> > > what can be done is as follows. >>> > > >>> > > we can add a current_az field to the Destination object >>> > > >>> > > >>> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >>> > > The conductor can read the instance.AZ and populate it in that new >>> field. >>> > > We can then add a new weigher to prefer hosts that are in the same >>> az. 
>>> > > >>> > > >>> > >>> > I tend to disagree this approach as people would think that the >>> > Destination.az field would be related to the current AZ for an >>> instance, >>> > while we only look at the original AZ. >>> > That being said, we could have a weigher that would look at whether the >>> > host is in the same AZ than the instance.host. >>> you miss understood what i wrote >>> >>> i suggested addint Destination.current_az to store teh curernt AZ of the >>> instance before scheduling. >>> >>> so my proposal is if RequestSpec.AZ is not set and >>> Destination.current_az is set then the new >>> weigher would prefer hosts that are in the same az as >>> Destination.current_az >>> >>> we coudl also call Destination.current_az Destination.prefered_az >>> >>> >> I meant, I think we don't need to provide a new field, we can already >> know about what host an existing instance uses if we want (using [1]) >> Anyway, let's stop to discuss about it here, we should rather review that >> for a Launchpad blueprint or more a spec. >> >> -Sylvain >> >> [1] >> https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 >> >>> > >>> > >>> > This will provide soft AZ affinity for the vm and preserve the fact >>> that if >>> > > a vm is created without sepcifying >>> > > An AZ the expectaiton at the api level woudl be that it can migrate >>> to any >>> > > AZ. >>> > > >>> > > To provide hard AZ affintiy we could also add prefileter that would >>> use >>> > > the same data but instead include it in the >>> > > placement query so that only the current AZ is considered. This >>> would have >>> > > to be disabled by default. >>> > > >>> > > >>> > Sure, we could create a new prefilter so we could then deprecate the >>> > AZFilter if we want. >>> we already have an AZ prefilter and the AZFilter is deprecate for removal >>> i ment to delete it in zed but did not have time to do it in zed of >>> Antielope >>> i deprecated the AZ| filter in >>> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >>> xena when i enabeld the az prefilter by default. >>> >>> >> Ah whoops, indeed I forgot the fact we already have the prefilter, so the >> hard support for AZ is already existing. >> >> >>> i will try an delete teh AZ filter before m1 if others dont. >>> >> >> OK. >> >> >>> > >>> > >>> > > That woudl allow operators to choose the desired behavior. >>> > > curret behavior (disable weigher and dont enabel prefilter) >>> > > new default, prefer current AZ (weigher enabeld prefilter disabled) >>> > > hard affintiy(prefilter enabled.) >>> > > >>> > > there are other ways to approch this but updating the request spec >>> is not >>> > > one of them. >>> > > we have to maintain the fact the enduser did not request an AZ. >>> > > >>> > > >>> > Anyway, if folks want to discuss about AZs, this week is the good time >>> :-) >>> > >>> > >>> > > > >>> > > > -Sylvain >>> > > > >>> > > > >>> > > > >>> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >>> > > nguyenhuukhoinw at gmail.com> >>> > > > > wrote: >>> > > > > >>> > > > > > Hello guys. >>> > > > > > I playing with Nova AZ and Masakari >>> > > > > > >>> > > > > > >>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>> > > > > > >>> > > > > > Masakari will move server by nova scheduler. 
>>> > > > > > >>> > > > > > Openstack Docs describe that: >>> > > > > > >>> > > > > > If the server was not created in a specific zone then it is >>> free to >>> > > be >>> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >>> > > > > > < >>> > > >>> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >>> > >>> > > is >>> > > > > > a no-op. >>> > > > > > >>> > > > > > I see that everyone usually creates instances with "Any >>> Availability >>> > > > > > Zone" on Horzion and also we don't specify AZ when creating >>> > > instances by >>> > > > > > cli. >>> > > > > > >>> > > > > > By this way, when we use Masakari or we miragrated instances( >>> or >>> > > > > > evacuate) so our instance will be moved to other zones. >>> > > > > > >>> > > > > > Can we attach AZ to server create requests API based on Any >>> > > > > > Availability Zone to limit instances moved to other zones? >>> > > > > > >>> > > > > > Thank you. Regards >>> > > > > > >>> > > > > > Nguyen Huu Khoi >>> > > > > > >>> > > > > >>> > > > > >>> > > > > -- >>> > > > > Rafael Weing?rtner >>> > > > > >>> > > >>> > > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralonsoh at redhat.com Wed Mar 29 08:28:13 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Wed, 29 Mar 2023 10:28:13 +0200 Subject: [neutron][ptg] Today's agenda Message-ID: Hello all: This is a quick summary of the agenda for today's meeting (starting at 13UTC): * Status and questions about https://review.opendev.org/q/topic:port-hints * IPv6 Prefix Delegation in OVN * Neutron agents status (https://bugs.launchpad.net/neutron/+bug/2011422) * DHCP IPv6 issues with metadata service ( https://bugs.launchpad.net/neutron/+bug/1953165) * Operator hour Please check the full agenda, topics and past logs in https://etherpad.opendev.org/p/neutron-bobcat-ptg. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 29 09:49:42 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 11:49:42 +0200 Subject: [nova] openstack-tox-pep8 job broken, hold your rechecks In-Reply-To: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> References: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> Message-ID: Heh, you missed my email ;-) https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032979.html No worries tho :) Le mar. 28 mars 2023 ? 23:11, melanie witt a ?crit : > Hey all, > > Sending this to the ML to try to help others who may not have heard > about it yet. I didn't know about myself until I saw the job fail on a > random nova patch and I looked around to figure out why. > > It looks like openstack-tox-pep8 is failing 100% in nova due to a newer > version of the mypy library being pulled in since a recent > upper-constraints bump: > > https://review.opendev.org/c/openstack/requirements/+/872065 > > And the job isn't going to pass until the following fix merges: > > https://review.opendev.org/c/openstack/nova/+/878693 > > Finally, to help prevent a breakage for this in the future, a > cross-nova-pep8 job has been approved: > > https://review.opendev.org/c/openstack/requirements/+/878748 > > and will be on its way to the gate once the aforementioned fix merges. > > I'll post a reply to this email when it's OK to do rechecks again. > > Cheers, > -melwitt > > -------------- next part -------------- An HTML attachment was scrubbed... 
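For what it's worth, the two things can be distinguished from the CLI: OS-EXT-AZ:availability_zone always reports the zone of the host the instance is currently running on, while the zone that was (or was not) requested at boot time only lives in the stored request spec. A rough sketch, assuming a zone named "az1" exists and default_schedule_zone is not set (image, flavor, network and server names are made up):

  $ openstack availability zone list
  $ openstack server create --image cirros --flavor m1.tiny --network private \
      --availability-zone az1 vm-pinned
  $ openstack server create --image cirros --flavor m1.tiny --network private vm-free
  $ openstack server show vm-pinned -c OS-EXT-AZ:availability_zone
  $ openstack server show vm-free -c OS-EXT-AZ:availability_zone

Both "server show" commands print a zone name, but only vm-pinned recorded az1 in its request spec, so only vm-pinned stays constrained to az1 when it is migrated or evacuated; vm-free can land on a host in any zone.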
URL: From sbauza at redhat.com Wed Mar 29 10:05:13 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 12:05:13 +0200 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Le mer. 29 mars 2023 ? 08:06, Nguy?n H?u Kh?i a ?crit : > Hello. > I have one question. > Follow this > > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > If the server was not created in a specific zone then it is free to be > moved to other zones. but when I use > > openstack server show [server id] > > I still see the "OS-EXT-AZ:availability_zone" value belonging to my > instance. > > Correct, this is normal. If the operators creates some AZs, then the enduser should see where the instance in which AZ. > Could you tell the difference which causes "if the server was not created > in a specific zone then it is free to be moved to other zones." > > To be clear, an operator can create Availability Zones. Those AZs can then be seen by an enduser using the os-availability-zones API [1]. Then, either the enduser wants to use a specific AZ for their next instance creation (and if so, he/she adds --availability-zone parameter to their instance creation client) or they don't want and then they don't provide this parameter. If they provide this parameter, then the server will be created only in one host in the specific AZ and then when moving the instance later, it will continue to move to any host within the same AZ. If they *don't* provide this parameter, then depending on the default_schedule_zone config option, either the instance will eventually use a specific AZ (and then it's like if the enduser was asking for this AZ), or none of AZ is requested and then the instance can be created and moved between any hosts within *all* AZs. That being said, as I said earlier, the enduser can still verify the AZ from where the instance is by the server show parameter you told. We also have a documentation explaining about Availability Zones, maybe this would help you more to understand about AZs : https://docs.openstack.org/nova/latest/admin/availability-zones.html [1] https://docs.openstack.org/api-ref/compute/#availability-zones-os-availability-zone (tbc, the enduser won't see the hosts, but they can see the list of existing AZs) > Nguyen Huu Khoi > > > On Mon, Mar 27, 2023 at 8:37?PM Nguy?n H?u Kh?i > wrote: > >> Hello guys. >> >> I just suggest to openstack nova works better. My story because >> >> >> 1. >> >> The server was created in a specific zone with the POST /servers request >> containing the availability_zone parameter. >> >> It will be nice when we attach randow zone when we create instances then >> It will only move to the same zone when migrating or masakari ha. >> >> Currently we can force it to zone by default zone shedule in nova.conf. >> >> Sorry because I am new to Openstack and I am just an operator. I try to >> verify some real cases. >> >> >> >> Nguyen Huu Khoi >> >> >> On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza wrote: >> >>> >>> >>> Le lun. 27 mars 2023 ? 14:28, Sean Mooney a ?crit : >>> >>>> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >>>> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a >>>> ?crit : >>>> > >>>> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >>>> > > > Le dim. 26 mars 2023 ? 
14:30, Rafael Weing?rtner < >>>> > > > rafaelweingartner at gmail.com> a ?crit : >>>> > > > >>>> > > > > Hello Nguy?n H?u Kh?i, >>>> > > > > You might want to take a look at: >>>> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We >>>> created a >>>> > > patch >>>> > > > > to avoid migrating VMs to any AZ, once the VM has been >>>> bootstrapped in >>>> > > an >>>> > > > > AZ that has cross zone attache equals to false. >>>> > > > > >>>> > > > > >>>> > > > Well, I'll provide some comments in the change, but I'm afraid we >>>> can't >>>> > > > just modify the request spec like you would want. >>>> > > > >>>> > > > Anyway, if you want to discuss about it in the vPTG, just add it >>>> in the >>>> > > > etherpad and add your IRC nick so we could try to find a time >>>> where we >>>> > > > could be discussing it : >>>> https://etherpad.opendev.org/p/nova-bobcat-ptg >>>> > > > Also, this kind of behaviour modification is more a new feature >>>> than a >>>> > > > bugfix, so fwiw you should create a launchpad blueprint so we >>>> could >>>> > > better >>>> > > > see it. >>>> > > >>>> > > i tought i left review feedback on that too that the approch was not >>>> > > correct. >>>> > > i guess i did not in the end. >>>> > > >>>> > > modifying the request spec as sylvain menthioned is not correct. >>>> > > i disucssed this topic on irc a few weeks back with mohomad for >>>> vxhost. >>>> > > what can be done is as follows. >>>> > > >>>> > > we can add a current_az field to the Destination object >>>> > > >>>> > > >>>> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >>>> > > The conductor can read the instance.AZ and populate it in that new >>>> field. >>>> > > We can then add a new weigher to prefer hosts that are in the same >>>> az. >>>> > > >>>> > > >>>> > >>>> > I tend to disagree this approach as people would think that the >>>> > Destination.az field would be related to the current AZ for an >>>> instance, >>>> > while we only look at the original AZ. >>>> > That being said, we could have a weigher that would look at whether >>>> the >>>> > host is in the same AZ than the instance.host. >>>> you miss understood what i wrote >>>> >>>> i suggested addint Destination.current_az to store teh curernt AZ of >>>> the instance before scheduling. >>>> >>>> so my proposal is if RequestSpec.AZ is not set and >>>> Destination.current_az is set then the new >>>> weigher would prefer hosts that are in the same az as >>>> Destination.current_az >>>> >>>> we coudl also call Destination.current_az Destination.prefered_az >>>> >>>> >>> I meant, I think we don't need to provide a new field, we can already >>> know about what host an existing instance uses if we want (using [1]) >>> Anyway, let's stop to discuss about it here, we should rather review >>> that for a Launchpad blueprint or more a spec. >>> >>> -Sylvain >>> >>> [1] >>> https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 >>> >>>> > >>>> > >>>> > This will provide soft AZ affinity for the vm and preserve the fact >>>> that if >>>> > > a vm is created without sepcifying >>>> > > An AZ the expectaiton at the api level woudl be that it can migrate >>>> to any >>>> > > AZ. >>>> > > >>>> > > To provide hard AZ affintiy we could also add prefileter that would >>>> use >>>> > > the same data but instead include it in the >>>> > > placement query so that only the current AZ is considered. This >>>> would have >>>> > > to be disabled by default. 
>>>> > > >>>> > > >>>> > Sure, we could create a new prefilter so we could then deprecate the >>>> > AZFilter if we want. >>>> we already have an AZ prefilter and the AZFilter is deprecate for >>>> removal >>>> i ment to delete it in zed but did not have time to do it in zed of >>>> Antielope >>>> i deprecated the AZ| filter in >>>> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >>>> xena when i enabeld the az prefilter by default. >>>> >>>> >>> Ah whoops, indeed I forgot the fact we already have the prefilter, so >>> the hard support for AZ is already existing. >>> >>> >>>> i will try an delete teh AZ filter before m1 if others dont. >>>> >>> >>> OK. >>> >>> >>>> > >>>> > >>>> > > That woudl allow operators to choose the desired behavior. >>>> > > curret behavior (disable weigher and dont enabel prefilter) >>>> > > new default, prefer current AZ (weigher enabeld prefilter disabled) >>>> > > hard affintiy(prefilter enabled.) >>>> > > >>>> > > there are other ways to approch this but updating the request spec >>>> is not >>>> > > one of them. >>>> > > we have to maintain the fact the enduser did not request an AZ. >>>> > > >>>> > > >>>> > Anyway, if folks want to discuss about AZs, this week is the good >>>> time :-) >>>> > >>>> > >>>> > > > >>>> > > > -Sylvain >>>> > > > >>>> > > > >>>> > > > >>>> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >>>> > > nguyenhuukhoinw at gmail.com> >>>> > > > > wrote: >>>> > > > > >>>> > > > > > Hello guys. >>>> > > > > > I playing with Nova AZ and Masakari >>>> > > > > > >>>> > > > > > >>>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>>> > > > > > >>>> > > > > > Masakari will move server by nova scheduler. >>>> > > > > > >>>> > > > > > Openstack Docs describe that: >>>> > > > > > >>>> > > > > > If the server was not created in a specific zone then it is >>>> free to >>>> > > be >>>> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >>>> > > > > > < >>>> > > >>>> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >>>> > >>>> > > is >>>> > > > > > a no-op. >>>> > > > > > >>>> > > > > > I see that everyone usually creates instances with "Any >>>> Availability >>>> > > > > > Zone" on Horzion and also we don't specify AZ when creating >>>> > > instances by >>>> > > > > > cli. >>>> > > > > > >>>> > > > > > By this way, when we use Masakari or we miragrated instances( >>>> or >>>> > > > > > evacuate) so our instance will be moved to other zones. >>>> > > > > > >>>> > > > > > Can we attach AZ to server create requests API based on Any >>>> > > > > > Availability Zone to limit instances moved to other zones? >>>> > > > > > >>>> > > > > > Thank you. Regards >>>> > > > > > >>>> > > > > > Nguyen Huu Khoi >>>> > > > > > >>>> > > > > >>>> > > > > >>>> > > > > -- >>>> > > > > Rafael Weing?rtner >>>> > > > > >>>> > > >>>> > > >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 29 10:32:26 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 12:32:26 +0200 Subject: [nova][ptg] Today's agenda Message-ID: (just shamelessly stealing the idea from Neutron's team) Hey foks, Yesterday was a packed day but we didn't really progressed on a lot of topics. Today I'm gonna propose a list of topics in order to improve our sessions's visibility and in order to provide some timeboxing. 13:00 UTC - 14:45 UTC : * How to make sure people can help to review ? 
* Should we ask for some implementation before accepting a spec ? * CI stability is a nightmare, let's fight over this * Bobcat is a non-SLURP release * Let's clean up our upgrade documentation * Nova community outreach * Clean-up our bug list by abandoning very old LP bug reports ? * Summit/PTG : what could we be doing for the physical PTG ? (Will be 4 weeks before milestone-2) 15:00 UTC - 15:45 UTC : * Nova/Manila cross-project session : Prevent share deletion while it's attached to an instance 16:00 UTC - 17:00 UTC : * When your instance is stuck due to hard affinity policies, what could we do ? * Users reported exhaustion of primary keys ('id') in some large tables like system_metadata. How could we achieve a data migration from sa.Integer to sa.BigInteger ? Details can be found in https://etherpad.opendev.org/p/nova-bobcat-ptg#L192 I assume the packed agenda, particularly the first two hours. As a reminder, people are welcome to add their IRC nicks in each of the courtesy ping list of the related topic. Hope this gives you a taste of joining the Nova PTG today. -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenhuukhoinw at gmail.com Wed Mar 29 12:38:05 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 29 Mar 2023 19:38:05 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: Yes. Thanks, but the things I would like to know: after instances are created, how do we know if it was launched with specified AZ or without it? I mean the way to distinguish between specified instances and non specified instances? Nguyen Huu Khoi On Wed, Mar 29, 2023 at 5:05?PM Sylvain Bauza wrote: > > > Le mer. 29 mars 2023 ? 08:06, Nguy?n H?u Kh?i > a ?crit : > >> Hello. >> I have one question. >> Follow this >> >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> >> If the server was not created in a specific zone then it is free to be >> moved to other zones. but when I use >> >> openstack server show [server id] >> >> I still see the "OS-EXT-AZ:availability_zone" value belonging to my >> instance. >> >> > Correct, this is normal. If the operators creates some AZs, then the > enduser should see where the instance in which AZ. > > >> Could you tell the difference which causes "if the server was not >> created in a specific zone then it is free to be moved to other zones." >> >> > To be clear, an operator can create Availability Zones. Those AZs can then > be seen by an enduser using the os-availability-zones API [1]. Then, either > the enduser wants to use a specific AZ for their next instance creation > (and if so, he/she adds --availability-zone parameter to their instance > creation client) or they don't want and then they don't provide this > parameter. > > If they provide this parameter, then the server will be created only in > one host in the specific AZ and then when moving the instance later, it > will continue to move to any host within the same AZ. > If they *don't* provide this parameter, then depending on the > default_schedule_zone config option, either the instance will eventually > use a specific AZ (and then it's like if the enduser was asking for this > AZ), or none of AZ is requested and then the instance can be created and > moved between any hosts within *all* AZs. 
> > That being said, as I said earlier, the enduser can still verify the AZ > from where the instance is by the server show parameter you told. > > We also have a documentation explaining about Availability Zones, maybe > this would help you more to understand about AZs : > https://docs.openstack.org/nova/latest/admin/availability-zones.html > > > [1] > https://docs.openstack.org/api-ref/compute/#availability-zones-os-availability-zone > (tbc, the enduser won't see the hosts, but they can see the list of > existing AZs) > > > >> Nguyen Huu Khoi >> >> >> On Mon, Mar 27, 2023 at 8:37?PM Nguy?n H?u Kh?i < >> nguyenhuukhoinw at gmail.com> wrote: >> >>> Hello guys. >>> >>> I just suggest to openstack nova works better. My story because >>> >>> >>> 1. >>> >>> The server was created in a specific zone with the POST /servers request >>> containing the availability_zone parameter. >>> >>> It will be nice when we attach randow zone when we create instances then >>> It will only move to the same zone when migrating or masakari ha. >>> >>> Currently we can force it to zone by default zone shedule in nova.conf. >>> >>> Sorry because I am new to Openstack and I am just an operator. I try to >>> verify some real cases. >>> >>> >>> >>> Nguyen Huu Khoi >>> >>> >>> On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza wrote: >>> >>>> >>>> >>>> Le lun. 27 mars 2023 ? 14:28, Sean Mooney a >>>> ?crit : >>>> >>>>> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >>>>> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a >>>>> ?crit : >>>>> > >>>>> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >>>>> > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < >>>>> > > > rafaelweingartner at gmail.com> a ?crit : >>>>> > > > >>>>> > > > > Hello Nguy?n H?u Kh?i, >>>>> > > > > You might want to take a look at: >>>>> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We >>>>> created a >>>>> > > patch >>>>> > > > > to avoid migrating VMs to any AZ, once the VM has been >>>>> bootstrapped in >>>>> > > an >>>>> > > > > AZ that has cross zone attache equals to false. >>>>> > > > > >>>>> > > > > >>>>> > > > Well, I'll provide some comments in the change, but I'm afraid >>>>> we can't >>>>> > > > just modify the request spec like you would want. >>>>> > > > >>>>> > > > Anyway, if you want to discuss about it in the vPTG, just add it >>>>> in the >>>>> > > > etherpad and add your IRC nick so we could try to find a time >>>>> where we >>>>> > > > could be discussing it : >>>>> https://etherpad.opendev.org/p/nova-bobcat-ptg >>>>> > > > Also, this kind of behaviour modification is more a new feature >>>>> than a >>>>> > > > bugfix, so fwiw you should create a launchpad blueprint so we >>>>> could >>>>> > > better >>>>> > > > see it. >>>>> > > >>>>> > > i tought i left review feedback on that too that the approch was >>>>> not >>>>> > > correct. >>>>> > > i guess i did not in the end. >>>>> > > >>>>> > > modifying the request spec as sylvain menthioned is not correct. >>>>> > > i disucssed this topic on irc a few weeks back with mohomad for >>>>> vxhost. >>>>> > > what can be done is as follows. >>>>> > > >>>>> > > we can add a current_az field to the Destination object >>>>> > > >>>>> > > >>>>> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >>>>> > > The conductor can read the instance.AZ and populate it in that new >>>>> field. >>>>> > > We can then add a new weigher to prefer hosts that are in the same >>>>> az. 
>>>>> > > >>>>> > > >>>>> > >>>>> > I tend to disagree this approach as people would think that the >>>>> > Destination.az field would be related to the current AZ for an >>>>> instance, >>>>> > while we only look at the original AZ. >>>>> > That being said, we could have a weigher that would look at whether >>>>> the >>>>> > host is in the same AZ than the instance.host. >>>>> you miss understood what i wrote >>>>> >>>>> i suggested addint Destination.current_az to store teh curernt AZ of >>>>> the instance before scheduling. >>>>> >>>>> so my proposal is if RequestSpec.AZ is not set and >>>>> Destination.current_az is set then the new >>>>> weigher would prefer hosts that are in the same az as >>>>> Destination.current_az >>>>> >>>>> we coudl also call Destination.current_az Destination.prefered_az >>>>> >>>>> >>>> I meant, I think we don't need to provide a new field, we can already >>>> know about what host an existing instance uses if we want (using [1]) >>>> Anyway, let's stop to discuss about it here, we should rather review >>>> that for a Launchpad blueprint or more a spec. >>>> >>>> -Sylvain >>>> >>>> [1] >>>> https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 >>>> >>>>> > >>>>> > >>>>> > This will provide soft AZ affinity for the vm and preserve the fact >>>>> that if >>>>> > > a vm is created without sepcifying >>>>> > > An AZ the expectaiton at the api level woudl be that it can >>>>> migrate to any >>>>> > > AZ. >>>>> > > >>>>> > > To provide hard AZ affintiy we could also add prefileter that >>>>> would use >>>>> > > the same data but instead include it in the >>>>> > > placement query so that only the current AZ is considered. This >>>>> would have >>>>> > > to be disabled by default. >>>>> > > >>>>> > > >>>>> > Sure, we could create a new prefilter so we could then deprecate the >>>>> > AZFilter if we want. >>>>> we already have an AZ prefilter and the AZFilter is deprecate for >>>>> removal >>>>> i ment to delete it in zed but did not have time to do it in zed of >>>>> Antielope >>>>> i deprecated the AZ| filter in >>>>> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >>>>> xena when i enabeld the az prefilter by default. >>>>> >>>>> >>>> Ah whoops, indeed I forgot the fact we already have the prefilter, so >>>> the hard support for AZ is already existing. >>>> >>>> >>>>> i will try an delete teh AZ filter before m1 if others dont. >>>>> >>>> >>>> OK. >>>> >>>> >>>>> > >>>>> > >>>>> > > That woudl allow operators to choose the desired behavior. >>>>> > > curret behavior (disable weigher and dont enabel prefilter) >>>>> > > new default, prefer current AZ (weigher enabeld prefilter disabled) >>>>> > > hard affintiy(prefilter enabled.) >>>>> > > >>>>> > > there are other ways to approch this but updating the request spec >>>>> is not >>>>> > > one of them. >>>>> > > we have to maintain the fact the enduser did not request an AZ. >>>>> > > >>>>> > > >>>>> > Anyway, if folks want to discuss about AZs, this week is the good >>>>> time :-) >>>>> > >>>>> > >>>>> > > > >>>>> > > > -Sylvain >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >>>>> > > nguyenhuukhoinw at gmail.com> >>>>> > > > > wrote: >>>>> > > > > >>>>> > > > > > Hello guys. 
>>>>> > > > > > I playing with Nova AZ and Masakari >>>>> > > > > > >>>>> > > > > > >>>>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>>>> > > > > > >>>>> > > > > > Masakari will move server by nova scheduler. >>>>> > > > > > >>>>> > > > > > Openstack Docs describe that: >>>>> > > > > > >>>>> > > > > > If the server was not created in a specific zone then it is >>>>> free to >>>>> > > be >>>>> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >>>>> > > > > > < >>>>> > > >>>>> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >>>>> > >>>>> > > is >>>>> > > > > > a no-op. >>>>> > > > > > >>>>> > > > > > I see that everyone usually creates instances with "Any >>>>> Availability >>>>> > > > > > Zone" on Horzion and also we don't specify AZ when creating >>>>> > > instances by >>>>> > > > > > cli. >>>>> > > > > > >>>>> > > > > > By this way, when we use Masakari or we miragrated >>>>> instances( or >>>>> > > > > > evacuate) so our instance will be moved to other zones. >>>>> > > > > > >>>>> > > > > > Can we attach AZ to server create requests API based on Any >>>>> > > > > > Availability Zone to limit instances moved to other zones? >>>>> > > > > > >>>>> > > > > > Thank you. Regards >>>>> > > > > > >>>>> > > > > > Nguyen Huu Khoi >>>>> > > > > > >>>>> > > > > >>>>> > > > > >>>>> > > > > -- >>>>> > > > > Rafael Weing?rtner >>>>> > > > > >>>>> > > >>>>> > > >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From elod.illes at est.tech Wed Mar 29 13:24:09 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Wed, 29 Mar 2023 13:24:09 +0000 Subject: [PTL][release][stable][EM] Extended Maintenance - Xena Message-ID: Hi teams, As 2023.1 Antelope was released last week and we are in a less busy period, now is the good time to call your attention to the following: In less than a month Xena is planned to transition to Extended Maintenance phase [1] (planned date: 2023-04-20). I have generated the list of the current *open* and *unreleased* changes in stable/xena for every repositories [2] (where there are such patches). These lists could help the teams who are planning to do a *final* release on Xena before moving stable/xena branches to Extended Maintenance. Feel free to edit and extend these lists to track your team's progress! Note that the *latest* Xena *release* tagging patches ('xena-em' tag) have been generated too in advance [3], please mark with a -1 if your team plans to do a final release, or +1 if the team is ready for the transition. The schedule from now on is as follows: * patches with +1 from PTL / release liaison will be merged, thus those repositories will transition to Extended Maintenance * at the planned deadline (April 20th) the Release Team will merge all of the transition patches (even the ones without any response!) * after the transition, stable/xena will be still open for bug fixes, but there won't be official releases anymore. *NOTE*: teams, please focus on wrapping up your libraries first if there is any concern about the changes, in order to avoid broken (final!!) releases! [1] https://releases.openstack.org/ [2] https://etherpad.opendev.org/p/xena-final-release-before-em [3] https://review.opendev.org/q/topic:xena-em Thanks, El?d irc: elodilles @ #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... 
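For teams trying to decide whether a final Xena release is worth cutting, a quick way (just a sketch, not the official release tooling) to see what is merged but not yet released on a given repository is to compare the branch against its latest tag:

  $ git fetch origin stable/xena
  $ git log --oneline $(git describe --tags --abbrev=0 origin/stable/xena)..origin/stable/xena

Open, unmerged backports can be listed in Gerrit with a query such as https://review.opendev.org/q/status:open+branch:stable/xena+project:openstack/<project> (the project name is a placeholder).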
URL: From nguyenhuukhoinw at gmail.com Wed Mar 29 13:26:26 2023 From: nguyenhuukhoinw at gmail.com (=?UTF-8?B?Tmd1eeG7hW4gSOG7r3UgS2jDtGk=?=) Date: Wed, 29 Mar 2023 20:26:26 +0700 Subject: [horizon][nova][masakari] Instances created with "Any AZ" problem In-Reply-To: References: <173abaa4b89efc8594b08c1c256bc873f3192828.camel@redhat.com> Message-ID: "If they *don't* provide this parameter, then depending on the default_schedule_zone config option, either the instance will eventually use a specific AZ (and then it's like if the enduser was asking for this AZ), or none of AZ is requested and then the instance can be created and moved between any hosts within *all* AZs." I ask aftet that, although without az when launch instances but they still have az. But i still mv to diffent host in diffent az when mirgrating or spawn which masakari. i am not clear, I tested. On Wed, Mar 29, 2023, 7:38 PM Nguy?n H?u Kh?i wrote: > Yes. Thanks, but the things I would like to know: after instances are > created, how do we know if it was launched with specified AZ or without it? > I mean the way to distinguish between specified instances and non specified > instances? > > Nguyen Huu Khoi > > > On Wed, Mar 29, 2023 at 5:05?PM Sylvain Bauza wrote: > >> >> >> Le mer. 29 mars 2023 ? 08:06, Nguy?n H?u Kh?i >> a ?crit : >> >>> Hello. >>> I have one question. >>> Follow this >>> >>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>> >>> If the server was not created in a specific zone then it is free to be >>> moved to other zones. but when I use >>> >>> openstack server show [server id] >>> >>> I still see the "OS-EXT-AZ:availability_zone" value belonging to my >>> instance. >>> >>> >> Correct, this is normal. If the operators creates some AZs, then the >> enduser should see where the instance in which AZ. >> >> >>> Could you tell the difference which causes "if the server was not >>> created in a specific zone then it is free to be moved to other zones." >>> >>> >> To be clear, an operator can create Availability Zones. Those AZs can >> then be seen by an enduser using the os-availability-zones API [1]. Then, >> either the enduser wants to use a specific AZ for their next instance >> creation (and if so, he/she adds --availability-zone parameter to their >> instance creation client) or they don't want and then they don't provide >> this parameter. >> >> If they provide this parameter, then the server will be created only in >> one host in the specific AZ and then when moving the instance later, it >> will continue to move to any host within the same AZ. >> If they *don't* provide this parameter, then depending on the >> default_schedule_zone config option, either the instance will eventually >> use a specific AZ (and then it's like if the enduser was asking for this >> AZ), or none of AZ is requested and then the instance can be created and >> moved between any hosts within *all* AZs. >> >> That being said, as I said earlier, the enduser can still verify the AZ >> from where the instance is by the server show parameter you told. 
>> >> We also have a documentation explaining about Availability Zones, maybe >> this would help you more to understand about AZs : >> https://docs.openstack.org/nova/latest/admin/availability-zones.html >> >> >> [1] >> https://docs.openstack.org/api-ref/compute/#availability-zones-os-availability-zone >> (tbc, the enduser won't see the hosts, but they can see the list of >> existing AZs) >> >> >> >>> Nguyen Huu Khoi >>> >>> >>> On Mon, Mar 27, 2023 at 8:37?PM Nguy?n H?u Kh?i < >>> nguyenhuukhoinw at gmail.com> wrote: >>> >>>> Hello guys. >>>> >>>> I just suggest to openstack nova works better. My story because >>>> >>>> >>>> 1. >>>> >>>> The server was created in a specific zone with the POST /servers request >>>> containing the availability_zone parameter. >>>> >>>> It will be nice when we attach randow zone when we create instances >>>> then It will only move to the same zone when migrating or masakari ha. >>>> >>>> Currently we can force it to zone by default zone shedule in nova.conf. >>>> >>>> Sorry because I am new to Openstack and I am just an operator. I try to >>>> verify some real cases. >>>> >>>> >>>> >>>> Nguyen Huu Khoi >>>> >>>> >>>> On Mon, Mar 27, 2023 at 7:43?PM Sylvain Bauza >>>> wrote: >>>> >>>>> >>>>> >>>>> Le lun. 27 mars 2023 ? 14:28, Sean Mooney a >>>>> ?crit : >>>>> >>>>>> On Mon, 2023-03-27 at 14:06 +0200, Sylvain Bauza wrote: >>>>>> > Le lun. 27 mars 2023 ? 13:51, Sean Mooney a >>>>>> ?crit : >>>>>> > >>>>>> > > On Mon, 2023-03-27 at 10:19 +0200, Sylvain Bauza wrote: >>>>>> > > > Le dim. 26 mars 2023 ? 14:30, Rafael Weing?rtner < >>>>>> > > > rafaelweingartner at gmail.com> a ?crit : >>>>>> > > > >>>>>> > > > > Hello Nguy?n H?u Kh?i, >>>>>> > > > > You might want to take a look at: >>>>>> > > > > https://review.opendev.org/c/openstack/nova/+/864760. We >>>>>> created a >>>>>> > > patch >>>>>> > > > > to avoid migrating VMs to any AZ, once the VM has been >>>>>> bootstrapped in >>>>>> > > an >>>>>> > > > > AZ that has cross zone attache equals to false. >>>>>> > > > > >>>>>> > > > > >>>>>> > > > Well, I'll provide some comments in the change, but I'm afraid >>>>>> we can't >>>>>> > > > just modify the request spec like you would want. >>>>>> > > > >>>>>> > > > Anyway, if you want to discuss about it in the vPTG, just add >>>>>> it in the >>>>>> > > > etherpad and add your IRC nick so we could try to find a time >>>>>> where we >>>>>> > > > could be discussing it : >>>>>> https://etherpad.opendev.org/p/nova-bobcat-ptg >>>>>> > > > Also, this kind of behaviour modification is more a new feature >>>>>> than a >>>>>> > > > bugfix, so fwiw you should create a launchpad blueprint so we >>>>>> could >>>>>> > > better >>>>>> > > > see it. >>>>>> > > >>>>>> > > i tought i left review feedback on that too that the approch was >>>>>> not >>>>>> > > correct. >>>>>> > > i guess i did not in the end. >>>>>> > > >>>>>> > > modifying the request spec as sylvain menthioned is not correct. >>>>>> > > i disucssed this topic on irc a few weeks back with mohomad for >>>>>> vxhost. >>>>>> > > what can be done is as follows. >>>>>> > > >>>>>> > > we can add a current_az field to the Destination object >>>>>> > > >>>>>> > > >>>>>> https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1092-L1122 >>>>>> > > The conductor can read the instance.AZ and populate it in that >>>>>> new field. >>>>>> > > We can then add a new weigher to prefer hosts that are in the >>>>>> same az. 
>>>>>> > > >>>>>> > > >>>>>> > >>>>>> > I tend to disagree this approach as people would think that the >>>>>> > Destination.az field would be related to the current AZ for an >>>>>> instance, >>>>>> > while we only look at the original AZ. >>>>>> > That being said, we could have a weigher that would look at whether >>>>>> the >>>>>> > host is in the same AZ than the instance.host. >>>>>> you miss understood what i wrote >>>>>> >>>>>> i suggested addint Destination.current_az to store teh curernt AZ of >>>>>> the instance before scheduling. >>>>>> >>>>>> so my proposal is if RequestSpec.AZ is not set and >>>>>> Destination.current_az is set then the new >>>>>> weigher would prefer hosts that are in the same az as >>>>>> Destination.current_az >>>>>> >>>>>> we coudl also call Destination.current_az Destination.prefered_az >>>>>> >>>>>> >>>>> I meant, I think we don't need to provide a new field, we can already >>>>> know about what host an existing instance uses if we want (using [1]) >>>>> Anyway, let's stop to discuss about it here, we should rather review >>>>> that for a Launchpad blueprint or more a spec. >>>>> >>>>> -Sylvain >>>>> >>>>> [1] >>>>> https://github.com/openstack/nova/blob/b9a49ffb04cb5ae2d8c439361a3552296df02988/nova/scheduler/host_manager.py#L369-L370 >>>>> >>>>>> > >>>>>> > >>>>>> > This will provide soft AZ affinity for the vm and preserve the fact >>>>>> that if >>>>>> > > a vm is created without sepcifying >>>>>> > > An AZ the expectaiton at the api level woudl be that it can >>>>>> migrate to any >>>>>> > > AZ. >>>>>> > > >>>>>> > > To provide hard AZ affintiy we could also add prefileter that >>>>>> would use >>>>>> > > the same data but instead include it in the >>>>>> > > placement query so that only the current AZ is considered. This >>>>>> would have >>>>>> > > to be disabled by default. >>>>>> > > >>>>>> > > >>>>>> > Sure, we could create a new prefilter so we could then deprecate the >>>>>> > AZFilter if we want. >>>>>> we already have an AZ prefilter and the AZFilter is deprecate for >>>>>> removal >>>>>> i ment to delete it in zed but did not have time to do it in zed of >>>>>> Antielope >>>>>> i deprecated the AZ| filter in >>>>>> https://github.com/openstack/nova/commit/7c7a2a142d74a7deeda2a79baf21b689fe32cd08 >>>>>> xena when i enabeld the az prefilter by default. >>>>>> >>>>>> >>>>> Ah whoops, indeed I forgot the fact we already have the prefilter, so >>>>> the hard support for AZ is already existing. >>>>> >>>>> >>>>>> i will try an delete teh AZ filter before m1 if others dont. >>>>>> >>>>> >>>>> OK. >>>>> >>>>> >>>>>> > >>>>>> > >>>>>> > > That woudl allow operators to choose the desired behavior. >>>>>> > > curret behavior (disable weigher and dont enabel prefilter) >>>>>> > > new default, prefer current AZ (weigher enabeld prefilter >>>>>> disabled) >>>>>> > > hard affintiy(prefilter enabled.) >>>>>> > > >>>>>> > > there are other ways to approch this but updating the request >>>>>> spec is not >>>>>> > > one of them. >>>>>> > > we have to maintain the fact the enduser did not request an AZ. >>>>>> > > >>>>>> > > >>>>>> > Anyway, if folks want to discuss about AZs, this week is the good >>>>>> time :-) >>>>>> > >>>>>> > >>>>>> > > > >>>>>> > > > -Sylvain >>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > > On Sun, Mar 26, 2023 at 8:20?AM Nguy?n H?u Kh?i < >>>>>> > > nguyenhuukhoinw at gmail.com> >>>>>> > > > > wrote: >>>>>> > > > > >>>>>> > > > > > Hello guys. 
>>>>>> > > > > > I playing with Nova AZ and Masakari >>>>>> > > > > > >>>>>> > > > > > >>>>>> https://docs.openstack.org/nova/latest/admin/availability-zones.html >>>>>> > > > > > >>>>>> > > > > > Masakari will move server by nova scheduler. >>>>>> > > > > > >>>>>> > > > > > Openstack Docs describe that: >>>>>> > > > > > >>>>>> > > > > > If the server was not created in a specific zone then it is >>>>>> free to >>>>>> > > be >>>>>> > > > > > moved to other zones, i.e. the AvailabilityZoneFilter >>>>>> > > > > > < >>>>>> > > >>>>>> https://docs.openstack.org/nova/latest/admin/scheduling.html#availabilityzonefilter >>>>>> > >>>>>> > > is >>>>>> > > > > > a no-op. >>>>>> > > > > > >>>>>> > > > > > I see that everyone usually creates instances with "Any >>>>>> Availability >>>>>> > > > > > Zone" on Horzion and also we don't specify AZ when creating >>>>>> > > instances by >>>>>> > > > > > cli. >>>>>> > > > > > >>>>>> > > > > > By this way, when we use Masakari or we miragrated >>>>>> instances( or >>>>>> > > > > > evacuate) so our instance will be moved to other zones. >>>>>> > > > > > >>>>>> > > > > > Can we attach AZ to server create requests API based on Any >>>>>> > > > > > Availability Zone to limit instances moved to other zones? >>>>>> > > > > > >>>>>> > > > > > Thank you. Regards >>>>>> > > > > > >>>>>> > > > > > Nguyen Huu Khoi >>>>>> > > > > > >>>>>> > > > > >>>>>> > > > > >>>>>> > > > > -- >>>>>> > > > > Rafael Weing?rtner >>>>>> > > > > >>>>>> > > >>>>>> > > >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 29 14:49:32 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 16:49:32 +0200 Subject: [nova] openstack-tox-pep8 job broken, hold your rechecks In-Reply-To: References: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> Message-ID: Fix is merged, everyone can recheck (with a written reason ;) ) Le mer. 29 mars 2023 ? 11:49, Sylvain Bauza a ?crit : > Heh, you missed my email ;-) > > > https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032979.html > > No worries tho :) > > Le mar. 28 mars 2023 ? 23:11, melanie witt a ?crit : > >> Hey all, >> >> Sending this to the ML to try to help others who may not have heard >> about it yet. I didn't know about myself until I saw the job fail on a >> random nova patch and I looked around to figure out why. >> >> It looks like openstack-tox-pep8 is failing 100% in nova due to a newer >> version of the mypy library being pulled in since a recent >> upper-constraints bump: >> >> https://review.opendev.org/c/openstack/requirements/+/872065 >> >> And the job isn't going to pass until the following fix merges: >> >> https://review.opendev.org/c/openstack/nova/+/878693 >> >> Finally, to help prevent a breakage for this in the future, a >> cross-nova-pep8 job has been approved: >> >> https://review.opendev.org/c/openstack/requirements/+/878748 >> >> and will be on its way to the gate once the aforementioned fix merges. >> >> I'll post a reply to this email when it's OK to do rechecks again. >> >> Cheers, >> -melwitt >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Wed Mar 29 14:51:10 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Wed, 29 Mar 2023 16:51:10 +0200 Subject: [nova] Hold your rechecks In-Reply-To: References: Message-ID: Le lun. 27 mars 2023 ? 
17:28, Sylvain Bauza a ?crit : > Hey, > > Due to the recent merge of > https://review.opendev.org/c/openstack/requirements/+/872065/10/upper-constraints.txt#298 > we now use mypy==1.1.1 which includes a breaking behavioural change against > our code : > > https://07de6a0c9e6ec0c6835f-ccccbfab26b1456f69293167016566bc.ssl.cf2.rackcdn.com/875621/10/gate/openstack-tox-pep8/e50f9f0/job-output.txt > > Thanks to Eric (kudos to him, he was quickier than me), we have a fix > https://review.opendev.org/c/openstack/nova/+/878693 > > Please accordingly hold your rechecks until that fix is merged. > > Aaaaand this is done (after a few fights against CI failures). You can now recheck with a reason like : "recheck mypy upgrade issue is fixed by Ie50c8d364ad9c339355cc138b560ec4df14fe307 " -Sylvain > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Mar 29 15:15:33 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 29 Mar 2023 08:15:33 -0700 Subject: [ptl][tc] OpenStack packages PyPi additional external maintainers audit & cleanup In-Reply-To: <18709ff76be.10ad4bda1984477.2001967889741209449@ghanshyammann.com> References: <185d18a20aa.1206b91ad115363.5205111285046207324@ghanshyammann.com> <18709ff76be.10ad4bda1984477.2001967889741209449@ghanshyammann.com> Message-ID: <1872df04490.f676662973065.8836276543901283423@ghanshyammann.com> Hi Everyone, Posting top of the email. I am listing the projects that have not updated the status in etherpad; if you have any progress, please write in etherpad. If not request you to plan the same while in vPTG? - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L43 * adjutant * barbican * cloudkitty * cyborg * designate * ec2-api * freezer * heat * kuryr * mistral * monasca * murano * octavia * OpenStackSDK * oslo * rally * Release Management * requirements * sahara * senlin * skyline * solum * storlets * swift * tacker * Telemetry * trove * vitrage * watcher * winstackers * zaqar * zun -gmann ---- On Wed, 22 Mar 2023 08:45:49 -0700 Ghanshyam Mann wrote --- > ---- On Fri, 20 Jan 2023 15:36:08 -0800 Ghanshyam Mann wrote --- > > Hi PTLs, > > > > As you might know or have seen for your project package on PyPi, OpenStack deliverables on PyPi have > > additional maintainers, For example, https://pypi.org/project/murano/, https://pypi.org/project/glance/ > > > > We should keep only 'openstackci' as a maintainer in PyPi so that releases of OpenStack deliverables > > can be managed in a single place. Otherwise, we might face the two sets of maintainers' places and > > packages might get released in PyPi by additional maintainers without the OpenStack project team > > knowing about it. One such case is in Horizon repo 'xstatic-font-awesome' where a new maintainer is > > added by an existing additional maintainer and this package was released without the Horizon team > > knowing about the changes and release. > > - https://github.com/openstack/xstatic-font-awesome/pull/2 > > > > To avoid the 'xstatic-font-awesome' case for other packages, TC discussed it in their weekly meetings[1] > > and agreed to audit all the OpenStack packages and then clean up the additional maintainers in PyPi > > (keep only 'openstackci' as maintainers). > > > > To help in this task, TC requests project PTL to perform the audit for their project's repo and add comments > > in the below etherpad. 
> > > > - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup > > Hello Everyone, > > To update, there is an extra step for project PTLs in this task: > > * Step 1.1: Project PTL/team needs to communicate to the additional maintainers about removing themselves > and transferring ownership to 'openstackci' > - https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup#L23 > > Initially, TC thought we could do a cleanup with the help of openstackci admin for all repo. But, to avoid any issue > or misunderstanding/panic among additional maintainers on removal, it is better that projects communicate with > additional maintainers and ask them to remove themself. JayF sent the email format to communicate to additional > maintainers[1]. Please use that and let TC know if any queries/issues you are facing. > > [1] https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032780.html > > -gmann > > > > > Thanks to knikolla to automate the listing of the OpenStack packages with additional maintainers in PyPi which > > you can find the result in output.txt at the bottom of this link. I have added the project list of who needs to check > > their repo in etherpad. > > > > - https://gist.github.com/knikolla/7303a65a5ddaa2be553fc6e54619a7a1 > > > > Please complete the audit for your project before March 15 so that TC can discuss the next step in vPTG. > > > > [1] https://meetings.opendev.org/meetings/tc/2023/tc.2023-01-11-16.00.log.html#l-41 > > > > > > -gmann > > > > > > From melwittt at gmail.com Wed Mar 29 16:02:34 2023 From: melwittt at gmail.com (melanie witt) Date: Wed, 29 Mar 2023 09:02:34 -0700 Subject: [nova] openstack-tox-pep8 job broken, hold your rechecks In-Reply-To: References: <481ba0c8-7a20-23dd-f579-955c4d0c83b9@gmail.com> Message-ID: <3f221938-ef0d-b957-d4a9-be4bac28e402@gmail.com> On 03/29/23 02:49, Sylvain Bauza wrote: > Heh, you missed my email ;-) > > https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032979.html > > No worries tho :) Ugh, sorry. I searched for "[nova]" and skimmed for the words "broken" or "gate" or "CI" and managed to miss it accordingly. Sorry about that. > Le?mar. 28 mars 2023 ??23:11, melanie witt > a ?crit?: > > Hey all, > > Sending this to the ML to try to help others who may not have heard > about it yet. I didn't know about myself until I saw the job fail on a > random nova patch and I looked around to figure out why. > > It looks like openstack-tox-pep8 is failing 100% in nova due to a newer > version of the mypy library being pulled in since a recent > upper-constraints bump: > > https://review.opendev.org/c/openstack/requirements/+/872065 > > > And the job isn't going to pass until the following fix merges: > > https://review.opendev.org/c/openstack/nova/+/878693 > > > Finally, to help prevent a breakage for this in the future, a > cross-nova-pep8 job has been approved: > > https://review.opendev.org/c/openstack/requirements/+/878748 > > > and will be on its way to the gate once the aforementioned fix merges. > > I'll post a reply to this email when it's OK to do rechecks again. > > Cheers, > -melwitt > From ihrachys at redhat.com Wed Mar 29 16:45:26 2023 From: ihrachys at redhat.com (Ihar Hrachyshka) Date: Wed, 29 Mar 2023 12:45:26 -0400 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: <3840757.STTH5IQzZg@p1> Message-ID: To close the loop, We had a very productive discussion of the topic during vPTG today. 
Some of it is captured here: https://etherpad.opendev.org/p/neutron-bobcat-ptg#L207 and below. Here is the brief plus next steps. In regards to api-ref definitions for stateless SG: - it is agreed that it should explain the semantics and not only mechanics of API fields; - it is agreed that it should explain behavior of basic network services; - it is agreed that basic network services that are expected to work by default are things like ARP, DHCP; while metadata service is not; - this will mimic what OVS implementation of stateless SG already does; - it is agreed that these basic services that are expected to work will work transparently, meaning no SG rules will be visible for them; - this will mimic OVS implementation too. Next steps: - update api-ref stateless SG description to capture decisions above; - update my neutron patch series to exclude metadata enablement; - adjust tempest scenarios for stateless SG to not create explicit SG rules for DHCPv6 stateless (there are already patches for that); - clean up Launchpad bugs as per decisions above. I will take care of the above in next days. Thanks everyone, Ihar On Wed, Mar 22, 2023 at 12:55?PM Ihar Hrachyshka wrote: > > On Tue, Mar 21, 2023 at 12:07?PM Rodolfo Alonso Hernandez > wrote: > > > > Hello: > > > > I agree with having a single API meaning for all backends. We currently support stateless SGs in iptables and ML2/OVN and both backends provide the same behaviour: a rule won't create an opposite direction counterpart by default, the user needs to define it explicitly. > > Thanks for this, I didn't realize that iptables may be considered prior art. > > > > > The discussion here could be the default behaviour for standard services: > > * DHCP service is currently supported in iptables, native OVS and OVN. This should be supported even without any rule allowed (as is now). Of course, we need to explicitly document that. > > * DHCPv6 [1]: unlike Slawek, I'm in favor of allowing this traffic by default, as part of the DHCP protocol traffic allowance. > > Agreed DHCPv6 rules are closer to "base" and that the argument for RA > / NA flows is stronger because of the parallel to DHCPv4 operation. > > > * Metadata service: this is not a network protocol and we should not consider it. Actually this service is working now (with stateful SGs) because of the default SG egress rules we add. So I'm not in favor of [2] > > At this point I am more ambivalent to the decision of whether to > include metadata into the list of "base" services, as long as we > define the list (behavior) in api-ref. But to address the point, since > Slawek leans to creating SG rules in Neutron API to handle ICMP > traffic necessary for RA / NA (which seems to have a merit and > internal logic) anyway, we could as well at this point create another > "default" rule for metadata replies. > > But - I will repeat - as long as a decision on what the list of "base" > services enabled for any SG by default is, I can live with metadata > out of the list. It may not be as convenient to users (which is my > concern), but that's probably a matter of taste in API design. > > BTW Rodolfo, thanks for allocating a time slot for this discussion at > vPTG. I hope we get to the bottom of it then. See you all next Wed > @13:00. (As per https://etherpad.opendev.org/p/neutron-bobcat-ptg) > > Ihar > > > > > Regards. 
> > > > [1]https://review.opendev.org/c/openstack/neutron/+/877049 > > [2]https://review.opendev.org/c/openstack/neutron/+/876659 > > > > On Mon, Mar 20, 2023 at 10:19?PM Ihar Hrachyshka wrote: > >> > >> On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski wrote: > >> > > >> > Hi, > >> > > >> > > >> > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > >> > > >> > > Hi all, > >> > > >> > > > >> > > >> > > (I've tagged the thread with [ovn] because this question was raised in > >> > > >> > > the context of OVN, but it really is about the intent of neutron > >> > > >> > > stateless SG API.) > >> > > >> > > > >> > > >> > > Neutron API supports 'stateless' field for security groups: > >> > > >> > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > >> > > >> > > > >> > > >> > > The API reference doesn't explain the intent of the API, merely > >> > > >> > > walking through the field mechanics, as in > >> > > >> > > > >> > > >> > > "The stateful security group extension (stateful-security-group) adds > >> > > >> > > the stateful field to security groups, allowing users to configure > >> > > >> > > stateful or stateless security groups for ports. The existing security > >> > > >> > > groups will all be considered as stateful. Update of the stateful > >> > > >> > > attribute is allowed when there is no port associated with the > >> > > >> > > security group." > >> > > >> > > > >> > > >> > > The meaning of the API is left for users to deduce. It's customary > >> > > >> > > understood as something like > >> > > >> > > > >> > > >> > > "allowing to bypass connection tracking in the firewall, potentially > >> > > >> > > providing performance and simplicity benefits" (while imposing > >> > > >> > > additional complexity onto rule definitions - the user now has to > >> > > >> > > explicitly define rules for both directions of a duplex connection.) > >> > > >> > > [This is not an official definition, nor it's quoted from a respected > >> > > >> > > source, please don't criticize it. I don't think this is an important > >> > > >> > > point here.] > >> > > >> > > > >> > > >> > > Either way, the definition doesn't explain what should happen with > >> > > >> > > basic network services that a user of Neutron SG API is used to rely > >> > > >> > > on. Specifically, what happens for a port related to a stateless SG > >> > > >> > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > >> > > >> > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > >> > > >> > > procedure to configure its IPv6 stack. > >> > > >> > > > >> > > >> > > As part of our testing of stateless SG implementation for OVN backend, > >> > > >> > > we've noticed that VMs fail to configure via metadata, or use SLAAC to > >> > > >> > > configure IPv6. > >> > > >> > > > >> > > >> > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 > >> > > >> > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 > >> > > >> > > > >> > > >> > > We've noticed that adding explicit SG rules to allow 'returning' > >> > > >> > > communication for 169.254.169.254:80 and RA / NA fixes the problem. > >> > > >> > > > >> > > >> > > I figured that these services are "base" / "basic" and should be > >> > > >> > > provided to ports regardless of the stateful-ness of SG. 
I proposed > >> > > >> > > patches for this here: > >> > > >> > > > >> > > >> > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 > >> > > >> > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 > >> > > >> > > > >> > > >> > > Discussion in the patch that adjusts the existing stateless SG test > >> > > >> > > scenarios to not create explicit SG rules for metadata and ICMP > >> > > >> > > replies suggests that it's not a given / common understanding that > >> > > >> > > these "base" services should work by default for stateless SGs. > >> > > >> > > > >> > > >> > > See discussion in comments here: > >> > > >> > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 > >> > > >> > > > >> > > >> > > While this discussion is happening in the context of OVN, I think it > >> > > >> > > should be resolved in a broader context. Specifically, a decision > >> > > >> > > should be made about what Neutron API "means" by stateless SGs, and > >> > > >> > > how "base" services are supposed to behave. Then backends can act > >> > > >> > > accordingly. > >> > > >> > > > >> > > >> > > There's also an open question of how this should be implemented. > >> > > >> > > Whether Neutron would like to create explicit SG rules visible in API > >> > > >> > > that would allow for the returning traffic and that could be deleted > >> > > >> > > as needed, or whether backends should do it implicitly. We already > >> > > >> > > have "default" egress rules, so there's a precedent here. On the other > >> > > >> > > hand, the egress rules are broad (allowing everything) and there's > >> > > >> > > more rationale to delete them and replace them with tighter filters. > >> > > >> > > In my OVN series, I implement ACLs directly in OVN database, without > >> > > >> > > creating SG rules in Neutron API. > >> > > >> > > > >> > > >> > > So, questions for the community to clarify: > >> > > >> > > - whether Neutron API should define behavior of stateless SGs in general, > >> > > >> > > - if so, whether Neutron API should also define behavior of stateless > >> > > >> > > SGs in terms of "base" services like metadata and DHCP, > >> > > >> > > - if so, whether backends should implement the necessary filters > >> > > >> > > themselves, or Neutron will create default SG rules itself. > >> > > >> > > >> > I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user. > >> > > >> > We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it. > >> > > >> > If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there? 
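As a concrete illustration of that question: with a stateless group today, the "return" half of such traffic has to be opened explicitly by the user, for example (names made up; the --stateless option needs a reasonably recent openstackclient):

    openstack security group create --stateless nostate
    openstack security group rule create --ingress --protocol tcp --remote-ip 169.254.169.254/32 nostate
    openstack security group rule create --ingress --ethertype IPv6 --protocol ipv6-icmp nostate

The first rule admits metadata replies (the SG API cannot match on source TCP port, so it is broader than strictly needed), the second admits RA/NA, assuming the backend does not already allow them transparently.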
> >> > > >> > >> Thanks for clarifying the rationale on picking SG rules and not > >> per-backend implementation. > >> > >> What would be your answer to the two other questions in the list > >> above, specifically, "whether Neutron API should define behavior of > >> stateless SGs in general" and "whether Neutron API should define > >> behavior of stateless SGs in relation to metadata / RA / NA". Once we > >> have agreement on these points, we can discuss the exact mechanism - > >> whether to implement in backend or in API. But these two questions are > >> first order in my view. > >> > >> (To give an idea of my thinking, I believe API definition should not > >> only define fields and their mechanics but also semantics, so > >> > >> - yes, api-ref should define the meaning ("behavior") of stateless SG > >> in general, and > >> - yes, api-ref should also define the meaning ("behavior") of > >> stateless SG in relation to "standard" services like ipv6 addressing > >> or metadata. > >> > >> As to the last question - whether it's up to ml2 backend to implement > >> the behavior, or up to the core SG database plugin - I don't have a > >> strong opinion. I lean to "backend" solution just because it allows > >> for more granular definition because SG rules may not express some > >> filter rules, e.g. source port for metadata replies (an unfortunate > >> limitation of SG API that we inherited from AWS?). But perhaps others > >> prefer paying the price for having neutron ml2 plugin enforcing the > >> behavior consistently across all backends. > >> > >> > > >> > > > >> > > >> > > I hope I laid the problem out clearly, let me know if anything needs > >> > > >> > > clarification or explanation. > >> > > >> > > >> > Yes :) At least for me. > >> > > >> > > >> > > > >> > > >> > > Yours, > >> > > >> > > Ihar > >> > > >> > > > >> > > >> > > > >> > > >> > > > >> > > >> > > >> > > >> > -- > >> > > >> > Slawek Kaplonski > >> > > >> > Principal Software Engineer > >> > > >> > Red Hat > >> > >> From posta at dnzydn.com Wed Mar 29 17:13:25 2023 From: posta at dnzydn.com (Deniz AYDIN) Date: Wed, 29 Mar 2023 20:13:25 +0300 Subject: [neutron] BGP for self-service network Message-ID: Hi, I am looking for options for removing Layer-2 in the underlay as much as possible. The features explained in the document, BGP floating IPs over L2 segmented network , solve the problem for floating IPS where layer 2 is only needed between servers and rack switches. Is there any specific reason that this feature is limited to floating IPS? As long as we have unique BGP next-hops defined for every DVR, it can also be used for advertising self-service networks /32 routes. Thanks for the help Deniz -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Wed Mar 29 18:05:35 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Wed, 29 Mar 2023 20:05:35 +0200 Subject: (Openstack-Nova) In-Reply-To: References: Message-ID: Hey there, If you're using an OpenStack-Ansible as a deployment tool, you can make an override like that in your user_varaibles.yml and run openstack-ansible os-nova-install.yml --limit nova_compute afterwards: nova_compute_init_overrides: Service: LimitNOFILE: 4096 ??, 28 ???. 2023??. ? 21:59, Michael Knox : > > Hi, > > This will be the OS you have rabbit running on. You will need to increase the ulimit. "ulimit -n" will provide the current limit for the installed OS and configuration. So you will need more than what's there. 
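For reference, outside of a deployment tool the same increase is normally applied with a systemd drop-in on whichever unit is actually hitting the limit; a minimal sketch, with an example unit name and value that are not taken from this thread:

    # /etc/systemd/system/nova-compute.service.d/limits.conf
    [Service]
    LimitNOFILE=65536

    # then: systemctl daemon-reload && systemctl restart nova-compute
    # verify: grep 'open files' /proc/$(pgrep -of nova-compute)/limits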
There could also be other configuration issues, a normal default of 1024 isn't low for most uses, but you will need to consider that as part of the increase. > > Cheers > > > > On Tue, Mar 28, 2023 at 2:16?PM Adivya Singh wrote: >> >> Hi Team >> >> I see these error in my syslog related to my nova compute service getting hung while communicating to rabbit-mq service >> >> "A recoverable connection/channel error occurred, trying to reconnect: [Errno 24] Too many open files" >> >> Is this a OS related error, or some thing i can change to get rid of this error >> >> Regards >> Adivya Singh From juliaashleykreger at gmail.com Wed Mar 29 18:14:02 2023 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Wed, 29 Mar 2023 11:14:02 -0700 Subject: [ironic] bug tracking move to launchpad Message-ID: Greetings Ironic! During the PTG yesterday, we discussed the fact we had not moved back to Launchpad for bug tracking. Mainly because we are all very busy. The point was raised, why don't we just re-enable launchpad bug tracking, and move our primary usage back to that. Consensus on the call resulted with this proposal, and the consensus to send this email to the mailing list. As such, if nobody objects, I'm going to go turn the bug tracker for Ironic in launchpad back on next Monday. If you object, scream now and/or write a migration script and/or propose another solution. Thanks! -Julia -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Wed Mar 29 18:14:15 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Wed, 29 Mar 2023 20:14:15 +0200 Subject: (Openstack-Designate) rndc key not getting generated in /etc/designate In-Reply-To: References: Message-ID: Hi there, Looking through the code, I really don't see any obvious issue or bug in there. So based on that it sounds that this error might be raised only if you have defined `designate_rndc_keys` variable somewhere (like user_variables) but did not provided `file` key in it, which made this task fail. As `file`, `name`, `secret` and `algorithm` keys are all required ones in this variable. Would be great if you could double-check definition of the variable in your user_variables. ??, 29 ???. 2023??. ? 07:00, Adivya Singh : > > Hi Team, > > My DNS Server located outside the Open Stack, and i am using below variables in my user_variables.yaml File. > > But When i ' m running os-desigante-install.yml Playbook, rndc key are not generating in /etc/designate Folder > > and the playbook fail at the below Task > > {"msg": "The task includes an option with an undefined variable. 
The error was: 'dict object' has no attribute 'file'\n\nThe error appears to be in '/etc/ansible/roles/os_designate/tasks/designate_post_install.yml': line 89, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create Designate rndc key file\n ^ here\n"} > > - name: Create Designate rndc key file > template: > src: rndc.key.j2 > dest: "{{ item.file }}" > owner: "{{ item.owner | default('root') }}" > group: "{{ item.group | default('root') }}" > mode: "{{ item.mode | default('0600') }}" > with_items: "{{ designate_rndc_keys }}" > when: designate_rndc_keys is defined > > and the post-install.yml File looks like this > > Any idea on this, Where i am missing > > > > > > > > ## rndc keys for authenticating with bind9 > # define this to create as many key files as are required > # designate_rndc_keys > # - name: "rndc-key" > # file: /etc/designate/rndc.key > # algorithm: "hmac-md5" > # secret: "" From jay at gr-oss.io Wed Mar 29 18:39:47 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Wed, 29 Mar 2023 11:39:47 -0700 Subject: [ironic] ARM Support in CI: Call for vendors / contributors / interested parties Message-ID: Hi stackers, Ironic has published an experimental Ironic Python Agent image for ARM64 ( https://tarballs.opendev.org/openstack/ironic-python-agent-builder/dib/files/) and discussed promoting this image to supported via CI testing. However, we have a problem: there are no Ironic developers with easy access to ARM hardware at the moment, and no Ironic developers with free time to commit to improving our support of ARM hardware. So we're putting out a call for help: - If you're a hardware vendor and want your ARM hardware supported? Please come talk to the Ironic community about setting up third-party-CI. - Are you an operator or contributor from a company invested in ARM bare metal? Please come join the Ironic community to help us build this support. Thanks, Jay Faulkner Ironic PTL -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.bryant at canonical.com Wed Mar 29 19:13:00 2023 From: corey.bryant at canonical.com (Corey Bryant) Date: Wed, 29 Mar 2023 15:13:00 -0400 Subject: OpenStack 2023.1 Antelope for Ubuntu 22.04 LTS Message-ID: The Ubuntu OpenStack team at Canonical is pleased to announce the general availability of OpenStack 2023.1 Antelope on Ubuntu 22.04 LTS (Jammy Jellyfish). Details of the Antelope release can be found at: https://www.openstack.org/software/antelope The Ubuntu Cloud Archive for OpenStack 2023.1 Antelope can be enabled on Ubuntu 22.04 by running the following command: sudo add-apt-repository cloud-archive:antelope The Ubuntu Cloud Archive for 2023.1 Antelope includes updates for: aodh, barbican, ceilometer, cinder, designate, designate-dashboard, dpdk (22.11.1), glance, gnocchi, heat, heat-dashboard, horizon, ironic, ironic-ui, keystone, magnum, magnum-ui, manila, manila-ui, masakari, mistral, murano, murano-dashboard, networking-arista, networking-bagpipe, networking-baremetal, networking-bgpvpn, networking-hyperv, networking-l2gw, networking-mlnx, networking-odl, networking-sfc, neutron, neutron-dynamic-routing, neutron-fwaas, neutron-taas, neutron-vpnaas, nova, octavia, octavia-dashboard, openstack-trove, openvswitch (3.1.0), ovn (23.03.0), ovn-octavia-provider, placement, sahara, sahara-dashboard, senlin, swift, trove-dashboard, vitrage, watcher, watcher-dashboard, zaqar, and zaqar-ui. 
For a full list of packages and versions, please refer to: https://openstack-ci-reports.ubuntu.com/reports/cloud-archive/antelope_versions.html == Reporting bugs == If you have any issues please report bugs using the ?ubuntu-bug? tool to ensure that bugs get logged in the right place in Launchpad: sudo ubuntu-bug nova-conductor Thank you to everyone who contributed to OpenStack 2023.1 Antelope! Corey (on behalf of the Ubuntu OpenStack Engineering team) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jake.yip at ardc.edu.au Wed Mar 29 23:10:27 2023 From: jake.yip at ardc.edu.au (Jake Yip) Date: Thu, 30 Mar 2023 10:10:27 +1100 Subject: [Magnum] vPTG summary In-Reply-To: <92954613-d892-ba47-0fbc-51d3adc864b5@ardc.edu.au> References: <92954613-d892-ba47-0fbc-51d3adc864b5@ardc.edu.au> Message-ID: Hi all, We had a good attendance of Magnum developers and operators from different clouds providers - Nectar, StackHPC, Catalyst Cloud NZ, Vexxhost, Cleura. As the Magnum team spans EU and AU/NZ, we have decided to hold the PTG on Wed 0900 UTC so that most of us can make it. One of the main topics we discussed was the progress of ClusterAPI driver in Magnum. This work is ongoing and we hope to have it in this cycle or next. Thanks to the hardworking folks at StackHPC (Matt, johnthetubaguy, Tyler) and Vexxhost (mnaser) driving this initiative. We also discussed the issues with testing in check/gate. Testing for Magnum is quite resource intensive, as it needs to spin up a cluster. This needs more work so we can land patches with more confidence. There will also be more deprecations/removals in this cycle to keep up with Kubernetes. One of the things we agreed on was the removal of PodSecurityPolicy so that we can continue supporting K8S >= v1.25. This would be flagged in an upgrade note containing the upstream instructions[1] on how to migrate to PodSecurity Admission Controller. We briefly touched on the many reports of Magnum not working in (W/X/Y) versions of Kubernetes. It is unfortunate situation; Kubernetes move very quickly and the Kubernetes versions (v1.21 ~ v1.23) we have developed for in Yoga is already EOL. In addition, there are a few incompatible changes that happened from v1.21 to v1.25 that makes backporting newer K8S support to W/X/Y/Z challenging. We will ease this hump as much as possible by (1) careful backports, (2) better testing and (3) better documentation. It is still a big barrier to new users, and we hope to leave this behind with ClusterAPI (my new hope!). I hope I've summarised the vPTG satisfactorily. Feel free to check our etherpad[2] for more details. Last but not least, Matt Pryor and Mohammed Naser will be giving a talk "Magnum Episode IV - A New Hope: OpenStack, Kubernetes and ClusterAPI" at the Vancouver summit. Please give them your support! [1] https://kubernetes.io/docs/tasks/configure-pod-container/migrate-from-psp/ [2] https://etherpad.opendev.org/p/march2023-ptg-magnum Regards, Jake Yip On behalf of Magnum Team -- Jake Yip Technical Lead, Nectar Research Cloud From tkajinam at redhat.com Thu Mar 30 02:20:43 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 11:20:43 +0900 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints Message-ID: Hello, I have had some local discussions with gmann, but I'd really like to move this discussion forward to fix the broken stable/xena gate in heat so I will start this thread, hoping the thread can provide more context behind my proposal. 
Historically stable branches of heat have been frequently affected by any change in requirements of tempest. This is mainly because in our CI we install our own in-tree integration tests[1] into tempest venv where tempest and heat-tempest-plugin are installed. Because in-tree integration tests are tied to that specific stable branch, this has been often causing conflicts in requirements (master constraint vs stable/X constraint). [1] https://github.com/openstack/heat/tree/master/heat_integrationtests In the past we changed our test installation[2] to use stable constraint to avoid this conflicts, but this approach does no longer work since stable/xena because 1. stable/xena u-c no longer includes tempest 2. latest tempest CAN'T be installed with stable/xena u-c because current tempest requires fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. [2] https://review.opendev.org/c/openstack/heat/+/803890 https://review.opendev.org/c/openstack/heat/+/848215 I've proposed the change to pin tempest[3] in stable/xena u-c so that people can install tempest with stable/xena u-c. [3] https://review.opendev.org/c/openstack/requirements/+/878228 I understand the reason tempest was removed from u-c was that we should use the latest tempest to test recent stable releases.I agree we can keep tempest excluded for stable/yoga and onwards because tempest is installable with their u-c, but stable/xena u-c is no longer compatible with master. Adding pin to xena u-c does not mainly affect the policy to test stable branches with latest tempest because for that we anyway need to use more recent u-c. I'm still trying to find out the workaround within heat but IMO adding tempest pin to stable/xena u-c is harmless but beneficial in case anyone is trying to use tempest with stable/xena u-c. Thank you, Takashi Kajinami -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 30 02:46:51 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 11:46:51 +0900 Subject: [heat][magnum][tacker] Future of SoftwareDeployment support Message-ID: Hello, We discussed this briefly in the past thread where we discussed maintenance of os-*-agent repos, and also talked about this topic during Heat PTG, but I'd like to formalize the discussion to get a clear agreement. Heat has been supporting SoftwareDeployment resources to configure software in instances using some agents such as os-collect-config[1]. [1] https://docs.openstack.org/heat/latest/template_guide/software_deployment.html#software-deployment-resources This feature was initially developed to be used by TripleO (IIUC), but TripleO is retired now and we are losing the first motivation to maintain the feature. # Even TripleO replaced most of its usage of softwaredeployment by config-download lately. Because the heat project team has drunk dramatically recently, we'd like to put more focus on core features. For that aim we are now wondering if we can deprecate and remove this feature, and would like to hear from anyone who has any concerns about this. Quickly looking through the repos, it seems currently Magnum and Tacker are using SoftwareDeployment, and it'd be nice especially if we can understand their current requirements. 1. Magnum It seems SoftwareDeployment is used by k8s_fedora_atomic_v1 driver but I'm not too sure whether this driver is still supported, because Fedora Atomic was EOLed a while ago, right ? 2. 
Tacker SoftwareDeployment can be found in only test code in the tacker repo. We have some references kept in heat-translator which look related to TOSCA templates. Thank you, Takashi Kajinami -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 30 04:10:20 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 13:10:20 +0900 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy Message-ID: Hello, Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate jobs have become very flaky. Further investigation revealed that the issue is related to something in libvirt from Ubuntu Jammy and that prevents detaching devices from instances[1]. The same problem appears in different jobs[2] and we workaround the problem by disabling some affected jobs. In heat we also disabled some flaky tests but because of this we no longer run basic scenario tests which deploys instance/volume/network in a single stack, which means we lost the quite basic test coverage. My question is, is there anyone in the Nova team working on "fixing" this problem ? We might be able to implement some workaround (like checking status of the instances before attempting to delete it) but this should be fixed in libvirt side IMO, as this looks like a "regression" in Ubuntu Jammy. Probably we should report a bug against the libvirt package in Ubuntu but I'd like to hear some thoughts from the nova team because they are more directly affected by this problem. I'm now trying to set up a centos stream 9 job in Heat repo to see whether this can be reproduced if we use centos stream 9. I've been running that specific scenario test in centos stream 9 jobs in puppet repos but I've never seen this issue, so I suspect the issue is really specific to libvirt in Jammy. [1] https://bugs.launchpad.net/nova/+bug/1998274 [2] https://bugs.launchpad.net/nova/+bug/1998148 Thank you, Takashi -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 30 06:29:47 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 30 Mar 2023 08:29:47 +0200 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: I know, I'm whining a lot about usage of u-c for such projects, but I'm just gonna say that u-c is also might be used for tempest installation itself. So if you're trying to install specific tempest version from the requirements file with providing u-c as constraints while having tempest in u-c - this will break due to pip being unable to resolve that. And installing tempest without constraints also tends to break. I've used a workaround to filter out u-c to drop tempest from them until xena, so moving this back and force is a bit annoying for the end users. I know nobody agrees with me here, but I do see u-c as an instruction for end users on how to build their venvs (because these constraints are tested!) to install openstack projects (can build analogy to poetry here) and not CI thing only. Eventually, we see more troubles with time not in tempest itself, but in tempest plugins, when a new test being added to the plugin, that requires new API but not verifying API microversion or feature availability. These kind of failures we experience quite regularly, couple time during any given cycle, which made us also pin tempest plugin versions in requirements with for every release. 
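To make that constraints juggling concrete, the filter-out workaround looks roughly like this (illustrative shell, not quoted from anyone's tooling):

    curl -o u-c.txt https://releases.openstack.org/constraints/upper/xena
    sed -i '/^tempest===/d' u-c.txt        # drop any tempest pin from the constraints
    pip install -c u-c.txt tempest         # or a specific, known-compatible tempest release

whereas the proposal in this thread is to put a suitable tempest pin back into the published stable/xena upper-constraints.txt so that consumers do not need the extra step.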
Also I have a feeling that a lot of times we're treating tempest as a CI-only thing, which is also weird and not true for me, since it's valuable tool for operators and being leveraged by rally or refstack to ensure state of production environments. ??, 30 ???. 2023 ?., 04:24 Takashi Kajinami : > Hello, > > > I have had some local discussions with gmann, but I'd really like to move > this discussion forward > to fix the broken stable/xena gate in heat so I will start this thread, > hoping the thread can provide > more context behind my proposal. > > Historically stable branches of heat have been frequently affected by any > change in requirements > of tempest. This is mainly because in our CI we install our own in-tree > integration tests[1] into > tempest venv where tempest and heat-tempest-plugin are installed. Because > in-tree integration tests > are tied to that specific stable branch, this has been often causing > conflicts in requirements > (master constraint vs stable/X constraint). > > [1] > https://github.com/openstack/heat/tree/master/heat_integrationtests > > In the past we changed our test installation[2] to use stable constraint > to avoid this conflicts, > but this approach does no longer work since stable/xena because > > 1. stable/xena u-c no longer includes tempest > > 2. latest tempest CAN'T be installed with stable/xena u-c because current > tempest requires > fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. > > [2] > https://review.opendev.org/c/openstack/heat/+/803890 > https://review.opendev.org/c/openstack/heat/+/848215 > > I've proposed the change to pin tempest[3] in stable/xena u-c so that > people can install tempest > with stable/xena u-c. > [3] https://review.opendev.org/c/openstack/requirements/+/878228 > > I understand the reason tempest was removed from u-c was that we should > use the latest tempest > to test recent stable releases.I agree we can keep tempest excluded for > stable/yoga and onwards > because tempest is installable with their u-c, but stable/xena u-c is no > longer compatible with master. > Adding pin to xena u-c does not mainly affect the policy to test stable > branches with latest tempest > because for that we anyway need to use more recent u-c. > > I'm still trying to find out the workaround within heat but IMO adding > tempest pin to stable/xena u-c > is harmless but beneficial in case anyone is trying to use tempest with > stable/xena u-c. > > Thank you, > Takashi Kajinami > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 30 06:54:32 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 15:54:32 +0900 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: On Thu, Mar 30, 2023 at 3:35?PM Dmitriy Rabotyagov wrote: > I know, I'm whining a lot about usage of u-c for such projects, but I'm > just gonna say that u-c is also might be used for tempest installation > itself. So if you're trying to install specific tempest version from the > requirements file with providing u-c as constraints while having tempest in > u-c - this will break due to pip being unable to resolve that. > To support installing a specific tempest with older stable u-c we probably can try adding upper version instead of requiring a specific version ( like <= 33.0.0 instead of === 33.0.0 ), though I guess this might not be accepted by pip. > And installing tempest without constraints also tends to break. 
> I didn't really get this point. Do you mind elaborating on this ? > > I've used a workaround to filter out u-c to drop tempest from them until > xena, so moving this back and force is a bit annoying for the end users. > > I know nobody agrees with me here, but I do see u-c as an instruction for > end users on how to build their venvs (because these constraints are > tested!) to install openstack projects (can build analogy to poetry here) > and not CI thing only. > > Eventually, we see more troubles with time not in tempest itself, but in > tempest plugins, when a new test being added to the plugin, that requires > new API but not verifying API microversion or feature availability. These > kind of failures we experience quite regularly, couple time during any > given cycle, which made us also pin tempest plugin versions in requirements > with for every release. > > Also I have a feeling that a lot of times we're treating tempest as a > CI-only thing, which is also weird and not true for me, since it's valuable > tool for operators and being leveraged by rally or refstack to ensure state > of production environments. > > I tend to agree with these points and these would be the problem caused mainly by the fact tempest is branchless, IMHO. > > ??, 30 ???. 2023 ?., 04:24 Takashi Kajinami : > >> Hello, >> >> >> I have had some local discussions with gmann, but I'd really like to move >> this discussion forward >> to fix the broken stable/xena gate in heat so I will start this thread, >> hoping the thread can provide >> more context behind my proposal. >> >> Historically stable branches of heat have been frequently affected by any >> change in requirements >> of tempest. This is mainly because in our CI we install our own in-tree >> integration tests[1] into >> tempest venv where tempest and heat-tempest-plugin are installed. Because >> in-tree integration tests >> are tied to that specific stable branch, this has been often causing >> conflicts in requirements >> (master constraint vs stable/X constraint). >> >> [1] >> https://github.com/openstack/heat/tree/master/heat_integrationtests >> >> In the past we changed our test installation[2] to use stable constraint >> to avoid this conflicts, >> but this approach does no longer work since stable/xena because >> >> 1. stable/xena u-c no longer includes tempest >> >> 2. latest tempest CAN'T be installed with stable/xena u-c because current >> tempest requires >> fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. >> >> [2] >> https://review.opendev.org/c/openstack/heat/+/803890 >> https://review.opendev.org/c/openstack/heat/+/848215 >> >> I've proposed the change to pin tempest[3] in stable/xena u-c so that >> people can install tempest >> with stable/xena u-c. >> [3] https://review.opendev.org/c/openstack/requirements/+/878228 >> >> I understand the reason tempest was removed from u-c was that we should >> use the latest tempest >> to test recent stable releases.I agree we can keep tempest excluded for >> stable/yoga and onwards >> because tempest is installable with their u-c, but stable/xena u-c is no >> longer compatible with master. >> Adding pin to xena u-c does not mainly affect the policy to test stable >> branches with latest tempest >> because for that we anyway need to use more recent u-c. >> >> I'm still trying to find out the workaround within heat but IMO adding >> tempest pin to stable/xena u-c >> is harmless but beneficial in case anyone is trying to use tempest with >> stable/xena u-c. 
>> >> Thank you, >> Takashi Kajinami >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at gmail.com Thu Mar 30 07:06:48 2023 From: noonedeadpunk at gmail.com (Dmitriy Rabotyagov) Date: Thu, 30 Mar 2023 09:06:48 +0200 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: > And installing tempest without constraints also tends to break Basically what I meant is, if user decides not to use u-c for installing tempest due to the conflict, as tempest is part of u-c, then this also tended to fail due to having too fresh libraries, so this older tempest (or even master) is no longer compatible. But yeah, master tempest will be fixed soonish to match these newer dependencies, still you can easily get unlucky. So what I was saying - it's better to use u-c for installing tempest itself. ??, 30 ???. 2023 ?., 08:54 Takashi Kajinami : > > > On Thu, Mar 30, 2023 at 3:35?PM Dmitriy Rabotyagov < > noonedeadpunk at gmail.com> wrote: > >> I know, I'm whining a lot about usage of u-c for such projects, but I'm >> just gonna say that u-c is also might be used for tempest installation >> itself. So if you're trying to install specific tempest version from the >> requirements file with providing u-c as constraints while having tempest in >> u-c - this will break due to pip being unable to resolve that. >> > > To support installing a specific tempest with older stable u-c we probably > can try adding upper version > instead of requiring a specific version ( like <= 33.0.0 instead of === > 33.0.0 ), though I guess this might > not be accepted by pip. > > > >> And installing tempest without constraints also tends to break. >> > > I didn't really get this point. Do you mind elaborating on this ? > > >> >> I've used a workaround to filter out u-c to drop tempest from them until >> xena, so moving this back and force is a bit annoying for the end users. >> >> I know nobody agrees with me here, but I do see u-c as an instruction for >> end users on how to build their venvs (because these constraints are >> tested!) to install openstack projects (can build analogy to poetry here) >> and not CI thing only. >> >> Eventually, we see more troubles with time not in tempest itself, but in >> tempest plugins, when a new test being added to the plugin, that requires >> new API but not verifying API microversion or feature availability. These >> kind of failures we experience quite regularly, couple time during any >> given cycle, which made us also pin tempest plugin versions in requirements >> with for every release. >> >> Also I have a feeling that a lot of times we're treating tempest as a >> CI-only thing, which is also weird and not true for me, since it's valuable >> tool for operators and being leveraged by rally or refstack to ensure state >> of production environments. >> >> I tend to agree with these points and these would be the problem caused > mainly by the fact > tempest is branchless, IMHO. > > >> >> ??, 30 ???. 2023 ?., 04:24 Takashi Kajinami : >> >>> Hello, >>> >>> >>> I have had some local discussions with gmann, but I'd really like to >>> move this discussion forward >>> to fix the broken stable/xena gate in heat so I will start this thread, >>> hoping the thread can provide >>> more context behind my proposal. >>> >>> Historically stable branches of heat have been frequently affected by >>> any change in requirements >>> of tempest. 
This is mainly because in our CI we install our own in-tree >>> integration tests[1] into >>> tempest venv where tempest and heat-tempest-plugin are installed. >>> Because in-tree integration tests >>> are tied to that specific stable branch, this has been often causing >>> conflicts in requirements >>> (master constraint vs stable/X constraint). >>> >>> [1] >>> https://github.com/openstack/heat/tree/master/heat_integrationtests >>> >>> In the past we changed our test installation[2] to use stable constraint >>> to avoid this conflicts, >>> but this approach does no longer work since stable/xena because >>> >>> 1. stable/xena u-c no longer includes tempest >>> >>> 2. latest tempest CAN'T be installed with stable/xena u-c because >>> current tempest requires >>> fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. >>> >>> [2] >>> https://review.opendev.org/c/openstack/heat/+/803890 >>> https://review.opendev.org/c/openstack/heat/+/848215 >>> >>> I've proposed the change to pin tempest[3] in stable/xena u-c so that >>> people can install tempest >>> with stable/xena u-c. >>> [3] https://review.opendev.org/c/openstack/requirements/+/878228 >>> >>> I understand the reason tempest was removed from u-c was that we should >>> use the latest tempest >>> to test recent stable releases.I agree we can keep tempest excluded for >>> stable/yoga and onwards >>> because tempest is installable with their u-c, but stable/xena u-c is no >>> longer compatible with master. >>> Adding pin to xena u-c does not mainly affect the policy to test stable >>> branches with latest tempest >>> because for that we anyway need to use more recent u-c. >>> >>> I'm still trying to find out the workaround within heat but IMO adding >>> tempest pin to stable/xena u-c >>> is harmless but beneficial in case anyone is trying to use tempest with >>> stable/xena u-c. >>> >>> Thank you, >>> Takashi Kajinami >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From lpetrut at cloudbasesolutions.com Thu Mar 30 07:11:38 2023 From: lpetrut at cloudbasesolutions.com (Lucian Petrut) Date: Thu, 30 Mar 2023 07:11:38 +0000 Subject: [ptl] Need PTL volunteer for OpenStack Winstackers In-Reply-To: <1870a6b7a1d.114e70a2d994244.3514791188773000084@ghanshyammann.com> References: <1870a6b7a1d.114e70a2d994244.3514791188773000084@ghanshyammann.com> Message-ID: Hi, Thanks for reaching out. As mentioned here [1], Cloudbase Solutions can no longer lead the Winstackers project. Since there weren?t any other interested parties, I think there?s no other option but to retire the project. [1] https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031044.html Regards, Lucian Petrut On 22 Mar 2023, at 19:43, Ghanshyam Mann wrote: Hi Lukas, I am reaching out to you as you were PTL for OpenStack Winstackers project in the last cycle. There is no PTL candidate for the next cycle (2023.2), and it is on the leaderless project list. Please check if you or anyone you know would like to lead this project. - https://etherpad.opendev.org/p/2023.2-leaderless Also, if anyone else would like to help leading this project, this is time to let TC knows. -gmann -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skaplons at redhat.com Thu Mar 30 08:07:29 2023 From: skaplons at redhat.com (Slawek Kaplonski) Date: Thu, 30 Mar 2023 10:07:29 +0200 Subject: [neutron][ovn] stateless SG behavior for metadata / slaac / dhcpv6 In-Reply-To: References: Message-ID: <5996164.lOV4Wx5bFT@p1> Hi, Dnia ?roda, 29 marca 2023 18:45:26 CEST Ihar Hrachyshka pisze: > To close the loop, > > We had a very productive discussion of the topic during vPTG today. > Some of it is captured here: > https://etherpad.opendev.org/p/neutron-bobcat-ptg#L207 and below. Here > is the brief plus next steps. > > In regards to api-ref definitions for stateless SG: > - it is agreed that it should explain the semantics and not only > mechanics of API fields; > - it is agreed that it should explain behavior of basic network services; > - it is agreed that basic network services that are expected to work > by default are things like ARP, DHCP; while metadata service is not; - > this will mimic what OVS implementation of stateless SG already does; > - it is agreed that these basic services that are expected to work > will work transparently, meaning no SG rules will be visible for them; > - this will mimic OVS implementation too. > > Next steps: > - update api-ref stateless SG description to capture decisions above; > - update my neutron patch series to exclude metadata enablement; > - adjust tempest scenarios for stateless SG to not create explicit SG > rules for DHCPv6 stateless (there are already patches for that); > - clean up Launchpad bugs as per decisions above. > > I will take care of the above in next days. Thx Ihar for summary of the yesterday's discussion and for taking care of it. > > Thanks everyone, > Ihar > > On Wed, Mar 22, 2023 at 12:55?PM Ihar Hrachyshka wrote: > > > > On Tue, Mar 21, 2023 at 12:07?PM Rodolfo Alonso Hernandez > > wrote: > > > > > > Hello: > > > > > > I agree with having a single API meaning for all backends. We currently support stateless SGs in iptables and ML2/OVN and both backends provide the same behaviour: a rule won't create an opposite direction counterpart by default, the user needs to define it explicitly. > > > > Thanks for this, I didn't realize that iptables may be considered prior art. > > > > > > > > The discussion here could be the default behaviour for standard services: > > > * DHCP service is currently supported in iptables, native OVS and OVN. This should be supported even without any rule allowed (as is now). Of course, we need to explicitly document that. > > > * DHCPv6 [1]: unlike Slawek, I'm in favor of allowing this traffic by default, as part of the DHCP protocol traffic allowance. > > > > Agreed DHCPv6 rules are closer to "base" and that the argument for RA > > / NA flows is stronger because of the parallel to DHCPv4 operation. > > > > > * Metadata service: this is not a network protocol and we should not consider it. Actually this service is working now (with stateful SGs) because of the default SG egress rules we add. So I'm not in favor of [2] > > > > At this point I am more ambivalent to the decision of whether to > > include metadata into the list of "base" services, as long as we > > define the list (behavior) in api-ref. But to address the point, since > > Slawek leans to creating SG rules in Neutron API to handle ICMP > > traffic necessary for RA / NA (which seems to have a merit and > > internal logic) anyway, we could as well at this point create another > > "default" rule for metadata replies. 
> > > > But - I will repeat - as long as a decision on what the list of "base" > > services enabled for any SG by default is, I can live with metadata > > out of the list. It may not be as convenient to users (which is my > > concern), but that's probably a matter of taste in API design. > > > > BTW Rodolfo, thanks for allocating a time slot for this discussion at > > vPTG. I hope we get to the bottom of it then. See you all next Wed > > @13:00. (As per https://etherpad.opendev.org/p/neutron-bobcat-ptg) > > > > Ihar > > > > > > > > Regards. > > > > > > [1]https://review.opendev.org/c/openstack/neutron/+/877049 > > > [2]https://review.opendev.org/c/openstack/neutron/+/876659 > > > > > > On Mon, Mar 20, 2023 at 10:19?PM Ihar Hrachyshka wrote: > > >> > > >> On Mon, Mar 20, 2023 at 12:03?PM Slawek Kaplonski wrote: > > >> > > > >> > Hi, > > >> > > > >> > > > >> > Dnia pi?tek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze: > > >> > > > >> > > Hi all, > > >> > > > >> > > > > >> > > > >> > > (I've tagged the thread with [ovn] because this question was raised in > > >> > > > >> > > the context of OVN, but it really is about the intent of neutron > > >> > > > >> > > stateless SG API.) > > >> > > > >> > > > > >> > > > >> > > Neutron API supports 'stateless' field for security groups: > > >> > > > >> > > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group > > >> > > > >> > > > > >> > > > >> > > The API reference doesn't explain the intent of the API, merely > > >> > > > >> > > walking through the field mechanics, as in > > >> > > > >> > > > > >> > > > >> > > "The stateful security group extension (stateful-security-group) adds > > >> > > > >> > > the stateful field to security groups, allowing users to configure > > >> > > > >> > > stateful or stateless security groups for ports. The existing security > > >> > > > >> > > groups will all be considered as stateful. Update of the stateful > > >> > > > >> > > attribute is allowed when there is no port associated with the > > >> > > > >> > > security group." > > >> > > > >> > > > > >> > > > >> > > The meaning of the API is left for users to deduce. It's customary > > >> > > > >> > > understood as something like > > >> > > > >> > > > > >> > > > >> > > "allowing to bypass connection tracking in the firewall, potentially > > >> > > > >> > > providing performance and simplicity benefits" (while imposing > > >> > > > >> > > additional complexity onto rule definitions - the user now has to > > >> > > > >> > > explicitly define rules for both directions of a duplex connection.) > > >> > > > >> > > [This is not an official definition, nor it's quoted from a respected > > >> > > > >> > > source, please don't criticize it. I don't think this is an important > > >> > > > >> > > point here.] > > >> > > > >> > > > > >> > > > >> > > Either way, the definition doesn't explain what should happen with > > >> > > > >> > > basic network services that a user of Neutron SG API is used to rely > > >> > > > >> > > on. Specifically, what happens for a port related to a stateless SG > > >> > > > >> > > when it trying to fetch metadata from 169.254.169.254 (or its IPv6 > > >> > > > >> > > equivalent), or what happens when it attempts to use SLAAC / DHCPv6 > > >> > > > >> > > procedure to configure its IPv6 stack. 
> > >> > > > >> > > > > >> > > > >> > > As part of our testing of stateless SG implementation for OVN backend, > > >> > > > >> > > we've noticed that VMs fail to configure via metadata, or use SLAAC to > > >> > > > >> > > configure IPv6. > > >> > > > >> > > > > >> > > > >> > > metadata: https://bugs.launchpad.net/neutron/+bug/2009053 > > >> > > > >> > > slaac: https://bugs.launchpad.net/neutron/+bug/2006949 > > >> > > > >> > > > > >> > > > >> > > We've noticed that adding explicit SG rules to allow 'returning' > > >> > > > >> > > communication for 169.254.169.254:80 and RA / NA fixes the problem. > > >> > > > >> > > > > >> > > > >> > > I figured that these services are "base" / "basic" and should be > > >> > > > >> > > provided to ports regardless of the stateful-ness of SG. I proposed > > >> > > > >> > > patches for this here: > > >> > > > >> > > > > >> > > > >> > > metadata series: https://review.opendev.org/q/topic:bug%252F2009053 > > >> > > > >> > > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049 > > >> > > > >> > > > > >> > > > >> > > Discussion in the patch that adjusts the existing stateless SG test > > >> > > > >> > > scenarios to not create explicit SG rules for metadata and ICMP > > >> > > > >> > > replies suggests that it's not a given / common understanding that > > >> > > > >> > > these "base" services should work by default for stateless SGs. > > >> > > > >> > > > > >> > > > >> > > See discussion in comments here: > > >> > > > >> > > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692 > > >> > > > >> > > > > >> > > > >> > > While this discussion is happening in the context of OVN, I think it > > >> > > > >> > > should be resolved in a broader context. Specifically, a decision > > >> > > > >> > > should be made about what Neutron API "means" by stateless SGs, and > > >> > > > >> > > how "base" services are supposed to behave. Then backends can act > > >> > > > >> > > accordingly. > > >> > > > >> > > > > >> > > > >> > > There's also an open question of how this should be implemented. > > >> > > > >> > > Whether Neutron would like to create explicit SG rules visible in API > > >> > > > >> > > that would allow for the returning traffic and that could be deleted > > >> > > > >> > > as needed, or whether backends should do it implicitly. We already > > >> > > > >> > > have "default" egress rules, so there's a precedent here. On the other > > >> > > > >> > > hand, the egress rules are broad (allowing everything) and there's > > >> > > > >> > > more rationale to delete them and replace them with tighter filters. > > >> > > > >> > > In my OVN series, I implement ACLs directly in OVN database, without > > >> > > > >> > > creating SG rules in Neutron API. > > >> > > > >> > > > > >> > > > >> > > So, questions for the community to clarify: > > >> > > > >> > > - whether Neutron API should define behavior of stateless SGs in general, > > >> > > > >> > > - if so, whether Neutron API should also define behavior of stateless > > >> > > > >> > > SGs in terms of "base" services like metadata and DHCP, > > >> > > > >> > > - if so, whether backends should implement the necessary filters > > >> > > > >> > > themselves, or Neutron will create default SG rules itself. > > >> > > > >> > > > >> > I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user. 
> > >> > > > >> > We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it. > > >> > > > >> > If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there? > > >> > > > >> > > >> Thanks for clarifying the rationale on picking SG rules and not > > >> per-backend implementation. > > >> > > >> What would be your answer to the two other questions in the list > > >> above, specifically, "whether Neutron API should define behavior of > > >> stateless SGs in general" and "whether Neutron API should define > > >> behavior of stateless SGs in relation to metadata / RA / NA". Once we > > >> have agreement on these points, we can discuss the exact mechanism - > > >> whether to implement in backend or in API. But these two questions are > > >> first order in my view. > > >> > > >> (To give an idea of my thinking, I believe API definition should not > > >> only define fields and their mechanics but also semantics, so > > >> > > >> - yes, api-ref should define the meaning ("behavior") of stateless SG > > >> in general, and > > >> - yes, api-ref should also define the meaning ("behavior") of > > >> stateless SG in relation to "standard" services like ipv6 addressing > > >> or metadata. > > >> > > >> As to the last question - whether it's up to ml2 backend to implement > > >> the behavior, or up to the core SG database plugin - I don't have a > > >> strong opinion. I lean to "backend" solution just because it allows > > >> for more granular definition because SG rules may not express some > > >> filter rules, e.g. source port for metadata replies (an unfortunate > > >> limitation of SG API that we inherited from AWS?). But perhaps others > > >> prefer paying the price for having neutron ml2 plugin enforcing the > > >> behavior consistently across all backends. > > >> > > >> > > > >> > > > > >> > > > >> > > I hope I laid the problem out clearly, let me know if anything needs > > >> > > > >> > > clarification or explanation. > > >> > > > >> > > > >> > Yes :) At least for me. > > >> > > > >> > > > >> > > > > >> > > > >> > > Yours, > > >> > > > >> > > Ihar > > >> > > > >> > > > > >> > > > >> > > > > >> > > > >> > > > > >> > > > >> > > > >> > > > >> > -- > > >> > > > >> > Slawek Kaplonski > > >> > > > >> > Principal Software Engineer > > >> > > > >> > Red Hat > > >> > > >> > > -- Slawek Kaplonski Principal Software Engineer Red Hat -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part. 
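To make the stateless SG discussion above concrete: the "explicit SG rules to allow 'returning' communication" that were used as a workaround (and that the tempest scenarios added) look roughly like the following. This is a sketch only; the security group name stateless-sg is hypothetical, and as noted in the thread the metadata rule cannot be narrowed to source port 80 through the SG API:

$ openstack security group rule create --ingress --ethertype IPv4 \
    --protocol tcp --remote-ip 169.254.169.254/32 stateless-sg
$ openstack security group rule create --ingress --ethertype IPv6 \
    --protocol ipv6-icmp stateless-sg

Per the vPTG outcome summarised above, DHCP-like traffic is expected to work transparently without such rules, while metadata is not part of the "base" services, so a rule like the first one remains the user's responsibility for stateless SGs.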
URL: From suzhengwei at inspur.com Thu Mar 30 08:53:47 2023 From: suzhengwei at inspur.com (=?utf-8?B?U2FtIFN1ICjoi4/mraPkvJ8p?=) Date: Thu, 30 Mar 2023 08:53:47 +0000 Subject: =?utf-8?B?562U5aSNOiBbb3Nsb11baGVhdF1bbWFzYWthcmldW3Nlbmxpbl1bdmVudXNd?= =?utf-8?B?W2FsbF0gb3Nsby5kYiAxMy4wLjAgd2lsbCByZW1vdmUgc3FsYWxjaGVteS1t?= =?utf-8?Q?igrate_support?= In-Reply-To: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> References: <1a7f4dd7ccd000f1b55924b21aaa639aa12d3890.camel@redhat.com> Message-ID: <23e450dc390b452c8b8129774b94d90e@inspur.com> Hi, Stephen, I have tried to remove the dependency on sqlalchemy-migrate from Masakari. But obviously it is not easy to me. Would you please to take this work? Any help would be very appreciated. -----????----- ???: Stephen Finucane [mailto:stephenfin at redhat.com] ????: 2023?3?23? 0:38 ???: openstack-discuss at lists.openstack.org ??: [oslo][heat][masakari][senlin][venus][all] oslo.db 13.0.0 will remove sqlalchemy-migrate support tl;dr: Projects still relying on sqlalchemy-migrate for migrations need to start their switch to alembic immediately. Projects with "legacy" sqlalchemy-migrated based migrations need to drop them. A quick heads up that oslo.db 13.0.0 will be release in the next month or so and will remove sqlalchemy-migrate support and formally add support for sqlalchemy 2.x. The removal of sqlalchemy-migrate support should only affect projects using oslo.db's sqlalchemy-migrate wrappers, as opposed to using sqlalchemy-migrate directly. For any projects that rely on this functionality, a short-term fix is to vendor the removed code [1] in your project. However, I must emphasise that we're not removing sqlalchemy-migrate integration for the fun of it: it's not compatible with sqlalchemy 2.x and is no longer maintained. If your project uses sqlalchemy-migrate and you haven't migrated to alembic yet, you need to start doing so immediately. If you have migrated to alembic but still have sqlalchemy- migrate "legacy" migrations in-tree, you need to look at dropping these asap. Anything less will result in broken master when we bump upper-constraints to allow sqlalchemy 2.x in Bobcat. I've listed projects in $subject that appear to be using the removed modules. For more advice on migrating to sqlalchemy 2.x and alembic, please look at my previous post on the matter [2]. Cheers, Stephen [1] https://review.opendev.org/c/openstack/oslo.db/+/853025 [2] https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.html -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3606 bytes Desc: not available URL: From sbauza at redhat.com Thu Mar 30 09:27:45 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 30 Mar 2023 11:27:45 +0200 Subject: [nova][ptg] Today's agenda (Thursday) Message-ID: Heya again, Yesterday was a very productive day. Thanks folks. Today, we'll have mostly cross-project discussions but we'll also try to discuss about 3 topics : - 13:00 - 14:30 UTC : Nova-Neutron cross-project sessions* in the neutron room (tbd)* - 14:30 - 14:45 UTC : Transition Xena to EM, any concerns ? 
- 14:45 - 15:00 UTC : break - 15:00 - 15:30 UTC : Glance/Cinder/Nova cross-project session about secure glance Direct URLs *in the glance room (newton)* - 15:30 - 16:30 UTC : Cinder/Nova cross-project session* in the nova room (diablo)* - 16:30 - 16:50 UTC : Discuss the next steps with compute hostname robustification - 16:50 - 16:00 UTC : (tentatively) Instance.name is not persisted and just sub'd by every service calling the object Thanks and enjoy this day. -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From swogatpradhan22 at gmail.com Thu Mar 30 09:35:34 2023 From: swogatpradhan22 at gmail.com (Swogat Pradhan) Date: Thu, 30 Mar 2023 15:05:34 +0530 Subject: Nova undefine secret | openstack | wallaby In-Reply-To: References: Message-ID: It is actually not that simple, as everything is containerised. To get past this issue i deleted two files by the name of on the braemetal nodes from the directory /etc/libvirt/secrets/ This issue is now resolved. On Tue, Mar 28, 2023 at 5:00?PM Sean Mooney wrote: > On Tue, 2023-03-28 at 06:24 +0530, Swogat Pradhan wrote: > > Update podman logs: > > [root at dcn01-hci-1 ~]# podman logs 3e5e6c1a7864 > > ------------------------------------------------ > > Initializing virsh secrets for: dcn01:openstack > > -------- > > Initializing the virsh secret for 'dcn01' cluster > > (cec7cdfd-3667-57f1-afaf-5dfca9b0e975) 'openstack' client > > The /etc/nova/secret.xml file already exists > > error: Failed to set attributes from /etc/nova/secret.xml > > error: internal error: a secret with UUID > > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > > client.openstack secret > > you jsut do "virsh secret-undefine " > > > > > > > On Tue, Mar 28, 2023 at 6:19?AM Swogat Pradhan < > swogatpradhan22 at gmail.com> > > wrote: > > > > > Hi, > > > For some reason, i had to redeploy ceph for my hci nodes and then found > > > that the deployment command is giving out the following error: > > > 2023-03-28 01:49:46.709605 | | > > > WARNING | ERROR: Can't run container nova_libvirt_init_secret > > > stderr: error: Failed to set attributes from /etc/nova/secret.xml > > > error: internal error: a secret with UUID > > > bd136bb0-fd78-5429-ab80-80b8c571d821 already defined for use with > > > client.openstack secret > > > 2023-03-28 01:49:46.711176 | 48d539a1-1679-623b-0af7-000000004b45 | > > > FATAL | Create containers managed by Podman for > > > /var/lib/tripleo-config/container-startup-config/step_4 | dcn01-hci-0 | > > > error={"changed": false, "msg": "Failed containers: > > > nova_libvirt_init_secret"} > > > > > > Can you please tell me how I can undefine the existing secret? > > > > > > With regards, > > > Swogat Pradhan > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Thu Mar 30 09:44:02 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 30 Mar 2023 11:44:02 +0200 Subject: [nova][ptg] Today's agenda (Thursday) In-Reply-To: References: Message-ID: Just a short modification : we will use the Cinder room for both the Glance/Nova and Cinder/Nova discussions, starting at 1500UTC. Etherpad is accordingly modified https://etherpad.opendev.org/p/nova-bobcat-ptg#L55 Le jeu. 30 mars 2023 ? 11:27, Sylvain Bauza a ?crit : > Heya again, > > Yesterday was a very productive day. Thanks folks. 
> Today, we'll have mostly cross-project discussions but we'll also try to > discuss about 3 topics : > > > - 13:00 - 14:30 UTC : Nova-Neutron cross-project sessions* in the > neutron room (tbd)* > > > - 14:30 - 14:45 UTC : Transition Xena to EM, any concerns ? > > > - 14:45 - 15:00 UTC : break > > > - 15:00 - 15:30 UTC : Glance/Cinder/Nova cross-project session about > secure glance Direct URLs *in the glance room (newton)* > > > - 15:30 - 16:30 UTC : Cinder/Nova cross-project session* in the nova > room (diablo)* > > > - 16:30 - 16:50 UTC : Discuss the next steps with compute hostname > robustification > - 16:50 - 16:00 UTC : (tentatively) Instance.name is not persisted and > just sub'd by every service calling the object > > Thanks and enjoy this day. > > -Sylvain > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbauza at redhat.com Thu Mar 30 10:10:16 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Thu, 30 Mar 2023 12:10:16 +0200 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy In-Reply-To: References: Message-ID: Le jeu. 30 mars 2023 ? 06:16, Takashi Kajinami a ?crit : > Hello, > > > Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate > jobs have > become very flaky. Further investigation revealed that the issue is > related to something > in libvirt from Ubuntu Jammy and that prevents detaching devices from > instances[1]. > > The same problem appears in different jobs[2] and we workaround the > problem by disabling > some affected jobs. In heat we also disabled some flaky tests but because > of this we no longer > run basic scenario tests which deploys instance/volume/network in a single > stack, which means > we lost the quite basic test coverage. > > My question is, is there anyone in the Nova team working on "fixing" this > problem ? > We might be able to implement some workaround (like checking status of the > instances before > attempting to delete it) but this should be fixed in libvirt side IMO, as > this looks like a "regression" > in Ubuntu Jammy. > Probably we should report a bug against the libvirt package in Ubuntu but > I'd like to hear some > thoughts from the nova team because they are more directly affected by > this problem. > > FWIW, we discussed about it yesterday on our vPTG : https://etherpad.opendev.org/p/nova-bobcat-ptg#L289 Most of the problems come from the volume detach thing. We also merged some Tempest changes for not trying to cleanup some volumes if the test was OK (thanks Dan for this). We also added more verifications to ask SSH to wait for a bit of time before calling the instance. Eventually, as you see in the etherpad, we didn't found any solutions but we'll try to add some canary job for testing multiple times volume attachs/detachs. We'll also continue to discuss on the CI failures during every Nova weekly meetings (Tuesdays at 1600UTC on #openstack-nova) and I'll want to ask a cross-project session for the Vancouver pPTG for Tempest/Cinder/Nova and others. I leave other SMEs to reply on your other points, like for c9s. > I'm now trying to set up a centos stream 9 job in Heat repo to see whether > this can be reproduced > if we use centos stream 9. I've been running that specific scenario test > in centos stream 9 jobs > in puppet repos but I've never seen this issue, so I suspect the issue is > really specific to libvirt > in Jammy. 
> Well, maybe I'm wrong, but no, we also have a centos9stream issue for volume detachs : https://bugs.launchpad.net/nova/+bug/1960346 > [1] https://bugs.launchpad.net/nova/+bug/1998274 > [2] https://bugs.launchpad.net/nova/+bug/1998148 > > Thank you, > Takashi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skidoo at tlen.pl Thu Mar 30 10:10:23 2023 From: skidoo at tlen.pl (Luk) Date: Thu, 30 Mar 2023 12:10:23 +0200 Subject: Migration from linuxbridge to ovs Message-ID: <1253710667.20230330121023@tlen.pl> Hello, Can You share some thoughts/ideas or some clues regarding migration from linux bridge to ovs ? Does this migration is posible without interrupting traffic from VMs ? We have now linuxbridge with l3-ha, and we noticed that for example when doing live migration of VM from linuxbridge baked compute to openvswitch compute is created bridge... inside openvswitch, instead adding qvo device to br-int: Bridge brq91dc40ac-ea datapath_type: system Port qvo84e2bd98-e9 Interface qvo84e2bd98-e9 Port brq91dc40ac-ea Interface brq91dc40ac-ea type: internal After removing the brq91dc40ac-ea from ovs, and hard reboot, the qvo interface is added properly to br-int: Bridge br-int Controller "tcp:127.0.0.1:6633" is_connected: true fail_mode: secure datapath_type: system Port qvo84e2bd98-e9 tag: 1 Interface qvo84e2bd98-e9 Also, before hard reboot, there is no flow for br-int or any other openvswitch bridge regarding this VM/ip. Does anyone have same problems ? Have tried to migrate from lb to ovs ? Openstack version: ussuri OS: ubuntu 20 Regards Lukasz From ralonsoh at redhat.com Thu Mar 30 10:27:39 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 30 Mar 2023 12:27:39 +0200 Subject: Migration from linuxbridge to ovs In-Reply-To: <1253710667.20230330121023@tlen.pl> References: <1253710667.20230330121023@tlen.pl> Message-ID: Hi Lukasz: This is happening because you are using the "iptables_hybrid" firewall driver in the OVS agent. That creates a namespace where a set of iptables is defined (firewall rules) and a linux bridge, that is connected to OVS using a veth pair [1]. If you need the native plug implementation, then use the native firewall (or don't use any). That will create a TAP port directly connected to the integration bridge. Regards. [1]https://www.rdoproject.org/networking/networking-in-too-much-detail/ On Thu, Mar 30, 2023 at 12:11?PM Luk wrote: > Hello, > > Can You share some thoughts/ideas or some clues regarding migration from > linux bridge to ovs ? Does this migration is posible without interrupting > traffic from VMs ? > > We have now linuxbridge with l3-ha, and we noticed that for example when > doing live migration of VM from linuxbridge baked compute to openvswitch > compute is created > bridge... inside openvswitch, instead adding qvo device to br-int: > > Bridge brq91dc40ac-ea > datapath_type: system > Port qvo84e2bd98-e9 > Interface qvo84e2bd98-e9 > Port brq91dc40ac-ea > Interface brq91dc40ac-ea > type: internal > > After removing the brq91dc40ac-ea from ovs, and hard reboot, the qvo > interface is added properly to br-int: > > Bridge br-int > Controller "tcp:127.0.0.1:6633" > is_connected: true > fail_mode: secure > datapath_type: system > Port qvo84e2bd98-e9 > tag: 1 > Interface qvo84e2bd98-e9 > > Also, before hard reboot, there is no flow for br-int or any other > openvswitch bridge regarding this VM/ip. > > Does anyone have same problems ? Have tried to migrate from lb to ovs ? 
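For reference on the behaviour described above (see Rodolfo's reply below): it comes from the OVS agent's firewall driver, configured in the [securitygroup] section of openvswitch_agent.ini. A minimal sketch of the native-firewall setting, assuming the stock ML2/OVS layout; note that switching drivers for already-plugged ports generally requires re-plugging the VM (for example via a hard reboot):

[securitygroup]
# native OVS firewall plugs the tap directly into br-int;
# "iptables_hybrid" keeps the qbr bridge and qvb/qvo veth pair instead
firewall_driver = openvswitch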
> > Openstack version: ussuri > OS: ubuntu 20 > > Regards > Lukasz > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Thu Mar 30 10:54:03 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Thu, 30 Mar 2023 19:54:03 +0900 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy In-Reply-To: References: Message-ID: Thank you, Sylvain, for all these inputs ! On Thu, Mar 30, 2023 at 7:10?PM Sylvain Bauza wrote: > > > Le jeu. 30 mars 2023 ? 06:16, Takashi Kajinami a > ?crit : > >> Hello, >> >> >> Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate >> jobs have >> become very flaky. Further investigation revealed that the issue is >> related to something >> in libvirt from Ubuntu Jammy and that prevents detaching devices from >> instances[1]. >> >> The same problem appears in different jobs[2] and we workaround the >> problem by disabling >> some affected jobs. In heat we also disabled some flaky tests but because >> of this we no longer >> run basic scenario tests which deploys instance/volume/network in a >> single stack, which means >> we lost the quite basic test coverage. >> >> My question is, is there anyone in the Nova team working on "fixing" this >> problem ? >> We might be able to implement some workaround (like checking status of >> the instances before >> attempting to delete it) but this should be fixed in libvirt side IMO, as >> this looks like a "regression" >> in Ubuntu Jammy. >> Probably we should report a bug against the libvirt package in Ubuntu but >> I'd like to hear some >> thoughts from the nova team because they are more directly affected by >> this problem. >> >> > > FWIW, we discussed about it yesterday on our vPTG : > https://etherpad.opendev.org/p/nova-bobcat-ptg#L289 > > Most of the problems come from the volume detach thing. We also merged > some Tempest changes for not trying to cleanup some volumes if the test was > OK (thanks Dan for this). We also added more verifications to ask SSH to > wait for a bit of time before calling the instance. > Eventually, as you see in the etherpad, we didn't found any solutions but > we'll try to add some canary job for testing multiple times volume > attachs/detachs. > > We'll also continue to discuss on the CI failures during every Nova weekly > meetings (Tuesdays at 1600UTC on #openstack-nova) and I'll want to ask a > cross-project session for the Vancouver pPTG for Tempest/Cinder/Nova and > others. > I leave other SMEs to reply on your other points, like for c9s. > It's good to hear that the issue is still getting attention. I'll catch up the discussion by reading the etherpad and will try to attend follow-up discussions if possible, especially if I can attend Vancouver vPTG. I know some changes have been proposed to check ssh-ability to workaround the problem (though the comment in the vPTG session indicates that does not fully solve the problem) but it's still annoying because we don't really block resource deletions based on instance status (especially its internal status) so we eventually need some solutions here to avoid this problem, IMHO. > >> I'm now trying to set up a centos stream 9 job in Heat repo to see >> whether this can be reproduced >> if we use centos stream 9. I've been running that specific scenario test >> in centos stream 9 jobs >> in puppet repos but I've never seen this issue, so I suspect the issue is >> really specific to libvirt >> in Jammy. 
>> > > > Well, maybe I'm wrong, but no, we also have a centos9stream issue for > volume detachs : > https://bugs.launchpad.net/nova/+bug/1960346 > > I just managed to launch a c9s job in heat but it seems the issue is reproducible in c9s as well[1]. I'll rerun the job a few more times to see how frequent the issue appears in c9s compared to ubuntu. We do not run many tests in puppet jobs so that might be the reason I've never hit it in puppet jobs. [1] https://review.opendev.org/c/openstack/heat/+/879014 > > >> [1] https://bugs.launchpad.net/nova/+bug/1998274 >> [2] https://bugs.launchpad.net/nova/+bug/1998148 >> >> Thank you, >> Takashi >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Thu Mar 30 11:16:09 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 30 Mar 2023 12:16:09 +0100 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy In-Reply-To: References: Message-ID: <8e7154b88655b49c6fb2af3053fd9f7307d246cc.camel@redhat.com> On Thu, 2023-03-30 at 12:10 +0200, Sylvain Bauza wrote: > Le jeu. 30 mars 2023 ? 06:16, Takashi Kajinami a > ?crit : > > > Hello, > > > > > > Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate > > jobs have > > become very flaky. Further investigation revealed that the issue is > > related to something > > in libvirt from Ubuntu Jammy and that prevents detaching devices from > > instances[1]. for what its worth this is not a probelm that is new in jammy it also affect the libvirt/qemu verion in focal and i centos 9 stream. this detach issue was intoduced in qemu as a sideeffect of fixign a security issue. we mostly mitigated the impact on Focal with some tempest changes but not entirly > > > > The same problem appears in different jobs[2] and we workaround the > > problem by disabling > > some affected jobs. In heat we also disabled some flaky tests but because > > of this we no longer > > run basic scenario tests which deploys instance/volume/network in a single > > stack, which means > > we lost the quite basic test coverage. > > > > My question is, is there anyone in the Nova team working on "fixing" this > > problem ? yes and no we cannot fix this in nova as it not a nova issue its a issue with qemu/libvirt and possible cirros. one possible "fix" is to stop using cirros so i did a few things last night first i tried using the ubuntu-minimal-cloud-image this is strip down image that is smaller and uses less memory while it could boot with the normal cirros flavor with 128mb of ram it OOMd cloud-init fortunetly it was after ssh was set up so i could log in but its too close to the memory limit to use. second attempt was to revive my alpine disk image builder serise https://review.opendev.org/c/openstack/diskimage-builder/+/755410 that now works to generate really light weight image (its using about 30mb of ram while idel) i am going to try creating a job that will use that instead of cirros for now im just goign to use a pre playbook to build the image in the job and make destack use that instead. > > We might be able to implement some workaround (like checking status of the > > instances before > > attempting to delete it) but this should be fixed in libvirt side IMO, as > > this looks like a "regression" > > in Ubuntu Jammy. This is not new in Jammy and it should affect RHEL9 i am very very surpsied this is not causeing us a lot of internal pain for our downstream ci as it was breaking centos 9 before it started affecting ubuntu. 
we have seen downstream detach issues but the sshablae changes in tempest mostly helped so this is not just a ubuntu issue its affecting all distros includeing rhel. this is the upstream libvirt bug for the current probelm https://gitlab.com/libvirt/libvirt/-/issues/309 https://bugzilla.redhat.com/show_bug.cgi?id=2087047 is the downstream tracker for the libvirt team to actully fix this i have left a comment there to see if i can move that along. > > Probably we should report a bug against the libvirt package in Ubuntu but > > I'd like to hear some > > thoughts from the nova team because they are more directly affected by > > this problem. > > > > > > FWIW, we discussed about it yesterday on our vPTG : > https://etherpad.opendev.org/p/nova-bobcat-ptg#L289 > > Most of the problems come from the volume detach thing. We also merged some > Tempest changes for not trying to cleanup some volumes if the test was OK > (thanks Dan for this). We also added more verifications to ask SSH to wait > for a bit of time before calling the instance. > Eventually, as you see in the etherpad, we didn't found any solutions but > we'll try to add some canary job for testing multiple times volume > attachs/detachs. > > We'll also continue to discuss on the CI failures during every Nova weekly > meetings (Tuesdays at 1600UTC on #openstack-nova) and I'll want to ask a > cross-project session for the Vancouver pPTG for Tempest/Cinder/Nova and > others. > I leave other SMEs to reply on your other points, like for c9s. c9s hit this before ubuntu did it will not help > > > > I'm now trying to set up a centos stream 9 job in Heat repo to see whether > > this can be reproduced > > if we use centos stream 9. I've been running that specific scenario test > > in centos stream 9 jobs > > in puppet repos but I've never seen this issue, so I suspect the issue is > > really specific to libvirt > > in Jammy. > > > > > Well, maybe I'm wrong, but no, we also have a centos9stream issue for > volume detachs : > https://bugs.launchpad.net/nova/+bug/1960346 > > > > > [1] https://bugs.launchpad.net/nova/+bug/1998274 > > [2] https://bugs.launchpad.net/nova/+bug/1998148 > > > > Thank you, > > Takashi > > From smooney at redhat.com Thu Mar 30 11:18:31 2023 From: smooney at redhat.com (Sean Mooney) Date: Thu, 30 Mar 2023 12:18:31 +0100 Subject: [nova][heat] The next steps to "fix" libvirt problems in Ubuntu Jammy In-Reply-To: References: Message-ID: <25dab368b2c68cc18ae83a52927c94561f46a77d.camel@redhat.com> On Thu, 2023-03-30 at 19:54 +0900, Takashi Kajinami wrote: > Thank you, Sylvain, for all these inputs ! > > On Thu, Mar 30, 2023 at 7:10?PM Sylvain Bauza wrote: > > > > > > > Le jeu. 30 mars 2023 ? 06:16, Takashi Kajinami a > > ?crit : > > > > > Hello, > > > > > > > > > Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate > > > jobs have > > > become very flaky. Further investigation revealed that the issue is > > > related to something > > > in libvirt from Ubuntu Jammy and that prevents detaching devices from > > > instances[1]. > > > > > > The same problem appears in different jobs[2] and we workaround the > > > problem by disabling > > > some affected jobs. In heat we also disabled some flaky tests but because > > > of this we no longer > > > run basic scenario tests which deploys instance/volume/network in a > > > single stack, which means > > > we lost the quite basic test coverage. > > > > > > My question is, is there anyone in the Nova team working on "fixing" this > > > problem ? 
> > > We might be able to implement some workaround (like checking status of > > > the instances before > > > attempting to delete it) but this should be fixed in libvirt side IMO, as > > > this looks like a "regression" > > > in Ubuntu Jammy. > > > Probably we should report a bug against the libvirt package in Ubuntu but > > > I'd like to hear some > > > thoughts from the nova team because they are more directly affected by > > > this problem. > > > > > > > > > > FWIW, we discussed about it yesterday on our vPTG : > > https://etherpad.opendev.org/p/nova-bobcat-ptg#L289 > > > > Most of the problems come from the volume detach thing. We also merged > > some Tempest changes for not trying to cleanup some volumes if the test was > > OK (thanks Dan for this). We also added more verifications to ask SSH to > > wait for a bit of time before calling the instance. > > Eventually, as you see in the etherpad, we didn't found any solutions but > > we'll try to add some canary job for testing multiple times volume > > attachs/detachs. > > > > > We'll also continue to discuss on the CI failures during every Nova weekly > > meetings (Tuesdays at 1600UTC on #openstack-nova) and I'll want to ask a > > cross-project session for the Vancouver pPTG for Tempest/Cinder/Nova and > > others. > > I leave other SMEs to reply on your other points, like for c9s. > > > > It's good to hear that the issue is still getting attention. I'll catch up > the discussion by reading the etherpad > and will try to attend follow-up discussions if possible, especially if I > can attend Vancouver vPTG. > > I know some changes have been proposed to check ssh-ability to workaround > the problem (though > the comment in the vPTG session indicates that does not fully solve the > problem) but it's still annoying > because we don't really block resource deletions based on instance status > (especially its internal status) > so we eventually need some solutions here to avoid this problem, IMHO. > > > > > > > I'm now trying to set up a centos stream 9 job in Heat repo to see > > > whether this can be reproduced > > > if we use centos stream 9. I've been running that specific scenario test > > > in centos stream 9 jobs > > > in puppet repos but I've never seen this issue, so I suspect the issue is > > > really specific to libvirt > > > in Jammy. > > > > > > > > > Well, maybe I'm wrong, but no, we also have a centos9stream issue for > > volume detachs : > > https://bugs.launchpad.net/nova/+bug/1960346 > > > > > I just managed to launch a c9s job in heat but it seems the issue is > reproducible in c9s as well[1]. ya i replied in paralle in my other reply i noted that we saw this issue first in c9s then in ubuntu and we also see this in our internal downstram ci. changing the distro we use for the devstack jobs wont help unless we downgrade libvirt and qemu to before the orginal change in lbvirt was done. which would break other things. > I'll rerun the job a few more times to see how frequent the issue appears > in c9s compared to > ubuntu. > We do not run many tests in puppet jobs so that might be the reason I've > never hit it in > puppet jobs. 
> > [1] https://review.opendev.org/c/openstack/heat/+/879014 > > > > > > > > > [1] https://bugs.launchpad.net/nova/+bug/1998274 > > > [2] https://bugs.launchpad.net/nova/+bug/1998148 > > > > > > Thank you, > > > Takashi > > > > > From christian.rohmann at inovex.de Thu Mar 30 12:24:56 2023 From: christian.rohmann at inovex.de (Christian Rohmann) Date: Thu, 30 Mar 2023 14:24:56 +0200 Subject: Migration from linuxbridge to ovs In-Reply-To: <1253710667.20230330121023@tlen.pl> References: <1253710667.20230330121023@tlen.pl> Message-ID: <4aacfba4-0e04-9197-70b8-178005ea6e96@inovex.de> On 30/03/2023 12:10, Luk wrote: > Can You share some thoughts/ideas or some clues regarding migration from linux bridge to ovs ? Does this migration is posible without interrupting traffic from VMs ? I asked a similar questions back in August - https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030070.html, maybe there are some insights there. We did not replace the SDN in place, but as actively looking into setting up a new cloud. Not that we do not believe in the idea of being able to replace the SDN, but we intend to change much much more and migrating through many big changes is too inefficient compared to replacing the cloud with a new one. Regards Christian From ralonsoh at redhat.com Thu Mar 30 12:44:35 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Thu, 30 Mar 2023 14:44:35 +0200 Subject: [neutron][ptg] Today's agenda Message-ID: Hello Neutrinos: Same as yesterday, we have a packed agenda. Today we'll have the Nova-Neutron meetings, starting at 13UTC. Quick summary: * delete_on_termination for Neutron ports * Blueprint: "Add support for Napatech LinkVirt SmartNICs" * https://bugs.launchpad.net/neutron/+bug/1986003 * ovn-bgp-agent roadmap * neutron-dynamic-routing: Make static scheduler finally the default? See you in a few minutes. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Thu Mar 30 14:17:39 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 30 Mar 2023 14:17:39 +0000 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> On 2023-03-30 11:20:43 +0900 (+0900), Takashi Kajinami wrote: [...] > latest tempest CAN'T be installed with stable/xena u-c because > current tempest requires fasteners>=0.16.0 which conflicts with > 0.14.1 in stable/xena u-c. [...] Won't this situation sort itself out in a few weeks when the Tempest master branch officially ceases support for stable/xena? But more generally, Tempest isn't expected to be coinstallable with stable branches of projects, it's supposed to be installed into an isolated venv or possibly even onto an entirely separate VM. Why not move the problem tests into the heat-tempest-plugin repository instead, since it should be properly coinstallable with Tempest? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From wodel.youchi at gmail.com Thu Mar 30 20:19:28 2023 From: wodel.youchi at gmail.com (wodel youchi) Date: Thu, 30 Mar 2023 21:19:28 +0100 Subject: [kolla-ansible] deploy cinder and glance with multi-backend Message-ID: Hi, I need to deploy cinder and glance with multi-backend. 
My experience for now is simple, my deployment is an HCI built on top of ceph storage, and I am using it to store both cinder volumes and glance images. I have some questions if you can help: - can multi-backend be deployed at first run with kolla-ansible, or do I need to do at least two runs? - reading the doc about multi-backend, I saw that the first backend is specified, do I need to remove the creation of the first backend from glabals.yml and put it in a configuration file cinder.conf and glance.conf? In other words, do I have to create all the backends from the config files, or can I still create the first one using globals.yml and the rest from the config files? I hope my question is clear. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Thu Mar 30 20:49:43 2023 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 30 Mar 2023 16:49:43 -0400 Subject: [kolla] Image building question Message-ID: Folks, I am playing with kolla image building to understand how it works. I am using the following command to build images and wanted to check with you folks if that is the correct way to do it. $ kolla-build -b ubuntu -t source keystone nova neutron glance Does the above command compile code from source or just download images from remote repositories and re-compile them? because in command output I've not noticed anything related to the compiling process going on. Here is the output of all images produced by kolla-build command. Do I need anything else or is this enough to deploy kolla? root at docker-reg:~# docker images REPOSITORY TAG IMAGE ID CREATED SIZE kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes ago 595MB kolla/cron 15.1.0 342877f26a8a 30 minutes ago 250MB kolla/memcached 15.1.0 0d19a4902644 31 minutes ago 250MB kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes ago 314MB kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes ago 314MB kolla/keepalived 15.1.0 82133b09fbf0 31 minutes ago 260MB kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes ago 262MB e66b228c2a07 31 minutes ago 248MB kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes ago 309MB kolla/fluentd 15.1.0 adfd19027862 33 minutes ago 519MB kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes ago 255MB kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes ago 257MB kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes ago 263MB kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes ago 248MB kolla/glance-api 15.1.0 a2241f68f23a 38 minutes ago 1.04GB kolla/glance-base 15.1.0 7286772a03a4 About an hour ago 1.03GB kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an hour ago 1.05GB kolla/neutron-server 15.1.0 69c844a2e3a9 About an hour ago 1.05GB kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an hour ago 1.05GB 486da9a6562e About an hour ago 1.05GB kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an hour ago 1.04GB kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an hour ago 1.04GB kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an hour ago 1.04GB kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an hour ago 1.04GB kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an hour ago 1.04GB kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an hour ago 1.04GB kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an hour ago 1.04GB kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an hour ago 1.04GB kolla/neutron-base 15.1.0 f85b6a2e2725 About an hour ago 1.04GB kolla/nova-libvirt 15.1.0 0f3ecefe4752 About 
an hour ago 987MB kolla/nova-compute 15.1.0 241b7e7fafbe About an hour ago 1.47GB kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an hour ago 1.15GB kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an hour ago 1.22GB kolla/nova-compute-ironic 15.1.0 716612107532 About an hour ago 1.12GB kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an hour ago 1.11GB kolla/nova-api 15.1.0 2aef02667ff8 About an hour ago 1.11GB kolla/nova-conductor 15.1.0 6f1da3400901 About an hour ago 1.11GB kolla/nova-scheduler 15.1.0 628326776b1d About an hour ago 1.11GB kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an hour ago 1.11GB kolla/nova-base 15.1.0 e47420013283 About an hour ago 1.11GB kolla/keystone 15.1.0 e5530d829d5f 2 hours ago 947MB kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours ago 953MB kolla/keystone-fernet 15.1.0 8a4fa24853a8 2 hours ago 951MB kolla/keystone-base 15.1.0 b6f9562364a9 2 hours ago 945MB kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours ago 915MB kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours ago 915MB kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours ago 893MB kolla/base 15.1.0 f68e9ef3dd30 2 hours ago 248MB registry 2 8db46f9d7550 19 hours ago 24.2MB ubuntu 22.04 08d22c0ceb15 3 weeks ago 77.8MB -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Mar 30 21:24:33 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 30 Mar 2023 14:24:33 -0700 Subject: [ptl] Need PTL volunteer for OpenStack Winstackers In-Reply-To: References: <1870a6b7a1d.114e70a2d994244.3514791188773000084@ghanshyammann.com> Message-ID: <1873468752c.bf93124c180931.3029567703559224707@ghanshyammann.com> Thanks, Lucian, for the updates and email link. As the next step, we will discuss it in TC and take the next action. -gmann ---- On Thu, 30 Mar 2023 00:11:38 -0700 Lucian Petrut wrote --- > Hi, > Thanks for reaching out. As mentioned here [1], Cloudbase Solutions can no longer lead the Winstackers project. Since there weren?t any other interested parties, I think there?s no other option but to retire the project. > [1]?https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031044.html > Regards,Lucian Petrut > > On 22 Mar 2023, at 19:43, Ghanshyam Mann gmann at ghanshyammann.com> wrote: > Hi Lukas, > > I am reaching out to you as you were PTL for OpenStack Winstackers project in the last cycle. > > There is no PTL candidate for the next cycle (2023.2), and it is on the leaderless project list. Please > check if you or anyone you know would like to lead this project. > > - https://etherpad.opendev.org/p/2023.2-leaderless > > Also, if anyone else would like to help leading this project, this is time to let TC knows. > > -gmann > > From gmann at ghanshyammann.com Fri Mar 31 02:12:46 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 30 Mar 2023 19:12:46 -0700 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> References: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> Message-ID: <1873570517d.ffe11d53184242.8810335954828690882@ghanshyammann.com> ---- On Thu, 30 Mar 2023 07:17:39 -0700 Jeremy Stanley wrote --- > On 2023-03-30 11:20:43 +0900 (+0900), Takashi Kajinami wrote: > [...] > > latest tempest CAN'T be installed with stable/xena u-c because > > current tempest requires fasteners>=0.16.0 which conflicts with > > 0.14.1 in stable/xena u-c. > [...] 
> > Won't this situation sort itself out in a few weeks when the Tempest > master branch officially ceases support for stable/xena? > > But more generally, Tempest isn't expected to be coinstallable with > stable branches of projects, it's supposed to be installed into an > isolated venv or possibly even onto an entirely separate VM. Why not > move the problem tests into the heat-tempest-plugin repository > instead, since it should be properly coinstallable with Tempest? This is not related to stable/xena or heat tests. Grenade job running on immediately supported branch from EM branch where the base is EM branch using old tempest and stable constraints and target use master tempest and constraints. When you run tempest on target, it causes an issue as constraints var are not set properly for the target. -gmann > -- > Jeremy Stanley > From gmann at ghanshyammann.com Fri Mar 31 02:26:25 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 30 Mar 2023 19:26:25 -0700 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: Message-ID: <187357cd13c.116421986184361.6976437712376488900@ghanshyammann.com> ---- On Wed, 29 Mar 2023 19:20:43 -0700 Takashi Kajinami wrote --- > Hello, > > I have had some local discussions with gmann, but I'd really like to move this discussion forwardto fix the broken stable/xena gate in heat so I will start this thread, hoping the thread can providemore context behind my proposal. > Historically stable branches of heat have been frequently affected by any change in requirementsof tempest. This is mainly because in our CI we install our own in-tree integration tests[1] intotempest venv where tempest and heat-tempest-plugin are installed. Because in-tree integration testsare tied to that specific stable branch, this has been often causing conflicts in requirements(master constraint vs stable/X constraint). Let me explain the issue here. It is not because of using tempest from upper constraints or branchless things, it is because we are not setting the tempest venv constraints correctly for the target tempest run in the grenade job. We fixed the tempest venv constraints setting for the tempest test run on the base branch[1] but forgot to do the same for the target branch test run. As we do not have any grenade job except heat which is running tempest on the target branch in the grenade job, we could not face this issue and heat testing unhide it. I then reproduce it on a normal grenade job by running the tempest on target and the same issue[2][3]. The issue is when base and target branches have different Tempest and constraints to use (for example, stable/wallaby uses old tempest and stable/wallaby constraints, but stable/xena use tempest master and master constraints); in such cases, we need to set proper constraints defined in devstack and then run tempest. It will happen in the grenade job running on the immediately supported branch of the latest EM. I have pushed the grenade fix[4] and testing it by applying the same in heat[5]. If it work then I will push heat change form master itself and backported till stable/xena, so we fix it for all future EM/stable branches. 
[1] https://review.opendev.org/q/topic:bug%252F2003993 [2] https://review.opendev.org/c/openstack/grenade/+/878247/1 [3] https://zuul.opendev.org/t/openstack/build/1b503d359717459c9c77010608068e27/log/controller/logs/grenade.sh_log.txt#17184 [4] https://review.opendev.org/c/openstack/grenade/+/879113 [5] https://review.opendev.org/c/openstack/heat/+/872055 -gmann > > [1]https://github.com/openstack/heat/tree/master/heat_integrationtests > In the past we changed our test installation[2] to use stable constraint to avoid this conflicts,but this approach does no longer work since stable/xena because > 1. stable/xena u-c no longer includes tempest > 2. latest tempest CAN'T be installed with stable/xena u-c because current tempest requires??? fasteners>=0.16.0 which conflicts with 0.14.1 in stable/xena u-c. > [2]https://review.opendev.org/c/openstack/heat/+/803890https://review.opendev.org/c/openstack/heat/+/848215 > I've proposed the change to pin tempest[3] in stable/xena u-c so that people can install tempestwith stable/xena u-c. > [3] https://review.opendev.org/c/openstack/requirements/+/878228 > I understand the reason tempest was removed from u-c was that we should use the latest tempestto test recent stable releases.I agree we can keep tempest excluded for stable/yoga and onwardsbecause tempest is installable with their u-c, but stable/xena u-c is no longer compatible with master.Adding pin to xena u-c does not mainly affect the policy to test stable branches with latest tempestbecause for that we anyway need to use more recent u-c. > I'm still trying to find out the workaround within heat but IMO adding tempest pin to stable/xena u-cis harmless but beneficial in case anyone is trying to use tempest with stable/xena u-c. > > Thank you, > Takashi Kajinami > From satish.txt at gmail.com Fri Mar 31 03:05:48 2023 From: satish.txt at gmail.com (Satish Patel) Date: Thu, 30 Mar 2023 23:05:48 -0400 Subject: [kolla] horizon image build failed Message-ID: Folks, All other images build successfully but when i am trying to build horizon which failed with following error: $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-Font-Awesome>=4.7.0.0 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (4.7.0.0) INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 INFO:kolla.common.utils.horizon: Downloading XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) INFO:kolla.common.utils.horizon: ??????????????????????????????????????? 
51.5/51.5 kB 114.3 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 INFO:kolla.common.utils.horizon: Downloading XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) INFO:kolla.common.utils.horizon: 167.9/167.9 kB 12.4 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 INFO:kolla.common.utils.horizon: Downloading XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) INFO:kolla.common.utils.horizon: 58.0/58.0 kB 66.7 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon:Collecting XStatic-Moment-Timezone>=0.5.22.0 INFO:kolla.common.utils.horizon: Downloading XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) INFO:kolla.common.utils.horizon: 99.7/99.7 kB 45.1 MB/s eta 0:00:00 INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished with status 'error' INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error INFO:kolla.common.utils.horizon: INFO:kolla.common.utils.horizon: × python setup.py egg_info did not run successfully. INFO:kolla.common.utils.horizon: │ exit code: 1 INFO:kolla.common.utils.horizon: ╰─> [6 lines of output] INFO:kolla.common.utils.horizon: Traceback (most recent call last): INFO:kolla.common.utils.horizon: File "<string>", line 2, in <module> INFO:kolla.common.utils.horizon: File "<pip-setuptools-caller>", line 34, in <module> INFO:kolla.common.utils.horizon: File "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", line 2, in <module> INFO:kolla.common.utils.horizon: from xstatic.pkg import moment_timezone as xs INFO:kolla.common.utils.horizon: ImportError: cannot import name 'moment_timezone' from 'xstatic.pkg' (unknown location) INFO:kolla.common.utils.horizon: [end of output] INFO:kolla.common.utils.horizon: INFO:kolla.common.utils.horizon: note: This error originates from a subprocess, and is likely not a problem with pip. INFO:kolla.common.utils.horizon: INFO:kolla.common.utils.horizon:error: metadata-generation-failed INFO:kolla.common.utils.horizon:× Encountered error while generating package metadata. INFO:kolla.common.utils.horizon:╰─> See above for output. INFO:kolla.common.utils.horizon:note: This is an issue with the package mentioned above, not pip. INFO:kolla.common.utils.horizon:hint: See above for details.
INFO:kolla.common.utils.horizon: INFO:kolla.common.utils.horizon:Removing intermediate container e6cd437ba529 ERROR:kolla.common.utils.horizon:Error'd with the following message ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s horizon-source/* horizon && sed -i /^horizon=/d /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /horizon && mkdir -p /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* /etc/openstack-dashboard/ && cp /horizon/openstack_dashboard/local/local_settings.py.example /etc/openstack-dashboard/local_settings && cp /horizon/manage.py /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /plugins/*; fi && for locale in /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) done && chmod 644 /usr/local/bin/kolla_extend_start' returned a non-zero code: 1 INFO:kolla.common.utils:========================= INFO:kolla.common.utils:Successfully built images INFO:kolla.common.utils:========================= INFO:kolla.common.utils:base INFO:kolla.common.utils:openstack-base INFO:kolla.common.utils:=========================== INFO:kolla.common.utils:Images that failed to build INFO:kolla.common.utils:=========================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Fri Mar 31 03:27:17 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Fri, 31 Mar 2023 12:27:17 +0900 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: <187357cd13c.116421986184361.6976437712376488900@ghanshyammann.com> References: <187357cd13c.116421986184361.6976437712376488900@ghanshyammann.com> Message-ID: On Fri, Mar 31, 2023 at 11:26?AM Ghanshyam Mann wrote: > ---- On Wed, 29 Mar 2023 19:20:43 -0700 Takashi Kajinami wrote --- > > Hello, > > > > I have had some local discussions with gmann, but I'd really like to > move this discussion forwardto fix the broken stable/xena gate in heat so I > will start this thread, hoping the thread can providemore context behind my > proposal. > > Historically stable branches of heat have been frequently affected by > any change in requirementsof tempest. This is mainly because in our CI we > install our own in-tree integration tests[1] intotempest venv where tempest > and heat-tempest-plugin are installed. Because in-tree integration testsare > tied to that specific stable branch, this has been often causing conflicts > in requirements(master constraint vs stable/X constraint). > > Let me explain the issue here. It is not because of using tempest from > upper constraints or branchless things, > it is because we are not setting the tempest venv constraints correctly > for the target tempest run in the grenade job. > > We fixed the tempest venv constraints setting for the tempest test run on > the base branch[1] but forgot to do the same > for the target branch test run. As we do not have any grenade job except > heat which is running tempest on the target branch > in the grenade job, we could not face this issue and heat testing unhide > it. I then reproduce it on a normal grenade job by > running the tempest on target and the same issue[2][3]. 
> > The issue is when base and target branches have different Tempest and > constraints to use (for example, stable/wallaby uses old tempest > and stable/wallaby constraints, but stable/xena use tempest master and > master constraints); in such cases, we need to set proper constraints > defined in devstack and then run tempest. It will happen in the grenade > job running on the immediately supported branch of the latest EM. > This is the core problem in heat, which is conflicting what has been done in heat testing. During tests after upgrade we run not only tempest + heat-tempest-plugin tests but also in-tree heat integration tests which test more actual resources. However in-tree integration tests are dependent on a specific stable/requirement. So when we run tests in stable/xena then we need stable/xena constraints installed in venv, which means we need to install tempest which is compatible with stable/xena uc, rather than master tempest. For this sake, I'm asking for adding requirements so that we can install tempest with stable/xena u-c. (Currently we do not explicitly install tempest but it is installed as a dependency of heat-tempest-plugin. I tried to set an explicit tempest version but it has been failing for some reason.) https://review.opendev.org/c/openstack/heat/+/878610 If running tempest tests after upgrade is not commonly done then we probably can replace tempest by more simple ones as is done for the core services such as keystone, or at least we can get rid of integration tests. Though we still likely face an issue with our normal integration tests which run the same set of tests. > I have pushed the grenade fix[4] and testing it by applying the same in > heat[5]. If it work then I will push heat change > form master itself and backported till stable/xena, so we fix it for all > future EM/stable branches. > > [1] https://review.opendev.org/q/topic:bug%252F2003993 > [2] https://review.opendev.org/c/openstack/grenade/+/878247/1 > [3] > https://zuul.opendev.org/t/openstack/build/1b503d359717459c9c77010608068e27/log/controller/logs/grenade.sh_log.txt#17184 > [4] https://review.opendev.org/c/openstack/grenade/+/879113 > [5] https://review.opendev.org/c/openstack/heat/+/872055 > > > -gmann > > > > > [1]https://github.com/openstack/heat/tree/master/heat_integrationtests > > In the past we changed our test installation[2] to use stable > constraint to avoid this conflicts,but this approach does no longer work > since stable/xena because > > 1. stable/xena u-c no longer includes tempest > > 2. latest tempest CAN'T be installed with stable/xena u-c because > current tempest requires fasteners>=0.16.0 which conflicts with 0.14.1 > in stable/xena u-c. > > [2] > https://review.opendev.org/c/openstack/heat/+/803890https://review.opendev.org/c/openstack/heat/+/848215 > > I've proposed the change to pin tempest[3] in stable/xena u-c so that > people can install tempestwith stable/xena u-c. > > [3] https://review.opendev.org/c/openstack/requirements/+/878228 > > I understand the reason tempest was removed from u-c was that we should > use the latest tempestto test recent stable releases.I agree we can keep > tempest excluded for stable/yoga and onwardsbecause tempest is installable > with their u-c, but stable/xena u-c is no longer compatible with > master.Adding pin to xena u-c does not mainly affect the policy to test > stable branches with latest tempestbecause for that we anyway need to use > more recent u-c. 
> > I'm still trying to find out the workaround within heat but IMO adding > tempest pin to stable/xena u-cis harmless but beneficial in case anyone is > trying to use tempest with stable/xena u-c. > > > > Thank you, > > Takashi Kajinami > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkajinam at redhat.com Fri Mar 31 03:38:10 2023 From: tkajinam at redhat.com (Takashi Kajinami) Date: Fri, 31 Mar 2023 12:38:10 +0900 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: References: <187357cd13c.116421986184361.6976437712376488900@ghanshyammann.com> Message-ID: On Fri, Mar 31, 2023 at 12:27?PM Takashi Kajinami wrote: > > > On Fri, Mar 31, 2023 at 11:26?AM Ghanshyam Mann > wrote: > >> ---- On Wed, 29 Mar 2023 19:20:43 -0700 Takashi Kajinami wrote --- >> > Hello, >> > >> > I have had some local discussions with gmann, but I'd really like to >> move this discussion forwardto fix the broken stable/xena gate in heat so I >> will start this thread, hoping the thread can providemore context behind my >> proposal. >> > Historically stable branches of heat have been frequently affected by >> any change in requirementsof tempest. This is mainly because in our CI we >> install our own in-tree integration tests[1] intotempest venv where tempest >> and heat-tempest-plugin are installed. Because in-tree integration testsare >> tied to that specific stable branch, this has been often causing conflicts >> in requirements(master constraint vs stable/X constraint). >> >> Let me explain the issue here. It is not because of using tempest from >> upper constraints or branchless things, >> it is because we are not setting the tempest venv constraints correctly >> for the target tempest run in the grenade job. >> >> We fixed the tempest venv constraints setting for the tempest test run on >> the base branch[1] but forgot to do the same >> for the target branch test run. As we do not have any grenade job except >> heat which is running tempest on the target branch >> in the grenade job, we could not face this issue and heat testing unhide >> it. I then reproduce it on a normal grenade job by >> running the tempest on target and the same issue[2][3]. >> >> The issue is when base and target branches have different Tempest and >> constraints to use (for example, stable/wallaby uses old tempest >> and stable/wallaby constraints, but stable/xena use tempest master and >> master constraints); in such cases, we need to set proper constraints >> defined in devstack and then run tempest. It will happen in the grenade >> job running on the immediately supported branch of the latest EM. >> > This is the core problem in heat, which is conflicting what has been done > in heat testing. > During tests after upgrade we run not only tempest + heat-tempest-plugin > tests but also in-tree heat integration tests > which test more actual resources. However in-tree integration tests are > dependent on a specific stable/requirement. > So when we run tests in stable/xena then we need stable/xena constraints > installed in venv, which means we need to > install tempest which is compatible with stable/xena uc, rather than > master tempest. For this sake, I'm asking for > adding requirements so that we can install tempest with stable/xena u-c. > (Currently we do not explicitly install tempest > but it is installed as a dependency of heat-tempest-plugin. I tried to set > an explicit tempest version but it has been failing > for some reason.) 
> > https://review.opendev.org/c/openstack/heat/+/878610 > > If running tempest tests after upgrade is not commonly done then we > probably can replace tempest by more simple > ones as is done for the core services such as keystone, or at least we can > get rid of integration tests. Though we still > likely face an issue with our normal integration tests which run the same > set of tests. > Hmm. Looking at the result of integration tests in https://review.opendev.org/c/openstack/heat/+/872055 , it seems heat integration tests in stable/xena works with master constraints. So we probably can use master constraints for now but in the future when any backport incompatibility affects integration tests then we have to find out the way to switch to stable constraints at that time. > > >> I have pushed the grenade fix[4] and testing it by applying the same in >> heat[5]. If it work then I will push heat change >> form master itself and backported till stable/xena, so we fix it for all >> future EM/stable branches. >> >> [1] https://review.opendev.org/q/topic:bug%252F2003993 >> [2] https://review.opendev.org/c/openstack/grenade/+/878247/1 >> [3] >> https://zuul.opendev.org/t/openstack/build/1b503d359717459c9c77010608068e27/log/controller/logs/grenade.sh_log.txt#17184 >> [4] https://review.opendev.org/c/openstack/grenade/+/879113 >> [5] https://review.opendev.org/c/openstack/heat/+/872055 >> >> >> -gmann >> >> > >> > [1]https://github.com/openstack/heat/tree/master/heat_integrationtests >> > In the past we changed our test installation[2] to use stable >> constraint to avoid this conflicts,but this approach does no longer work >> since stable/xena because >> > 1. stable/xena u-c no longer includes tempest >> > 2. latest tempest CAN'T be installed with stable/xena u-c because >> current tempest requires fasteners>=0.16.0 which conflicts with 0.14.1 >> in stable/xena u-c. >> > [2] >> https://review.opendev.org/c/openstack/heat/+/803890https://review.opendev.org/c/openstack/heat/+/848215 >> > I've proposed the change to pin tempest[3] in stable/xena u-c so that >> people can install tempestwith stable/xena u-c. >> > [3] https://review.opendev.org/c/openstack/requirements/+/878228 >> > I understand the reason tempest was removed from u-c was that we >> should use the latest tempestto test recent stable releases.I agree we can >> keep tempest excluded for stable/yoga and onwardsbecause tempest is >> installable with their u-c, but stable/xena u-c is no longer compatible >> with master.Adding pin to xena u-c does not mainly affect the policy to >> test stable branches with latest tempestbecause for that we anyway need to >> use more recent u-c. >> > I'm still trying to find out the workaround within heat but IMO adding >> tempest pin to stable/xena u-cis harmless but beneficial in case anyone is >> trying to use tempest with stable/xena u-c. >> > >> > Thank you, >> > Takashi Kajinami >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 31 05:51:42 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 31 Mar 2023 07:51:42 +0200 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: Hi Satish, Have you raised a bug in Launchpad (bugs.launchpad.net/kolla) for this? You have also not mentioned what distribution and Kolla release are you using, so please do that in the bug report. 
Looking at the output probably it?s stable/yoga and Debian - being fixed in https://review.opendev.org/c/openstack/kolla/+/873913 Michal > On 31 Mar 2023, at 05:05, Satish Patel wrote: > > Folks, > > All other images build successfully but when i am trying to build horizon which failed with following error: > > $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon > > > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 > INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) > INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 > INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) > INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-Font-Awesome>=4.7.0.0 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (4.7.0.0) > INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 > INFO:kolla.common.utils.horizon: Downloading XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) > INFO:kolla.common.utils.horizon: ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) > INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 > INFO:kolla.common.utils.horizon: Downloading XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) > INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 > INFO:kolla.common.utils.horizon: Downloading XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) > INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-Moment-Timezone>=0.5.22.0 > INFO:kolla.common.utils.horizon: Downloading XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) > INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished with status 'error' > INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run successfully. > INFO:kolla.common.utils.horizon: ? 
exit code: 1 > INFO:kolla.common.utils.horizon: ??> [6 lines of output] > INFO:kolla.common.utils.horizon: Traceback (most recent call last): > INFO:kolla.common.utils.horizon: File "", line 2, in > INFO:kolla.common.utils.horizon: File "", line 34, in > INFO:kolla.common.utils.horizon: File "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", line 2, in > INFO:kolla.common.utils.horizon: from xstatic.pkg import moment_timezone as xs > INFO:kolla.common.utils.horizon: ImportError: cannot import name 'moment_timezone' from 'xstatic.pkg' (unknown location) > INFO:kolla.common.utils.horizon: [end of output] > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: note: This error originates from a subprocess, and is likely not a problem with pip. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:error: metadata-generation-failed > INFO:kolla.common.utils.horizon:? Encountered error while generating package metadata. > INFO:kolla.common.utils.horizon:??> See above for output. > INFO:kolla.common.utils.horizon:note: This is an issue with the package mentioned above, not pip. > INFO:kolla.common.utils.horizon:hint: See above for details. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:Removing intermediate container e6cd437ba529 > ERROR:kolla.common.utils.horizon:Error'd with the following message > ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s horizon-source/* horizon && sed -i /^horizon=/d /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /horizon && mkdir -p /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* /etc/openstack-dashboard/ && cp /horizon/openstack_dashboard/local/local_settings.py.example /etc/openstack-dashboard/local_settings && cp /horizon/manage.py /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /plugins/*; fi && for locale in /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) done && chmod 644 /usr/local/bin/kolla_extend_start' returned a non-zero code: 1 > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:Successfully built images > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:base > INFO:kolla.common.utils:openstack-base > INFO:kolla.common.utils:=========================== > INFO:kolla.common.utils:Images that failed to build > INFO:kolla.common.utils:=========================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From manchandavishal143 at gmail.com Fri Mar 31 05:58:45 2023 From: manchandavishal143 at gmail.com (vishal manchanda) Date: Fri, 31 Mar 2023 11:28:45 +0530 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: JFYI, there is also a bug opened for this issue in the horizon [1]. But no progress as of today. 
Thanks & regards, Vishal Manchanda [1] https://bugs.launchpad.net/horizon/+bug/2007574 On Fri, Mar 31, 2023 at 8:37?AM Satish Patel wrote: > Folks, > > All other images build successfully but when i am trying to build horizon > which failed with following error: > > $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed > horizon > > > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: > XStatic-Font-Awesome>=4.7.0.0 in > /var/lib/kolla/venv/lib/python3.10/site-packages (from > vitrage-dashboard==3.6.1.dev2) (4.7.0.0) > INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) > INFO:kolla.common.utils.horizon: > ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: > XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages > (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) > INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 > INFO:kolla.common.utils.horizon: Downloading > XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 > INFO:kolla.common.utils.horizon: Downloading > XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) > INFO:kolla.common.utils.horizon: > ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting > XStatic-Moment-Timezone>=0.5.22.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) > INFO:kolla.common.utils.horizon: > ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished > with status 'error' > INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run > successfully. > INFO:kolla.common.utils.horizon: ? 
exit code: 1 > INFO:kolla.common.utils.horizon: ??> [6 lines of output] > INFO:kolla.common.utils.horizon: Traceback (most recent call last): > INFO:kolla.common.utils.horizon: File "", line 2, in > > INFO:kolla.common.utils.horizon: File "", > line 34, in > INFO:kolla.common.utils.horizon: File > "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", > line 2, in > INFO:kolla.common.utils.horizon: from xstatic.pkg import > moment_timezone as xs > INFO:kolla.common.utils.horizon: ImportError: cannot import name > 'moment_timezone' from 'xstatic.pkg' (unknown location) > INFO:kolla.common.utils.horizon: [end of output] > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: note: This error originates from a > subprocess, and is likely not a problem with pip. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:error: metadata-generation-failed > INFO:kolla.common.utils.horizon:? Encountered error while generating > package metadata. > INFO:kolla.common.utils.horizon:??> See above for output. > INFO:kolla.common.utils.horizon:note: This is an issue with the package > mentioned above, not pip. > INFO:kolla.common.utils.horizon:hint: See above for details. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:Removing intermediate container > e6cd437ba529 > ERROR:kolla.common.utils.horizon:Error'd with the following message > ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s > horizon-source/* horizon && sed -i /^horizon=/d > /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib > python3 -m pip --no-cache-dir install --upgrade -c > /requirements/upper-constraints.txt /horizon && mkdir -p > /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* > /etc/openstack-dashboard/ && cp > /horizon/openstack_dashboard/local/local_settings.py.example > /etc/openstack-dashboard/local_settings && cp /horizon/manage.py > /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then > SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir > install --upgrade -c /requirements/upper-constraints.txt /plugins/*; > fi && for locale in > /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do > (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) > done && chmod 644 /usr/local/bin/kolla_extend_start' returned a > non-zero code: 1 > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:Successfully built images > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:base > INFO:kolla.common.utils:openstack-base > INFO:kolla.common.utils:=========================== > INFO:kolla.common.utils:Images that failed to build > INFO:kolla.common.utils:=========================== > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 31 06:47:42 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 31 Mar 2023 08:47:42 +0200 Subject: [kolla] Image building question In-Reply-To: References: Message-ID: <8A0037B4-3C63-4EA5-ADC5-282B9246E578@gmail.com> Hello Satish, Only OpenStack is installed from source, all the dependencies (e.g. MariaDB, Apache, etc) are installed from distribution repositories. The set of images you need depends on what you enable in Kolla-Ansible (unless you use a different mechanism for deploying those images). 
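As a concrete follow-up to this point, kolla-ansible can fetch exactly the images required by whatever is enabled in globals.yml, so there is no need to guess the image list by hand. A minimal sketch, assuming the usual multinode inventory layout and a local registry configured through the docker_registry and openstack_tag settings (the names and values shown are illustrative of a typical setup, not taken from this thread):

    # Pull only the images needed for the services enabled in /etc/kolla/globals.yml
    # (e.g. docker_registry: "docker-reg:4000", openstack_tag: "zed")
    kolla-ansible -i ./multinode pull
    # Then deploy as usual
    kolla-ansible -i ./multinode deploy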
Michal > On 30 Mar 2023, at 22:49, Satish Patel wrote: > > Folks, > > I am playing with kolla image building to understand how it works. I am using the following command to build images and wanted to check with you folks if that is the correct way to do it. > > $ kolla-build -b ubuntu -t source keystone nova neutron glance > > Does the above command compile code from source or just download images from remote repositories and re-compile them? because in command output I've not noticed anything related to the compiling process going on. > > Here is the output of all images produced by kolla-build command. Do I need anything else or is this enough to deploy kolla? > > root at docker-reg:~# docker images > REPOSITORY TAG IMAGE ID CREATED SIZE > kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes ago 595MB > kolla/cron 15.1.0 342877f26a8a 30 minutes ago 250MB > kolla/memcached 15.1.0 0d19a4902644 31 minutes ago 250MB > kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes ago 314MB > kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes ago 314MB > kolla/keepalived 15.1.0 82133b09fbf0 31 minutes ago 260MB > kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes ago 262MB > e66b228c2a07 31 minutes ago 248MB > kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes ago 309MB > kolla/fluentd 15.1.0 adfd19027862 33 minutes ago 519MB > kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes ago 255MB > kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes ago 257MB > kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes ago 263MB > kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes ago 248MB > kolla/glance-api 15.1.0 a2241f68f23a 38 minutes ago 1.04GB > kolla/glance-base 15.1.0 7286772a03a4 About an hour ago 1.03GB > kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an hour ago 1.05GB > kolla/neutron-server 15.1.0 69c844a2e3a9 About an hour ago 1.05GB > kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an hour ago 1.05GB > 486da9a6562e About an hour ago 1.05GB > kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an hour ago 1.04GB > kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an hour ago 1.04GB > kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an hour ago 1.04GB > kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an hour ago 1.04GB > kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an hour ago 1.04GB > kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an hour ago 1.04GB > kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an hour ago 1.04GB > kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an hour ago 1.04GB > kolla/neutron-base 15.1.0 f85b6a2e2725 About an hour ago 1.04GB > kolla/nova-libvirt 15.1.0 0f3ecefe4752 About an hour ago 987MB > kolla/nova-compute 15.1.0 241b7e7fafbe About an hour ago 1.47GB > kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an hour ago 1.15GB > kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an hour ago 1.22GB > kolla/nova-compute-ironic 15.1.0 716612107532 About an hour ago 1.12GB > kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an hour ago 1.11GB > kolla/nova-api 15.1.0 2aef02667ff8 About an hour ago 1.11GB > kolla/nova-conductor 15.1.0 6f1da3400901 About an hour ago 1.11GB > kolla/nova-scheduler 15.1.0 628326776b1d About an hour ago 1.11GB > kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an hour ago 1.11GB > kolla/nova-base 15.1.0 e47420013283 About an hour ago 1.11GB > kolla/keystone 15.1.0 e5530d829d5f 2 hours ago 947MB > kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours ago 953MB > kolla/keystone-fernet 15.1.0 
8a4fa24853a8 2 hours ago 951MB > kolla/keystone-base 15.1.0 b6f9562364a9 2 hours ago 945MB > kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours ago 915MB > kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours ago 915MB > kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours ago 893MB > kolla/base 15.1.0 f68e9ef3dd30 2 hours ago 248MB > registry 2 8db46f9d7550 19 hours ago 24.2MB > ubuntu 22.04 08d22c0ceb15 3 weeks ago 77.8MB > > From gthiemonge at redhat.com Fri Mar 31 08:19:49 2023 From: gthiemonge at redhat.com (Gregory Thiemonge) Date: Fri, 31 Mar 2023 10:19:49 +0200 Subject: [Octavia] moving back to Launchpad Message-ID: Hi Folks, During the Antelope PTG, we discussed the move from Storyboard to Launchpad, and in the Bobcat PTG session we decided to switch at the beginning of the B cycle (now). The Octavia Launchpad [0] is now re-enabled, and we are marking all the old entries as Invalid. We don't plan to have an automated migration script; we will manually duplicate the most recent bugs reported in Storyboard (those not yet started). I'm also proposing patches to update the links to the bug tracker in the Octavia projects. Greg [0] https://launchpad.net/octavia -------------- next part -------------- An HTML attachment was scrubbed... URL: From tweining at redhat.com Fri Mar 31 09:50:11 2023 From: tweining at redhat.com (Tom Weininger) Date: Fri, 31 Mar 2023 11:50:11 +0200 Subject: [Octavia] moving back to Launchpad In-Reply-To: References: Message-ID: <7d01bf75-13d0-0008-8535-904378768d46@redhat.com> Thank you Greg for working on this and for coordinating the migration. I'm happy that Octavia is doing this migration now. Best regards, Tom On 31.03.23 10:19, Gregory Thiemonge wrote: > Hi Folks, > > During the Antelope PTG, we discussed the move from Storyboard to Launchpad, and in the > Bobcat PTG session, we decided to switch at the beginning of the B cycle (now). > > The Octavia Launchpad [0] is now re-enabled, we are marking all the old entries as Invalid. > We don't plan to have an automated migration script, we will duplicate manually the most > recent bugs reported in Storyboard (not started bugs). > > I'm also proposing patches to update the links to the bug tracker in the Octavia projects. > > Greg > > [0] https://launchpad.net/octavia From smooney at redhat.com Fri Mar 31 11:01:21 2023 From: smooney at redhat.com (Sean Mooney) Date: Fri, 31 Mar 2023 12:01:21 +0100 Subject: [kolla] Image building question In-Reply-To: References: Message-ID: On Thu, 2023-03-30 at 16:49 -0400, Satish Patel wrote: > Folks, > > I am playing with kolla image building to understand how it works. I am > using the following command to build images and wanted to check with you > folks if that is the correct way to do it. > > $ kolla-build -b ubuntu -t source keystone nova neutron glance > > Does the above command compile code from source or just download images > from remote repositories and re-compile them? > OpenStack is mainly Python, so in general there is no compile step. But to answer your question: that command builds the images using either the source tarballs or the OpenStack packages.
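Before the detailed walk-through of the template rendering that follows, it can help to let kolla-build render the Dockerfiles without building anything and inspect the result yourself. This is only an illustrative sketch: the work directory is arbitrary, and the --template-only/--work-dir options should be checked against kolla-build --help on the release you are using.

    # Render the Dockerfile templates for one image without building it
    kolla-build -b ubuntu -t source --template-only --work-dir /tmp/kolla-work keystone
    # Inspect the rendered (no longer templated) Dockerfiles
    find /tmp/kolla-work -name Dockerfile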
The default source locations are rendered into a file which you can override; they come from the data stored in https://github.com/openstack/kolla/blob/master/kolla/common/sources.py and the other build config defaults are generated from this code: https://github.com/openstack/kolla/blob/master/kolla/common/config.py When you invoke kolla-build it executes https://github.com/openstack/kolla/blob/master/kolla/cmd/build.py but the main build workflow is here: https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/kolla/image/build.py#L95 The tl;dr is that the build workflow starts by creating a build directory and locating the Dockerfile templates, in other words the content of the https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker directory. Each project has a directory in the docker directory, and each container that project has gets a directory in the project directory, so the aodh project has an aodh folder: https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh The convention is to have a -base container which handles the dependency installation and then one additional container for each binary daemon the project has, e.g. aodh-api. The name of the folder in the project directory is used as the name of the container. If we look at the content of the Dockerfiles we will see that they are not actually Dockerfiles, https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh/aodh-api/Dockerfile.j2 - they are Jinja2 templates that produce Dockerfiles. Kolla, as far as I am aware, has dropped support for binary images and alternative distros, but looking at an older release we can see how this worked: https://github.com/openstack/kolla/blob/stable/wallaby/docker/nova/nova-base/Dockerfile.j2#L13-L52 Each Dockerfile template uses Jinja2 to generate a set of concrete Dockerfiles from the template, making decisions based on the parameters passed in. So when you invoke kolla-build -b ubuntu -t source keystone nova neutron glance, what actually happens is that the -t flag is set as the install_type parameter in the Jinja2 environment when the Dockerfile is rendered. After all the Dockerfiles are rendered into normal Dockerfiles, Kolla just invokes the build. In the case of a source build, that involves pre-fetching the source tarball from https://tarballs.opendev.org and putting it in the build directory so that it can be included in the container. Kolla also used to support a git repo as an alternative source format. I have glossed over a lot of the details of how this actually works, but that is the essence of what the command is doing: creating a build dir, downloading the source, rendering the Dockerfile templates to Dockerfiles, invoking docker build on those, and then tagging them with the container name and build tag. https://docs.openstack.org/kolla/latest/admin/image-building.html covers this from a high level. > because in command output > I've not noticed anything related to the compiling process going on. > > Here is the output of all images produced by kolla-build command. Do I need > anything else or is this enough to deploy kolla?
You can deploy Kolla with what you have, yes, although since the Kolla images are automatically built by CI, kolla-ansible can simply use the ones from Docker Hub or quay.io instead, so you do not need to build them yourself. If you do build them yourself, there is basically one other step that you should take: if this is a multi-node deployment, push the images to an internally hosted Docker registry, although based on the hostname in the prompt below it looks like you have already done that. > > root at docker-reg:~# docker images > REPOSITORY TAG IMAGE ID CREATED > SIZE > kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes > ago 595MB > kolla/cron 15.1.0 342877f26a8a 30 minutes > ago 250MB > kolla/memcached 15.1.0 0d19a4902644 31 minutes > ago 250MB > kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes > ago 314MB > kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes > ago 314MB > kolla/keepalived 15.1.0 82133b09fbf0 31 minutes > ago 260MB > kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes > ago 262MB > e66b228c2a07 31 minutes > ago 248MB > kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes > ago 309MB > kolla/fluentd 15.1.0 adfd19027862 33 minutes > ago 519MB > kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes > ago 255MB > kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes > ago 257MB > kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes > ago 263MB > kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes > ago 248MB > kolla/glance-api 15.1.0 a2241f68f23a 38 minutes > ago 1.04GB > kolla/glance-base 15.1.0 7286772a03a4 About an > hour ago 1.03GB > kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an > hour ago 1.05GB > kolla/neutron-server 15.1.0 69c844a2e3a9 About an > hour ago 1.05GB > kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an > hour ago 1.05GB > 486da9a6562e About an > hour ago 1.05GB > kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an > hour ago 1.04GB > kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an > hour ago 1.04GB > kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an > hour ago 1.04GB > kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an > hour ago 1.04GB > kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an > hour ago 1.04GB > kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an > hour ago 1.04GB > kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an > hour ago 1.04GB > kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an > hour ago 1.04GB > kolla/neutron-base 15.1.0 f85b6a2e2725 About an > hour ago 1.04GB > kolla/nova-libvirt 15.1.0 0f3ecefe4752 About an > hour ago 987MB > kolla/nova-compute 15.1.0 241b7e7fafbe About an > hour ago 1.47GB > kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an > hour ago 1.15GB > kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an > hour ago 1.22GB > kolla/nova-compute-ironic 15.1.0 716612107532 About an > hour ago 1.12GB > kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an > hour ago 1.11GB > kolla/nova-api 15.1.0 2aef02667ff8 About an > hour ago 1.11GB > kolla/nova-conductor 15.1.0 6f1da3400901 About an > hour ago 1.11GB > kolla/nova-scheduler 15.1.0 628326776b1d About an > hour ago 1.11GB > kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an > hour ago 1.11GB > kolla/nova-base 15.1.0 e47420013283 About an > hour ago 1.11GB > kolla/keystone 15.1.0 e5530d829d5f 2 hours ago > 947MB > kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours ago > 953MB > kolla/keystone-fernet 15.1.0 8a4fa24853a8 2 hours ago > 951MB > kolla/keystone-base 15.1.0 b6f9562364a9 2 hours ago > 945MB >
kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours ago > 915MB > kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours ago > 915MB > kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours ago > 893MB > kolla/base 15.1.0 f68e9ef3dd30 2 hours ago > 248MB > registry 2 8db46f9d7550 19 hours ago > 24.2MB > ubuntu 22.04 08d22c0ceb15 3 weeks ago > 77.8MB From skidoo at tlen.pl Fri Mar 31 11:11:55 2023 From: skidoo at tlen.pl (Luk) Date: Fri, 31 Mar 2023 13:11:55 +0200 Subject: Migration from linuxbridge to ovs In-Reply-To: <4aacfba4-0e04-9197-70b8-178005ea6e96@inovex.de> References: <1253710667.20230330121023@tlen.pl> <4aacfba4-0e04-9197-70b8-178005ea6e96@inovex.de> Message-ID: <781242298.20230331131155@tlen.pl> Cze??, > On 30/03/2023 12:10, Luk wrote: >> Can You share some thoughts/ideas or some clues regarding migration from linux bridge to ovs ? Does this migration is posible without interrupting traffic from VMs ? > I asked a similar questions back in August - https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030070.html, maybe there are some insights there. Thank You, this thread is quite good in this case :) > We did not replace the SDN in place, but as actively looking into setting up a new cloud. Not that we do not believe in the idea of being able to replace the SDN, > but we intend to change much much more and migrating through many big changes is too inefficient compared to replacing the cloud with a new one. It looks the best way... Anyway - there is chance to make live migration between lb and openvswitch, but need to add flows by hand and add proper tag into br-int - and this 'solution' works only for external/provider network. As S?awek pointed out - in case of vxlan connection there is no opportunity to connect neturon ovs controller with linuxbridge compute nodes. -- Pozdrowienia, Lukasz From ralonsoh at redhat.com Fri Mar 31 11:12:37 2023 From: ralonsoh at redhat.com (Rodolfo Alonso Hernandez) Date: Fri, 31 Mar 2023 13:12:37 +0200 Subject: [neutron][ptg] Today's agenda Message-ID: Hello Neutrinos: Today is the last day of the PTG. You can check the agenda in [1]. This is a quick summary: * Open hour for core reviewers: join us! Neutron needs you. * FIPS jobs: status and distro support. If you have a last minute topic, today is the best moment to show up and present it (it doesn't matter if it is not on the agenda). Regards. [1]https://etherpad.opendev.org/p/neutron-bobcat-ptg -------------- next part -------------- An HTML attachment was scrubbed... URL: From pdeore at redhat.com Fri Mar 31 11:25:16 2023 From: pdeore at redhat.com (Pranali Deore) Date: Fri, 31 Mar 2023 16:55:16 +0530 Subject: [Glance] Bobcat PTG Updates Message-ID: Hello Everyone, We have concluded Glance vPTG yesterday and I will share the summary early next week. You can find tentative milestone wise priorities for glance in PTG etherpad[1]. Join us in the weekly meeting coming Thursday if you have any doubts/suggestions. Thanks & Regards, Pranali [1]: https://etherpad.opendev.org/p/glance-bobcat-ptg#L143 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fungi at yuggoth.org Fri Mar 31 11:25:49 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 31 Mar 2023 11:25:49 +0000 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: <1873570517d.ffe11d53184242.8810335954828690882@ghanshyammann.com> References: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> <1873570517d.ffe11d53184242.8810335954828690882@ghanshyammann.com> Message-ID: <20230331112549.we4jjivoxgeew2qt@yuggoth.org> On 2023-03-30 19:12:46 -0700 (-0700), Ghanshyam Mann wrote: [...] > This is not related to stable/xena or heat tests. Grenade job > running on immediately supported branch from EM branch where the > base is EM branch using old tempest and stable constraints and > target use master tempest and constraints. When you run tempest on > target, it causes an issue as constraints var are not set properly > for the target. We're still running grenade jobs that test upgrades from stable/wallaby to stable/xena? I thought by policy we dropped those when stable/wallaby entered extended maintenance, and that grenade only intended to support upgrading from maintained stable branches. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sbauza at redhat.com Fri Mar 31 12:00:10 2023 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 31 Mar 2023 14:00:10 +0200 Subject: [nova][ptg] Today's agenda (Friday) Message-ID: Hey folks, Today is our last vPTG day. The agenda is so : 13:00 UTC - 14:45 UTC : * Users reported exhaustion of primary keys ('id') in some large tables like system_metadata. How could we achieve a data migration from sa.Integer to sa.BigInteger ? * Should cold-migrate have a specific new policy like os_compute_api:os-migrate-server:migrate-specify-host to accept non-admins to migrate if they don't provide a host value ? * Discuss the next steps with compute hostname robustification * Disable compute services after being discovered * Limited lower constraint job in nova and placement 14:45 UTC - 15:00 UTC : break 15:00 UTC - 17:00 UTC : * Openstack server show command output (cross-project discussion with openstackSDK contributors) * Should we support dynamically disabling post copy live migration based on vm_state is paused ? * Evacuation with multiple allocations * Bobcat proposed planning (cont.) * trim support for virtio-blk feature or bug? * Summit/PTG : what could we be doing for the physical PTG ? (if we have time) As a reminder, you can look at the topics here : https://etherpad.opendev.org/p/nova-bobcat-ptg#L426 and you can add your IRC nick in the courtesy ping list if you want to be around. Hope you had a good week and see you in an hour by now ! -Sylvain -------------- next part -------------- An HTML attachment was scrubbed... URL: From kozhukalov at gmail.com Fri Mar 31 12:33:52 2023 From: kozhukalov at gmail.com (Vladimir Kozhukalov) Date: Fri, 31 Mar 2023 15:33:52 +0300 Subject: [openstack-helm][ptg] agenda changes for (Fri) Mar/31/2023 Message-ID: Dear helmers, Since we don't have any new topics in the etherpad [1] to discuss apart from those we discussed yesterday I think we can unbook our meeting room. Thanks for attending yesterday. I'll send the summary of our discussions later. [1] https://etherpad.opendev.org/p/march2023-ptg-openstack-helm -- Best regards, Kozhukalov Vladimir -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kennelson11 at gmail.com Fri Mar 31 12:41:33 2023 From: kennelson11 at gmail.com (Kendall Nelson) Date: Fri, 31 Mar 2023 07:41:33 -0500 Subject: Reminder! In Person PTG 2023 Team Signup Deadline Message-ID: Hello Everyone, This is the last call to sign your team up for the *in-person* Project Teams Gathering (PTG) happening in Vancouver at the OpenInfra Summit! If you haven't already done so and your team is interested in participating, please complete the survey[1] by April 2nd, 2023 at 7:00 UTC. Registration for the PTG is included as a part of registration for the OpenInfra Summit in Vancouver. Prices increase May 5th so register soon! Thanks! -Kendall (diablo_rojo) [1] Team Survey: https://openinfrafoundation.formstack.com/forms/june2023_ptg_survey [2] Summit Registration: https://vancouver2023.openinfra.dev/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Fri Mar 31 13:25:25 2023 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Fri, 31 Mar 2023 06:25:25 -0700 Subject: [heat][qa][requirements] Pinning tempest in stable/xena constraints In-Reply-To: <20230331112549.we4jjivoxgeew2qt@yuggoth.org> References: <20230330141738.hoyhlfjxdxdvuko4@yuggoth.org> <1873570517d.ffe11d53184242.8810335954828690882@ghanshyammann.com> <20230331112549.we4jjivoxgeew2qt@yuggoth.org> Message-ID: <18737d827d2.112ab3364239474.9216356833253892791@ghanshyammann.com> ---- On Fri, 31 Mar 2023 04:25:49 -0700 Jeremy Stanley wrote --- > On 2023-03-30 19:12:46 -0700 (-0700), Ghanshyam Mann wrote: > [...] > > This is not related to stable/xena or heat tests. Grenade job > > running on immediately supported branch from EM branch where the > > base is EM branch using old tempest and stable constraints and > > target use master tempest and constraints. When you run tempest on > > target, it causes an issue as constraints var are not set properly > > for the target. > > We're still running grenade jobs that test upgrades from > stable/wallaby to stable/xena? I thought by policy we dropped those > when stable/wallaby entered extended maintenance, and that grenade > only intended to support upgrading from maintained stable branches. We do not need to run as you mentioned, but we try to keep it running as long as it can pass. This is my last attempt to fix on EM upgrade and in the next failure, I need to stop fixing and better to stop the grenade there. -gmann > -- > Jeremy Stanley > From satish.txt at gmail.com Fri Mar 31 13:53:31 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 31 Mar 2023 09:53:31 -0400 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: Thank Michal, I have posted commands in my original post which have distribution Ubuntu and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon ) I can definitely open a new bug but it looks like vishal already on it. Are there any workarounds or interim solutions? I am new to the kolla-image building process so I'm not sure where I should change the setup tool version to move on. Very curious how the CI-CD pipeline passed this bug? On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka wrote: > Hi Satish, > > Have you raised a bug in Launchpad (bugs.launchpad.net/kolla) for this? > > You have also not mentioned what distribution and Kolla release are you > using, so please do that in the bug report. 
> Looking at the output probably it?s stable/yoga and Debian - being fixed > in https://review.opendev.org/c/openstack/kolla/+/873913 > > Michal > > On 31 Mar 2023, at 05:05, Satish Patel wrote: > > Folks, > > All other images build successfully but when i am trying to build horizon > which failed with following error: > > $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed > horizon > > > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: > XStatic-Font-Awesome>=4.7.0.0 in > /var/lib/kolla/venv/lib/python3.10/site-packages (from > vitrage-dashboard==3.6.1.dev2) (4.7.0.0) > INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) > INFO:kolla.common.utils.horizon: > ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Requirement already satisfied: > XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages > (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) > INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 > INFO:kolla.common.utils.horizon: Downloading > XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) > INFO:kolla.common.utils.horizon: > ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 > INFO:kolla.common.utils.horizon: Downloading > XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) > INFO:kolla.common.utils.horizon: > ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon:Collecting > XStatic-Moment-Timezone>=0.5.22.0 > INFO:kolla.common.utils.horizon: Downloading > XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) > INFO:kolla.common.utils.horizon: > ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started > INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished > with status 'error' > INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run > successfully. > INFO:kolla.common.utils.horizon: ? 
exit code: 1 > INFO:kolla.common.utils.horizon: ??> [6 lines of output] > INFO:kolla.common.utils.horizon: Traceback (most recent call last): > INFO:kolla.common.utils.horizon: File "", line 2, in > > INFO:kolla.common.utils.horizon: File "", > line 34, in > INFO:kolla.common.utils.horizon: File > "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", > line 2, in > INFO:kolla.common.utils.horizon: from xstatic.pkg import > moment_timezone as xs > INFO:kolla.common.utils.horizon: ImportError: cannot import name > 'moment_timezone' from 'xstatic.pkg' (unknown location) > INFO:kolla.common.utils.horizon: [end of output] > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon: note: This error originates from a > subprocess, and is likely not a problem with pip. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:error: metadata-generation-failed > INFO:kolla.common.utils.horizon:? Encountered error while generating > package metadata. > INFO:kolla.common.utils.horizon:??> See above for output. > INFO:kolla.common.utils.horizon:note: This is an issue with the package > mentioned above, not pip. > INFO:kolla.common.utils.horizon:hint: See above for details. > INFO:kolla.common.utils.horizon: > INFO:kolla.common.utils.horizon:Removing intermediate container > e6cd437ba529 > ERROR:kolla.common.utils.horizon:Error'd with the following message > ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s > horizon-source/* horizon && sed -i /^horizon=/d > /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib > python3 -m pip --no-cache-dir install --upgrade -c > /requirements/upper-constraints.txt /horizon && mkdir -p > /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* > /etc/openstack-dashboard/ && cp > /horizon/openstack_dashboard/local/local_settings.py.example > /etc/openstack-dashboard/local_settings && cp /horizon/manage.py > /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then > SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir > install --upgrade -c /requirements/upper-constraints.txt /plugins/*; > fi && for locale in > /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do > (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) > done && chmod 644 /usr/local/bin/kolla_extend_start' returned a > non-zero code: 1 > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:Successfully built images > INFO:kolla.common.utils:========================= > INFO:kolla.common.utils:base > INFO:kolla.common.utils:openstack-base > INFO:kolla.common.utils:=========================== > INFO:kolla.common.utils:Images that failed to build > INFO:kolla.common.utils:=========================== > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 31 13:59:04 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 31 Mar 2023 15:59:04 +0200 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: Hi Satish, Vishal mentioned a bug that I raised in Horizon, but we have been pinning to earlier setuptools in Kolla builds just because of that (and that?s the workaround). Are you using kolla from PyPI or the latest stable/zed checkout from Git? We recommend the latter. 
Michal > On 31 Mar 2023, at 15:53, Satish Patel wrote: > > Thank Michal, > > I have posted commands in my original post which have distribution Ubuntu and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon ) > > I can definitely open a new bug but it looks like vishal already on it. Are there any workarounds or interim solutions? I am new to the kolla-image building process so I'm not sure where I should change the setup tool version to move on. > > Very curious how the CI-CD pipeline passed this bug? > > > On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka > wrote: >> Hi Satish, >> >> Have you raised a bug in Launchpad (bugs.launchpad.net/kolla ) for this? >> >> You have also not mentioned what distribution and Kolla release are you using, so please do that in the bug report. >> Looking at the output probably it?s stable/yoga and Debian - being fixed in https://review.opendev.org/c/openstack/kolla/+/873913 >> >> Michal >> >>> On 31 Mar 2023, at 05:05, Satish Patel > wrote: >>> >>> Folks, >>> >>> All other images build successfully but when i am trying to build horizon which failed with following error: >>> >>> $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon >>> >>> >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) >>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) >>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-Font-Awesome>=4.7.0.0 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (4.7.0.0) >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) >>> INFO:kolla.common.utils.horizon: ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) >>> INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) >>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 >>> INFO:kolla.common.utils.horizon: Downloading XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) >>> INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Moment-Timezone>=0.5.22.0 >>> INFO:kolla.common.utils.horizon: Downloading XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) >>> INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 
99.7/99.7 kB 45.1 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started >>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished with status 'error' >>> INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run successfully. >>> INFO:kolla.common.utils.horizon: ? exit code: 1 >>> INFO:kolla.common.utils.horizon: ??> [6 lines of output] >>> INFO:kolla.common.utils.horizon: Traceback (most recent call last): >>> INFO:kolla.common.utils.horizon: File "", line 2, in >>> INFO:kolla.common.utils.horizon: File "", line 34, in >>> INFO:kolla.common.utils.horizon: File "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", line 2, in >>> INFO:kolla.common.utils.horizon: from xstatic.pkg import moment_timezone as xs >>> INFO:kolla.common.utils.horizon: ImportError: cannot import name 'moment_timezone' from 'xstatic.pkg' (unknown location) >>> INFO:kolla.common.utils.horizon: [end of output] >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon: note: This error originates from a subprocess, and is likely not a problem with pip. >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon:error: metadata-generation-failed >>> INFO:kolla.common.utils.horizon:? Encountered error while generating package metadata. >>> INFO:kolla.common.utils.horizon:??> See above for output. >>> INFO:kolla.common.utils.horizon:note: This is an issue with the package mentioned above, not pip. >>> INFO:kolla.common.utils.horizon:hint: See above for details. >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon:Removing intermediate container e6cd437ba529 >>> ERROR:kolla.common.utils.horizon:Error'd with the following message >>> ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s horizon-source/* horizon && sed -i /^horizon=/d /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /horizon && mkdir -p /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* /etc/openstack-dashboard/ && cp /horizon/openstack_dashboard/local/local_settings.py.example /etc/openstack-dashboard/local_settings && cp /horizon/manage.py /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /plugins/*; fi && for locale in /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) done && chmod 644 /usr/local/bin/kolla_extend_start' returned a non-zero code: 1 >>> INFO:kolla.common.utils:========================= >>> INFO:kolla.common.utils:Successfully built images >>> INFO:kolla.common.utils:========================= >>> INFO:kolla.common.utils:base >>> INFO:kolla.common.utils:openstack-base >>> INFO:kolla.common.utils:=========================== >>> INFO:kolla.common.utils:Images that failed to build >>> INFO:kolla.common.utils:=========================== >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From satish.txt at gmail.com Fri Mar 31 14:25:03 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 31 Mar 2023 10:25:03 -0400 Subject: [kolla] Image building question In-Reply-To: References: Message-ID: Thank you Sean, What a wonderful explanation of the process. Yes I can download images from the public domain and push them to a local repository but in some cases I would like to add my own tools like monitoring agents, utilities etc for debugging so i decided to build my own images. I believe https://tarballs.opendev.org is the right place to source software correctly? If I want to add some tools or packages inside images then I should use Dockerfile.j2 to add and compile images. correct? ~S On Fri, Mar 31, 2023 at 7:01?AM Sean Mooney wrote: > On Thu, 2023-03-30 at 16:49 -0400, Satish Patel wrote: > > Folks, > > > > I am playing with kolla image building to understand how it works. I am > > using the following command to build images and wanted to check with you > > folks if that is the correct way to do it. > > > > $ kolla-build -b ubuntu -t source keystone nova neutron glance > > > > Does the above command compile code from source or just download images > > from remote repositories and re-compile them? > > > openstack is mainly python so in general ther is no complie step. > but to answer your question that builds the image using the source tarballs > or the openstakc packages. > > the defaults soruce locations are rendered into a file which you can > override > from the data stored in > https://github.com/openstack/kolla/blob/master/kolla/common/sources.py > the other build config defaults are generated form this code > https://github.com/openstack/kolla/blob/master/kolla/common/config.py > > when you invoke kolla-build its executing > https://github.com/openstack/kolla/blob/master/kolla/cmd/build.py > but the main build workflow is here > https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/kolla/image/build.py#L95 > > the tl;dr is the build worklow starts by creating build director and > locating the docker file templats. > in otherwords the content of the > https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker > directory > > each project has a direcoty in the docker directory and then each > contaienr that project has has a directory in the project directory > > so the aodh project has a aodh folder > https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh > the convention is to have a -base contaienr which handels the > depency installation and then one addtional contaienr for each binary deamon > the project has i.e. aodh-api > > the name of the folder in teh project dir is used as the name of the > contaienr > > if we look at the content of the docker files we will see that they are > not actuly dockerfiles > > https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh/aodh-api/Dockerfile.j2 > > they are jinja2 templates that produce docker files > > kolla as far as i am aware has drop support for binary images and > alternitiv distos > > but looking at an older release we can se ehow this worked > > https://github.com/openstack/kolla/blob/stable/wallaby/docker/nova/nova-base/Dockerfile.j2#L13-L52 > > each docker file template would use the jinja2 to generate a set of > concreate docker files form the template > and make dession based on the parmater passed in. 
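A quick way to see the rendering step just described, without running a full build, is to ask kolla-build to generate the Dockerfiles only. This is a sketch that assumes the --template-only and --work-dir options are available in your kolla release (check kolla-build --help) and uses a throwaway work directory:

  # render the Dockerfile.j2 templates into plain Dockerfiles, without building images
  kolla-build -b ubuntu -t source --template-only --work-dir /tmp/kolla-work keystone
  # inspect what jinja2 produced for each image
  find /tmp/kolla-work -name Dockerfile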
> > so when you are invokeing > kolla-build -b ubuntu -t source keystone nova neutron glance > > what actully happening is that the -t flag is being set as teh > install_type parmater in the the jinja2 environemtn when > the docker file is rendered. > > after all the docer files are rendered into normal docker files kolla just > invokes the build. > > in the case of a source build that inovles pre fetching the source tar > from https://tarballs.opendev.org > and puting it in the build directory so that it can be included into the > contianer. > > kolla also used to supprot git repo as a alternitve source fromat > > i have glossed over a lot of the details of how this actully work but that > is the essence of what that command is doing > creating a build dir, downloading the source, rendering the dockerfile > templates to docker files, invokeing docker build on those > and then taging them with the contaienr nameand build tag > > > https://docs.openstack.org/kolla/latest/admin/image-building.html > covers this form a high level > > > because in command output > > I've not noticed anything related to the compiling process going on. > > > > Here is the output of all images produced by kolla-build command. Do I > need > > anything else or is this enough to deploy kolla? > you can deploy coll with what you have yes although since the kolla files > are automaticaly > built by ci kolla-ansible can just use the ones form the docker hub or > quay instead you do not need to build them yourself > > if you do build them your self then there is basically one other stpe that > you shoudl take if this si a multi node deployment > you should push the iamges to an interally host docker registry although > based on the hostname in the prompt below > it looks like you ahve alredy done that. 
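As a sketch of that last step, using the docker-reg:4000 registry name that appears elsewhere in this thread and the 15.1.0 tag from the listing below (both values are taken from this thread; adjust for your environment):

  # retag an already-built image for the local registry and push it
  docker tag kolla/nova-compute:15.1.0 docker-reg:4000/kolla/nova-compute:15.1.0
  docker push docker-reg:4000/kolla/nova-compute:15.1.0
  # or, if your kolla release supports it, push as part of the build
  kolla-build -b ubuntu -t source --registry docker-reg:4000 --push nova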
> > > > root at docker-reg:~# docker images > > REPOSITORY TAG IMAGE ID CREATED > > SIZE > > kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes > > ago 595MB > > kolla/cron 15.1.0 342877f26a8a 30 minutes > > ago 250MB > > kolla/memcached 15.1.0 0d19a4902644 31 minutes > > ago 250MB > > kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes > > ago 314MB > > kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes > > ago 314MB > > kolla/keepalived 15.1.0 82133b09fbf0 31 minutes > > ago 260MB > > kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes > > ago 262MB > > e66b228c2a07 31 minutes > > ago 248MB > > kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes > > ago 309MB > > kolla/fluentd 15.1.0 adfd19027862 33 minutes > > ago 519MB > > kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes > > ago 255MB > > kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes > > ago 257MB > > kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes > > ago 263MB > > kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes > > ago 248MB > > kolla/glance-api 15.1.0 a2241f68f23a 38 minutes > > ago 1.04GB > > kolla/glance-base 15.1.0 7286772a03a4 About an > > hour ago 1.03GB > > kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an > > hour ago 1.05GB > > kolla/neutron-server 15.1.0 69c844a2e3a9 About an > > hour ago 1.05GB > > kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an > > hour ago 1.05GB > > 486da9a6562e About an > > hour ago 1.05GB > > kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an > > hour ago 1.04GB > > kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an > > hour ago 1.04GB > > kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an > > hour ago 1.04GB > > kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an > > hour ago 1.04GB > > kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an > > hour ago 1.04GB > > kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an > > hour ago 1.04GB > > kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an > > hour ago 1.04GB > > kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an > > hour ago 1.04GB > > kolla/neutron-base 15.1.0 f85b6a2e2725 About an > > hour ago 1.04GB > > kolla/nova-libvirt 15.1.0 0f3ecefe4752 About an > > hour ago 987MB > > kolla/nova-compute 15.1.0 241b7e7fafbe About an > > hour ago 1.47GB > > kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an > > hour ago 1.15GB > > kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an > > hour ago 1.22GB > > kolla/nova-compute-ironic 15.1.0 716612107532 About an > > hour ago 1.12GB > > kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an > > hour ago 1.11GB > > kolla/nova-api 15.1.0 2aef02667ff8 About an > > hour ago 1.11GB > > kolla/nova-conductor 15.1.0 6f1da3400901 About an > > hour ago 1.11GB > > kolla/nova-scheduler 15.1.0 628326776b1d About an > > hour ago 1.11GB > > kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an > > hour ago 1.11GB > > kolla/nova-base 15.1.0 e47420013283 About an > > hour ago 1.11GB > > kolla/keystone 15.1.0 e5530d829d5f 2 hours > ago > > 947MB > > kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours > ago > > 953MB > > kolla/keystone-fernet 15.1.0 8a4fa24853a8 2 hours > ago > > 951MB > > kolla/keystone-base 15.1.0 b6f9562364a9 2 hours > ago > > 945MB > > kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours > ago > > 915MB > > kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours > ago > > 915MB > > kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours > ago > > 893MB > > kolla/base 15.1.0 f68e9ef3dd30 2 hours > ago > > 248MB > > registry 2 8db46f9d7550 
19 hours > ago > > 24.2MB > > ubuntu 22.04 08d22c0ceb15 3 weeks > ago > > 77.8MB > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 31 14:27:50 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 31 Mar 2023 10:27:50 -0400 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: Hi Micha?, This is my sandbox environment so I did "pip install kolla" and started building images. How do I check out specific stable/zed or tag releases to build images? ~S On Fri, Mar 31, 2023 at 9:59?AM Micha? Nasiadka wrote: > Hi Satish, > > Vishal mentioned a bug that I raised in Horizon, but we have been pinning > to earlier setuptools in Kolla builds just because of that (and that?s the > workaround). > Are you using kolla from PyPI or the latest stable/zed checkout from Git? > We recommend the latter. > > Michal > > On 31 Mar 2023, at 15:53, Satish Patel wrote: > > Thank Michal, > > I have posted commands in my original post which have distribution Ubuntu > and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t > source --tag zed horizon ) > > I can definitely open a new bug but it looks like vishal already on it. > Are there any workarounds or interim solutions? I am new to the kolla-image > building process so I'm not sure where I should change the setup tool > version to move on. > > Very curious how the CI-CD pipeline passed this bug? > > > On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka > wrote: > >> Hi Satish, >> >> Have you raised a bug in Launchpad (bugs.launchpad.net/kolla) for this? >> >> You have also not mentioned what distribution and Kolla release are you >> using, so please do that in the bug report. >> Looking at the output probably it?s stable/yoga and Debian - being fixed >> in https://review.opendev.org/c/openstack/kolla/+/873913 >> >> Michal >> >> On 31 Mar 2023, at 05:05, Satish Patel wrote: >> >> Folks, >> >> All other images build successfully but when i am trying to build horizon >> which failed with following error: >> >> $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed >> horizon >> >> >> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) >> INFO:kolla.common.utils.horizon: >> ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) >> INFO:kolla.common.utils.horizon: >> ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Requirement already satisfied: >> XStatic-Font-Awesome>=4.7.0.0 in >> /var/lib/kolla/venv/lib/python3.10/site-packages (from >> vitrage-dashboard==3.6.1.dev2) (4.7.0.0) >> INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) >> INFO:kolla.common.utils.horizon: >> ??????????????????????????????????????? 
51.5/51.5 kB 114.3 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Requirement already satisfied: >> XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages >> (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) >> INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) >> INFO:kolla.common.utils.horizon: >> ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) >> INFO:kolla.common.utils.horizon: >> ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon:Collecting >> XStatic-Moment-Timezone>=0.5.22.0 >> INFO:kolla.common.utils.horizon: Downloading >> XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) >> INFO:kolla.common.utils.horizon: >> ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 >> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started >> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished >> with status 'error' >> INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error >> INFO:kolla.common.utils.horizon: >> INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run >> successfully. >> INFO:kolla.common.utils.horizon: ? exit code: 1 >> INFO:kolla.common.utils.horizon: ??> [6 lines of output] >> INFO:kolla.common.utils.horizon: Traceback (most recent call last): >> INFO:kolla.common.utils.horizon: File "", line 2, in >> >> INFO:kolla.common.utils.horizon: File "", >> line 34, in >> INFO:kolla.common.utils.horizon: File >> "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", >> line 2, in >> INFO:kolla.common.utils.horizon: from xstatic.pkg import >> moment_timezone as xs >> INFO:kolla.common.utils.horizon: ImportError: cannot import name >> 'moment_timezone' from 'xstatic.pkg' (unknown location) >> INFO:kolla.common.utils.horizon: [end of output] >> INFO:kolla.common.utils.horizon: >> INFO:kolla.common.utils.horizon: note: This error originates from a >> subprocess, and is likely not a problem with pip. >> INFO:kolla.common.utils.horizon: >> INFO:kolla.common.utils.horizon:error: metadata-generation-failed >> INFO:kolla.common.utils.horizon:? Encountered error while generating >> package metadata. >> INFO:kolla.common.utils.horizon:??> See above for output. >> INFO:kolla.common.utils.horizon:note: This is an issue with the package >> mentioned above, not pip. >> INFO:kolla.common.utils.horizon:hint: See above for details. 
>> INFO:kolla.common.utils.horizon: >> INFO:kolla.common.utils.horizon:Removing intermediate container >> e6cd437ba529 >> ERROR:kolla.common.utils.horizon:Error'd with the following message >> ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s >> horizon-source/* horizon && sed -i /^horizon=/d >> /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib >> python3 -m pip --no-cache-dir install --upgrade -c >> /requirements/upper-constraints.txt /horizon && mkdir -p >> /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* >> /etc/openstack-dashboard/ && cp >> /horizon/openstack_dashboard/local/local_settings.py.example >> /etc/openstack-dashboard/local_settings && cp /horizon/manage.py >> /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then >> SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir >> install --upgrade -c /requirements/upper-constraints.txt /plugins/*; >> fi && for locale in >> /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do >> (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) >> done && chmod 644 /usr/local/bin/kolla_extend_start' returned a >> non-zero code: 1 >> INFO:kolla.common.utils:========================= >> INFO:kolla.common.utils:Successfully built images >> INFO:kolla.common.utils:========================= >> INFO:kolla.common.utils:base >> INFO:kolla.common.utils:openstack-base >> INFO:kolla.common.utils:=========================== >> INFO:kolla.common.utils:Images that failed to build >> INFO:kolla.common.utils:=========================== >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnasiadka at gmail.com Fri Mar 31 14:33:54 2023 From: mnasiadka at gmail.com (=?utf-8?Q?Micha=C5=82_Nasiadka?=) Date: Fri, 31 Mar 2023 16:33:54 +0200 Subject: [kolla] horizon image build failed In-Reply-To: References: Message-ID: <92A192DF-ABFC-4ED3-A65A-15A2E3869B2B@gmail.com> Hi Satish, git clone https://opendev.org/openstack/kolla -b stable/zed cd kolla pip3 install . I think we should amend the docs a bit to make it easier - thanks for pointing out. Michal > On 31 Mar 2023, at 16:27, Satish Patel wrote: > > Hi Micha?, > > This is my sandbox environment so I did "pip install kolla" and started building images. How do I check out specific stable/zed or tag releases to build images? > > ~S > > On Fri, Mar 31, 2023 at 9:59?AM Micha? Nasiadka > wrote: >> Hi Satish, >> >> Vishal mentioned a bug that I raised in Horizon, but we have been pinning to earlier setuptools in Kolla builds just because of that (and that?s the workaround). >> Are you using kolla from PyPI or the latest stable/zed checkout from Git? We recommend the latter. >> >> Michal >> >>> On 31 Mar 2023, at 15:53, Satish Patel > wrote: >>> >>> Thank Michal, >>> >>> I have posted commands in my original post which have distribution Ubuntu and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon ) >>> >>> I can definitely open a new bug but it looks like vishal already on it. Are there any workarounds or interim solutions? I am new to the kolla-image building process so I'm not sure where I should change the setup tool version to move on. >>> >>> Very curious how the CI-CD pipeline passed this bug? >>> >>> >>> On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka > wrote: >>>> Hi Satish, >>>> >>>> Have you raised a bug in Launchpad (bugs.launchpad.net/kolla ) for this? 
>>>> >>>> You have also not mentioned what distribution and Kolla release are you using, so please do that in the bug report. >>>> Looking at the output probably it?s stable/yoga and Debian - being fixed in https://review.opendev.org/c/openstack/kolla/+/873913 >>>> >>>> Michal >>>> >>>>> On 31 Mar 2023, at 05:05, Satish Patel > wrote: >>>>> >>>>> Folks, >>>>> >>>>> All other images build successfully but when i am trying to build horizon which failed with following error: >>>>> >>>>> $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed horizon >>>>> >>>>> >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) >>>>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) >>>>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-Font-Awesome>=4.7.0.0 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (4.7.0.0) >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) >>>>> INFO:kolla.common.utils.horizon: ??????????????????????????????????????? 51.5/51.5 kB 114.3 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Requirement already satisfied: XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) >>>>> INFO:kolla.common.utils.horizon: ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) >>>>> INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon:Collecting XStatic-Moment-Timezone>=0.5.22.0 >>>>> INFO:kolla.common.utils.horizon: Downloading XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) >>>>> INFO:kolla.common.utils.horizon: ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 >>>>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started >>>>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): finished with status 'error' >>>>> INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error >>>>> INFO:kolla.common.utils.horizon: >>>>> INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run successfully. >>>>> INFO:kolla.common.utils.horizon: ? 
exit code: 1 >>>>> INFO:kolla.common.utils.horizon: ??> [6 lines of output] >>>>> INFO:kolla.common.utils.horizon: Traceback (most recent call last): >>>>> INFO:kolla.common.utils.horizon: File "", line 2, in >>>>> INFO:kolla.common.utils.horizon: File "", line 34, in >>>>> INFO:kolla.common.utils.horizon: File "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", line 2, in >>>>> INFO:kolla.common.utils.horizon: from xstatic.pkg import moment_timezone as xs >>>>> INFO:kolla.common.utils.horizon: ImportError: cannot import name 'moment_timezone' from 'xstatic.pkg' (unknown location) >>>>> INFO:kolla.common.utils.horizon: [end of output] >>>>> INFO:kolla.common.utils.horizon: >>>>> INFO:kolla.common.utils.horizon: note: This error originates from a subprocess, and is likely not a problem with pip. >>>>> INFO:kolla.common.utils.horizon: >>>>> INFO:kolla.common.utils.horizon:error: metadata-generation-failed >>>>> INFO:kolla.common.utils.horizon:? Encountered error while generating package metadata. >>>>> INFO:kolla.common.utils.horizon:??> See above for output. >>>>> INFO:kolla.common.utils.horizon:note: This is an issue with the package mentioned above, not pip. >>>>> INFO:kolla.common.utils.horizon:hint: See above for details. >>>>> INFO:kolla.common.utils.horizon: >>>>> INFO:kolla.common.utils.horizon:Removing intermediate container e6cd437ba529 >>>>> ERROR:kolla.common.utils.horizon:Error'd with the following message >>>>> ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s horizon-source/* horizon && sed -i /^horizon=/d /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /horizon && mkdir -p /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* /etc/openstack-dashboard/ && cp /horizon/openstack_dashboard/local/local_settings.py.example /etc/openstack-dashboard/local_settings && cp /horizon/manage.py /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir install --upgrade -c /requirements/upper-constraints.txt /plugins/*; fi && for locale in /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) done && chmod 644 /usr/local/bin/kolla_extend_start' returned a non-zero code: 1 >>>>> INFO:kolla.common.utils:========================= >>>>> INFO:kolla.common.utils:Successfully built images >>>>> INFO:kolla.common.utils:========================= >>>>> INFO:kolla.common.utils:base >>>>> INFO:kolla.common.utils:openstack-base >>>>> INFO:kolla.common.utils:=========================== >>>>> INFO:kolla.common.utils:Images that failed to build >>>>> INFO:kolla.common.utils:=========================== >>>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From satish.txt at gmail.com Fri Mar 31 14:45:20 2023 From: satish.txt at gmail.com (Satish Patel) Date: Fri, 31 Mar 2023 10:45:20 -0400 Subject: [kolla] horizon image build failed In-Reply-To: <92A192DF-ABFC-4ED3-A65A-15A2E3869B2B@gmail.com> References: <92A192DF-ABFC-4ED3-A65A-15A2E3869B2B@gmail.com> Message-ID: Awesome, thanks! This is easy, and +++1 to amend that in docs :) On Fri, Mar 31, 2023 at 10:34?AM Micha? Nasiadka wrote: > Hi Satish, > > git clone https://opendev.org/openstack/kolla -b stable/zed > cd kolla > pip3 install . 
> > I think we should amend the docs a bit to make it easier - thanks for > pointing out. > > Michal > > On 31 Mar 2023, at 16:27, Satish Patel wrote: > > Hi Micha?, > > This is my sandbox environment so I did "pip install kolla" and started > building images. How do I check out specific stable/zed or tag releases to > build images? > > ~S > > On Fri, Mar 31, 2023 at 9:59?AM Micha? Nasiadka > wrote: > >> Hi Satish, >> >> Vishal mentioned a bug that I raised in Horizon, but we have been pinning >> to earlier setuptools in Kolla builds just because of that (and that?s the >> workaround). >> Are you using kolla from PyPI or the latest stable/zed checkout from Git? >> We recommend the latter. >> >> Michal >> >> On 31 Mar 2023, at 15:53, Satish Patel wrote: >> >> Thank Michal, >> >> I have posted commands in my original post which have distribution Ubuntu >> and release zed. ( $ kolla-build --registry docker-reg:4000 -b ubuntu -t >> source --tag zed horizon ) >> >> I can definitely open a new bug but it looks like vishal already on it. >> Are there any workarounds or interim solutions? I am new to the kolla-image >> building process so I'm not sure where I should change the setup tool >> version to move on. >> >> Very curious how the CI-CD pipeline passed this bug? >> >> >> On Fri, Mar 31, 2023 at 1:51?AM Micha? Nasiadka >> wrote: >> >>> Hi Satish, >>> >>> Have you raised a bug in Launchpad (bugs.launchpad.net/kolla) for this? >>> >>> You have also not mentioned what distribution and Kolla release are you >>> using, so please do that in the bug report. >>> Looking at the output probably it?s stable/yoga and Debian - being fixed >>> in https://review.opendev.org/c/openstack/kolla/+/873913 >>> >>> Michal >>> >>> On 31 Mar 2023, at 05:05, Satish Patel wrote: >>> >>> Folks, >>> >>> All other images build successfully but when i am trying to build >>> horizon which failed with following error: >>> >>> $ kolla-build --registry docker-reg:4000 -b ubuntu -t source --tag zed >>> horizon >>> >>> >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre>=0.6.4.0 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_Dagre-0.6.4.1-py2.py3-none-any.whl (140 kB) >>> INFO:kolla.common.utils.horizon: >>> ?????????????????????????????????????? 140.0/140.0 kB 14.4 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Dagre-D3>=0.4.17.0 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_Dagre_D3-0.4.17.0-py2.py3-none-any.whl (357 kB) >>> INFO:kolla.common.utils.horizon: >>> ?????????????????????????????????????? 357.4/357.4 kB 13.5 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Requirement already satisfied: >>> XStatic-Font-Awesome>=4.7.0.0 in >>> /var/lib/kolla/venv/lib/python3.10/site-packages (from >>> vitrage-dashboard==3.6.1.dev2) (4.7.0.0) >>> INFO:kolla.common.utils.horizon:Collecting XStatic-Graphlib>=2.1.7.0 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_Graphlib-2.1.7.0-py2.py3-none-any.whl (51 kB) >>> INFO:kolla.common.utils.horizon: >>> ??????????????????????????????????????? 
51.5/51.5 kB 114.3 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Requirement already satisfied: >>> XStatic-jQuery>=1.8.2.1 in /var/lib/kolla/venv/lib/python3.10/site-packages >>> (from vitrage-dashboard==3.6.1.dev2) (1.12.4.1) >>> INFO:kolla.common.utils.horizon:Collecting XStatic-lodash>=4.16.4.1 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_lodash-4.16.4.2-py3-none-any.whl (167 kB) >>> INFO:kolla.common.utils.horizon: >>> ?????????????????????????????????????? 167.9/167.9 kB 12.4 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting XStatic-moment>=2.8.4.1 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic_moment-2.8.4.3-py3-none-any.whl (58 kB) >>> INFO:kolla.common.utils.horizon: >>> ???????????????????????????????????????? 58.0/58.0 kB 66.7 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon:Collecting >>> XStatic-Moment-Timezone>=0.5.22.0 >>> INFO:kolla.common.utils.horizon: Downloading >>> XStatic-Moment-Timezone-0.5.22.0.tar.gz (99 kB) >>> INFO:kolla.common.utils.horizon: >>> ???????????????????????????????????????? 99.7/99.7 kB 45.1 MB/s eta 0:00:00 >>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): started >>> INFO:kolla.common.utils.horizon: Preparing metadata (setup.py): >>> finished with status 'error' >>> INFO:kolla.common.utils.horizon: error: subprocess-exited-with-error >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon: ? python setup.py egg_info did not run >>> successfully. >>> INFO:kolla.common.utils.horizon: ? exit code: 1 >>> INFO:kolla.common.utils.horizon: ??> [6 lines of output] >>> INFO:kolla.common.utils.horizon: Traceback (most recent call last): >>> INFO:kolla.common.utils.horizon: File "", line 2, in >>> >>> INFO:kolla.common.utils.horizon: File "", >>> line 34, in >>> INFO:kolla.common.utils.horizon: File >>> "/tmp/pip-install-dqag1zef/xstatic-moment-timezone_60eeadc1dfb9492781fe3ca90e3b95c2/setup.py", >>> line 2, in >>> INFO:kolla.common.utils.horizon: from xstatic.pkg import >>> moment_timezone as xs >>> INFO:kolla.common.utils.horizon: ImportError: cannot import name >>> 'moment_timezone' from 'xstatic.pkg' (unknown location) >>> INFO:kolla.common.utils.horizon: [end of output] >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon: note: This error originates from a >>> subprocess, and is likely not a problem with pip. >>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon:error: metadata-generation-failed >>> INFO:kolla.common.utils.horizon:? Encountered error while generating >>> package metadata. >>> INFO:kolla.common.utils.horizon:??> See above for output. >>> INFO:kolla.common.utils.horizon:note: This is an issue with the package >>> mentioned above, not pip. >>> INFO:kolla.common.utils.horizon:hint: See above for details. 
>>> INFO:kolla.common.utils.horizon: >>> INFO:kolla.common.utils.horizon:Removing intermediate container >>> e6cd437ba529 >>> ERROR:kolla.common.utils.horizon:Error'd with the following message >>> ERROR:kolla.common.utils.horizon:The command '/bin/sh -c ln -s >>> horizon-source/* horizon && sed -i /^horizon=/d >>> /requirements/upper-constraints.txt && SETUPTOOLS_USE_DISTUTILS=stdlib >>> python3 -m pip --no-cache-dir install --upgrade -c >>> /requirements/upper-constraints.txt /horizon && mkdir -p >>> /etc/openstack-dashboard && cp -r /horizon/openstack_dashboard/conf/* >>> /etc/openstack-dashboard/ && cp >>> /horizon/openstack_dashboard/local/local_settings.py.example >>> /etc/openstack-dashboard/local_settings && cp /horizon/manage.py >>> /var/lib/kolla/venv/bin/manage.py && if [ "$(ls /plugins)" ]; then >>> SETUPTOOLS_USE_DISTUTILS=stdlib python3 -m pip --no-cache-dir >>> install --upgrade -c /requirements/upper-constraints.txt /plugins/*; >>> fi && for locale in >>> /var/lib/kolla/venv/lib/python3.10/site-packages/*/locale; do >>> (cd ${locale%/*} && /var/lib/kolla/venv/bin/django-admin compilemessages) >>> done && chmod 644 /usr/local/bin/kolla_extend_start' returned a >>> non-zero code: 1 >>> INFO:kolla.common.utils:========================= >>> INFO:kolla.common.utils:Successfully built images >>> INFO:kolla.common.utils:========================= >>> INFO:kolla.common.utils:base >>> INFO:kolla.common.utils:openstack-base >>> INFO:kolla.common.utils:=========================== >>> INFO:kolla.common.utils:Images that failed to build >>> INFO:kolla.common.utils:=========================== >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.rosser at rd.bbc.co.uk Fri Mar 31 14:49:13 2023 From: jonathan.rosser at rd.bbc.co.uk (Jonathan Rosser) Date: Fri, 31 Mar 2023 15:49:13 +0100 Subject: [ironic] ARM Support in CI: Call for vendors / contributors / interested parties In-Reply-To: References: Message-ID: <6faf5514-2ac8-9e8b-c543-0f8125b4001b@rd.bbc.co.uk> I have Ironic working with Supermicro MegaDC / Ampere CPU in a R12SPD-A system board using the ipmi driver. Jon. On 29/03/2023 19:39, Jay Faulkner wrote: > Hi stackers, > > Ironic has published an experimental Ironic Python Agent image for > ARM64 > (https://tarballs.opendev.org/openstack/ironic-python-agent-builder/dib/files/) > and discussed promoting this image to supported via CI testing. > However, we have a problem: there are no Ironic developers with easy > access to ARM hardware at the moment, and no Ironic developers with > free time to commit to improving our support of ARM hardware. > > So we're putting out a call for help: > - If you're a hardware vendor and want your ARM hardware supported? > Please come talk to the Ironic community about setting up third-party-CI. > - Are you an operator or contributor from a company invested in ARM > bare metal? Please come join the Ironic community to help us build > this support. > > Thanks, > Jay Faulkner > Ironic PTL > > From elod.illes at est.tech Fri Mar 31 15:08:26 2023 From: elod.illes at est.tech (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Fri, 31 Mar 2023 15:08:26 +0000 Subject: [release] Release countdown for week R-26, Apr 03 - 07 Message-ID: Hi, Welcome back to the release countdown emails! These will be sent at major points in the 2023.2 Bobcat development cycle, which should conclude with a final release on October 4th, 2023. 
Development Focus ----------------- At this stage in the release cycle, focus should be on planning the 2023.2 Bobcat development cycle and approving 2023.2 Bobcat specs. General Information ------------------- 2023.2 Bobcat is a 28 weeks long development cycle. In case you haven't seen it yet, please take a look over the schedule for this release: https://releases.openstack.org/bobcat/schedule.html By default, the team PTL is responsible for handling the release cycle and approving release requests. This task can (and probably should) be delegated to release liaisons. Now is a good time to review release liaison information for your team and make sure it is up to date: https://opendev.org/openstack/releases/src/branch/master/data/release_liaisons.yaml By default, all your team deliverables from the 2023.1 Antelope release are continued in 2023.2 Bobcat with a similar release model. Upcoming Deadlines & Dates -------------------------- Bobcat-1 milestone: May 11th, 2023 OpenInfra Summit + PTG Vancouver 2023 - June 13-15, 2023 El?d Ill?s irc: elodilles @ #openstack-release -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlandy at redhat.com Fri Mar 31 15:59:27 2023 From: rlandy at redhat.com (Ronelle Landy) Date: Fri, 31 Mar 2023 11:59:27 -0400 Subject: [tripleo] Removal of TripleO Master Integration and Component Lines Message-ID: Hello All, Removal of TripleO Master Integration and Component Lines Per the decision to not maintain TripleO after the Zed release [1], the master/main integration and component lines are being removed in the following patches: https://review.rdoproject.org/r/c/config/+/48073 https://review.rdoproject.org/r/c/config/+/48074 https://review.rdoproject.org/r/c/rdo-jobs/+/48075 The last promoted release of master through TripleO is: https://trunk.rdoproject.org/centos9-master/current-tripleo/delorean.repo (hash: ddce25bad764dde7e0515094b4d40471), which was promoted on 03/28/2023. Check/gate testing for the master branch is in process of being removed as well. [1] https://review.opendev.org/c/openstack/governance/+/878799 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay at gr-oss.io Fri Mar 31 16:01:24 2023 From: jay at gr-oss.io (Jay Faulkner) Date: Fri, 31 Mar 2023 09:01:24 -0700 Subject: [ironic] ARM Support in CI: Call for vendors / contributors / interested parties In-Reply-To: <6faf5514-2ac8-9e8b-c543-0f8125b4001b@rd.bbc.co.uk> References: <6faf5514-2ac8-9e8b-c543-0f8125b4001b@rd.bbc.co.uk> Message-ID: Thanks for responding, Jonathan! Did you have to make any downstream changes to Ironic for this to work? Are you using our published ARM64 image or using their own? Thanks, Jay Faulkner Ironic PTL On Fri, Mar 31, 2023 at 7:56?AM Jonathan Rosser < jonathan.rosser at rd.bbc.co.uk> wrote: > I have Ironic working with Supermicro MegaDC / Ampere CPU in a R12SPD-A > system board using the ipmi driver. > > Jon. > > On 29/03/2023 19:39, Jay Faulkner wrote: > > Hi stackers, > > > > Ironic has published an experimental Ironic Python Agent image for > > ARM64 > > ( > https://tarballs.opendev.org/openstack/ironic-python-agent-builder/dib/files/) > > > and discussed promoting this image to supported via CI testing. > > However, we have a problem: there are no Ironic developers with easy > > access to ARM hardware at the moment, and no Ironic developers with > > free time to commit to improving our support of ARM hardware. 
> > > > So we're putting out a call for help: > > - If you're a hardware vendor and want your ARM hardware supported? > > Please come talk to the Ironic community about setting up third-party-CI. > > - Are you an operator or contributor from a company invested in ARM > > bare metal? Please come join the Ironic community to help us build > > this support. > > > > Thanks, > > Jay Faulkner > > Ironic PTL > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knikolla at bu.edu Fri Mar 31 17:27:34 2023 From: knikolla at bu.edu (Nikolla, Kristi) Date: Fri, 31 Mar 2023 17:27:34 +0000 Subject: [tc] No TC weekly meeting next week Message-ID: Hi all, There will be no TC meeting on Tuesday, April 4th. The next TC meeting will be held on Tuesday, April 11 at 18.00 UTC. More information and an ICS file can be found here https://meetings.opendev.org/#Technical_Committee_Meeting Thank you, Kristi Nikolla -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Fri Mar 31 17:31:01 2023 From: smooney at redhat.com (Sean Mooney) Date: Fri, 31 Mar 2023 18:31:01 +0100 Subject: [kolla] Image building question In-Reply-To: References: Message-ID: <93a78d3cdbc7fb8ca66545a981ca145c6ce21d7a.camel@redhat.com> On Fri, 2023-03-31 at 10:25 -0400, Satish Patel wrote: > Thank you Sean, > > What a wonderful explanation of the process. Yes I can download images from > the public domain and push them to a local repository but in some cases I > would like to add my own tools like monitoring agents, utilities etc > for debugging so i decided to build my own images. > > I believe https://tarballs.opendev.org is the right place to source > software correctly? that is the offcial location where all opendev/openstack projects are released and its the location distros use to build there packages. > > If I want to add some tools or packages inside images then I should use > Dockerfile.j2 to add and compile images. correct? yes so one of the great things about kolla images is tiem was taken to write down the image api when the project was first started https://docs.openstack.org/kolla/yoga/admin/kolla_api.html over time the common usecaes were then docuemnted in the admin image-building guide https://docs.openstack.org/kolla/yoga/admin/image-building.html#dockerfile-customisation all of the templated imnages have convension/contract that they provide delement that operator can use to add customisations. 
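To make that concrete for the "add my own monitoring agents and debug utilities" case, a minimal template override might look like the sketch below. The file name and the apt packages are illustrative assumptions, and the nova_base_footer block is the one discussed next (if a given image switches to a service user before its footer, a USER root line may be needed first):

  {# template-overrides.j2 -- file name and package list are illustrative #}
  {% extends parent_template %}

  {% block nova_base_footer %}
  RUN apt-get update \
      && apt-get install -y --no-install-recommends tcpdump strace \
      && apt-get clean && rm -rf /var/lib/apt/lists/*
  {% endblock %}

The override file is then passed to the build, e.g.:

  kolla-build -b ubuntu -t source --template-override template-overrides.j2 nova

The same pattern works for any of the *_footer blocks, and the *_packages_append variables mentioned below cover the simpler case of only adding distro packages.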
For example, the nova_base_footer block can be used to add additional content to the nova-base image https://github.com/openstack/kolla/blob/master/docker/nova/nova-base/Dockerfile.j2#L82

To customise the images you provide what is known as a template override file. The contrib folder has a number of examples https://github.com/openstack/kolla/blob/master/contrib/template-override/ovs-dpdk.j2 https://github.com/openstack/kolla/blob/master/contrib/neutron-plugins/template_override-networking-mlnx.j2 https://github.com/openstack/kolla/blob/master/contrib/neutron-plugins/template_override-vmware-nsx.j2

The way they work is that you start with {% extends parent_template %} and then create a block that matches the name of the one you want to replace:

{% extends parent_template %}

{% block nova_base_footer %}
RUN /bin/true
{% endblock %}

Whatever content you put in the block will be injected directly into the rendered docker file. https://docs.openstack.org/kolla/yoga/admin/image-building.html#plugin-functionality shows how to use that for neutron. In addition to replacing blocks you can set the content of special variables like horizon_packages_append or horizon_packages_remove https://docs.openstack.org/kolla/yoga/admin/image-building.html#packages-customisation which allow you to add and remove packages in a simple way. There is also a set of macros that you can use; you include them with {% import "macros.j2" as macros with context %} and they are defined here https://github.com/openstack/kolla/blob/master/docker/macros.j2

The capabilities are covered pretty well in the kolla docs if you know that the docs exist; you just need to know where to look. Hopefully that helps.

The ovs-dpdk template override docs can be found here https://docs.openstack.org/kolla/yoga/admin/template-override/ovs-dpdk.html It is a little different from the others since there I used the template override mechanism to allow compiling ovs with dpdk from source. Kolla normally does not support that, but it serves as a demonstration of how operators can do that if they really need to, i.e. compile a replacement for a binary component like mariadb. It also shows how to use git as the source location instead of tarballs if that is your preference.
> > ~S > > On Fri, Mar 31, 2023 at 7:01?AM Sean Mooney wrote: > > > On Thu, 2023-03-30 at 16:49 -0400, Satish Patel wrote: > > > Folks, > > > > > > I am playing with kolla image building to understand how it works. I am > > > using the following command to build images and wanted to check with you > > > folks if that is the correct way to do it. > > > > > > $ kolla-build -b ubuntu -t source keystone nova neutron glance > > > > > > Does the above command compile code from source or just download images > > > from remote repositories and re-compile them? > > > > > openstack is mainly python so in general ther is no complie step. > > but to answer your question that builds the image using the source tarballs > > or the openstakc packages. 
> > > > the defaults soruce locations are rendered into a file which you can > > override > > from the data stored in > > https://github.com/openstack/kolla/blob/master/kolla/common/sources.py > > the other build config defaults are generated form this code > > https://github.com/openstack/kolla/blob/master/kolla/common/config.py > > > > when you invoke kolla-build its executing > > https://github.com/openstack/kolla/blob/master/kolla/cmd/build.py > > but the main build workflow is here > > https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/kolla/image/build.py#L95 > > > > the tl;dr is the build worklow starts by creating build director and > > locating the docker file templats. > > in otherwords the content of the > > https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker > > directory > > > > each project has a direcoty in the docker directory and then each > > contaienr that project has has a directory in the project directory > > > > so the aodh project has a aodh folder > > https://github.com/openstack/kolla/tree/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh > > the convention is to have a -base contaienr which handels the > > depency installation and then one addtional contaienr for each binary deamon > > the project has i.e. aodh-api > > > > the name of the folder in teh project dir is used as the name of the > > contaienr > > > > if we look at the content of the docker files we will see that they are > > not actuly dockerfiles > > > > https://github.com/openstack/kolla/blob/be15d6212f278027c257f9dd67e5b2719e9f730a/docker/aodh/aodh-api/Dockerfile.j2 > > > > they are jinja2 templates that produce docker files > > > > kolla as far as i am aware has drop support for binary images and > > alternitiv distos > > > > but looking at an older release we can se ehow this worked > > > > https://github.com/openstack/kolla/blob/stable/wallaby/docker/nova/nova-base/Dockerfile.j2#L13-L52 > > > > each docker file template would use the jinja2 to generate a set of > > concreate docker files form the template > > and make dession based on the parmater passed in. > > > > so when you are invokeing > > kolla-build -b ubuntu -t source keystone nova neutron glance > > > > what actully happening is that the -t flag is being set as teh > > install_type parmater in the the jinja2 environemtn when > > the docker file is rendered. > > > > after all the docer files are rendered into normal docker files kolla just > > invokes the build. > > > > in the case of a source build that inovles pre fetching the source tar > > from https://tarballs.opendev.org > > and puting it in the build directory so that it can be included into the > > contianer. > > > > kolla also used to supprot git repo as a alternitve source fromat > > > > i have glossed over a lot of the details of how this actully work but that > > is the essence of what that command is doing > > creating a build dir, downloading the source, rendering the dockerfile > > templates to docker files, invokeing docker build on those > > and then taging them with the contaienr nameand build tag > > > > > > https://docs.openstack.org/kolla/latest/admin/image-building.html > > covers this form a high level > > > > > because in command output > > > I've not noticed anything related to the compiling process going on. > > > > > > Here is the output of all images produced by kolla-build command. Do I > > need > > > anything else or is this enough to deploy kolla? 
> > you can deploy coll with what you have yes although since the kolla files > > are automaticaly > > built by ci kolla-ansible can just use the ones form the docker hub or > > quay instead you do not need to build them yourself > > > > if you do build them your self then there is basically one other stpe that > > you shoudl take if this si a multi node deployment > > you should push the iamges to an interally host docker registry although > > based on the hostname in the prompt below > > it looks like you ahve alredy done that. > > > > > > root at docker-reg:~# docker images > > > REPOSITORY TAG IMAGE ID CREATED > > > SIZE > > > kolla/mariadb-server 15.1.0 2a497eee8269 26 minutes > > > ago 595MB > > > kolla/cron 15.1.0 342877f26a8a 30 minutes > > > ago 250MB > > > kolla/memcached 15.1.0 0d19a4902644 31 minutes > > > ago 250MB > > > kolla/mariadb-clustercheck 15.1.0 d84427d3c639 31 minutes > > > ago 314MB > > > kolla/mariadb-base 15.1.0 34447e3e59b6 31 minutes > > > ago 314MB > > > kolla/keepalived 15.1.0 82133b09fbf0 31 minutes > > > ago 260MB > > > kolla/prometheus-memcached-exporter 15.1.0 6c2d605f70ee 31 minutes > > > ago 262MB > > > e66b228c2a07 31 minutes > > > ago 248MB > > > kolla/rabbitmq 15.1.0 8de5c39379d3 32 minutes > > > ago 309MB > > > kolla/fluentd 15.1.0 adfd19027862 33 minutes > > > ago 519MB > > > kolla/haproxy-ssh 15.1.0 514357ac4d36 36 minutes > > > ago 255MB > > > kolla/haproxy 15.1.0 e5b9cfdf6dfc 37 minutes > > > ago 257MB > > > kolla/prometheus-haproxy-exporter 15.1.0 a679f65fd735 37 minutes > > > ago 263MB > > > kolla/prometheus-base 15.1.0 afeff3ed5dce 37 minutes > > > ago 248MB > > > kolla/glance-api 15.1.0 a2241f68f23a 38 minutes > > > ago 1.04GB > > > kolla/glance-base 15.1.0 7286772a03a4 About an > > > hour ago 1.03GB > > > kolla/neutron-infoblox-ipam-agent 15.1.0 f90ffc1a3326 About an > > > hour ago 1.05GB > > > kolla/neutron-server 15.1.0 69c844a2e3a9 About an > > > hour ago 1.05GB > > > kolla/neutron-l3-agent 15.1.0 4d87e6963c96 About an > > > hour ago 1.05GB > > > 486da9a6562e About an > > > hour ago 1.05GB > > > kolla/neutron-linuxbridge-agent 15.1.0 e5b3ca7e099c About an > > > hour ago 1.04GB > > > kolla/neutron-bgp-dragent 15.1.0 ac37377820c6 About an > > > hour ago 1.04GB > > > kolla/ironic-neutron-agent 15.1.0 90993adcd74b About an > > > hour ago 1.04GB > > > kolla/neutron-metadata-agent 15.1.0 8522f147f88d About an > > > hour ago 1.04GB > > > kolla/neutron-sriov-agent 15.1.0 8a92ce7d13c0 About an > > > hour ago 1.04GB > > > kolla/neutron-dhcp-agent 15.1.0 5c214b0171f5 About an > > > hour ago 1.04GB > > > kolla/neutron-metering-agent 15.1.0 7b3b91ecd77b About an > > > hour ago 1.04GB > > > kolla/neutron-openvswitch-agent 15.1.0 1f8807308814 About an > > > hour ago 1.04GB > > > kolla/neutron-base 15.1.0 f85b6a2e2725 About an > > > hour ago 1.04GB > > > kolla/nova-libvirt 15.1.0 0f3ecefe4752 About an > > > hour ago 987MB > > > kolla/nova-compute 15.1.0 241b7e7fafbe About an > > > hour ago 1.47GB > > > kolla/nova-spicehtml5proxy 15.1.0 b740820a7ad1 About an > > > hour ago 1.15GB > > > kolla/nova-novncproxy 15.1.0 1ba2f443d5c3 About an > > > hour ago 1.22GB > > > kolla/nova-compute-ironic 15.1.0 716612107532 About an > > > hour ago 1.12GB > > > kolla/nova-ssh 15.1.0 ae2397f4e1c1 About an > > > hour ago 1.11GB > > > kolla/nova-api 15.1.0 2aef02667ff8 About an > > > hour ago 1.11GB > > > kolla/nova-conductor 15.1.0 6f1da3400901 About an > > > hour ago 1.11GB > > > kolla/nova-scheduler 15.1.0 628326776b1d About an > > > hour ago 1.11GB > > > 
kolla/nova-serialproxy 15.1.0 28eb7a4a13f8 About an > > > hour ago 1.11GB > > > kolla/nova-base 15.1.0 e47420013283 About an > > > hour ago 1.11GB > > > kolla/keystone 15.1.0 e5530d829d5f 2 hours > > ago > > > 947MB > > > kolla/keystone-ssh 15.1.0 eaa7e3f3985a 2 hours > > ago > > > 953MB > > > kolla/keystone-fernet 15.1.0 8a4fa24853a8 2 hours > > ago > > > 951MB > > > kolla/keystone-base 15.1.0 b6f9562364a9 2 hours > > ago > > > 945MB > > > kolla/barbican-base 15.1.0 b2fdef1afb44 2 hours > > ago > > > 915MB > > > kolla/barbican-keystone-listener 15.1.0 58bd59de2c63 2 hours > > ago > > > 915MB > > > kolla/openstack-base 15.1.0 c805b4b3b1c1 2 hours > > ago > > > 893MB > > > kolla/base 15.1.0 f68e9ef3dd30 2 hours > > ago > > > 248MB > > > registry 2 8db46f9d7550 19 hours > > ago > > > 24.2MB > > > ubuntu 22.04 08d22c0ceb15 3 weeks > > ago > > > 77.8MB > > > > From fungi at yuggoth.org Fri Mar 31 18:11:31 2023 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 31 Mar 2023 18:11:31 +0000 Subject: [tripleo] Removal of TripleO Master Integration and Component Lines In-Reply-To: References: Message-ID: <20230331181131.jbdeq3nrylsrag4s@yuggoth.org> On 2023-03-31 11:59:27 -0400 (-0400), Ronelle Landy wrote: [...] > Check/gate testing for the master branch is in process of being removed as > well. > > [1] https://review.opendev.org/c/openstack/governance/+/878799 I notice that the tripleo-ci repository only has a master branch. Will its contents going away cause problems with upstream testing for stable branches of other TripleO deliverables? Are there other single-branch TripleO deliverable repositories for which removal of content would impact stable branches? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: