DCN compute service goes down when an instance is scheduled to launch | wallaby | tripleo

Swogat Pradhan swogatpradhan22 at gmail.com
Sat Mar 4 18:19:36 UTC 2023


Hi,
Can someone please help me out with this issue?

With regards,
Swogat Pradhan

On Thu, Mar 2, 2023 at 1:24 PM Swogat Pradhan <swogatpradhan22 at gmail.com>
wrote:

> Hi,
> I don't see any major packet loss.
> The problem seems to be somewhere in RabbitMQ, but not due to packet loss.
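> As a basic sanity check, AMQP connectivity from the edge compute node to
> the central RabbitMQ endpoint can be verified with something like the
> following (the controller address is a placeholder):
>     nc -zv <controller-internalapi-ip> 5672
>     ss -tnp | grep 5672    # look for established connections from nova-compute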
>
> With regards,
> Swogat Pradhan
>
> On Wed, Mar 1, 2023 at 3:34 PM Swogat Pradhan <swogatpradhan22 at gmail.com>
> wrote:
>
>> Hi,
>> Yes, the MTU is the same as the default 1500.
>> Generally I haven't seen any packet loss, but I have never checked while
>> launching an instance.
>> I will check that and come back.
>> But every time I launch an instance, it gets stuck in the spawning state
>> and the hypervisor goes down, so I am not sure whether packet loss is the
>> cause.
>>
>> With regards,
>> Swogat Pradhan
>>
>> On Wed, Mar 1, 2023 at 3:30 PM Eugen Block <eblock at nde.ag> wrote:
>>
>>> One more thing that comes to mind is the MTU size. Is it identical between
>>> the central and edge sites? Do you see packet loss through the tunnel?
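>>> For example (assuming plain Linux tooling on the nodes; the address below
>>> is a placeholder), a quick check could look like:
>>>     ip link show | grep mtu                                  # confirm interface MTU on both sides
>>>     ping -M do -s 1472 -c 5 <remote-internalapi-ip>          # 1472 + 28 bytes of headers = 1500
>>>     ping -c 100 -i 0.2 <remote-internalapi-ip> | tail -n 2   # packet loss summary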
>>>
>>> Quoting Swogat Pradhan <swogatpradhan22 at gmail.com>:
>>>
>>> > Hi Eugen,
>>> > Please add my email address to 'To' or 'Cc', as I am not receiving your
>>> > emails directly.
>>> > Coming to the issue:
>>> >
>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p /
>>> > Listing policies for vhost "/" ...
>>> > vhost  name    pattern       apply-to  definition                                                              priority
>>> > /      ha-all  ^(?!amq\.).*  queues    {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"}  0
>>> >
>>> > The edge site compute nodes are up; a node only goes down when I try to
>>> > launch an instance, and the instance reaches the spawning state and then
>>> > gets stuck.
>>> >
>>> > I have a tunnel set up between the central and the edge sites.
>>> >
>>> > With regards,
>>> > Swogat Pradhan
>>> >
>>> > On Tue, Feb 28, 2023 at 9:11 PM Swogat Pradhan <
>>> swogatpradhan22 at gmail.com>
>>> > wrote:
>>> >
>>> >> Hi Eugen,
>>> >> For some reason I am not receiving your emails directly; I am checking
>>> >> the email digest and am able to find your reply there.
>>> >> Here is the log for download: https://we.tl/t-L8FEkGZFSq
>>> >> Yes, these logs are from the time when the issue occurred.
>>> >>
>>> >> *Note: I am able to create VMs and perform other activities at the
>>> >> central site; I am only facing this issue at the edge site.*
>>> >>
>>> >> With regards,
>>> >> Swogat Pradhan
>>> >>
>>> >> On Mon, Feb 27, 2023 at 5:12 PM Swogat Pradhan <
>>> swogatpradhan22 at gmail.com>
>>> >> wrote:
>>> >>
>>> >>> Hi Eugen,
>>> >>> Thanks for your response.
>>> >>> I actually have a 4-controller setup, so here are the details:
>>> >>>
>>> >>> *PCS Status:*
>>> >>>   * Container bundle set: rabbitmq-bundle [172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]:
>>> >>>     * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-no-ceph-3
>>> >>>     * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-2
>>> >>>     * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-1
>>> >>>     * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-0
>>> >>>
>>> >>> I have tried restarting the bundle multiple times but the issue is
>>> >>> still present.
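>>> >>> The restarts were done with the usual pacemaker command, i.e. something
>>> >>> along the lines of:
>>> >>>     pcs resource restart rabbitmq-bundle
>>> >>> run from one of the controllers.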
>>> >>>
>>> >>> *Cluster status:*
>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl cluster_status
>>> >>> Cluster status of node
>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ...
>>> >>> Basics
>>> >>>
>>> >>> Cluster name: rabbit at overcloud-controller-no-ceph-3.bdxworld.com
>>> >>>
>>> >>> Disk Nodes
>>> >>>
>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com
>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com
>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com
>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
>>> >>>
>>> >>> Running Nodes
>>> >>>
>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com
>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com
>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com
>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
>>> >>>
>>> >>> Versions
>>> >>>
>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 on Erlang 22.3.4.1
>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 on Erlang 22.3.4.1
>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 on Erlang 22.3.4.1
>>> >>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com: RabbitMQ 3.8.3 on Erlang 22.3.4.1
>>> >>>
>>> >>> Alarms
>>> >>>
>>> >>> (none)
>>> >>>
>>> >>> Network Partitions
>>> >>>
>>> >>> (none)
>>> >>>
>>> >>> Listeners
>>> >>>
>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
>>> >>> Node: rabbit at overcloud-controller-0.internalapi.bdxworld.com, interface: [::], port: 15672, protocol: http, purpose: HTTP API
>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
>>> >>> Node: rabbit at overcloud-controller-1.internalapi.bdxworld.com, interface: [::], port: 15672, protocol: http, purpose: HTTP API
>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
>>> >>> Node: rabbit at overcloud-controller-2.internalapi.bdxworld.com, interface: [::], port: 15672, protocol: http, purpose: HTTP API
>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
>>> >>> Node: rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com, interface: [::], port: 15672, protocol: http, purpose: HTTP API
>>> >>>
>>> >>> Feature flags
>>> >>>
>>> >>> Flag: drop_unroutable_metric, state: enabled
>>> >>> Flag: empty_basic_get_metric, state: enabled
>>> >>> Flag: implicit_default_bindings, state: enabled
>>> >>> Flag: quorum_queue, state: enabled
>>> >>> Flag: virtual_host_metadata, state: enabled
>>> >>>
>>> >>> *Logs:*
>>> >>> *(Attached)*
>>> >>>
>>> >>> With regards,
>>> >>> Swogat Pradhan
>>> >>>
>>> >>> On Sun, Feb 26, 2023 at 2:34 PM Swogat Pradhan <
>>> swogatpradhan22 at gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>>> Hi,
>>> >>>> Please find the nova-conductor as well as the nova-api log below.
>>> >>>>
>>> >>>> nova-conductor:
>>> >>>>
>>> >>>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 16152921c1eb45c2b1f562087140168b
>>> >>>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to 83dbe5f567a940b698acfe986f6194fa
>>> >>>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to f3bfd7f65bd542b18d84cea3033abb43: oslo_messaging.exceptions.MessageUndeliverable
>>> >>>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
>>> >>>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to d4b9180f91a94f9a82c3c9c4b7595566: oslo_messaging.exceptions.MessageUndeliverable
>>> >>>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
>>> >>>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 897911a234a445d8a0d8af02ece40f6f: oslo_messaging.exceptions.MessageUndeliverable
>>> >>>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with backend dogpile.cache.null.
>>> >>>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 8f723ceb10c3472db9a9f324861df2bb: oslo_messaging.exceptions.MessageUndeliverable
>>> >>>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
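>>> >>>> The missing reply queue from these errors can be checked for directly on
>>> >>>> a controller (inside the rabbitmq container), for example:
>>> >>>>     rabbitmqctl list_queues -p / name messages consumers | grep reply_349bcb075f8c49329435a0f884b33066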
>>> >>>>
>>> >>>> With regards,
>>> >>>> Swogat Pradhan
>>> >>>>
>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan <
>>> >>>> swogatpradhan22 at gmail.com> wrote:
>>> >>>>
>>> >>>>> Hi,
>>> >>>>> I currently have 3 compute nodes at edge site1 where I am trying to
>>> >>>>> launch VMs.
>>> >>>>> When a VM is in the spawning state, the node goes down (as seen in
>>> >>>>> 'openstack compute service list'). The node comes back up when I
>>> >>>>> restart the nova-compute service, but then the launch of the VM fails.
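>>> >>>>> For reference, the state was checked and the service restarted with
>>> >>>>> something like the following; the systemd unit name assumes a standard
>>> >>>>> TripleO containerized deployment:
>>> >>>>>     openstack compute service list --service nova-compute
>>> >>>>>     sudo systemctl restart tripleo_nova_compute   # on the affected compute node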
>>> >>>>>
>>> >>>>> nova-compute.log
>>> >>>>>
>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running instance usage audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to 2023-02-26 08:00:00. 0 instances.
>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node dcn01-hci-0.bdxworld.com
>>> >>>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: /dev/vda. Libvirt can't honour user-supplied dev names
>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda
>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with backend dogpile.cache.null.
>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Running privsep helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/nova/nova.conf', '--config-file', '/etc/nova/nova-compute.conf', '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpin40tah6/privsep.sock']
>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep daemon via rootwrap
>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep daemon starting
>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep daemon running as pid 2647
>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command.
>>> >>>>> Command: blkid overlay -s UUID -o value
>>> >>>>> Exit code: 2
>>> >>>>> Stdout: ''
>>> >>>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
>>> >>>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image
>>> >>>>>
>>> >>>>> Is there a way to solve this issue?
>>> >>>>>
>>> >>>>>
>>> >>>>> With regards,
>>> >>>>>
>>> >>>>> Swogat Pradhan
>>> >>>>>
>>> >>>>