DCN compute service goes down when an instance is scheduled to launch | wallaby | tripleo

Eugen Block eblock at nde.ag
Tue Feb 28 09:17:43 UTC 2023


The logs are not attached, actually. Can you use something like
https://paste.openstack.org/ or pastebin to upload your logs and then
paste the link here? Since they can be quite verbose, please make sure
that you only upload logs from the time of the failure.
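
For example, assuming a containerized TripleO deployment where the
compute log lives under /var/log/containers/nova/ (both the path and the
timestamp filter below are assumptions, adjust them to your setup),
something like this would trim the log to the failure window before
uploading:

# keep only the minutes around the failure seen in your logs
grep '2023-02-26 08:4' /var/log/containers/nova/nova-compute.log > nova-compute-failure.log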

Quoting Swogat Pradhan <swogatpradhan22 at gmail.com>:

> Hi Eugen,
> Thanks for your response.
> I actually have a 4-controller setup, so here are the details:
>
> *PCS Status:*
>   * Container bundle set: rabbitmq-bundle [
> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]:
>     * rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster):       Started
> overcloud-controller-no-ceph-3
>     * rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster):       Started
> overcloud-controller-2
>     * rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster):       Started
> overcloud-controller-1
>     * rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster):       Started
> overcloud-controller-0
>
> I have tried restarting the bundle multiple times, but the issue is still
> present.
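>
> For reference, the restarts were done with something along these lines
> from one of the controllers (the resource name matches the bundle shown
> above):
>
> pcs resource restart rabbitmq-bundle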
>
> *Cluster status:*
> [root@overcloud-controller-0 /]# rabbitmqctl cluster_status
> Cluster status of node
> rabbit@overcloud-controller-0.internalapi.bdxworld.com ...
> Basics
>
> Cluster name: rabbit@overcloud-controller-no-ceph-3.bdxworld.com
>
> Disk Nodes
>
> rabbit@overcloud-controller-0.internalapi.bdxworld.com
> rabbit@overcloud-controller-1.internalapi.bdxworld.com
> rabbit@overcloud-controller-2.internalapi.bdxworld.com
> rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com
>
> Running Nodes
>
> rabbit@overcloud-controller-0.internalapi.bdxworld.com
> rabbit@overcloud-controller-1.internalapi.bdxworld.com
> rabbit@overcloud-controller-2.internalapi.bdxworld.com
> rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com
>
> Versions
>
> rabbit@overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 on
> Erlang 22.3.4.1
> rabbit@overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 on
> Erlang 22.3.4.1
> rabbit@overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 on
> Erlang 22.3.4.1
> rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com: RabbitMQ
> 3.8.3 on Erlang 22.3.4.1
>
> Alarms
>
> (none)
>
> Network Partitions
>
> (none)
>
> Listeners
>
> Node: rabbit@overcloud-controller-0.internalapi.bdxworld.com, interface:
> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool
> communication
> Node: rabbit@overcloud-controller-0.internalapi.bdxworld.com, interface:
> 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
> Node: rabbit@overcloud-controller-0.internalapi.bdxworld.com, interface:
> [::], port: 15672, protocol: http, purpose: HTTP API
> Node: rabbit@overcloud-controller-1.internalapi.bdxworld.com, interface:
> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool
> communication
> Node: rabbit@overcloud-controller-1.internalapi.bdxworld.com, interface:
> 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
> Node: rabbit@overcloud-controller-1.internalapi.bdxworld.com, interface:
> [::], port: 15672, protocol: http, purpose: HTTP API
> Node: rabbit@overcloud-controller-2.internalapi.bdxworld.com, interface:
> [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool
> communication
> Node: rabbit@overcloud-controller-2.internalapi.bdxworld.com, interface:
> 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
> Node: rabbit@overcloud-controller-2.internalapi.bdxworld.com, interface:
> [::], port: 15672, protocol: http, purpose: HTTP API
> Node: rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com,
> interface: [::], port: 25672, protocol: clustering, purpose: inter-node and
> CLI tool communication
> Node: rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com,
> interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP 0-9-1
> and AMQP 1.0
> Node: rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com,
> interface: [::], port: 15672, protocol: http, purpose: HTTP API
>
> Feature flags
>
> Flag: drop_unroutable_metric, state: enabled
> Flag: empty_basic_get_metric, state: enabled
> Flag: implicit_default_bindings, state: enabled
> Flag: quorum_queue, state: enabled
> Flag: virtual_host_metadata, state: enabled
>
> *Logs:*
> *(Attached)*
>
> With regards,
> Swogat Pradhan
>
> On Sun, Feb 26, 2023 at 2:34 PM Swogat Pradhan <swogatpradhan22 at gmail.com>
> wrote:
>
>> Hi,
>> Please find the nova-conductor as well as the nova-api log.
>>
>> nova-conductor:
>>
>> 2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver
>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to
>> 16152921c1eb45c2b1f562087140168b
>> 2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver
>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -]
>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to
>> 83dbe5f567a940b698acfe986f6194fa
>> 2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver
>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -]
>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to
>> f3bfd7f65bd542b18d84cea3033abb43:
>> oslo_messaging.exceptions.MessageUndeliverable
>> 2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver
>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply
>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a
>> missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...:
>> oslo_messaging.exceptions.MessageUndeliverable
>> 2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver
>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to
>> d4b9180f91a94f9a82c3c9c4b7595566:
>> oslo_messaging.exceptions.MessageUndeliverable
>> 2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver
>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply
>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a
>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...:
>> oslo_messaging.exceptions.MessageUndeliverable
>> 2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver
>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to
>> 897911a234a445d8a0d8af02ece40f6f:
>> oslo_messaging.exceptions.MessageUndeliverable
>> 2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver
>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply
>> 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a
>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...:
>> oslo_messaging.exceptions.MessageUndeliverable
>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils
>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db
>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with
>> backend dogpile.cache.null.
>> 2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver
>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to
>> 8f723ceb10c3472db9a9f324861df2bb:
>> oslo_messaging.exceptions.MessageUndeliverable
>> 2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver
>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply
>> 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a
>> missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...:
>> oslo_messaging.exceptions.MessageUndeliverable
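>>
>> (Each "missing queue (reply_...)" message above means the conductor
>> could not deliver an RPC reply because the caller's reply queue no
>> longer exists on the broker. One way to check whether such a queue is
>> present, e.g. from inside the rabbitmq container on a controller as in
>> the cluster_status output above, would be something like:
>>
>> rabbitmqctl list_queues name messages consumers | grep reply_
>>
>> If reply_349bcb075f8c49329435a0f884b33066 does not show up there while
>> the requesting service still believes it is connected, the client and
>> the broker have lost sync, typically after a dropped connection.)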
>>
>> With regards,
>> Swogat Pradhan
>>
>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan <swogatpradhan22 at gmail.com>
>> wrote:
>>
>>> Hi,
>>> I currently have 3 compute nodes on edge site1 where I am trying to
>>> launch VMs.
>>> When a VM is in the spawning state the node goes down (openstack compute
>>> service list); the node comes back up when I restart the nova compute
>>> service, but then the launch of the VM fails.
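>>>
>>> (For reference, I check the service state and bring the node back with
>>> something like the following on the compute node; the systemd unit name
>>> is an assumption based on TripleO's containerized service naming:
>>>
>>> openstack compute service list --service nova-compute
>>> sudo systemctl restart tripleo_nova_compute
>>> )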
>>>
>>> nova-compute.log
>>>
>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager
>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running instance usage
>>> audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to
>>> 2023-02-26 08:00:00. 0 instances.
>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims
>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db
>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance:
>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node
>>> dcn01-hci-0.bdxworld.com
>>> 2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver
>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db
>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance:
>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name:
>>> /dev/vda. Libvirt can't honour user-supplied dev names
>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device
>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db
>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance:
>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume
>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda
>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils
>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db
>>> 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with
>>> backend dogpile.cache.null.
>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon
>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db
>>> 4160ce999a31485fa643aed0936dfef0 - default default] Running privsep helper:
>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper',
>>> '--config-file', '/etc/nova/nova.conf', '--config-file',
>>> '/etc/nova/nova-compute.conf', '--privsep_context',
>>> 'os_brick.privileged.default', '--privsep_sock_path',
>>> '/tmp/tmpin40tah6/privsep.sock']
>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon
>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db
>>> 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep
>>> daemon via rootwrap
>>> 2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep daemon
>>> starting
>>> 2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep process
>>> running with uid/gid: 0/0
>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep process
>>> running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
>>> 2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep daemon
>>> running as pid 2647
>>> 2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof
>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db
>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error
>>> in _get_host_uuid: Unexpected error while running command.
>>> Command: blkid overlay -s UUID -o value
>>> Exit code: 2
>>> Stdout: ''
>>> Stderr: '': oslo_concurrency.processutils.ProcessExecutionError:
>>> Unexpected error while running command.
>>> 2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver
>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db
>>> 4160ce999a31485fa643aed0936dfef0 - default default] [instance:
>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image
>>>
>>> Is there a way to solve this issue?
>>>
>>>
>>> With regards,
>>>
>>> Swogat Pradhan
>>>
>>





