Hi,
Yes, the MTU is the same as the default, 1500. Generally I haven't seen any packet loss, but I never checked while launching an instance; I will check that and come back. However, every time I launch an instance it gets stuck in the spawning state and the hypervisor goes down at that point, so I am not sure whether packet loss is the cause.

With regards,
Swogat Pradhan

On Wed, Mar 1, 2023 at 3:30 PM Eugen Block <eblock@nde.ag> wrote:
One more thing coming to mind is MTU size. Are they identical between central and edge site? Do you see packet loss through the tunnel?
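A quick way to check both at once is a sketch like the following (the edge endpoint address is a placeholder, not taken from this thread). ICMP over IPv4 adds 28 bytes of headers (20 IPv4 + 8 ICMP), so on a 1500-byte MTU the largest unfragmented ping payload is 1472 bytes:

```shell
# Largest ICMP payload that fits a given MTU: MTU minus 28 header bytes (20 IPv4 + 8 ICMP).
mtu_payload() { echo $(( $1 - 28 )); }

mtu_payload 1500   # prints 1472

# With the Don't Fragment bit set, a smaller MTU inside the tunnel shows up as
# an explicit error instead of silent fragmentation (EDGE_IP is a placeholder):
# ping -M do -s "$(mtu_payload 1500)" -c 20 EDGE_IP
# A larger sample exposes intermittent loss; look at the summary line:
# ping -c 100 EDGE_IP | tail -n 2
```

If the DF-bit ping fails at 1472 but succeeds at a smaller size, the tunnel overhead is eating into the 1500-byte MTU.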
Quoting Swogat Pradhan <swogatpradhan22@gmail.com>:
Hi Eugen,
Please add my email address to 'To' or 'Cc', as I am not receiving your emails directly. Coming to the issue:
[root@overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p /
Listing policies for vhost "/" ...
vhost  name    pattern       apply-to  definition                                                              priority
/      ha-all  ^(?!amq\.).*  queues    {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"}  0
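For reference, that ha-all pattern is a PCRE negative lookahead: it mirrors every queue except RabbitMQ's internal amq.* ones, so the oslo.messaging reply_* queues are mirrored to exactly 2 nodes. A small sketch of which names it matches, assuming GNU grep with PCRE support (-P):

```shell
# '^(?!amq\.).*' matches any queue name that does NOT start with "amq."
pattern='^(?!amq\.).*'
for q in reply_349bcb075f8c49329435a0f884b33066 amq.gen-ABC123 notifications.info; do
  if printf '%s\n' "$q" | grep -Pq "$pattern"; then
    echo "$q -> mirrored"
  else
    echo "$q -> not mirrored"
  fi
done
# reply_349bcb075f8c49329435a0f884b33066 -> mirrored
# amq.gen-ABC123 -> not mirrored
# notifications.info -> mirrored
```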
The edge-site compute nodes are up; a node only goes down when I try to launch an instance, and the instance reaches the spawning state and then gets stuck.
I have a tunnel setup between the central and the edge sites.
With regards, Swogat Pradhan
On Tue, Feb 28, 2023 at 9:11 PM Swogat Pradhan <swogatpradhan22@gmail.com> wrote:
Hi Eugen,
For some reason I am not receiving your emails directly; I am checking the email digest, and I am able to find your reply there.
Here is the log for download: https://we.tl/t-L8FEkGZFSq
Yes, these logs are from the time when the issue occurred.
*Note: I am able to create VMs and perform other activities at the central site; I am only facing this issue at the edge site.*
With regards, Swogat Pradhan
On Mon, Feb 27, 2023 at 5:12 PM Swogat Pradhan <swogatpradhan22@gmail.com> wrote:
Hi Eugen,
Thanks for your response. I actually have a 4-controller setup, so here are the details:
*PCS Status:*
Container bundle set: rabbitmq-bundle [172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]:
  rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-no-ceph-3
  rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-2
  rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-1
  rabbitmq-bundle-3 (ocf::heartbeat:rabbitmq-cluster): Started overcloud-controller-0
I have tried restarting the bundle multiple times but the issue is still present.
*Cluster status:*
[root@overcloud-controller-0 /]# rabbitmqctl cluster_status
Cluster status of node rabbit@overcloud-controller-0.internalapi.bdxworld.com ...

Basics
Cluster name: rabbit@overcloud-controller-no-ceph-3.bdxworld.com
Disk Nodes
rabbit@overcloud-controller-0.internalapi.bdxworld.com
rabbit@overcloud-controller-1.internalapi.bdxworld.com
rabbit@overcloud-controller-2.internalapi.bdxworld.com
rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com
Running Nodes
rabbit@overcloud-controller-0.internalapi.bdxworld.com
rabbit@overcloud-controller-1.internalapi.bdxworld.com
rabbit@overcloud-controller-2.internalapi.bdxworld.com
rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com
Versions
rabbit@overcloud-controller-0.internalapi.bdxworld.com: RabbitMQ 3.8.3 on Erlang 22.3.4.1
rabbit@overcloud-controller-1.internalapi.bdxworld.com: RabbitMQ 3.8.3 on Erlang 22.3.4.1
rabbit@overcloud-controller-2.internalapi.bdxworld.com: RabbitMQ 3.8.3 on Erlang 22.3.4.1
rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com: RabbitMQ 3.8.3 on Erlang 22.3.4.1
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rabbit@overcloud-controller-0.internalapi.bdxworld.com, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@overcloud-controller-0.internalapi.bdxworld.com, interface: 172.25.201.212, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@overcloud-controller-0.internalapi.bdxworld.com, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@overcloud-controller-1.internalapi.bdxworld.com, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@overcloud-controller-1.internalapi.bdxworld.com, interface: 172.25.201.205, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@overcloud-controller-1.internalapi.bdxworld.com, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@overcloud-controller-2.internalapi.bdxworld.com, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@overcloud-controller-2.internalapi.bdxworld.com, interface: 172.25.201.201, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@overcloud-controller-2.internalapi.bdxworld.com, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com, interface: 172.25.201.209, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@overcloud-controller-no-ceph-3.internalapi.bdxworld.com, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Feature flags
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: quorum_queue, state: enabled
Flag: virtual_host_metadata, state: enabled
*Logs:* *(Attached)*
With regards, Swogat Pradhan
On Sun, Feb 26, 2023 at 2:34 PM Swogat Pradhan <swogatpradhan22@gmail.com> wrote:
Hi,
Please find the nova-conductor as well as the nova-api log.
nova-conductor:
2023-02-26 08:45:01.108 31 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 16152921c1eb45c2b1f562087140168b
2023-02-26 08:45:02.144 26 WARNING oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to 83dbe5f567a940b698acfe986f6194fa
2023-02-26 08:45:02.314 32 WARNING oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply to f3bfd7f65bd542b18d84cea3033abb43: oslo_messaging.exceptions.MessageUndeliverable
2023-02-26 08:45:02.316 32 ERROR oslo_messaging._drivers.amqpdriver [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -] The reply f3bfd7f65bd542b18d84cea3033abb43 failed to send after 60 seconds due to a missing queue (reply_276049ec36a84486a8a406911d9802f4). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
2023-02-26 08:48:01.282 35 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to d4b9180f91a94f9a82c3c9c4b7595566: oslo_messaging.exceptions.MessageUndeliverable
2023-02-26 08:48:01.284 35 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply d4b9180f91a94f9a82c3c9c4b7595566 failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
2023-02-26 08:49:01.303 33 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 897911a234a445d8a0d8af02ece40f6f: oslo_messaging.exceptions.MessageUndeliverable
2023-02-26 08:49:01.304 33 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply 897911a234a445d8a0d8af02ece40f6f failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
2023-02-26 08:49:52.254 31 WARNING nova.cache_utils [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with backend dogpile.cache.null.
2023-02-26 08:50:01.264 27 WARNING oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply to 8f723ceb10c3472db9a9f324861df2bb: oslo_messaging.exceptions.MessageUndeliverable
2023-02-26 08:50:01.266 27 ERROR oslo_messaging._drivers.amqpdriver [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -] The reply 8f723ceb10c3472db9a9f324861df2bb failed to send after 60 seconds due to a missing queue (reply_349bcb075f8c49329435a0f884b33066). Abandoning...: oslo_messaging.exceptions.MessageUndeliverable
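These MessageUndeliverable errors mean the conductor's reply was published to a queue that no longer exists. oslo.messaging reply queues are auto-delete queues owned by the RPC caller (here, most likely nova-compute on the edge node); if the caller's AMQP connection drops, e.g. over an unstable tunnel, the queue vanishes and every reply to it is abandoned. One way to triage the log is to count the missing queue names: if a single reply_* name dominates, one caller's connection died, rather than a broker-wide problem. A minimal sketch (the sample lines below stand in for the real nova-conductor.log):

```shell
# Count how often each missing reply queue shows up in the conductor log.
cat > /tmp/conductor-sample.log <<'EOF'
... reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply ...
... reply_276049ec36a84486a8a406911d9802f4 doesn't exist, drop reply ...
... reply_349bcb075f8c49329435a0f884b33066 doesn't exist, drop reply ...
EOF
grep -o 'reply_[0-9a-f]\{32\}' /tmp/conductor-sample.log | sort | uniq -c | sort -rn
# Output:
#   2 reply_349bcb075f8c49329435a0f884b33066
#   1 reply_276049ec36a84486a8a406911d9802f4
```

Comparing those names against `rabbitmqctl list_queues -p / name | grep '^reply_'` on a controller would show whether the caller ever re-created its reply queue after reconnecting.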
With regards, Swogat Pradhan
On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan <swogatpradhan22@gmail.com> wrote:
Hi,
I currently have 3 compute nodes at edge site1 where I am trying to launch VMs. When a VM is in the spawning state, the node goes down (per 'openstack compute service list'); the node comes back up when I restart the nova-compute service, but then the launch of the VM fails.
nova-compute.log
2023-02-26 08:15:51.808 7 INFO nova.compute.manager [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -] Running instance usage audit for host dcn01-hci-0.bdxworld.com from 2023-02-26 07:00:00 to 2023-02-26 08:00:00. 0 instances.
2023-02-26 08:49:52.813 7 INFO nova.compute.claims [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Claim successful on node dcn01-hci-0.bdxworld.com
2023-02-26 08:49:54.225 7 INFO nova.virt.libvirt.driver [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring supplied device name: /dev/vda. Libvirt can't honour user-supplied dev names
2023-02-26 08:49:54.398 7 INFO nova.virt.block_device [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with volume c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda
2023-02-26 08:49:55.216 7 WARNING nova.cache_utils [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Cache enabled with backend dogpile.cache.null.
2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Running privsep helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/nova/nova.conf', '--config-file', '/etc/nova/nova-compute.conf', '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpin40tah6/privsep.sock']
2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Spawned new privsep daemon via rootwrap
2023-02-26 08:49:55.717 2647 INFO oslo.privsep.daemon [-] privsep daemon starting
2023-02-26 08:49:55.722 2647 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
2023-02-26 08:49:55.726 2647 INFO oslo.privsep.daemon [-] privsep daemon running as pid 2647
2023-02-26 08:49:55.956 7 WARNING os_brick.initiator.connectors.nvmeof [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error in _get_host_uuid: Unexpected error while running command.
Command: blkid overlay -s UUID -o value
Exit code: 2
Stdout: ''
Stderr: '': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2023-02-26 08:49:58.247 7 INFO nova.virt.libvirt.driver [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45 b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default default] [instance: 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image
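The compute log stops right after "Creating image", which matches an RPC call whose reply never arrives (compare the conductor's "failed to send after 60 seconds" errors). If the tunnel adds latency or drops connections, one mitigation worth trying (an assumption on my part, not a confirmed fix for this setup) is raising the oslo.messaging RPC timeout on the edge compute nodes, e.g. in nova.conf:

[DEFAULT]
# Hypothetical tuning sketch: rpc_response_timeout defaults to 60 seconds;
# a higher value gives replies travelling over a slow tunnel more time to arrive.
rpc_response_timeout = 120

This only papers over slow links; it will not help if the AMQP connection itself is being torn down mid-spawn.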
Is there a way to solve this issue?
With regards,
Swogat Pradhan