DCN compute service goes down when an instance is scheduled to launch | wallaby | tripleo

Swogat Pradhan swogatpradhan22 at gmail.com
Wed Mar 22 13:54:32 UTC 2023


My glance container is running but is in an unhealthy state.
I don't see any errors in "podman logs glance_api" or anywhere else.

[root at dcn02-compute-0 ~]# podman ps --all | grep glance
03a07452704a  172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo                         9 days ago      Exited (0) 41 minutes ago              container-puppet-glance_api
b61e96e9f504  172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo  /bin/bash -c chow...   9 days ago      Exited (0) 36 minutes ago              glance_init_logs
ec1734dfb072  172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo  /usr/bin/bootstra...   34 minutes ago  Exited (0) 34 minutes ago              glance_api_db_sync
a8eb5d18b8d6  172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo  kolla_start            31 minutes ago  Up 32 minutes ago (healthy)            glance_api_cron
74a92f45a4a2  172.25.201.68:8787/tripleomaster/openstack-glance-api:current-tripleo  kolla_start            31 minutes ago  Up 32 minutes ago (unhealthy)          glance_api
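
For anyone following along, commands like these should show why glance_api is
flagged unhealthy (the healthcheck script path inside the container is an
assumption on my part, based on other TripleO service containers):

$ sudo podman inspect glance_api --format '{{.State.Status}}'   # confirm the container is actually running
$ sudo podman healthcheck run glance_api ; echo $?              # re-run the health check; non-zero means it is failing
$ sudo podman logs --tail 50 glance_api                         # most recent container output
$ sudo podman exec glance_api /openstack/healthcheck            # assumed healthcheck script location; running it by hand shows what fails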

With regards,
Swogat Pradhan

On Wed, Mar 22, 2023 at 7:16 PM John Fulton <johfulto at redhat.com> wrote:

> On Wed, Mar 22, 2023 at 9:42 AM Swogat Pradhan
> <swogatpradhan22 at gmail.com> wrote:
> >
> > Hi John,
> > After some changes I feel like cinder is now trying to pull the image
> > from the local glance, as I am getting the following error in the
> > cinder-volume log:
> >
> > 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server
> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error
> finding address for
> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538:
> Unable to establish connection to
> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538:
> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded
> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by
> NewConnectionError('<urllib3.connection.HTTPConnection object at
> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111]
> ECONNREFUSED',))
> >
> > The endpoint it is trying to reach is the dcn02 IP address.
> >
> > But when I check the ports, I don't see port 9292 listening:
> > [root at dcn02-compute-2 ceph]# netstat -nultp
> > Active Internet connections (only servers)
> > Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
> > tcp        0      0 0.0.0.0:2022            0.0.0.0:*               LISTEN      656800/sshd
> > tcp        0      0 127.0.0.1:199           0.0.0.0:*               LISTEN      4878/snmpd
> > tcp        0      0 172.25.228.253:2379     0.0.0.0:*               LISTEN      6232/etcd
> > tcp        0      0 172.25.228.253:2380     0.0.0.0:*               LISTEN      6232/etcd
> > tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
> > tcp        0      0 127.0.0.1:6640          0.0.0.0:*               LISTEN      2779/ovsdb-server
> > tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      4918/sshd
> > tcp6       0      0 :::2022                 :::*                    LISTEN      656800/sshd
> > tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
> > tcp6       0      0 :::22                   :::*                    LISTEN      4918/sshd
> > udp        0      0 0.0.0.0:111             0.0.0.0:*                           1/systemd
> > udp        0      0 0.0.0.0:161             0.0.0.0:*                           4878/snmpd
> > udp        0      0 127.0.0.1:323           0.0.0.0:*                           2609/chronyd
> > udp        0      0 0.0.0.0:6081            0.0.0.0:*                           -
> > udp6       0      0 :::111                  :::*                                1/systemd
> > udp6       0      0 ::1:161                 :::*                                4878/snmpd
> > udp6       0      0 ::1:323                 :::*                                2609/chronyd
> > udp6       0      0 :::6081                 :::*                                -
> >
> > I see in glance-api.conf that the bind port parameter is set to 9292, but
> > the port is not listed in the netstat output.
> > Can you please guide me in getting this port up and listening, as I feel
> > this would solve the issue I am facing right now.
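> >
> > For reference, checks along these lines should show the configured bind
> > address and whether anything is listening on it (the config path below is
> > the standard TripleO bind-mount location and may differ in other setups;
> > glance_api normally uses host networking, so the port should be visible on
> > the host):
> >
> > $ sudo grep -E '^bind_host|^bind_port' /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf
> > $ sudo ss -lntp | grep 9292
> > $ sudo podman ps --all | grep glance_api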
>
> Looks like your glance container stopped running. Ask podman to show
> you all containers (including stopped ones) and investigate why the
> glance container stopped.
>
> >
> > With regards,
> > Swogat Pradhan
> >
> > On Wed, Mar 22, 2023 at 4:55 PM Swogat Pradhan <
> swogatpradhan22 at gmail.com> wrote:
> >>
> >> Update:
> >> Here is the log when creating a volume using cirros image:
> >>
> >> 2023-03-22 11:04:38.449 109 INFO
> cinder.volume.flows.manager.create_volume
> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
> 4160ce999a31485fa643aed0936dfef0 - - -] Volume
> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with
> specification: {'status': 'creating', 'volume_name':
> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4,
> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location':
> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
> [{'url':
> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
> 'metadata': {'store': 'ceph'}}, {'url':
> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros',
> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public',
> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active',
> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False,
> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0',
> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value':
> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46',
> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at':
> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc),
> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1,
> tzinfo=datetime.timezone.utc), 'locations': [{'url':
> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
> 'metadata': {'store': 'ceph'}}, {'url':
> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
> 'metadata': {'store': 'dcn02'}}], 'direct_url':
> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file',
> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '',
> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '',
> 'owner_specified.openstack.object': 'images/cirros',
> 'owner_specified.openstack.sha256': ''}}, 'image_service':
> <cinder.image.glance.GlanceImageService object at 0x7f449ded1198>}
> >> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils
> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s
> >> 2023-03-22 11:07:54.023 109 WARNING py.warnings
> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
> 4160ce999a31485fa643aed0936dfef0 - - -]
> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75:
> FutureWarning: The human format is deprecated and the format parameter will
> be removed. Use explicitly json instead in version 'xena'
> >>   category=FutureWarning)
> >>
> >> 2023-03-22 11:11:12.161 109 WARNING py.warnings
> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
> 4160ce999a31485fa643aed0936dfef0 - - -]
> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75:
> FutureWarning: The human format is deprecated and the format parameter will
> be removed. Use explicitly json instead in version 'xena'
> >>   category=FutureWarning)
> >>
> >> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils
> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00
> MB/s
> >> 2023-03-22 11:11:14.998 109 INFO
> cinder.volume.flows.manager.create_volume
> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
> 4160ce999a31485fa643aed0936dfef0 - - -] Volume
> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f
> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully
> >> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager
> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully.
> >>
> >> The image is present in the dcn02 store, but it still downloaded the image
> >> at 0.16 MB/s and then created the volume.
> >>
> >> With regards,
> >> Swogat Pradhan
> >>
> >> On Tue, Mar 21, 2023 at 6:10 PM Swogat Pradhan <
> swogatpradhan22 at gmail.com> wrote:
> >>>
> >>> Hi John,
> >>> This seems to be an issue.
> >>> When I deployed the DCN ceph in both dcn01 and dcn02, the --cluster
> >>> parameter was set to the respective cluster names, but the config files
> >>> were still created as ceph.conf and the keyring as
> >>> ceph.client.openstack.keyring.
> >>>
> >>> This created issues in glance as well, since the naming convention of the
> >>> files didn't match the cluster names, so I had to manually rename the
> >>> central ceph conf file as follows:
> >>>
> >>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/
> >>> [root at dcn02-compute-0 ceph]# ll
> >>> total 16
> >>> -rw-------. 1 root root 257 Mar 13 13:56 ceph_central.client.openstack.keyring
> >>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf
> >>> -rw-------. 1 root root 205 Mar 15 18:45 ceph.client.openstack.keyring
> >>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf
> >>> [root at dcn02-compute-0 ceph]#
> >>>
> >>> ceph.conf and ceph.client.openstack.keyring contain the fsid of the
> >>> respective cluster in both dcn01 and dcn02.
> >>> In the above CLI output, ceph.conf and ceph.client.openstack.keyring are the
> >>> files used to access the dcn02 ceph cluster, and the ceph_central* files are
> >>> used for accessing the central ceph cluster.
> >>>
> >>> glance multistore config:
> >>> [dcn02]
> >>> rbd_store_ceph_conf=/etc/ceph/ceph.conf
> >>> rbd_store_user=openstack
> >>> rbd_store_pool=images
> >>> rbd_thin_provisioning=False
> >>> store_description=dcn02 rbd glance store
> >>>
> >>> [ceph_central]
> >>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf
> >>> rbd_store_user=openstack
> >>> rbd_store_pool=images
> >>> rbd_thin_provisioning=False
> >>> store_description=Default glance store backend.
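> >>>
> >>> For completeness, with glance multistore these backends are normally
> >>> selected via something like the following in glance-api.conf; the exact
> >>> lines below are illustrative rather than copied from my file:
> >>>
> >>> [DEFAULT]
> >>> enabled_backends = dcn02:rbd, ceph_central:rbd
> >>>
> >>> [glance_store]
> >>> default_backend = dcn02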
> >>>
> >>>
> >>> With regards,
> >>> Swogat Pradhan
> >>>
> >>> On Tue, Mar 21, 2023 at 5:52 PM John Fulton <johfulto at redhat.com>
> wrote:
> >>>>
> >>>> On Tue, Mar 21, 2023 at 8:03 AM Swogat Pradhan
> >>>> <swogatpradhan22 at gmail.com> wrote:
> >>>> >
> >>>> > Hi,
> >>>> > Seems like cinder is not using the local ceph.
> >>>>
> >>>> That explains the issue. It's a misconfiguration.
> >>>>
> >>>> I hope this is not a production system since the mailing list now has
> >>>> the cinder.conf which contains passwords.
> >>>>
> >>>> The section that looks like this:
> >>>>
> >>>> [tripleo_ceph]
> >>>> volume_backend_name=tripleo_ceph
> >>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver
> >>>> rbd_ceph_conf=/etc/ceph/ceph.conf
> >>>> rbd_user=openstack
> >>>> rbd_pool=volumes
> >>>> rbd_flatten_volume_from_snapshot=False
> >>>> rbd_secret_uuid=<redacted>
> >>>> report_discard_supported=True
> >>>>
> >>>> Should be updated to refer to the local DCN ceph cluster and not the
> >>>> central one. Use the ceph conf file for that cluster and ensure the
> >>>> rbd_secret_uuid corresponds to that one.
> >>>>
> >>>> TripleO’s convention is to set the rbd_secret_uuid to the FSID of the
> >>>> Ceph cluster. The FSID should be in the ceph.conf file. The
> >>>> tripleo_nova_libvirt role will use virsh secret-* commands so that
> >>>> libvirt can retrieve the cephx secret using the FSID as a key. This
> >>>> can be confirmed with `podman exec nova_virtsecretd virsh
> >>>> secret-get-value $FSID`.
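> >>>>
> >>>> As a concrete sketch of that check on a DCN node (assuming the default
> >>>> file locations):
> >>>>
> >>>>   $ sudo grep fsid /etc/ceph/ceph.conf
> >>>>   $ sudo podman exec nova_virtsecretd virsh secret-get-value <fsid from above>
> >>>>
> >>>> If the second command returns a key, libvirt on that node has the cephx
> >>>> secret for the local cluster registered under its FSID.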
> >>>>
> >>>> The documentation describes how to configure the central and DCN sites
> >>>> correctly but an error seems to have occurred while you were following
> >>>> it.
> >>>>
> >>>>
> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html
> >>>>
> >>>>   John
> >>>>
> >>>> >
> >>>> > Ceph Output:
> >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l
> >>>> > NAME                                       SIZE     PARENT  FMT  PROT  LOCK
> >>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65         8 MiB            2         excl
> >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19        16 MiB            2
> >>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19@snap   16 MiB            2    yes
> >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d       321 MiB            2
> >>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d@snap  321 MiB            2    yes
> >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0       386 MiB            2
> >>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0@snap  386 MiB            2    yes
> >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a        15 GiB            2
> >>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a@snap   15 GiB            2    yes
> >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b        15 GiB            2
> >>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b@snap   15 GiB            2    yes
> >>>> > e77e78ad-d369-4a1d-b758-8113621269a3        15 GiB            2
> >>>> > e77e78ad-d369-4a1d-b758-8113621269a3@snap   15 GiB            2    yes
> >>>> >
> >>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l
> >>>> > NAME                                         SIZE     PARENT  FMT  PROT  LOCK
> >>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d  100 GiB            2
> >>>> > volume-f0969935-a742-4744-9375-80bf323e4d63   10 GiB            2
> >>>> > [ceph: root at dcn02-ceph-all-0 /]#
> >>>> >
> >>>> > Attached the cinder config.
> >>>> > Please let me know how I can solve this issue.
> >>>> >
> >>>> > With regards,
> >>>> > Swogat Pradhan
> >>>> >
> >>>> > On Tue, Mar 21, 2023 at 3:53 PM John Fulton <johfulto at redhat.com>
> wrote:
> >>>> >>
> >>>> >> In my last message, under the line "On a DCN site if you run a command
> >>>> >> like this:", I suggested some steps you could try to confirm the image is
> >>>> >> a COW from the local glance, as well as how to look at your cinder config.
> >>>> >>
> >>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan <
> swogatpradhan22 at gmail.com> wrote:
> >>>> >>>
> >>>> >>> Update:
> >>>> >>> I uploaded an image directly to the dcn02 store, and it takes
> >>>> >>> around 10-15 minutes to create a volume from that image in dcn02.
> >>>> >>> The image size is 389 MB.
> >>>> >>>
> >>>> >>> On Mon, Mar 20, 2023 at 10:26 PM Swogat Pradhan <
> swogatpradhan22 at gmail.com> wrote:
> >>>> >>>>
> >>>> >>>> Hi John,
> >>>> >>>> I checked in the ceph of dcn02, and I can see the images created
> >>>> >>>> after importing from the central site.
> >>>> >>>> But launching an instance normally fails, as it takes a long time
> >>>> >>>> for the volume to get created.
> >>>> >>>>
> >>>> >>>> When launching an instance from volume the instance is getting
> created properly without any errors.
> >>>> >>>>
> >>>> >>>> I tried to cache images in nova using
> >>>> >>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html
> >>>> >>>> but I am getting a checksum failed error.
> >>>> >>>>
> >>>> >>>> With regards,
> >>>> >>>> Swogat Pradhan
> >>>> >>>>
> >>>> >>>> On Thu, Mar 16, 2023 at 5:24 PM John Fulton <johfulto at redhat.com>
> wrote:
> >>>> >>>>>
> >>>> >>>>> On Wed, Mar 15, 2023 at 8:05 PM Swogat Pradhan
> >>>> >>>>> <swogatpradhan22 at gmail.com> wrote:
> >>>> >>>>> >
> >>>> >>>>> > Update: After restarting the nova services on the controller
> and running the deploy script on the edge site, I was able to launch the VM
> from volume.
> >>>> >>>>> >
> >>>> >>>>> > Right now the instance creation is failing as the block device
> >>>> >>>>> > creation is stuck in the creating state; it is taking more than
> >>>> >>>>> > 10 minutes for the volume to be created, even though the image has
> >>>> >>>>> > already been imported to the edge glance.
> >>>> >>>>>
> >>>> >>>>> Try following this document and making the same observations in
> your
> >>>> >>>>> environment for AZs and their local ceph cluster.
> >>>> >>>>>
> >>>> >>>>>
> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites
> >>>> >>>>>
> >>>> >>>>> On a DCN site if you run a command like this:
> >>>> >>>>>
> >>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf --keyring
> >>>> >>>>> /etc/ceph/dcn0.client.admin.keyring
> >>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l
> >>>> >>>>> NAME                                          SIZE   PARENT                                             FMT  PROT  LOCK
> >>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7  8 GiB  images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076@snap    2         excl
> >>>> >>>>> $
> >>>> >>>>>
> >>>> >>>>> Then, you should see the parent of the volume is the image
> which is on
> >>>> >>>>> the same local ceph cluster.
> >>>> >>>>>
> >>>> >>>>> I wonder if something is misconfigured and thus you're
> encountering
> >>>> >>>>> the streaming behavior described here:
> >>>> >>>>>
> >>>> >>>>> Ideally all images should reside in the central Glance and be
> copied
> >>>> >>>>> to DCN sites before instances of those images are booted on DCN
> sites.
> >>>> >>>>> If an image is not copied to a DCN site before it is booted,
> then the
> >>>> >>>>> image will be streamed to the DCN site and then the image will
> boot as
> >>>> >>>>> an instance. This happens because Glance at the DCN site has
> access to
> >>>> >>>>> the images store at the Central ceph cluster. Though the
> booting of
> >>>> >>>>> the image will take time because it has not been copied in
> advance,
> >>>> >>>>> this is still preferable to failing to boot the image.
> >>>> >>>>>
> >>>> >>>>> You can also exec into the cinder container at the DCN site and
> >>>> >>>>> confirm it's using its local ceph cluster.
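> >>>> >>>>>
> >>>> >>>>> For example (container name and paths below are the usual TripleO
> >>>> >>>>> ones; adjust if yours differ):
> >>>> >>>>>
> >>>> >>>>> $ sudo podman exec cinder_volume grep -E 'rbd_ceph_conf|rbd_secret_uuid' /etc/cinder/cinder.conf
> >>>> >>>>> $ sudo podman exec cinder_volume grep fsid /etc/ceph/ceph.conf
> >>>> >>>>>
> >>>> >>>>> The rbd_secret_uuid should match the fsid of the ceph.conf the backend
> >>>> >>>>> points at, i.e. the local DCN cluster rather than the central one.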
> >>>> >>>>>
> >>>> >>>>>   John
> >>>> >>>>>
> >>>> >>>>> >
> >>>> >>>>> > I will try creating a fresh image, test again, and then update.
> >>>> >>>>> >
> >>>> >>>>> > With regards,
> >>>> >>>>> > Swogat Pradhan
> >>>> >>>>> >
> >>>> >>>>> > On Wed, Mar 15, 2023 at 11:13 PM Swogat Pradhan <
> swogatpradhan22 at gmail.com> wrote:
> >>>> >>>>> >>
> >>>> >>>>> >> Update:
> >>>> >>>>> >> In the hypervisor list, the compute node state is showing as down.
> >>>> >>>>> >>
> >>>> >>>>> >>
> >>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11 PM Swogat Pradhan <
> swogatpradhan22 at gmail.com> wrote:
> >>>> >>>>> >>>
> >>>> >>>>> >>> Hi Brendan,
> >>>> >>>>> >>> Now I have deployed another site where I have used a 2 Linux bonds
> >>>> >>>>> >>> network template for both the 3 compute nodes and the 3 ceph nodes.
> >>>> >>>>> >>> The bonding option is set to mode=802.3ad (lacp=active).
> >>>> >>>>> >>> I used a cirros image to launch an instance, but the instance timed
> >>>> >>>>> >>> out, so I waited for the volume to be created.
> >>>> >>>>> >>> Once the volume was created I tried launching the instance from the
> >>>> >>>>> >>> volume, and the instance is still stuck in the spawning state.
> >>>> >>>>> >>>
> >>>> >>>>> >>> Here is the nova-compute log:
> >>>> >>>>> >>>
> >>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO oslo.privsep.daemon [-]
> privsep daemon starting
> >>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO oslo.privsep.daemon [-]
> privsep process running with uid/gid: 0/0
> >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-]
> privsep process running with capabilities (eff/prm/inh):
> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
> >>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO oslo.privsep.daemon [-]
> privsep daemon running as pid 185437
> >>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING
> os_brick.initiator.connectors.nvmeof
> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db
> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error
> in _get_host_uuid: Unexpected error while running command.
> >>>> >>>>> >>> Command: blkid overlay -s UUID -o value
> >>>> >>>>> >>> Exit code: 2
> >>>> >>>>> >>> Stdout: ''
> >>>> >>>>> >>> Stderr: '':
> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while
> running command.
> >>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO nova.virt.libvirt.driver
> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db
> 4160ce999a31485fa643aed0936dfef0 - default default] [instance:
> 450b749c-a10a-4308-80a9-3b8020fee758] Creating image
> >>>> >>>>> >>>
> >>>> >>>>> >>> It is stuck at 'Creating image'. Do I need to run the template
> >>>> >>>>> >>> mentioned here?:
> >>>> >>>>> >>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html
> >>>> >>>>> >>>
> >>>> >>>>> >>> The volume is already created, and I do not understand why the
> >>>> >>>>> >>> instance is stuck in the spawning state.
> >>>> >>>>> >>>
> >>>> >>>>> >>> With regards,
> >>>> >>>>> >>> Swogat Pradhan
> >>>> >>>>> >>>
> >>>> >>>>> >>>
> >>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02 PM Brendan Shephard <
> bshephar at redhat.com> wrote:
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Does your environment use different network interfaces for
> each of the networks? Or does it have a bond with everything on it?
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> One issue I have seen before is that when launching
> instances, there is a lot of network traffic between nodes as the
> hypervisor needs to download the image from Glance. Along with various
> other services sending normal network traffic, it can be enough to cause
> issues if everything is running over a single 1Gbe interface.
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> I have seen the same situation in fact when using a single
> active/backup bond on 1Gbe nics. It’s worth checking the network traffic
> while you try to spawn the instance to see if you’re dropping packets. In
> the situation I described, there were dropped packets which resulted in a
> loss of communication between nova_compute and RMQ, so the node appeared
> offline. You should also confirm that nova_compute is being disconnected in
> the nova_compute logs if you tail them on the Hypervisor while spawning the
> instance.
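> >>>> >>>>> >>>>
> >>>> >>>>> >>>> As a rough example of what to watch (the interface name is just a
> >>>> >>>>> >>>> placeholder, substitute your bond/VLAN interface, and the log path
> >>>> >>>>> >>>> is the usual TripleO one):
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> $ watch -n1 'ip -s link show bond1'                 # RX/TX errors and dropped counters
> >>>> >>>>> >>>> $ sudo ethtool -S bond1 | grep -iE 'drop|err'       # NIC statistics, if the driver exposes them
> >>>> >>>>> >>>> $ sudo tail -f /var/log/containers/nova/nova-compute.log   # look for AMQP/RMQ disconnects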
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> In my case, changing from active/backup to LACP helped. So, based on
> >>>> >>>>> >>>> that experience, from my perspective, it certainly sounds like some
> >>>> >>>>> >>>> kind of network issue.
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Regards,
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Brendan Shephard
> >>>> >>>>> >>>> Senior Software Engineer
> >>>> >>>>> >>>> Red Hat Australia
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block <eblock at nde.ag>
> wrote:
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Hi,
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> I tried to help someone with a similar issue some time ago
> in this thread:
> >>>> >>>>> >>>>
> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for that
> user, not sure if that could apply here. But is it possible that your nova
> and neutron versions are different between central and edge site? Have you
> restarted nova and neutron services on the compute nodes after
> installation? Have you debug logs of nova-conductor and maybe nova-compute?
> Maybe they can help narrow down the issue.
> >>>> >>>>> >>>> If there isn't any additional information in the debug
> logs I probably would start "tearing down" rabbitmq. I didn't have to do
> that in a production system yet so be careful. I can think of two routes:
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit is
> running, this will most likely impact client IO depending on your load.
> Check out the rabbitmqctl commands.
> >>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia tables
> from all nodes and restart rabbitmq so the exchanges, queues etc. rebuild.
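> >>>> >>>>> >>>>
> >>>> >>>>> >>>> For the first route, something like this (run inside the rabbitmq
> >>>> >>>>> >>>> container/bundle; adjust the vhost if yours differs) shows which
> >>>> >>>>> >>>> reply_ queues actually exist and whether they have consumers:
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> $ rabbitmqctl list_queues -p / name messages consumers | grep reply_
> >>>> >>>>> >>>> $ rabbitmqctl list_exchanges -p / name type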
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> I can imagine that the failed reply "survives" while being replicated
> >>>> >>>>> >>>> across the rabbit nodes. But I don't really know the rabbit internals
> >>>> >>>>> >>>> too well, so maybe someone else can chime in here and give better advice.
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Regards,
> >>>> >>>>> >>>> Eugen
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Zitat von Swogat Pradhan <swogatpradhan22 at gmail.com>:
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Hi,
> >>>> >>>>> >>>> Can someone please help me out on this issue?
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> With regards,
> >>>> >>>>> >>>> Swogat Pradhan
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24 PM Swogat Pradhan <
> swogatpradhan22 at gmail.com>
> >>>> >>>>> >>>> wrote:
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Hi
> >>>> >>>>> >>>> I don't see any major packet loss.
> >>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq, maybe, but not due
> >>>> >>>>> >>>> to packet loss.
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> with regards,
> >>>> >>>>> >>>> Swogat Pradhan
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34 PM Swogat Pradhan <
> swogatpradhan22 at gmail.com>
> >>>> >>>>> >>>> wrote:
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Hi,
> >>>> >>>>> >>>> Yes the MTU is the same as the default '1500'.
> >>>> >>>>> >>>> Generally I haven't seen any packet loss, but never
> checked when
> >>>> >>>>> >>>> launching the instance.
> >>>> >>>>> >>>> I will check that and come back.
> >>>> >>>>> >>>> But every time I launch an instance, the instance gets stuck in the
> >>>> >>>>> >>>> spawning state and at that point the hypervisor goes down, so I am
> >>>> >>>>> >>>> not sure whether packet loss causes this.
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> With regards,
> >>>> >>>>> >>>> Swogat pradhan
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30 PM Eugen Block <eblock at nde.ag>
> wrote:
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they
> identical between
> >>>> >>>>> >>>> central and edge site? Do you see packet loss through the
> tunnel?
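> >>>> >>>>> >>>>
> >>>> >>>>> >>>> A quick way to test the path MTU through the tunnel (assuming a
> >>>> >>>>> >>>> standard 1500 byte MTU, i.e. 1472 bytes of ICMP payload plus 28
> >>>> >>>>> >>>> bytes of headers; the target address is a placeholder):
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> $ ping -M do -s 1472 -c 5 <IP of a node at the other site>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> If that fails while smaller payloads go through, the tunnel is
> >>>> >>>>> >>>> dropping or fragmenting full-size frames.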
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> Zitat von Swogat Pradhan <swogatpradhan22 at gmail.com>:
> >>>> >>>>> >>>>
> >>>> >>>>> >>>> > Hi Eugen,
> >>>> >>>>> >>>> > Request you to please add my email either on 'to' or 'cc', as I am
> >>>> >>>>> >>>> > not getting emails from you.
> >>>> >>>>> >>>> > Coming to the issue:
> >>>> >>>>> >>>> >
> >>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]# rabbitmqctl list_policies -p /
> >>>> >>>>> >>>> > Listing policies for vhost "/" ...
> >>>> >>>>> >>>> > vhost   name    pattern         apply-to  definition                                                              priority
> >>>> >>>>> >>>> > /       ha-all  ^(?!amq\.).*    queues    {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"}  0
> >>>> >>>>> >>>> >
> >>>> >>>>> >>>> > I have the edge site compute nodes up; they only go down when I am
> >>>> >>>>> >>>> > trying to launch an instance, and the instance reaches the spawning
> >>>> >>>>> >>>> > state and then gets stuck.
> >>>> >>>>> >>>> >
> >>>> >>>>> >>>> > I have a tunnel setup between the central and the edge
> sites.
> >>>> >>>>> >>>> >
> >>>> >>>>> >>>> > With regards,
> >>>> >>>>> >>>> > Swogat Pradhan
> >>>> >>>>> >>>> >
> >>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11 PM Swogat Pradhan <
> >>>> >>>>> >>>> swogatpradhan22 at gmail.com>
> >>>> >>>>> >>>> > wrote:
> >>>> >>>>> >>>> >
> >>>> >>>>> >>>> >> Hi Eugen,
> >>>> >>>>> >>>> >> For some reason i am not getting your email to me
> directly, i am
> >>>> >>>>> >>>> checking
> >>>> >>>>> >>>> >> the email digest and there i am able to find your reply.
> >>>> >>>>> >>>> >> Here is the log for download:
> https://we.tl/t-L8FEkGZFSq
> >>>> >>>>> >>>> >> Yes, these logs are from the time when the issue
> occurred.
> >>>> >>>>> >>>> >>
> >>>> >>>>> >>>> >> *Note: i am able to create vm's and perform other
> activities in the
> >>>> >>>>> >>>> >> central site, only facing this issue in the edge site.*
> >>>> >>>>> >>>> >>
> >>>> >>>>> >>>> >> With regards,
> >>>> >>>>> >>>> >> Swogat Pradhan
> >>>> >>>>> >>>> >>
> >>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12 PM Swogat Pradhan <
> >>>> >>>>> >>>> swogatpradhan22 at gmail.com>
> >>>> >>>>> >>>> >> wrote:
> >>>> >>>>> >>>> >>
> >>>> >>>>> >>>> >>> Hi Eugen,
> >>>> >>>>> >>>> >>> Thanks for your response.
> >>>> >>>>> >>>> >>> I have actually a 4 controller setup so here are the
> details:
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> *PCS Status:*
> >>>> >>>>> >>>> >>>   * Container bundle set: rabbitmq-bundle [
> >>>> >>>>> >>>> >>>
> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest]:
> >>>> >>>>> >>>> >>>     * rabbitmq-bundle-0
> (ocf::heartbeat:rabbitmq-cluster):
> >>>> >>>>> >>>> Started
> >>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3
> >>>> >>>>> >>>> >>>     * rabbitmq-bundle-1
> (ocf::heartbeat:rabbitmq-cluster):
> >>>> >>>>> >>>> Started
> >>>> >>>>> >>>> >>> overcloud-controller-2
> >>>> >>>>> >>>> >>>     * rabbitmq-bundle-2
> (ocf::heartbeat:rabbitmq-cluster):
> >>>> >>>>> >>>> Started
> >>>> >>>>> >>>> >>> overcloud-controller-1
> >>>> >>>>> >>>> >>>     * rabbitmq-bundle-3
> (ocf::heartbeat:rabbitmq-cluster):
> >>>> >>>>> >>>> Started
> >>>> >>>>> >>>> >>> overcloud-controller-0
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> I have tried restarting the bundle multiple times but
> the issue is
> >>>> >>>>> >>>> still
> >>>> >>>>> >>>> >>> present.
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> *Cluster status:*
> >>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl
> cluster_status
> >>>> >>>>> >>>> >>> Cluster status of node
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com
> ...
> >>>> >>>>> >>>> >>> Basics
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Cluster name:
> rabbit at overcloud-controller-no-ceph-3.bdxworld.com
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Disk Nodes
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com
> >>>> >>>>> >>>> >>>
> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Running Nodes
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com
> >>>> >>>>> >>>> >>>
> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Versions
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-0.internalapi.bdxworld.com:
> RabbitMQ
> >>>> >>>>> >>>> 3.8.3
> >>>> >>>>> >>>> >>> on Erlang 22.3.4.1
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-1.internalapi.bdxworld.com:
> RabbitMQ
> >>>> >>>>> >>>> 3.8.3
> >>>> >>>>> >>>> >>> on Erlang 22.3.4.1
> >>>> >>>>> >>>> >>> rabbit at overcloud-controller-2.internalapi.bdxworld.com:
> RabbitMQ
> >>>> >>>>> >>>> 3.8.3
> >>>> >>>>> >>>> >>> on Erlang 22.3.4.1
> >>>> >>>>> >>>> >>>
> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com:
> >>>> >>>>> >>>> RabbitMQ
> >>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Alarms
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> (none)
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Network Partitions
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> (none)
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Listeners
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-0.internalapi.bdxworld.com,
> >>>> >>>>> >>>> interface:
> >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose:
> inter-node and CLI
> >>>> >>>>> >>>> tool
> >>>> >>>>> >>>> >>> communication
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-0.internalapi.bdxworld.com,
> >>>> >>>>> >>>> interface:
> >>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp, purpose:
> AMQP 0-9-1
> >>>> >>>>> >>>> >>> and AMQP 1.0
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-0.internalapi.bdxworld.com,
> >>>> >>>>> >>>> interface:
> >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-1.internalapi.bdxworld.com,
> >>>> >>>>> >>>> interface:
> >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose:
> inter-node and CLI
> >>>> >>>>> >>>> tool
> >>>> >>>>> >>>> >>> communication
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-1.internalapi.bdxworld.com,
> >>>> >>>>> >>>> interface:
> >>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp, purpose:
> AMQP 0-9-1
> >>>> >>>>> >>>> >>> and AMQP 1.0
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-1.internalapi.bdxworld.com,
> >>>> >>>>> >>>> interface:
> >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-2.internalapi.bdxworld.com,
> >>>> >>>>> >>>> interface:
> >>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering, purpose:
> inter-node and CLI
> >>>> >>>>> >>>> tool
> >>>> >>>>> >>>> >>> communication
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-2.internalapi.bdxworld.com,
> >>>> >>>>> >>>> interface:
> >>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp, purpose:
> AMQP 0-9-1
> >>>> >>>>> >>>> >>> and AMQP 1.0
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-2.internalapi.bdxworld.com,
> >>>> >>>>> >>>> interface:
> >>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP API
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
> >>>> >>>>> >>>> ,
> >>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol: clustering,
> purpose:
> >>>> >>>>> >>>> inter-node and
> >>>> >>>>> >>>> >>> CLI tool communication
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
> >>>> >>>>> >>>> ,
> >>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol: amqp,
> purpose: AMQP
> >>>> >>>>> >>>> 0-9-1
> >>>> >>>>> >>>> >>> and AMQP 1.0
> >>>> >>>>> >>>> >>> Node:
> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
> >>>> >>>>> >>>> ,
> >>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http, purpose:
> HTTP API
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Feature flags
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled
> >>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled
> >>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled
> >>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled
> >>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> *Logs:*
> >>>> >>>>> >>>> >>> *(Attached)*
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> With regards,
> >>>> >>>>> >>>> >>> Swogat Pradhan
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34 PM Swogat Pradhan <
> >>>> >>>>> >>>> swogatpradhan22 at gmail.com>
> >>>> >>>>> >>>> >>> wrote:
> >>>> >>>>> >>>> >>>
> >>>> >>>>> >>>> >>>> Hi,
> >>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova api
> log.
> >>>> >>>>> >>>> >>>>
> >>>> >>>>> >>>> >>>> nova-conuctor:
> >>>> >>>>> >>>> >>>>
> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING
> >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
> >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist,
> drop reply to
> >>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b
> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING
> >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -]
> >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist,
> drop reply to
> >>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa
> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING
> >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -]
> >>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't exist,
> drop reply to
> >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43:
> >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
> >>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR
> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - - - -]
> The reply
> >>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send after
> 60 seconds
> >>>> >>>>> >>>> due to a
> >>>> >>>>> >>>> >>>> missing queue
> (reply_276049ec36a84486a8a406911d9802f4).
> >>>> >>>>> >>>> Abandoning...:
> >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
> >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING
> >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
> >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist,
> drop reply to
> >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566:
> >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
> >>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR
> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
> The reply
> >>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send after
> 60 seconds
> >>>> >>>>> >>>> due to a
> >>>> >>>>> >>>> >>>> missing queue
> (reply_349bcb075f8c49329435a0f884b33066).
> >>>> >>>>> >>>> Abandoning...:
> >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
> >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING
> >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
> >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist,
> drop reply to
> >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f:
> >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
> >>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR
> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
> The reply
> >>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send after
> 60 seconds
> >>>> >>>>> >>>> due to a
> >>>> >>>>> >>>> >>>> missing queue
> (reply_349bcb075f8c49329435a0f884b33066).
> >>>> >>>>> >>>> Abandoning...:
> >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
> >>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING nova.cache_utils
> >>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
> >>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db
> >>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default default]
> Cache enabled
> >>>> >>>>> >>>> with
> >>>> >>>>> >>>> >>>> backend dogpile.cache.null.
> >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING
> >>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
> >>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't exist,
> drop reply to
> >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb:
> >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
> >>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR
> oslo_messaging._drivers.amqpdriver
> >>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - - - -]
> The reply
> >>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send after
> 60 seconds
> >>>> >>>>> >>>> due to a
> >>>> >>>>> >>>> >>>> missing queue
> (reply_349bcb075f8c49329435a0f884b33066).
> >>>> >>>>> >>>> Abandoning...:
> >>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
> >>>> >>>>> >>>> >>>>
> >>>> >>>>> >>>> >>>> With regards,
> >>>> >>>>> >>>> >>>> Swogat Pradhan
> >>>> >>>>> >>>> >>>>
> >>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan <
> >>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote:
> >>>> >>>>> >>>> >>>>
> >>>> >>>>> >>>> >>>>> Hi,
> >>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1 where
> i am trying to
> >>>> >>>>> >>>> >>>>> launch vm's.
> >>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes down
> (openstack
> >>>> >>>>> >>>> compute
> >>>> >>>>> >>>> >>>>> service list), the node comes backup when i restart
> the nova
> >>>> >>>>> >>>> compute
> >>>> >>>>> >>>> >>>>> service but then the launch of the vm fails.
> >>>> >>>>> >>>> >>>>>
> >>>> >>>>> >>>> >>>>> nova-compute.log
> >>>> >>>>> >>>> >>>>>
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO nova.compute.manager
> >>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - - - - -]
> Running
> >>>> >>>>> >>>> >>>>> instance usage
> >>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from
> 2023-02-26 07:00:00
> >>>> >>>>> >>>> to
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances.
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO nova.compute.claims
> >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
> >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
> >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default]
> [instance:
> >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim
> successful on node
> >>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO
> nova.virt.libvirt.driver
> >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
> >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
> >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default]
> [instance:
> >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring
> supplied device
> >>>> >>>>> >>>> name:
> >>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied dev
> names
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO nova.virt.block_device
> >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
> >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
> >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default]
> [instance:
> >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting with
> volume
> >>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at /dev/vda
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING nova.cache_utils
> >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
> >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
> >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default]
> Cache enabled
> >>>> >>>>> >>>> with
> >>>> >>>>> >>>> >>>>> backend dogpile.cache.null.
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO oslo.privsep.daemon
> >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
> >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
> >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default]
> Running
> >>>> >>>>> >>>> >>>>> privsep helper:
> >>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf',
> >>>> >>>>> >>>> 'privsep-helper',
> >>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf',
> '--config-file',
> >>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf', '--privsep_context',
> >>>> >>>>> >>>> >>>>> 'os_brick.privileged.default', '--privsep_sock_path',
> >>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock']
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO oslo.privsep.daemon
> >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
> >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
> >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default]
> Spawned new
> >>>> >>>>> >>>> privsep
> >>>> >>>>> >>>> >>>>> daemon via rootwrap
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO
> oslo.privsep.daemon [-] privsep
> >>>> >>>>> >>>> >>>>> daemon starting
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO
> oslo.privsep.daemon [-] privsep
> >>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO
> oslo.privsep.daemon [-] privsep
> >>>> >>>>> >>>> >>>>> process running with capabilities (eff/prm/inh):
> >>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO
> oslo.privsep.daemon [-] privsep
> >>>> >>>>> >>>> >>>>> daemon running as pid 2647
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING
> >>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof
> >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
> >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
> >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default]
> Process
> >>>> >>>>> >>>> >>>>> execution error
> >>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while running
> command.
> >>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value
> >>>> >>>>> >>>> >>>>> Exit code: 2
> >>>> >>>>> >>>> >>>>> Stdout: ''
> >>>> >>>>> >>>> >>>>> Stderr: '':
> oslo_concurrency.processutils.ProcessExecutionError:
> >>>> >>>>> >>>> >>>>> Unexpected error while running command.
> >>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO
> nova.virt.libvirt.driver
> >>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
> >>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
> >>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default default]
> [instance:
> >>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating image
> >>>> >>>>> >>>> >>>>>
> >>>> >>>>> >>>> >>>>> Is there a way to solve this issue?
> >>>> >>>>> >>>> >>>>>
> >>>> >>>>> >>>> >>>>>
> >>>> >>>>> >>>> >>>>> With regards,
> >>>> >>>>> >>>> >>>>>
> >>>> >>>>> >>>> >>>>> Swogat Pradhan
> >>>> >>>>> >>>> >>>>>
> >>>> >>>>> >>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>> >>>>
> >>>> >>>>>
> >>>>
>
>