DCN compute service goes down when an instance is scheduled to launch | wallaby | tripleo

Alan Bishop abishop at redhat.com
Thu Mar 23 17:05:30 UTC 2023


On Thu, Mar 23, 2023 at 9:01 AM Swogat Pradhan <swogatpradhan22 at gmail.com>
wrote:

> Hi,
> Can someone please help me identify the issue here?
> Latest cinder-volume logs from dcn02:
> (ATTACHED)
>

It's really not possible to analyze what's happening with just one or two
log entries. Do you have
debug logs enabled? One thing I noticed is the glance image's disk_format
is qcow2. You should
use "raw" images with ceph RBD.
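
For example, converting and re-uploading the image as raw might look like this
(a minimal sketch; file and image names are placeholders):

  qemu-img convert -f qcow2 -O raw cirros.qcow2 cirros.raw
  openstack image create --disk-format raw --container-format bare \
    --file cirros.raw cirros-raw

With raw images on RBD, cinder can COW-clone the glance image instead of
downloading and converting it. Debug logs can usually be enabled by
redeploying with Debug: true (or the service-specific CinderDebug: true) in an
environment file.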

Alan


>
> The volume is stuck in creating state.
>
> With regards,
> Swogat Pradhan
>
> On Thu, Mar 23, 2023 at 6:12 PM Swogat Pradhan <swogatpradhan22 at gmail.com>
> wrote:
>
>> Hi John,
>> Thank you for clarifying that.
>> Right now the cinder volume is stuck in *creating* state when adding an
>> image as the volume source.
>> But when creating an empty volume, the volumes are created
>> successfully without any errors.
>>
>> We are getting the volume creation request in cinder-volume.log as such:
>> 2023-03-23 12:34:40.152 108 INFO
>> cinder.volume.flows.manager.create_volume
>> [req-18556796-a61c-4097-8fa8-b136ce9814f7 b240e3e89d99489284cd731e75f2a5db
>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume
>> 872a2ae6-c75b-4fc0-8172-17a29d07a66c: being created as image with
>> specification: {'status': 'creating', 'volume_name':
>> 'volume-872a2ae6-c75b-4fc0-8172-17a29d07a66c', 'volume_size': 1,
>> 'image_id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'image_location':
>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap',
>> [{'url':
>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap',
>> 'metadata': {'store': 'ceph'}}, {'url':
>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap',
>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros',
>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public',
>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active',
>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False,
>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0',
>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value':
>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46',
>> 'id': '131ed4e0-0474-45be-b74a-43b599a7d6c5', 'created_at':
>> datetime.datetime(2023, 3, 23, 11, 41, 51, tzinfo=datetime.timezone.utc),
>> 'updated_at': datetime.datetime(2023, 3, 23, 11, 46, 37,
>> tzinfo=datetime.timezone.utc), 'locations': [{'url':
>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap',
>> 'metadata': {'store': 'ceph'}}, {'url':
>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap',
>> 'metadata': {'store': 'dcn02'}}], 'direct_url':
>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/snap',
>> 'tags': [], 'file': '/v2/images/131ed4e0-0474-45be-b74a-43b599a7d6c5/file',
>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '',
>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '',
>> 'owner_specified.openstack.object': 'images/cirros',
>> 'owner_specified.openstack.sha256': ''}}, 'image_service':
>> <cinder.image.glance.GlanceImageService object at 0x7f98d869ed68>}
>>
>> But there is nothing else after that, and the volume doesn't even time out;
>> it just stays stuck in the creating state.
>> Can you advise what might be the issue here?
>> All the containers are in a healthy state now.
>>
>> With regards,
>> Swogat Pradhan
>>
>>
>> On Thu, Mar 23, 2023 at 6:06 PM Alan Bishop <abishop at redhat.com> wrote:
>>
>>>
>>>
>>> On Thu, Mar 23, 2023 at 5:20 AM Swogat Pradhan <
>>> swogatpradhan22 at gmail.com> wrote:
>>>
>>>> Hi,
>>>> Is this bind not required for cinder_scheduler container?
>>>>
>>>> "/var/lib/tripleo-config/ceph:/var/lib/kolla/config_files/src-ceph:ro,rprivate,rbind",
>>>> I do not see this particular bind on the cinder scheduler containers on
>>>> my controller nodes.
>>>>
>>>
>>> That is correct, because the scheduler does not access the ceph cluster.
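>>>
>>> If you want to double-check which binds a given container has, something
>>> like this should work on the node (container name as deployed by tripleo):
>>>
>>>   sudo podman inspect cinder_volume --format '{{ json .HostConfig.Binds }}'
>>>
>>> The cinder_volume container should show the src-ceph bind, while
>>> cinder_scheduler will not.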
>>>
>>> Alan
>>>
>>>
>>>> With regards,
>>>> Swogat Pradhan
>>>>
>>>> On Thu, Mar 23, 2023 at 2:46 AM Swogat Pradhan <
>>>> swogatpradhan22 at gmail.com> wrote:
>>>>
>>>>> Cinder volume config:
>>>>>
>>>>> [tripleo_ceph]
>>>>> volume_backend_name=tripleo_ceph
>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver
>>>>> rbd_user=openstack
>>>>> rbd_pool=volumes
>>>>> rbd_flatten_volume_from_snapshot=False
>>>>> rbd_secret_uuid=a8d5f1f5-48e7-5ede-89ab-8aca59b6397b
>>>>> report_discard_supported=True
>>>>> rbd_ceph_conf=/etc/ceph/dcn02.conf
>>>>> rbd_cluster_name=dcn02
>>>>>
>>>>> Glance api config:
>>>>>
>>>>> [dcn02]
>>>>> rbd_store_ceph_conf=/etc/ceph/dcn02.conf
>>>>> rbd_store_user=openstack
>>>>> rbd_store_pool=images
>>>>> rbd_thin_provisioning=False
>>>>> store_description=dcn02 rbd glance store
>>>>> [ceph]
>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf
>>>>> rbd_store_user=openstack
>>>>> rbd_store_pool=images
>>>>> rbd_thin_provisioning=False
>>>>> store_description=Default glance store backend.
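>>>>>
>>>>> (One way to sanity-check that this config reaches the local dcn02 cluster,
>>>>> assuming the rbd CLI is available in the cinder_volume container, is
>>>>> something like:
>>>>>
>>>>>   sudo podman exec cinder_volume rbd --cluster dcn02 --id openstack -p volumes ls -l
>>>>>
>>>>> which should list the same volumes the dcn02 ceph cluster itself reports.)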
>>>>>
>>>>> On Thu, Mar 23, 2023 at 2:29 AM Swogat Pradhan <
>>>>> swogatpradhan22 at gmail.com> wrote:
>>>>>
>>>>>> I still have the same issue, and I'm not sure what's left to try.
>>>>>> All the pods are now in a healthy state. When I try to create a volume
>>>>>> from an image, the log entries only appear in cinder-volume about 3
>>>>>> minutes after I hit the create volume button.
>>>>>> And the volumes have now been stuck in creating state for more than 20
>>>>>> minutes.
>>>>>>
>>>>>> Cinder logs:
>>>>>> 2023-03-22 20:32:44.010 108 INFO cinder.rpc
>>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db
>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Automatically selected
>>>>>> cinder-volume RPC version 3.17 as minimum service version.
>>>>>> 2023-03-22 20:34:59.166 108 INFO
>>>>>> cinder.volume.flows.manager.create_volume
>>>>>> [req-0d2093a0-efbd-45a5-bd7d-cce25ddc200e b240e3e89d99489284cd731e75f2a5db
>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume
>>>>>> 5743a879-090d-46db-bc7c-1c0b0669a112: being created as image with
>>>>>> specification: {'status': 'creating', 'volume_name':
>>>>>> 'volume-5743a879-090d-46db-bc7c-1c0b0669a112', 'volume_size': 2,
>>>>>> 'image_id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'image_location':
>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap',
>>>>>> [{'url':
>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap',
>>>>>> 'metadata': {'store': 'ceph'}}, {'url':
>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap',
>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros',
>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public',
>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active',
>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False,
>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0',
>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value':
>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46',
>>>>>> 'id': 'acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b', 'created_at':
>>>>>> datetime.datetime(2023, 3, 22, 18, 50, 5, tzinfo=datetime.timezone.utc),
>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 20, 3, 54,
>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url':
>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap',
>>>>>> 'metadata': {'store': 'ceph'}}, {'url':
>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap',
>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url':
>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/snap',
>>>>>> 'tags': [], 'file': '/v2/images/acfd0a14-69e0-44d6-a6a1-aa9dc83e9d5b/file',
>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '',
>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '',
>>>>>> 'owner_specified.openstack.object': 'images/cirros',
>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service':
>>>>>> <cinder.image.glance.GlanceImageService object at 0x7f8147973438>}
>>>>>>
>>>>>> With regards,
>>>>>> Swogat Pradhan
>>>>>>
>>>>>> On Wed, Mar 22, 2023 at 9:19 PM Alan Bishop <abishop at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 22, 2023 at 8:38 AM Swogat Pradhan <
>>>>>>> swogatpradhan22 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Adam,
>>>>>>>> The systems are on the same LAN. In this case it seemed like the image
>>>>>>>> was getting pulled from the central site, which was caused by a
>>>>>>>> misconfiguration in the ceph.conf file in the /var/lib/tripleo-config/ceph/
>>>>>>>> directory; that seems to have been resolved after the changes I made to
>>>>>>>> fix it.
>>>>>>>>
>>>>>>>> Right now the glance api podman container is running in an unhealthy state,
>>>>>>>> the podman logs don't show any error whatsoever, and when I issue the command
>>>>>>>> netstat -nultp I do not see any entry for the glance port (9292) on the dcn
>>>>>>>> site, which is why cinder is throwing an error stating:
>>>>>>>>
>>>>>>>> 2023-03-22 13:32:29.786 108 ERROR oslo_messaging.rpc.server
>>>>>>>> cinder.exception.GlanceConnectionFailed: Connection to glance failed: Error
>>>>>>>> finding address for
>>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538:
>>>>>>>> Unable to establish connection to
>>>>>>>> http://172.25.228.253:9292/v2/images/736d8779-07cd-4510-bab2-adcb653cc538:
>>>>>>>> HTTPConnectionPool(host='172.25.228.253', port=9292): Max retries exceeded
>>>>>>>> with url: /v2/images/736d8779-07cd-4510-bab2-adcb653cc538 (Caused by
>>>>>>>> NewConnectionError('<urllib3.connection.HTTPConnection object at
>>>>>>>> 0x7f7682d2cd30>: Failed to establish a new connection: [Errno 111]
>>>>>>>> ECONNREFUSED',))
>>>>>>>>
>>>>>>>> Now I need to find out why the port is not listening even though the glance
>>>>>>>> service is running, but I am not sure how to determine that.
>>>>>>>>
>>>>>>>
>>>>>>> One other thing to investigate is whether your deployment includes
>>>>>>> this patch [1]. If it does, then bear in mind
>>>>>>> the glance-api service running at the edge site will be an
>>>>>>> "internal" (non public facing) instance that uses port 9293
>>>>>>> instead of 9292. You should familiarize yourself with the release
>>>>>>> note [2].
>>>>>>>
>>>>>>> [1]
>>>>>>> https://opendev.org/openstack/tripleo-heat-templates/commit/3605d45e417a77a1d0f153fbeffcbb283ec85fe6
>>>>>>> [2]
>>>>>>> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/releasenotes/notes/glance-internal-service-86274f56712ffaac.yaml
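>>>>>>>
>>>>>>> If that patch is in your deployment, a quick check on the edge
>>>>>>> controller would be something like:
>>>>>>>
>>>>>>>   sudo ss -tlnp | grep -E ':9292|:9293'
>>>>>>>
>>>>>>> to see whether an internal glance-api instance is listening on 9293
>>>>>>> rather than 9292.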
>>>>>>>
>>>>>>> Alan
>>>>>>>
>>>>>>>
>>>>>>>> With regards,
>>>>>>>> Swogat Pradhan
>>>>>>>>
>>>>>>>> On Wed, Mar 22, 2023 at 8:11 PM Alan Bishop <abishop at redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 22, 2023 at 6:37 AM Swogat Pradhan <
>>>>>>>>> swogatpradhan22 at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Update:
>>>>>>>>>> Here is the log when creating a volume using cirros image:
>>>>>>>>>>
>>>>>>>>>> 2023-03-22 11:04:38.449 109 INFO
>>>>>>>>>> cinder.volume.flows.manager.create_volume
>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume
>>>>>>>>>> bf341343-6609-4b8c-b9e0-93e2a89c8c8f: being created as image with
>>>>>>>>>> specification: {'status': 'creating', 'volume_name':
>>>>>>>>>> 'volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f', 'volume_size': 4,
>>>>>>>>>> 'image_id': '736d8779-07cd-4510-bab2-adcb653cc538', 'image_location':
>>>>>>>>>> ('rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
>>>>>>>>>> [{'url':
>>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
>>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url':
>>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
>>>>>>>>>> 'metadata': {'store': 'dcn02'}}]), 'image_meta': {'name': 'cirros',
>>>>>>>>>> 'disk_format': 'qcow2', 'container_format': 'bare', 'visibility': 'public',
>>>>>>>>>> 'size': 16338944, 'virtual_size': 117440512, 'status': 'active',
>>>>>>>>>> 'checksum': '1d3062cd89af34e419f7100277f38b2b', 'protected': False,
>>>>>>>>>> 'min_ram': 0, 'min_disk': 0, 'owner': '4160ce999a31485fa643aed0936dfef0',
>>>>>>>>>> 'os_hidden': False, 'os_hash_algo': 'sha512', 'os_hash_value':
>>>>>>>>>> '553d220ed58cfee7dafe003c446a9f197ab5edf8ffc09396c74187cf83873c877e7ae041cb80f3b91489acf687183adcd689b53b38e3ddd22e627e7f98a09c46',
>>>>>>>>>> 'id': '736d8779-07cd-4510-bab2-adcb653cc538', 'created_at':
>>>>>>>>>> datetime.datetime(2023, 3, 22, 10, 44, 12, tzinfo=datetime.timezone.utc),
>>>>>>>>>> 'updated_at': datetime.datetime(2023, 3, 22, 10, 54, 1,
>>>>>>>>>> tzinfo=datetime.timezone.utc), 'locations': [{'url':
>>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
>>>>>>>>>> 'metadata': {'store': 'ceph'}}, {'url':
>>>>>>>>>> 'rbd://a8d5f1f5-48e7-5ede-89ab-8aca59b6397b/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
>>>>>>>>>> 'metadata': {'store': 'dcn02'}}], 'direct_url':
>>>>>>>>>> 'rbd://a5ae877c-bcba-53fe-8336-450e63014757/images/736d8779-07cd-4510-bab2-adcb653cc538/snap',
>>>>>>>>>> 'tags': [], 'file': '/v2/images/736d8779-07cd-4510-bab2-adcb653cc538/file',
>>>>>>>>>> 'stores': 'ceph,dcn02', 'properties': {'os_glance_failed_import': '',
>>>>>>>>>> 'os_glance_importing_to_stores': '', 'owner_specified.openstack.md5': '',
>>>>>>>>>> 'owner_specified.openstack.object': 'images/cirros',
>>>>>>>>>> 'owner_specified.openstack.sha256': ''}}, 'image_service':
>>>>>>>>>> <cinder.image.glance.GlanceImageService object at 0x7f449ded1198>}
>>>>>>>>>> 2023-03-22 11:06:16.570 109 INFO cinder.image.image_utils
>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Image download 15.58 MB at 0.16 MB/s
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As Adam Savage would say, well there's your problem ^^ (Image
>>>>>>>>> download 15.58 MB at 0.16 MB/s). Downloading the image takes too long, and
>>>>>>>>> 0.16 MB/s suggests you have a network issue.
>>>>>>>>>
>>>>>>>>> John Fulton previously stated your cinder-volume service at the
>>>>>>>>> edge site is not using the local ceph image store. Assuming you are
>>>>>>>>> deploying GlanceApiEdge service [1], then the cinder-volume service should
>>>>>>>>> be configured to use the local glance service [2]. You should check
>>>>>>>>> cinder's glance_api_servers to confirm it's the edge site's glance service.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/environments/dcn.yaml#L29
>>>>>>>>> [2]
>>>>>>>>> https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/deployment/glance/glance-api-edge-container-puppet.yaml#L80
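>>>>>>>>>
>>>>>>>>> One way to check that (path is the usual tripleo location, adjust if
>>>>>>>>> needed):
>>>>>>>>>
>>>>>>>>>   sudo grep glance_api_servers \
>>>>>>>>>     /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf
>>>>>>>>>
>>>>>>>>> It should point at the dcn02 glance endpoint rather than the central one.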
>>>>>>>>>
>>>>>>>>> Alan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 2023-03-22 11:07:54.023 109 WARNING py.warnings
>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -]
>>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75:
>>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will
>>>>>>>>>> be removed. Use explicitly json instead in version 'xena'
>>>>>>>>>>   category=FutureWarning)
>>>>>>>>>>
>>>>>>>>>> 2023-03-22 11:11:12.161 109 WARNING py.warnings
>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -]
>>>>>>>>>> /usr/lib/python3.6/site-packages/oslo_utils/imageutils.py:75:
>>>>>>>>>> FutureWarning: The human format is deprecated and the format parameter will
>>>>>>>>>> be removed. Use explicitly json instead in version 'xena'
>>>>>>>>>>   category=FutureWarning)
>>>>>>>>>>
>>>>>>>>>> 2023-03-22 11:11:12.163 109 INFO cinder.image.image_utils
>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Converted 112.00 MB image at 112.00
>>>>>>>>>> MB/s
>>>>>>>>>> 2023-03-22 11:11:14.998 109 INFO
>>>>>>>>>> cinder.volume.flows.manager.create_volume
>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Volume
>>>>>>>>>> volume-bf341343-6609-4b8c-b9e0-93e2a89c8c8f
>>>>>>>>>> (bf341343-6609-4b8c-b9e0-93e2a89c8c8f): created successfully
>>>>>>>>>> 2023-03-22 11:11:15.195 109 INFO cinder.volume.manager
>>>>>>>>>> [req-646b9ac8-a5a7-45ac-a96d-8dd6bb45da17 b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - - -] Created volume successfully.
>>>>>>>>>>
>>>>>>>>>> The image is present in the dcn02 store, but it still downloaded the
>>>>>>>>>> image at 0.16 MB/s and then created the volume.
>>>>>>>>>>
>>>>>>>>>> With regards,
>>>>>>>>>> Swogat Pradhan
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 21, 2023 at 6:10 PM Swogat Pradhan <
>>>>>>>>>> swogatpradhan22 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi John,
>>>>>>>>>>> This seems to be an issue.
>>>>>>>>>>> When I deployed the dcn ceph in both dcn01 and dcn02, the
>>>>>>>>>>> --cluster parameter was set to the respective cluster names, but the
>>>>>>>>>>> config files were still created as ceph.conf and the keyring as
>>>>>>>>>>> ceph.client.openstack.keyring.
>>>>>>>>>>>
>>>>>>>>>>> That created issues in glance as well, since the naming convention
>>>>>>>>>>> of the files didn't match the cluster names, so I had to manually rename
>>>>>>>>>>> the central ceph conf file as follows:
>>>>>>>>>>>
>>>>>>>>>>> [root at dcn02-compute-0 ~]# cd /var/lib/tripleo-config/ceph/
>>>>>>>>>>> [root at dcn02-compute-0 ceph]# ll
>>>>>>>>>>> total 16
>>>>>>>>>>> -rw-------. 1 root root 257 Mar 13 13:56
>>>>>>>>>>> ceph_central.client.openstack.keyring
>>>>>>>>>>> -rw-r--r--. 1 root root 428 Mar 13 13:56 ceph_central.conf
>>>>>>>>>>> -rw-------. 1 root root 205 Mar 15 18:45
>>>>>>>>>>> ceph.client.openstack.keyring
>>>>>>>>>>> -rw-r--r--. 1 root root 362 Mar 15 18:45 ceph.conf
>>>>>>>>>>> [root at dcn02-compute-0 ceph]#
>>>>>>>>>>>
>>>>>>>>>>> ceph.conf and ceph.client.openstack.keyring contain the fsid of
>>>>>>>>>>> the respective clusters in both dcn01 and dcn02.
>>>>>>>>>>> In the above CLI output, the ceph.conf and ceph.client... files are
>>>>>>>>>>> used to access the dcn02 ceph cluster, and the ceph_central* files are
>>>>>>>>>>> used for accessing the central ceph cluster.
>>>>>>>>>>>
>>>>>>>>>>> glance multistore config:
>>>>>>>>>>> [dcn02]
>>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph.conf
>>>>>>>>>>> rbd_store_user=openstack
>>>>>>>>>>> rbd_store_pool=images
>>>>>>>>>>> rbd_thin_provisioning=False
>>>>>>>>>>> store_description=dcn02 rbd glance store
>>>>>>>>>>>
>>>>>>>>>>> [ceph_central]
>>>>>>>>>>> rbd_store_ceph_conf=/etc/ceph/ceph_central.conf
>>>>>>>>>>> rbd_store_user=openstack
>>>>>>>>>>> rbd_store_pool=images
>>>>>>>>>>> rbd_thin_provisioning=False
>>>>>>>>>>> store_description=Default glance store backend.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> With regards,
>>>>>>>>>>> Swogat Pradhan
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 21, 2023 at 5:52 PM John Fulton <johfulto at redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 21, 2023 at 8:03 AM Swogat Pradhan
>>>>>>>>>>>> <swogatpradhan22 at gmail.com> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi,
>>>>>>>>>>>> > Seems like cinder is not using the local ceph.
>>>>>>>>>>>>
>>>>>>>>>>>> That explains the issue. It's a misconfiguration.
>>>>>>>>>>>>
>>>>>>>>>>>> I hope this is not a production system since the mailing list
>>>>>>>>>>>> now has
>>>>>>>>>>>> the cinder.conf which contains passwords.
>>>>>>>>>>>>
>>>>>>>>>>>> The section that looks like this:
>>>>>>>>>>>>
>>>>>>>>>>>> [tripleo_ceph]
>>>>>>>>>>>> volume_backend_name=tripleo_ceph
>>>>>>>>>>>> volume_driver=cinder.volume.drivers.rbd.RBDDriver
>>>>>>>>>>>> rbd_ceph_conf=/etc/ceph/ceph.conf
>>>>>>>>>>>> rbd_user=openstack
>>>>>>>>>>>> rbd_pool=volumes
>>>>>>>>>>>> rbd_flatten_volume_from_snapshot=False
>>>>>>>>>>>> rbd_secret_uuid=<redacted>
>>>>>>>>>>>> report_discard_supported=True
>>>>>>>>>>>>
>>>>>>>>>>>> Should be updated to refer to the local DCN ceph cluster and
>>>>>>>>>>>> not the
>>>>>>>>>>>> central one. Use the ceph conf file for that cluster and ensure
>>>>>>>>>>>> the
>>>>>>>>>>>> rbd_secret_uuid corresponds to that one.
>>>>>>>>>>>>
>>>>>>>>>>>> TripleO’s convention is to set the rbd_secret_uuid to the FSID
>>>>>>>>>>>> of the
>>>>>>>>>>>> Ceph cluster. The FSID should be in the ceph.conf file. The
>>>>>>>>>>>> tripleo_nova_libvirt role will use virsh secret-* commands so
>>>>>>>>>>>> that
>>>>>>>>>>>> libvirt can retrieve the cephx secret using the FSID as a key.
>>>>>>>>>>>> This
>>>>>>>>>>>> can be confirmed with `podman exec nova_virtsecretd virsh
>>>>>>>>>>>> secret-get-value $FSID`.
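>>>>>>>>>>>>
>>>>>>>>>>>> For example (assuming the DCN cluster's conf file is
>>>>>>>>>>>> /etc/ceph/dcn02.conf, or the copy under /var/lib/tripleo-config/ceph/):
>>>>>>>>>>>>
>>>>>>>>>>>>   sudo grep fsid /etc/ceph/dcn02.conf
>>>>>>>>>>>>   sudo podman exec nova_virtsecretd virsh secret-get-value <that fsid>
>>>>>>>>>>>>
>>>>>>>>>>>> The returned value should be the cephx key for the local cluster's
>>>>>>>>>>>> openstack user, and rbd_secret_uuid in cinder.conf should match that
>>>>>>>>>>>> fsid.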
>>>>>>>>>>>>
>>>>>>>>>>>> The documentation describes how to configure the central and
>>>>>>>>>>>> DCN sites
>>>>>>>>>>>> correctly but an error seems to have occurred while you were
>>>>>>>>>>>> following
>>>>>>>>>>>> it.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html
>>>>>>>>>>>>
>>>>>>>>>>>>   John
>>>>>>>>>>>>
>>>>>>>>>>>> >
>>>>>>>>>>>> > Ceph Output:
>>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p images ls -l
>>>>>>>>>>>> > NAME                                       SIZE     PARENT  FMT  PROT  LOCK
>>>>>>>>>>>> > 2abfafaa-eff4-4c2e-a538-dc2e1249ab65         8 MiB            2        excl
>>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19        16 MiB            2
>>>>>>>>>>>> > 55f40c8a-8f79-48c5-a52a-9b679b762f19 at snap   16 MiB            2  yes
>>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d       321 MiB            2
>>>>>>>>>>>> > 59f6a9cd-721c-45b5-a15f-fd021b08160d at snap  321 MiB            2  yes
>>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0       386 MiB            2
>>>>>>>>>>>> > 5f5ddd77-35f3-45e8-9dd3-8c1cbb1f39f0 at snap  386 MiB            2  yes
>>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a        15 GiB            2
>>>>>>>>>>>> > 9b27248e-a8cf-4f00-a039-d3e3066cd26a at snap   15 GiB            2  yes
>>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b        15 GiB            2
>>>>>>>>>>>> > b7356adc-bb47-4c05-968b-6d3c9ca0079b at snap   15 GiB            2  yes
>>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3        15 GiB            2
>>>>>>>>>>>> > e77e78ad-d369-4a1d-b758-8113621269a3 at snap   15 GiB            2  yes
>>>>>>>>>>>> >
>>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]# rbd -p volumes ls -l
>>>>>>>>>>>> > NAME                                         SIZE     PARENT  FMT  PROT  LOCK
>>>>>>>>>>>> > volume-c644086f-d3cf-406d-b0f1-7691bde5981d  100 GiB           2
>>>>>>>>>>>> > volume-f0969935-a742-4744-9375-80bf323e4d63   10 GiB           2
>>>>>>>>>>>> > [ceph: root at dcn02-ceph-all-0 /]#
>>>>>>>>>>>> >
>>>>>>>>>>>> > Attached the cinder config.
>>>>>>>>>>>> > Please let me know how I can solve this issue.
>>>>>>>>>>>> >
>>>>>>>>>>>> > With regards,
>>>>>>>>>>>> > Swogat Pradhan
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Tue, Mar 21, 2023 at 3:53 PM John Fulton <
>>>>>>>>>>>> johfulto at redhat.com> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> in my last message under the line "On a DCN site if you run
>>>>>>>>>>>> a command like this:" I suggested some steps you could try to confirm the
>>>>>>>>>>>> image is a COW from the local glance as well as how to look at your cinder
>>>>>>>>>>>> config.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Tue, Mar 21, 2023, 12:06 AM Swogat Pradhan <
>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Update:
>>>>>>>>>>>> >>> I uploaded an image directly to the dcn02 store, and it
>>>>>>>>>>>> takes around 10-15 minutes to create a volume from an image in dcn02.
>>>>>>>>>>>> >>> The image size is 389 MB.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On Mon, Mar 20, 2023 at 10:26 PM Swogat Pradhan <
>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote:
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Hi John,
>>>>>>>>>>>> >>>> I checked the ceph on dcn02, and I can see the images
>>>>>>>>>>>> created after importing from the central site.
>>>>>>>>>>>> >>>> But launching an instance normally fails, as it takes a
>>>>>>>>>>>> long time for the volume to get created.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> When launching an instance from volume the instance is
>>>>>>>>>>>> getting created properly without any errors.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> I tried to cache images in nova using
>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html
>>>>>>>>>>>> but I am getting a checksum failed error.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> With regards,
>>>>>>>>>>>> >>>> Swogat Pradhan
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> On Thu, Mar 16, 2023 at 5:24 PM John Fulton <
>>>>>>>>>>>> johfulto at redhat.com> wrote:
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> On Wed, Mar 15, 2023 at 8:05 PM Swogat Pradhan
>>>>>>>>>>>> >>>>> <swogatpradhan22 at gmail.com> wrote:
>>>>>>>>>>>> >>>>> >
>>>>>>>>>>>> >>>>> > Update: After restarting the nova services on the
>>>>>>>>>>>> controller and running the deploy script on the edge site, I was able to
>>>>>>>>>>>> launch the VM from a volume.
>>>>>>>>>>>> >>>>> >
>>>>>>>>>>>> >>>>> > Right now the instance creation is failing because the block
>>>>>>>>>>>> device creation is stuck in the creating state; it is taking more than 10 mins
>>>>>>>>>>>> for the volume to be created, even though the image has already been imported
>>>>>>>>>>>> to the edge glance.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Try following this document and making the same
>>>>>>>>>>>> observations in your
>>>>>>>>>>>> >>>>> environment for AZs and their local ceph cluster.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html#confirm-images-may-be-copied-between-sites
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> On a DCN site if you run a command like this:
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> $ sudo cephadm shell --config /etc/ceph/dcn0.conf
>>>>>>>>>>>> --keyring
>>>>>>>>>>>> >>>>> /etc/ceph/dcn0.client.admin.keyring
>>>>>>>>>>>> >>>>> $ rbd --cluster dcn0 -p volumes ls -l
>>>>>>>>>>>> >>>>> NAME                                      SIZE  PARENT
>>>>>>>>>>>> >>>>>                           FMT PROT LOCK
>>>>>>>>>>>> >>>>> volume-28c6fc32-047b-4306-ad2d-de2be02716b7 8 GiB
>>>>>>>>>>>> >>>>> images/8083c7e7-32d8-4f7a-b1da-0ed7884f1076 at snap   2
>>>>>>>>>>>>   excl
>>>>>>>>>>>> >>>>> $
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Then, you should see the parent of the volume is the
>>>>>>>>>>>> image which is on
>>>>>>>>>>>> >>>>> the same local ceph cluster.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> I wonder if something is misconfigured and thus you're
>>>>>>>>>>>> encountering
>>>>>>>>>>>> >>>>> the streaming behavior described here:
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Ideally all images should reside in the central Glance
>>>>>>>>>>>> and be copied
>>>>>>>>>>>> >>>>> to DCN sites before instances of those images are booted
>>>>>>>>>>>> on DCN sites.
>>>>>>>>>>>> >>>>> If an image is not copied to a DCN site before it is
>>>>>>>>>>>> booted, then the
>>>>>>>>>>>> >>>>> image will be streamed to the DCN site and then the image
>>>>>>>>>>>> will boot as
>>>>>>>>>>>> >>>>> an instance. This happens because Glance at the DCN site
>>>>>>>>>>>> has access to
>>>>>>>>>>>> >>>>> the images store at the Central ceph cluster. Though the
>>>>>>>>>>>> booting of
>>>>>>>>>>>> >>>>> the image will take time because it has not been copied
>>>>>>>>>>>> in advance,
>>>>>>>>>>>> >>>>> this is still preferable to failing to boot the image.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> You can also exec into the cinder container at the DCN
>>>>>>>>>>>> site and
>>>>>>>>>>>> >>>>> confirm it's using its local ceph cluster.
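>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> For example, something along these lines (container name as
>>>>>>>>>>>> >>>>> deployed by tripleo):
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>>   sudo podman exec cinder_volume grep -A 8 'tripleo_ceph' \
>>>>>>>>>>>> >>>>>     /etc/cinder/cinder.conf
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> and check that rbd_ceph_conf and rbd_secret_uuid refer to the
>>>>>>>>>>>> >>>>> local DCN cluster.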
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>>   John
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> >
>>>>>>>>>>>> >>>>> > I will try and create a new fresh image and test again
>>>>>>>>>>>> then update.
>>>>>>>>>>>> >>>>> >
>>>>>>>>>>>> >>>>> > With regards,
>>>>>>>>>>>> >>>>> > Swogat Pradhan
>>>>>>>>>>>> >>>>> >
>>>>>>>>>>>> >>>>> > On Wed, Mar 15, 2023 at 11:13 PM Swogat Pradhan <
>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote:
>>>>>>>>>>>> >>>>> >>
>>>>>>>>>>>> >>>>> >> Update:
>>>>>>>>>>>> >>>>> >> In the hypervisor list the compute node state is
>>>>>>>>>>>> showing down.
>>>>>>>>>>>> >>>>> >>
>>>>>>>>>>>> >>>>> >>
>>>>>>>>>>>> >>>>> >> On Wed, Mar 15, 2023 at 11:11 PM Swogat Pradhan <
>>>>>>>>>>>> swogatpradhan22 at gmail.com> wrote:
>>>>>>>>>>>> >>>>> >>>
>>>>>>>>>>>> >>>>> >>> Hi Brendan,
>>>>>>>>>>>> >>>>> >>> Now I have deployed another site where I have used a
>>>>>>>>>>>> network template with 2 linux bonds for both the 3 compute nodes and 3 ceph nodes.
>>>>>>>>>>>> >>>>> >>> The bonding option is set to mode=802.3ad
>>>>>>>>>>>> (lacp=active).
>>>>>>>>>>>> >>>>> >>> I used a cirros image to launch an instance, but the
>>>>>>>>>>>> instance timed out, so I waited for the volume to be created.
>>>>>>>>>>>> >>>>> >>> Once the volume was created I tried launching the
>>>>>>>>>>>> instance from the volume, and the instance is still stuck in the spawning state.
>>>>>>>>>>>> >>>>> >>>
>>>>>>>>>>>> >>>>> >>> Here is the nova-compute log:
>>>>>>>>>>>> >>>>> >>>
>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.739 185437 INFO
>>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon starting
>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.744 185437 INFO
>>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO
>>>>>>>>>>>> oslo.privsep.daemon [-] privsep process running with capabilities
>>>>>>>>>>>> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.749 185437 INFO
>>>>>>>>>>>> oslo.privsep.daemon [-] privsep daemon running as pid 185437
>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:47.974 8 WARNING
>>>>>>>>>>>> os_brick.initiator.connectors.nvmeof
>>>>>>>>>>>> [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266 b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> 4160ce999a31485fa643aed0936dfef0 - default default] Process execution error
>>>>>>>>>>>> in _get_host_uuid: Unexpected error while running command.
>>>>>>>>>>>> >>>>> >>> Command: blkid overlay -s UUID -o value
>>>>>>>>>>>> >>>>> >>> Exit code: 2
>>>>>>>>>>>> >>>>> >>> Stdout: ''
>>>>>>>>>>>> >>>>> >>> Stderr: '':
>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while
>>>>>>>>>>>> running command.
>>>>>>>>>>>> >>>>> >>> 2023-03-15 17:35:51.616 8 INFO
>>>>>>>>>>>> nova.virt.libvirt.driver [req-dbb11a9b-317e-4957-b141-f9e0bdf6a266
>>>>>>>>>>>> b240e3e89d99489284cd731e75f2a5db 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] [instance: 450b749c-a10a-4308-80a9-3b8020fee758] Creating image
>>>>>>>>>>>> >>>>> >>>
>>>>>>>>>>>> >>>>> >>> It is stuck at creating the image; do I need to run the
>>>>>>>>>>>> template mentioned here?:
>>>>>>>>>>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/post_deployment/pre_cache_images.html
>>>>>>>>>>>> >>>>> >>>
>>>>>>>>>>>> >>>>> >>> The volume is already created, and I do not understand
>>>>>>>>>>>> why the instance is stuck in the spawning state.
>>>>>>>>>>>> >>>>> >>>
>>>>>>>>>>>> >>>>> >>> With regards,
>>>>>>>>>>>> >>>>> >>> Swogat Pradhan
>>>>>>>>>>>> >>>>> >>>
>>>>>>>>>>>> >>>>> >>>
>>>>>>>>>>>> >>>>> >>> On Sun, Mar 5, 2023 at 4:02 PM Brendan Shephard <
>>>>>>>>>>>> bshephar at redhat.com> wrote:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Does your environment use different network
>>>>>>>>>>>> interfaces for each of the networks? Or does it have a bond with everything
>>>>>>>>>>>> on it?
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> One issue I have seen before is that when launching
>>>>>>>>>>>> instances, there is a lot of network traffic between nodes as the
>>>>>>>>>>>> hypervisor needs to download the image from Glance. Along with various
>>>>>>>>>>>> other services sending normal network traffic, it can be enough to cause
>>>>>>>>>>>> issues if everything is running over a single 1Gbe interface.
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> I have seen the same situation in fact when using a
>>>>>>>>>>>> single active/backup bond on 1Gbe nics. It’s worth checking the network
>>>>>>>>>>>> traffic while you try to spawn the instance to see if you’re dropping
>>>>>>>>>>>> packets. In the situation I described, there were dropped packets which
>>>>>>>>>>>> resulted in a loss of communication between nova_compute and RMQ, so the
>>>>>>>>>>>> node appeared offline. You should also confirm that nova_compute is being
>>>>>>>>>>>> disconnected in the nova_compute logs if you tail them on the Hypervisor
>>>>>>>>>>>> while spawning the instance.
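>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> A couple of generic checks for that (interface and log path
>>>>>>>>>>>> >>>>> >>>> are examples, adjust to your environment):
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>   ip -s link show bond1   # watch the dropped/error counters
>>>>>>>>>>>> >>>>> >>>>   sudo tail -f /var/log/containers/nova/nova-compute.log
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> run while the instance is spawning.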
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> In my case, changing from active/backup to LACP
>>>>>>>>>>>> helped. So, based on that experience, from my perspective, it certainly
>>>>>>>>>>>> sounds like some kind of network issue.
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Regards,
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Brendan Shephard
>>>>>>>>>>>> >>>>> >>>> Senior Software Engineer
>>>>>>>>>>>> >>>>> >>>> Red Hat Australia
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> On 5 Mar 2023, at 6:47 am, Eugen Block <
>>>>>>>>>>>> eblock at nde.ag> wrote:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Hi,
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> I tried to help someone with a similar issue some
>>>>>>>>>>>> time ago in this thread:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> https://serverfault.com/questions/1116771/openstack-oslo-messaging-exception-in-nova-conductor
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> But apparently a neutron reinstallation fixed it for
>>>>>>>>>>>> that user, not sure if that could apply here. But is it possible that your
>>>>>>>>>>>> nova and neutron versions are different between central and edge site? Have
>>>>>>>>>>>> you restarted nova and neutron services on the compute nodes after
>>>>>>>>>>>> installation? Do you have debug logs of nova-conductor and maybe nova-compute?
>>>>>>>>>>>> Maybe they can help narrow down the issue.
>>>>>>>>>>>> >>>>> >>>> If there isn't any additional information in the
>>>>>>>>>>>> debug logs I probably would start "tearing down" rabbitmq. I didn't have to
>>>>>>>>>>>> do that in a production system yet so be careful. I can think of two routes:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> - Either remove queues, exchanges etc. while rabbit
>>>>>>>>>>>> is running, this will most likely impact client IO depending on your load.
>>>>>>>>>>>> Check out the rabbitmqctl commands.
>>>>>>>>>>>> >>>>> >>>> - Or stop the rabbitmq cluster, remove the mnesia
>>>>>>>>>>>> tables from all nodes and restart rabbitmq so the exchanges, queues etc.
>>>>>>>>>>>> rebuild.
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> I can imagine that the failed reply "survives" while
>>>>>>>>>>>> being replicated across the rabbit nodes. But I don't really know the
>>>>>>>>>>>> rabbit internals too well, so maybe someone else can chime in here and give
>>>>>>>>>>>> a better advice.
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Regards,
>>>>>>>>>>>> >>>>> >>>> Eugen
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan <swogatpradhan22 at gmail.com
>>>>>>>>>>>> >:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Hi,
>>>>>>>>>>>> >>>>> >>>> Can someone please help me out on this issue?
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> With regards,
>>>>>>>>>>>> >>>>> >>>> Swogat Pradhan
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> On Thu, Mar 2, 2023 at 1:24 PM Swogat Pradhan <
>>>>>>>>>>>> swogatpradhan22 at gmail.com>
>>>>>>>>>>>> >>>>> >>>> wrote:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Hi
>>>>>>>>>>>> >>>>> >>>> I don't see any major packet loss.
>>>>>>>>>>>> >>>>> >>>> It seems the problem is somewhere in rabbitmq maybe
>>>>>>>>>>>> but not due to packet
>>>>>>>>>>>> >>>>> >>>> loss.
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> with regards,
>>>>>>>>>>>> >>>>> >>>> Swogat Pradhan
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:34 PM Swogat Pradhan <
>>>>>>>>>>>> swogatpradhan22 at gmail.com>
>>>>>>>>>>>> >>>>> >>>> wrote:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Hi,
>>>>>>>>>>>> >>>>> >>>> Yes the MTU is the same as the default '1500'.
>>>>>>>>>>>> >>>>> >>>> Generally I haven't seen any packet loss, but never
>>>>>>>>>>>> checked when
>>>>>>>>>>>> >>>>> >>>> launching the instance.
>>>>>>>>>>>> >>>>> >>>> I will check that and come back.
>>>>>>>>>>>> >>>>> >>>> But every time I launch an instance, the instance gets
>>>>>>>>>>>> stuck at spawning
>>>>>>>>>>>> >>>>> >>>> state and the hypervisor goes down, so I am not
>>>>>>>>>>>> sure if packet loss
>>>>>>>>>>>> >>>>> >>>> causes this.
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> With regards,
>>>>>>>>>>>> >>>>> >>>> Swogat pradhan
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> On Wed, Mar 1, 2023 at 3:30 PM Eugen Block <
>>>>>>>>>>>> eblock at nde.ag> wrote:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> One more thing coming to mind is MTU size. Are they
>>>>>>>>>>>> identical between
>>>>>>>>>>>> >>>>> >>>> central and edge site? Do you see packet loss
>>>>>>>>>>>> through the tunnel?
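>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> A simple way to test the path MTU is a non-fragmenting ping
>>>>>>>>>>>> >>>>> >>>> sized to the expected MTU, e.g. for 1500:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>   ping -M do -s 1472 <edge site IP>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> If that fails while smaller payloads work, something along the
>>>>>>>>>>>> >>>>> >>>> tunnel is clamping the MTU.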
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> Zitat von Swogat Pradhan <swogatpradhan22 at gmail.com
>>>>>>>>>>>> >:
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>> > Hi Eugen,
>>>>>>>>>>>> >>>>> >>>> > Please add my email address to either 'to'
>>>>>>>>>>>> or 'cc', as I am not
>>>>>>>>>>>> >>>>> >>>> > getting emails from you.
>>>>>>>>>>>> >>>>> >>>> > Coming to the issue:
>>>>>>>>>>>> >>>>> >>>> >
>>>>>>>>>>>> >>>>> >>>> > [root at overcloud-controller-no-ceph-3 /]#
>>>>>>>>>>>> rabbitmqctl list_policies -p
>>>>>>>>>>>> >>>>> >>>> /
>>>>>>>>>>>> >>>>> >>>> > Listing policies for vhost "/" ...
>>>>>>>>>>>> >>>>> >>>> > vhost   name    pattern apply-to
>>>>>>>>>>>> definition      priority
>>>>>>>>>>>> >>>>> >>>> > /       ha-all  ^(?!amq\.).*    queues
>>>>>>>>>>>> >>>>> >>>> >
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> {"ha-mode":"exactly","ha-params":2,"ha-promote-on-shutdown":"always"}   0
>>>>>>>>>>>> >>>>> >>>> >
>>>>>>>>>>>> >>>>> >>>> > I have the edge site compute nodes up; a node only
>>>>>>>>>>>> goes down when I am
>>>>>>>>>>>> >>>>> >>>> trying
>>>>>>>>>>>> >>>>> >>>> > to launch an instance, and the instance comes to a
>>>>>>>>>>>> spawning state and
>>>>>>>>>>>> >>>>> >>>> then
>>>>>>>>>>>> >>>>> >>>> > gets stuck.
>>>>>>>>>>>> >>>>> >>>> >
>>>>>>>>>>>> >>>>> >>>> > I have a tunnel setup between the central and the
>>>>>>>>>>>> edge sites.
>>>>>>>>>>>> >>>>> >>>> >
>>>>>>>>>>>> >>>>> >>>> > With regards,
>>>>>>>>>>>> >>>>> >>>> > Swogat Pradhan
>>>>>>>>>>>> >>>>> >>>> >
>>>>>>>>>>>> >>>>> >>>> > On Tue, Feb 28, 2023 at 9:11 PM Swogat Pradhan <
>>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com>
>>>>>>>>>>>> >>>>> >>>> > wrote:
>>>>>>>>>>>> >>>>> >>>> >
>>>>>>>>>>>> >>>>> >>>> >> Hi Eugen,
>>>>>>>>>>>> >>>>> >>>> >> For some reason I am not getting your emails
>>>>>>>>>>>> directly; I am
>>>>>>>>>>>> >>>>> >>>> checking
>>>>>>>>>>>> >>>>> >>>> >> the email digest and there I am able to find your
>>>>>>>>>>>> reply.
>>>>>>>>>>>> >>>>> >>>> >> Here is the log for download:
>>>>>>>>>>>> https://we.tl/t-L8FEkGZFSq
>>>>>>>>>>>> >>>>> >>>> >> Yes, these logs are from the time when the issue
>>>>>>>>>>>> occurred.
>>>>>>>>>>>> >>>>> >>>> >>
>>>>>>>>>>>> >>>>> >>>> >> *Note: I am able to create VMs and perform other
>>>>>>>>>>>> activities in the
>>>>>>>>>>>> >>>>> >>>> >> central site; I am only facing this issue in the edge
>>>>>>>>>>>> site.*
>>>>>>>>>>>> >>>>> >>>> >>
>>>>>>>>>>>> >>>>> >>>> >> With regards,
>>>>>>>>>>>> >>>>> >>>> >> Swogat Pradhan
>>>>>>>>>>>> >>>>> >>>> >>
>>>>>>>>>>>> >>>>> >>>> >> On Mon, Feb 27, 2023 at 5:12 PM Swogat Pradhan <
>>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com>
>>>>>>>>>>>> >>>>> >>>> >> wrote:
>>>>>>>>>>>> >>>>> >>>> >>
>>>>>>>>>>>> >>>>> >>>> >>> Hi Eugen,
>>>>>>>>>>>> >>>>> >>>> >>> Thanks for your response.
>>>>>>>>>>>> >>>>> >>>> >>> I actually have a 4-controller setup, so here are
>>>>>>>>>>>> the details:
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> *PCS Status:*
>>>>>>>>>>>> >>>>> >>>> >>>   * Container bundle set: rabbitmq-bundle [
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> 172.25.201.68:8787/tripleomaster/openstack-rabbitmq:pcmklatest
>>>>>>>>>>>> ]:
>>>>>>>>>>>> >>>>> >>>> >>>     * rabbitmq-bundle-0
>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster):
>>>>>>>>>>>> >>>>> >>>> Started
>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-no-ceph-3
>>>>>>>>>>>> >>>>> >>>> >>>     * rabbitmq-bundle-1
>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster):
>>>>>>>>>>>> >>>>> >>>> Started
>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-2
>>>>>>>>>>>> >>>>> >>>> >>>     * rabbitmq-bundle-2
>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster):
>>>>>>>>>>>> >>>>> >>>> Started
>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-1
>>>>>>>>>>>> >>>>> >>>> >>>     * rabbitmq-bundle-3
>>>>>>>>>>>> (ocf::heartbeat:rabbitmq-cluster):
>>>>>>>>>>>> >>>>> >>>> Started
>>>>>>>>>>>> >>>>> >>>> >>> overcloud-controller-0
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> I have tried restarting the bundle multiple
>>>>>>>>>>>> times but the issue is
>>>>>>>>>>>> >>>>> >>>> still
>>>>>>>>>>>> >>>>> >>>> >>> present.
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> *Cluster status:*
>>>>>>>>>>>> >>>>> >>>> >>> [root at overcloud-controller-0 /]# rabbitmqctl
>>>>>>>>>>>> cluster_status
>>>>>>>>>>>> >>>>> >>>> >>> Cluster status of node
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com ...
>>>>>>>>>>>> >>>>> >>>> >>> Basics
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Cluster name:
>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Disk Nodes
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Running Nodes
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Versions
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com:
>>>>>>>>>>>> RabbitMQ
>>>>>>>>>>>> >>>>> >>>> 3.8.3
>>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com:
>>>>>>>>>>>> RabbitMQ
>>>>>>>>>>>> >>>>> >>>> 3.8.3
>>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com:
>>>>>>>>>>>> RabbitMQ
>>>>>>>>>>>> >>>>> >>>> 3.8.3
>>>>>>>>>>>> >>>>> >>>> >>> on Erlang 22.3.4.1
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com:
>>>>>>>>>>>> >>>>> >>>> RabbitMQ
>>>>>>>>>>>> >>>>> >>>> >>> 3.8.3 on Erlang 22.3.4.1
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Alarms
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> (none)
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Network Partitions
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> (none)
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Listeners
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com,
>>>>>>>>>>>> >>>>> >>>> interface:
>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering,
>>>>>>>>>>>> purpose: inter-node and CLI
>>>>>>>>>>>> >>>>> >>>> tool
>>>>>>>>>>>> >>>>> >>>> >>> communication
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com,
>>>>>>>>>>>> >>>>> >>>> interface:
>>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.212, port: 5672, protocol: amqp,
>>>>>>>>>>>> purpose: AMQP 0-9-1
>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-0.internalapi.bdxworld.com,
>>>>>>>>>>>> >>>>> >>>> interface:
>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP
>>>>>>>>>>>> API
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com,
>>>>>>>>>>>> >>>>> >>>> interface:
>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering,
>>>>>>>>>>>> purpose: inter-node and CLI
>>>>>>>>>>>> >>>>> >>>> tool
>>>>>>>>>>>> >>>>> >>>> >>> communication
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com,
>>>>>>>>>>>> >>>>> >>>> interface:
>>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.205, port: 5672, protocol: amqp,
>>>>>>>>>>>> purpose: AMQP 0-9-1
>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-1.internalapi.bdxworld.com,
>>>>>>>>>>>> >>>>> >>>> interface:
>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP
>>>>>>>>>>>> API
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com,
>>>>>>>>>>>> >>>>> >>>> interface:
>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 25672, protocol: clustering,
>>>>>>>>>>>> purpose: inter-node and CLI
>>>>>>>>>>>> >>>>> >>>> tool
>>>>>>>>>>>> >>>>> >>>> >>> communication
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com,
>>>>>>>>>>>> >>>>> >>>> interface:
>>>>>>>>>>>> >>>>> >>>> >>> 172.25.201.201, port: 5672, protocol: amqp,
>>>>>>>>>>>> purpose: AMQP 0-9-1
>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-2.internalapi.bdxworld.com,
>>>>>>>>>>>> >>>>> >>>> interface:
>>>>>>>>>>>> >>>>> >>>> >>> [::], port: 15672, protocol: http, purpose: HTTP
>>>>>>>>>>>> API
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> ,
>>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 25672, protocol:
>>>>>>>>>>>> clustering, purpose:
>>>>>>>>>>>> >>>>> >>>> inter-node and
>>>>>>>>>>>> >>>>> >>>> >>> CLI tool communication
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> ,
>>>>>>>>>>>> >>>>> >>>> >>> interface: 172.25.201.209, port: 5672, protocol:
>>>>>>>>>>>> amqp, purpose: AMQP
>>>>>>>>>>>> >>>>> >>>> 0-9-1
>>>>>>>>>>>> >>>>> >>>> >>> and AMQP 1.0
>>>>>>>>>>>> >>>>> >>>> >>> Node:
>>>>>>>>>>>> rabbit at overcloud-controller-no-ceph-3.internalapi.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> ,
>>>>>>>>>>>> >>>>> >>>> >>> interface: [::], port: 15672, protocol: http,
>>>>>>>>>>>> purpose: HTTP API
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Feature flags
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> Flag: drop_unroutable_metric, state: enabled
>>>>>>>>>>>> >>>>> >>>> >>> Flag: empty_basic_get_metric, state: enabled
>>>>>>>>>>>> >>>>> >>>> >>> Flag: implicit_default_bindings, state: enabled
>>>>>>>>>>>> >>>>> >>>> >>> Flag: quorum_queue, state: enabled
>>>>>>>>>>>> >>>>> >>>> >>> Flag: virtual_host_metadata, state: enabled
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> *Logs:*
>>>>>>>>>>>> >>>>> >>>> >>> *(Attached)*
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> With regards,
>>>>>>>>>>>> >>>>> >>>> >>> Swogat Pradhan
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>> On Sun, Feb 26, 2023 at 2:34 PM Swogat Pradhan <
>>>>>>>>>>>> >>>>> >>>> swogatpradhan22 at gmail.com>
>>>>>>>>>>>> >>>>> >>>> >>> wrote:
>>>>>>>>>>>> >>>>> >>>> >>>
>>>>>>>>>>>> >>>>> >>>> >>>> Hi,
>>>>>>>>>>>> >>>>> >>>> >>>> Please find the nova conductor as well as nova
>>>>>>>>>>>> api log.
>>>>>>>>>>>> >>>>> >>>> >>>>
>>>>>>>>>>>> >>>>> >>>> >>>> nova-conuctor:
>>>>>>>>>>>> >>>>> >>>> >>>>
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:01.108 31 WARNING
>>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - -
>>>>>>>>>>>> - -]
>>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't
>>>>>>>>>>>> exist, drop reply to
>>>>>>>>>>>> >>>>> >>>> >>>> 16152921c1eb45c2b1f562087140168b
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.144 26 WARNING
>>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - -
>>>>>>>>>>>> - -]
>>>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't
>>>>>>>>>>>> exist, drop reply to
>>>>>>>>>>>> >>>>> >>>> >>>> 83dbe5f567a940b698acfe986f6194fa
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.314 32 WARNING
>>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - -
>>>>>>>>>>>> - -]
>>>>>>>>>>>> >>>>> >>>> >>>> reply_276049ec36a84486a8a406911d9802f4 doesn't
>>>>>>>>>>>> exist, drop reply to
>>>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43:
>>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:45:02.316 32 ERROR
>>>>>>>>>>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-7b43c4e5-0475-4598-92c0-fcacb51d9813 - - -
>>>>>>>>>>>> - -] The reply
>>>>>>>>>>>> >>>>> >>>> >>>> f3bfd7f65bd542b18d84cea3033abb43 failed to send
>>>>>>>>>>>> after 60 seconds
>>>>>>>>>>>> >>>>> >>>> due to a
>>>>>>>>>>>> >>>>> >>>> >>>> missing queue
>>>>>>>>>>>> (reply_276049ec36a84486a8a406911d9802f4).
>>>>>>>>>>>> >>>>> >>>> Abandoning...:
>>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.282 35 WARNING
>>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - -
>>>>>>>>>>>> - -]
>>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't
>>>>>>>>>>>> exist, drop reply to
>>>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566:
>>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:48:01.284 35 ERROR
>>>>>>>>>>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - -
>>>>>>>>>>>> - -] The reply
>>>>>>>>>>>> >>>>> >>>> >>>> d4b9180f91a94f9a82c3c9c4b7595566 failed to send
>>>>>>>>>>>> after 60 seconds
>>>>>>>>>>>> >>>>> >>>> due to a
>>>>>>>>>>>> >>>>> >>>> >>>> missing queue
>>>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066).
>>>>>>>>>>>> >>>>> >>>> Abandoning...:
>>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.303 33 WARNING
>>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - -
>>>>>>>>>>>> - -]
>>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't
>>>>>>>>>>>> exist, drop reply to
>>>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f:
>>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:01.304 33 ERROR
>>>>>>>>>>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - -
>>>>>>>>>>>> - -] The reply
>>>>>>>>>>>> >>>>> >>>> >>>> 897911a234a445d8a0d8af02ece40f6f failed to send
>>>>>>>>>>>> after 60 seconds
>>>>>>>>>>>> >>>>> >>>> due to a
>>>>>>>>>>>> >>>>> >>>> >>>> missing queue
>>>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066).
>>>>>>>>>>>> >>>>> >>>> Abandoning...:
>>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:49:52.254 31 WARNING
>>>>>>>>>>>> nova.cache_utils
>>>>>>>>>>>> >>>>> >>>> >>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>>>>>>>>> >>>>> >>>> b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> >>>>> >>>> >>>> 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] Cache enabled
>>>>>>>>>>>> >>>>> >>>> with
>>>>>>>>>>>> >>>>> >>>> >>>> backend dogpile.cache.null.
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.264 27 WARNING
>>>>>>>>>>>> >>>>> >>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - -
>>>>>>>>>>>> - -]
>>>>>>>>>>>> >>>>> >>>> >>>> reply_349bcb075f8c49329435a0f884b33066 doesn't
>>>>>>>>>>>> exist, drop reply to
>>>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb:
>>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
>>>>>>>>>>>> >>>>> >>>> >>>> 2023-02-26 08:50:01.266 27 ERROR
>>>>>>>>>>>> oslo_messaging._drivers.amqpdriver
>>>>>>>>>>>> >>>>> >>>> >>>> [req-caefe26d-153a-4dfd-9ea6-bc5ca0d46679 - - -
>>>>>>>>>>>> - -] The reply
>>>>>>>>>>>> >>>>> >>>> >>>> 8f723ceb10c3472db9a9f324861df2bb failed to send
>>>>>>>>>>>> after 60 seconds
>>>>>>>>>>>> >>>>> >>>> due to a
>>>>>>>>>>>> >>>>> >>>> >>>> missing queue
>>>>>>>>>>>> (reply_349bcb075f8c49329435a0f884b33066).
>>>>>>>>>>>> >>>>> >>>> Abandoning...:
>>>>>>>>>>>> >>>>> >>>> >>>> oslo_messaging.exceptions.MessageUndeliverable
>>>>>>>>>>>> >>>>> >>>> >>>>
>>>>>>>>>>>> >>>>> >>>> >>>> With regards,
>>>>>>>>>>>> >>>>> >>>> >>>> Swogat Pradhan
>>>>>>>>>>>> >>>>> >>>> >>>>
>>>>>>>>>>>> >>>>> >>>> >>>> On Sun, Feb 26, 2023 at 2:26 PM Swogat Pradhan <
>>>>>>>>>>>> >>>>> >>>> >>>> swogatpradhan22 at gmail.com> wrote:
>>>>>>>>>>>> >>>>> >>>> >>>>
>>>>>>>>>>>> >>>>> >>>> >>>>> Hi,
>>>>>>>>>>>> >>>>> >>>> >>>>> I currently have 3 compute nodes on edge site1
>>>>>>>>>>>> where I am trying to
>>>>>>>>>>>> >>>>> >>>> >>>>> launch VMs.
>>>>>>>>>>>> >>>>> >>>> >>>>> When the VM is in spawning state the node goes
>>>>>>>>>>>> down (openstack
>>>>>>>>>>>> >>>>> >>>> compute
>>>>>>>>>>>> >>>>> >>>> >>>>> service list); the node comes back up when I
>>>>>>>>>>>> restart the nova
>>>>>>>>>>>> >>>>> >>>> compute
>>>>>>>>>>>> >>>>> >>>> >>>>> service, but then the launch of the VM fails.
>>>>>>>>>>>> >>>>> >>>> >>>>>
>>>>>>>>>>>> >>>>> >>>> >>>>> nova-compute.log
>>>>>>>>>>>> >>>>> >>>> >>>>>
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:15:51.808 7 INFO
>>>>>>>>>>>> nova.compute.manager
>>>>>>>>>>>> >>>>> >>>> >>>>> [req-bc0f5f2e-53fc-4dae-b1da-82f1f972d617 - -
>>>>>>>>>>>> - - -] Running
>>>>>>>>>>>> >>>>> >>>> >>>>> instance usage
>>>>>>>>>>>> >>>>> >>>> >>>>> audit for host dcn01-hci-0.bdxworld.com from
>>>>>>>>>>>> 2023-02-26 07:00:00
>>>>>>>>>>>> >>>>> >>>> to
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:00:00. 0 instances.
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:52.813 7 INFO
>>>>>>>>>>>> nova.compute.claims
>>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] [instance:
>>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Claim
>>>>>>>>>>>> successful on node
>>>>>>>>>>>> >>>>> >>>> >>>>> dcn01-hci-0.bdxworld.com
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.225 7 INFO
>>>>>>>>>>>> nova.virt.libvirt.driver
>>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] [instance:
>>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Ignoring
>>>>>>>>>>>> supplied device
>>>>>>>>>>>> >>>>> >>>> name:
>>>>>>>>>>>> >>>>> >>>> >>>>> /dev/vda. Libvirt can't honour user-supplied
>>>>>>>>>>>> dev names
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:54.398 7 INFO
>>>>>>>>>>>> nova.virt.block_device
>>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] [instance:
>>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Booting
>>>>>>>>>>>> with volume
>>>>>>>>>>>> >>>>> >>>> >>>>> c4bd7885-5973-4860-bbe6-7a2f726baeee at
>>>>>>>>>>>> /dev/vda
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.216 7 WARNING
>>>>>>>>>>>> nova.cache_utils
>>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] Cache enabled
>>>>>>>>>>>> >>>>> >>>> with
>>>>>>>>>>>> >>>>> >>>> >>>>> backend dogpile.cache.null.
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.283 7 INFO
>>>>>>>>>>>> oslo.privsep.daemon
>>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] Running
>>>>>>>>>>>> >>>>> >>>> >>>>> privsep helper:
>>>>>>>>>>>> >>>>> >>>> >>>>> ['sudo', 'nova-rootwrap',
>>>>>>>>>>>> '/etc/nova/rootwrap.conf',
>>>>>>>>>>>> >>>>> >>>> 'privsep-helper',
>>>>>>>>>>>> >>>>> >>>> >>>>> '--config-file', '/etc/nova/nova.conf',
>>>>>>>>>>>> '--config-file',
>>>>>>>>>>>> >>>>> >>>> >>>>> '/etc/nova/nova-compute.conf',
>>>>>>>>>>>> '--privsep_context',
>>>>>>>>>>>> >>>>> >>>> >>>>> 'os_brick.privileged.default',
>>>>>>>>>>>> '--privsep_sock_path',
>>>>>>>>>>>> >>>>> >>>> >>>>> '/tmp/tmpin40tah6/privsep.sock']
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.791 7 INFO
>>>>>>>>>>>> oslo.privsep.daemon
>>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] Spawned new
>>>>>>>>>>>> >>>>> >>>> privsep
>>>>>>>>>>>> >>>>> >>>> >>>>> daemon via rootwrap
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.717 2647 INFO
>>>>>>>>>>>> oslo.privsep.daemon [-] privsep
>>>>>>>>>>>> >>>>> >>>> >>>>> daemon starting
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.722 2647 INFO
>>>>>>>>>>>> oslo.privsep.daemon [-] privsep
>>>>>>>>>>>> >>>>> >>>> >>>>> process running with uid/gid: 0/0
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO
>>>>>>>>>>>> oslo.privsep.daemon [-] privsep
>>>>>>>>>>>> >>>>> >>>> >>>>> process running with capabilities
>>>>>>>>>>>> (eff/prm/inh):
>>>>>>>>>>>> >>>>> >>>> >>>>> CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.726 2647 INFO
>>>>>>>>>>>> oslo.privsep.daemon [-] privsep
>>>>>>>>>>>> >>>>> >>>> >>>>> daemon running as pid 2647
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:55.956 7 WARNING
>>>>>>>>>>>> >>>>> >>>> os_brick.initiator.connectors.nvmeof
>>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] Process
>>>>>>>>>>>> >>>>> >>>> >>>>> execution error
>>>>>>>>>>>> >>>>> >>>> >>>>> in _get_host_uuid: Unexpected error while
>>>>>>>>>>>> running command.
>>>>>>>>>>>> >>>>> >>>> >>>>> Command: blkid overlay -s UUID -o value
>>>>>>>>>>>> >>>>> >>>> >>>>> Exit code: 2
>>>>>>>>>>>> >>>>> >>>> >>>>> Stdout: ''
>>>>>>>>>>>> >>>>> >>>> >>>>> Stderr: '':
>>>>>>>>>>>> oslo_concurrency.processutils.ProcessExecutionError:
>>>>>>>>>>>> >>>>> >>>> >>>>> Unexpected error while running command.
>>>>>>>>>>>> >>>>> >>>> >>>>> 2023-02-26 08:49:58.247 7 INFO
>>>>>>>>>>>> nova.virt.libvirt.driver
>>>>>>>>>>>> >>>>> >>>> >>>>> [req-3a1547ea-326f-4dd0-9127-7f4a4bdf1e45
>>>>>>>>>>>> >>>>> >>>> >>>>> b240e3e89d99489284cd731e75f2a5db
>>>>>>>>>>>> >>>>> >>>> >>>>> 4160ce999a31485fa643aed0936dfef0 - default
>>>>>>>>>>>> default] [instance:
>>>>>>>>>>>> >>>>> >>>> >>>>> 0c62c1ef-9010-417d-a05f-4db77e901600] Creating
>>>>>>>>>>>> image
>>>>>>>>>>>> >>>>> >>>> >>>>>
>>>>>>>>>>>> >>>>> >>>> >>>>> Is there a way to solve this issue?
>>>>>>>>>>>> >>>>> >>>> >>>>>
>>>>>>>>>>>> >>>>> >>>> >>>>>
>>>>>>>>>>>> >>>>> >>>> >>>>> With regards,
>>>>>>>>>>>> >>>>> >>>> >>>>>
>>>>>>>>>>>> >>>>> >>>> >>>>> Swogat Pradhan
>>>>>>>>>>>> >>>>> >>>> >>>>>
>>>>>>>>>>>> >>>>> >>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>> >>>>
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>

