<div dir="ltr">That is a bit strange. I don't use the Ceph backend, so I don't know any magic tricks.<div><ul><li>I'm surprised that the debug logging level doesn't add anything else. Are there any other lines besides the "connecting" one?</li><li>Can we narrow down the destination IP/port for the Ceph RBD traffic?<br></li><li>Can we fail over the cinder-volume service to another controller and check the status of the volume service?</li><li>Did the power outage impact the Ceph cluster, the network gear, and all the controllers?</li><li>Does the content of /etc/ceph/ceph.conf appear to be valid inside the container?</li></ul><div>Looking at the code - <a href="https://github.com/openstack/cinder/blob/stable/train/cinder/volume/drivers/rbd.py#L432">https://github.com/openstack/cinder/blob/stable/train/cinder/volume/drivers/rbd.py#L432</a></div></div><div><br></div><div>It should raise an exception if there is a timeout when the connection client is built.</div><div><br></div><div><pre>except self.rados.Error:
    msg = _("Error connecting to ceph cluster.")
    LOG.exception(msg)
    client.shutdown()
    raise exception.VolumeBackendAPIException(data=msg)</pre></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, May 15, 2021 at 4:16 
AM Sebastian Luna Valero <<a href="mailto:sebastian.luna.valero@gmail.com">sebastian.luna.valero@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div><br></div><div>Hi All,</div><div><br></div><div>Thanks for your input so far. I am also trying to help Manu with this issue.</div><div><br></div><div>The "cinder-volume" service was working properly with the existing configuration. However, after a power outage the service is no longer reported as "up".</div><div><br></div><div>Looking at the source code, the service status is reported as "down" by "cinder-scheduler" here:</div><div><br></div><div><a href="https://github.com/openstack/cinder/blob/stable/train/cinder/scheduler/host_manager.py#L618" target="_blank">https://github.com/openstack/cinder/blob/stable/train/cinder/scheduler/host_manager.py#L618</a></div><div><br></div><div>With the message: "WARNING cinder.scheduler.host_manager [req-<>- default default] volume service is down. 
(host: rbd:volumes@ceph-rbd)"</div><div><br></div><div>I printed out the "service" tuple <a href="https://github.com/openstack/cinder/blob/stable/train/cinder/scheduler/host_manager.py#L615" target="_blank">https://github.com/openstack/cinder/blob/stable/train/cinder/scheduler/host_manager.py#L615</a> and we get:</div><div><br></div><div>"2021-05-15 09:57:24.918 7 WARNING cinder.scheduler.host_manager [<> - default default] Service(active_backend_id=None,availability_zone='nova',binary='cinder-volume',cluster=<?>,cluster_name=None,created_at=2020-06-12T07:53:42Z,deleted=False,deleted_at=None,disabled=False,disabled_reason=None,frozen=False,host='rbd:volumes@ceph-rbd',id=12,modified_at=None,object_current_version='1.38',replication_status='disabled',report_count=8067424,rpc_current_version='3.16',topic='cinder-volume',updated_at=2021-05-12T15:37:52Z,uuid='604668e8-c2e7-46ed-a2b8-086e588079ac')"</div><div><br></div><div>Cinder is configured with a Ceph RBD backend, as explained in <a href="https://github.com/openstack/kolla-ansible/blob/stable/train/doc/source/reference/storage/external-ceph-guide.rst#cinder" target="_blank">https://github.com/openstack/kolla-ansible/blob/stable/train/doc/source/reference/storage/external-ceph-guide.rst#cinder</a></div><div><br></div><div>That's where the "backend_host=rbd:volumes" configuration is coming from.</div><div><br></div><div>We are using 3 controller nodes for OpenStack and 3 monitor nodes for Ceph.<br></div><div><br></div><div>The Ceph cluster doesn't report any error. The "cinder-volume" containers don't report any error. 
Moreover, when we go inside the "cinder-volume" container we are able to list existing volumes with:</div><div><br></div><div><div>rbd -p cinder.volumes --id cinder -k /etc/ceph/ceph.client.cinder.keyring ls</div><div><br></div><div>So the connection to the Ceph cluster works.<br></div><div><br></div><div>Why is "cinder-scheduler" reporting that the backend Ceph cluster is down?</div><div><br></div><div>Many thanks,</div><div>Sebastian<br></div></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 13 May 2021 at 13:12, Tobias Urdin <<a href="mailto:tobias.urdin@binero.com" target="_blank">tobias.urdin@binero.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello,<br>

<br>

I just saw that you are running Ceph Octopus with the Train release and wanted to let you know that we saw issues with the os-brick version shipped with Train not supporting the Ceph Octopus client version.<br>

<br>

So for our Ceph cluster running Octopus we had to keep the client version on Nautilus until we upgraded to Victoria, which included a newer version of os-brick.<br>

<br>

Maybe this is unrelated to your issue but just wanted to put it out there.<br>

<br>

Best regards <br>

Tobias <br>

<br>

> On 13 May 2021, at 12:55, ManuParra <<a href="mailto:mparra@iaa.es" target="_blank">mparra@iaa.es</a>> wrote:<br>

> <br>

> Hello Gorka, not yet, let me update cinder configuration, add the option, restart cinder and I’ll update the status.<br>

> Do you recommend other things to try for this cycle?<br>

> Regards.<br>

> <br>

>> On 13 May 2021, at 09:37, Gorka Eguileor <<a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>> wrote:<br>

>> <br>

>>> On 13/05, ManuParra wrote:<br>

>>> Hi Gorka again, yes, the first thing is to know why you can't connect to that host (Ceph is actually set up for HA) so that's the way to do it. I mention this because it has been configured like that, with that hostname, since we first set everything up, and there has been no problem.<br>

>>> <br>

>>> As for the errors, the strangest thing is that in Monasca I have not found any error log, only a warning about "volume service is down. (host: rbd:volumes@ceph-rbd)" and info messages, which is even stranger.<br>

>> <br>

>> Have you tried the configuration change I recommended?<br>

>> <br>

>> <br>

>>> <br>

>>> Regards.<br>

>>> <br>

>>>> On 12 May 2021, at 23:34, Gorka Eguileor <<a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>> wrote:<br>

>>>> <br>

>>>> On 12/05, ManuParra wrote:<br>

>>>>> Hi Gorka, let me show the cinder config:<br>

>>>>> <br>

>>>>> [ceph-rbd]<br>

>>>>> rbd_ceph_conf = /etc/ceph/ceph.conf<br>

>>>>> rbd_user = cinder<br>

>>>>> backend_host = rbd:volumes<br>

>>>>> rbd_pool = cinder.volumes<br>

>>>>> volume_backend_name = ceph-rbd<br>

>>>>> volume_driver = cinder.volume.drivers.rbd.RBDDriver<br>

>>>>> …<br>

>>>>> <br>

>>>>> So, with rbd_exclusive_cinder_pool=True the pool will be used just for volumes? But the log says there is no connection to the backend_host.<br>

>>>> <br>

>>>> Hi,<br>

>>>> <br>

>>>> Your backend_host doesn't have a valid hostname, please set a proper<br>

>>>> hostname in that configuration option.<br>

>>>> <br>

>>>> Then the next thing you need to have is the cinder-volume service<br>

>>>> running correctly before making any requests.<br>

>>>> <br>

>>>> I would try adding rbd_exclusive_cinder_pool=true then tailing the<br>

>>>> volume logs, and restarting the service.<br>

>>>> <br>

>>>> See if the logs show any ERROR level entries.<br>

>>>> <br>

>>>> I would also check the service-list output right after the service is<br>

>>>> restarted; if it's up, then I would check it again after 2 minutes.<br>

>>>> <br>

>>>> Cheers,<br>

>>>> Gorka.<br>

>>>> <br>

>>>> <br>

>>>>> <br>

>>>>> Regards.<br>

>>>>> <br>

>>>>> <br>

>>>>>> On 12 May 2021, at 11:49, Gorka Eguileor <<a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>> wrote:<br>

>>>>>> <br>

>>>>>> On 12/05, ManuParra wrote:<br>

>>>>>>> Thanks, I have restarted the service and I see that after a few minutes the cinder-volume service goes down again when I check it with the command openstack volume service list.<br>

>>>>>>> The host/service that contains the cinder-volumes is rbd:volumes@ceph-rbd, that is, RBD in Ceph, so the problem does not come from Cinder but rather from Ceph or from the RBD (Ceph) pools that store the volumes. I have checked Ceph and the status of everything is correct, no errors or warnings.<br>

>>>>>>> The error I have is that cinder can’t connect to rbd:volumes@ceph-rbd. Any further suggestions? Thanks in advance.<br>

>>>>>>> Kind regards.<br>

>>>>>>> <br>

>>>>>> <br>

>>>>>> Hi,<br>

>>>>>> <br>

>>>>>> You are most likely using an older release, have a high number of cinder<br>

>>>>>> RBD volumes, and have not changed configuration option<br>

>>>>>> "rbd_exclusive_cinder_pool" from its default "false" value.<br>

>>>>>> <br>

>>>>>> Please add to your driver's section in cinder.conf the following:<br>

>>>>>> <br>

>>>>>> rbd_exclusive_cinder_pool = true<br>

>>>>>> <br>

>>>>>> <br>

>>>>>> And restart the service.<br>

>>>>>> <br>

>>>>>> Cheers,<br>

>>>>>> Gorka.<br>

>>>>>> <br>

>>>>>>>> On 11 May 2021, at 22:30, Eugen Block <<a href="mailto:eblock@nde.ag" target="_blank">eblock@nde.ag</a>> wrote:<br>

>>>>>>>> <br>

>>>>>>>> Hi,<br>

>>>>>>>> <br>

>>>>>>>> so restart the volume service;-)<br>

>>>>>>>> <br>

>>>>>>>> systemctl restart openstack-cinder-volume.service<br>

>>>>>>>> <br>

>>>>>>>> <br>

>>>>>>>> Quoting ManuParra <<a href="mailto:mparra@iaa.es" target="_blank">mparra@iaa.es</a>>:<br>

>>>>>>>> <br>

>>>>>>>>> Dear OpenStack community,<br>

>>>>>>>>> <br>

>>>>>>>>> I encountered a problem a few days ago: when creating new volumes with:<br>

>>>>>>>>> <br>

>>>>>>>>> "openstack volume create --size 20 testmv"<br>

>>>>>>>>> <br>

>>>>>>>>> the volume creation status shows an error. If I go to the error log detail it indicates:<br>

>>>>>>>>> <br>

>>>>>>>>> "Schedule allocate volume: Could not find any available weighted backend".<br>

>>>>>>>>> <br>

>>>>>>>>> Then I go to the cinder log and it indicates:<br>

>>>>>>>>> <br>

>>>>>>>>> "volume service is down - host: rbd:volumes@ceph-rbd".<br>

>>>>>>>>> <br>

>>>>>>>>> I check with:<br>

>>>>>>>>> <br>

>>>>>>>>> "openstack volume service list" which state the services are in, and I see that this is indeed the case:<br>

>>>>>>>>> <br>

>>>>>>>>> <br>

>>>>>>>>> | cinder-volume | rbd:volumes@ceph-rbd | nova | enabled | down | 2021-04-29T09:48:42.000000 |<br>

>>>>>>>>> <br>

>>>>>>>>> And it has been down since 2021-04-29!<br>

>>>>>>>>> <br>

>>>>>>>>> I have checked Ceph (monitors, managers, OSDs, etc.) and there are no problems with the Ceph backend, everything is apparently working.<br>

>>>>>>>>> <br>

>>>>>>>>> This happened after an uncontrolled outage. So my question is: how do I restart only cinder-volume (I also have cinder-backup and cinder-scheduler, but they are OK)?<br>

>>>>>>>>> <br>

>>>>>>>>> Thank you very much in advance. Regards.<br>

>>>>>>>> <br>

>>>>>>>> <br>

>>>>>>>> <br>

>>>>>>>> <br>

>>>>>>> <br>

>>>>>>> <br>

>>>>>> <br>

>>>>>> <br>

>>>>> <br>

>>>> <br>

>>> <br>

>> <br>

>> <br>

> <br>

> <br>

</blockquote></div>

</blockquote></div>
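<div><br></div><div>Coming back to Sebastian's question about why "cinder-scheduler" reports the backend as down: as far as I understand, the scheduler does not probe Ceph itself. It only looks at how recently cinder-volume refreshed its heartbeat (updated_at) in the services table. In the Service(...) line quoted above, updated_at is 2021-05-12T15:37:52Z while the log line was written at 2021-05-15 09:57:24, so the heartbeat appears to have stopped days earlier. Below is a minimal Python sketch of that kind of check. It assumes the default service_down_time of 60 seconds and that the log timestamp is UTC; the is_up function here is my own simplified stand-in, not a copy of Cinder's code.</div>

```python
from datetime import datetime, timedelta, timezone

# Assumption: Cinder's service_down_time option is left at 60 seconds.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_up(updated_at, now, down_time=SERVICE_DOWN_TIME):
    """Return True if the last heartbeat is no older than down_time.

    Simplified illustration of a heartbeat-based liveness check.
    """
    if updated_at is None:
        # A service that never reported a heartbeat is treated as down.
        return False
    return down_time >= now - updated_at


# updated_at copied from the Service(...) line quoted in this thread.
updated_at = datetime(2021, 5, 12, 15, 37, 52, tzinfo=timezone.utc)
# Timestamp of the scheduler warning; assumed to be UTC for illustration.
logged_at = datetime(2021, 5, 15, 9, 57, 24, tzinfo=timezone.utc)

print("heartbeat age:", logged_at - updated_at)
print("service considered up:", is_up(updated_at, logged_at))
```

<div>If that reading is right, the thing to chase is why cinder-volume stopped updating its heartbeat (for example, a periodic task blocked on the backend), rather than whether the rbd command line can reach the cluster.</div>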