Restart cinder-volume with Ceph RBD

Laurent Dumont laurentfdumont at gmail.com
Sat May 15 17:39:57 UTC 2021


That is a bit strange. I don't use the Ceph backend so I don't know any
magic tricks.

   - I'm surprised that the debug logging level doesn't add anything else.
   Are there any other lines besides the "connecting" one?
   - Can we narrow down the port/IP destination for the Ceph RBD traffic?
   (There's a rough connectivity sketch after this list.)
   - Can we failover the cinder-volume service to another controller and
   check the status of the volume service?
   - Did the power outage impact the Ceph cluster + network gear + all the
   controllers?
   - Does the content of /etc/ceph/ceph.conf appear to be valid inside the
   container?
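
For the port/IP question and the ceph.conf check, here is a rough
connectivity sketch that could be run from inside the cinder-volume
container (just an illustration, assuming python3-rados is installed there
and the paths/user below match the cinder.conf quoted further down):

import rados

# Build a client roughly the way the RBD driver does, with the cinder user
# and the ceph.conf/keyring that are mounted into the container.
client = rados.Rados(conffile='/etc/ceph/ceph.conf',
                     rados_id='cinder',
                     conf={'keyring': '/etc/ceph/ceph.client.cinder.keyring'})
try:
    client.connect(timeout=5)  # fails fast if the monitors are unreachable
    print('mon_host from ceph.conf:', client.conf_get('mon_host'))
    print('cluster fsid:', client.get_fsid())
finally:
    client.shutdown()

By default the monitors listen on 6789 (msgr v1) and 3300 (msgr v2), so that
is the traffic to look for between the controllers and the Ceph nodes.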

Looking at the code -
https://github.com/openstack/cinder/blob/stable/train/cinder/volume/drivers/rbd.py#L432

It should raise an exception if there is a timeout while the connection
client is being built:

except self.rados.Error:
    msg = _("Error connecting to ceph cluster.")
    LOG.exception(msg)
    client.shutdown()
    raise exception.VolumeBackendAPIException(data=msg)
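
If I remember the driver correctly, the connect call just above that except
block only gets an explicit timeout when rados_connect_timeout is set (the
default of -1 lets librados wait indefinitely), so while debugging it might
be worth setting it in the backend section, e.g.:

[ceph-rbd]
rados_connect_timeout = 5

(Just a sketch from memory; please double-check the option name against your
cinder release.)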

On Sat, May 15, 2021 at 4:16 AM Sebastian Luna Valero <
sebastian.luna.valero at gmail.com> wrote:

>
> Hi All,
>
> Thanks for your inputs so far. I am also trying to help Manu with this
> issue.
>
> The "cinder-volume" service was working properly with the existing
> configuration. However, after a power outage the service is no longer
> reported as "up".
>
> Looking at the source code, the service status is reported as "down" by
> "cinder-scheduler" here:
>
>
> https://github.com/openstack/cinder/blob/stable/train/cinder/scheduler/host_manager.py#L618
>
> With message: "WARNING cinder.scheduler.host_manager [req-<>- default
> default] volume service is down. (host: rbd:volumes@ceph-rbd)"
>
> I printed out the "service" tuple
> https://github.com/openstack/cinder/blob/stable/train/cinder/scheduler/host_manager.py#L615
> and we get:
>
> "2021-05-15 09:57:24.918 7 WARNING cinder.scheduler.host_manager [<> -
> default default]
> Service(active_backend_id=None,availability_zone='nova',binary='cinder-volume',cluster=<?>,cluster_name=None,created_at=2020-06-12T07:53:42Z,deleted=False,deleted_at=None,disabled=False,disabled_reason=None,frozen=False,host='rbd:volumes@ceph-rbd',
> id=12,modified_at=None,object_current_version='1.38',replication_status='disabled',report_count=8067424,rpc_current_version='3.16',topic='cinder-volume',updated_at=2021-05-12T15:37:52Z,uuid='604668e8-c2e7-46ed-a2b8-086e588079ac')"
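>
> If I read host_manager.py correctly, "down" simply means that the heartbeat
> in that row is stale: updated_at/modified_at is compared against
> CONF.service_down_time (60 seconds by default). Roughly, as a sketch (the
> timestamp is the updated_at value from the tuple above):
>
> from datetime import datetime, timezone
>
> SERVICE_DOWN_TIME = 60  # cinder's default, in seconds
> last_heartbeat = datetime(2021, 5, 12, 15, 37, 52, tzinfo=timezone.utc)
> elapsed = (datetime.now(timezone.utc) - last_heartbeat).total_seconds()
> print('service considered up:', abs(elapsed) <= SERVICE_DOWN_TIME)
>
> So with an updated_at that is three days old, the scheduler will report the
> service as down regardless of whether Ceph itself is reachable.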
>
> Cinder is configured with a Ceph RBD backend, as explained in
> https://github.com/openstack/kolla-ansible/blob/stable/train/doc/source/reference/storage/external-ceph-guide.rst#cinder
>
> That's where the "backend_host=rbd:volumes" configuration is coming from.
>
> We are using 3 controller nodes for OpenStack and 3 monitor nodes for Ceph.
>
> The Ceph cluster doesn't report any error. The "cinder-volume" containers
> don't report any error. Moreover, when we go inside the "cinder-volume"
> container we are able to list existing volumes with:
>
> rbd -p cinder.volumes --id cinder -k /etc/ceph/ceph.client.cinder.keyring ls
>
> So the connection to the Ceph cluster works.
>
> Why is "cinder-scheduler" reporting the that the backend Ceph cluster is
> down?
>
> Many thanks,
> Sebastian
>
>
> On Thu, 13 May 2021 at 13:12, Tobias Urdin <tobias.urdin at binero.com>
> wrote:
>
>> Hello,
>>
>> I just saw that you are running Ceph Octopus with the Train release and
>> wanted to let you know that we saw issues with the os-brick version shipped
>> with Train not supporting the Ceph Octopus client version.
>>
>> So for our Ceph cluster running Octopus we had to keep the client version
>> on Nautilus until upgrading to Victoria which included a newer version of
>> os-brick.
>>
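>> A quick way to compare what is actually installed inside the cinder-volume
>> container is to print the librados and os-brick versions from a python3
>> shell (just a sketch; it assumes both packages are importable there):
>>
>> import pkg_resources
>> import rados
>>
>> # librados (Ceph client library) version linked into python-rados
>> print('librados:', rados.Rados().version())
>> # os-brick version shipped in the container image
>> print('os-brick:', pkg_resources.get_distribution('os-brick').version)
>>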
>> Maybe this is unrelated to your issue but just wanted to put it out there.
>>
>> Best regards
>> Tobias
>>
>> > On 13 May 2021, at 12:55, ManuParra <mparra at iaa.es> wrote:
>> >
>> > Hello Gorka, not yet, let me update the cinder configuration, add the
>> > option, restart cinder, and I'll update the status.
>> > Do you recommend other things to try for this cycle?
>> > Regards.
>> >
>> >> On 13 May 2021, at 09:37, Gorka Eguileor <geguileo at redhat.com> wrote:
>> >>
>> >>> On 13/05, ManuParra wrote:
>> >>> Hi Gorka again. Yes, the first thing is to know why we can't connect
>> >>> to that host (Ceph is actually set up for HA), so that's the way to go.
>> >>> I mention this because the setup has used that hostname from the very
>> >>> beginning and there has never been a problem with it.
>> >>>
>> >>> As for the errors, the strangest thing is that in Monasca I have not
>> >>> found any error logs, only the warning "volume service is down. (host:
>> >>> rbd:volumes@ceph-rbd)" and info messages, which is even stranger.
>> >>
>> >> Have you tried the configuration change I recommended?
>> >>
>> >>
>> >>>
>> >>> Regards.
>> >>>
>> >>>> On 12 May 2021, at 23:34, Gorka Eguileor <geguileo at redhat.com>
>> wrote:
>> >>>>
>> >>>> On 12/05, ManuParra wrote:
>> >>>>> Hi Gorka, let me show the cinder config:
>> >>>>>
>> >>>>> [ceph-rbd]
>> >>>>> rbd_ceph_conf = /etc/ceph/ceph.conf
>> >>>>> rbd_user = cinder
>> >>>>> backend_host = rbd:volumes
>> >>>>> rbd_pool = cinder.volumes
>> >>>>> volume_backend_name = ceph-rbd
>> >>>>> volume_driver = cinder.volume.drivers.rbd.RBDDriver
>> >>>>> …
>> >>>>>
>> >>>>> So, with rbd_exclusive_cinder_pool=True the pool will be used just
>> >>>>> for volumes? But the log says there is no connection to the backend_host.
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> Your backend_host doesn't have a valid hostname, please set a proper
>> >>>> hostname in that configuration option.
>> >>>>
>> >>>> Then the next thing you need to have is the cinder-volume service
>> >>>> running correctly before making any requests.
>> >>>>
>> >>>> I would try adding rbd_exclusive_cinder_pool=true then tailing the
>> >>>> volume logs, and restarting the service.
>> >>>>
>> >>>> See if the logs show any ERROR level entries.
>> >>>>
>> >>>> I would also check the service-list output right after the service is
>> >>>> restarted; if it's up, I would check it again after 2 minutes.
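>> >>>>
>> >>>> Something like the following makes that easy to watch (just a sketch,
>> >>>> assuming admin credentials are sourced in the shell):
>> >>>>
>> >>>> watch -n 30 "openstack volume service list --service cinder-volume"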
>> >>>>
>> >>>> Cheers,
>> >>>> Gorka.
>> >>>>
>> >>>>
>> >>>>>
>> >>>>> Regards.
>> >>>>>
>> >>>>>
>> >>>>>> On 12 May 2021, at 11:49, Gorka Eguileor <geguileo at redhat.com>
>> wrote:
>> >>>>>>
>> >>>>>> On 12/05, ManuParra wrote:
>> >>>>>>> Thanks, I have restarted the service and I see that after a few
>> >>>>>>> minutes the cinder-volume service goes down again when I check it
>> >>>>>>> with the command openstack volume service list.
>> >>>>>>> The host/service that contains the cinder volumes is
>> >>>>>>> rbd:volumes@ceph-rbd, which is RBD in Ceph, so the problem does not
>> >>>>>>> come from Cinder, but rather from Ceph or from the RBD (Ceph) pools
>> >>>>>>> that store the volumes. I have checked Ceph and the status of
>> >>>>>>> everything is correct, no errors or warnings.
>> >>>>>>> The error I have is that cinder can't connect to
>> >>>>>>> rbd:volumes@ceph-rbd. Any further suggestions? Thanks in advance.
>> >>>>>>> Kind regards.
>> >>>>>>>
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> You are most likely using an older release, have a high number of
>> >>>>>> cinder RBD volumes, and have not changed the configuration option
>> >>>>>> "rbd_exclusive_cinder_pool" from its default "false" value.
>> >>>>>>
>> >>>>>> Please add to your driver's section in cinder.conf the following:
>> >>>>>>
>> >>>>>> rbd_exclusive_cinder_pool = true
>> >>>>>>
>> >>>>>>
>> >>>>>> And restart the service.
>> >>>>>>
>> >>>>>> Cheers,
>> >>>>>> Gorka.
>> >>>>>>
>> >>>>>>>> On 11 May 2021, at 22:30, Eugen Block <eblock at nde.ag> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> so restart the volume service ;-)
>> >>>>>>>>
>> >>>>>>>> systemctl restart openstack-cinder-volume.service
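>> >>>>>>>>
>> >>>>>>>> or, if cinder-volume runs in a container (e.g. under kolla-ansible),
>> >>>>>>>> probably something like this instead, assuming the default container
>> >>>>>>>> name:
>> >>>>>>>>
>> >>>>>>>> docker restart cinder_volume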
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Zitat von ManuParra <mparra at iaa.es>:
>> >>>>>>>>
>> >>>>>>>>> Dear OpenStack community,
>> >>>>>>>>>
>> >>>>>>>>> I encountered a problem a few days ago: when creating new
>> >>>>>>>>> volumes with:
>> >>>>>>>>>
>> >>>>>>>>> "openstack volume create --size 20 testmv"
>> >>>>>>>>>
>> >>>>>>>>> the volume creation status shows an error. If I go to the
>> >>>>>>>>> error log detail, it indicates:
>> >>>>>>>>>
>> >>>>>>>>> "Schedule allocate volume: Could not find any available
>> weighted backend".
>> >>>>>>>>>
>> >>>>>>>>> Then I go to the cinder log, and indeed it indicates:
>> >>>>>>>>>
>> >>>>>>>>> "volume service is down - host: rbd:volumes at ceph-rbd”.
>> >>>>>>>>>
>> >>>>>>>>> I check with:
>> >>>>>>>>>
>> >>>>>>>>> "openstack volume service list”  in which state are the
>> services and I see that indeed this happens:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> | cinder-volume | rbd:volumes@ceph-rbd | nova | enabled | down | 2021-04-29T09:48:42.000000 |
>> >>>>>>>>>
>> >>>>>>>>> And it has been stopped since 2021-04-29!
>> >>>>>>>>>
>> >>>>>>>>> I have checked Ceph (monitors, managers, OSDs, etc.) and there
>> >>>>>>>>> are no problems with the Ceph backend; everything is apparently working.
>> >>>>>>>>>
>> >>>>>>>>> This happened after an uncontrolled outage. So my question is:
>> >>>>>>>>> how do I restart only cinder-volume? (I also have cinder-backup and
>> >>>>>>>>> cinder-scheduler, but they are OK.)
>> >>>>>>>>>
>> >>>>>>>>> Thank you very much in advance. Regards.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>> >
>>
>