[openstack-dev] [Cinder] Static Ceph mon connection info prevents VM restart

Arne Wiebalck Arne.Wiebalck at cern.ch
Tue May 12 07:06:06 UTC 2015


Here’s Dan’s answer for the exact procedure (he replied, but it bounced):


We have two clusters with mons behind two DNS aliases:

 cephmon.cern.ch: production cluster with five mons A, B, C, D, E

 cephmond.cern.ch: testing cluster with three mons X, Y, Z


The procedure was:

 1. Stop mon on host X. Remove from DNS alias cephmond. Remove from mon map.

 2. Stop mon on host A. Remove from DNS alias cephmon. Remove from mon map.

 3. Add mon on host X to cephmon cluster. mkfs the new mon, start the ceph-mon process; after quorum add it to the cephmon alias.

 4. Add mon on host A to cephmond cluster. mkfs the new mon, start the ceph-mon process; after quorum add it to the cephmond alias.

 5. Repeat for B/Y and C/Z.
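
To make the end state of the steps above easier to see, here is a small Python sketch that only tracks which host serves which cluster. It is a toy model, not the commands we ran: the function name swap_mons and the single-letter host names are placeholders for illustration.

```python
def swap_mons(prod_hosts, test_hosts, pairs):
    """Toy model of steps 1-5: swap mon hosts between two clusters."""
    prod, test = set(prod_hosts), set(test_hosts)
    for prod_host, test_host in pairs:
        # Steps 1 and 2: stop the mons and remove them from their mon maps.
        test.remove(test_host)
        prod.remove(prod_host)
        # Steps 3 and 4: re-add each host as a fresh mon in the other cluster.
        prod.add(test_host)
        test.add(prod_host)
    return prod, test


prod, test = swap_mons("ABCDE", "XYZ", [("A", "X"), ("B", "Y"), ("C", "Z")])

# A client whose stored connection info still lists the original five mons:
stale_client_list = list("ABCDE")
now_in_test_cluster = [h for h in stale_client_list if h in test]

print(sorted(prod))          # hosts now serving cephmon
print(sorted(test))          # hosts now serving cephmond
print(now_in_test_cluster)   # stale entries that point at the wrong cluster
```

In this model, three of the five entries in a stale client list end up pointing at hosts that belong to the other cluster.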



In the end, three of the hosts that had been running cephmon mons were running cephmond mons. Hence, when a client arrives with a config pointing to an old mon, it gets an authentication denied error and stops there; it does not try the next IP in its list of mons. As a workaround, we moved all the cephmond mons to port 6790. This way the Cinder clients fail over to one of the two cephmon mons that have not changed.
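
The failover behaviour described above, and why the port change helps, can be illustrated with a short Python simulation. This is a sketch of the behaviour we observed, not of the actual client code: the exception classes, the function names, and the assumption that the client walks its mon list in order are all simplifications made for the example.

```python
class ConnectionRefused(Exception):
    """Nothing is listening on that host and port."""


class AuthDenied(Exception):
    """A mon answered, but it belongs to another cluster."""


def connect(endpoint, cluster_of, client_cluster):
    cluster = cluster_of.get(endpoint)
    if cluster is None:
        raise ConnectionRefused(endpoint)
    if cluster != client_cluster:
        raise AuthDenied(endpoint)
    return endpoint


def first_usable_mon(endpoints, cluster_of, client_cluster):
    """Try the next mon after a connection failure, but stop on auth denial."""
    for endpoint in endpoints:
        try:
            return connect(endpoint, cluster_of, client_cluster)
        except ConnectionRefused:
            continue  # connection failure: fall through to the next mon
    return None  # AuthDenied is deliberately not caught, as observed


# Stale client view: the original five cephmon mons on the default port.
client_mons = [(host, 6789) for host in "ABCDE"]

# After the swap, hosts A, B and C answer on 6789 for cephmond.
before = {("A", 6789): "cephmond", ("B", 6789): "cephmond",
          ("C", 6789): "cephmond", ("D", 6789): "cephmon",
          ("E", 6789): "cephmon"}

# Workaround: the cephmond mons listen on 6790 instead.
after = {("A", 6790): "cephmond", ("B", 6790): "cephmond",
         ("C", 6790): "cephmond", ("D", 6789): "cephmon",
         ("E", 6789): "cephmon"}

try:
    first_usable_mon(client_mons, before, "cephmon")
except AuthDenied as exc:
    print("stopped with auth denied at", exc.args[0])

print("after workaround:", first_usable_mon(client_mons, after, "cephmon"))
```

With the cephmond mons on 6790, the stale entries turn into plain connection failures, so the client moves on until it reaches an unchanged cephmon mon.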



Cheers, Dan



On 12 May 2015, at 01:46, Josh Durgin <jdurgin at redhat.com> wrote:

> On 05/08/2015 12:41 AM, Arne Wiebalck wrote:
>> Hi Josh,
>> 
>> In our case adding the monitor hostnames (alias) would have made only a
>> slight difference:
>> as we moved the servers to another cluster, the client received an
>> authorisation failure rather
>> than a connection failure and did not try to fail over to the next IP in
>> the list. So, adding the
>> alias to the list would have improved the chances of hitting a good monitor, but
>> it would not have
>> eliminated the problem.
> 
> Could you provide more details on the procedure you followed to move
> between clusters? I missed the separate clusters part initially, and
> thought you were simply replacing the monitor nodes.
> 
>> I’m not sure storing IPs in the nova database is a good idea in general.
>> Replacing (not adding)
>> these by the hostnames is probably better. Another approach may be to
>> generate this part of
>> connection_info (and hence the XML) dynamically from the local ceph.conf
>> when the connection
>> is created. I think a mechanism like this is for instance used to select
>> a free port for the vnc
>> console when the instance is started.
> 
> Yes, with different clusters only using the hostnames is definitely
> the way to go. I agree that keeping the information in nova's db may
> not be the best idea. It is handy to allow nova to use different
> clusters from cinder, so I'd prefer not generating the connection info
> locally. The qos_specs are also part of connection_info, and if changed
> they would have a similar problem of not applying the new value to
> existing instances, even after reboot. Maybe nova should simply refresh
> the connection info each time it uses a volume.
> 
> Josh
> 
> 
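
The idea quoted above, generating the monitor part of connection_info from the local ceph.conf, could look roughly like the Python sketch below. It is only an illustration under several assumptions: the function name mon_endpoints_from_conf is made up, the "hosts"/"ports" layout is used as a simplified stand-in for the real connection_info, the sample config text is invented, and the host:port splitting does not handle bracketed IPv6 addresses.

```python
import configparser

DEFAULT_MON_PORT = "6789"  # default Ceph mon port at the time of this thread


def mon_endpoints_from_conf(conf_text):
    """Build a hosts/ports dict from the mon host setting in [global]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    # ceph.conf accepts both "mon host" and "mon_host"; normalise the keys.
    options = {}
    if parser.has_section("global"):
        for key, value in parser.items("global"):
            options[key.replace(" ", "_").replace("-", "_")] = value

    hosts, ports = [], []
    for entry in options.get("mon_host", "").replace(",", " ").split():
        host, sep, port = entry.rpartition(":")
        if not sep:  # no explicit port given
            host, port = entry, DEFAULT_MON_PORT
        hosts.append(host)
        ports.append(port)
    return {"hosts": hosts, "ports": ports}


# Invented sample: one DNS alias plus one hypothetical host with a port.
sample_conf = """
[global]
mon host = cephmon.cern.ch, mon-e.example.org:6790
"""

print(mon_endpoints_from_conf(sample_conf))
```

As Josh points out in the quoted reply, reading the local file would stop nova from using a different cluster than cinder, so this only shows the mechanics of the suggestion.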