[openstack-dev] [puppet] [swift] Storage service startup should await ring creation
Mark Kirkwood
mark.kirkwood at catalyst.net.nz
Fri Aug 14 06:25:45 UTC 2015
On 10/07/15 12:43, Mark Kirkwood wrote:
> Hi,
>
> I am using puppet-swift to deploy a multi-node swift cluster (Icehouse),
> following the setup in the supplied tests/site.pp. I am running into two
> issues that both seem to be related to the subject above:
>
> 1/ Errors when the storage replication services try to start before the
> ring files exist, e.g.:
>
> Error: Could not start Service[swift-object-replicator]: Execution of
> '/sbin/start swift-object-replicator' returned 1: start: Job failed to
> start
> Wrapped exception:
> Execution of '/sbin/start swift-object-replicator' returned 1: start:
> Job failed to start
> Error:
> /Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Service[swift-object-replicator]/ensure:
> change from stopped to running failed: Could not start
> Service[swift-object-replicator]: Execution of '/sbin/start
> swift-object-replicator' returned 1: start: Job failed to start
>
> Now these will be fixed the *next* time I do a puppet run (provided I've
> performed a run on the appropriate proxy/ringmaster). However, the
> failing services make scripted testing difficult, as we have to add
> logic to the effect of "don't worry about errors the 1st time".
>
> 2/ Container and object stats not updated without full restart of services
>
> This one is a bit more subtle - everything works but 'swift stat' always
> shows zero objects and bytes for every container. The only way to fix
> this is to stop and start all services on each storage node.
>
> Again this complicates scripted builds, as there is the need to go and
> stop + start all the swift storage services! Not to mention an extra
> little quirk for ops to remember at zero dark thirty o'clock...
>
> I've made a patch that prevents these services from starting until the
> ring files exist (actually, for now it just checks the object ring) - see
> attached.
>
> Now, while I'm not entirely sure that this is the best way to solve the
> issue (a custom fact that changes the service start flag)...I *do* think
> that making the storage services await the ring's existence *is* a needed
> change, so any thoughts on better ways to do this are appreciated.
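>
> To give a rough idea of the shape of it (an illustrative sketch only -
> the fact name $::swift_object_ring_exists and str2bool from
> puppetlabs-stdlib are my placeholders, not necessarily what the
> attached patch does; the fact itself just checks whether
> /etc/swift/object.ring.gz exists on the node):
>
>   # hypothetical fact gating service startup on ring existence
>   $ring_ready = str2bool($::swift_object_ring_exists)
>
>   service { 'swift-object-replicator':
>     # only run the replicator once the object ring has been synced
>     ensure => $ring_ready ? {
>       true    => 'running',
>       default => 'stopped',
>     },
>     enable => $ring_ready,
>   }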
>
> Also note that this change *does* require one more puppet run on the
> storage nodes - three in total, as sketched below:
> - one to create the storage server config and drives
> - one to get the ring from the proxy/ringmaster
> - one to start the services
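>
> In the driving/provisioning script that amounts to three agent runs
> (a sketch):
>
>   puppet agent -t   # run 1: lay down storage config and drives
>   puppet agent -t   # run 2: Swift::Ringsync fetches the ring from the ringmaster
>   puppet agent -t   # run 3: ring files present, storage services start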
>
I decided to work around these rather than trying to battle my patch
into the swift module.
For 1/ I'm trapping the return code for the 1st puppet run and handling
errors there, and not doing anything for any subsequent run, as there
shouldn't be any errors thereafter. Seems to work ok.
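In outline the driving script does something like this (a sketch, not
our actual code - the FIRST_RUN flag is a placeholder for however you
track first runs; puppet's --test implies --detailed-exitcodes, so
0 = no changes, 2 = changes applied, 4 = failures, 6 = both):

  #!/bin/sh
  puppet agent --test
  rc=$?
  if [ "$FIRST_RUN" = "yes" ]; then
      # the ring files don't exist yet, so the replicator services are
      # expected to fail to start - ignore errors on this pass only
      exit 0
  fi
  # subsequent runs must be clean: no changes, or changes applied
  case "$rc" in
      0|2) exit 0 ;;
      *)   exit 1 ;;
  esac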
For 2/ I'm inserting an exec in our driving puppet code to just stop and
start (not restart, as that does not solve it...growl) *all* the services
on a storage node, e.g. (see tests/site.pp in the swift module for context):
# collect resources for synchronizing the ring databases
Swift::Ringsync<<||>>

# stop and start all swift services
# this is a workaround due to the services starting before the ring
# is synced, which stops stats (and maybe other things) working.
# We can't just restart, as this does *not* achieve the same result.
exec { 'stop-start-services':
  provider  => shell,
  command   => 'for x in `initctl list | grep swift | awk \'{print $1}\'`; do stop $x; start $x; done; exit 0',
  path      => '/usr/bin:/usr/sbin:/bin:/sbin',
  logoutput => true,
}

# make sure we stop and start all services after syncing the ring
Swift::Ringsync<||> -> Exec['stop-start-services']
Ok, so it is a pretty big hammer, but it leaves all of the services in a
known good state, so I'm happy.
Regards
Mark