[OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"

Sean Dague sean at dague.net
Mon Mar 10 20:07:30 UTC 2014


So, honestly, running stack.sh / unstack.sh that many times in a row
really isn't expected to work, in my experience. At a minimum, you should
be running ./clean.sh between cycles to try to reset the state further.
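
A minimal sketch of such a cycle, assuming a DevStack checkout that
contains unstack.sh, clean.sh and stack.sh. The function name
reset_devstack and the DEVSTACK_DIR variable are hypothetical and not part
of DevStack itself:

```shell
#!/usr/bin/env bash
# Hypothetical helper: tear down, clean, and re-stack a DevStack checkout.
# DEVSTACK_DIR is an assumed variable; the default path is ~/devstack.
reset_devstack() {
    local dir="${1:-${DEVSTACK_DIR:-$HOME/devstack}}"
    local step
    for step in unstack.sh clean.sh stack.sh; do
        echo "== running $step"
        # Stop at the first failing step so a broken state is not masked.
        (cd "$dir" && ./"$step") || { echo "$step failed" >&2; return 1; }
    done
}
```

Calling reset_devstack ~/devstack would run the three scripts in order
and return non-zero as soon as one of them fails.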

	-Sean

On 03/10/2014 03:00 PM, Dane Leblanc (leblancd) wrote:
> In my case, the base OS is 12.04 Precise.
> 
> The problem is intermittent: it takes maybe 15 to 20 cycles of unstack/stack to get into the failure mode, but once there, the tgt daemon appears to be 100% dead in the water.
> 
> -----Original Message-----
> From: Sean Dague [mailto:sean at dague.net] 
> Sent: Monday, March 10, 2014 1:49 PM
> To: Dane Leblanc (leblancd); openstack-infra at lists.openstack.org
> Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"
> 
> What base OS? A change was made there recently to better handle Debian because we believed (possibly incorrectly) that Precise actually had working init scripts.
> 
> It would be interesting to understand if this was a 100% failure, or only intermittent, and what base OS it was on.
> 
> 	-Sean
> 
> On 03/10/2014 11:37 AM, Dane Leblanc (leblancd) wrote:
>> I don't know if anyone can give me some troubleshooting advice with this issue.
>>
>> I'm seeing an occasional problem whereby after several DevStack unstack.sh/stack.sh cycles, the tgt daemon (tgtd) fails to start during Cinder startup.  Here's a snippet from the stack.sh log:
>>
>> 2014-03-10 07:09:45.214 | Starting Cinder
>> 2014-03-10 07:09:45.215 | + return 0
>> 2014-03-10 07:09:45.216 | + sudo rm -f /etc/tgt/conf.d/stack.conf
>> 2014-03-10 07:09:45.217 | + _configure_tgt_for_config_d
>> 2014-03-10 07:09:45.218 | + [[ ! -d /etc/tgt/stack.d/ ]]
>> 2014-03-10 07:09:45.219 | + is_ubuntu
>> 2014-03-10 07:09:45.220 | + [[ -z deb ]]
>> 2014-03-10 07:09:45.221 | + '[' deb = deb ']'
>> 2014-03-10 07:09:45.222 | + sudo service tgt restart
>> 2014-03-10 07:09:45.223 | stop: Unknown instance: 
>> 2014-03-10 07:09:45.619 | start: Job failed to start 
>> jenkins at neutronpluginsci:~/devstack$
>> 2014-03-10 07:09:45.621 | + exit_trap
>> 2014-03-10 07:09:45.622 | + local r=1
>> 2014-03-10 07:09:45.623 | ++ jobs -p
>> 2014-03-10 07:09:45.624 | + jobs=
>> 2014-03-10 07:09:45.625 | + [[ -n '' ]]
>> 2014-03-10 07:09:45.626 | + exit 1
>>
>> I then tried to restart tgt manually, without success:
>>
>> jenkins at neutronpluginsci:~$ sudo service tgt restart
>> stop: Unknown instance: 
>> start: Job failed to start
>> jenkins at neutronpluginsci:~$ sudo tgtd
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> (null): iser_ib_init(3263) Failed to initialize RDMA; load kernel modules?
>> (null): fcoe_init(214) (null)
>> (null): fcoe_create_interface(171) no interface specified.
>> jenkins at neutronpluginsci:~$
>>
>> The config in /etc/tgt is:
>>
>> jenkins at neutronpluginsci:/etc/tgt$ ls -l
>> total 8
>> drwxr-xr-x 2 root root 4096 Mar 10 07:03 conf.d
>> lrwxrwxrwx 1 root root   30 Mar 10 06:50 stack.d -> /opt/stack/data/cinder/volumes
>> -rw-r--r-- 1 root root   58 Mar 10 07:07 targets.conf
>> jenkins at neutronpluginsci:/etc/tgt$ cat targets.conf
>> include /etc/tgt/conf.d/*.conf
>> include /etc/tgt/stack.d/*
>> jenkins at neutronpluginsci:/etc/tgt$ ls conf.d
>> jenkins at neutronpluginsci:/etc/tgt$ ls /opt/stack/data/cinder/volumes
>> jenkins at neutronpluginsci:/etc/tgt$
>>
>> I don't know if there's any missing Cinder config in my DevStack localrc files. Here's one that I'm using:
>>
>> MYSQL_PASSWORD=nova
>> RABBIT_PASSWORD=nova
>> SERVICE_TOKEN=nova
>> SERVICE_PASSWORD=nova
>> ADMIN_PASSWORD=nova
>> ENABLED_SERVICES=g-api,g-reg,key,n-api,n-crt,n-obj,n-cpu,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,rabbit
>> enable_service mysql
>> disable_service n-net
>> enable_service q-svc
>> enable_service q-agt
>> enable_service q-l3
>> enable_service q-dhcp
>> enable_service q-meta
>> enable_service q-lbaas
>> enable_service neutron
>> enable_service tempest
>> VOLUME_BACKING_FILE_SIZE=2052M
>> Q_PLUGIN=cisco
>> declare -a Q_CISCO_PLUGIN_SUBPLUGINS=(openvswitch nexus)
>> declare -A Q_CISCO_PLUGIN_SWITCH_INFO=([10.0.100.243]=admin:Cisco12345:22:neutronpluginsci:1/9)
>> NCCLIENT_REPO=git://github.com/CiscoSystems/ncclient.git
>> PHYSICAL_NETWORK=physnet1
>> OVS_PHYSICAL_BRIDGE=br-eth1
>> TENANT_VLAN_RANGE=810:819
>> ENABLE_TENANT_VLANS=True
>> API_RATE_LIMIT=False
>> VERBOSE=True
>> DEBUG=True
>> LOGFILE=/opt/stack/logs/stack.sh.log
>> USE_SCREEN=True
>> SCREEN_LOGDIR=/opt/stack/logs
>>
>> Here are links to a log showing another localrc file that I use, and the corresponding stack.sh log:
>>
>> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_log.txt
>> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_log.txt
>>
>> Does anyone have any advice on how to debug this, or recover from this (beyond rebooting the node)? Or am I missing any Cinder config?
>>
>> Thanks in advance for any help on this!!!
>> Dane
>>
>>
> 
> 
> --
> Sean Dague
> Samsung Research America
> sean at dague.net / sean.dague at samsung.com
> http://dague.net
> 


-- 
Sean Dague
Samsung Research America
sean at dague.net / sean.dague at samsung.com
http://dague.net
