<div dir="ltr"><div class="gmail_default" style="font-family:courier new,monospace"><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Mar 10, 2014 at 2:17 PM, Sukhdev Kapur <span dir="ltr"><<a href="mailto:sukhdevkapur@gmail.com" target="_blank">sukhdevkapur@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I should have clarified. In my case it is identical - once it hits the failure, it fails 100% of the time thereafter, i.e. every subsequent run fails.<div>
<br></div><div>HTH</div><span class="HOEnZb"><font color="#888888"><div>
-Sukhdev</div><div><br></div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Mar 10, 2014 at 12:00 PM, Dane Leblanc (leblancd) <span dir="ltr"><<a href="mailto:leblancd@cisco.com" target="_blank">leblancd@cisco.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">In my case, the base OS is 12.04 Precise.<br>
<br>
The problem is intermittent in that it takes maybe 15 to 20 unstack/stack cycles to get it into the failure mode, but once there, the tgt daemon appears to be 100% dead in the water.<br>
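For what it's worth, here's the sort of check I'd run the next time it wedges, to see whether upstart has lost track of a stale tgtd (a leftover process still bound to the iSCSI port would explain why every subsequent start fails identically). This is only a sketch; the job name "tgt" matches the "service tgt restart" calls in the log, and the tgtd flags are the standard ones, but verify on your box:

```shell
#!/bin/sh
# Rough diagnostic sketch (assumes Precise's upstart job is named "tgt",
# as the "sudo service tgt restart" lines in the stack.sh log suggest).

# Is there a stale tgtd that upstart has lost track of? A leftover
# process would make every subsequent "start" fail the same way.
check_stale_tgtd() {
    pgrep -x tgtd || echo "no stale tgtd"
}
check_stale_tgtd

# If nothing is stale, running the daemon in the foreground with debug
# output should surface the real error that upstart swallows:
#   sudo tgtd -f -d 1
```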
<div><div><br>
-----Original Message-----<br>
From: Sean Dague [mailto:<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>]<br>
Sent: Monday, March 10, 2014 1:49 PM<br>
To: Dane Leblanc (leblancd); <a href="mailto:openstack-infra@lists.openstack.org" target="_blank">openstack-infra@lists.openstack.org</a><br>
Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"<br>
<br>
What base OS? A change was made there recently to better handle Debian, because we believed (possibly incorrectly) that Precise actually had working init scripts.<br>
<br>
It would be interesting to understand if this was a 100% failure, or only intermittent, and what base OS it was on.<br>
<br>
-Sean<br>
<br>
On 03/10/2014 11:37 AM, Dane Leblanc (leblancd) wrote:<br>
> I don't know if anyone can give me some troubleshooting advice with this issue.<br>
><br>
> I'm seeing an occasional problem whereby, after several DevStack unstack.sh/stack.sh cycles, the tgt daemon (tgtd) fails to start during Cinder startup. Here's a snippet from the stack.sh log:<br>
><br>
> 2014-03-10 07:09:45.214 | Starting Cinder<br>
> 2014-03-10 07:09:45.215 | + return 0<br>
> 2014-03-10 07:09:45.216 | + sudo rm -f /etc/tgt/conf.d/stack.conf<br>
> 2014-03-10 07:09:45.217 | + _configure_tgt_for_config_d<br>
> 2014-03-10 07:09:45.218 | + [[ ! -d /etc/tgt/stack.d/ ]]<br>
> 2014-03-10 07:09:45.219 | + is_ubuntu<br>
> 2014-03-10 07:09:45.220 | + [[ -z deb ]]<br>
> 2014-03-10 07:09:45.221 | + '[' deb = deb ']'<br>
> 2014-03-10 07:09:45.222 | + sudo service tgt restart<br>
> 2014-03-10 07:09:45.223 | stop: Unknown instance:<br>
> 2014-03-10 07:09:45.619 | start: Job failed to start<br>
> jenkins@neutronpluginsci:~/devstack$ 2014-03-10 07:09:45.621 | + exit_trap<br>
> 2014-03-10 07:09:45.622 | + local r=1<br>
> 2014-03-10 07:09:45.623 | ++ jobs -p<br>
> 2014-03-10 07:09:45.624 | + jobs=<br>
> 2014-03-10 07:09:45.625 | + [[ -n '' ]]<br>
> 2014-03-10 07:09:45.626 | + exit 1<br>
><br>
> Here's what happens when I try to restart tgt manually (without success):<br>
><br>
> jenkins@neutronpluginsci:~$ sudo service tgt restart<br>
> stop: Unknown instance:<br>
> start: Job failed to start<br>
> jenkins@neutronpluginsci:~$ sudo tgtd<br>
> librdmacm: couldn't read ABI version.<br>
> librdmacm: assuming: 4<br>
> CMA: unable to get RDMA device list<br>
> (null): iser_ib_init(3263) Failed to initialize RDMA; load kernel modules?<br>
> (null): fcoe_init(214) (null)<br>
> (null): fcoe_create_interface(171) no interface specified.<br>
> jenkins@neutronpluginsci:~$<br>
><br>
> The config in /etc/tgt is:<br>
><br>
> jenkins@neutronpluginsci:/etc/tgt$ ls -l<br>
> total 8<br>
> drwxr-xr-x 2 root root 4096 Mar 10 07:03 conf.d<br>
> lrwxrwxrwx 1 root root   30 Mar 10 06:50 stack.d -> /opt/stack/data/cinder/volumes<br>
> -rw-r--r-- 1 root root   58 Mar 10 07:07 targets.conf<br>
> jenkins@neutronpluginsci:/etc/tgt$ cat targets.conf<br>
> include /etc/tgt/conf.d/*.conf<br>
> include /etc/tgt/stack.d/*<br>
> jenkins@neutronpluginsci:/etc/tgt$ ls conf.d<br>
> jenkins@neutronpluginsci:/etc/tgt$ ls /opt/stack/data/cinder/volumes<br>
> jenkins@neutronpluginsci:/etc/tgt$<br>
><br>
> I don't know if there's any missing Cinder config in my DevStack localrc files. Here's one that I'm using:<br>
><br>
> MYSQL_PASSWORD=nova<br>
> RABBIT_PASSWORD=nova<br>
> SERVICE_TOKEN=nova<br>
> SERVICE_PASSWORD=nova<br>
> ADMIN_PASSWORD=nova<br>
> ENABLED_SERVICES=g-api,g-reg,key,n-api,n-crt,n-obj,n-cpu,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,rabbit<br>
> enable_service mysql<br>
> disable_service n-net<br>
> enable_service q-svc<br>
> enable_service q-agt<br>
> enable_service q-l3<br>
> enable_service q-dhcp<br>
> enable_service q-meta<br>
> enable_service q-lbaas<br>
> enable_service neutron<br>
> enable_service tempest<br>
> VOLUME_BACKING_FILE_SIZE=2052M<br>
> Q_PLUGIN=cisco<br>
> declare -a Q_CISCO_PLUGIN_SUBPLUGINS=(openvswitch nexus)<br>
> declare -A Q_CISCO_PLUGIN_SWITCH_INFO=([10.0.100.243]=admin:Cisco12345:22:neutronpluginsci:1/9)<br>
> NCCLIENT_REPO=git://github.com/CiscoSystems/ncclient.git<br>
> PHYSICAL_NETWORK=physnet1<br>
> OVS_PHYSICAL_BRIDGE=br-eth1<br>
> TENANT_VLAN_RANGE=810:819<br>
> ENABLE_TENANT_VLANS=True<br>
> API_RATE_LIMIT=False<br>
> VERBOSE=True<br>
> DEBUG=True<br>
> LOGFILE=/opt/stack/logs/stack.sh.log<br>
> USE_SCREEN=True<br>
> SCREEN_LOGDIR=/opt/stack/logs<br>
><br>
> Here are links to a log showing another localrc file that I use, and the corresponding stack.sh log:<br>
><br>
> <a href="http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_log.txt" target="_blank">http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_log.txt</a><br>
> <a href="http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_log.txt" target="_blank">http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_log.txt</a><br>
><br>
> Does anyone have any advice on how to debug this, or recover from this (beyond rebooting the node)? Or am I missing any Cinder config?<br>
><br>
> Thanks in advance for any help on this!!!<br>
> Dane<br>
><br>
><br>
><br>
> _______________________________________________<br>
> OpenStack-Infra mailing list<br>
> <a href="mailto:OpenStack-Infra@lists.openstack.org" target="_blank">OpenStack-Infra@lists.openstack.org</a><br>
> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra</a><br>
><br>
<br>
<br>
--<br>
Sean Dague<br>
Samsung Research America<br>
<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a> / <a href="mailto:sean.dague@samsung.com" target="_blank">sean.dague@samsung.com</a><br>
<a href="http://dague.net" target="_blank">http://dague.net</a><br>
<br>
</div></div></blockquote></div><br></div>
</div></div><br>
<br></blockquote></div><div class="gmail_default" style="font-family:'courier new',monospace">I would expect it to keep failing once you're in this state, and I agree with Sean's point about the number of stack/unstack cycles: that many cycles on an existing install is going to be problematic. Try clean.sh; that may help. Otherwise I'd reboot and do a fresh devstack. That said, see if you can find more info in syslog (and possibly the kernel logs), and we may be able to come up with a more elegant fix. Typically when I hit things like this, clean.sh handles it nicely; failing that, a reboot is not a big deal. I don't think this is a real-world case we're likely to hit. Also, FWIW, I've seen this in the past, so I wouldn't tie it to any recent commit or change anywhere.</div>
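To make the syslog check concrete, something like the filter below would pull the tgt/tgtd-related lines out of a syslog stream, since upstart's "start: Job failed to start" hides the daemon's actual error. The tgt_errors helper is hypothetical (not part of devstack); the grep pattern just matches the usual tgtd syslog tags:

```shell
#!/bin/sh
# Hypothetical helper: show tgt/tgtd-related lines from a syslog file,
# since upstart's "Job failed to start" hides the daemon's own error.
tgt_errors() {
    grep -E 'tgtd?(\[|:)' "$@"
}

# Typical recovery sequence from the devstack checkout:
#   ./clean.sh                        # tear down devstack state first
#   tgt_errors /var/log/syslog | tail -20
#   sudo service tgt start
```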
<br></div></div>