<div dir="ltr">Hi Roey, <div><br></div><div style>Thanks for the tip. I have made the change according to your suggestion and fired off tests for overnight test. Will let you know in the morning if this fixes the issue. </div>
<div style><br></div><div style>Thanks</div><div style>-Sukhdev</div><div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Mar 10, 2014 at 4:17 PM, Roey Chen <span dir="ltr"><<a href="mailto:roeyc@mellanox.com" target="_blank">roeyc@mellanox.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple">
<div style="direction:ltr;font-size:10pt;font-family:Tahoma">
<div>Hi,</div>
<div><br>
</div>
<div>
<div>Hope this could help,</div>
</div>
<div><br>
</div>
<div>I encountered this issue myself not too long ago on an Ubuntu 12.04 host,</div>
<div><span style="font-size:10pt">and it didn't happen again after </span><span style="font-size:10pt">I adjusted the kernel semaphore limit parameters [1].</span></div>
<div><br>
</div>
<div><span style="font-size:10pt">Adding this [2] line to `/etc/sysctl.conf` seems to do the trick.</span></div>
<div><br>
</div>
<div><br>
</div>
<div>- Roey</div>
<div><br>
</div>
<div><br>
</div>
<div>
<div>[1] <a href="http://paste.openstack.org/show/73086/" style="font-size:10pt" target="_blank">http://paste.openstack.org/show/73086/</a></div>
<div>[2] <a href="http://paste.openstack.org/show/73082/" style="font-size:10pt" target="_blank">http://paste.openstack.org/show/73082/</a></div>
<div><br>
</div>
</div>
<div><br>
</div>
<div><br>
<div style="font-size:16px;font-family:Times New Roman">
<hr>
<div style="direction:ltr"><font face="Tahoma" color="#000000"><b>From:</b> Dane Leblanc (leblancd) [<a href="mailto:leblancd@cisco.com" target="_blank">leblancd@cisco.com</a>]<br>
<b>Sent:</b> Monday, March 10, 2014 11:54 PM<br>
<b>To:</b> Sukhdev Kapur; Sean Dague; John Griffith<br>
<b>Cc:</b> <a href="mailto:openstack-infra@lists.openstack.org" target="_blank">openstack-infra@lists.openstack.org</a><div><div class="h5"><br>
<b>Subject:</b> Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"<br>
</div></div></font><br>
</div><div><div class="h5">
<div></div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">Sean, John:</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue"> </span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">I’ve had a similar experience as Sukhdev… I had tried doing clean.sh on every run, but that didn’t help prevent the tgt problem, and it doesn’t help recover from
it.</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">Sounds like the best option is to reset the VM for each run.</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue"> </span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">Thanks,</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue">Dane</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:blue"> </span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Sukhdev Kapur [mailto:<a href="mailto:sukhdevkapur@gmail.com" target="_blank">sukhdevkapur@gmail.com</a>]
<br>
<b>Sent:</b> Monday, March 10, 2014 4:33 PM<br>
<b>To:</b> Sean Dague<br>
<b>Cc:</b> Dane Leblanc (leblancd); <a href="mailto:openstack-infra@lists.openstack.org" target="_blank">openstack-infra@lists.openstack.org</a><br>
<b>Subject:</b> Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"</span></p>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">Hi Sean, </p>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">In my case, for every run, I do unstack.sh, clean.sh, sudo rm -rf devstack, sudo rm -rf /opt/stack. </p>
</div>
<div>
<p class="MsoNormal">Then I go get everything fresh and stack.sh, and a full run of smoke tests</p>
</div>
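<div>The per-run sequence above can be sketched as a small driver script. This is a hypothetical sketch, not the script Sukhdev actually uses: DEVSTACK_REPO and SMOKE_CMD are placeholders (the message names neither the repository URL nor the smoke-test command), and each step runs in its own shell from the directory that contains the devstack checkout.</div>
<pre>
```shell
#!/usr/bin/env bash
# Hypothetical placeholders: point these at the repository and the
# smoke-test command your job actually uses.
DEVSTACK_REPO="${DEVSTACK_REPO:-https://github.com/openstack-dev/devstack.git}"
SMOKE_CMD="${SMOKE_CMD:-echo run-smoke-tests-here}"

# One reset-and-stack cycle, in order. Each entry is run with bash -c from
# the directory that contains the devstack checkout.
cycle_commands=(
    "cd devstack && ./unstack.sh"
    "cd devstack && ./clean.sh"
    "sudo rm -rf devstack"
    "sudo rm -rf /opt/stack"
    "git clone $DEVSTACK_REPO devstack"
    "cd devstack && ./stack.sh"
    "$SMOKE_CMD"
)

# Run the cycle, stopping at the first failing step.
# With DRY_RUN=1 the commands are only printed, prefixed with "+ ".
run_cycle() {
    local cmd
    for cmd in "${cycle_commands[@]}"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "+ $cmd"
        elif ! bash -c "$cmd"; then
            echo "step failed: $cmd" >&2
            return 1
        fi
    done
}
```
</pre>
<div>Running DRY_RUN=1 run_cycle first is a cheap way to review the steps; stopping at the first failure keeps a failed stack.sh (such as the tgt restart) from being followed by a misleading smoke-test run.</div>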
<div>
<p class="MsoNormal">Few iterations of this sequence will get you into this condition. Once in this condition - clean.sh and unstack.sh, nothing helps, it fails solid 100% of times. If reboot the VM, everything works just fine for next 10-20 cycles until it
hits the same condition. So, I am planning on modifying the script to reboot the VM every two hours or so....as a work around....but, the underlying problem occurred close to Ichouse check-ins. I started to notice this few days earlier than the Icehouse deadline,
prior to that I was running the same sequence without any issue (for several weeks) - if that helps any...</p>
</div>
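<div>The planned reboot-every-two-hours workaround could be driven by an uptime check. A minimal sketch, assuming a Linux VM where the first field of /proc/uptime is the uptime in seconds; needs_reboot and its 7200-second default are hypothetical choices matching the "every two hours or so" above.</div>
<pre>
```shell
#!/usr/bin/env bash
# needs_reboot (hypothetical helper): succeed if the host has been up for at
# least MAX_SECONDS (first argument, default 7200, i.e. two hours).
# An optional second argument points at another file in /proc/uptime format
# ("UPTIME_SECONDS IDLE_SECONDS", both with fractional parts).
needs_reboot() {
    local max="${1:-7200}" src="${2:-/proc/uptime}" up _idle
    read -r up _idle < "$src"
    up="${up%%.*}"    # drop the fractional seconds for integer comparison
    [ "$up" -ge "$max" ]
}
```
</pre>
<div>For example, a job could run needs_reboot && sudo reboot before starting a new cycle, so each cycle starts on a VM booted within roughly the last two hours.</div>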
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">-Sukhdev</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"> </p>
<div>
<p class="MsoNormal">On Mon, Mar 10, 2014 at 1:07 PM, Sean Dague <<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>> wrote:</p>
<p class="MsoNormal">So, honestly, running stack.sh / unstack.sh that many times in a row<br>
really isn't expected to work in my experience. You should at minimum be<br>
doing ./clean.sh to try to reset the state further.<br>
<span style="color:#888888"><br>
<span> -Sean</span></span></p>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
On 03/10/2014 03:00 PM, Dane Leblanc (leblancd) wrote:<br>
> In my case, the base OS is 12.04 Precise.<br>
><br>
> The problem is intermittent in that it takes maybe 15 to 20 cycles of unstack/stack to get it into the failure mode, but once in the failure mode, the tgt daemon appears to be 100% dead in the water.<br>
><br>
> -----Original Message-----<br>
> From: Sean Dague [mailto:<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>]<br>
> Sent: Monday, March 10, 2014 1:49 PM<br>
> To: Dane Leblanc (leblancd); <a href="mailto:openstack-infra@lists.openstack.org" target="_blank">
openstack-infra@lists.openstack.org</a><br>
> Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"<br>
><br>
> What base OS? A change was made there recently to better handle Debian because we believed (possibly incorrectly) that Precise actually had working init scripts.<br>
><br>
> It would be interesting to understand whether this is a 100% failure or only intermittent, and what base OS it was on.<br>
><br>
> -Sean<br>
><br>
> On 03/10/2014 11:37 AM, Dane Leblanc (leblancd) wrote:<br>
>> I wonder if anyone can give me some troubleshooting advice on this issue.<br>
>><br>
>> I'm seeing an occasional problem whereby after several DevStack unstack.sh/stack.sh cycles, the tgt daemon (tgtd) fails to start during Cinder startup. Here's a snippet from the stack.sh log:<br>
>><br>
>> 2014-03-10 07:09:45.214 | Starting Cinder<br>
>> 2014-03-10 07:09:45.215 | + return 0<br>
>> 2014-03-10 07:09:45.216 | + sudo rm -f /etc/tgt/conf.d/stack.conf<br>
>> 2014-03-10 07:09:45.217 | + _configure_tgt_for_config_d<br>
>> 2014-03-10 07:09:45.218 | + [[ ! -d /etc/tgt/stack.d/ ]]<br>
>> 2014-03-10 07:09:45.219 | + is_ubuntu<br>
>> 2014-03-10 07:09:45.220 | + [[ -z deb ]]<br>
>> 2014-03-10 07:09:45.221 | + '[' deb = deb ']'<br>
>> 2014-03-10 07:09:45.222 | + sudo service tgt restart<br>
>> 2014-03-10 07:09:45.223 | stop: Unknown instance:<br>
>> 2014-03-10 07:09:45.619 | start: Job failed to start<br>
>> jenkins@neutronpluginsci:~/devstack$ 2014-03-10 07:09:45.621 | + exit_trap<br>
>> 2014-03-10 07:09:45.622 | + local r=1<br>
>> 2014-03-10 07:09:45.623 | ++ jobs -p<br>
>> 2014-03-10 07:09:45.624 | + jobs=<br>
>> 2014-03-10 07:09:45.625 | + [[ -n '' ]]<br>
>> 2014-03-10 07:09:45.626 | + exit 1<br>
>><br>
>> If I try to restart tgt manually, it still fails:<br>
>><br>
>> jenkins@neutronpluginsci:~$ sudo service tgt restart<br>
>> stop: Unknown instance:<br>
>> start: Job failed to start<br>
>> jenkins@neutronpluginsci:~$ sudo tgtd<br>
>> librdmacm: couldn't read ABI version.<br>
>> librdmacm: assuming: 4<br>
>> CMA: unable to get RDMA device list<br>
>> (null): iser_ib_init(3263) Failed to initialize RDMA; load kernel modules?<br>
>> (null): fcoe_init(214) (null)<br>
>> (null): fcoe_create_interface(171) no interface specified.<br>
>> jenkins@neutronpluginsci:~$<br>
>><br>
>> The config in /etc/tgt is:<br>
>><br>
>> jenkins@neutronpluginsci:/etc/tgt$ ls -l<br>
>> total 8<br>
>> drwxr-xr-x 2 root root 4096 Mar 10 07:03 conf.d<br>
>> lrwxrwxrwx 1 root root 30 Mar 10 06:50 stack.d -> /opt/stack/data/cinder/volumes<br>
>> -rw-r--r-- 1 root root 58 Mar 10 07:07 targets.conf<br>
>> jenkins@neutronpluginsci:/etc/tgt$ cat targets.conf<br>
>> include /etc/tgt/conf.d/*.conf<br>
>> include /etc/tgt/stack.d/*<br>
>> jenkins@neutronpluginsci:/etc/tgt$ ls conf.d<br>
>> jenkins@neutronpluginsci:/etc/tgt$ ls /opt/stack/data/cinder/volumes<br>
>> jenkins@neutronpluginsci:/etc/tgt$<br>
>><br>
>> I don't know if there's any missing Cinder config in my DevStack localrc files. Here's one that I'm using:<br>
>><br>
>> MYSQL_PASSWORD=nova<br>
>> RABBIT_PASSWORD=nova<br>
>> SERVICE_TOKEN=nova<br>
>> SERVICE_PASSWORD=nova<br>
>> ADMIN_PASSWORD=nova<br>
>> ENABLED_SERVICES=g-api,g-reg,key,n-api,n-crt,n-obj,n-cpu,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,rabbit<br>
>> enable_service mysql<br>
>> disable_service n-net<br>
>> enable_service q-svc<br>
>> enable_service q-agt<br>
>> enable_service q-l3<br>
>> enable_service q-dhcp<br>
>> enable_service q-meta<br>
>> enable_service q-lbaas<br>
>> enable_service neutron<br>
>> enable_service tempest<br>
>> VOLUME_BACKING_FILE_SIZE=2052M<br>
>> Q_PLUGIN=cisco<br>
>> declare -a Q_CISCO_PLUGIN_SUBPLUGINS=(openvswitch nexus)<br>
>> declare -A Q_CISCO_PLUGIN_SWITCH_INFO=([10.0.100.243]=admin:Cisco12345:22:neutronpluginsci:1/9)<br>
>> NCCLIENT_REPO=git://github.com/CiscoSystems/ncclient.git<br>
>> PHYSICAL_NETWORK=physnet1<br>
>> OVS_PHYSICAL_BRIDGE=br-eth1<br>
>> TENANT_VLAN_RANGE=810:819<br>
>> ENABLE_TENANT_VLANS=True<br>
>> API_RATE_LIMIT=False<br>
>> VERBOSE=True<br>
>> DEBUG=True<br>
>> LOGFILE=/opt/stack/logs/stack.sh.log<br>
>> USE_SCREEN=True<br>
>> SCREEN_LOGDIR=/opt/stack/logs<br>
>><br>
>> Here are links to a log showing another localrc file that I use, and the corresponding stack.sh log:<br>
>><br>
>> <a href="http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_lo" target="_blank">
http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_lo</a><br>
>> g.txt<br>
>> <a href="http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_l" target="_blank">
http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_l</a><br>
>> og.txt<br>
>><br>
>> Does anyone have any advice on how to debug this, or recover from this (beyond rebooting the node)? Or am I missing any Cinder config?<br>
>><br>
>> Thanks in advance for any help on this!!!<br>
>> Dane<br>
>><br>
>><br>
>><br>
>> _______________________________________________<br>
>> OpenStack-Infra mailing list<br>
>> <a href="mailto:OpenStack-Infra@lists.openstack.org" target="_blank">OpenStack-Infra@lists.openstack.org</a><br>
>> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra" target="_blank">
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra</a><br>
>><br>
><br>
><br>
> --<br>
> Sean Dague<br>
> Samsung Research America<br>
> <a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a> / <a href="mailto:sean.dague@samsung.com" target="_blank">
sean.dague@samsung.com</a><br>
> <a href="http://dague.net" target="_blank">http://dague.net</a><br>
><br>
</p>
</div>
</div>
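<div>Regarding Dane's question about debugging this without a reboot: Roey's fix (raising the kernel semaphore limits) suggests checking how many System V semaphore arrays are allocated on a host in the failed state and comparing that with SEMMNI. The sketch below is a hedged diagnostic built on that assumption, not a confirmed root cause; count_sem_arrays and sem_usage_report are hypothetical helper names.</div>
<pre>
```shell
#!/usr/bin/env bash
# count_sem_arrays (hypothetical helper): count the semaphore arrays listed in
# `ipcs -s` output read from stdin. Data rows start with a hex key such as
# 0x00000000; the header and separator lines do not.
count_sem_arrays() {
    # grep -c prints 0 and exits 1 when nothing matches; keep exit status 0.
    grep -c '^0x' || true
}

# sem_usage_report (hypothetical helper): print the number of arrays in use
# next to the SEMMNI limit, the fourth field of /proc/sys/kernel/sem.
sem_usage_report() {
    local in_use semmni
    in_use=$(ipcs -s | count_sem_arrays)
    semmni=$(awk '{print $4}' /proc/sys/kernel/sem)
    echo "semaphore arrays in use: $in_use (SEMMNI limit: $semmni)"
}
```
</pre>
<div>If the count is at or near SEMMNI, the ipcs -s listing shows which owners hold the arrays; ipcrm -s SEMID removes a single array, but only remove arrays you are sure are stale.</div>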
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
_______________________________________________<br>
OpenStack-Infra mailing list<br>
<a href="mailto:OpenStack-Infra@lists.openstack.org" target="_blank">OpenStack-Infra@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra</a></p>
</div>
<p class="MsoNormal"> </p>
</div>
</div>
</div>
</div></div></div>
</div>
</div>
</div>
</blockquote></div><br></div>