[OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"
Dane Leblanc (leblancd)
leblancd at cisco.com
Wed Mar 12 20:58:21 UTC 2014
Hi Roey:
Looks like your suggested changes to /etc/sysctl.conf have done the trick… I haven’t seen the problem with tgtd failing to start since I made this change.
Thanks!
Dane
From: Sukhdev Kapur [mailto:sukhdevkapur at gmail.com]
Sent: Monday, March 10, 2014 11:19 PM
To: Roey Chen
Cc: Dane Leblanc (leblancd); Sean Dague; John Griffith; openstack-infra at lists.openstack.org
Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"
Hi Roey,
Thanks for the tip. I have made the change you suggested and started an overnight test run. I will let you know in the morning whether this fixes the issue.
Thanks
-Sukhdev
On Mon, Mar 10, 2014 at 4:17 PM, Roey Chen <roeyc at mellanox.com> wrote:
Hi,
Hope this helps.
I ran into this issue myself not too long ago on an Ubuntu 12.04 host.
It didn't happen again after I adjusted the kernel semaphore limit parameters [1].
Adding this line [2] to `/etc/sysctl.conf` seems to do the trick.
- Roey
[1] http://paste.openstack.org/show/73086/
[2] http://paste.openstack.org/show/73082/
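For readers without access to the pastes above, here is a minimal sketch for inspecting the semaphore limits under discussion. It assumes a Linux host where /proc/sys/kernel/sem holds four values in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI. The function names describe_sem and count_sem_arrays are made up for illustration, and the exact values from [2] are not reproduced here.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of DevStack or tgt.

# describe_sem LINE: label the four fields of a kernel.sem-style line,
# e.g. "250 32000 32 128" (field order: SEMMSL SEMMNS SEMOPM SEMMNI).
describe_sem() {
    # Unquoted on purpose: split the line on spaces or tabs.
    set -- $1
    echo "SEMMSL=$1 SEMMNS=$2 SEMOPM=$3 SEMMNI=$4"
}

# count_sem_arrays: read `ipcs -s` output on stdin and count the rows
# that describe a semaphore array (each such row starts with a hex key).
count_sem_arrays() {
    grep -c '^0x' || true
}

# Print the live limits when running on a Linux host.
if [ -r /proc/sys/kernel/sem ]; then
    describe_sem "$(cat /proc/sys/kernel/sem)"
fi
```

Running `ipcs -s | count_sem_arrays` and comparing the result with SEMMNI shows how close the host is to the limit on semaphore arrays. A persistent change takes the form `kernel.sem = <SEMMSL> <SEMMNS> <SEMOPM> <SEMMNI>` in /etc/sysctl.conf and is loaded with `sudo sysctl -p`. Whether exhausted semaphore arrays are what stops tgtd in this case is an assumption, not something confirmed in this thread.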
________________________________
From: Dane Leblanc (leblancd) [leblancd at cisco.com]
Sent: Monday, March 10, 2014 11:54 PM
To: Sukhdev Kapur; Sean Dague; John Griffith
Cc: openstack-infra at lists.openstack.org
Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"
Sean, John:
I’ve had an experience similar to Sukhdev’s… I tried running clean.sh on every run, but that neither prevents the tgt problem nor helps recover from it.
Sounds like the best option is to reset the VM for each run.
Thanks,
Dane
From: Sukhdev Kapur [mailto:sukhdevkapur at gmail.com]
Sent: Monday, March 10, 2014 4:33 PM
To: Sean Dague
Cc: Dane Leblanc (leblancd); openstack-infra at lists.openstack.org
Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"
Hi Sean,
In my case, for every run, I do unstack.sh, clean.sh, sudo rm -rf devstack, sudo rm -rf /opt/stack.
Then I fetch everything fresh and run stack.sh, followed by a full run of the smoke tests.
A few iterations of this sequence will get you into this condition. Once in this condition, nothing helps, neither clean.sh nor unstack.sh; it fails solidly 100% of the time. If I reboot the VM, everything works just fine for the next 10-20 cycles until it hits the same condition again. So, as a workaround, I am planning to modify the script to reboot the VM every two hours or so.
The underlying problem, however, first appeared close to the Icehouse check-ins. I started to notice it a few days before the Icehouse deadline; prior to that, I had been running the same sequence without any issue for several weeks, if that helps any.
-Sukhdev
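The cycle described above can be scripted so that the first failing iteration is recorded. This is a minimal sketch, assuming it is run from the devstack checkout; run_cycles is a hypothetical helper and not part of DevStack.

```shell
#!/bin/sh
# Sketch only: run_cycles is a made-up helper name.

# run_cycles MAX CMD...: run CMD up to MAX times and stop at the first
# failure, reporting which cycle failed.
run_cycles() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on cycle $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $max cycles"
}

# Example invocation (left commented out because it rebuilds the stack):
# run_cycles 20 sh -c './unstack.sh && ./clean.sh && ./stack.sh'
```

Knowing the cycle number at which stack.sh first fails makes it easier to compare runs before and after a configuration change.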
On Mon, Mar 10, 2014 at 1:07 PM, Sean Dague <sean at dague.net> wrote:
So, honestly, running stack.sh / unstack.sh that many times in a row
really isn't expected to work in my experience. You should at minimum be
doing ./clean.sh to try to reset the state further.
-Sean
On 03/10/2014 03:00 PM, Dane Leblanc (leblancd) wrote:
> In my case, the base OS is 12.04 Precise.
>
> The problem is intermittent in that it takes maybe 15 to 20 cycles of unstack/stack to get it into the failure mode, but once in the failure mode, it appears that tgt daemon is 100% dead-in-the-water.
>
> -----Original Message-----
> From: Sean Dague [mailto:sean at dague.net]
> Sent: Monday, March 10, 2014 1:49 PM
> To: Dane Leblanc (leblancd); openstack-infra at lists.openstack.org
> Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"
>
> What base OS? A change was made there recently to better handle debian because we believed (possibly incorrectly) that precise actually had working init scripts.
>
> It would be interesting to understand if this was a 100% failure, or only intermittent, and what base OS it was on.
>
> -Sean
>
> On 03/10/2014 11:37 AM, Dane Leblanc (leblancd) wrote:
>> I don't know if anyone can give me some troubleshooting advice with this issue.
>>
>> I'm seeing an occasional problem whereby after several DevStack unstack.sh/stack.sh cycles, the tgt daemon (tgtd) fails to start during Cinder startup. Here's a snippet from the stack.sh log:
>>
>> 2014-03-10 07:09:45.214 | Starting Cinder
>> 2014-03-10 07:09:45.215 | + return 0
>> 2014-03-10 07:09:45.216 | + sudo rm -f /etc/tgt/conf.d/stack.conf
>> 2014-03-10 07:09:45.217 | + _configure_tgt_for_config_d
>> 2014-03-10 07:09:45.218 | + [[ ! -d /etc/tgt/stack.d/ ]]
>> 2014-03-10 07:09:45.219 | + is_ubuntu
>> 2014-03-10 07:09:45.220 | + [[ -z deb ]]
>> 2014-03-10 07:09:45.221 | + '[' deb = deb ']'
>> 2014-03-10 07:09:45.222 | + sudo service tgt restart
>> 2014-03-10 07:09:45.223 | stop: Unknown instance:
>> 2014-03-10 07:09:45.619 | start: Job failed to start
>> jenkins at neutronpluginsci:~/devstack$ 2014-03-10 07:09:45.621 | + exit_trap
>> 2014-03-10 07:09:45.622 | + local r=1
>> 2014-03-10 07:09:45.623 | ++ jobs -p
>> 2014-03-10 07:09:45.624 | + jobs=
>> 2014-03-10 07:09:45.625 | + [[ -n '' ]]
>> 2014-03-10 07:09:45.626 | + exit 1
>>
>> I tried to restart tgt manually, without success:
>>
>> jenkins at neutronpluginsci:~$ sudo service tgt restart
>> stop: Unknown instance:
>> start: Job failed to start
>> jenkins at neutronpluginsci:~$ sudo tgtd
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> (null): iser_ib_init(3263) Failed to initialize RDMA; load kernel modules?
>> (null): fcoe_init(214) (null)
>> (null): fcoe_create_interface(171) no interface specified.
>> jenkins at neutronpluginsci:~$
>>
>> The config in /etc/tgt is:
>>
>> jenkins at neutronpluginsci:/etc/tgt$ ls -l
>> total 8
>> drwxr-xr-x 2 root root 4096 Mar 10 07:03 conf.d
>> lrwxrwxrwx 1 root root 30 Mar 10 06:50 stack.d -> /opt/stack/data/cinder/volumes
>> -rw-r--r-- 1 root root 58 Mar 10 07:07 targets.conf
>> jenkins at neutronpluginsci:/etc/tgt$ cat targets.conf
>> include /etc/tgt/conf.d/*.conf
>> include /etc/tgt/stack.d/*
>> jenkins at neutronpluginsci:/etc/tgt$ ls conf.d
>> jenkins at neutronpluginsci:/etc/tgt$ ls /opt/stack/data/cinder/volumes
>> jenkins at neutronpluginsci:/etc/tgt$
>>
>> I don't know if there's any missing Cinder config in my DevStack localrc files. Here's one that I'm using:
>>
>> MYSQL_PASSWORD=nova
>> RABBIT_PASSWORD=nova
>> SERVICE_TOKEN=nova
>> SERVICE_PASSWORD=nova
>> ADMIN_PASSWORD=nova
>> ENABLED_SERVICES=g-api,g-reg,key,n-api,n-crt,n-obj,n-cpu,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,rabbit
>> enable_service mysql
>> disable_service n-net
>> enable_service q-svc
>> enable_service q-agt
>> enable_service q-l3
>> enable_service q-dhcp
>> enable_service q-meta
>> enable_service q-lbaas
>> enable_service neutron
>> enable_service tempest
>> VOLUME_BACKING_FILE_SIZE=2052M
>> Q_PLUGIN=cisco
>> declare -a Q_CISCO_PLUGIN_SUBPLUGINS=(openvswitch nexus)
>> declare -A Q_CISCO_PLUGIN_SWITCH_INFO=([10.0.100.243]=admin:Cisco12345:22:neutronpluginsci:1/9)
>> NCCLIENT_REPO=git://github.com/CiscoSystems/ncclient.git
>> PHYSICAL_NETWORK=physnet1
>> OVS_PHYSICAL_BRIDGE=br-eth1
>> TENANT_VLAN_RANGE=810:819
>> ENABLE_TENANT_VLANS=True
>> API_RATE_LIMIT=False
>> VERBOSE=True
>> DEBUG=True
>> LOGFILE=/opt/stack/logs/stack.sh.log
>> USE_SCREEN=True
>> SCREEN_LOGDIR=/opt/stack/logs
>>
>> Here are links to a log showing another localrc file that I use, and the corresponding stack.sh log:
>>
>> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_log.txt
>> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_log.txt
>>
>> Does anyone have any advice on how to debug this, or recover from this (beyond rebooting the node)? Or am I missing any Cinder config?
>>
>> Thanks in advance for any help on this!!!
>> Dane
>>
>>
>>
>> _______________________________________________
>> OpenStack-Infra mailing list
>> OpenStack-Infra at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
>>
>
>
> --
> Sean Dague
> Samsung Research America
> sean at dague.net / sean.dague at samsung.com
> http://dague.net
>
--
Sean Dague
Samsung Research America
sean at dague.net / sean.dague at samsung.com
http://dague.net