[OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"

John Griffith john.griffith at solidfire.com
Mon Mar 10 20:29:03 UTC 2014


On Mon, Mar 10, 2014 at 2:17 PM, Sukhdev Kapur <sukhdevkapur at gmail.com> wrote:

> I should have clarified. In my case it is identical - once it hits the
> failure, it fails 100% of the time after that - i.e. every subsequent run
> fails.
>
> HTH
> -Sukhdev
>
>
>
> On Mon, Mar 10, 2014 at 12:00 PM, Dane Leblanc (leblancd) <leblancd at cisco.com> wrote:
>
>> In my case, the base OS is 12.04 Precise.
>>
>> The problem is intermittent in that it takes maybe 15 to 20 cycles of
>> unstack/stack to get it into the failure mode, but once in the failure
>> mode, it appears that tgt daemon is 100% dead-in-the-water.
>>
>> -----Original Message-----
>> From: Sean Dague [mailto:sean at dague.net]
>> Sent: Monday, March 10, 2014 1:49 PM
>> To: Dane Leblanc (leblancd); openstack-infra at lists.openstack.org
>> Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup
>> "start: job failed to start"
>>
>> What base OS? A change was made there recently to better handle Debian
>> because we believed (possibly incorrectly) that Precise actually had
>> working init scripts.
>>
>> It would be interesting to understand if this was a 100% failure, or only
>> intermittent, and what base OS it was on.
>>
>>         -Sean
>>
>> On 03/10/2014 11:37 AM, Dane Leblanc (leblancd) wrote:
>> > I don't know if anyone can give me some troubleshooting advice with
>> > this issue.
>> >
>> > I'm seeing an occasional problem whereby after several DevStack
>> > unstack.sh/stack.sh cycles, the tgt daemon (tgtd) fails to start during
>> > Cinder startup.  Here's a snippet from the stack.sh log:
>> >
>> > 2014-03-10 07:09:45.214 | Starting Cinder
>> > 2014-03-10 07:09:45.215 | + return 0
>> > 2014-03-10 07:09:45.216 | + sudo rm -f /etc/tgt/conf.d/stack.conf
>> > 2014-03-10 07:09:45.217 | + _configure_tgt_for_config_d
>> > 2014-03-10 07:09:45.218 | + [[ ! -d /etc/tgt/stack.d/ ]]
>> > 2014-03-10 07:09:45.219 | + is_ubuntu
>> > 2014-03-10 07:09:45.220 | + [[ -z deb ]]
>> > 2014-03-10 07:09:45.221 | + '[' deb = deb ']'
>> > 2014-03-10 07:09:45.222 | + sudo service tgt restart
>> > 2014-03-10 07:09:45.223 | stop: Unknown instance:
>> > 2014-03-10 07:09:45.619 | start: Job failed to start
jenkins at neutronpluginsci:~/devstack$ 2014-03-10 07:09:45.621 | + exit_trap
>> > 2014-03-10 07:09:45.622 | + local r=1
>> > 2014-03-10 07:09:45.623 | ++ jobs -p
>> > 2014-03-10 07:09:45.624 | + jobs=
>> > 2014-03-10 07:09:45.625 | + [[ -n '' ]]
>> > 2014-03-10 07:09:45.626 | + exit 1
>> >
>> > If I try to restart tgt manually, it still fails:
>> >
>> > jenkins at neutronpluginsci:~$ sudo service tgt restart
>> > stop: Unknown instance:
>> > start: Job failed to start
>> > jenkins at neutronpluginsci:~$ sudo tgtd
>> > librdmacm: couldn't read ABI version.
>> > librdmacm: assuming: 4
>> > CMA: unable to get RDMA device list
>> > (null): iser_ib_init(3263) Failed to initialize RDMA; load kernel modules?
>> > (null): fcoe_init(214) (null)
>> > (null): fcoe_create_interface(171) no interface specified.
>> > jenkins at neutronpluginsci:~$
>> >
>> > The config in /etc/tgt is:
>> >
>> > jenkins at neutronpluginsci:/etc/tgt$ ls -l
>> > total 8
>> > drwxr-xr-x 2 root root 4096 Mar 10 07:03 conf.d
>> > lrwxrwxrwx 1 root root   30 Mar 10 06:50 stack.d -> /opt/stack/data/cinder/volumes
>> > -rw-r--r-- 1 root root   58 Mar 10 07:07 targets.conf
>> > jenkins at neutronpluginsci:/etc/tgt$ cat targets.conf
>> > include /etc/tgt/conf.d/*.conf
>> > include /etc/tgt/stack.d/*
>> > jenkins at neutronpluginsci:/etc/tgt$ ls conf.d
>> > jenkins at neutronpluginsci:/etc/tgt$ ls /opt/stack/data/cinder/volumes
>> > jenkins at neutronpluginsci:/etc/tgt$
>> >
>> > I don't know whether any Cinder config is missing from my DevStack
>> > localrc files. Here's one that I'm using:
>> >
>> > MYSQL_PASSWORD=nova
>> > RABBIT_PASSWORD=nova
>> > SERVICE_TOKEN=nova
>> > SERVICE_PASSWORD=nova
>> > ADMIN_PASSWORD=nova
>> > ENABLED_SERVICES=g-api,g-reg,key,n-api,n-crt,n-obj,n-cpu,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,rabbit
>> > enable_service mysql
>> > disable_service n-net
>> > enable_service q-svc
>> > enable_service q-agt
>> > enable_service q-l3
>> > enable_service q-dhcp
>> > enable_service q-meta
>> > enable_service q-lbaas
>> > enable_service neutron
>> > enable_service tempest
>> > VOLUME_BACKING_FILE_SIZE=2052M
>> > Q_PLUGIN=cisco
>> > declare -a Q_CISCO_PLUGIN_SUBPLUGINS=(openvswitch nexus)
>> > declare -A Q_CISCO_PLUGIN_SWITCH_INFO=([10.0.100.243]=admin:Cisco12345:22:neutronpluginsci:1/9)
>> > NCCLIENT_REPO=git://github.com/CiscoSystems/ncclient.git
>> > PHYSICAL_NETWORK=physnet1
>> > OVS_PHYSICAL_BRIDGE=br-eth1
>> > TENANT_VLAN_RANGE=810:819
>> > ENABLE_TENANT_VLANS=True
>> > API_RATE_LIMIT=False
>> > VERBOSE=True
>> > DEBUG=True
>> > LOGFILE=/opt/stack/logs/stack.sh.log
>> > USE_SCREEN=True
>> > SCREEN_LOGDIR=/opt/stack/logs
>> >
>> > Here are links to a log showing another localrc file that I use, and
>> > the corresponding stack.sh log:
>> >
>> > http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_log.txt
>> > http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_log.txt
>> >
>> > Does anyone have any advice on how to debug this, or recover from this
>> > (beyond rebooting the node)? Or am I missing any Cinder config?
>> >
>> > Thanks in advance for any help on this!!!
>> > Dane
>> >
>> >
>> >
>> > _______________________________________________
>> > OpenStack-Infra mailing list
>> > OpenStack-Infra at lists.openstack.org
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
>> >
>>
>>
>> --
>> Sean Dague
>> Samsung Research America
>> sean at dague.net / sean.dague at samsung.com
>> http://dague.net
>>
>

I would expect it to continue to fail once you're in this state.  I would
have to agree with Sean's comment about running stack/unstack that many
cycles.  Try clean.sh, which may help; otherwise I'd say reboot and do a
fresh devstack.  That many cycles on an existing install is going to be
problematic.  That being said, see if you can find more info in syslog and
possibly the kernel logs, and we may be able to come up with a more
elegant fix.  Typically if/when I hit things like this I run clean.sh and
it handles things nicely.  Otherwise I reboot and it's not a big deal.  I
don't think this is a real-world case that we're likely to hit.  Also,
FWIW, I've seen this in the past, so I wouldn't tie it to any recent
commit or change anywhere.
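For the syslog/kernel-log suggestion above, here is a minimal diagnostic sketch. It assumes Ubuntu 12.04 with Upstart and the default log locations. Several parts of it are hypothetical and are not taken from DevStack or tgt: the function name collect_tgt_diagnostics, the /var/log/upstart/tgt.log path (which only exists if Upstart job logging captured output), and the check for a stray tgtd. That last check is there because a tgtd process that Upstart is not tracking, such as one started by hand with "sudo tgtd", could plausibly explain "stop: Unknown instance:" followed by "start: Job failed to start". Treat this as a hypothesis to rule in or out, not as a diagnosis.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DevStack or tgt): dump the state worth
# checking when "service tgt restart" reports "start: Job failed to start".
# Assumes Ubuntu 12.04 (Upstart) log locations; run as root so syslog and
# dmesg are readable. Every command is allowed to fail, so a missing tool
# or log file only leaves its section empty instead of aborting.
collect_tgt_diagnostics() {
  local lines=${1:-50}   # how many trailing log lines to show per section

  echo "== upstart job status =="
  initctl status tgt 2>&1 || true

  echo "== tgtd processes =="
  # Assumption: a tgtd that Upstart is not tracking (e.g. one started by
  # hand) could make the Upstart-managed start fail.
  pgrep -l tgtd || echo "(no tgtd process found)"

  echo "== listeners on iSCSI port 3260 =="
  netstat -lntp 2>/dev/null | grep ':3260' || echo "(nothing listening on 3260)"

  echo "== upstart job log =="
  tail -n "$lines" /var/log/upstart/tgt.log 2>&1 || true

  echo "== syslog (tgt entries) =="
  grep -i tgt /var/log/syslog 2>/dev/null | tail -n "$lines" || true

  echo "== kernel log =="
  dmesg 2>/dev/null | tail -n "$lines" || true
}
```

Usage, as root: source the file and run "collect_tgt_diagnostics 100" to see the last 100 lines of each log. Printing labelled sections keeps the output easy to paste into a reply on this list.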