[OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"

Sukhdev Kapur sukhdevkapur at gmail.com
Mon Mar 10 20:32:36 UTC 2014


Hi Sean,

In my case, for every run I do unstack.sh, clean.sh, sudo rm -rf devstack,
and sudo rm -rf /opt/stack.
Then I fetch everything fresh, run stack.sh, and do a full run of smoke tests.
A few iterations of this sequence will get you into this condition. Once in
this condition, neither clean.sh nor unstack.sh helps; it fails solid
100% of the time. If I reboot the VM, everything works just fine for the next
10-20 cycles until it hits the same condition again. So I am planning on
modifying the script to reboot the VM every two hours or so as a workaround.
The underlying problem, though, first appeared close to the Icehouse
check-ins. I started to notice it a few days before the Icehouse deadline;
prior to that I was running the same sequence without any issue for several
weeks, if that helps any.
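
For what it's worth, the cycle and the reboot workaround would look roughly
like the sketch below. It is only a sketch: the function and variable names
are made up for illustration, DEVSTACK_REPO is left for you to set, the
checkout location is an assumption, and the uptime check assumes a Linux
/proc/uptime.

```shell
#!/usr/bin/env bash
# Sketch only. should_reboot, uptime_secs, run_cycle and the variables below
# are hypothetical names for illustration; they are not part of DevStack.

REBOOT_INTERVAL_SECS=${REBOOT_INTERVAL_SECS:-7200}   # "every two hours or so"
DEVSTACK_DIR=${DEVSTACK_DIR:-$HOME/devstack}         # assumed checkout location

# Return 0 once the uptime (arg 1, seconds) has reached the interval
# (arg 2, seconds; defaults to REBOOT_INTERVAL_SECS).
should_reboot() {
    local up=$1
    local interval=${2:-$REBOOT_INTERVAL_SECS}
    [ "$up" -ge "$interval" ]
}

# Whole seconds since boot, taken from /proc/uptime (Linux only).
uptime_secs() {
    cut -d. -f1 /proc/uptime
}

# One cycle as described above: tear down, wipe, fetch fresh, stack.
# DEVSTACK_REPO must be set by the caller; no URL is assumed here.
run_cycle() {
    if [ -d "$DEVSTACK_DIR" ]; then
        (cd "$DEVSTACK_DIR" || exit 1; ./unstack.sh; ./clean.sh)
    fi
    sudo rm -rf "$DEVSTACK_DIR" /opt/stack
    git clone "${DEVSTACK_REPO:?set DEVSTACK_REPO to the devstack clone URL}" \
        "$DEVSTACK_DIR" || return 1
    # A fresh clone has no localrc; copy yours in here before stacking.
    (cd "$DEVSTACK_DIR" && ./stack.sh) || return 1
    # The smoke test run would go here.
}

# Only act when invoked with --run, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    if should_reboot "$(uptime_secs)"; then
        sudo reboot
    else
        run_cycle
    fi
fi
```

Checking uptime before each cycle, rather than using a fixed timer, means
the reboot never lands in the middle of a stack.sh run.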

-Sukhdev



On Mon, Mar 10, 2014 at 1:07 PM, Sean Dague <sean at dague.net> wrote:

> So, honestly, running stack.sh / unstack.sh that many times in a row
> really isn't expected to work in my experience. You should at minimum be
> doing ./clean.sh to try to reset the state further.
>
>         -Sean
>
> On 03/10/2014 03:00 PM, Dane Leblanc (leblancd) wrote:
> > In my case, the base OS is 12.04 Precise.
> >
> > The problem is intermittent in that it takes maybe 15 to 20 cycles of
> > unstack/stack to get it into the failure mode, but once in the failure
> > mode, it appears that tgt daemon is 100% dead-in-the-water.
> >
> > -----Original Message-----
> > From: Sean Dague [mailto:sean at dague.net]
> > Sent: Monday, March 10, 2014 1:49 PM
> > To: Dane Leblanc (leblancd); openstack-infra at lists.openstack.org
> > Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"
> >
> > What base OS? A change was made there recently to better handle debian
> > because we believed (possibly incorrectly) that precise actually had
> > working init scripts.
> >
> > It would be interesting to understand if this was a 100% failure, or
> > only intermittent, and what base OS it was on.
> >
> >       -Sean
> >
> > On 03/10/2014 11:37 AM, Dane Leblanc (leblancd) wrote:
> >> I don't know if anyone can give me some troubleshooting advice with
> >> this issue.
> >>
> >> I'm seeing an occasional problem whereby after several DevStack
> >> unstack.sh/stack.sh cycles, the tgt daemon (tgtd) fails to start during
> >> Cinder startup.  Here's a snippet from the stack.sh log:
> >>
> >> 2014-03-10 07:09:45.214 | Starting Cinder
> >> 2014-03-10 07:09:45.215 | + return 0
> >> 2014-03-10 07:09:45.216 | + sudo rm -f /etc/tgt/conf.d/stack.conf
> >> 2014-03-10 07:09:45.217 | + _configure_tgt_for_config_d
> >> 2014-03-10 07:09:45.218 | + [[ ! -d /etc/tgt/stack.d/ ]]
> >> 2014-03-10 07:09:45.219 | + is_ubuntu
> >> 2014-03-10 07:09:45.220 | + [[ -z deb ]]
> >> 2014-03-10 07:09:45.221 | + '[' deb = deb ']'
> >> 2014-03-10 07:09:45.222 | + sudo service tgt restart
> >> 2014-03-10 07:09:45.223 | stop: Unknown instance:
> >> 2014-03-10 07:09:45.619 | start: Job failed to start
> >> 2014-03-10 07:09:45.621 | + exit_trap
> >> 2014-03-10 07:09:45.622 | + local r=1
> >> 2014-03-10 07:09:45.623 | ++ jobs -p
> >> 2014-03-10 07:09:45.624 | + jobs=
> >> 2014-03-10 07:09:45.625 | + [[ -n '' ]]
> >> 2014-03-10 07:09:45.626 | + exit 1
> >>
> >> Restarting tgt manually does not work either:
> >>
> >> jenkins at neutronpluginsci:~$ sudo service tgt restart
> >> stop: Unknown instance:
> >> start: Job failed to start
> >> jenkins at neutronpluginsci:~$ sudo tgtd
> >> librdmacm: couldn't read ABI version.
> >> librdmacm: assuming: 4
> >> CMA: unable to get RDMA device list
> >> (null): iser_ib_init(3263) Failed to initialize RDMA; load kernel modules?
> >> (null): fcoe_init(214) (null)
> >> (null): fcoe_create_interface(171) no interface specified.
> >> jenkins at neutronpluginsci:~$
> >>
> >> The config in /etc/tgt is:
> >>
> >> jenkins at neutronpluginsci:/etc/tgt$ ls -l
> >> total 8
> >> drwxr-xr-x 2 root root 4096 Mar 10 07:03 conf.d
> >> lrwxrwxrwx 1 root root   30 Mar 10 06:50 stack.d -> /opt/stack/data/cinder/volumes
> >> -rw-r--r-- 1 root root   58 Mar 10 07:07 targets.conf
> >> jenkins at neutronpluginsci:/etc/tgt$ cat targets.conf
> >> include /etc/tgt/conf.d/*.conf
> >> include /etc/tgt/stack.d/*
> >> jenkins at neutronpluginsci:/etc/tgt$ ls conf.d
> >> jenkins at neutronpluginsci:/etc/tgt$ ls /opt/stack/data/cinder/volumes
> >> jenkins at neutronpluginsci:/etc/tgt$
> >>
> >> I don't know if there's any missing Cinder config in my DevStack
> >> localrc files. Here's one that I'm using:
> >>
> >> MYSQL_PASSWORD=nova
> >> RABBIT_PASSWORD=nova
> >> SERVICE_TOKEN=nova
> >> SERVICE_PASSWORD=nova
> >> ADMIN_PASSWORD=nova
> >> ENABLED_SERVICES=g-api,g-reg,key,n-api,n-crt,n-obj,n-cpu,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,rabbit
> >> enable_service mysql
> >> disable_service n-net
> >> enable_service q-svc
> >> enable_service q-agt
> >> enable_service q-l3
> >> enable_service q-dhcp
> >> enable_service q-meta
> >> enable_service q-lbaas
> >> enable_service neutron
> >> enable_service tempest
> >> VOLUME_BACKING_FILE_SIZE=2052M
> >> Q_PLUGIN=cisco
> >> declare -a Q_CISCO_PLUGIN_SUBPLUGINS=(openvswitch nexus)
> >> declare -A Q_CISCO_PLUGIN_SWITCH_INFO=([10.0.100.243]=admin:Cisco12345:22:neutronpluginsci:1/9)
> >> NCCLIENT_REPO=git://github.com/CiscoSystems/ncclient.git
> >> PHYSICAL_NETWORK=physnet1
> >> OVS_PHYSICAL_BRIDGE=br-eth1
> >> TENANT_VLAN_RANGE=810:819
> >> ENABLE_TENANT_VLANS=True
> >> API_RATE_LIMIT=False
> >> VERBOSE=True
> >> DEBUG=True
> >> LOGFILE=/opt/stack/logs/stack.sh.log
> >> USE_SCREEN=True
> >> SCREEN_LOGDIR=/opt/stack/logs
> >>
> >> Here are links to a log showing another localrc file that I use, and
> >> the corresponding stack.sh log:
> >>
> >> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_log.txt
> >> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_log.txt
> >>
> >> Does anyone have any advice on how to debug this, or recover from this
> >> (beyond rebooting the node)? Or am I missing any Cinder config?
> >>
> >> Thanks in advance for any help on this!!!
> >> Dane
> >>
> >>
> >>
> >> _______________________________________________
> >> OpenStack-Infra mailing list
> >> OpenStack-Infra at lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
> >>
> >
> >
> > --
> > Sean Dague
> > Samsung Research America
> > sean at dague.net / sean.dague at samsung.com
> > http://dague.net
> >
>
>
> --
> Sean Dague
> Samsung Research America
> sean at dague.net / sean.dague at samsung.com
> http://dague.net
>
>