[openstack-dev] [Savanna] error in cluster launch: instance count gets to three, status=waiting, before going to error status

Dmitry Mescheryakov dmescheryakov at mirantis.com
Tue Sep 17 19:50:20 UTC 2013


Kesten,

The error occurs at the following line:
https://github.com/stackforge/savanna/blob/master/savanna/service/instances.py#L293

The reason is that Savanna detected that one of the instances it spawned
entered the 'Error' state, i.e. the problem is either in your OpenStack
installation or in the way Savanna spawns instances. The instances are
spawned by the following piece of code:
https://github.com/stackforge/savanna/blob/master/savanna/service/instances.py#L214
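
To make the check above concrete, here is a minimal sketch of a wait loop that fails as soon as any instance reports an error status. It is not Savanna's actual code: FakeInstance, poll_status and await_instances are hypothetical names used only for illustration, and the statuses are plain strings rather than real Nova responses.

```python
import itertools


class FakeInstance(object):
    """Hypothetical stand-in for a spawned VM; not a Savanna or Nova class."""

    def __init__(self, name, statuses):
        self.name = name
        # Each poll returns the next status; the last one repeats forever.
        self._statuses = itertools.chain(statuses,
                                         itertools.repeat(statuses[-1]))

    def poll_status(self):
        return next(self._statuses)


def await_instances(instances, max_polls=10):
    """Poll until every instance is ACTIVE; raise if any reports ERROR."""
    pending = list(instances)
    for _ in range(max_polls):
        still_pending = []
        for inst in pending:
            status = inst.poll_status()
            if status == "ERROR":
                # Same wording as the message in the quoted log.
                raise RuntimeError("node %s has error status" % inst.name)
            if status != "ACTIVE":
                still_pending.append(inst)
        pending = still_pending
        if not pending:
            return True
    raise RuntimeError("timed out waiting for instances")
```

With this shape, a single node going to ERROR aborts the whole wait, which matches the "node cluster-1-workers-002 has error status" message in the log below.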

Right now Savanna uses a rather naive policy: if anything goes wrong during
cluster start, Savanna completely rolls back the startup and terminates the
spawned instances. If you comment out the following line:
https://github.com/stackforge/savanna/blob/master/savanna/service/instances.py#L367

you should see one of the cluster instances in the 'Error' state after
cluster startup fails.
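
As a rough illustration of that policy, the sketch below shows a roll-back-everything startup and what changes when the termination step is skipped. It is only a toy model under stated assumptions: FakeCloud, create_cluster and the rollback_on_failure flag are hypothetical and do not exist in Savanna, where the equivalent is commenting out the line linked above.

```python
class FakeCloud(object):
    """Hypothetical in-memory cloud; one named node always ends up in ERROR."""

    def __init__(self, failing_node):
        self.failing_node = failing_node
        self.instances = {}  # instance name -> status

    def spawn(self, name):
        status = "ERROR" if name == self.failing_node else "ACTIVE"
        self.instances[name] = status

    def terminate(self, name):
        del self.instances[name]


def create_cluster(cloud, node_names, rollback_on_failure=True):
    """Spawn all nodes; on any failure optionally terminate everything."""
    try:
        for name in node_names:
            cloud.spawn(name)
        for name in node_names:
            if cloud.instances[name] == "ERROR":
                raise RuntimeError("node %s has error status" % name)
    except Exception:
        if rollback_on_failure:
            # Naive policy: remove every spawned instance, healthy or not.
            for name in list(cloud.instances):
                cloud.terminate(name)
        raise  # the caller still sees the original error
```

With rollback_on_failure=False the failed instance stays around after the exception, so it can be inspected in Horizon or with the nova client to find out why it went to 'Error'.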

Dmitry


2013/9/17 kesten broughton <kesten.broughton at gmail.com>

> I have applied the proposed patch for the setattr launch error.
>
>
> The patch works, but the launch still fails.
>
> The stack trace just says the creation failed.
>
>
> 127.0.0.1 - - [16/Sep/2013 12:21:20] "POST /v1.0/2c8b2627a169458e8ab875690a51eabd/clusters HTTP/1.1" 202 1877 1.441746
> 2013-09-16 12:21:42.592 47355 WARNING savanna.service.instances [-] Can't start cluster 'cluster-1' (reason: node cluster-1-workers-002 has error status)
> 2013-09-16 12:21:51.369 47355 ERROR savanna.context [-] Thread 'cluster-creating-057ae8f2-ce41-4508-9696-3affe064178d' fails with exception: 'node cluster-1-workers-002 has error status'
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context Traceback (most recent call last):
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context   File "/home/stack/savanna-venv/local/lib/python2.7/site-packages/savanna/context.py", line 93, in wrapper
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context     func(*args, **kwargs)
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context   File "/home/stack/savanna-venv/local/lib/python2.7/site-packages/savanna/service/api.py", line 137, in _provision_cluster
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context     i.create_cluster(cluster)
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context   File "/home/stack/savanna-venv/local/lib/python2.7/site-packages/savanna/service/instances.py", line 63, in create_cluster
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context     _rollback_cluster_creation(cluster, ex)
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context   File "/home/stack/savanna-venv/local/lib/python2.7/site-packages/savanna/openstack/common/excutils.py", line 70, in __exit__
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context     six.reraise(self.type_, self.value, self.tb)
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context   File "/home/stack/savanna-venv/local/lib/python2.7/site-packages/savanna/service/instances.py", line 45, in create_cluster
> 2013-09-16 12:21:51.369 47355 TRACE savanna.context     cluster = _await_instances(cluster)
>
>
> complete details here: http://paste.openstack.org/show/47126/
> I can see in horizon that it makes it past the "waiting" log message.
>
> My suspicion is with volumes.attach() below since the output from the
> cluster_create contains
>
> "volume_mount_prefix": "/volumes/disk",
>
>
> I am using the vmware machine as configured in the guide, but
> locate /volumes/disk returns nothing.
>
> --- here is instances.py where the exception is thrown
>
> def create_cluster(cluster):
>     ctx = context.ctx()
>     try:
>         # create all instances
>         conductor.cluster_update(ctx, cluster, {"status": "Spawning"})
>         LOG.info(g.format_cluster_status(cluster))
>         _create_instances(cluster)
>
>         # wait until all instances are up and accessible
>         cluster = conductor.cluster_update(ctx, cluster,
>                                            {"status": "Waiting"})
>         LOG.info(g.format_cluster_status(cluster))
>         cluster = _await_instances(cluster)
>
>         # attach volumes
>         volumes.attach(cluster)
>
>         # prepare all instances
>         cluster = conductor.cluster_update(ctx, cluster,
>                                            {"status": "Preparing"})
>         LOG.info(g.format_cluster_status(cluster))
>
>         _configure_instances(cluster)
>
>     except Exception as ex:
>         LOG.warn("Can't start cluster '%s' (reason: %s)", cluster.name,
>                  ex)
>
> Any tips on how to debug this?
>