[openstack-dev] [rally] Rally tries to stop instance despite on its shutoff state

boden boden at linux.vnet.ibm.com
Fri Nov 15 13:42:47 UTC 2013


On 11/14/2013 9:29 AM, Anastasia Sklyankina wrote:
> Hello, everybody!
>
> I have a problem with the rally "boot_and_bounce_server" test,
> specifically with "stop_start".
>
> I ran the stop_start test 3 times with 12 instances booted in 5 threads
> (the base_task.json file I used is attached).
> Rally launches all 12 instances, but 2 of them hit a problem during
> the stop/start process:
>
> */var/log/nova/conductor.log* (on the controller node) has the error
> *"InstanceInvalidState: Instance 9a64bc68-bf14-4d8a-ac25-13d8c1f26513 in
> vm_state stopped. Cannot stop while the instance is in this state.*":
> http://paste.openstack.org/show/v5ahFwuqMjwOfViqmZ1G/
>
> *rally output*: http://paste.openstack.org/show/rsOa0TFMQzqKUYeNczgf/
> reports *"TimeoutException: Timeout exceeded."*
>
> */var/log/nova/compute.log* (on the compute node) has the
> error *"Timeout: Timeout while waiting on RPC response"*:
> http://paste.openstack.org/show/3W4moI1EaQybcOzYIgHJ/
>
> These errors occur in the last iterations, for the last 2 instances
> booted. The remaining 10 instances are stopped and started successfully.
>
> It seems that rally stops an instance after launching it, cannot stop it
> within some defined time, waits a while and then tries to stop the
> instance again. But the instance is already in shutoff status, which is
> why the first error occurs.
> The shutoff status of the last 2 instances is shown in the attached
> screenshot "shutoff_instances_horizon.png".
>
>
> *As I noticed*, if I run the same test with a smaller load, for example
> booting 6 instances (in 3 threads) and stopping/starting them 3 times,
> the aforementioned errors don't occur.
>
>
> Could you please help clarify this strange situation: is this a
> problem with the test environment or with rally execution?
> Many thanks in advance!
>
> --
> Best Regards,
> Anastasiya Sklyankina,
> Senior QA engineer,
> Mirantis, Inc
> +7(927)22-90-300 (cell)
> Skype: anastasiya_sklyankina
> E-mail: asklyankina at mirantis.com
> www.mirantis.ru
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
Hi,
A few questions/clarifications...

Just to confirm: the system clocks on all your nodes in this scenario
are roughly in sync, correct? That is, for each of your nodes (as I understand them):
(a) Rally client host
(b) Controller node
(c) Compute node
so we can assume the log timestamps are comparable across all the nodes.
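Assuming the clocks do agree, the three logs can be interleaved by timestamp alone to reconstruct the order of events. A minimal Python sketch, assuming each line begins with a "YYYY-MM-DD HH:MM:SS.fff" timestamp (nova's usual log prefix, but check your logging config); `merge_logs` and its optional per-node `skew` correction are hypothetical helpers, not part of nova or rally:

```python
import heapq
from datetime import datetime, timedelta

# Assumed line prefix, e.g. "2013-11-14 09:31:02.417 ..."
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(line):
    # The first two whitespace-separated fields hold the date and time.
    date, time_of_day = line.split()[:2]
    return datetime.strptime(f"{date} {time_of_day}", TS_FORMAT)


def merge_logs(logs, skew=None):
    """Interleave per-node log lines in timestamp order.

    logs: dict mapping node name -> list of lines (each list already in time order)
    skew: optional dict mapping node name -> timedelta by which that node's
          clock runs ahead of the reference clock (subtracted before sorting)
    """
    skew = skew or {}
    streams = []
    for node, lines in logs.items():
        offset = skew.get(node, timedelta(0))
        # Build a list (not a lazy generator) so `node` and `offset` are
        # bound for this iteration rather than the last one.
        streams.append([(parse_ts(line) - offset, node, line) for line in lines])
    # heapq.merge expects each input to be sorted already, as log files are.
    return [(node, line) for _, node, line in heapq.merge(*streams)]
```

A skew of even a couple of seconds can swap the order of the stop request, the RPC timeout and the InstanceInvalidState error, so the merged ordering is only trustworthy once clock sync is confirmed.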


Assuming the system time is in sync across the nodes, it appears we are 
missing some pieces of information.
- The 1st error (time-wise) is related to rally trying to stop the VM with
id '9a64bc68-bf14-4d8a-ac25-13d8c1f26513' when it's already stopped, but I
don't see any related log messages in the rally output you linked.
- The RPC timeout: after you see this error, does your compute service
for comp1.wd.com still show as 'up'? (Run the "nova service-list" command on
the controller.)
- The Rally client error is related to VM ID
'a2787898-3c94-4854-8894-f34302acca94' not starting within 10 minutes. This
seems indicative of a compute/controller issue. At this point, are you
able to start/stop a VM (on the comp1 node) via the dashboard?
- Is this issue reproducible, or is it intermittent?
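For reference, the double-stop sequence described in the original report can be avoided on the client side by re-reading the server status before (re)sending a stop and polling with an explicit deadline. A minimal Python sketch; `get_status` and `request_stop` are hypothetical callables standing in for real client calls (this is not rally's or novaclient's actual code), the "ACTIVE"/"SHUTOFF" strings follow nova's server status names, and the 600-second default simply mirrors the 10-minute timeout mentioned above:

```python
import time


class StatusTimeout(Exception):
    """Hypothetical error: the server did not reach the wanted status in time."""


def wait_for_status(get_status, wanted, timeout, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    # Poll get_status() until it returns `wanted` or `timeout` seconds pass.
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if clock() >= deadline:
            raise StatusTimeout(f"last status {status!r}, wanted {wanted!r}")
        sleep(interval)


def stop_server(get_status, request_stop, timeout=600, retries=1, **poll):
    """Stop a server, never re-sending a stop to one that is already SHUTOFF.

    get_status, request_stop: hypothetical callables wrapping the real client.
    poll: extra keyword arguments passed to wait_for_status (interval, clock, sleep).
    """
    for attempt in range(retries + 1):
        # Check first: a second stop sent to an already stopped instance is
        # what produced the InstanceInvalidState error in the report.
        if get_status() == "SHUTOFF":
            return "SHUTOFF"
        request_stop()
        try:
            return wait_for_status(get_status, "SHUTOFF", timeout, **poll)
        except StatusTimeout:
            if attempt == retries:
                raise
```

Checking the status before each attempt means that a retry after a slow stop becomes a no-op once the instance has reached SHUTOFF, instead of a second stop request that nova rejects.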

Thanks
