[qa][tempest] Waiting for interface status == ACTIVE before checking status

Terry Wilson twilson at redhat.com
Wed Jan 23 22:09:26 UTC 2019


In the networking-ovn project, we hit this bug *very* often:
https://bugs.launchpad.net/tempest/+bug/1728600. The logstash query
below shows that it has failed 330 times in the last week:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%3A%20%5B%5D%20is%20not%20true%20%3A%20No%20IPv4%20addresses%20found%20in%5C%22

The bug has been around since 2017, and there are earlier reports of
it than that. The bug happens in some projects outside of
networking-ovn as well.

At the core of the issue is that _get_server_port_id_and_ip4 loops
through the server's ports and returns the ones that are ACTIVE, but
there is a race: a port can become temporarily inactive if the ml2
driver continually monitors the actual port status. In the case we
hit, os-vif recreated the ovs port during an operation. We detected
the port as down and changed its status, and then set the status back
to up once the port was recreated. If the check happens while the
port is down, the test fails.

There have been comments that the port status shouldn't flip without
a user request that would cause it, but that would mean a
plugin/driver would have to ignore the actual status of a port, and
that seems wrong. External things can affect what state a port is in.

https://review.openstack.org/#/c/449695/7/tempest/scenario/manager.py
adds a wait mechanism to the port status check so that momentary
flips of port status will not cause the test to fail spuriously.
The patch currently has 10 +1s. We really need to get this fixed.
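
To illustrate the idea (this is not the code from the review above, just a
minimal sketch), the check can poll until a port reports ACTIVE instead of
failing on the first snapshot. The names wait_for_active_port, list_ports and
fake_list_ports are hypothetical, and the port records are assumed to be
dicts with "id" and "status" keys:

```python
import time


def wait_for_active_port(list_ports, timeout=60.0, interval=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll list_ports() until at least one port is ACTIVE.

    list_ports is a hypothetical callable returning a list of dicts
    shaped like port records: {"id": ..., "status": ...}.
    Raises TimeoutError if no port becomes ACTIVE within timeout seconds.
    """
    deadline = clock() + timeout
    while True:
        # Re-read the ports on every pass so a momentary DOWN is tolerated.
        active = [p for p in list_ports() if p.get("status") == "ACTIVE"]
        if active:
            return active
        if clock() >= deadline:
            raise TimeoutError("no ACTIVE port within %s seconds" % timeout)
        sleep(interval)


# Simulated flip: DOWN while the ovs port is recreated, ACTIVE afterwards.
_states = iter(["DOWN", "DOWN", "ACTIVE"])


def fake_list_ports():
    return [{"id": "port-1", "status": next(_states)}]


ports = wait_for_active_port(fake_list_ports, timeout=5.0, interval=0.0)
```

A single check against the first snapshot would have seen no ACTIVE port and
failed; the polling version only fails if the port stays down for the whole
timeout.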

Thanks!
Terry


