[openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails

Neil Jerram Neil.Jerram at metaswitch.com
Thu May 14 12:16:43 UTC 2015


Hi Rossella,

Many thanks for your quick reply!

On 14/05/15 11:08, Rossella Sblendido wrote:
> Hi Neil,
>
> what's the status of the port after the migration? You might be hitting
> [1] . See also the patch that fixes the issue [2]

Thanks, but that is definitely not the cause of the problem in my case, 
because my agent does not call get_device_details.

(BTW - it seems obviously wrong to me for an API named 
get_device_details to change the port status to BUILD, even if the call 
is coming from the correct host.  I would expect that an agent could 
safely call get_device_details at any time without having any effect on 
the port state.)
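
(Expressed as a property, the contract I'd expect is: reading device details leaves the port's status untouched. A toy sketch of that invariant, with a plain dict standing in for the real port record -- the real get_device_details is a Neutron RPC call, not this function:)

```python
# Toy stand-in for the read-only contract I'd expect: the real
# get_device_details is a Neutron RPC endpoint, not this function.
def get_device_details(port):
    """A pure lookup: return a copy and leave the port record untouched."""
    return dict(port)

port = {"id": "cc80291c...", "status": "ACTIVE"}
before = port["status"]
get_device_details(port)
assert port["status"] == before  # no transition to BUILD on a mere read
```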

> If you wait a bit longer, is the host_id updated by Nova?

No, it isn't.

I've now been able to reproduce this again, and look directly at the 
Neutron DB, and I think what I see indicates that this is definitely an 
OpenStack bug (as opposed to a problem in my mechanism driver).

My hosts are named calico-vm13 and calico-vm15, and calico-vm13 is set 
up so that libvirt will fail to launch any instances.  When I use the 
Horizon UI to create an instance, Nova tries calico-vm13 first - which 
fails - and then calico-vm15, which succeeds.

Horizon then shows that the instance is on calico-vm15:

     admin | calico-vm15 | dltst | cirros-0.3.2-x86_64
     10.28.29.214, 2001:db8:c41:2::1d9a
     m1.tiny | Active | None | Running | 24 minutes

The port for that instance is the cc80291c one here:

mysql> select * from ports;
+------------+-------------+------+-------------+-------------------+----------------+--------+-------------+--------------+
| tenant_id  | id          | name | network_id  | mac_address       | admin_state_up | status | device_id   | device_owner |
+------------+-------------+------+-------------+-------------------+----------------+--------+-------------+--------------+
| b2d9f70... | 79fd9d6c... |      | 1fca4aa4... | fa:16:3e:d3:1a:62 |              1 | DOWN   | dhcpea9f... | network:dhcp |
| b2d9f70... | cc80291c... |      | 1fca4aa4... | fa:16:3e:bc:df:f0 |              1 | ACTIVE | e2b61f5f... | compute:None |
| b2d9f70... | d9f7d1d0... |      | 1fca4aa4... | fa:16:3e:0b:29:3e |              1 | DOWN   | dhcp2ffe... | network:dhcp |
+------------+-------------+------+-------------+-------------------+----------------+--------+-------------+--------------+

And the ml2_port_bindings table shows that Neutron/ML2 thinks that port 
is still on calico-vm13:

mysql> select * from ml2_port_bindings;
+-------------+-------------+----------+--------+-------------+-----------+-----------------------+---------+
| port_id     | host        | vif_type | driver | segment     | vnic_type | vif_details           | profile |
+-------------+-------------+----------+--------+-------------+-----------+-----------------------+---------+
| 79fd9d6c... | calico-vm13 | tap      | calico | fdc5ef44... | normal    | {"port_filter": true} |         |
| cc80291c... | calico-vm13 | tap      | calico | fdc5ef44... | normal    | {"port_filter": true} |         |
| d9f7d1d0... | calico-vm15 | tap      | calico | fdc5ef44... | normal    | {"port_filter": true} |         |
+-------------+-------------+----------+--------+-------------+-----------+-----------------------+---------+
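
(The mismatch can be spotted mechanically with a join across the two tables. A minimal sketch using an in-memory SQLite copy of the rows shown above -- only the columns relevant to the binding question, IDs abbreviated exactly as in the tables:)

```python
import sqlite3

# In-memory reproduction of the relevant columns of the two Neutron
# tables shown above (IDs abbreviated as in the mysql output).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ports (id TEXT, status TEXT, device_owner TEXT);
CREATE TABLE ml2_port_bindings (port_id TEXT, host TEXT);
INSERT INTO ports VALUES
  ('79fd9d6c...', 'DOWN',   'network:dhcp'),
  ('cc80291c...', 'ACTIVE', 'compute:None'),
  ('d9f7d1d0...', 'DOWN',   'network:dhcp');
INSERT INTO ml2_port_bindings VALUES
  ('79fd9d6c...', 'calico-vm13'),
  ('cc80291c...', 'calico-vm13'),
  ('d9f7d1d0...', 'calico-vm15');
""")

# Nova (and Horizon) say the instance is on calico-vm15, so any ACTIVE
# compute port still bound to a different host is a stale binding.
stale = conn.execute("""
  SELECT p.id, b.host FROM ports p
  JOIN ml2_port_bindings b ON b.port_id = p.id
  WHERE p.device_owner LIKE 'compute:%'
    AND p.status = 'ACTIVE'
    AND b.host != 'calico-vm15'
""").fetchall()
print(stale)  # the cc80291c... port is still bound to calico-vm13
```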


Where should I start looking, to see where Nova / Neutron _should_ be 
updating the port binding, in this scenario?
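
(For reference, the update I'd expect someone -- Nova on reschedule, or an operator as a workaround -- to issue is a port update carrying the new binding:host_id, i.e. a PUT to /v2.0/ports/<id>. A hedged sketch of just the request shape; whether and where Nova should send this automatically is exactly the open question, and the port UUID is abbreviated as above:)

```python
# Sketch of the Networking API request that moves a port binding to a
# new host (binding:host_id is admin-only, from the portbindings
# extension).  This only builds the request shape; it sends nothing.
def binding_update_request(port_id, new_host):
    """Return (path, body) for PUT /v2.0/ports/<port_id>."""
    return ("/v2.0/ports/%s" % port_id,
            {"port": {"binding:host_id": new_host}})

path, body = binding_update_request("cc80291c...", "calico-vm15")
print(path, body)
```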

Many thanks,
	Neil


> cheers,
>
> Rossella
>
> [1] https://bugs.launchpad.net/neutron/+bug/1439857
> [2] https://review.openstack.org/#/c/163178/
>
> On 05/14/2015 11:29 AM, Neil Jerram wrote:
>> Hi all, this is about a problem I'm seeing with my Neutron ML2 mechanism
>> driver [1].  I'm expecting to see an update_port_postcommit call to
>> signal that the binding:host_id for a port is changing, but I don't see
>> that.
>>
>> The scenario is launching a new instance in a cluster with two compute
>> hosts, where we've rigged things so that one of the compute hosts will
>> always be chosen first, but libvirt isn't correctly configured there and
>> hence the instance launching attempt will fail.  Then Nova tries to use
>> the other compute host instead, and that mostly works - except that my
>> mechanism driver thinks that the new instance's port is still bound to
>> the first compute host.
>>
>> Is anyone aware of a known problem in this area (in Juno-level code), or
>> of where I should start pinning this down in more detail?
>>
>> Many thanks,
>>      Neil
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
