On Thu, 2019-01-31 at 11:30 -0500, Michael Turek wrote:
Hello all,
Our ironic job has been broken and it seems to be due to a lack of IPs. We allocate two IPs to our job, one for the dhcp server, and one for the target node. This had been working for as long as the job has existed but recently (since about early December 2018), we've been broken.
The job is able to clean the node during devstack, successfully deploy to the node during the tempest run, and is successfully validated via ssh. The node then moves to clean failed with a network error [1], and the job subsequently fails. Sometime between the validation and attempting to clean, the neutron port associated with the ironic port is deleted and a new port comes into existence. Where I'm having trouble is finding out what this port is. Based on its MAC address it's a virtual port, and its MAC is not the same as the ironic port's.
We could add an IP to the job to fix it, but I'd rather not do that needlessly.
Any insight or advice would be appreciated here!
While working on the neutron events I noticed a pattern I thought was a bit strange. (Note: this was with neutron networking.)

Create nova baremetal instance:
 1. The tenant VIF is created.
 2. The provision port is created.
 3. Provision port plugged (bound).
 4. Provision port unplugged (deleted).
 5. Tenant port plugged (bound).

On nova delete of baremetal instance:
 1. Tenant VIF is unplugged (unbound).
 2. Cleaning port created.
 3. Cleaning port plugged (bound).
 4. Cleaning port unplugged (deleted).
 5. Tenant port deleted.

I think step 5, deleting the tenant port, could happen right after step 1. But it looks like it isn't deleted until after cleaning is done. If this is also the case with flat networks, it could explain why you get the error on cleaning: the "tenant" port still exists, and there are no free IPs in the allocation pool to create a new port for cleaning.

--
Harald
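The pool-exhaustion hypothesis can be illustrated with a toy simulation of the delete sequence. This is a minimal sketch, not neutron or ironic code: the `AllocationPool` class, the port names, and the addresses are all hypothetical, and it assumes the job's two-address pool is split between the DHCP port and the tenant port, with the cleaning port needing a fresh address. Only the nova delete steps are modelled.

```python
from dataclasses import dataclass, field


class PoolExhausted(Exception):
    """Raised when the (hypothetical) pool has no free address left."""


@dataclass
class AllocationPool:
    free: list[str]
    used: dict[str, str] = field(default_factory=dict)  # port name -> IP

    def create_port(self, name: str) -> str:
        # A new port needs an address from the pool, like a fixed IP on a subnet.
        if not self.free:
            raise PoolExhausted(f"no free IP for port {name!r}")
        ip = self.free.pop(0)
        self.used[name] = ip
        return ip

    def delete_port(self, name: str) -> None:
        # Deleting a port returns its address to the pool; unbinding does not.
        self.free.append(self.used.pop(name))


def run_delete(tenant_deleted_first: bool) -> bool:
    """Return True if the cleaning port can get an address."""
    # Hypothetical two-address pool: one for DHCP, one for the node.
    pool = AllocationPool(free=["10.0.0.2", "10.0.0.3"])
    pool.create_port("dhcp")
    pool.create_port("tenant")

    # Step 1: tenant VIF unplugged (unbound); the port still holds its IP
    # unless it is deleted here as well.
    if tenant_deleted_first:
        pool.delete_port("tenant")

    # Steps 2-4: cleaning port created, bound, then deleted.
    try:
        pool.create_port("cleaning")
    except PoolExhausted:
        return False
    pool.delete_port("cleaning")

    # Step 5: tenant port deleted (observed ordering).
    if not tenant_deleted_first:
        pool.delete_port("tenant")
    return True


print(run_delete(tenant_deleted_first=False))  # False: observed order, pool exhausted
print(run_delete(tenant_deleted_first=True))   # True: tenant port freed before cleaning
```

With the observed ordering the cleaning port cannot get an address, while deleting the tenant port right after unbinding leaves one free. Whether flat networking really follows the same ordering is the open question raised above.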