[ironic]: Timeout reached while waiting for callback for node

fsbiz at yahoo.com
Tue Nov 26 17:04:22 UTC 2019


Thanks Arne and Julia for the great suggestions on scaling ironic nodes.
We are currently trying to root-cause an issue (it has occurred twice) where a large number of nodes (but not all of them) suddenly migrate from one ironic conductor (IC) to another.
For example, 69 nodes moved from sc-ironic04 and sc-ironic05 to sc-ironic06 between 21:07 and 21:10 on Nov. 23rd.
[root@sc-ironic06 nova]# grep "moving from" /var/log/nova/nova-compute.log-20191124


2019-11-23 21:07:46.606 210241 INFO nova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode 1cb9ef2e-aa7d-4e25-8878-14669a3ead7a moving from sc-ironic05.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com

2019-11-23 21:08:17.518 210241 INFO nova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode 56e58642-12ac-4455-bc95-2a328198f845 moving from sc-ironic04.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com

2019-11-23 21:08:35.843 210241 INFO nova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode e0b9b94c-2ea3-4324-a85f-645d572e370b moving from sc-ironic05.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com

2019-11-23 21:08:42.264 210241 INFO nova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode 1c7d461c-2de7-4d9a-beff-dcb490c7b2e4 moving from sc-ironic04.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com

2019-11-23 21:08:43.819 210241 INFO nova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode 73ed8bd4-23c2-46bc-b748-e6f5ab6fa932 moving from sc-ironic05.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com

2019-11-23 21:08:45.651 210241 INFO nova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode 51da1570-5666-4a21-a46f-4b7510d28415 moving from sc-ironic05.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com

2019-11-23 21:08:46.905 210241 INFO nova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode 38b41797-4b97-405b-bbd5-fccc61d237c3 moving from sc-ironic04.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com

2019-11-23 21:08:49.065 210241 INFO nova.compute.resource_tracker [req-96baf341-0ecb-4dec-a204-32c2f77f3f64 - - - - -] ComputeNode c5c89749-a11c-4eb8-b159-e8d47ecfcbb9 moving from sc-ironic04.nvc.nvidia.com to sc-ironic06.nvc.nvidia.com
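
Since the burst is so short, one thing we still plan to check is whether one of the nova-compute services flapped (was briefly marked down) around that window. A quick way to dump the service states with openstacksdk; the cloud name "mycloud" is just a placeholder for our local config:

import openstack

# Connect via a clouds.yaml entry; "mycloud" is a placeholder name.
conn = openstack.connect(cloud="mycloud")

# Show each nova-compute service's up/down state and last report time.
# A service that was briefly "down" around 21:07 on Nov. 23rd would be
# a plausible trigger for nodes being redistributed.
for svc in conn.compute.services():
    if svc.binary == "nova-compute":
        print(svc.host, svc.state, svc.status, svc.updated_at)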

Restarting the nova-compute and ironic-conductor services on the IC seems to have fixed the issue, but we are still in the root-cause analysis phase and seem to have hit a wall narrowing this down. Any suggestions are welcome.
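
For context on why a restart might "fix" it: as far as I understand, the nova ironic driver maps bare metal nodes onto compute services using a consistent hash ring built from the services it considers up, so a host dropping out of (or rejoining) the ring remaps a slice of the nodes onto other hosts. A rough illustrative sketch of that rebalancing effect (this is not nova's actual code):

import bisect
import hashlib

def _hash(key):
    # Stable hash of a string onto the ring's key space.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, hosts, replicas=64):
        # Give each host several points on the ring to even out the split.
        self._ring = sorted(
            (_hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [k for k, _ in self._ring]

    def get_host(self, node_uuid):
        # Walk clockwise to the first ring point at or after the node's hash.
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]

# If sc-ironic06 (re)joins the ring, it takes over a slice of nodes from
# both sc-ironic04 and sc-ironic05, while everything else stays put: the
# same pattern as the burst of "moving from ... to sc-ironic06" messages.
nodes = [f"node-{i}" for i in range(200)]
before = HashRing(["sc-ironic04", "sc-ironic05"])
after = HashRing(["sc-ironic04", "sc-ironic05", "sc-ironic06"])
moved = sum(1 for n in nodes if before.get_host(n) != after.get_host(n))
print(f"{moved} of {len(nodes)} nodes remapped")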
Thanks,
Fred.

    On Wednesday, October 30, 2019, 02:02:42 PM PDT, Arne Wiebalck <arne.wiebalck at cern.ch> wrote:  
 
 Hi Fred,

To confirm what Julia said:

We currently have ~3700 physical nodes in Ironic, managed by 3 controllers
(16GB VMs running httpd, conductor, and inspector). We recently moved to