10 Nov 2019, 1:07 p.m.
On 11/10/2019 10:44 AM, Balázs Gibizer wrote:
> On 3500 baremetal nodes _update_available_resource takes 1.5 hour.
Why have a single nova-compute service manage this many nodes? Or even 1000?
Why not try to partition things a bit more reasonably like a normal cell where you might have ~200 nodes per compute service host (I think CERN keeps their cells to around 200 physical compute hosts for scaling)?
That way you can also leverage the compute service hash ring / failover feature for HA.
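To make the partitioning idea concrete, here is a rough sketch of how a consistent hash ring could spread 3500 nodes over 18 compute services (roughly 194 nodes each). It is only an illustration: the host names, node names, replica count, and the HashRing class are all made up for the example, and this is not the code nova or ironic actually use.

```python
import bisect
import hashlib
from collections import Counter


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent hash ring, for illustration only."""

    def __init__(self, hosts, replicas=64):
        # Each host gets several points on the ring to even out the load.
        self._points = sorted(
            (_hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def owner(self, node_id: str) -> str:
        # A node belongs to the first host point at or after its hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._keys, _hash(node_id)) % len(self._keys)
        return self._points[idx][1]


# Assumed names: 18 compute services and 3500 baremetal nodes.
hosts = [f"compute-{i}" for i in range(18)]
nodes = [f"node-{i}" for i in range(3500)]

ring = HashRing(hosts)
assignment = {node: ring.owner(node) for node in nodes}
per_host = Counter(assignment.values())

# Simulate losing one compute service and rebuilding the ring.
survivors = [h for h in hosts if h != "compute-0"]
ring_after = HashRing(survivors)
moved = [n for n in nodes if ring_after.owner(n) != assignment[n]]

print("nodes per host:", sorted(per_host.values()))
print("nodes re-assigned after losing compute-0:", len(moved))
```

The point of the sketch is that when one service drops out, only the nodes it owned get picked up by the remaining services; everything else stays where it was.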
I realize the locking stuff is not great, but at what point is it unreasonable to expect a single compute service to manage that many nodes/instances?
--
Thanks,
Matt