[nova][ironic][ptg] Resource tracker scaling issues
sbauza at redhat.com
Tue Nov 12 15:44:10 UTC 2019
On Mon, Nov 11, 2019 at 4:05 PM Dan Smith <dms at danplanet.com> wrote:
> > Sharding with and/or within cells will help to some degree (and we are
> > actively looking into this as you probably know), but I think that
> > should not stop us from checking if there are algorithmic improvements
> > (e.g. when collecting the data), or if moving to a different locking
> > granularity or even parallelising the update are feasible additional
> > improvements.
> All of that code was designed around one node per compute host. In the
> ironic case it was expanded (hacked) to support N where N is not
> huge. Giving it a huge number, and using a driver where nodes go into
> maintenance/cleaning for long periods of time is asking for trouble.
> Given there is only one case where N can legitimately be greater than
> one, I'm really hesitant to back a proposal to redesign it for large
> values of N.
> Perhaps we as a team just need to document what sane, tested, and
> expected-to-work values for N are?
What we discussed at the PTG was the fact that we only have one global
semaphore for this module, but we have N ResourceTracker Python objects
(where N is the number of Ironic nodes per compute service).
According to CERN, it looks like this semaphore blocks during the periodic
update, so we basically said it could only be a bugfix given we could create N
That said, as the change could introduce problems of its own, we want to make
sure we can test it not only in the gate but also directly at CERN.
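
To make the locking granularity point concrete, here is a minimal sketch,
not nova's actual code. It assumes plain Python threads and uses
hypothetical names (GLOBAL_LOCK, lock_for, update_node, run_updates) to
compare one global lock shared by all nodes with one lock per node. The
sleep is only a placeholder for the real update work.

```python
import threading
import time
from collections import defaultdict

# Hypothetical stand-ins for illustration only; none of these names
# come from nova.
GLOBAL_LOCK = threading.Lock()           # one lock shared by every node
_node_locks = defaultdict(threading.Lock)
_node_locks_guard = threading.Lock()     # protects the lock registry


def lock_for(nodename):
    """Return the lock dedicated to one node, creating it on first use."""
    with _node_locks_guard:
        return _node_locks[nodename]


def update_node(nodename, lock, results, work_seconds=0.05):
    """Simulate one tracker update for a node while holding `lock`."""
    with lock:
        time.sleep(work_seconds)  # placeholder for the real update work
        results[nodename] = "updated"


def run_updates(nodenames, per_node):
    """Update all nodes from separate threads; return (results, elapsed)."""
    results = {}
    threads = []
    for name in nodenames:
        lock = lock_for(name) if per_node else GLOBAL_LOCK
        threads.append(
            threading.Thread(target=update_node, args=(name, lock, results))
        )
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.monotonic() - start


if __name__ == "__main__":
    nodes = ["node-%d" % i for i in range(4)]
    _, serial = run_updates(nodes, per_node=False)
    _, parallel = run_updates(nodes, per_node=True)
    print("global lock: %.2fs, per-node locks: %.2fs" % (serial, parallel))
```

With the global lock the four simulated updates run one after another,
while with per-node locks they can overlap. Whether the real service
would see a similar gain depends on how the updates are actually
scheduled, which is why testing at scale matters.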
Another discussion was about having more than one thread for the compute
service (i.e. N threads), but my opinion was that we should first look at the
above before discussing any other approach.