[OpenStack-Infra] Debugging nodepool over-allocation with history tracking (109132)

Ian Wienand iwienand at redhat.com
Tue Aug 26 06:04:39 UTC 2014


Hi,

I'm having a hard time getting the description in [1] to trigger after
trying several different approaches.  I've modified some of the tests
to have a few more providers and images to be closer to the situation
that caused the problem [2] (will clean it up into a new test-case
soon)

One thing I've noticed, from the revert message

---
 * Nodepool has a very high (>2x quota) request load
 * Restart nodepool
 * First pass through allocator appears normal and allocates up to quota
 * Second pass allocates an additional $quota worth of nodes, all from
   the last-defined provider
 * Repeats until request load is satisfied
---

I tried that; one thing I noticed was that the output of the allocator
can be a bit confusing if the images aren't ready, e.g.

For example, here's what I see first time through

---
[2014-08-26 15:20:58,149 NodePool] DEBUG:nodepool.NodePool:Beginning node launch calculation
[2014-08-26 15:20:58,155 NodePool] DEBUG:nodepool.NodePool:  Deficit: fake-trusty: 100 (start: 0 min: 100 ready: 0)
[2014-08-26 15:20:58,160 NodePool] DEBUG:nodepool.NodePool:  Deficit: fake-centos: 100 (start: 0 min: 100 ready: 0)
[2014-08-26 15:20:58,166 NodePool] DEBUG:nodepool.NodePool:  Deficit: fake-fedora: 5 (start: 0 min: 5 ready: 0)
[2014-08-26 15:20:58,171 NodePool] DEBUG:nodepool.NodePool:  Deficit: fake-precise: 100 (start: 0 min: 100 ready: 0)
[2014-08-26 15:20:58,184 NodePool] DEBUG:nodepool.NodePool:  Allocation requests:
[2014-08-26 15:20:58,185 NodePool] DEBUG:nodepool.NodePool:    <AllocationRequest for 5.0 of fake-fedora>
[2014-08-26 15:20:58,185 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 2.5 (out of 5.0) of fake-fedora from fake-provider2>
[2014-08-26 15:20:58,185 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 2.5 (out of 5.0) of fake-fedora from fake-provider1>
[2014-08-26 15:20:58,185 NodePool] DEBUG:nodepool.NodePool:    <AllocationRequest for 100.0 of fake-precise>
[2014-08-26 15:20:58,185 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-precise from fake-provider2>
[2014-08-26 15:20:58,185 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-precise from fake-provider1>
[2014-08-26 15:20:58,186 NodePool] DEBUG:nodepool.NodePool:    <AllocationRequest for 100.0 of fake-trusty>
[2014-08-26 15:20:58,186 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-trusty from fake-provider2>
[2014-08-26 15:20:58,186 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-trusty from fake-provider1>
[2014-08-26 15:20:58,186 NodePool] DEBUG:nodepool.NodePool:    <AllocationRequest for 100.0 of fake-centos>
[2014-08-26 15:20:58,186 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-centos from fake-provider2>
[2014-08-26 15:20:58,186 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-centos from fake-provider1>
[2014-08-26 15:20:58,187 NodePool] DEBUG:nodepool.NodePool:  Grants:
[2014-08-26 15:20:58,187 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 2 of fake-trusty from fake-provider2>
[2014-08-26 15:20:58,187 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 2 of fake-trusty to fake-target>
[2014-08-26 15:20:58,187 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 1 of fake-centos from fake-provider2>
[2014-08-26 15:20:58,188 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 1 of fake-centos to fake-target>
[2014-08-26 15:20:58,188 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 2 of fake-precise from fake-provider2>
[2014-08-26 15:20:58,188 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 2 of fake-precise to fake-target>
[2014-08-26 15:20:58,189 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 2 of fake-trusty from fake-provider1>
[2014-08-26 15:20:58,189 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 2 of fake-trusty to fake-target>
[2014-08-26 15:20:58,189 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 1 of fake-centos from fake-provider1>
[2014-08-26 15:20:58,190 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 1 of fake-centos to fake-target>
[2014-08-26 15:20:58,190 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 2 of fake-precise from fake-provider1>
[2014-08-26 15:20:58,190 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 2 of fake-precise to fake-target>
[2014-08-26 15:20:58,190 NodePool] DEBUG:nodepool.NodePool:Finished node launch calculation
---

Looks about right ... 10 nodes allocated.  But, then you have to
notice underneath where it says

---
[2014-08-26 15:20:58,190 NodePool] INFO:nodepool.NodePool:Need to launch 2 fake-trusty nodes for fake-target on fake-provider2
[2014-08-26 15:20:58,194 NodePool] DEBUG:nodepool.NodePool:No current image for fake-trusty on fake-provider2
[2014-08-26 15:20:58,196 NodePool] DEBUG:nodepool.NodePool:No current image for fake-trusty on fake-provider2
[2014-08-26 15:20:58,196 NodePool] INFO:nodepool.NodePool:Need to launch 1 fake-centos nodes for fake-target on fake-provider2
[2014-08-26 15:20:58,198 NodePool] DEBUG:nodepool.NodePool:No current image for fake-centos on fake-provider2
---

So really it didn't launch these nodes ("No current image").  Thus on
the next loop

---
[2014-08-26 15:21:08,276 NodePool] DEBUG:nodepool.NodePool:Beginning node launch calculation
[2014-08-26 15:21:08,282 NodePool] DEBUG:nodepool.NodePool:  Deficit: fake-trusty: 100 (start: 0 min: 100 ready: 0)
[2014-08-26 15:21:08,288 NodePool] DEBUG:nodepool.NodePool:  Deficit: fake-centos: 100 (start: 0 min: 100 ready: 0)
[2014-08-26 15:21:08,293 NodePool] DEBUG:nodepool.NodePool:  Deficit: fake-fedora: 5 (start: 0 min: 5 ready: 0)
[2014-08-26 15:21:08,299 NodePool] DEBUG:nodepool.NodePool:  Deficit: fake-precise: 100 (start: 0 min: 100 ready: 0)
[2014-08-26 15:21:08,311 NodePool] DEBUG:nodepool.NodePool:  Allocation requests:
[2014-08-26 15:21:08,311 NodePool] DEBUG:nodepool.NodePool:    <AllocationRequest for 5.0 of fake-fedora>
[2014-08-26 15:21:08,311 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 2.5 (out of 5.0) of fake-fedora from fake-provider2>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 2.5 (out of 5.0) of fake-fedora from fake-provider1>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:    <AllocationRequest for 100.0 of fake-precise>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-precise from fake-provider2>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-precise from fake-provider1>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:    <AllocationRequest for 100.0 of fake-trusty>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-trusty from fake-provider2>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-trusty from fake-provider1>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:    <AllocationRequest for 100.0 of fake-centos>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-centos from fake-provider2>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:      <AllocationSubRequest for 50.0 (out of 100.0) of fake-centos from fake-provider1>
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:  Grants:
[2014-08-26 15:21:08,312 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 2 of fake-fedora from fake-provider2>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 2 of fake-fedora to fake-target>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 1 of fake-trusty from fake-provider2>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 1 of fake-trusty to fake-target>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 1 of fake-centos from fake-provider2>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 1 of fake-centos to fake-target>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 1 of fake-precise from fake-provider2>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 1 of fake-precise to fake-target>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 3 of fake-fedora from fake-provider1>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 3 of fake-fedora to fake-target>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 1 of fake-trusty from fake-provider1>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 1 of fake-trusty to fake-target>
[2014-08-26 15:21:08,313 NodePool] DEBUG:nodepool.NodePool:    <AllocationGrant of 1 of fake-centos from fake-provider1>
[2014-08-26 15:21:08,314 NodePool] DEBUG:nodepool.NodePool:      <AllocationGrantTarget for 1 of fake-centos to fake-target>
[2014-08-26 15:21:08,314 NodePool] DEBUG:nodepool.NodePool:Finished node launch calculation
---

I see it make the grants again (doing what it is supposed to do,
notice it allocates some fedora nodes there).

Is it possible this is what was ascribed to the double-quota problem?
It would be great to get the logs to see what happened after the
launch calculation to see if there is some way we can double-count.

I'm open to any ideas, especially the pre-conditions expected to cause
the problem for [2]

Thanks,

-i

[1] https://review.openstack.org/#/c/109132/
[2] https://review.openstack.org/#/c/116189/3
[3] https://review.openstack.org/#/c/116189/3/nodepool/tests/fixtures/capacity.yaml



More information about the OpenStack-Infra mailing list