On Thu, 31 Oct 2019, Matt Riedemann wrote:
> After that we call the scheduler to find a host and that only takes about 1 second. That POST /allocations call to placement shouldn't be taking around 3 minutes so something crazy is going on there.
Yeah, this is indeed weird and wrong. There are two obvious ways, in the placement code, that _set_allocations (which underlies all the allocation-changing requests) could be slowed down:

* If the resource provider's generation has changed during the request, because something else is writing allocations to the same provider at the same time, there will be up to 10 retries, server-side.

* For each of those tries, there's (apparently) a chance of a db deadlock, because the method is guarded by a retry-on-deadlock decorator. We added that a long, long time ago.

However, if the generation retry were happening, it would show up in the logs as "Retrying allocations write on resource provider". And if we were getting deadlocks we ought to see "Performing DB retry for function". There could be other non-obvious things, but more data is required.
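Roughly, those two layers nest like the sketch below. This is illustrative only, not the actual placement source: the function and exception names and the deadlock retry count are assumptions; the 10 generation retries and the two log messages are the real parts.

```python
# Illustrative sketch of the two retry layers around allocation writes.
# Names and the deadlock retry count are assumptions, not placement code.
import functools
import logging
import random
import time

LOG = logging.getLogger(__name__)

RP_CONFLICT_RETRIES = 10   # server-side generation-conflict retries


class GenerationConflict(Exception):
    """Raised when the resource provider generation moved under us."""


class DBDeadlock(Exception):
    """Stand-in for the DB layer's deadlock exception."""


def retry_on_deadlock(func, attempts=5):
    """Stand-in for the oslo.db-style retry-on-deadlock guard."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return func(*args, **kwargs)
            except DBDeadlock:
                if attempt == attempts - 1:
                    raise
                # This is where "Performing DB retry for function" shows up.
                LOG.debug('Performing DB retry for function %s', func.__name__)
                time.sleep(random.uniform(0, 0.2) * (attempt + 1))
    return wrapper


@retry_on_deadlock
def set_allocations(context, allocations, write_fn):
    """Write allocations, retrying when a provider generation has moved."""
    for _ in range(RP_CONFLICT_RETRIES):
        try:
            return write_fn(context, allocations)
        except GenerationConflict:
            # This is where "Retrying allocations write on resource
            # provider" shows up.
            LOG.debug('Retrying allocations write on resource provider')
    raise GenerationConflict('too many generation conflicts')
```

If either layer were looping, the corresponding debug message would appear in the placement logs, which is why the absence of those messages matters here.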
> Oct 31 16:52:24.721346 ubuntu-bionic-inap-mtl01-0012620879 devstack@placement-api.service[8591]: DEBUG placement.requestlog [req-275af2df-bd4e-4e64-b46e-6582e8de5148 req-295f7350-2f0b-4f85-8cdf-d76801637221 None None] Starting request: 198.72.124.104 "POST /placement/allocations" {{(pid=8593) __call__ /opt/stack/placement/placement/requestlog.py:61}}
We start the requestlog at the very beginning of the request, so there are a few steps between here and the actual data interaction. Another possible angle of investigation is that keystonemiddleware was being slow to validate a token for some reason.
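If we wanted to isolate that, a crude option would be a timing middleware slotted into the pipeline after keystonemiddleware; the gap between requestlog's "Starting request" timestamp and this middleware's start would then be roughly the auth overhead. This is just a sketch, not anything that exists in placement today:

```python
# A minimal WSGI middleware sketch (not placement code) that logs how long
# everything downstream of it takes. Placed after keystonemiddleware in the
# pipeline, its start time minus requestlog's "Starting request" time is
# approximately the time spent validating the token.
import logging
import time

LOG = logging.getLogger(__name__)


class TimingMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        start = time.monotonic()
        try:
            return self.app(environ, start_response)
        finally:
            LOG.debug('%s %s took %.3fs downstream of auth',
                      environ.get('REQUEST_METHOD'),
                      environ.get('PATH_INFO'),
                      time.monotonic() - start)
```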
> I know Chris Dent has done a lot of profiling on placement recently but I'm not sure if much has been done around profiling the POST /allocations call to move allocations from one consumer to another.
You're right that little was done to profile allocations, mostly because initial experiments showed that it was fast and GET /a_c was slow. In the fullness of time I probably would have moved on to allocations, but reality intervened.

What's probably most important is evaluating allocation writes to the same set of several resource providers under high concurrency. However, it doesn't seem like that is what is happening in this case (otherwise I'd expect to see more evidence in the logs). Instead it's "just being slow", which could be any of:

* something in the auth process (which, if other services are also being slow, could be a unifying thing)

* the database server having a sad (do we keep a slow query log?)

* the vm/vms has/have a noisy neighbor

Do we know anything about cpu and io stats during the slow period? We've (sorry, let me rephrase, I've) known for the entire 5.5 years I've been involved in OpenStack that the CI resources are way over-subscribed and controller nodes in the wild are typically way under-specified. Yes, our code should be robust in the face of that, but...

Of course, I could be totally wrong; there could be something flat-out wrong in the placement code, but if that were the case I'd expect (like Matt did) to see it more often.

--
Chris Dent                   ٩◔̯◔۶           https://anticdent.org/
freenode: cdent
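For what it's worth, the concurrency check mentioned above could be as simple as the following sketch: many consumers writing allocations against one shared provider at once. The endpoint, token, and UUIDs are placeholders, and the body follows the microversion 1.28-style allocations schema.

```python
# Sketch: hammer POST /allocations with many consumers that all touch the
# same resource provider. Endpoint, token, and UUIDs are placeholders.
import concurrent.futures
import uuid

import requests

PLACEMENT = 'http://127.0.0.1/placement'      # placeholder endpoint
HEADERS = {
    'x-auth-token': 'ADMIN_TOKEN',            # placeholder token
    'openstack-api-version': 'placement 1.28',
}
RP_UUID = 'RP_UUID_HERE'                      # one shared provider
PROJECT = USER = str(uuid.uuid4())


def write_allocation(consumer):
    body = {
        consumer: {
            'allocations': {RP_UUID: {'resources': {'VCPU': 1}}},
            'project_id': PROJECT,
            'user_id': USER,
            'consumer_generation': None,      # null for a new consumer
        }
    }
    resp = requests.post(PLACEMENT + '/allocations',
                         json=body, headers=HEADERS)
    return resp.status_code, resp.elapsed.total_seconds()


consumers = [str(uuid.uuid4()) for _ in range(50)]
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    for status, elapsed in pool.map(write_allocation, consumers):
        print(status, '%.3fs' % elapsed)
```

If generation conflicts were the problem, a run like this against a busy provider should surface the "Retrying allocations write on resource provider" messages discussed earlier.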