State of the Gate (placement?)

Slawek Kaplonski skaplons at redhat.com
Thu Oct 31 22:38:36 UTC 2019


On Thu, Oct 31, 2019 at 04:52:39PM -0500, Matt Riedemann wrote:
> On 10/31/2019 4:15 PM, Slawek Kaplonski wrote:
> > > 1. http://status.openstack.org/elastic-recheck/index.html#1763070
> > > 
> > > This has resurfaced and I'm not sure why, nor do I think we ever had a great
> > > handle on what is causing this or how to work around it so if anyone has new
> > > ideas please chip in.
> > I think this is "just" some slowdown of the node the job is running on. I
> > noticed it in some neutron jobs too and checked a few of them; it seems that
> > one API request gets processed for a very long time. For example, in one of
> > the fresh examples:
> > https://13cf3dd11b8f009809dc-97cb3b32849366f5bed744685e46b266.ssl.cf5.rackcdn.com/692206/3/check/tempest-integrated-compute/35ecb4a/job-output.txt
> > it was a request to nova that took a very long time:
> > 
> > Oct 31 16:55:08.632162 ubuntu-bionic-inap-mtl01-0012620879 devstack@n-api.service[7191]: INFO nova.api.openstack.requestlog [None req-275af2df-bd4e-4e64-b46e-6582e8de5148 tempest-ServerDiskConfigTestJSON-1598674508 tempest-ServerDiskConfigTestJSON-1598674508] 198.72.124.104 "POST /compute/v2.1/servers/d15d2033-b29b-44f7-b619-ed7ef83fe477/action" status: 500 len: 216 microversion: 2.1 time: 161.951140
> > 
> > 
> 
> That's really interesting, I hadn't noticed the MessagingTimeout in n-api
> linked to this. I dug into that one and it's really odd, but there is one
> known and one unknown issue here.

I pointed to this one only as an example. I saw similar timeouts on calls to
neutron too, e.g. a POST /v2.0/networks that took about 90 seconds while, in
the same test run, all other similar POST calls took less than a second.
That's why it's hard to really debug what happens there, and why I think it is
"some" slowdown of the VM on which the job is running. I simply don't have any
better explanation for now :/
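
For what it's worth, this is roughly how I compare those request times. It is
only a minimal sketch that assumes a locally downloaded job-output.txt and
keys off the requestlog format quoted above; the 10 second threshold is an
arbitrary cut-off, not anything the services use:

import re
import sys

# Matches the nova/neutron requestlog lines quoted above, e.g.
# ... "POST /compute/v2.1/servers/<uuid>/action" status: 500 ... time: 161.951140
REQ_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)" status: (?P<status>\d+)'
    r'.* time: (?P<secs>[\d.]+)')


def slow_requests(logfile, threshold=10.0):
    with open(logfile, errors="replace") as fh:
        for line in fh:
            m = REQ_RE.search(line)
            if m and float(m.group("secs")) > threshold:
                yield (float(m.group("secs")), m.group("status"),
                       m.group("method"), m.group("path"))


if __name__ == "__main__":
    # usage: python3 slow_requests.py job-output.txt
    for secs, status, method, path in sorted(slow_requests(sys.argv[1]),
                                             reverse=True):
        print("%10.3fs  %s  %s %s" % (secs, status, method, path))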

> 
> The known issue is that resize/cold migrate is synchronous through the API,
> conductor and scheduler while picking a host, and we only return to the API
> caller once we RPC cast to the selected destination host. If we RPC cast
> from the API to the conductor we'd avoid that MessagingTimeout issue.
> 
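
As a purely conceptual illustration of the call-vs-cast difference described
above (this is not actual nova/oslo.messaging code): a "call" keeps the API
worker blocked until the conductor replies or the timeout fires, while a
"cast" just enqueues the message and returns. All names and numbers below are
made up for the example.

import queue
import threading
import time


class MessagingTimeout(Exception):
    pass


def conductor_worker(inbox):
    # Stand-in for conductor + scheduler + the allocation move, which in the
    # failed run above took ~164 seconds before the cast to the compute host.
    while True:
        msg, reply_q = inbox.get()
        time.sleep(msg["work_secs"])
        if reply_q is not None:
            reply_q.put("done")       # only "call" messages get a reply


def rpc_call(inbox, msg, timeout):
    reply_q = queue.Queue()
    inbox.put((msg, reply_q))
    try:
        return reply_q.get(timeout=timeout)   # the API worker blocks here
    except queue.Empty:
        raise MessagingTimeout("no reply within %ss" % timeout)


def rpc_cast(inbox, msg):
    inbox.put((msg, None))            # fire and forget, returns immediately


inbox = queue.Queue()
threading.Thread(target=conductor_worker, args=(inbox,), daemon=True).start()

try:
    rpc_call(inbox, {"work_secs": 3}, timeout=1)   # times out like n-api did
except MessagingTimeout as exc:
    print("the API turns this into a 500:", exc)

rpc_cast(inbox, {"work_secs": 3})     # returns right away while work continues
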
> The unknown weirdness that made this fail is that it took ~3 minutes to move
> allocations from the instance to the migration record. We hit conductor
> about here:
> 
> Oct 31 16:52:24.300732 ubuntu-bionic-inap-mtl01-0012620879
> nova-conductor[14891]: DEBUG nova.conductor.tasks.migrate [None
> req-275af2df-bd4e-4e64-b46e-6582e8de5148
> tempest-ServerDiskConfigTestJSON-1598674508
> tempest-ServerDiskConfigTestJSON-1598674508] [instance:
> d15d2033-b29b-44f7-b619-ed7ef83fe477] Requesting cell
> 374faad6-7a6f-40c7-b2f4-9138e0dde5d5(cell1) while migrating {{(pid=16132)
> _restrict_request_spec_to_cell
> /opt/stack/nova/nova/conductor/tasks/migrate.py:172}}
> 
> And then we finally get the message that we moved allocations prior to
> scheduling:
> 
> Oct 31 16:55:08.634009 ubuntu-bionic-inap-mtl01-0012620879
> nova-conductor[14891]: DEBUG nova.conductor.tasks.migrate [None
> req-275af2df-bd4e-4e64-b46e-6582e8de5148
> tempest-ServerDiskConfigTestJSON-1598674508
> tempest-ServerDiskConfigTestJSON-1598674508] Created allocations for
> migration 86ccf6f2-a2b0-4ae0-9c31-8bbdac4ceda9 on
> eccff5f8-3175-4170-b87e-cb556865fde0 {{(pid=16132)
> replace_allocation_with_migration
> /opt/stack/nova/nova/conductor/tasks/migrate.py:86}}
> 
> After that we call the scheduler to find a host, and that only takes about 1
> second. That POST /allocations call to placement shouldn't be taking around
> 3 minutes, so something crazy is going on there.
> 
> If I'm reading the placement log correctly, that POST starts here:
> 
> Oct 31 16:52:24.721346 ubuntu-bionic-inap-mtl01-0012620879
> devstack@placement-api.service[8591]: DEBUG placement.requestlog
> [req-275af2df-bd4e-4e64-b46e-6582e8de5148
> req-295f7350-2f0b-4f85-8cdf-d76801637221 None None] Starting request:
> 198.72.124.104 "POST /placement/allocations" {{(pid=8593) __call__
> /opt/stack/placement/placement/requestlog.py:61}}
> 
> And returns here:
> 
> Oct 31 16:55:08.629708 ubuntu-bionic-inap-mtl01-0012620879
> devstack@placement-api.service[8591]: DEBUG placement.handlers.allocation
> [req-275af2df-bd4e-4e64-b46e-6582e8de5148
> req-295f7350-2f0b-4f85-8cdf-d76801637221 service placement] Successfully
> wrote allocations [<placement.objects.allocation.Allocation object at
> 0x7f023c9bcfd0>, <placement.objects.allocation.Allocation object at
> 0x7f023c9bc6d8>, <placement.objects.allocation.Allocation object at
> 0x7f023c9bcc18>, <placement.objects.allocation.Allocation object at
> 0x7f023c9eb7f0>, <placement.objects.allocation.Allocation object at
> 0x7f023c9ebdd8>, <placement.objects.allocation.Allocation object at
> 0x7f023c9eb630>] {{(pid=8593) _create_allocations
> /opt/stack/placement/placement/handlers/allocation.py:524}}
> Oct 31 16:55:08.629708 ubuntu-bionic-inap-mtl01-0012620879
> devstack@placement-api.service[8591]: INFO placement.requestlog
> [req-275af2df-bd4e-4e64-b46e-6582e8de5148
> req-295f7350-2f0b-4f85-8cdf-d76801637221 service placement] 198.72.124.104
> "POST /placement/allocations" status: 204 len: 0 microversion: 1.28
> Oct 31 16:55:08.629708 ubuntu-bionic-inap-mtl01-0012620879
> devstack@placement-api.service[8591]: [pid: 8593|app: 0|req: 304/617]
> 198.72.124.104 () {72 vars in 1478 bytes} [Thu Oct 31 16:52:24 2019] POST
> /placement/allocations => generated 0 bytes in 163883 msecs (HTTP/1.1 204) 5
> headers in 199 bytes (1 switches on core 0)
> 
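
As a quick sanity check, those two timestamps do line up with the 163883
msecs uwsgi reports for this request:

from datetime import datetime

fmt = "%b %d %H:%M:%S.%f"
start = datetime.strptime("Oct 31 16:52:24.721346", fmt)
end = datetime.strptime("Oct 31 16:55:08.629708", fmt)
print(end - start)   # 0:02:43.908362, i.e. ~164 seconds for one POST
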
> I know Chris Dent has done a lot of profiling on placement recently but I'm
> not sure if much has been done around profiling the POST /allocations call
> to move allocations from one consumer to another.
> 
> That also looks like some kind of anomaly because, looking at other POST
> /allocations calls in that log, many take only 50-70 milliseconds. I wonder
> if there is a potential lock somewhere in placement slowing things down in
> that path.
> 
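
One way to see how much of an outlier this request is would be to pull the
durations for that endpoint out of the placement-api uwsgi log. A minimal
sketch, assuming the log has been downloaded locally as
screen-placement-api.txt (that filename is only a guess at the usual devstack
naming, adjust as needed):

import re

# Matches the uwsgi request summary lines quoted above, e.g.
# ... POST /placement/allocations => generated 0 bytes in 163883 msecs ...
ALLOC_RE = re.compile(
    r"POST /placement/allocations => generated \d+ bytes in (\d+) msecs")

durations = []
with open("screen-placement-api.txt", errors="replace") as fh:
    for line in fh:
        m = ALLOC_RE.search(line)
        if m:
            durations.append(int(m.group(1)))

durations.sort()
if durations:
    print("count: %d  median: %d ms  max: %d ms"
          % (len(durations), durations[len(durations) // 2], durations[-1]))
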
> -- 
> 
> Thanks,
> 
> Matt
> 

-- 
Slawek Kaplonski
Senior software engineer
Red Hat



