Scheduler sends VM to HV that lacks resources

Albert Braden Albert.Braden at synopsys.com
Tue Nov 12 21:25:25 UTC 2019


We're on Rocky.

-----Original Message-----
From: Sean Mooney <smooney at redhat.com> 
Sent: Tuesday, November 12, 2019 1:23 PM
To: Albert Braden <albertb at synopsys.com>; openstack-discuss at lists.openstack.org
Subject: Re: Scheduler sends VM to HV that lacks resources

Hm, what version of OpenStack have you deployed?
I did not see that in your email.
Is it Ocata or newer? http://lists.openstack.org/pipermail/openstack-dev/2018-January/126283.html
I see you have the CoreFilter and RamFilter filters enabled. From Ocata on they should be disabled,
as those resources are claimed in Placement, but leaving them enabled should not break anything on older releases.
We removed them in Train, after we removed the caching scheduler.
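
For example, with CoreFilter and RamFilter dropped, the filter list you quoted
below would look something like this in nova.conf (a sketch; adjust for your
deployment):

    [filter_scheduler]
    enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter

From Ocata on, Placement does the CPU and RAM accounting those two filters
used to do in the scheduler, so dropping them avoids double accounting.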

On Tue, 2019-11-12 at 20:47 +0000, Albert Braden wrote:
> We are running placement under apache:
> 
> https://paste.fedoraproject.org/paste/mZviLVe5xONPsXfLqdxI6A
> 
> The placement error logs show a lot of GETs but no errors:
> 
> https://paste.fedoraproject.org/paste/xDVGaXEdoQ5Z3wHv17Lezg
> 
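> One way to cross-check is to ask Placement directly what it thinks is in use
> on the suspect host (a sketch assuming the osc-placement client plugin is
> installed; <hostname> and <provider-uuid> are placeholders):
> 
>     openstack resource provider list --name <hostname>
>     openstack resource provider usage show <provider-uuid>
> 
> If the VCPU usage Placement reports disagrees with what the scheduler logs,
> that would point at an accounting problem (e.g. stale allocations) rather
> than a filter problem.
> 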
> We are planning to use NUMA but haven't started yet. It's probably a config error. Where should I be looking? This is
> our nova config on the controllers:
> 
> https://paste.fedoraproject.org/paste/kNe1eRimk4ifrAuuN790bg
> 
> -----Original Message-----
> From: Sean Mooney <smooney at redhat.com> 
> Sent: Tuesday, November 12, 2019 12:22 PM
> To: Albert Braden <albertb at synopsys.com>; openstack-discuss at lists.openstack.org
> Subject: Re: Scheduler sends VM to HV that lacks resources
> 
> On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote:
> > If I create 20 VMs at once, at least one of them fails with "Exceeded
> > maximum number of retries." When I look at the logs I see that the
> > scheduler sent the VM to a host that doesn't have enough CPU: "Free vcpu
> > 14.00 VCPU < requested 16 VCPU."
> > 
> > https://paste.fedoraproject.org/paste/6N3wcDzlbNQgj6hRApHiDQ
> > 
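> > One way to cross-check the scheduler's view is to inspect the host's
> > reported capacity directly (a sketch; <hostname> is a placeholder):
> > 
> >     openstack hypervisor show <hostname> -c vcpus -c vcpus_used
> > 
> > which shows how many vCPUs Nova believes are in use on that host.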
> > 
> > I thought that this must be caused by a race condition, so I stopped the
> > scheduler and conductor on 2 controllers, and then created 20 more VMs.
> > Now I see the logs only on controller 3, and some of the failures are now
> > saying "Unable to establish connection to <LB>", but I still see the
> > single scheduler sending VMs to a host that lacks resources: "Free vcpu
> > 14.00 VCPU < requested 16 VCPU."
> > 
> > https://paste.fedoraproject.org/paste/lGlVpfbB9C19mMzrWQcHCQ
> > 
> > 
> > I'm looking at my nova.conf but don't see anything misconfigured. My
> > filters are pretty standard:
> > 
> > enabled_filters = RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter
> > 
> > What should I be looking for here? Why would a single scheduler send a VM
> > to a host that is too full? We have lots of compute hosts that are not full:
> > 
> > https://paste.fedoraproject.org/paste/6SX9pQ4V1KnWfQkVnfoHOw
> > 
> > 
> > This is the command line I used:
> > 
> > openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 alberttestB
> 
> What version of OpenStack are you running?
> If it's not using Placement, then this behaviour is expected: the resources
> are not claimed until the VM is booted on the node, so there is an interval,
> while the scheduler is selecting hosts, where you can race with other VM
> boots.
> 
> If you are using Placement and you are not using NUMA or PCI passthrough,
> which you do not appear to be based on your enabled filters, then this
> should not happen and we should dig deeper, as there is likely a bug either
> in your configuration or in Nova.
> 
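> As a sanity check, Placement's own view of which hosts can fit the flavor
> can be queried (a sketch assuming the osc-placement plugin; the MEMORY_MB
> amount is inferred from the s1.16cx120g flavor name, so adjust it):
> 
>     openstack allocation candidate list --resource VCPU=16 --resource MEMORY_MB=122880
> 
> A host with only 14 free vCPUs should not appear among the candidates.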


