[openstack-dev] [nova] Better tests for nova scheduler(esp. race conditions)?
Cheng, Yingxin
yingxin.cheng at intel.com
Thu Jan 28 03:21:04 UTC 2016
Thank you Nikola! I'm very interested in this.
According to my current understanding, a complete functional test for the nova scheduler should include nova-api, the scheduler service, the part of the conductor service that forwards scheduler decisions to compute services, and the part of the compute service that covers claims, claim aborts, and compute node resource consumption inside the resource tracker.
The inputs of this series of tests are the initial resource view, the existing resource consumption from fake instances, and the incoming schedule requests with flavors.
The outputs are the statistics of elapsed time in each scheduling phase, the statistics of the requests' lifecycles, and the sanity of the final resource view with the booted fake instances.
Extra features should also be taken into consideration, including but not limited to image properties, host aggregates, availability zones, compute capabilities, server groups, compute service status, forced hosts, metrics, etc.
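To make the inputs and outputs above more concrete, here is a minimal, self-contained sketch in plain Python. It does not use any nova code: FakeComputeNode, schedule, and run_simulation are hypothetical names invented for illustration, RAM is the only resource modeled, and the "stale view" is a simplifying assumption standing in for the real race window between a scheduling decision and the claim on the compute node.

```python
import threading


class FakeComputeNode:
    """Hypothetical stand-in for a compute node and its resource tracker."""

    def __init__(self, name, total_ram_mb):
        self.name = name
        self.total_ram_mb = total_ram_mb
        self.used_ram_mb = 0
        self._lock = threading.Lock()

    def claim(self, ram_mb):
        """Re-check the request against real usage under a lock."""
        with self._lock:
            if self.used_ram_mb + ram_mb > self.total_ram_mb:
                return False  # claim aborted, the request must be retried
            self.used_ram_mb += ram_mb
            return True


def snapshot(nodes):
    """Build the scheduler's resource view: free RAM per host name."""
    return {n.name: n.total_ram_mb - n.used_ram_mb for n in nodes}


def schedule(view, ram_mb):
    """Pick the host with the most free RAM from a possibly stale view."""
    candidates = [(free, name) for name, free in view.items() if free >= ram_mb]
    if not candidates:
        return None
    return max(candidates)[1]


def run_simulation(nodes, requests, max_retries=3, refresh_every=4):
    """Feed RAM requests through schedule + claim and collect statistics.

    The view is refreshed only every `refresh_every` requests and after a
    failed claim, so decisions in between are made on stale data.
    """
    by_name = {n.name: n for n in nodes}
    stats = {"booted": 0, "failed": 0, "retries": 0}
    view = {}
    for i, ram_mb in enumerate(requests):
        if i % refresh_every == 0:
            view = snapshot(nodes)
        for _attempt in range(max_retries + 1):
            host = schedule(view, ram_mb)
            if host is None:
                stats["failed"] += 1  # no host fits in the current view
                break
            if by_name[host].claim(ram_mb):
                stats["booted"] += 1
                break
            stats["retries"] += 1
            view = snapshot(nodes)  # refresh the view after a failed claim
        else:
            stats["failed"] += 1  # retries exhausted
    return stats
```

With two 1024 MB nodes and four 512 MB requests, the stale view sends the first three requests to the same node, the third claim is aborted and retried once, and all four instances end up booted without any node exceeding its total. A real functional test would replace these fakes with the actual services and check the same kind of statistics.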
Please correct me if anything is wrong. I would also like to know the existing decisions and ideas from the mid-cycle sprint.
I'll start by investigating the existing functional test infrastructure. This could go much quicker if anyone (maybe Sean Dague) can help with an introduction to the existing features. I've also seen others showing interest in this area, such as Chris Dent (cdent). It would be great to work with other experienced contributors in the community.
Regards,
-Yingxin
> -----Original Message-----
> From: Nikola Đipanov [mailto:ndipanov at redhat.com]
> Sent: Wednesday, January 27, 2016 9:58 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Cc: Cheng, Yingxin
> Subject: Re: [openstack-dev] [nova] Better tests for nova scheduler(esp. race
> conditions)?
>
> Top posting since better scheduler testing just got brought up during the
> midcycle meetup, so it might be useful to re-kindle this thread.
>
> Sean (Dague) brought up that there is some infrastructure already that could
> help us do what you propose below, but work may be needed to make it viable
> for proper resource accounting tests.
>
> Yingxin - in case you are still interested in doing some of this stuff, we can
> discuss here or on IRC.
>
> Thanks,
> Nikola
>
> On 12/15/2015 03:33 AM, Cheng, Yingxin wrote:
> >
> >> -----Original Message-----
> >> From: Nikola Đipanov [mailto:ndipanov at redhat.com]
> >> Sent: Monday, December 14, 2015 11:11 PM
> >> To: OpenStack Development Mailing List (not for usage questions)
> >> Subject: Re: [openstack-dev] [nova] Better tests for nova
> >> scheduler(esp. race conditions)?
> >>
> >> On 12/14/2015 08:20 AM, Cheng, Yingxin wrote:
> >>> Hi All,
> >>>
> >>>
> >>>
> >>> When I was looking at bugs related to race conditions in the scheduler
> >>> [1-3], it felt like the nova scheduler lacks sanity checks of its
> >>> scheduling decisions under different situations. We cannot even make
> >>> sure that some fixes successfully mitigate race conditions to an
> >>> acceptable scale. For example, there is no easy way to test whether
> >>> server-group race conditions still exist after a fix for bug[1], or
> >>> to make sure that after scheduling there will be no violations of
> >>> allocation ratios reported by bug[2], or to test that the retry rate
> >>> is acceptable in various corner cases proposed by bug[3]. And there
> >>> will be many more on this list.
> >>>
> >>>
> >>>
> >>> So I'm asking whether there is a plan to add those tests in the
> >>> future, or whether a design exists to simplify writing and executing
> >>> those kinds of tests. I'm thinking of using fake databases and fake
> >>> interfaces to isolate the entire scheduler service, so that we can
> >>> easily build up a disposable environment with all kinds of fake
> >>> resources and fake compute nodes to test scheduler behaviors. It is
> >>> even a good way to test whether the scheduler is capable of scaling
> >>> to 10k nodes without setting up 10k real compute nodes.
> >>>
> >>
> >> This would be a useful effort - however do not assume that this is
> >> going to be an easy task. Even in the paragraph above, you fail to
> >> take into account that in order to test the scheduling you also need
> >> to run all compute services since claims work like a kind of 2 phase
> >> commit where a scheduling decision gets checked on the destination
> >> compute host (through Claims logic), which involves locking in each
> >> compute process.
> >>
> >
> > Yes, the final goal is to test the entire scheduling process including 2PC.
> > As the scheduler is still in the process of being decoupled, some parts
> > such as RT and the retry mechanism are highly coupled with nova, so IMO
> > it is not a good idea to include them at this stage. I'll try to isolate
> > the filter scheduler as the first step, and I hope the community will
> > support this.
> >
> >
> >>>
> >>>
> >>> I'm also interested in the bp[4] to reduce scheduler race conditions
> >>> at the green-thread level. I think it is a good starting point for
> >>> solving the huge racing problem of the nova scheduler, and I really
> >>> wish I could help with that.
> >>>
> >>
> >> I proposed said blueprint but am very unlikely to have any time to
> >> work on it this cycle, so feel free to take a stab at it. I'd be more
> >> than happy to prioritize any reviews related to the above BP.
> >>
> >> Thanks for your interest in this
> >>
> >> N.
> >>
> >
> > Many thanks, Nikola! I'm still looking at the claim logic and trying to
> > find a way to merge it with the scheduler host state. I will upload
> > patches as soon as I figure it out.
> >
> >
> >>>
> >>>
> >>>
> >>>
> >>> [1] https://bugs.launchpad.net/nova/+bug/1423648
> >>>
> >>> [2] https://bugs.launchpad.net/nova/+bug/1370207
> >>>
> >>> [3] https://bugs.launchpad.net/nova/+bug/1341420
> >>>
> >>> [4]
> >>> https://blueprints.launchpad.net/nova/+spec/host-state-level-locking
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Regards,
> >>>
> >>> -Yingxin
> >>>
> >
> >
> >
> > Regards,
> > -Yingxin
> >
> >