Re: [sent on behalf of lists.openstack.org] Re: [nova] live migration with the NUMA topology

Brin Zhang(张百林) zhangbailin at inspur.com
Fri Dec 13 01:41:35 UTC 2019


> -----Original Message-----
> From: Artom Lifshitz [mailto:alifshit at redhat.com]
> Sent: December 13, 2019 0:17
> To: Matt Riedemann <mriedemos at gmail.com>
> Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
> Subject: [sent on behalf of lists.openstack.org] Re: [nova] live migration with the NUMA
> topology
>
> On Thu, Dec 12, 2019 at 9:01 AM Matt Riedemann <mriedemos at gmail.com>
> wrote:
> >
> > On 12/12/2019 7:24 AM, Brin Zhang(张百林) wrote:
> > > I have a question: suppose the destination server's NUMA topology
> > > (e.g. numa_node=2) is smaller than the source server's NUMA topology
> > > (e.g. numa_node=4) for an instance. If I live migrate this instance,
> > > what will happen? Will it roll back and keep the instance in its
> > > original state, or put it into ERROR? In the spec I could not find
> > > details on the text marked in red: "Third, information about the
> > > instance’s new NUMA characteristics needs to be generated on the
> > > destination (an InstanceNUMATopology object is not enough, more on
> > > that later)", though perhaps I did not read carefully enough :).
> > > Anyway, I want to know how this NUMA topology is handled during
> > > live migration.
> >
> > Artom can answer this in detail but I would expect the claim to fail
> > on the dest host here:
> >
> > https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L6656
> >
> > Which will be handled here in conductor:
> >
> > https://github.com/openstack/nova/blob/20.0.0/nova/conductor/tasks/live_migrate.py#L502
> >
> > And trigger a "reschedule" to an alternate host. If we run out of
> > alternates then MaxRetriesExceeded would be raised:
> >
> > https://github.com/openstack/nova/blob/20.0.0/nova/conductor/tasks/live_migrate.py#L555
> >
> > And handled here as NoValidHost:
> >
> > https://github.com/openstack/nova/blob/20.0.0/nova/conductor/manager.py#L457
> >
> > The vm_state should be unchanged (stay ACTIVE) but the migration
> > status will go to "error".
> >
> > Artom has been working on functional tests [1] but I'm not sure if
> > they cover this kind of scenario - I'd hope they would.
> >
> > Of course the simpler answer might be, and it would be cool if it is,
> > the scheduler should not select the dest host that can't fit the
> > instance so we don't even get to the low-level compute resource claim.
>
> Yeah, the scheduler (unless it's bypassed, obviously) shouldn't pick a host
> where the instance can't fit. And once we're on the host, if the claim fails
> (either because the scheduler was bypassed or another instance raced with
> ours and took our resources), we'll keep rescheduling until we can't, and then
> the migration fails. So what Matt wrote above is correct as well.
>
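
The flow Matt and Artom describe above can be sketched roughly as follows. This is only a toy model to make the control flow concrete, not nova's code: every name here (claim, find_destination, live_migrate, the dict fields) is hypothetical, and the "fit" check only compares NUMA node counts, whereas a real claim also looks at CPUs, memory and page sizes per cell.

```python
# Toy model of the claim / reschedule flow discussed in this thread.
# All names are hypothetical stand-ins, not nova's real API.


class ClaimFailed(Exception):
    """The destination host cannot fit the instance."""


class MaxRetriesExceeded(Exception):
    """Every candidate host has been tried and none fit."""


class NoValidHost(Exception):
    """What the caller finally sees when the migration cannot be placed."""


def claim(host_numa_nodes, instance_numa_nodes):
    # Assumption: a host "fits" if it has at least as many NUMA nodes
    # as the instance needs. Real fitting is far more detailed.
    if host_numa_nodes < instance_numa_nodes:
        raise ClaimFailed()


def find_destination(hosts, instance_numa_nodes):
    # Try the selected host first, then each alternate in order.
    for name, numa_nodes in hosts:
        try:
            claim(numa_nodes, instance_numa_nodes)
            return name
        except ClaimFailed:
            continue  # "reschedule" to the next alternate
    raise MaxRetriesExceeded()


def live_migrate(instance, migration, hosts):
    try:
        dest = find_destination(hosts, instance["numa_nodes"])
    except MaxRetriesExceeded:
        # The migration record is marked as failed; the instance's
        # vm_state is never touched, so it stays ACTIVE.
        migration["status"] = "error"
        raise NoValidHost()
    migration["dest"] = dest
    return dest


if __name__ == "__main__":
    instance = {"vm_state": "ACTIVE", "numa_nodes": 4}
    migration = {"status": "preparing"}
    try:
        live_migrate(instance, migration, [("dest1", 2), ("dest2", 2)])
    except NoValidHost:
        print(instance["vm_state"], migration["status"])
```

With two candidate hosts that each have only 2 NUMA nodes, both claims fail, the caller gets NoValidHost, the migration record ends up in "error", and the instance stays ACTIVE on the source.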

Yes, it would be better to cover this scenario in patch [1].

I also have a suggestion for https://github.com/openstack/nova-specs/blob/master/specs/train/approved/numa-aware-live-migration.rst#proposed-change: it would be better to clarify this point there
rather than saying "more on that later", or to add a link that jumps to the detailed information. That would be helpful to the reader.


> >
> > [1] https://review.opendev.org/#/c/672595/
> >
> > --
> >
> > Thanks,
> >
> > Matt
> >
>
>
> --
> Artom Lifshitz
> Software Engineer, OpenStack Compute DFG
>

brinzhang

