[openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

Mooney, Sean K sean.k.mooney at intel.com
Thu Jun 21 13:50:41 UTC 2018



> -----Original Message-----
> From: Jay Pipes [mailto:jaypipes at gmail.com]
> Sent: Thursday, June 21, 2018 2:37 PM
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [nova] NUMA-aware live migration: easy but
> incomplete vs complete but hard
> 
> On 06/18/2018 10:16 AM, Artom Lifshitz wrote:
> > Hey all,
> >
> > For Rocky I'm trying to get live migration to work properly for
> > instances that have a NUMA topology [1].
> >
> > A question that came up on one of patches [2] is how to handle
> > resources claims on the destination, or indeed whether to handle that
> > at all.
> >
> > The previous attempt's approach [3] (call it A) was to use the
> > resource tracker. This is race-free and the "correct" way to do it,
> > but the code is pretty opaque and not easily reviewable, as evidenced
> > by [3] sitting in review purgatory for literally years.
> >
> > A simpler approach (call it B) is to ignore resource claims entirely
> > for now and wait for NUMA in placement to land in order to handle it
> > that way. This is obviously race-prone and not the "correct" way of
> > doing it, but the code would be relatively easy to review.
> >
> > For the longest time, live migration did not keep track of resources
> > (until it started updating placement allocations). The message to
> > operators was essentially "we're giving you this massive hammer,
> don't
> > break your fingers." Continuing to ignore resource claims for now is
> > just maintaining the status quo. In addition, there is value in
> > improving NUMA live migration *now*, even if the improvement is
> > incomplete because it's missing resource claims. "Best is the enemy
> of
> > good" and all that. Finally, making use of the resource tracker is
> > just work that we know will get thrown out once we start using
> > placement for NUMA resources.
> >
> > For all those reasons, I would favor approach B, but I wanted to ask
> > the community for their thoughts.
> 
> Side question... does either approach touch PCI device management
> during live migration?
> 
> I ask because the only workloads I've ever seen that pin guest vCPU
> threads to specific host processors -- or make use of huge pages
> consumed from a specific host NUMA node -- have also made use of SR-IOV
> and/or PCI passthrough. [1]
> 
> If workloads that use PCI passthrough or SR-IOV VFs cannot be live
> migrated (due to existing complications in the lower-level virt layers)
> I don't see much of a point spending lots of developer resources trying
> to "fix" this situation when in the real world, only a mythical
> workload that uses CPU pinning or huge pages but *doesn't* use PCI
> passthrough or SR-IOV VFs would be helped by it.
> 
> Best,
> -jay
> 
> [1] I know I'm only one person, but every workload I've seen that
> requires pinned CPUs and/or huge pages is a VNF that has been
> essentially an ASIC that a telco OEM/vendor has converted into software
> and requires the same guarantees that the ASIC and custom hardware gave
> the original hardware-based workload. These VNFs, every single one of
> them, used either PCI passthrough or SR-IOV VFs to handle latency-
> sensitive network I/O.
[Mooney, Sean K] I would generally agree, with the extension of including DPDK-based vswitches such as OVS-DPDK or VPP.
CPU-pinned or hugepage-backed guests generally also have some kind of high-performance networking solution, or use a hardware
accelerator like a GPU, to justify the assertion that pinning of cores or RAM is required for performance.
A DPDK networking stack would, however, not require the PCI remapping to be addressed, though I believe that is planned to be added in Stein.
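To make the A-vs-B distinction upthread concrete, here is a minimal, purely illustrative Python sketch. None of these names are Nova's real classes or methods, and the accounting is deliberately reduced to a set of free pCPUs per NUMA cell: approach A claims pinned CPUs on the destination under a lock, while approach B only computes a fit and can hand the same pCPUs to two concurrent migrations.

    # Hypothetical sketch only -- not Nova's actual API.
    from dataclasses import dataclass, field
    import threading


    @dataclass
    class HostNUMAState:
        """Per-host view of pinnable CPUs, keyed by NUMA cell (hypothetical)."""
        free_cpus_by_cell: dict = field(default_factory=dict)  # cell id -> set of pCPUs
        lock: threading.Lock = field(default_factory=threading.Lock)


    def fit_instance(host: HostNUMAState, vcpus_needed: int):
        """Return (cell, cpus) if the instance's vCPUs fit in one cell, else None."""
        for cell, free in host.free_cpus_by_cell.items():
            if len(free) >= vcpus_needed:
                return cell, sorted(free)[:vcpus_needed]
        return None


    def migrate_with_claim(dest: HostNUMAState, vcpus_needed: int):
        """Approach A: claim pCPUs on the destination atomically with the fit check."""
        with dest.lock:
            placement = fit_instance(dest, vcpus_needed)
            if placement is None:
                raise RuntimeError("destination cannot fit instance NUMA topology")
            cell, cpus = placement
            dest.free_cpus_by_cell[cell] -= set(cpus)  # resources are now reserved
        return cell, cpus


    def migrate_without_claim(dest: HostNUMAState, vcpus_needed: int):
        """Approach B: compute a placement but reserve nothing; a concurrent
        migration can be handed the same pCPUs (the race discussed in this thread)."""
        placement = fit_instance(dest, vcpus_needed)
        if placement is None:
            raise RuntimeError("destination cannot fit instance NUMA topology")
        return placement


    if __name__ == "__main__":
        dest = HostNUMAState(free_cpus_by_cell={0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}})
        print(migrate_with_claim(dest, 2))     # (0, [0, 1]); cell 0 now has 2 free CPUs
        print(migrate_without_claim(dest, 3))  # placement computed but never reserved

The point is not the data structures but where the reservation happens: with a claim, the destination's free-CPU accounting is updated atomically with the fit check; without one, nothing stops a second migration from landing on the same cores before the first finishes.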


More information about the OpenStack-dev mailing list