[openstack-dev] [nova] Simple question about sorting CPU topologies
Mooney, Sean K
sean.k.mooney at intel.com
Tue Jun 20 17:21:50 UTC 2017
> -----Original Message-----
> From: Jay Pipes [mailto:jaypipes at gmail.com]
> Sent: Tuesday, June 20, 2017 5:59 PM
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Simple question
> about sorting CPU topologies
> On 06/20/2017 12:53 PM, Chris Friesen wrote:
> > On 06/20/2017 06:29 AM, Jay Pipes wrote:
> >> On 06/19/2017 10:45 PM, Zhenyu Zheng wrote:
> >>> Sorry, The mail sent accidentally by mis-typing ...
> >>> My question is, what is the benefit of the above preference?
> >> Hi Kevin!
> >> I believe the benefit is so that the compute node prefers CPU
> >> topologies that do not have hardware threads over CPU topologies
> >> that do include hardware threads.
[Mooney, Sean K] If you have not expressed that you want the require or isolate policy,
then you really can't infer which is better: for some workloads, preferring hyperthread
siblings will improve performance (2 threads sharing data via the L2 cache), and for others it will
reduce it (2 threads that do not share data).
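To make the default preference concrete: a minimal sketch (not nova's actual implementation; nova's real candidate class is nova.virt.hardware.VirtCPUTopology, and the names below are hypothetical) of ordering candidate guest topologies so that ones with fewer hardware threads per core come first:

```python
from collections import namedtuple

# Hypothetical stand-in for a guest CPU topology candidate.
Topology = namedtuple("Topology", ["sockets", "cores", "threads"])

def sort_preferring_fewer_threads(candidates):
    """Order candidates so topologies with fewer hardware threads
    per core sort first -- the default preference discussed above.
    With the 'isolate' policy, only threads == 1 candidates would
    be considered at all."""
    return sorted(candidates, key=lambda t: t.threads)

candidates = [Topology(1, 2, 2), Topology(1, 4, 1), Topology(2, 2, 1)]
ordered = sort_preferring_fewer_threads(candidates)
# The two single-threaded candidates sort ahead of the 2-thread one.
```

As the thread notes, this ordering is only a tie-breaking heuristic; whether it actually helps depends on whether the workload's threads share data.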
> >> I'm not sure exactly of the reason for this preference, but perhaps
> >> it is due to assumptions that on some hardware, threads will compete
> >> for the same cache resources as other siblings on a core whereas
> >> cores may have their own caches (again, on some specific hardware).
> > Isn't the definition of hardware threads basically the fact that the
> > sibling threads share the resources of a single core?
> > Are there architectures that OpenStack runs on where hardware threads
> > don't compete for cache/TLB/execution units? (And if there are, then
> > why are they called threads and not cores?)
[Mooney, Sean K] Well, on x86, when you turn on hyper-threading your L1 data and instruction cache is
partitioned in 2, with each half allocated to a thread sibling. The L2 cache, which is also per core, is shared
between the 2 thread siblings, so on Intel's x86 implementation the threads do not compete for L1 cache but do share L2.
That could easily change in new generations, though.
Pre-Zen, I believe AMD's architecture shared the floating point units between the SMT threads but had separate integer execution units that
were not shared. That meant that for integer-heavy workloads their SMT implementation approached 2x scaling, limited by the
shared load and store units, and dropped to no scaling if both threads tried to access the floating point execution units concurrently.
So it's not quite as clear cut as saying the threads do or don't share resources.
Each vendor addresses this differently; even within x86 you are not required to partition
the caches as Intel did, or the execution units. On other architectures I'm sure they have
come up with equally inventive ways to make this an interesting shade of grey when describing the difference
between a hardware thread and a full core.
> I've learned over the years not to make any assumptions about hardware.
> Thus my "not sure exactly" bet-hedging ;)
[Mooney, Sean K] Yep, hardware is weird and will always find ways to break your assumptions :)
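For anyone following along, the policies mentioned above are expressed per flavor via the standard hw:* extra specs; a sketch (the flavor name m1.pinned is hypothetical):

```shell
# Pin guest vCPUs to dedicated host cores and avoid thread siblings
# entirely -- the "isolate" policy discussed in this thread.
openstack flavor set m1.pinned \
  --property hw:cpu_policy=dedicated \
  --property hw:cpu_thread_policy=isolate

# Or, for a workload whose threads share data and benefit from a
# shared L2, require placement on thread siblings instead:
openstack flavor set m1.pinned \
  --property hw:cpu_thread_policy=require
```

Without one of these policies set, nova falls back to the default preference debated above.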
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-
> request at lists.openstack.org?subject:unsubscribe