[nova] implementation options for nova spec: show-server-numa-topology
yonglihe
yongli.he at intel.com
Fri Dec 21 08:15:34 UTC 2018
On 2018/12/21 9:28 AM, Matt Riedemann wrote:
> On 12/20/2018 7:13 PM, yonglihe wrote:
>> On 2018/12/20 10:47 PM, Matt Riedemann wrote:
>>> On 12/18/2018 2:20 AM, yonglihe wrote:
>>>>
>>>> Based on the IRC discussion, we have 3 options for how to deal with
>>>> those blobs:
>>>>
>>>> 1) include those directly in the server response details, like the
>>>> released POC does:
>>>> https://review.openstack.org/#/c/621476/3
>>>
>>> I would think these are potentially admin-level sensitive details as
>>> well and thus only exposed based on a policy check. A user requests
>>> a certain topology, but I'm not sure how low-level the user
>>> needs/should see what nova is doing for satisfying that topology,
>>> especially for things like pinning CPUs on the host. I thought the
>>> main use case for this spec (and several like these discussed at the
>>> PTG) was more about being able to get information (reporting) out of
>>> the REST API for debugging (by admins and/or support team members),
>>> less about user need.
>>>
>>>>
>>>> 2) add a new sub-resource endpoint to servers, most likely using the
>>>> keyword 'topology':
>>>> "GET /servers/{server_id}/topology" returns the NUMA information
>>>> for one server.
>>>
>>> Similar to (1) in that I'd think there would be a new policy check
>>> on this which defaults to admin-only. I think this would be better
>>> than (1) since it wouldn't be confused with listing servers (GET
>>> /servers/detail).
>>>
>>>>
>>>> 3) put the NUMA info under the existing 'diagnostics' API.
>>>> "GET /servers/{server_id}/diagnostics"
>>>> This is an admin-only API, so normal users lose the ability to check
>>>> their topology.
>>>
>>> By default it's an admin-only API, but that is configurable in
>>> policy, so if a given cloud wants to expose this for admin or owner
>>> of the instance, they can do that, or alternatively expose it to
>>> support team members via a special role in keystone.
>>>
>>>>
>>>> When the information is put into diagnostics, it will look like:
>>>> {
>>>>     ....
>>>>     "numa_topology": {
>>>>         "cells": [
>>>>             {
>>>>                 "numa_node": 3,
>>>>                 "cpu_pinning": {"0": 5, "1": 6},
>>>>                 "cpu_thread_policy": "prefer",
>>>>                 "cpuset": [0, 1, 2, 3],
>>>>                 "siblings": [[0, 1], [2, 3]],
>>>>                 "mem": 1024,
>>>>                 "pagesize": 4096,
>>>>                 "sockets": 0,
>>>>                 "cores": 2,
>>>>                 "threads": 2
>>>>             },
>>>>             ...
>>>>         ]
>>>>     },
>>>>     "emulator_threads_policy": "share",
>>>>
>>>>     "pci_devices": [
>>>>         {
>>>>             "address": "00:1a.0",
>>>>             "type": "VF",
>>>>             "vendor": "8086",
>>>>             "product": "1526"
>>>>         }
>>>>     ]
>>>> }
>>>
>>> I tend to prefer option (3) since it seems topology is a much more
>>> natural fit with the existing information (low-level CPU/RAM/disk
>>> usage) we expose out of the diagnostics API and is already
>>> restricted to admins by default in policy (but again as noted this
>>> can be configured).
>>>
>> Matt, thanks for pointing this out. (3) is clearer and involves less
>> configuration mess, so (3) wins; the spec is going to be revised.
>
> I also commented in the spec today. I would also be OK(ish) with
> option 2. I'm mostly concerned about the performance implications of
> needing to fetch and process this data, including policy checks, when
> listing 1000 servers with details. The spec wasn't clear (to me) about
> where the data comes from exactly (do we join on the compute_nodes
> table?). I'm also unsure about how much end users need to see the
> NUMA/PCI information for their server (so is the admin-only policy
> sufficient for the diagnostics API?). I'd really like input from
> others here. I mostly just want users to have to opt in to getting
> this information, not nova needing to produce it in the main server
> resource response during show/list, so option 2 or 3 are preferable
> *to me*.
>
> I think option 3 is the safest one if we're unsure or deadlocked
> otherwise, but no one else has really said anything (outside of the
> spec anyway).
Spec patch set 10 addresses all your comments, and thanks a lot for
catching all the typos.
Spec patch set 11 switches to implementation option 2, because the data
does come from another DB query, which obviously impacts performance
somewhat, especially on batch operations.
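
To illustrate the batch concern, here is a minimal Python sketch. It is
not nova code: FakeTopologyStore and the three helper functions are
hypothetical names made up for this example, and the store only counts
simulated lookups. It assumes one extra lookup per server when the
topology is embedded in the list response; a real implementation might
batch those lookups.

```python
# Hypothetical in-memory stand-in for the extra DB query; not nova code.
class FakeTopologyStore:
    def __init__(self, topologies):
        self._topologies = topologies
        self.queries = 0  # counts simulated DB round trips

    def get(self, server_id):
        self.queries += 1
        return self._topologies.get(server_id)


def list_servers_with_topology(server_ids, store):
    """Option 1: embed the topology in every entry of the detailed list."""
    return [{"id": sid, "numa_topology": store.get(sid)} for sid in server_ids]


def list_servers(server_ids):
    """Option 2: the list response carries no topology at all."""
    return [{"id": sid} for sid in server_ids]


def show_topology(server_id, store):
    """Option 2: GET /servers/{server_id}/topology, one lookup on demand."""
    return store.get(server_id)


server_ids = [f"server-{i}" for i in range(1000)]
topologies = {
    sid: {"cells": [{"numa_node": 0, "cpuset": [0, 1]}]} for sid in server_ids
}

# Option 1: listing 1000 servers with details triggers 1000 lookups.
store_opt1 = FakeTopologyStore(topologies)
list_servers_with_topology(server_ids, store_opt1)

# Option 2: listing triggers none; only the explicit request costs a lookup.
store_opt2 = FakeTopologyStore(topologies)
list_servers(server_ids)
show_topology("server-42", store_opt2)

print(store_opt1.queries, store_opt2.queries)  # prints "1000 1"
```

With the sub-resource, the cost is paid only by callers who explicitly
ask for the topology of one server.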
thanks.
Regards
Yongli he