[Openstack-operators] Fwd: [openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)

Melvin Hillsman mrhillsman at gmail.com
Fri May 19 02:54:47 UTC 2017


---------- Forwarded message ----------
From: Matt Riedemann <mriedemos at gmail.com>
Date: Thu, May 18, 2017 at 7:55 PM
Subject: [openstack-dev] [nova] Boston Forum session recap - claims in the
scheduler (or conductor)
To: openstack-dev at lists.openstack.org


The etherpad for this session is here [1]. The goal for this session was to
inform operators of, and get feedback on, the plan for moving claims from the
computes to the control layer (scheduler or conductor).

We mostly talked about retries, which also came up in the cells v2 session
that Dan Smith led [2] and which will be recapped separately.

Without getting into too many details, in the cells v2 session we came to a
compromise on build retries: we can pass hosts down to the cell so that the
cell-level conductor can retry if needed (even though we expect that doing
claims at the top will remove most of the reasons for a reschedule in the
first place).

During the claims-in-the-scheduler session, a new wrinkle came up: the hosts
that the scheduler returns to the top-level conductor may be in different
cells. For example, with two cells A and B, hosts x and y in cell A and host
z in cell B, we can't send z to cell A for retries, or x or y to cell B. So
we need some kind of post-filter/weigher step that groups the selected hosts
by cell, so that each cell only receives hosts it can actually use for
retries.
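
To make that grouping concrete, here is a minimal, illustrative sketch in
plain Python (not actual nova code) of that kind of post-filter step. It
assumes each selected host carries a cell identifier; the Host type and
attribute names are made up for the example:

    from collections import defaultdict, namedtuple

    # Illustrative stand-in for a scheduler selection; not the nova data model.
    Host = namedtuple('Host', ['hostname', 'cell_uuid'])

    def group_hosts_by_cell(selected_hosts):
        """Group scheduler-selected hosts by cell so the alternates handed
        to a cell-level conductor are all usable within that cell."""
        by_cell = defaultdict(list)
        for host in selected_hosts:
            by_cell[host.cell_uuid].append(host)
        return dict(by_cell)

    # Hosts x and y live in cell A, host z in cell B.
    hosts = [Host('x', 'cell-A'), Host('y', 'cell-A'), Host('z', 'cell-B')]
    print(group_hosts_by_cell(hosts))
    # If the primary target is x, the only valid retry alternate to pass
    # down to cell A's conductor is y; z must never be sent to cell A.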

There was also some side discussion about whether we somehow regressed
pack-first strategies by using Placement in Ocata. John Garbutt and Dan
Smith have the context on this (I think), so I'm hoping they can clarify
whether we really need to fix something in Ocata at this point, or whether
this is more a case of closing a loophole.

We also spent a good chunk of the session talking about the overhead
calculations for memory_mb and disk_gb, which happen in the compute service
on a per-hypervisor basis. In the absence of automated ways to adjust for
overhead, our solution for now is that operators can adjust the reserved
host resource values (vcpus, memory, disk) via config options and be as
conservative or aggressive as they see fit. Chris Dent and I also noted
that you can adjust those reserved values via the placement REST API, but
they will be overridden by the config in a periodic task - which may be a
bug, and is at the least a surprise to an operator.
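
For illustration, this is roughly what the conservative approach looks like
in nova.conf on a compute node. reserved_host_memory_mb and
reserved_host_disk_mb are the existing options; the numbers are placeholders
rather than recommendations, and whether a vCPU equivalent is available
depends on your release, so check the config reference:

    [DEFAULT]
    # Memory (MB) and disk (MB) held back from scheduling to cover
    # per-hypervisor overhead; tune to what you actually observe.
    reserved_host_memory_mb = 4096
    reserved_host_disk_mb = 10240

These reserved amounts are what show up as the "reserved" field on each
inventory in placement, which is why a value set by hand through the
placement REST API gets clobbered when the compute's periodic resource
update reapplies the config, as noted above.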

We didn't really get into this during the forum session, but there are
different opinions within the nova dev team on how to do claims in the
controller services (conductor vs scheduler). Sylvain Bauza has a series
which uses the conductor service, and Ed Leafe has a series using the
scheduler. More on that in the mailing list [3].

Next steps are to weigh Sylvain's and Ed's options, pick a path, and move
forward, since we don't have a lot of time to sit on the fence if we're
going to get this done in Pike.

As a side request, it would be great if companies with teams doing
performance and scale testing could help out and compare before (Ocata) and
after (Pike with claims in the controller) results. We eventually want to
deprecate the caching scheduler, but it currently outperforms the filter
scheduler at scale because of the retries the filter scheduler incurs -
which, again, we expect doing claims at the top to fix.

[1] https://etherpad.openstack.org/p/BOS-forum-move-claims-from-compute-to-scheduler
[2] https://etherpad.openstack.org/p/BOS-forum-cellsv2-developer-community-coordination
[3] http://lists.openstack.org/pipermail/openstack-dev/2017-May/116949.html

-- 

Thanks,

Matt

-- 
Kind regards,

Melvin Hillsman
mrhillsman at gmail.com
mobile: (832) 264-2646

Learner | Ideation | Belief | Responsibility | Command