[Openstack-operators] [nova][blazar][scientific] advanced instance scheduling: reservations and preemption - Forum session

Blair Bethwaite blair.bethwaite at gmail.com
Mon May 1 19:39:53 UTC 2017


Hi all,

Following up to the recent thread "[Openstack-operators] [scientific]
Resource reservation requirements (Blazar) - Forum session" and adding
openstack-dev.

This is now a confirmed forum session
(https://www.openstack.org/summit/boston-2017/summit-schedule/events/18781/advanced-instance-scheduling-reservations-and-preemption)
covering any advanced scheduling use-cases people want to talk about,
but focusing in particular on reservations and preemption, as these
are big priorities for scientific deployers.

Etherpad draft is
https://etherpad.openstack.org/p/BOS-forum-advanced-instance-scheduling,
please attend and contribute! In particular I'd appreciate background
spec and review links added to the etherpad.

Jay, would you be able and interested to moderate this from the Nova side?

Cheers,

On 12 April 2017 at 05:22, Jay Pipes <jaypipes at gmail.com> wrote:
> On 04/11/2017 02:08 PM, Pierre Riteau wrote:
>>>
>>> On 4 Apr 2017, at 22:23, Jay Pipes <jaypipes at gmail.com> wrote:
>>>
>>> On 04/04/2017 02:48 PM, Tim Bell wrote:
>>>>
>>>> Some combination of spot/OPIE
>>>
>>>
>>> What is OPIE?
>>
>>
>> Maybe I missed a message: I didn’t see any reply to Jay’s question about
>> OPIE.
>
>
> Thanks!
>
>> OPIE is the OpenStack Preemptible Instances
>> Extension: https://github.com/indigo-dc/opie
>> I am sure others on this list can provide more information.
>
>
> Got it.
>
>> I think running OPIE instances inside Blazar reservations would be
>> doable without many changes to the implementation.
>> We’ve talked about this idea several times; this forum session would
>> be an ideal place to draw up an implementation plan.
>
>
> I just looked through the OPIE source code. One thing I'm wondering is
> why the code for killing off pre-emptible instances lives in the
> filter_scheduler module.
>
> Why not have a separate service that simply responds to a NoValidHost
> exception raised from the scheduler with a call to go and terminate
> one or more instances that would have allowed the original request to
> land on a host?
>
> Right here is where OPIE goes and terminates pre-emptible instances:
>
> https://github.com/indigo-dc/opie/blob/master/opie/scheduler/filter_scheduler.py#L92-L100
>
> However, that code should actually be run when line 90 raises NoValidHost:
>
> https://github.com/indigo-dc/opie/blob/master/opie/scheduler/filter_scheduler.py#L90
>
> There would be no need at all for "detecting overcommit" here:
>
> https://github.com/indigo-dc/opie/blob/master/opie/scheduler/filter_scheduler.py#L96
>
> Simply detect a NoValidHost being returned to the conductor from the
> scheduler, check whether there are pre-emptible instances currently
> running that could be terminated, terminate them, and re-run the
> original call to select_destinations() (the scheduler call) just like
> a Retry operation normally does.
>
> There'd be no need whatsoever to involve any changes to the scheduler
> at all.
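As a rough sketch of the retry-based approach described in the quoted message above (all class and method names here are hypothetical, not actual Nova or OPIE APIs — the idea is a conductor-side retry loop rather than a scheduler change):

```python
# Hypothetical sketch of the retry-on-NoValidHost idea: catch the
# scheduling failure outside the scheduler, reclaim capacity by
# terminating pre-emptible instances, then retry select_destinations().
# None of these names are real Nova APIs.

class NoValidHost(Exception):
    """Raised when the scheduler finds no host for a request."""

def schedule_with_preemption(scheduler, request_spec, max_retries=2):
    """Retry scheduling after freeing pre-emptible capacity."""
    for attempt in range(max_retries + 1):
        try:
            return scheduler.select_destinations(request_spec)
        except NoValidHost:
            victims = scheduler.find_preemptible_instances(request_spec)
            if not victims or attempt == max_retries:
                raise
            for instance in victims:
                # Terminating frees resources for the retry below.
                scheduler.terminate_instance(instance)
```

The point being made above is that the scheduler itself stays untouched; only the caller (e.g. the conductor's retry logic) grows any preemption awareness.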
>
>>>> and Blazar would seem doable as long as the resource provider
>>>> reserves capacity appropriately (i.e. spot resources >> blazar
>>>> committed, along with no non-spot requests for the same aggregate).
>>>> Is this feasible?
>
>
> No. :)
>
> As mentioned in previous emails and on the etherpad here:
>
> https://etherpad.openstack.org/p/new-instance-reservation
>
> I am firmly against having the resource tracker or the placement API
> represent inventory or allocations with a temporal aspect to them (i.e.
> allocations in the future).
>
> A separate system (hopefully Blazar) is needed to manage the time-based
> associations to inventories of resources over a period in the future.
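For illustration, the kind of time-based accounting such a separate system has to do might look like this minimal sketch (purely hypothetical, not Blazar's actual implementation): given existing reservations against a fixed capacity, decide whether a new reservation over a future window can be honoured.

```python
# Hypothetical sketch of temporal capacity accounting kept *outside*
# of Nova/placement, as argued above. Illustrative only.
from dataclasses import dataclass

@dataclass
class Reservation:
    start: int   # e.g. hours since epoch
    end: int
    amount: int  # units of some resource (VCPUs, hosts, ...)

def fits(capacity, existing, new):
    """True if `new` never pushes total usage above capacity."""
    # Usage only changes at interval boundaries, so checking those
    # points within the new window is sufficient.
    points = {new.start} | {r.start for r in existing} | {r.end for r in existing}
    for t in points:
        if not (new.start <= t < new.end):
            continue
        used = sum(r.amount for r in existing if r.start <= t < r.end)
        if used + new.amount > capacity:
            return False
    return True
```

Nova's resource tracker and placement only ever answer "what is free *now*"; a check like `fits()` is inherently about the future, which is why it belongs in a separate service.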
>
> Best,
> -jay
>
>>> I'm not sure how the above is different from the constraints I mention
>>> below about having separate sets of resource providers for preemptible
>>> instances than for non-preemptible instances?
>>>
>>> Best,
>>> -jay
>>>
>>>> Tim
>>>>
>>>> On 04.04.17, 19:21, "Jay Pipes" <jaypipes at gmail.com> wrote:
>>>>
>>>>    On 04/03/2017 06:07 PM, Blair Bethwaite wrote:
>>>>    > Hi Jay,
>>>>    >
>>>>    > On 4 April 2017 at 00:20, Jay Pipes <jaypipes at gmail.com> wrote:
>>>>    >> However, implementing the above in any useful fashion requires
>>>>    >> that Blazar be placed *above* Nova and essentially that the
>>>>    >> cloud operator turns off access to Nova's POST /servers API
>>>>    >> call for regular users. Because if not, the information that
>>>>    >> Blazar acts upon can be simply circumvented by any user at any
>>>>    >> time.
>>>>    >
>>>>    > That's something of an oversimplification. A reservation system
>>>>    > outside of Nova could manipulate Nova host-aggregates to "cordon
>>>>    > off" infrastructure from on-demand access (I believe Blazar
>>>>    > already uses this approach), and it's not much of a jump to
>>>>    > imagine operators being able to twiddle the available reserved
>>>>    > capacity in a finite cloud so that reserved capacity can be
>>>>    > offered to the subset of users/projects that need (or perhaps
>>>>    > have paid for) it.
>>>>
>>>>    Sure, I'm following you up until here.
>>>>
>>>>    > Such a reservation system would even be able to backfill capacity
>>>>    > between reservations. At the end of the reservation the system
>>>>    > cleans-up any remaining instances and preps for the next
>>>>    > reservation.
>>>>
>>>>    By "backfill capacity between reservations", do you mean consume
>>>>    resources on the compute hosts that are "reserved" by this paying
>>>>    customer at some date in the future? i.e. Spot instances that can be
>>>>    killed off as necessary by the reservation system to free resources
>>>> to
>>>>    meet its reservation schedule?
>>>>
>>>>    > There are a couple of problems with putting this outside of
>>>>    > Nova though. The main issue is that pre-emptible/spot type
>>>>    > instances can't be accommodated within the on-demand cloud
>>>>    > capacity.
>>>>
>>>>    Correct. The reservation system needs complete control over a
>>>>    subset of resource providers to be used for these spot instances.
>>>>    It would be like a hotel reservation system being used for a motel
>>>>    where cars could simply pull up to a room with a vacant sign
>>>>    outside the door. The reservation system would never be able to
>>>>    work on accurate data unless some part of the motel's rooms were
>>>>    carved out for the reservation system to use, so that cars could
>>>>    not simply pull up and take them.
>>>>
>>>>    > You could have the reservation system implementing this
>>>>    > feature, but that would then put other scheduling constraints
>>>>    > on the cloud in order to be effective (e.g., there would need
>>>>    > to be automation changing the size of the on-demand capacity so
>>>>    > that the maximum pre-emptible capacity was always available).
>>>>    > The other issue (admittedly minor, but still a consideration)
>>>>    > is that it's another service - personally I'd love to see Nova
>>>>    > support these advanced use-cases directly.
>>>>
>>>>    Welcome to the world of microservices. :)
>>>>
>>>>    -jay
>>>>
>>>>    _______________________________________________
>>>>    OpenStack-operators mailing list
>>>>    OpenStack-operators at lists.openstack.org
>>>>    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>>
>>>>
>>>
>>
>>
>



-- 
Cheers,
~Blairo


