[Openstack-operators] [scientific] Resource reservation requirements (Blazar) - Forum session

Blair Bethwaite blair.bethwaite at gmail.com
Thu Apr 6 13:33:46 UTC 2017


Hi Jay,

On 5 April 2017 at 03:21, Jay Pipes <jaypipes at gmail.com> wrote:
> On 04/03/2017 06:07 PM, Blair Bethwaite wrote:
>> That's something of an oversimplification. A reservation system
>> outside of Nova could manipulate Nova host-aggregates to "cordon off"
>> infrastructure from on-demand access (I believe Blazar already uses
>> this approach), and it's not much of a jump to imagine operators being
>> able to twiddle the available reserved capacity in a finite cloud so
>> that reserved capacity can be offered to the subset of users/projects
>> that need (or perhaps have paid for) it.
>
>
> Sure, I'm following you up until here.
>
>> Such a reservation system would even be able to backfill capacity
>> between reservations. At the end of the reservation the system
>> cleans-up any remaining instances and preps for the next
>> reservation.
>
>
> By "backfill capacity between reservations", do you mean consume resources
> on the compute hosts that are "reserved" by this paying customer at some
> date in the future? i.e. Spot instances that can be killed off as necessary
> by the reservation system to free resources to meet its reservation
> schedule?

That is one possible use-case, but it could also backfill with other
reservations that do not overlap. This is a common feature of HPC job
schedulers that have to deal with the competing needs of large
parallel jobs (single users with temporal workload constraints) and
many small jobs (many users with throughput needs).
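
To make the backfill idea concrete, here is a minimal sketch for a
single pool of hosts. It is not Blazar or Nova code; the Reservation
type, the try_backfill helper and the use of integer hours are all
assumptions made up for illustration. A new reservation is accepted
only if it does not overlap anything already on the schedule, so it
can land in the gap between two large reservations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    # Hypothetical record: times are hours from now, end is exclusive.
    name: str
    start: int
    end: int


def overlaps(a, b):
    """True if the two half-open intervals [start, end) intersect."""
    return a.start < b.end and b.start < a.end


def try_backfill(schedule, candidate):
    """Add candidate to schedule if it fits in a gap; return True on success."""
    if any(overlaps(candidate, r) for r in schedule):
        return False
    schedule.append(candidate)
    schedule.sort(key=lambda r: r.start)
    return True


schedule = [Reservation("big-a", 0, 4), Reservation("big-b", 10, 14)]
fits = try_backfill(schedule, Reservation("small", 4, 10))
clashes = try_backfill(schedule, Reservation("late", 9, 11))
```

A real scheduler would also track how much capacity each reservation
needs rather than treating the pool as all-or-nothing.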

>> There are a couple of problems with putting this outside of Nova though.
>> The main issue is that pre-emptible/spot type instances can't be
>> accommodated within the on-demand cloud capacity.
>
>
> Correct. The reservation system needs complete control over a subset of
> resource providers to be used for these spot instances. It would be like a
> hotel reservation system being used for a motel where cars could simply pull
> up to a room with a vacant sign outside the door. The reservation system
> would never be able to work on accurate data unless some part of the motel's
> rooms were carved out for the reservation system to use and cars to not pull up
> and take.

In order to make reservations, yes. However, preemptible instances are
a valid use-case without also assuming reservations (they just happen
to complement each other). If we want the system to be really useful
and flexible we should be considering leases and queuing, e.g.:

- Leases requiring a single VM or groups of VMs that must run in parallel.
- Best-effort leases, which will wait in a queue until resources
become available.
- Advance reservation leases, which must start at a specific time.
- Immediate leases, which must start right now, or not at all.

The above bullets are pulled from
http://haizea.cs.uchicago.edu/whatis.html (Haizea is a scheduling
framework that can plug into OpenNebula), and I believe these fit very
well with the scheduling needs of the majority of private & hybrid
clouds. It also has other notable features such as preemptible leases.
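
As a rough sketch of how those lease types differ at admission time
(this is not Haizea's or Blazar's API; LeaseKind, Lease and decide are
hypothetical names, and capacity is reduced to a single count of free
VM slots):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LeaseKind(Enum):
    BEST_EFFORT = "best-effort"
    ADVANCE = "advance-reservation"
    IMMEDIATE = "immediate"


@dataclass
class Lease:
    kind: LeaseKind
    vms: int                        # VMs that must run in parallel (1 = single VM)
    start_at: Optional[int] = None  # requested start time, ADVANCE leases only


def decide(lease, free_vms, now):
    """Return 'start', 'queue', 'schedule' or 'reject' for a new lease."""
    if lease.kind is LeaseKind.ADVANCE:
        # Simplification: a real system would check future capacity here.
        if lease.start_at is None or lease.start_at < now:
            return "reject"
        return "schedule"
    if lease.vms <= free_vms:
        return "start"
    # Not enough room right now: best-effort waits, immediate fails.
    return "queue" if lease.kind is LeaseKind.BEST_EFFORT else "reject"


print(decide(Lease(LeaseKind.BEST_EFFORT, vms=4), free_vms=2, now=0))
print(decide(Lease(LeaseKind.IMMEDIATE, vms=4), free_vms=2, now=0))
```

The point of the sketch is only that the same shortage of capacity
leads to a different outcome depending on what the user asked for.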

I remain perplexed by the fact that OpenStack, as the preeminent open
private cloud framework, still only deals in on-demand access, as
though most cloud deployments had infinite capacity. Today users have
to keep polling the boot API until they get something: "not now... not
now... not now..." - no queuing, no fair-share, nothing. Users should
only ever see NoValidHost if they requested "an instance now or not at
all".
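
For illustration only, the behaviour I have in mind looks something
like the toy model below. Nothing here is the Nova API: QueuedBooter
and the now_or_never flag are invented for the example, and the
NoValidHost class is a local stand-in for the real error. Requests
wait in a FIFO queue when the cloud is full, and only a request that
explicitly asked for "now or not at all" gets the error.

```python
from collections import deque


class NoValidHost(Exception):
    """Local stand-in for the scheduler's 'no capacity' error."""


class QueuedBooter:
    def __init__(self, free_slots):
        self.free_slots = free_slots
        self.waiting = deque()   # FIFO queue of pending instance names
        self.running = []

    def request(self, name, now_or_never=False):
        if self.free_slots > 0:
            self.free_slots -= 1
            self.running.append(name)
            return "running"
        if now_or_never:
            raise NoValidHost(name)
        self.waiting.append(name)
        return "queued"

    def release(self, name):
        """Free a slot and hand it straight to the oldest waiter, if any."""
        self.running.remove(name)
        if self.waiting:
            self.running.append(self.waiting.popleft())
        else:
            self.free_slots += 1


cloud = QueuedBooter(free_slots=1)
first = cloud.request("a")
second = cloud.request("b")
cloud.release("a")
```

A fair-share policy would replace the plain FIFO with per-project
ordering, but the user-facing contract would stay the same.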

I do not mean to ignore the existence of Blazar here, but development
on that has only recently started up again, and part of the challenge
for Blazar is that resource leases, even leases of simple whole compute
nodes, don't seem to have ever been well supported in Nova.

-- 
Cheers,
~Blairo


