[openstack-dev] [TripleO] Generalising racks :- modelling a datacentre

Robert Collins robertc at robertcollins.net
Wed Sep 25 03:15:59 UTC 2013


One of the major things Tuskar does is model a datacenter - which is
very useful for error correlation, capacity planning and scheduling.

Long term I'd like this to be held somewhere where it is accessible
for schedulers and ceilometer etc. E.g. network topology + switch
information might be held by neutron where schedulers can rely on it
being available, or possibly held by a unified topology db with
scheduler glued into that, but updated by neutron / nova / cinder.
Obviously this is a) non-trivial and b) not designed yet.

However, the design of Tuskar today needs to accommodate a few things:
 - multiple reference architectures for clouds (unless there really is
one true design)
 - the fact that today we don't have such an integrated vertical scheduler.

So the current Tuskar model has three constructs that tie together to
model the DC (sketched briefly after the list):
 - nodes
 - resource classes (grouping different types of nodes into service
offerings - e.g. nodes that offer swift, or those that offer nova).
 - 'racks'
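
For concreteness, those three tie together roughly like this (a
sketch only, with illustrative names - not Tuskar's actual schema):

    # Illustrative only - not Tuskar's real data model.
    class Node(object):
        """A single machine registered with the undercloud."""
        def __init__(self, uuid):
            self.uuid = uuid

    class ResourceClass(object):
        """Groups nodes into a service offering, e.g. 'swift' or 'nova'."""
        def __init__(self, service, nodes):
            self.service = service
            self.nodes = list(nodes)

    class Rack(object):
        """Originally a physical rack; now a 'logical' grouping of nodes."""
        def __init__(self, name, nodes):
            self.name = name
            self.nodes = list(nodes)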

AIUI the initial concept of Rack was to map to a physical rack, but
this rapidly shifted to 'Logical Rack' rather than physical rack; I
think of Rack as really just a special case of a more general
modelling problem.

From a deployment perspective, if you have two disconnected
infrastructures, that's two AZs and two underclouds: so we know that
any one undercloud is fully connected (possibly multiple subnets, but
one infrastructure). When would we want to subdivide that?

One case is quick fault aggregation: if a physical rack loses power,
rather than having 16 NOC folk independently investigating the same 16
down hypervisors, one would prefer to identify that the power to the
rack has failed (for non-HA powered racks); likewise if a single
switch fails (for non-HA network topologies) you want to identify that
that switch is down rather than investigating all the cascaded errors
independently.

A second case is scheduling: you may want to put nova instances on the
same switch as the cinder service delivering their block devices, when
possible, or split VMs serving HA tasks apart. (We currently do this
with host aggregates, but being able to do it directly would be much
nicer).
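
As a sketch of today's workaround: tag the hypervisors hanging off
one switch with a host aggregate and steer instances there via a
flavor extra spec. The names below are made up, and it assumes the
AggregateInstanceExtraSpecsFilter is enabled in the nova scheduler:

    # Sketch of the host-aggregate workaround; names are illustrative.
    from novaclient.v1_1 import client

    nova = client.Client('admin', 'password', 'admin',
                         'http://undercloud.example.com:5000/v2.0')

    # Group the hypervisors hanging off switch-23 into an aggregate.
    agg = nova.aggregates.create('switch-23', None)
    for host in ('compute-01', 'compute-02'):
        nova.aggregates.add_host(agg.id, host)
    nova.aggregates.set_metadata(agg.id, {'switch': 'switch-23'})

    # A flavor keyed on that metadata lands its instances on those hosts.
    flavor = nova.flavors.find(name='m1.switch23')
    flavor.set_keys({'aggregate_instance_extra_specs:switch': 'switch-23'})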

Lastly, if doing physical operations like power maintenance or moving
racks around in a datacentre, being able to identify machines in the
same rack can be super useful for planning, downtime announcements, or
host evacuation, and being able to find a specific machine in a DC is
also important (e.g. what shelf in the rack, what cartridge in a
chassis).

Back to 'Logical Rack' - you can see then that having a single
construct to group machines together doesn't really support these use
cases in a systematic fashion: physical rack modelling supports only a
subset of the location/performance/failure use cases, and Logical Rack
doesn't support them at all. We're missing all the rich data we need
to aggregate faults rapidly - power, network, air conditioning - and
these things span everything from a single machine to groups of
machines, racks and rows of racks (consider a networked PDU with 10
hosts on it - that's a fraction of a rack).

So, what I'm suggesting is that we model the failure and performance
domains directly, and include location (which is the incremental data
racks add once failure and performance domains are modelled) too. We
can separately noodle on exactly what failure domain and performance
domain modelling looks like - e.g. the scheduler focus group would be
a good place to have that discussion.

E.g. for any node I should be able to ask (see the sketch after this
list):
- what failure domains is this in? [e.g. power-45, switch-23, ac-15,
az-3, region-1]
- what locality-of-reference features does this have? [e.g. switch-23,
az-3, region-1]
- where is it? [e.g. DC 2, pod 4, enclosure 2, row 5, rack 3, RU 30,
cartridge 40].
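
A minimal sketch of what that might look like as a data model (plain
strings standing in for richer domain objects; illustrative, not a
proposed schema):

    # Illustrative only: strings stand in for richer domain objects.
    class Node(object):
        def __init__(self, uuid, failure_domains, locality_domains, location):
            self.uuid = uuid
            # Domains whose failure takes this node down, e.g.
            # {'power-45', 'switch-23', 'ac-15', 'az-3', 'region-1'}.
            self.failure_domains = set(failure_domains)
            # Domains implying locality of reference for scheduling, e.g.
            # {'switch-23', 'az-3', 'region-1'}.
            self.locality_domains = set(locality_domains)
            # Physical location, most general first, e.g. ('dc-2', 'pod-4',
            # 'enclosure-2', 'row-5', 'rack-3', 'ru-30', 'cartridge-40').
            self.location = tuple(location)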

And then we should be able to slice and dice the DC easily by these
aspects (again, sketched in code after the list):
- location: what machines are in DC 2, or DC 2 pod 4
- performance: what machines are all in region-1, or az-3, or switch-23.
- failure: what failure domains do machines X and Y have in common?
- failure: if we power off switch-23, what machines will be impacted?
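
Continuing the sketch above, those queries fall out as simple filters
over an inventory (again illustrative only):

    # Illustrative only: slice-and-dice queries over the sketched Node.
    class Inventory(object):
        def __init__(self, nodes):
            self.nodes = list(nodes)

        def in_location(self, *prefix):
            """Location: machines under a prefix, e.g. ('dc-2', 'pod-4')."""
            return [n for n in self.nodes if n.location[:len(prefix)] == prefix]

        def in_domain(self, domain):
            """Performance: machines sharing a locality domain, e.g. 'switch-23'."""
            return [n for n in self.nodes if domain in n.locality_domains]

        def common_failure_domains(self, x, y):
            """Failure: domains machines x and y have in common."""
            return x.failure_domains & y.failure_domains

        def impacted_by(self, domain):
            """Failure: machines impacted if e.g. 'switch-23' is powered off."""
            return [n for n in self.nodes if domain in n.failure_domains]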

So, what do you think?

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


