[openstack-dev] [tripleo] Scaling of TripleO

James Slagle james.slagle at gmail.com
Sun Sep 8 00:20:08 UTC 2013


On Fri, Sep 6, 2013 at 4:49 PM, Clint Byrum <clint at fewbar.com> wrote:
> Excerpts from James Slagle's message of 2013-09-06 10:27:32 -0700:
>> Idea 0
>> ------
>>  * For full HA, you can add additional nodes that don't share single points of
>>    failure (like the bus).
>
> I'm not sure I agree that the bus is a SPOF. Both qpid and rabbit+kombu can
> operate in HA active/active mode, so why would those be SPOFs? Certainly
> not a _likely_ SPOF.

I meant that if you were just adding additional nodes for scalability, that
doesn't necessarily imply that you're also eliminating SPOFs.  I agree that if
you add additional nodes and configure the bus to be active/active, that
would eliminate the SPOF.
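
For example (a rough sketch from memory; exact option names may vary by
version), active/active rabbit typically means mirroring the queues across the
brokers and pointing every service at all of them:

    # on one of the rabbit nodes: mirror all queues across the cluster
    rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'

    # in nova.conf (and the other services' configs) on each node
    rabbit_hosts = broker-0:5672,broker-1:5672
    rabbit_ha_queues = True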

This is along the lines of what I meant by "HA Management Node" in general.
You can add nodes for scalability, such as a bunch more Nova Compute nodes,
that don't have the message broker installed on them.  Or, you can add nodes
for HA (and scalability too) by deploying additional Nova Compute nodes that
have the broker installed and configured in active/active.  At least I think
you could anyway :)

The line between scalability and HA is kind of blurred.  But as I see it,
adding nodes for scalability gets you some HA, but doesn't necessarily
eliminate SPOFs, which is important for true HA.

>
>>  * The green lines are meant to show the management network domain, and can be
>>    thought of roughly as "managed by".
>>  * Logical Rack is just meant to imply "a grouping of baremetal hardware".  It
>>    might be a physical rack, but it doesn't have to be.
>>  * Just of note, there's a box there representing where we feel Tuskar would
>>    get plugged in.
>>
>> Pros/Cons (+/-):
>> + Easy to install (You start with only one machine in the datacenter
>>   running the whole stack of services in HA mode, and from there you can
>>   just expand it to another machine, enroll the rest of the machines in
>>   the datacenter, and you're ready to go.)
>> + Easy to upgrade (Since we have full HA, you could then turn off one
>>   machine in the control plane triggering an HA failover, update that
>>   machine, bring it up, turn off another machine in the control plane, etc.)
>> - Every node in the overcloud has to be able to talk back to controller rack
>>   (e.g. heat/nova)
>
> Note that this is just OpenStack's architecture. Heat and Nova both
> have separated their API from their smarts to make it very straight
> forward to isolate tenant access from deeper resources. So each node
> just needs access to nova and heat API endpoints, and both in very
> limited, predictable capacities. I think that mitigates this to a very
> minor concern.

The thinking here was that it could be desirable for the nodes in a logical
rack to need network connectivity back to only a single Nova Compute node, as
opposed to the entire set of Undercloud services.

>
>> - Possible performance issues when bringing up a large number of machines
>>   (think hyperscale).
>
> This is perhaps the largest concern, but is why we've always suggested
> that eventually the scaled out compute boxes will work better with some
> hardware affinity.
>
>> - Large failure domain.  If the HA cluster fails, you've lost all visibility
>>   into and management of the infrastructure.
>
> The point of HA is to make the impact and frequency of these failures
> very small. So this one is also mitigated by "doing HA well".

Agreed.  The point of listing this as a con for this Idea was just that the
other ideas had some amount of HA built into them due to their distributed
nature.

>
>> - What does the IPMI network look like in this model?  Can we assume full IPMI
>>   connectivity across racks, logical or physical?
>>
>
> Undercloud compute needs to be able to access IPMI. The current nova
> baremetal requires assigning specific hardware to specific compute nodes,
> so each rack can easily get its own compute node.

Having a compute node per rack was where we were going with the next idea.  So
in Idea 0, if you wanted to put a management node in a rack, it would have to
run all the Undercloud services.  In Idea 1, it could be the Leaf Node... just
compute and the other minimum required services.

It doesn't sound like that would have to be a requirement though, as it seems
possible to have IPMI connectivity across racks (based on Robert's reply, and
what other reading I've done so far).
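
As a concrete example of the assignment Clint describes (arguments
approximate, from memory), each baremetal node is enrolled against a specific
compute service host:

    # enroll hardware against the compute service running in its rack
    nova baremetal-node-create --pm_address=10.1.0.5 --pm_user=admin \
        --pm_password=secret rack1-compute-0 8 16384 500 aa:bb:cc:dd:ee:ff

So "a compute node per rack" mostly comes down to which service host the
hardware gets enrolled against.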


>> Idea 1
>> ------
>> The thought here is to have 1 Undercloud again, but be able to deploy N
>> Undercloud Leaf Nodes as needed for scale.  The Leaf Node is a smaller subset
>> of services than what is needed on the full Undercloud Node.  Essentially, it
>> is enough services to do baremetal provisioning, Heat orchestration, and
>> Neutron for networking.  Diagram of this idea is at [2].  In the diagram, there
>> is one Leaf Node per logical rack.
>>
>
> I think this is very close to the near-term evolution I've been thinking
> about for TripleO. We want to get good at deploying a simple architecture
> first, but then we know we don't need to be putting the heat engines,
> nova schedulers, etc, on every scale-out box in the undercloud.

That's good to know :).

>
>> In this model, the Undercloud provisions and deploys Leaf Nodes as needed when
>> new hardware is added to the environment.  The Leaf Nodes then handle
>> deployment requests from the Undercloud for the Overcloud nodes.
>>
>> As such, there is some scalability built into the architecture in a distributed
>> fashion.  Adding more scalability and HA would be accomplished in a similar
>> fashion to Idea 0, by adding additional HA Leaf Nodes, etc.
>>
>> Pros/Cons (+/-):
>> + As scale is added with more Leaf Nodes, each addition is a smaller set of
>>   services.
>> - Additional image management of the Leaf Node image
>
> I think if you've accepted image management for 3 images (undercloud,
> overcloud-control, overcloud-compute), adding one more is not such a
> daunting thing. The benefit is that there is less software running that
> may break your services.

+1.  I just wanted to make sure it was listed as something additional to do for
this Idea.
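
For what it's worth, I'd expect the extra image to be just one more
disk-image-create invocation next to the existing three, something like (the
element list here is illustrative, not a tested set):

    disk-image-create -o undercloud-leaf ubuntu \
        nova-baremetal heat-api neutron-network-node

versus the full undercloud image that also carries keystone, glance, the
scheduler, and so on.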

>
>> - Additional rack space wasted for the Leaf Node
>
> This is unavoidable for scale-out IMO. There is certainly a scenario where
> we can convert some of these to overcloud resources after an initial data center
> bring-up, so that also mitigates the impact.
>
>> + Smaller failure domain as the logical rack is only dependent on the Leaf
>>   Node.
>> + The ratio of HA Management Nodes would be smaller because of the offloaded
>>   services.
>
> I'm not sure I follow what an "HA Management Node" is.

I tried to explain it above a bit.  Basically, you can add a node for
scalability, but that doesn't necessarily eliminate SPOFs unless you're aiming
for HA specifically and configure the node that way.

So, in this Idea, I think a user is encouraged (so to speak) to deploy Leaf
Nodes for Logical Racks.  Deploying Leaf Nodes should be cheaper and easier
than deploying something with all the Undercloud services.  You would be
inherently adding scale to the architecture as you deploy it.  And hopefully,
that would mean fewer nodes to add further down the road for only
scalability/HA reasons.
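
In Heat terms, I'd picture the Undercloud holding a template where each
Logical Rack contributes one Leaf Node resource, very roughly (a CFN-style
sketch, untested):

    LeafNode0:
      Type: AWS::EC2::Instance
      Properties:
        ImageId: {Ref: LeafNodeImageId}
        InstanceType: baremetal
        AvailabilityZone: rack0

Scaling out would then just mean adding another LeafNodeN rather than
standing up more full Undercloud services.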

>> Idea 2
>> ------
>> Pros/Cons (+/-):
>> + network/security isolation
>> - multiple Undercloud complexity
>
> This is probably the main reason I am skeptical at this idea. We
> shouldn't have to make a whole new cloud/region/etc. just to scale what
> is essentially a homogeneous service. It adds management complexity,
> and complexity is far worse than a small amount of image management
> (which seems to be main difference between 1 and 2).
>
> All great ideas, thanks for sharing!

No problem, appreciate the time it took to read through it and reply :).

-- 
-- James Slagle
--


