[openstack-dev] [Fuel] Getting rid of cluster status

Vladimir Kuklin vkuklin at mirantis.com
Wed Mar 16 10:53:04 UTC 2016


Folks

While I generally support the idea of getting rid of cluster status, this
requires thorough design. My opinion is that we should leave it as a
function of node state until we come up with a better way of calculating
cluster status. Nevertheless, it is true that cluster status is actually a
function of other primary data and should be calculated on the client side.
I suggest that we move towards a more fine-grained, component-based
architecture (the simplest example is OpenStack Fuel vs non-OpenStack Fuel)
and figure out a way of calculating each component's status. The cluster
status should then be an aggregate of those component statuses. For example,
we could say that the only components we have right now are nodes, and that
the aggregate is based on the nodes' statuses and whether they are critical
or not.
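
To make the aggregation idea concrete, here is a minimal sketch. It assumes
nodes are the only component, and that a failed critical node makes the
whole component an error while a failed non-critical node only degrades it.
The Status values, the Node fields and both function names are hypothetical
and are not part of Fuel or Nailgun.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical statuses, ordered by severity so max() picks the worst.
    READY = 0
    DEGRADED = 1
    ERROR = 2


@dataclass(frozen=True)
class Node:
    name: str
    ready: bool      # assumption: a single flag stands in for node state
    critical: bool   # e.g. a controller would be critical, a compute not


def nodes_component_status(nodes):
    """Status of the 'nodes' component, derived only from node data."""
    failed = [n for n in nodes if not n.ready]
    if any(n.critical for n in failed):
        return Status.ERROR
    return Status.DEGRADED if failed else Status.READY


def cluster_status(component_statuses):
    """Cluster status is the worst status among its components."""
    return max(component_statuses, default=Status.READY)


nodes = [Node("controller-1", True, True), Node("compute-1", False, False)]
print(cluster_status([nodes_component_status(nodes)]).name)
```

Nothing is stored here: the value is recomputed from primary data each time,
so a client could run the same calculation on its side.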

On Tue, Mar 15, 2016 at 9:16 PM, Andrew Woodward <xarses at gmail.com> wrote:

>
>
> On Tue, Mar 15, 2016 at 4:04 AM Roman Prykhodchenko <me at romcheg.me> wrote:
>
>> Fuelers,
>>
>> I would like to continue the series of "Getting rid of …" emails. This
>> time I’d like to talk about statuses of clusters.
>>
>> The issue with that attribute is that it is not really related to the real
>> world and represents nothing. A few months ago I proposed to make it more
>> real-world-like [1] by replacing a simple string with an aggregated value.
>> However, after task-based deployment was introduced, even that approach
>> lost its connection to the real world.
>>
>> My idea is to remove that attribute from a cluster and start working with
>> the status of every single node in it. However, we now only have tasks
>> that are executed on nodes, so we cannot apply the "status" term to them.
>> What if we replace it with a sort of boolean value called maintenance_mode
>> (or similar) that we will use to tell whether the node is operational or
>> not? After that we will be able to use an aggregated property for the
>> cluster and check whether any nodes are in the process of having tasks
>> performed on them.
>>
>
> Yes, we still need an operations attribute, though I'm not sure a bool is
> enough. But you are quite correct: setting the status of the cluster after
> operational == True based on a specific node failing is, in practice,
> invalid.
>
> At the same time, operational == True does not necessarily mean that the
> deployment succeeded; it's more along the lines of "deployment validated",
> which may mean that further testing such as OSTF has passed, or something
> more manual where the operator wants to do more testing of their own prior
> to changing the state.
>
> As we venture into the LCM flow, we actually need the status of each
> component in addition to the general status of the cluster to determine the
> proper course of action on the next operation.
>
> For example, nova-compute:
> If the cluster is not operational, then we can provision compute nodes and
> have them enabled, or active in the scheduler, automatically. However, if
> the cluster is operational, a new compute node must be disabled, or
> otherwise blocked from the default scheduler, until the node has been
> validated. In this case the interpretation of operational is quite simple.
>
> For example, ceph:
> Here we care less about the status of the cluster (slightly; this example
> ignores ceph's impact on nova-compute) and more about the status of the
> service. If we deploy ceph-osds when fewer OSD hosts than the replica
> factor (3) are online, then we can provision the OSDs similarly to
> nova-compute, in that we can bring them all online and active, and data
> could be placed on them immediately (more or less). But if the ceph status
> is operational, then we have to take a different action: the OSDs have to
> be brought in disabled and gradually (probably by the operator) have their
> data weight increased so they don't clog the network with data peering,
> which causes the clients many woes.
>
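
The two examples above follow the same rule, each against a different status:
nova-compute looks at the cluster, ceph-osd looks at the ceph service. A
rough sketch of that rule follows. The Onboarding enum, the STATUS_SOURCE
mapping and the onboarding_action function are all hypothetical names used
for illustration only, not existing Fuel code.

```python
from enum import Enum


class Onboarding(Enum):
    ENABLE_IMMEDIATELY = "enable immediately"
    ADD_DISABLED = "add disabled, enable after validation"


# Assumption: which "operational" flag each service consults.
STATUS_SOURCE = {
    "nova-compute": "cluster",  # general cluster status
    "ceph-osd": "ceph",         # status of the ceph service itself
}


def onboarding_action(service, operational):
    """Decide how a new node running `service` should be brought in.

    `operational` maps a status source ("cluster", "ceph") to a bool.
    """
    source = STATUS_SOURCE[service]
    if operational[source]:
        # Already in use: do not let the new node take load until validated.
        return Onboarding.ADD_DISABLED
    return Onboarding.ENABLE_IMMEDIATELY


state = {"cluster": True, "ceph": False}
print(onboarding_action("nova-compute", state).value)
print(onboarding_action("ceph-osd", state).value)
```

For ceph, the gradual increase of data weight after the OSD has been added
disabled is left to the operator, as described above.
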
>
>> Thoughts, ideas?
>>
>>
>> References:
>>
>> 1. https://blueprints.launchpad.net/fuel/+spec/complex-cluster-status
>>
>>
>> - romcheg
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> --
>
> Andrew Woodward
>
> Mirantis
>
> Fuel Community Ambassador
>
> Ceph Community
>
>
>


-- 
Yours Faithfully,
Vladimir Kuklin,
Fuel Library Tech Lead,
Mirantis, Inc.
+7 (495) 640-49-04
+7 (926) 702-39-68
Skype kuklinvv
35bk3, Vorontsovskaya Str.
Moscow, Russia,
www.mirantis.com <http://www.mirantis.ru/>
www.mirantis.ru
vkuklin at mirantis.com