[openstack-dev] [Fuel] Getting rid of cluster status
Bogdan Dobrelya
bdobrelia at mirantis.com
Wed Mar 16 11:19:43 UTC 2016
On 03/16/2016 11:53 AM, Vladimir Kuklin wrote:
> Folks
>
> As I generally support the idea of getting rid of cluster status, this
> requires thorough design. My opinion here is that we should leave it as
> a function of nodes state until we come up with a variant of better
> calculation of cluster status. Nevertheless it is true that cluster
> status is actually a function of other primary data and should be
> calculated on the client side. I suggest that we move towards more
> fine-grained component-based architecture (simplest example is OpenStack
> Fuel vs non-OpenStack Fuel) and figure out a way of calculating each
> component's status. Then we should calculate each component's status and
> then a cluster status should be an aggregate of those. For example, we
> could say that the only components we have right now are nodes and the
> aggregate is based on the nodes status and whether they are critical or not.
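The component-based aggregation described above can be sketched as follows. This is only an illustration of the idea; the status values and function names are hypothetical (Fuel's real node states live elsewhere, e.g. in nailgun's constants), and the "critical node" rule follows the example in the paragraph above.

```python
from enum import Enum

# Hypothetical status values for illustration only.
class Status(Enum):
    READY = 1
    DEPLOYING = 2
    ERROR = 3

def component_status(node_statuses, critical_nodes):
    """Derive one component's status from its nodes' statuses.

    node_statuses: dict of node name -> Status
    critical_nodes: set of node names whose failure fails the component
    """
    if any(name in critical_nodes and status is Status.ERROR
           for name, status in node_statuses.items()):
        return Status.ERROR
    if any(status is Status.DEPLOYING for status in node_statuses.values()):
        return Status.DEPLOYING
    return Status.READY

def cluster_status(components):
    """Aggregate: the cluster takes the worst of its components' statuses."""
    statuses = [component_status(nodes, critical) for nodes, critical in components]
    for worst in (Status.ERROR, Status.DEPLOYING):
        if worst in statuses:
            return worst
    return Status.READY
```

Note that a non-critical node in ERROR does not fail its component here, matching the "whether they are critical or not" rule; the aggregate stays client-side and is recomputed from primary data on demand.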
I believe the cluster status should be renamed to the deployment status.
It has nothing to do with the real *cluster* status, which can only be
figured out by LMA tools.
>
> On Tue, Mar 15, 2016 at 9:16 PM, Andrew Woodward <xarses at gmail.com
> <mailto:xarses at gmail.com>> wrote:
>
>
>
> On Tue, Mar 15, 2016 at 4:04 AM Roman Prykhodchenko <me at romcheg.me
> <mailto:me at romcheg.me>> wrote:
>
> Fuelers,
>
> I would like to continue the series of "Getting rid of …"
> emails. This time I’d like to talk about statuses of clusters.
>
> The issue with that attribute is that it is not actually
> related to the real world very much and represents nothing. A few
> months ago I proposed to make it more real-world-like [1] by
> replacing the simple string with an aggregated value. However, after
> task-based deployment was introduced, even that approach lost its
> connection to the real world.
>
> My idea is to get rid of that attribute on the cluster and start
> working with the status of every single node in it. Nevertheless, we
> now only have tasks that are executed on nodes, so we cannot
> apply the "status" term to them. What if we replace it with a
> sort of boolean value called maintenance_mode (or similar) that
> we will use to tell whether the node is operational or not? After
> that we will be able to use an aggregated property for the cluster
> and check whether there are any nodes that currently have
> tasks in progress on them.
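The proposal above, with no stored cluster status and everything derived client-side from per-node flags, could look roughly like this. The field names (maintenance_mode, tasks_in_progress) are hypothetical; the email only suggests "maintenance_mode (or similar)".

```python
# Each node is represented here as a plain dict of per-node flags.
def cluster_is_busy(nodes):
    """True if any node still has tasks running against it."""
    return any(node.get("tasks_in_progress", 0) > 0 for node in nodes)

def cluster_is_operational(nodes):
    """All nodes out of maintenance and no tasks in flight."""
    return not cluster_is_busy(nodes) and all(
        not node.get("maintenance_mode", False) for node in nodes)
```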
>
>
> Yes, we still need an operations attribute. I'm not sure a bool is
> enough, but you are quite correct: changing the status of the
> cluster, once operational == True, based on the result of a specific
> node failing is in practice invalid.
>
> At the same time, operational == True does not necessarily mean the
> deployment succeeded; it's more along the lines of "deployment
> validated", which may involve further tests passing (like OSTF), or a
> more manual process where the operator wants to do more testing of
> their own prior to changing the state.
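The distinction drawn above, deployment finishing versus the cluster becoming operational only after validation, can be modeled as a small state machine. The state and event names here are illustrative assumptions, not Fuel's actual states.

```python
# Hypothetical lifecycle: "deployed" is not yet "operational"; the
# transition requires validation (OSTF passing or operator sign-off).
TRANSITIONS = {
    ("deploying", "deploy_done"): "deployed",
    ("deployed", "ostf_passed"): "operational",
    ("deployed", "operator_signoff"): "operational",
}

def next_state(state, event):
    """Return the new state, or stay put on an inapplicable event."""
    return TRANSITIONS.get((state, event), state)
```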
>
> As we venture into the LCM flow, we actually need the status of each
> component, in addition to the general status of the cluster, to
> determine the proper course of action on the next operation.
>
> For example, nova-compute:
> if the cluster is not operational, then we can provision compute
> nodes and have them enabled, or active in the scheduler,
> automatically. However, if the cluster is operational, a new compute
> node must be disabled, or otherwise blocked from the default
> scheduler, until the node has received validation. In this case the
> interpretation of operational is quite simple.
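The nova-compute policy in that paragraph reduces to one decision: should a freshly provisioned compute start enabled in the scheduler? In a real deployment the disabling would go through the Nova API (service disable); this sketch only models the decision itself, and both parameter names are hypothetical.

```python
def compute_starts_enabled(cluster_operational, node_validated=False):
    """Decide whether a new compute node should start scheduler-enabled."""
    if not cluster_operational:
        # Pre-operational cluster: bring computes up active automatically.
        return True
    # Operational cluster: keep the node disabled until it is validated.
    return node_validated
```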
>
> For example, ceph:
> Here we care less about the status of the cluster (slightly; this
> example ignores ceph's impact on nova-compute), and more about the
> status of the service. In the case that we deploy ceph-osds when
> there are fewer OSD hosts online than the replica factor (3), we can
> provision the OSDs similar to nova-compute, in that we can bring
> them all online and active, and data could be placed on them
> immediately (more or less). But if the ceph status is operational,
> then we have to take a different action: the OSDs have to be
> brought in disabled and gradually (probably by the operator) have
> their data weight increased, so they don't clog the network with data
> peering, which causes the clients many woes.
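The Ceph case above comes down to choosing an initial data weight for new OSDs. In practice the weight would be set with Ceph's CRUSH reweight mechanism and ramped up by the operator; the replica-factor threshold of 3 follows the email, while the weight values are illustrative assumptions.

```python
REPLICA_FACTOR = 3  # replica count from the example above

def initial_osd_weight(osd_hosts_online, ceph_operational, full_weight=1.0):
    """Pick the data weight a newly deployed OSD should come in with."""
    if not ceph_operational or osd_hosts_online < REPLICA_FACTOR:
        # No clients to disturb yet (or too few hosts to form full
        # replica sets): bring OSDs in at full weight immediately.
        return full_weight
    # Live cluster: bring OSDs in at zero weight; the operator ramps
    # them up gradually to avoid a peering traffic storm.
    return 0.0
```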
>
>
> Thoughts, ideas?
>
>
> References:
>
> 1.
> https://blueprints.launchpad.net/fuel/+spec/complex-cluster-status
>
>
> - romcheg
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> <http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> --
>
> Andrew Woodward
>
> Mirantis
>
> Fuel Community Ambassador
>
> Ceph Community
>
>
>
>
>
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 35bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com <http://www.mirantis.ru/>
> www.mirantis.ru <http://www.mirantis.ru/>
> vkuklin at mirantis.com <mailto:vkuklin at mirantis.com>
>
>
>
>
--
Best regards,
Bogdan Dobrelya,
Irc #bogdando