[openstack-dev] [Fuel] Getting rid of cluster status
Bogdan Dobrelya
bdobrelia at mirantis.com
Wed Mar 16 11:19:43 UTC 2016
On 03/16/2016 11:53 AM, Vladimir Kuklin wrote:
> Folks
>
> As I generally support the idea of getting rid of cluster status, this
> requires thorough design. My opinion here is that we should leave it as
> a function of nodes state until we come up with a variant of better
> calculation of cluster status. Nevertheless it is true that cluster
> status is actually a function of other primary data and should be
> calculated on the client side. I suggest that we move towards more
> fine-grained component-based architecture (simplest example is OpenStack
> Fuel vs non-OpenStack Fuel) and figure out a way of calculating each
> component's status. Then we should calculate each component's status and
> then a cluster status should be an aggregate of those. For example, we
> could say that the only components we have right now are nodes and the
> aggregate is based on the nodes status and whether they are critical or not.
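The component-based aggregation described above can be sketched as follows. This is only an illustration of the idea; the status values and function names are hypothetical (Fuel's real node states live elsewhere, e.g. in nailgun's constants), and the "critical node" rule follows the example in the paragraph above.

```python
from enum import Enum

# Hypothetical status values for illustration only.
class Status(Enum):
    READY = 1
    DEPLOYING = 2
    ERROR = 3

def component_status(node_statuses, critical_nodes):
    """Derive one component's status from its nodes' statuses.

    node_statuses: dict of node name -> Status
    critical_nodes: set of node names whose failure fails the component
    """
    if any(name in critical_nodes and status is Status.ERROR
           for name, status in node_statuses.items()):
        return Status.ERROR
    if any(status is Status.DEPLOYING for status in node_statuses.values()):
        return Status.DEPLOYING
    return Status.READY

def cluster_status(components):
    """Aggregate: the cluster takes the worst of its components' statuses."""
    statuses = [component_status(nodes, critical) for nodes, critical in components]
    for worst in (Status.ERROR, Status.DEPLOYING):
        if worst in statuses:
            return worst
    return Status.READY
```

Note that a non-critical node in ERROR does not fail its component here, matching the "whether they are critical or not" rule; the aggregate stays client-side and is recomputed from primary data on demand.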
I believe the cluster status should be renamed to the deployment status.
It has nothing to do with the real *cluster* status, which can only be
figured out by LMA tools.
>
> On Tue, Mar 15, 2016 at 9:16 PM, Andrew Woodward <xarses at gmail.com
> <mailto:xarses at gmail.com>> wrote:
>
>
>
> On Tue, Mar 15, 2016 at 4:04 AM Roman Prykhodchenko <me at romcheg.me
> <mailto:me at romcheg.me>> wrote:
>
> Fuelers,
>
> I would like to continue the series of "Getting rid of …"
> emails. This time I’d like to talk about statuses of clusters.
>
> The issue with that attribute is that it is not actually
> related to the real world very much and represents nothing. A few
> months ago I proposed to make it more real-world-like [1] by
> replacing the simple string with an aggregated value. However, after
> task-based deployment was introduced, even that approach lost its
> connection to the real world.
>
> My idea is to get rid of that attribute on the cluster and start
> working with the status of every single node in it. Nevertheless, we
> now only have tasks that are executed on nodes, so we cannot
> apply the "status" term to them. What if we replace it with a
> sort of boolean value called maintenance_mode (or similar) that
> we will use to tell whether the node is operational or not? After
> that we will be able to use an aggregated property for the cluster
> and check whether there are any nodes that currently have
> tasks in progress on them.
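The proposal above, with no stored cluster status and everything derived client-side from per-node flags, could look roughly like this. The field names (maintenance_mode, tasks_in_progress) are hypothetical; the email only suggests "maintenance_mode (or similar)".

```python
# Each node is represented here as a plain dict of per-node flags.
def cluster_is_busy(nodes):
    """True if any node still has tasks running against it."""
    return any(node.get("tasks_in_progress", 0) > 0 for node in nodes)

def cluster_is_operational(nodes):
    """All nodes out of maintenance and no tasks in flight."""
    return not cluster_is_busy(nodes) and all(
        not node.get("maintenance_mode", False) for node in nodes)
```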
>
>
> Yes, we still need an operations attribute. I'm not sure a bool is
> enough, but you are quite correct: changing the status of the
> cluster, once operational == True, based on the result of a specific
> node failing is in practice invalid.
>
> At the same time, operational == True does not necessarily mean the
> deployment succeeded; it's more along the lines of "deployment
> validated", which may involve further tests passing (like OSTF), or a
> more manual process where the operator wants to do more testing of
> their own prior to changing the state.
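The distinction drawn above, deployment finishing versus the cluster becoming operational only after validation, can be modeled as a small state machine. The state and event names here are illustrative assumptions, not Fuel's actual states.

```python
# Hypothetical lifecycle: "deployed" is not yet "operational"; the
# transition requires validation (OSTF passing or operator sign-off).
TRANSITIONS = {
    ("deploying", "deploy_done"): "deployed",
    ("deployed", "ostf_passed"): "operational",
    ("deployed", "operator_signoff"): "operational",
}

def next_state(state, event):
    """Return the new state, or stay put on an inapplicable event."""
    return TRANSITIONS.get((state, event), state)
```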
>
> As we venture into the LCM flow, we actually need the status of each
> component, in addition to the general status of the cluster, to
> determine the proper course of action on the next operation.
>
> For example, nova-compute:
> if the cluster is not operational, then we can provision compute
> nodes and have them enabled, or active in the scheduler,
> automatically. However, if the cluster is operational, a new compute
> node must be disabled, or otherwise blocked from the default
> scheduler, until the node has received validation. In this case the
> interpretation of operational is quite simple.
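The nova-compute policy in that paragraph reduces to one decision: should a freshly provisioned compute start enabled in the scheduler? In a real deployment the disabling would go through the Nova API (service disable); this sketch only models the decision itself, and both parameter names are hypothetical.

```python
def compute_starts_enabled(cluster_operational, node_validated=False):
    """Decide whether a new compute node should start scheduler-enabled."""
    if not cluster_operational:
        # Pre-operational cluster: bring computes up active automatically.
        return True
    # Operational cluster: keep the node disabled until it is validated.
    return node_validated
```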
>
> For example, ceph:
> Here we care less about the status of the cluster (slightly; this
> example ignores ceph's impact on nova-compute), and more about the
> status of the service. In the case that we deploy ceph-osds when
> there are fewer OSD hosts online than the replica factor (3), we can
> provision the OSDs similar to nova-compute, in that we can bring
> them all online and active, and data could be placed on them
> immediately (more or less). But if the ceph status is operational,
> then we have to take a different action: the OSDs have to be
> brought in disabled and gradually (probably by the operator) have
> their data weight increased, so they don't clog the network with data
> peering, which causes the clients many woes.
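The Ceph case above comes down to choosing an initial data weight for new OSDs. In practice the weight would be set with Ceph's CRUSH reweight mechanism and ramped up by the operator; the replica-factor threshold of 3 follows the email, while the weight values are illustrative assumptions.

```python
REPLICA_FACTOR = 3  # replica count from the example above

def initial_osd_weight(osd_hosts_online, ceph_operational, full_weight=1.0):
    """Pick the data weight a newly deployed OSD should come in with."""
    if not ceph_operational or osd_hosts_online < REPLICA_FACTOR:
        # No clients to disturb yet (or too few hosts to form full
        # replica sets): bring OSDs in at full weight immediately.
        return full_weight
    # Live cluster: bring OSDs in at zero weight; the operator ramps
    # them up gradually to avoid a peering traffic storm.
    return 0.0
```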
>
>
> Thoughts, ideas?
>
>
> References:
>
> 1.
> https://blueprints.launchpad.net/fuel/+spec/complex-cluster-status
>
>
> - romcheg
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> <http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> --
>
> Andrew Woodward
>
> Mirantis
>
> Fuel Community Ambassador
>
> Ceph Community
>
>
>
>
>
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 35bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com <http://www.mirantis.ru/>
> www.mirantis.ru <http://www.mirantis.ru/>
> vkuklin at mirantis.com <mailto:vkuklin at mirantis.com>
>
>
>
>
--
Best regards,
Bogdan Dobrelya,
Irc #bogdando