[openstack-dev] [Fuel] fuel master monitoring

Andrew Woodward xarses at gmail.com
Wed Jan 7 22:34:00 UTC 2015


On Wed, Jan 7, 2015 at 12:59 AM, Przemyslaw Kaminski
<pkaminski at mirantis.com> wrote:
>
> Hello,
>
> The updated version of monitoring code is available here:
>
> https://review.openstack.org/#/c/137785/
>
> This is based on monit, as was agreed in this thread. The drawback of
> monit is that it is a very simple system that doesn't track the state
> of its checks, so some Python code is still needed so that the user
> isn't spammed with low-disk-space notifications every minute.

Can we make the alert an asserted state that has to be explicitly
cleared to remove the warning? That way, once asserted, it won't
re-raise the error.
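For illustration, the asserted-state behavior could look roughly like
this (a minimal Python sketch, not the actual Fuel code; the class and
method names are made up):

```python
class AlertState:
    """Tracks asserted alerts so a firing check notifies only once.

    The alert stays asserted (and is not re-raised) until someone
    explicitly clears it, e.g. after freeing disk space.
    """

    def __init__(self):
        self._asserted = set()

    def raise_alert(self, name, notify):
        # Notify only on the transition from clear to asserted;
        # repeated raises of an already-asserted alert are dropped.
        if name not in self._asserted:
            self._asserted.add(name)
            notify(name)

    def clear(self, name):
        # Explicit acknowledgement; the next raise notifies again.
        self._asserted.discard(name)

    def is_asserted(self, name):
        return name in self._asserted
```

A monit check firing every minute would then call
raise_alert("low_disk", ...) each time, but the user sees a single
notification until the alert is cleared.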

>
> On 01/05/2015 10:40 PM, Andrew Woodward wrote:
>> There are two threads here that need to be unraveled from each
>> other.
>>
>> 1. We need to prevent Fuel from doing anything if the OS is out of
>> disk space. Running out leads to a very broken database that
>> requires a developer to reset it to a usable state. From this point
>> we need to:
>> * develop a method for locking down DB writes so that Fuel becomes
>> read-only until space is freed
>
> It's true that a full disk plus DB writes can result in fatal
> database failure. I just don't know if we can lock the DB just like
> that. What if a deployment is in progress?

We could do some form of complicated math around guessing how much
space we need to finish a task, but let's say:
* at 20% free space, we warn
* at 5% free space, we block tasks from starting
(both thresholds should be configurable, and probably ignorable)

We also need a separate volume for the DB, apart from the logs. That
removes the need for any complicated blocking logic in the DB.
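The two thresholds could be evaluated along these lines (a hedged
Python sketch; the 20%/5% defaults come from the numbers above,
everything else, including the function name, is illustrative):

```python
import os
from collections import namedtuple

Status = namedtuple("Status", ["level", "free_percent"])

# Both thresholds are meant to be configurable (and ignorable).
WARN_PERCENT = 20.0
BLOCK_PERCENT = 5.0

def disk_status(path="/", warn=WARN_PERCENT, block=BLOCK_PERCENT):
    """Classify free space on the filesystem holding `path`."""
    st = os.statvfs(path)
    free_percent = 100.0 * st.f_bavail / st.f_blocks
    if free_percent <= block:
        level = "block"   # refuse to start new tasks
    elif free_percent <= warn:
        level = "warn"    # post a notification to the user
    else:
        level = "ok"
    return Status(level, free_percent)
```

A monitor process would poll this periodically and, on "block", keep
new deployment tasks from starting instead of locking the DB outright.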

> I think the first way to reduce disk space usage would be to set the
> logging level to WARNING instead of DEBUG. It's good to have DEBUG
> during development, but I don't think it's that good for production.
> Besides, it slows down deployment considerably, from what I observed.

The default logging level is supposed to be WARNING, not DEBUG.

>
>> * develop a method (or re-use an existing one) to notify the user
>> that a serious error state exists on the host (one that cannot be
>> dismissed)
>
> Well, this is done already in the review I've linked above. It
> basically posts a notification to the UI. Everything still works as
> before until the disk is full. AFAIK the CLI doesn't communicate with
> notifications in any way, so the warning is not shown there.
>
>> * we need some API that can lock/unlock the DB
>> * we need some monitor process that will trigger the lock/unlock
>
> This one can be easily changed with the code in the above review request.

I think this should block tasks from starting rather than lock the DB
itself, as above.

>>
>> 2. We need monitoring of the master node and Fuel components in
>> general, as discussed at length above. Unless we intend to use it
>> to also monitor the services on deployed nodes (likely bad), what
>> we use for this is irrelevant to getting started. If we do intend
>> to monitor deployed nodes with it (again, bad for the Fuel node to
>> do), then we need to standardize on what we monitor the cloud with
>> (currently Zabbix) and offer a single pane of glass. Federation of
>> the monitoring becomes a critical requirement here, since having
>> more than one pane of glass is an operations nightmare.
>
> AFAIK installation of Zabbix is optional. We want obligatory
> monitoring of the master, which would somehow force its installation
> on the cloud nodes.
>
> P.
>
>>
>> Completing #1 is very important in the near term, as I have already
>> had to un-brick several deployments over it. Also, in my mind these
>> are separate tasks.
>>
>> On Thu, Nov 27, 2014 at 1:19 AM, Simon Pasquier
>> <spasquier at mirantis.com> wrote:
>>> I've added another option to the Etherpad: collectd can do basic
>>> threshold monitoring and run any kind of script on alert
>>> notifications. The other advantage of collectd would be the RRD
>>> graphs for (almost) free. Of course, since monit is already
>>> supported in Fuel, that is the fastest path to getting something
>>> done. Simon
>>>
>>> On Thu, Nov 27, 2014 at 9:53 AM, Dmitriy Shulyak
>>> <dshulyak at mirantis.com> wrote:
>>>>
>>>> Is it possible to send HTTP requests from monit, e.g. for
>>>> creating notifications? I scanned through the docs and found only
>>>> alerts for sending mail. Also, where will the token
>>>> (username/password) for monit be stored?
>>>>
>>>> Or maybe there is another plan, without any API interaction?
>>>>
>>>> On Thu, Nov 27, 2014 at 9:39 AM, Przemyslaw Kaminski
>>>> <pkaminski at mirantis.com> wrote:
>>>>>
>>>>> This I didn't know. It's true in fact; I checked the manifests.
>>>>> Though monit is not deployed yet because the packages are
>>>>> missing from the Fuel ISO. Anyway, I think the argument about
>>>>> using yet another monitoring service is now rendered invalid.
>>>>>
>>>>> So +1 for monit? :)
>>>>>
>>>>> P.
>>>>>
>>>>>
>>>>> On 11/26/2014 05:55 PM, Sergii Golovatiuk wrote:
>>>>>
Monit is easy and is already used to control the state of compute
nodes. We can adopt it for the master node.
>>>>>
>>>>> -- Best regards, Sergii Golovatiuk, Skype #golserge IRC
>>>>> #holser
>>>>>
>>>>> On Wed, Nov 26, 2014 at 4:46 PM, Stanislaw Bogatkin
>>>>> <sbogatkin at mirantis.com> wrote:
>>>>>>
As for me, Zabbix is overkill for one node. Zabbix server +
agent + frontend + DB + HTTP server, and all of it for one
node? Why not use something that was developed for monitoring
a single node, doesn't have many dependencies, and works out
of the box? Not necessarily monit, but something similar.
>>>>>>
>>>>>> On Wed, Nov 26, 2014 at 6:22 PM, Przemyslaw Kaminski
>>>>>> <pkaminski at mirantis.com> wrote:
>>>>>>>
We want to monitor the Fuel master node, while Zabbix is
only on the slave nodes, not on the master. The monitoring
service is supposed to be installed on the Fuel master host
(not inside a Docker container) and provide basic info about
free disk space, etc.
>>>>>>>
>>>>>>> P.
>>>>>>>
>>>>>>>
>>>>>>> On 11/26/2014 02:58 PM, Jay Pipes wrote:
>>>>>>>>
>>>>>>>> On 11/26/2014 08:18 AM, Fox, Kevin M wrote:
>>>>>>>>>
So in the end there will be three monitoring
systems to learn, configure, and debug? Monasca for
cloud users, Zabbix for most of the physical systems,
and sensu or monit "to be small"?

Seems very complicated.

If not just Monasca, why not the Zabbix that's already
being deployed?
>>>>>>>>
>>>>>>>>
Yes, I had the same thought... why not just use Zabbix,
since it's already used?
>>>>>>>>
>>>>>>>> Best, -jay
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> OpenStack-dev mailing list
>>>>>>>> OpenStack-dev at lists.openstack.org
>>>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Andrew
Mirantis
Ceph community


