[Openstack-operators] [monitoring][messaging][rpc] When your OpenStack app is dead
bdobrelia at mirantis.com
Tue Aug 11 14:20:22 UTC 2015
There is a known issue in Oslo messaging, and it seems to be resolved in
Kilo. But the UX of this one is very sad. For example, each time your
AMQP cluster goes through a single-node failover and recovers cleanly,
there is a chance that some OpenStack apps, such as Nova Compute, get
stuck in a broken state, and only a restart can heal them.
The typical log pattern for this broken state of a service is the message
"Timed out waiting for reply". Hence, it may be a good idea to implement
monitoring filters based on that pattern and automatically set an alert
status for the affected OpenStack services.
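
A minimal sketch of such a filter, in Python, might look like the
following. The function names, the threshold, and the idea of passing the
log file path on the command line are assumptions made for illustration
only; they are not part of Oslo messaging or of any particular monitoring
tool. Only the matched text comes from the log pattern described above.

```python
import re
import sys

# The log pattern described above. Case-insensitive matching is an
# assumption, added as a precaution against small formatting differences.
TIMEOUT_PATTERN = re.compile(r"Timed out waiting for reply", re.IGNORECASE)


def count_timeouts(lines):
    """Return how many of the given log lines contain the timeout pattern."""
    return sum(1 for line in lines if TIMEOUT_PATTERN.search(line))


def alert_status(lines, threshold=1):
    """Return "ALERT" if at least `threshold` lines match, otherwise "OK".

    The threshold is a hypothetical tuning knob, not a recommended value.
    """
    return "ALERT" if count_timeouts(lines) >= threshold else "OK"


if __name__ == "__main__":
    # Hypothetical usage: python check_rpc_timeouts.py <path to service log>
    with open(sys.argv[1], errors="replace") as log_file:
        print(alert_status(log_file))
```

In practice the check would probably need to look only at recent log
entries, so that an old timeout does not keep a service in alert status
after it has been restarted.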