[nova][ptg] Healthchecks API

Ghanshyam Mann gmann at ghanshyammann.com
Thu May 28 03:37:15 UTC 2020


Hello Everyone,

This is one of the PTG topics[1], but it would be good to get some discussion going
before the PTG so that we can make better use of the PTG time. I am sorry for starting this thread late.

To provide a healthchecks API for nova, I have written a PoC that extends the
oslo healthcheck middleware with plugins that perform real checks.

- https://review.opendev.org/#/c/731396/

I am writing up the details on this etherpad[2], where you can see the example response
and the flow of the new plugins.

I am also describing it briefly here:

oslo provides a healthcheck middleware with a plugin framework. It ships two basic plugins
that do file/port-based health checks:
- disable_by_file
- disable_by_files_ports
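To make the plugin idea concrete, here is a minimal, self-contained sketch of the shape such a framework takes: each plugin exposes a `healthcheck(server_port)` method and returns a result carrying an availability flag, a reason and optional details. The classes `HealthcheckResult`, `HealthcheckPlugin` and `DisableByFilePlugin` below are local stand-ins written for illustration, loosely modeled on the oslo interface; they are not the oslo.middleware classes themselves, and the reason strings are illustrative.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class HealthcheckResult:
    """Outcome of one plugin check (local stand-in, not oslo's class)."""
    available: bool
    reason: str
    details: dict = field(default_factory=dict)


class HealthcheckPlugin:
    """Hypothetical base class: every plugin implements healthcheck()."""

    def healthcheck(self, server_port):
        raise NotImplementedError


class DisableByFilePlugin(HealthcheckPlugin):
    """Reports unavailable while a marker file exists (disable_by_file idea)."""

    def __init__(self, path):
        self.path = path

    def healthcheck(self, server_port):
        # server_port is unused here; port-aware plugins would use it.
        if os.path.exists(self.path):
            return HealthcheckResult(False, "DISABLED BY FILE")
        return HealthcheckResult(True, "OK")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "healthcheck_disable")
        plugin = DisableByFilePlugin(marker)
        print(plugin.healthcheck(8774).available)  # True: no marker file yet
        open(marker, "w").close()  # create the marker file
        print(plugin.healthcheck(8774).available)  # False: disabled by file
```

The nova plugins proposed below would slot into the same interface, replacing the file check with real DB, MQ and service checks.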

The new idea is to extend those checks with new plugins on the nova side:
    
Nova can provide three plugins which are configurable[3]:
1. Nova_DB_healthcheck: Checks whether the API DB, the cell0 DB and at least one cell DB are up. If so, returns Healthy; otherwise Unhealthy.
2. Nova_MQ_healthcheck: Checks whether at least one cell's MQ is up. If so, returns Healthy; otherwise Unhealthy.
3. Nova_services_healthcheck: Checks whether at least one cell has at least one conductor and one compute service running. If so, returns Healthy; otherwise Unhealthy.

Each plugin will also return a dict of results; for example, the DB plugin returns the status of the API DB and of every cell DB. Please refer to the example response on the etherpad[2].
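The decision logic of the three plugins can be sketched as pure functions over status snapshots. This is a hypothetical illustration, not code from the PoC: the function names and input shapes (a dict of DB or MQ name to reachability, and a dict of cell name to running service binaries) are assumptions, and the actual probing of DBs, MQs and the services table is left out. The MQ check follows item 2 as written (cell MQs only).

```python
def db_healthcheck(db_status):
    """Nova_DB_healthcheck logic (sketch).

    db_status maps a DB name to True (reachable) or False, e.g.
    {"api": True, "cell0": True, "cell1": False, "cell2": True}.
    Healthy if the API DB, the cell0 DB and at least one other cell DB are up.
    """
    cell_dbs = [up for name, up in db_status.items() if name not in ("api", "cell0")]
    healthy = bool(db_status.get("api", False)
                   and db_status.get("cell0", False)
                   and any(cell_dbs))
    return healthy, dict(db_status)


def mq_healthcheck(mq_status):
    """Nova_MQ_healthcheck logic (sketch): at least one cell MQ is up."""
    healthy = any(mq_status.values())
    return healthy, dict(mq_status)


def services_healthcheck(services_by_cell):
    """Nova_services_healthcheck logic (sketch).

    services_by_cell maps a cell name to the binaries running in it.
    Healthy if some cell runs both a conductor and a compute service.
    """
    details = {
        cell: {"conductor": "nova-conductor" in running,
               "compute": "nova-compute" in running}
        for cell, running in services_by_cell.items()
    }
    healthy = any(d["conductor"] and d["compute"] for d in details.values())
    return healthy, details
```

Each function returns a `(healthy, details)` pair, so the per-DB or per-cell dict travels alongside the verdict as described above.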

TODO: Auth handling for the various plugins.

Flow diagram of the new plugins:
    - Nova_DB_healthcheck:
        --> API DB is up
              |
              --> cell0 DB and at least one cell DB are up
                    |
                    --> Return OK
    - Nova_MQ_healthcheck:
        --> API MQ is up
              |
              --> At least one cell MQ is up
                    |
                    --> Return OK
    - Nova_services_healthcheck:
        --> At least one cell has at least one conductor and one compute running
        --> TODO: need to check other services as well, for example at least the scheduler
              |
              --> Return OK

    Result: 200 OK if all enabled plugins return OK, otherwise 503 Service Unavailable.
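The final aggregation step can be sketched as follows. This is an assumption-laden illustration, not the PoC's code: `aggregate` is a hypothetical helper that takes each enabled plugin's `(healthy, details)` pair and produces the HTTP status plus a response body; the body layout here is invented for the example, and the real example response is on the etherpad[2].

```python
from http import HTTPStatus


def aggregate(results):
    """Combine plugin results into (HTTP status code, response body).

    results maps a plugin name to a (healthy, details) pair.
    200 OK only if every enabled plugin is healthy, otherwise 503.
    """
    all_ok = all(healthy for healthy, _ in results.values())
    status = HTTPStatus.OK if all_ok else HTTPStatus.SERVICE_UNAVAILABLE
    body = {
        name: {"status": "Healthy" if healthy else "Unhealthy", "details": details}
        for name, (healthy, details) in results.items()
    }
    return int(status), body
```

Note that with no plugins enabled, `all()` over an empty sequence is true, so this sketch reports 200 OK; whether that is the desired behavior is a design choice worth settling.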

Opinions/thoughts?

[1] https://etherpad.opendev.org/p/nova-victoria-ptg
[2] https://etherpad.opendev.org/p/nova-healthchecks
[3] https://review.opendev.org/#/c/731396/1/etc/nova/api-paste.ini@107
