[openstack-dev] [nova] Upgrade readiness check notes

Matt Riedemann mriedem at linux.vnet.ibm.com
Fri Dec 16 02:53:58 UTC 2016


A few of us have talked about writing a command to tell when you're 
ready to upgrade (restart services with new code) for Ocata because we 
have a few things landing in Ocata which are going from optional to 
required, namely cells v2 and the placement API service.

We already have the 030 API DB migration for cells v2, which went into 
the o-2 milestone today; it makes the API DB schema migrations fail if 
you haven't run at least 'nova-manage cell_v2 simple_cell_setup'.

In the placement/resource providers discussions we have noted a few 
times the need for something similar for the placement API. But because 
that check spans multiple databases (API and cell DBs), and because it 
involves making sure the service is up and that we can make REST API 
requests to it, we can't do it in just a DB schema migration.

So today dansmith, jaypipes, sdague, edleafe and I jumped on a call 
to go over some notes / ideas being kicked around in an etherpad:

https://etherpad.openstack.org/p/nova-ocata-ready-for-upgrade

We agreed on writing a new CLI outside of nova-manage called nova-status 
which can perform the upgrade readiness check for both cells v2 and the 
placement API.
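To make the idea a bit more concrete, here's a rough Python sketch of how 
a nova-status style command could run a list of checks and turn the 
worst result into a process exit code. The Code enum, its values, and 
the check names are made up for illustration; this is not what the POC 
does.

```python
import enum
from typing import Callable, List, Tuple


class Code(enum.IntEnum):
    """Hypothetical check results; a higher value is a worse result."""
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


# A check takes no arguments and returns (result code, details string).
CheckFn = Callable[[], Tuple[Code, str]]


def run_checks(checks: List[Tuple[str, CheckFn]]) -> int:
    """Run every check, print one line per check, return the worst code."""
    worst = Code.SUCCESS
    for name, check in checks:
        code, details = check()
        line = f"{name}: {code.name}"
        if details:
            line += f" ({details})"
        print(line)
        worst = max(worst, code)
    return int(worst)
```

A console entry point could then just call sys.exit(run_checks(...)) so 
that deployment tooling can key off the return code instead of parsing 
output.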

For cells v2 it's really going to check basically the same things as the 
030 API DB schema migration.
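As a sketch only: assuming the conditions mirror what we understand the 
030 migration to require (a cell0 mapping, at least one other cell 
mapping, and some host mappings), the cells v2 check could boil down to 
something like this. The inputs are plain values rather than DB queries, 
and the CELL0_UUID constant is an assumption here.

```python
from typing import List, Sequence

# Assumed fixed UUID used for the cell0 mapping (an assumption in this sketch).
CELL0_UUID = "00000000-0000-0000-0000-000000000000"


def check_cells_v2(cell_mapping_uuids: Sequence[str],
                   host_mapping_count: int) -> List[str]:
    """Return a list of problems found; an empty list means the check passed.

    Assumes the conditions mirror the 030 API DB migration: a cell0
    mapping, at least one other cell mapping, and some host mappings.
    """
    problems = []
    if len(cell_mapping_uuids) < 2:
        problems.append("need cell0 plus at least one other cell mapping")
    if CELL0_UUID not in cell_mapping_uuids:
        problems.append("no cell0 mapping found")
    if host_mapping_count == 0:
        problems.append("no host mappings found")
    return problems
```

Any problem reported here would point the operator back at 
'nova-manage cell_v2 simple_cell_setup'.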

For the placement API, it's going to do at least two things:

1. Try and make a request to / on the placement API endpoint from the 
service catalog. This will at least check that (a) the placement 
endpoint is in the service catalog, (b) nova.conf is configured with 
credentials to make the request and (c) the service is running and 
accepting requests.
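A rough sketch of check #1, mapping each failure to (a), (b) or (c). It 
assumes a Keystone v3 style catalog (a list of services, each with a 
'type' and a list of 'endpoints' carrying 'interface' and 'url'), the 
'placement' service type, and a 'public' interface default. The http_get 
callable is a hypothetical stand-in for whatever authenticated client 
the real code ends up using, which keeps the sketch testable without a 
live cloud.

```python
from typing import Callable, List, Optional, Tuple


def find_placement_endpoint(catalog: List[dict],
                            interface: str = "public") -> Optional[str]:
    """Return the placement endpoint URL for the interface, or None."""
    for service in catalog:
        if service.get("type") != "placement":
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("interface") == interface:
                return endpoint.get("url")
    return None


def check_placement_endpoint(
        catalog: List[dict],
        http_get: Callable[[str], int]) -> Tuple[bool, str]:
    """GET / on the placement endpoint; http_get returns an HTTP status."""
    url = find_placement_endpoint(catalog)
    if url is None:
        # (a) endpoint missing from the service catalog
        return False, "placement endpoint not found in service catalog"
    try:
        status = http_get(url.rstrip("/") + "/")
    except OSError as exc:
        # (c) service not running or not accepting connections
        return False, f"placement API not reachable: {exc}"
    if status in (401, 403):
        # (b) credentials in nova.conf missing or rejected
        return False, "credentials in nova.conf rejected"
    if status != 200:
        return False, f"unexpected HTTP status {status}"
    return True, ""
```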

2. Count the number of resource_providers in the API DB and compare that 
to the number of compute_nodes in the cell DB and if there are fewer 
resource providers than compute nodes, it's an issue which we'll flag in 
the upgrade readiness CLI. This doesn't necessarily mean you can't 
upgrade to Ocata, it just means there might be fewer computes available 
for scheduling once you get to Ocata, so the chance of rebuilds and 
NoValidHost increases until the computes are upgraded to Ocata and 
configured to use the placement service to report 
inventories/usage/allocations for RAM, CPU and disk.
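Check #2 is basically a count comparison that produces a warning rather 
than a failure. Here's a minimal sketch that assumes the two counts have 
already been pulled from the API DB and the cell DB; the 
"success"/"warning" strings are illustrative only.

```python
from typing import Tuple


def check_resource_providers(num_resource_providers: int,
                             num_compute_nodes: int) -> Tuple[str, str]:
    """Compare resource providers (API DB) against compute nodes (cell DB).

    Fewer providers than compute nodes is flagged as a warning, not a
    failure: the upgrade can proceed, but some computes may not be
    schedulable until they run Ocata and report to placement.
    """
    if num_resource_providers < num_compute_nodes:
        missing = num_compute_nodes - num_resource_providers
        return ("warning",
                f"{missing} of {num_compute_nodes} compute nodes have no "
                "resource provider")
    return ("success", "")
```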

That 2nd point is important because we also agreed to make the filter 
scheduler NOT fall back to querying the compute_nodes table if there are
no resource providers available from the placement API. That means when 
the scheduler gets a request to build or move a server, it's going to 
query the placement API for possible resource providers to serve the 
CPU/RAM/disk requirements for the build request spec and if nothing is 
available it'll result in a NoValidHost failure. That is a change in 
direction from a fallback plan we originally had in the spec here:

https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/resource-providers-scheduler-db-filters.html#other-deployer-impact
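Roughly, the scheduler side then behaves like the sketch here: ask 
placement for providers that can fit the request's CPU/RAM/disk, and 
treat an empty answer as NoValidHost instead of falling back to 
compute_nodes. The get_providers callable and this NoValidHost class are 
hypothetical stand-ins; VCPU, MEMORY_MB and DISK_GB are the placement 
resource class names.

```python
from typing import Callable, Dict, List


class NoValidHost(Exception):
    """Hypothetical stand-in for nova's NoValidHost exception."""


def select_candidates(
        resources: Dict[str, int],
        get_providers: Callable[[Dict[str, int]], List[str]]) -> List[str]:
    """Return resource provider UUIDs that can fit the requested resources.

    get_providers is a hypothetical callable that queries the placement API.
    """
    providers = get_providers(resources)
    if not providers:
        # No fallback to the compute_nodes table: an empty answer from
        # placement is a scheduling failure.
        raise NoValidHost(f"no resource providers can satisfy {resources}")
    return providers


# Example request for a small flavor.
request = {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 1}
```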

We're changing direction on that because we really want to make the 
placement service required in Ocata and not delay its usage for another
release, because as long as it's optional people are going to avoid 
deploying and using it, which pushes us further out from forward 
progress around the scheduler, placement service and resource tracker.

Regarding where this new CLI lives, how it's deployed, and when it's 
called, we discussed a few options, even splitting it out into its own 
pip-installable package. We won't be totally clear on that until we get 
the POC code written and try to integrate it into grenade, so we're 
basically deferring that discussion/decision for now. Wherever it lives, 
we know it needs to be run with the Ocata code (since that's where it's 
going to be available) and after running the simple_cell_setup command, 
and it needs to be run before restarting services with the Ocata code. 
I'm not totally sure whether it needs to be run after the DB migrations; 
maybe Dan can clarify, but we'll sort it out for sure when we integrate 
with grenade.

Anyway, the POC is started here:

https://review.openstack.org/#/c/411517/

I've got the basic framework in place and there is a patch on top that 
does the cells v2 check. I plan on working on the placement API checks 
tomorrow.

If you've read this far, congratulations. This email is really just 
about communicating that things are happening because we have talked 
about the need for this a few times, but hadn't hashed it out yet.

-- 

Thanks,

Matt Riedemann



