[openstack-dev] [nova] Austin summit cells v2 session recap
mriedem at linux.vnet.ibm.com
Tue May 3 01:32:05 UTC 2016
Andrew Laski led a double session for cells v2 on Wednesday afternoon.
The full session etherpad is here.
Andrew started with an overview of what's done and what's in progress.
Note that some of the background on cells, what's been completed for
cells v2 and what's being worked on is also in a summit video from a
user conference talk that Andrew gave.
We agreed to add the MQ switching to get_cell_client and see what,
if anything, breaks.
We had a quick rundown on the database tables slated for migration to
the API database. Notable items for the DB table migrations:
* Aggregates and quotas will be migrated; there are specs up for both of
these from Mark Doffman.
* The nova-network related tables won't be migrated since we've
* The agent_builds table won't be migrated. We plan on deprecating this
API since it's only used by the XenAPI virt driver and it sounds like
Rackspace doesn't even use/enable it.
* We have to figure out what to do about the certificates table. The
only thing using it is the os-certificates REST API and nova-cert
service, and nothing in tree is using either of those now. The problem
is, the ec2api repo on GitHub is using the nova-cert rpc api directly
for s3 image download. So we need to figure out if we can move that into
the ec2api repo and drop it from Nova or find some other solution.
* keypairs will be migrated to the API DB. There was a TODO about
needing to store the keypair type in the instance. I honestly can't
remember exactly what that was for now; I'm hoping Andrew remembers.
* We agreed to move instance_groups and instance_group_policy to the API
DB, but there is a TODO to sort out if instance_group_members should be
in the API DB.
For nova-network we agreed that we'll fail hard if someone tries to add
a second cell to a cells v2 deployment and they aren't using Neutron.
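As a rough illustration of the "fail hard" behavior, a minimal sketch might look like the following. The function name, the exception class and the way cells are represented are all hypothetical and not taken from Nova; the only assumption from the session is that a non-Neutron deployment gets one regular cell (plus cell0) and no more.

```python
class SecondCellNotSupported(Exception):
    """Hypothetical error for adding a second cell without Neutron."""


def check_can_add_cell(existing_cells, use_neutron):
    """Refuse to add another cell when the deployment uses nova-network.

    existing_cells is a list of cell names already mapped; cell0 is
    assumed to be the special API-level cell and is not counted.
    """
    regular_cells = [name for name in existing_cells if name != "cell0"]
    if regular_cells and not use_neutron:
        # Fail hard rather than warn: nova-network is single-cell only.
        raise SecondCellNotSupported(
            "nova-network deployments are limited to a single cell; "
            "migrate to Neutron before adding another cell")
    return True
```

Raising instead of logging a warning means the operator finds out at the moment the cell is added, not later when networking breaks in the new cell.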
Chuck Carmack is working on some test plans for cells v2. There would be
a multi-node/cell job where one node is running the API and cell0 and
another is running a regular nova cell. There would also be migration
testing as part of grenade.
We discussed what needs to be documented and where it should live.
Since all deployments will at least be a cell of one, setting that up
will be in the install guide in docs.o.o. A multi-cell deployment would
be documented in the admin guide.
Anything related to the call path flow for different requests would live
in the nova developer documentation (devref).
Listing instances took a significant portion of the second cells v2
session and is one of the more complicated problems to sort out. There
are problems with listing all instances across all cells, especially
when we support
sorting. And we really have a latent bug in the API since we never
restricted the list of valid sort keys for listing instances, so you can
literally sort on anything in the instances table in the DB.
There were some ideas about how to handle this:
1. Don't support sorting in the API if you have multiple cells. Leave it
up to the caller to sort the results on their own. Obviously this isn't
a great solution for existing users that rely on this in the API.
2. Each cell sorts the results individually, and the API merge sorts the
results from the cells. There is still overhead here.
3. Don't split the database, or use a distributed database like Cassandra.
Since this wasn't brought up in person in the session, or on Friday, it
wasn't discussed. There is another thread about this though.
4. Use the OpenStack Searchlight project for doing massive queries like
this. This would be optional for a cell of one but recommended/required
for anyone running multiple cells. The downside to this is it's a new
dependency, and requires Elasticsearch (but many deployments are
probably already running an ELK stack for monitoring their logs). It's
also unclear at an early stage how easy this would be to integrate into
Nova. Plus deployers would need to set up Searchlight to listen to
notifications emitted from Nova so the indexes are updated in ES. It is,
however, arguably a better tool for the job than Nova trying to deal
with filtering and sorting with Python. There is general agreement
within the core team that this is the path forward, but it's going to
require investigation and testing before we get a better idea of how
feasible this is.
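To make option 2 concrete, here is a minimal sketch of the API-side merge, assuming each cell has already returned its own results sorted by the requested key. The function name and the dict-based instance records are made up for illustration; this is not how Nova implements it.

```python
import heapq
import itertools
from operator import itemgetter


def merge_cell_results(cell_results, sort_key, limit=None, descending=False):
    """Merge per-cell instance lists into one sorted list.

    cell_results is an iterable of lists, one per cell, and every list
    must already be sorted by sort_key in the requested direction.
    """
    merged = heapq.merge(*cell_results, key=itemgetter(sort_key),
                         reverse=descending)
    if limit is None:
        return list(merged)
    # Only the first `limit` rows are consumed from the lazy merge.
    return list(itertools.islice(merged, limit))
```

The overhead mentioned above shows up in what has to be fetched: to return a page of N instances, the API still has to ask every cell for up to N rows, since it can't know in advance which cells the first N will come from.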
Related to paging, we also have an existing problem with the marker that
will need to be sorted out before we can support multiple cells with v2.
Flavors are now in the API DB, and we return a marker for paging, but it
doesn't have the cell context, so we have to work that in. The good news
is we control the marker and we never documented anywhere that it's a
specific resource uuid (although for instances it is the last instance
uuid processed). So this is fixable, but is a known issue right now.
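Since the marker is opaque to callers, one possible way to work the cell context in is to pack it into the marker itself. The sketch below is only an illustration of that idea; the encoding (base64 over JSON) and the function names are assumptions, not something that was decided in the session.

```python
import base64
import json


def encode_marker(cell_id, resource_uuid):
    """Pack the cell and the last-seen resource uuid into one opaque token."""
    payload = json.dumps({"cell": cell_id, "uuid": resource_uuid},
                         sort_keys=True)
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_marker(marker):
    """Return (cell_id, resource_uuid), or raise ValueError if malformed."""
    try:
        payload = json.loads(base64.urlsafe_b64decode(marker.encode("ascii")))
        return payload["cell"], payload["uuid"]
    except (ValueError, KeyError, TypeError):
        raise ValueError("invalid marker") from None
```

A plain instance uuid passed by an existing client would fail to decode, so a real implementation would also need a fallback for old-style markers.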
Jay Pipes has an idea about a generation ID for quotas, but it wasn't
fleshed out in the session, so TBD.
We didn't get into upgrades too much. People are generally in agreement
on what's already planned for the upgrade process from a non-cells v1
deployment to cells v2. Andrew covers some of the proposed commands for
upgrades in his presentation. We hope to build into oslo.messaging
the ability to construct a transport_url from config options so that the
transport_url doesn't have to be provided to the migration commands; we
can just figure it out automatically.
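For a sense of what "construct a transport_url from config options" means, here is a standalone sketch. It does not use oslo.messaging at all; the option names are modeled on the RabbitMQ driver options and should be treated as assumptions, as should the way the virtual host is escaped.

```python
from urllib.parse import quote


def build_transport_url(conf):
    """Build a rabbit:// transport URL from a dict of config options.

    conf is a plain dict standing in for the parsed config file; the key
    names here are illustrative only.
    """
    # Credentials are percent-encoded so '/', ':' or '@' can't break the URL.
    user = quote(conf["rabbit_userid"], safe="")
    password = quote(conf["rabbit_password"], safe="")
    host = conf["rabbit_host"]
    port = conf.get("rabbit_port", 5672)
    vhost = quote(conf.get("rabbit_virtual_host", ""), safe="")
    return "rabbit://%s:%s@%s:%s/%s" % (user, password, host, port, vhost)
```

With something like this in place, a migration command could read the existing config and derive the URL itself instead of making the operator pass it on the command line.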
There is a rough plan for upgrading from cells v1 to v2 in a docs patch
from Andrew.
REST API for managing cells resources
This came up at the very end of the session, but operators need a way to
manage cells resources like they can for hosts. This would ideally be a
REST API but could start as a nova-manage command for the initial version.