[openstack-dev] [nova] Austin summit cells v2 session recap
andrew at lascii.com
Tue May 3 14:07:16 UTC 2016
Thanks for the summary, this is great. Comments inline.
On Mon, May 2, 2016, at 09:32 PM, Matt Riedemann wrote:
> Andrew Laski led a double session for cells v2 on Wednesday afternoon.
> The full session etherpad is linked at the end of this message.
> Andrew started with an overview of what's done and what's in progress.
> Note that some of the background on cells, what's been completed for
> cells v2 and what's being worked on is also covered in a summit video of
> a user conference talk that Andrew gave (linked at the end of this
> message).
> We agreed to add the MQ switching to get_cell_client and see what,
> if anything, breaks.
> DB migrations
> We had a quick rundown on the database tables slated for migration to
> the API database. Notable items for the DB table migrations:
> * Aggregates and quotas will be migrated; there are specs up for both of
> these from Mark Doffman.
> * The nova-network related tables won't be migrated since we've
> deprecated nova-network.
> * The agent_builds table won't be migrated. We plan on deprecating this
> API since it's only used by the XenAPI virt driver and it sounds like
> Rackspace doesn't even use/enable it.
> * We have to figure out what to do about the certificates table. The
> only thing using it is the os-certificates REST API and nova-cert
> service, and nothing in tree is using either of those now. The problem
> is, the ec2api repo on GitHub is using the nova-cert rpc api directly
> for s3 image download. So we need to figure out if we can move that into
> the ec2api repo and drop it from Nova or find some other solution.
> * keypairs will be migrated to the API DB. There was a TODO about
> needing to store the keypair type in the instance. I honestly can't
> remember exactly what that was for now, I'm hoping Andrew remembers.
The metadata api exposes the keypair type but that information is not
passed down during the boot request. Currently the metadata service is
pulling the keypair from the db on each access, and for cellsv1 making
an RPC request to the parent cell for that data. To avoid requiring the
metadata service to query the nova_api database we should pass down
keypair information and persist it with the instance, perhaps in
instance_extra, so that lookups can be done locally to the cell.
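A minimal sketch of that idea, assuming the keypair is serialized as JSON next to the instance. The `KeyPair` class, the in-memory `instance_extra` dict and both function names are hypothetical stand-ins, not existing Nova code:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class KeyPair:
    name: str
    type: str          # e.g. "ssh" or "x509"
    public_key: str


# Stand-in for the cell database's instance_extra table:
# instance uuid -> dict of extra columns (hypothetical layout).
instance_extra = {}


def persist_keypair(instance_uuid, keypair):
    """API side: send the keypair down with the boot request and store it."""
    row = instance_extra.setdefault(instance_uuid, {})
    row["keypairs"] = json.dumps([asdict(keypair)])


def keypairs_for_metadata(instance_uuid):
    """Cell side: the metadata service reads only cell-local data."""
    raw = instance_extra.get(instance_uuid, {}).get("keypairs", "[]")
    return [KeyPair(**kp) for kp in json.loads(raw)]
```

With something along these lines, a metadata request needs neither a query against the nova_api database nor an RPC call to the parent cell; the trade-off is that the stored copy reflects the keypair as it was at boot time.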
> * We agreed to move instance_groups and instance_group_policy to the API
> DB, but there is a TODO to sort out if instance_group_members should be
> in the API DB.
> For nova-network we agreed that we'll fail hard if someone tries to add
> a second cell to a cells v2 deployment and they aren't using Neutron.
> Chuck Carmack is working on some test plans for cells v2. There would be
> a multi-node/cell job where one node is running the API and cell0 and
> another is running a regular nova cell. There would also be migration
> testing as part of grenade.
> We discussed what needs to be documented and where it should live.
> Since all deployments will at least be a cell of one, setting that up
> will be in the install guide in docs.o.o. A multi-cell deployment would
> be documented in the admin guide.
> Anything related to the call path flow for different requests would live
> in the nova developer documentation (devref).
> Listing instances across cells took a significant portion of the second
> cells v2 session and is one of the more complicated problems to sort
> out. There are problems with listing all instances across all cells,
> especially when we support
> sorting. And we really have a latent bug in the API since we never
> restricted the list of valid sort keys for listing instances, so you can
> literally sort on anything in the instances table in the DB.
> There were some ideas about how to handle this:
> 1. Don't support sorting in the API if you have multiple cells. Leave it
> up to the caller to sort the results on their own. Obviously this isn't
> a great solution for existing users that rely on this in the API.
> 2. Each cell sorts the results individually, and the API merge sorts the
> results from the cells. There is still overhead here.
> 3. Don't split the database, or use a distributed database like Cassandra.
> Since this wasn't brought up in person in the session, or on Friday, it
> wasn't discussed. There is another thread about this though.
> 4. Use the OpenStack Searchlight project for doing massive queries like
> this. This would be optional for a cell of one but recommended/required
> for anyone running multiple cells. The downside to this is it's a new
> dependency, and requires Elasticsearch (but many deployments are
> probably already running an ELK stack for monitoring their logs). It's
> also unclear at an early stage how easy this would be to integrate into
> Nova. Plus deployers would need to set up Searchlight to listen to
> notifications emitted from Nova so the indexes are updated in ES. It is,
> however, arguably a better tool for the job than Nova trying to deal
> with filtering and sorting in Python. There is general agreement
> within the core team that this is the path forward, but it's going to
> require investigation and testing before we get a better idea of how
> feasible this is.
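To make option 2 concrete, here is a rough sketch of the merge step, assuming each cell has already returned its own results sorted on the requested key. The function name and the dict-shaped instance records are illustrative only:

```python
import heapq
from itertools import islice
from operator import itemgetter


def list_instances(per_cell_results, sort_key, limit, descending=False):
    """Merge per-cell lists that are each already sorted on sort_key.

    per_cell_results: one list of instance dicts per cell, every list
    sorted in the same direction as the `descending` flag.
    """
    merged = heapq.merge(*per_cell_results,
                         key=itemgetter(sort_key),
                         reverse=descending)
    # heapq.merge is lazy, so only `limit` items are pulled through it.
    return list(islice(merged, limit))


cell1 = [{"uuid": "a", "created_at": 1}, {"uuid": "c", "created_at": 5}]
cell2 = [{"uuid": "b", "created_at": 3}, {"uuid": "d", "created_at": 7}]
page = list_instances([cell1, cell2], "created_at", limit=3)
```

The overhead mentioned in option 2 shows up here: to return a page of `limit` results, every cell may have to return up to `limit` rows, so the API can end up fetching limit x number-of-cells rows and discarding most of them.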
> Related to paging, we also have an existing problem with the marker that
> will need to be sorted out before we can support multiple cells with v2.
> Flavors are now in the API DB, and we return a marker for paging, but it
> doesn't have the cell context, so we have to work that in. The good news
> is we control the marker and we never documented anywhere that it's a
> specific resource uuid (although for instances it is the last instance
> uuid processed). So this is fixable, but is a known issue right now.
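Since the marker is ours to define, one possible shape is an opaque token that carries the cell along with the resource id. This is only a sketch under that assumption; the encoding and both function names are made up for illustration, and it does not address how a marker would work for results merged from several cells:

```python
import base64
import json


def encode_marker(cell_id, resource_id):
    """Pack the cell and the last-seen resource id into an opaque token."""
    payload = json.dumps({"cell": cell_id, "id": resource_id}, sort_keys=True)
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_marker(marker):
    """Return (cell_id, resource_id), or raise ValueError for a bad marker."""
    try:
        raw = base64.urlsafe_b64decode(marker.encode("ascii"))
        payload = json.loads(raw.decode("utf-8"))
        return payload["cell"], payload["id"]
    except (ValueError, KeyError, TypeError):
        raise ValueError("invalid marker: %r" % (marker,)) from None
```

Callers that treated the marker as opaque would not notice the change; callers that passed an instance uuid by hand would, which is why it matters that the uuid form was never documented.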
> Jay Pipes has an idea about a generation ID for quotas, but it wasn't
> fleshed out in the session, so TBD.
I had put this on the etherpad before we reached a consensus on moving
allocations/inventories/etc... to the api database. I previously had a
concern on how something global like quotas would work if it was reliant
on a generation id that was actually multiple generation ids, one for each
cell. Now that we've decided to pull the table with the generation id
out of the cells this is a moot discussion point.
> Upgrade process
> We didn't get into this too much. People are generally in agreement on
> what's already planned for the upgrade process from a non-cells v1
> deployment to cells v2. Andrew covers some of the proposed commands for
> upgrades in his presentation. We hope to build into oslo.messaging
> the ability to construct a transport_url from config options so that the
> transport_url doesn't have to be provided to the migration commands, we
> can just figure it out automatically.
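As an illustration of what "construct a transport_url from config options" could look like, here is a standalone sketch. It does not call oslo.messaging; it assumes the individual rabbit_* options (rabbit_userid, rabbit_password, rabbit_host, rabbit_port, rabbit_virtual_host) are the inputs, and the function name and defaults are hypothetical:

```python
from urllib.parse import quote


def build_transport_url(opts):
    """Build a rabbit:// URL from a dict of rabbit_* style options."""
    # Percent-encode anything that could contain URL-reserved characters.
    user = quote(opts.get("rabbit_userid", "guest"), safe="")
    password = quote(opts.get("rabbit_password", "guest"), safe="")
    host = opts.get("rabbit_host", "localhost")
    port = int(opts.get("rabbit_port", 5672))
    vhost = quote(opts.get("rabbit_virtual_host", "/"), safe="")
    return "rabbit://%s:%s@%s:%d/%s" % (user, password, host, port, vhost)


url = build_transport_url({
    "rabbit_userid": "nova",
    "rabbit_password": "s@cret",
    "rabbit_host": "mq1",
    "rabbit_virtual_host": "nova",
})
```

How the real helper would handle multiple hosts, non-rabbit drivers or the default virtual host is left open here.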
> There is a rough plan for upgrading from cells v1 to v2 in a docs patch
> from Andrew.
> REST API for managing cells resources
> This came up at the very end of the session, but operators need a way to
> manage cells resources like they can for hosts. This would ideally be a
> REST API but could start as a nova-manage command for the initial
>  https://etherpad.openstack.org/p/newton-nova-cells
>  https://www.openstack.org/videos/video/nova-cells-v2-whats-going-on
>  https://review.openstack.org/#/c/298551/
>  https://review.openstack.org/#/c/267153/
> Matt Riedemann