[openstack-dev] [Nova] Cells conversation starter

Andrew Laski andrew.laski at rackspace.com
Wed Oct 22 19:01:03 UTC 2014


On 10/22/2014 03:42 AM, Vineet Menon wrote:
>
> On 22 October 2014 06:24, Tom Fifield <tom at openstack.org> wrote:
>
>     On 22/10/14 03:07, Andrew Laski wrote:
>     >
>     > On 10/21/2014 04:31 AM, Nikola Đipanov wrote:
>     >> On 10/20/2014 08:00 PM, Andrew Laski wrote:
>     >>> One of the big goals for the Kilo cycle by users and
>     developers of the
>     >>> cells functionality within Nova is to get it to a point where
>     it can be
>     >>> considered a first-class citizen of Nova.  Ultimately I think
>     this comes
>     >>> down to getting it tested by default in Nova jobs, and making
>     it easy
>     >>> for developers to work with.  But there's a lot of work to get
>     there.
>     >>> In order to raise awareness of this effort, and get the
>     conversation
>     >>> started on a few things, I've summarized a little bit about
>     cells and
>     >>> this effort below.
>     >>>
>     >>>
>     >>> Goals:
>     >>>
>     >>> Testing of a single cell setup in the gate.
>     >>> Feature parity.
>     >>> Make cells the default implementation. Developers write code
>     once and
>     >>> it works for cells.
>     >>>
>     >>> Ultimately the goal is to improve maintainability of a large
>     feature
>     >>> within the Nova code base.
>     >>>
>     >> Thanks for the write-up, Andrew! Some thoughts/questions below.
>     Looking
>     >> forward to the discussion on some of these topics, and would be
>     happy to
>     >> review the code once we get to that point.
>     >>
>     >>> Feature gaps:
>     >>>
>     >>> Host aggregates
>     >>> Security groups
>     >>> Server groups
>     >>>
>     >>>
>     >>> Shortcomings:
>     >>>
>     >>> Flavor syncing
>     >>>      This needs to be addressed now.
>     >>>
>     >>> Cells scheduling/rescheduling
>     >>> Instances cannot currently move between cells
>     >>>      These two won't affect the default one cell setup so they
>     will be
>     >>> addressed later.
>     >>>
>     >>>
>     >>> What does cells do:
>     >>>
>     >>> Schedule an instance to a cell based on flavor slots available.
>     >>> Proxy API requests to the proper cell.
>     >>> Keep a copy of instance data at the global level for quick
>     retrieval.
>     >>> Sync data up from a child cell to keep the global level up to
>     date.
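(To make the first item on that list concrete: the scheduling step
amounts to roughly the sketch below. This is illustrative only, not the
actual nova.cells scheduler code, and the attribute names are made up.)

    class NoValidCell(Exception):
        pass

    def pick_cell(cells, flavor, num_instances=1):
        # Each child cell reports how many more instances of a given
        # flavor it can fit; take the first cell with enough room.
        for cell in cells:
            if cell.free_slots.get(flavor.id, 0) >= num_instances:
                return cell
        raise NoValidCell('no cell can fit %d instance(s) of flavor %s'
                          % (num_instances, flavor.id))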
>     >>>
>     >>>
>     >>> Simplifying assumptions:
>     >>>
>     >>> Cells will be treated as a two level tree structure.
>     >>>
>     >> Are we thinking of making this official by removing the code
>     that actually
>     >> allows cells to be a tree of arbitrary depth N? I am not sure
>     whether doing so
>     >> would be a win; supporting it does complicate the
>     RPC/Messaging/State code
>     >> a bit, and if it's not being used, even though it's a nice
>     generalization,
>     >> why keep it around?
>     >
>     > My preference would be to remove that code since I don't
>     envision anyone
>     > writing tests to ensure that functionality works and/or doesn't
>     > regress.  But there's the challenge of not knowing if anyone is
>     actually
>     > relying on that behavior.  So initially I'm not creating a
>     specific work
>     > item to remove it.  But I think it needs to be made clear that
>     it's not
>     > officially supported and may get removed unless a case is made for
>     > keeping it and work is put into testing it.
>
>     While I agree that N is a bit interesting, I have seen N=3 in
>     production:
>
>     [central API]-->[state/region1]-->[state/region DC1]
>                                    \->[state/region DC2]
>                   -->[state/region2 DC]
>                   -->[state/region3 DC]
>                   -->[state/region4 DC]
>
> I'm curious.
> What are the use cases for this deployment? Presumably the root node
> runs n-api along with horizon, key management, etc. What components
> are deployed in tiers 2 and 3?
> And AFAIK, an OpenStack cell deployment currently isn't even a tree
> but a DAG, since one cell can have multiple parents. Has anyone come
> up with such a requirement?
>
>

While there's nothing to prevent a cell from having multiple parents, I 
would be curious to know whether this would actually work in practice, 
since I can imagine a number of cases that might cause problems. And is 
there a practical use for this?

Maybe we should start logging a warning when this is set up, stating 
that it is an unsupported (i.e. untested) configuration, in order to 
start codifying the design as that of a tree.  At least for the initial 
scope of work I think this makes sense, and if a case is made for a DAG 
setup that can be done independently.
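Something as simple as the sketch below at cell-load time would
probably be enough to start (rough illustration only; the function and
attribute names are made up, this isn't existing Nova code):

    import logging

    LOG = logging.getLogger(__name__)

    def warn_on_unsupported_topology(cell_name, parents, children):
        # Codify the two-level tree: warn on anything that is a DAG
        # or deeper than a single parent -> child relationship.
        if len(parents) > 1:
            LOG.warning('Cell %s has %d parents; multi-parent (DAG) '
                        'topologies are unsupported and untested.',
                        cell_name, len(parents))
        if parents and children:
            LOG.warning('Cell %s is both a parent and a child; trees '
                        'deeper than two levels are unsupported.',
                        cell_name)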

>
>     >>
>     >>> Plan:
>     >>>
>     >>> Fix flavor breakage in the child cell, which causes boot tests to fail.
>     >>> Currently the libvirt driver needs flavor.extra_specs, which is not
>     >>> synced to the child cell.  Some options are to sync flavor and
>     extra
>     >>> specs to child cell db, or pass full data with the request.
>     >>> https://review.openstack.org/#/c/126620/1 offers a means of
>     passing full
>     >>> data with the request.
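(For anyone who hasn't opened the review: the "pass full data" option
boils down to serializing the whole flavor into the build request,
roughly as sketched below. This is just the shape of the idea, not the
actual patch, and the field names are illustrative.)

    def build_request_with_flavor(instance, flavor):
        # Carry the complete flavor, extra_specs included, in the
        # request so the child cell never queries its own DB for it.
        return {
            'instance_uuid': instance.uuid,
            'flavor': {
                'flavorid': flavor.flavorid,
                'memory_mb': flavor.memory_mb,
                'vcpus': flavor.vcpus,
                'root_gb': flavor.root_gb,
                'extra_specs': dict(flavor.extra_specs),
            },
        }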
>     >>>
>     >>> Determine proper switches to turn off Tempest tests for
>     features that
>     >>> don't work, with the goal of getting a voting job.  Once this
>     is in place
>     >>> we can move towards feature parity and work on internal
>     refactorings.
>     >>>
>     >>> Work towards adding parity for host aggregates, security
>     groups, and
>     >>> server groups.  They should be made to work in a single cell
>     setup, but
>     >>> the solution should not preclude them from being used in multiple
>     >>> cells.  There needs to be some discussion as to whether a host
>     aggregate
>     or server group is a global concept or a per-cell concept.
>     >>>
>     >> Have there been any previous discussions on this topic? If so
>     I'd really
>     >> like to read up on those to make sure I understand the pros and
>     cons
>     >> before the summit session.
>     >
>     > The only discussion I'm aware of is some comments on
>     > https://review.openstack.org/#/c/59101/, though they mention a
>     > discussion at the Utah mid-cycle.
>     >
>     > The main con I'm aware of for defining these as global concepts
>     is that
>     > there is no rescheduling capability in the cells scheduler.  So if a
>     > build is sent to a cell with a host aggregate that can't fit that
>     > instance, the build will fail even though there may be space in
>     that host
>     > aggregate from a global perspective.  That should be somewhat
>     > straightforward to address though.
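(Addressing it would look something like the sketch below: retry the
build against each candidate cell rather than failing on the first full
one. The cell object and its build_instance method are hypothetical;
only exception.NoValidHost is real Nova code.)

    from nova import exception

    def schedule_with_retries(candidate_cells, build_request):
        # Try each cell containing hosts from the aggregate instead
        # of failing on the first cell whose slice of it is full.
        for cell in candidate_cells:
            try:
                return cell.build_instance(build_request)
            except exception.NoValidHost:
                continue
        raise exception.NoValidHost(reason='no candidate cell had room')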
>     >
>     > I think it makes sense to define these as global concepts.  But
>     these
>     > are features that aren't used with cells yet so I haven't put a
>     lot of
>     > thought into potential arguments or cases for doing this one way or
>     > another.
>     >
>
> Keeping aggregates local also poses a problem when a cell is 
> temporarily dead (out of the system), since the top level doesn't have 
> any idea about local features, including whom to contact for deletion 
> of a particular aggregate.
>
>     >
>     >>> Work towards merging compute/api.py and compute/cells_api.py
>     so that
>     >>> developers only need to make changes/additions in one place.
>     The goal
>     >>> is for as much as possible to be hidden by the RPC layer,
>     which will
>     >>> determine whether a call goes to a compute/conductor/cell.
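(In other words, something shaped like the sketch below, where the call
site never branches on whether cells is enabled. This is hand-waving
about the desired shape, not proposed code; all the names are made up.)

    class ComputeAPI(object):
        """Single compute API; the RPC client hides the routing."""

        def __init__(self, rpc_client):
            self.rpc = rpc_client

        def resize(self, context, instance, flavor):
            # Identical call site with or without cells: the RPC
            # layer picks compute, conductor, or a cell underneath.
            target = self.rpc.target_for(instance)
            self.rpc.cast(context, target, 'resize_instance',
                          instance=instance, flavor=flavor)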
>     >>>
>     >>> For syncing data between cells, look at using objects to
>     handle the
>     >>> logic of writing data to the cell/parent and then syncing the
>     data to
>     >>> the other.
>     >>>
>     >> Some of that work has been done already, although in a somewhat
>     ad-hoc
>     >> fashion. Were you thinking of extending objects to support this
>     natively
>     >> (whatever that means), or do we continue to inline the code in the
>     >> existing object methods?
>     >
>     > I would prefer to have some native support for this.  In general
>     data is
>     > considered authoritative at the global level or the cell level.  For
>     > example, instance data is synced down from the global level to a
>     > cell (except for a few fields, which are synced up), but a
>     migration would
>     > be synced up.  I could imagine decorators that would specify how
>     data
>     > should be synced and handle that as transparently as possible.
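(Very roughly the kind of decorator I mean; none of this exists in
nova.objects today, and the push_* helpers are placeholders:)

    import functools

    def push_to_parent_cell(context, obj):
        pass  # placeholder: send updated fields up to the API cell

    def push_to_child_cell(context, obj):
        pass  # placeholder: send updated fields down to the cell

    def syncs(direction):
        """Declare that a method's writes sync 'up' or 'down'."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(self, context, *args, **kwargs):
                result = fn(self, context, *args, **kwargs)
                push = (push_to_child_cell if direction == 'down'
                        else push_to_parent_cell)
                push(context, self)
                return result
            return wrapper
        return decorator

    class Instance(object):
        @syncs('down')
        def save(self, context):
            pass  # write at the authoritative (global) level first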
>     >
>     >>
>     >>> A potential migration scenario is to consider a non-cells
>     setup to be a
>     >>> child cell; converting to cells will then mean setting up a
>     parent cell
>     >>> and linking them.  There are periodic tasks in place to sync
>     data up
>     >>> from a child already, but a manual kick-off mechanism will
>     need to be
>     >>> added.
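(The manual kick-off could be as small as exposing the existing
periodic sync through a command, along the lines of the sketch below.
The wrapper and its wiring are made up, though if I remember right
cells already has a sync_instances RPC that this could sit on top of.)

    def kickoff_full_sync(cells_rpcapi, context):
        # Force an immediate sync of instance data up from the child
        # cell rather than waiting for the periodic task to fire.
        cells_rpcapi.sync_instances(context,
                                    project_id=None,     # all tenants
                                    updated_since=None,  # everything
                                    deleted=True)        # include deleted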
>     >>>
>     >>>
>     >>> Future plans:
>     >>>
>     >>> Something that has been considered, but is out of scope for
>     now, is that
>     >>> the parent/api cell doesn't need the same data model as the
>     child cell.
>     >>> Since the majority of what it does is act as a cache for API
>     requests,
>     >>> it does not need all the data that a cell needs, and what data
>     it does
>     >>> need could be stored in a form that's optimized for reads.
>     >>>
>     >>>
>     >>> Thoughts?
>     >>>
