[openstack-dev] [Nova] Cells conversation starter
Andrew Laski
andrew.laski at rackspace.com
Wed Oct 22 19:01:03 UTC 2014
On 10/22/2014 03:42 AM, Vineet Menon wrote:
>
> On 22 October 2014 06:24, Tom Fifield <tom at openstack.org> wrote:
>
> On 22/10/14 03:07, Andrew Laski wrote:
> >
> > On 10/21/2014 04:31 AM, Nikola Đipanov wrote:
> >> On 10/20/2014 08:00 PM, Andrew Laski wrote:
> >>> One of the big goals for the Kilo cycle by users and developers
> >>> of the cells functionality within Nova is to get it to a point
> >>> where it can be considered a first class citizen of Nova.
> >>> Ultimately I think this comes down to getting it tested by default
> >>> in Nova jobs, and making it easy for developers to work with. But
> >>> there's a lot of work to get there. In order to raise awareness of
> >>> this effort, and get the conversation started on a few things,
> >>> I've summarized a little bit about cells and this effort below.
> >>>
> >>>
> >>> Goals:
> >>>
> >>> Testing of a single cell setup in the gate.
> >>> Feature parity.
> >>> Make cells the default implementation. Developers write code once
> >>> and it works for cells.
> >>>
> >>> Ultimately the goal is to improve maintainability of a large
> >>> feature within the Nova code base.
> >>>
> >> Thanks for the write-up Andrew! Some thoughts/questions below.
> >> Looking forward to the discussion on some of these topics, and
> >> would be happy to review the code once we get to that point.
> >>
> >>> Feature gaps:
> >>>
> >>> Host aggregates
> >>> Security groups
> >>> Server groups
> >>>
> >>>
> >>> Shortcomings:
> >>>
> >>> Flavor syncing
> >>> This needs to be addressed now.
> >>>
> >>> Cells scheduling/rescheduling
> >>> Instances can not currently move between cells
> >>> These two won't affect the default one cell setup so they will be
> >>> addressed later.
> >>>
> >>>
> >>> What does cells do:
> >>>
> >>> Schedule an instance to a cell based on flavor slots available.
> >>> Proxy API requests to the proper cell.
> >>> Keep a copy of instance data at the global level for quick
> >>> retrieval.
> >>> Sync data up from a child cell to keep the global level up to
> >>> date.
> >>>
> >>>
> >>> Simplifying assumptions:
> >>>
> >>> Cells will be treated as a two level tree structure.
> >>>
> >> Are we thinking of making this official by removing code that
> >> actually allows cells to be an actual tree of depth N? I am not
> >> sure if doing so would be a win, although it does complicate the
> >> RPC/Messaging/State code a bit, but if it's not being used, even
> >> though a nice generalization, why keep it around?
> >
> > My preference would be to remove that code since I don't envision
> > anyone writing tests to ensure that functionality works and/or
> > doesn't regress. But there's the challenge of not knowing if anyone
> > is actually relying on that behavior. So initially I'm not creating
> > a specific work item to remove it. But I think it needs to be made
> > clear that it's not officially supported and may get removed unless
> > a case is made for keeping it and work is put into testing it.
>
> While I agree that N is a bit interesting, I have seen N=3 in
> production:
>
> [central API]-->[state/region1]-->[state/region DC1]
>                                \->[state/region DC2]
>              -->[state/region2 DC]
>              -->[state/region3 DC]
>              -->[state/region4 DC]
>
> I'm curious: what are the use cases for this deployment? Presumably
> the root node runs n-api along with horizon, key management, etc.
> What components are deployed in tiers 2 and 3?
> And AFAIK an OpenStack cell deployment currently isn't even a tree
> but a DAG, since one cell can have multiple parents. Has anyone come
> up with any such requirement?
>
>
While there's nothing to prevent a cell from having multiple parents, I
would be curious to know whether this actually works in practice, since
I can imagine a number of cases that might cause problems. And is there
a practical use for this?
Maybe we should start logging a warning when this is set up, stating
that it is an unsupported (i.e. untested) configuration, to start
codifying the design as that of a tree. At least for the initial scope
of work I think this makes sense, and if a case is made for a DAG setup
that can be done independently.
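To make the intent concrete, here is a minimal sketch of such a
warning; the function name and the cell-to-parents mapping are made up
for illustration, not Nova's actual cells state code:

```python
import logging

logging.basicConfig()
LOG = logging.getLogger(__name__)


def warn_on_multiple_parents(cell_parents):
    """Warn for any cell configured with more than one parent.

    `cell_parents` maps a cell name to the list of its parent cell
    names; the data structure is illustrative only.
    """
    flagged = []
    for name, parents in cell_parents.items():
        if len(parents) > 1:
            LOG.warning("Cell %s has multiple parents %s; this is an "
                        "unsupported (i.e. untested) configuration.",
                        name, parents)
            flagged.append(name)
    return flagged
```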
>
> >>
> >>> Plan:
> >>>
> >>> Fix flavor breakage in child cell which causes boot tests to
> >>> fail. Currently the libvirt driver needs flavor.extra_specs which
> >>> is not synced to the child cell. Some options are to sync flavor
> >>> and extra specs to the child cell db, or pass full data with the
> >>> request. https://review.openstack.org/#/c/126620/1 offers a means
> >>> of passing full data with the request.
> >>>
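As a rough illustration of the pass-full-data option (the payload shape
and field names below are hypothetical, not the actual patch under
review):

```python
def build_request_with_flavor(instance_uuid, flavor):
    """Embed the full flavor, including extra_specs, in the request
    sent down to the child cell, so the child never needs the flavor
    row (or its extra specs) in its own database.
    """
    return {
        'instance_uuid': instance_uuid,
        'flavor': {
            'flavorid': flavor['flavorid'],
            'name': flavor['name'],
            'memory_mb': flavor['memory_mb'],
            'vcpus': flavor['vcpus'],
            # extra_specs travels with the request, so e.g. the libvirt
            # driver in the child cell can read it without a DB lookup.
            'extra_specs': dict(flavor.get('extra_specs', {})),
        },
    }
```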
> >>> Determine proper switches to turn off Tempest tests for features
> >>> that don't work with the goal of getting a voting job. Once this
> >>> is in place we can move towards feature parity and work on
> >>> internal refactorings.
> >>>
> >>> Work towards adding parity for host aggregates, security groups,
> >>> and server groups. They should be made to work in a single cell
> >>> setup, but the solution should not preclude them from being used
> >>> in multiple cells. There needs to be some discussion as to whether
> >>> a host aggregate or server group is a global concept or a per cell
> >>> concept.
> >>>
> >> Have there been any previous discussions on this topic? If so I'd
> >> really like to read up on those to make sure I understand the pros
> >> and cons before the summit session.
> >
> > The only discussion I'm aware of is some comments on
> > https://review.openstack.org/#/c/59101/ , though they mention a
> > discussion at the Utah mid-cycle.
> >
> > The main con I'm aware of for defining these as global concepts is
> > that there is no rescheduling capability in the cells scheduler. So
> > if a build is sent to a cell with a host aggregate that can't fit
> > that instance the build will fail even though there may be space in
> > that host aggregate from a global perspective. That should be
> > somewhat straightforward to address though.
> >
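A sketch of what addressing that could look like, assuming globally
defined aggregates: try each candidate cell in turn instead of failing
on the first. The data shapes and names below are invented for
illustration:

```python
def pick_cell_for_aggregate(cells, aggregate_hosts, needed_slots):
    """Return the first cell with enough aggregate capacity, else None.

    `cells` is a list of {'name': ..., 'hosts': [{'name', 'free_slots'}]}
    dicts; only hosts belonging to the (globally defined) aggregate
    count towards capacity.
    """
    for cell in cells:
        in_aggregate = [h for h in cell['hosts']
                        if h['name'] in aggregate_hosts]
        if sum(h['free_slots'] for h in in_aggregate) >= needed_slots:
            return cell['name']
    # No cell has aggregate capacity: the build genuinely cannot fit.
    return None
```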
> > I think it makes sense to define these as global concepts. But
> > these are features that aren't used with cells yet so I haven't put
> > a lot of thought into potential arguments or cases for doing this
> > one way or another.
> >
>
> Keeping aggregates local also poses a problem when cells are
> temporarily dead (out of the system), since the top level doesn't
> have any idea about local features, including whom to contact for
> deletion of a particular aggregate.
>
> >
> >>> Work towards merging compute/api.py and compute/cells_api.py so
> >>> that developers only need to make changes/additions in one place.
> >>> The goal is for as much as possible to be hidden by the RPC layer,
> >>> which will determine whether a call goes to a
> >>> compute/conductor/cell.
> >>>
> >>> For syncing data between cells, look at using objects to handle
> >>> the logic of writing data to the cell/parent and then syncing the
> >>> data to the other.
> >>>
> >> Some of that work has been done already, although in a somewhat
> >> ad-hoc fashion. Were you thinking of extending objects to support
> >> this natively (whatever that means), or do we continue to inline
> >> the code in the existing object methods?
> >
> > I would prefer to have some native support for this. In general
> > data is considered authoritative at the global level or the cell
> > level. For example, instance data is synced down from the global
> > level to a cell (except for a few fields which are synced up) but a
> > migration would be synced up. I could imagine decorators that would
> > specify how data should be synced and handle that as transparently
> > as possible.
> >
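To sketch the decorator idea (the names and the pending-sync queue here
are hypothetical; a real version would hand off to the cells RPC/sync
machinery after the local write):

```python
import functools


def synced(direction):
    """Mark an object save as propagating 'up' (cell -> global, e.g.
    migrations) or 'down' (global -> cell, e.g. most instance fields).
    """
    def decorator(save):
        @functools.wraps(save)
        def wrapper(obj, *args, **kwargs):
            result = save(obj, *args, **kwargs)
            # Record the pending sync only after the local write
            # succeeds; a real implementation would enqueue an RPC.
            obj.pending_syncs.append((type(obj).__name__, direction))
            return result
        return wrapper
    return decorator


class Instance:
    def __init__(self):
        self.pending_syncs = []

    @synced('down')
    def save(self):
        return True


class Migration:
    def __init__(self):
        self.pending_syncs = []

    @synced('up')
    def save(self):
        return True
```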
> >>
> >>> A potential migration scenario is to consider a non cells setup
> >>> to be a child cell; converting to cells will mean setting up a
> >>> parent cell and linking them. There are periodic tasks in place
> >>> to sync data up from a child already, but a manual kick off
> >>> mechanism will need to be added.
> >>>
> >>>
> >>> Future plans:
> >>>
> >>> Something that has been considered, but is out of scope for now,
> >>> is that the parent/api cell doesn't need the same data model as
> >>> the child cell. Since the majority of what it does is act as a
> >>> cache for API requests, it does not need all the data that a cell
> >>> needs, and what data it does need could be stored in a form that's
> >>> optimized for reads.
> >>>
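As a rough sketch of that read-optimized projection (the field list and
names below are illustrative only, not an actual schema):

```python
# The API cell stores a flat projection of instance data, not the
# full child-cell schema; hypothetical field list for illustration.
API_VIEW_FIELDS = ('uuid', 'name', 'status', 'flavor_name', 'cell_name')


def to_api_view(instance):
    """Project a full child-cell instance record down to the fields
    the API cell actually serves, in a read-optimized flat form.
    """
    return {field: instance.get(field) for field in API_VIEW_FIELDS}
```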
> >>>
> >>> Thoughts?
> >>>
> >>> _______________________________________________
> >>> OpenStack-dev mailing list
> >>> OpenStack-dev at lists.openstack.org
> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev