[openstack-dev] [Nova] Cells conversation starter
Tom Fifield
tom at openstack.org
Wed Oct 22 04:24:02 UTC 2014
On 22/10/14 03:07, Andrew Laski wrote:
>
> On 10/21/2014 04:31 AM, Nikola Đipanov wrote:
>> On 10/20/2014 08:00 PM, Andrew Laski wrote:
>>> One of the big goals for the Kilo cycle by users and developers of the
>>> cells functionality within Nova is to get it to a point where it can be
>>> considered a first class citizen of Nova. Ultimately I think this comes
>>> down to getting it tested by default in Nova jobs, and making it easy
>>> for developers to work with. But there's a lot of work to get there.
>>> In order to raise awareness of this effort, and get the conversation
>>> started on a few things, I've summarized a little bit about cells and
>>> this effort below.
>>>
>>>
>>> Goals:
>>>
>>> Testing of a single cell setup in the gate.
>>> Feature parity.
>>> Make cells the default implementation. Developers write code once and
>>> it works for cells.
>>>
>>> Ultimately the goal is to improve maintainability of a large feature
>>> within the Nova code base.
>>>
>> Thanks for the write-up Andrew! Some thoughts/questions below. Looking
>> forward to the discussion on some of these topics, and would be happy to
>> review the code once we get to that point.
>>
>>> Feature gaps:
>>>
>>> Host aggregates
>>> Security groups
>>> Server groups
>>>
>>>
>>> Shortcomings:
>>>
>>> Flavor syncing
>>> This needs to be addressed now.
>>>
>>> Cells scheduling/rescheduling
>>> Instances cannot currently move between cells
>>> These two won't affect the default one cell setup so they will be
>>> addressed later.
>>>
>>>
>>> What does cells do:
>>>
>>> Schedule an instance to a cell based on flavor slots available.
>>> Proxy API requests to the proper cell.
>>> Keep a copy of instance data at the global level for quick retrieval.
>>> Sync data up from a child cell to keep the global level up to date.
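
To make the first bullet concrete, here is a minimal Python sketch of picking a cell by free flavor slots. It is an illustration only: `Cell`, `free_slots` and `pick_cell` are hypothetical names, not Nova's cells scheduler, and the real scheduler is more involved than a single max().

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical stand-in for the capacity a child cell reports upward."""
    name: str
    # Maps flavor name -> number of instances of that flavor the cell can still fit.
    free_slots: dict = field(default_factory=dict)


def pick_cell(cells, flavor):
    """Return the cell with the most free slots for `flavor`, or None if none fit."""
    candidates = [c for c in cells if c.free_slots.get(flavor, 0) > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.free_slots[flavor])


cells = [
    Cell("cell1", {"m1.small": 4, "m1.large": 0}),
    Cell("cell2", {"m1.small": 10, "m1.large": 2}),
]
chosen = pick_cell(cells, "m1.large")
```

With the sample data, only cell2 has room for an m1.large, so it is the one selected; a flavor no cell advertises yields None.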
>>>
>>>
>>> Simplifying assumptions:
>>>
>>> Cells will be treated as a two level tree structure.
>>>
>> Are we thinking of making this official by removing code that actually
>> allows cells to be an actual tree of depth N? I am not sure if doing so
>> would be a win, although it does complicate the RPC/Messaging/State code
>> a bit, but if it's not being used, even though a nice generalization,
>> why keep it around?
>
> My preference would be to remove that code since I don't envision anyone
> writing tests to ensure that functionality works and/or doesn't
> regress. But there's the challenge of not knowing if anyone is actually
> relying on that behavior. So initially I'm not creating a specific work
> item to remove it. But I think it needs to be made clear that it's not
> officially supported and may get removed unless a case is made for
> keeping it and work is put into testing it.
While I agree that N is a bit interesting, I have seen N=3 in production:
[central API]-->[state/region1]-->[state/region DC1]
                               \->[state/region DC2]
             -->[state/region2 DC]
             -->[state/region3 DC]
             -->[state/region4 DC]
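
For anyone who wants to check the depth of a layout like this, a small Python sketch follows. The `topology` dict is my own encoding of the diagram, assuming regions 2 to 4 hang directly off the central API; it is not a Nova data structure.

```python
# Hypothetical encoding of the deployment: parent cell -> list of child cells.
topology = {
    "central API": [
        "state/region1",
        "state/region2 DC",
        "state/region3 DC",
        "state/region4 DC",
    ],
    "state/region1": ["state/region DC1", "state/region DC2"],
}


def depth(cell, children=topology):
    """Number of levels in the subtree rooted at `cell` (a leaf counts as 1)."""
    kids = children.get(cell, [])
    return 1 + max((depth(k, children) for k in kids), default=0)


levels = depth("central API")
```

Under that encoding the tree is three levels deep, which is the N=3 case that a strict two-level assumption would not cover.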
>>
>>> Plan:
>>>
>>> Fix flavor breakage in child cell which causes boot tests to fail.
>>> Currently the libvirt driver needs flavor.extra_specs which is not
>>> synced to the child cell. Some options are to sync flavor and extra
>>> specs to child cell db, or pass full data with the request.
>>> https://review.openstack.org/#/c/126620/1 offers a means of passing full
>>> data with the request.
>>>
>>> Determine proper switches to turn off Tempest tests for features that
>>> don't work with the goal of getting a voting job. Once this is in place
>>> we can move towards feature parity and work on internal refactorings.
>>>
>>> Work towards adding parity for host aggregates, security groups, and
>>> server groups. They should be made to work in a single cell setup, but
>>> the solution should not preclude them from being used in multiple
>>> cells. There needs to be some discussion as to whether a host aggregate
>>> or server group is a global concept or per cell concept.
>>>
>> Have there been any previous discussions on this topic? If so I'd really
>> like to read up on those to make sure I understand the pros and cons
>> before the summit session.
>
> The only discussion I'm aware of is some comments on
> https://review.openstack.org/#/c/59101/ , though they mention a
> discussion at the Utah mid-cycle.
>
> The main con I'm aware of for defining these as global concepts is that
> there is no rescheduling capability in the cells scheduler. So if a
> build is sent to a cell with a host aggregate that can't fit that
> instance the build will fail even though there may be space in that host
> aggregate from a global perspective. That should be somewhat
> straightforward to address though.
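
A rough sketch of what rescheduling across cells could look like, assuming a global aggregate that spans several cells. Everything here (`NoCapacity`, `build_with_retry`, `fake_build`) is hypothetical and only illustrates the retry idea, not how Nova would implement it.

```python
class NoCapacity(Exception):
    """Raised by a (hypothetical) cell that cannot place the instance."""


def build_with_retry(cells, build):
    """Try each candidate cell in turn.

    Returns the name of the first cell that accepts the build and the list
    of cells that refused it. Raises NoCapacity if every cell refuses.
    """
    tried = []
    for name, free_hosts_in_aggregate in cells:
        try:
            build(name, free_hosts_in_aggregate)
            return name, tried
        except NoCapacity:
            tried.append(name)  # remember the failure and move on
    raise NoCapacity("no cell could fit the instance; tried %s" % tried)


def fake_build(name, free_hosts):
    """Stand-in for a build request: fails when the aggregate has no room."""
    if free_hosts == 0:
        raise NoCapacity(name)


# The aggregate spans both cells; only cell2 has room in it.
chosen, failed = build_with_retry([("cell1", 0), ("cell2", 3)], fake_build)
```

Without the loop, the request sent to cell1 would simply fail even though cell2 has space in the same aggregate, which is the con described above.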
>
> I think it makes sense to define these as global concepts. But these
> are features that aren't used with cells yet so I haven't put a lot of
> thought into potential arguments or cases for doing this one way or
> another.
>
>
>>> Work towards merging compute/api.py and compute/cells_api.py so that
>>> developers only need to make changes/additions in one place. The goal
>>> is for as much as possible to be hidden by the RPC layer, which will
>>> determine whether a call goes to a compute/conductor/cell.
>>>
>>> For syncing data between cells, look at using objects to handle the
>>> logic of writing data to the cell/parent and then syncing the data to
>>> the other.
>>>
>> Some of that work has been done already, although in a somewhat ad-hoc
>> fashion. Were you thinking of extending objects to support this natively
>> (whatever that means), or do we continue to inline the code in the
>> existing object methods?
>
> I would prefer to have some native support for this. In general data is
> considered authoritative at the global level or the cell level. For
> example, instance data is synced down from the global level to a
> cell (except for a few fields which are synced up) but a migration would
> be synced up. I could imagine decorators that would specify how data
> should be synced and handle that as transparently as possible.
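
One way to picture the decorator idea, as a hedged Python sketch. The `synced` decorator, the `SYNC_UP`/`SYNC_DOWN` constants and the `sent` list (a stand-in for the cells RPC layer) are all made up for illustration; nothing like this exists in Nova's objects today as far as this thread states.

```python
import functools

# Hypothetical sync directions; these names are illustrative, not Nova's.
SYNC_UP = "up"      # cell is authoritative, parent holds a copy
SYNC_DOWN = "down"  # global level is authoritative, cell holds a copy

sent = []  # stands in for messages handed to the cells RPC layer


def synced(direction):
    """Mark a save-style method so its data is forwarded in `direction`."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            # Forward only after the local write has returned.
            sent.append((direction, type(self).__name__, dict(self.fields)))
            return result
        return wrapper
    return decorator


class Instance:
    def __init__(self, **fields):
        self.fields = fields

    @synced(SYNC_DOWN)
    def save(self):
        return True  # pretend to write to the local database


class Migration:
    def __init__(self, **fields):
        self.fields = fields

    @synced(SYNC_UP)
    def save(self):
        return True  # pretend to write to the local database


Instance(uuid="abc", vm_state="active").save()
Migration(id=1, status="finished").save()
```

The point of the design is that the object declares which level is authoritative once, and callers of save() never need to know whether they are running in a cell or at the global level.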
>
>>
>>> A potential migration scenario is to consider a non cells setup to be a
>>> child cell and converting to cells will mean setting up a parent cell
>>> and linking them. There are periodic tasks in place to sync data up
>>> from a child already, but a manual kick off mechanism will need to be
>>> added.
>>>
>>>
>>> Future plans:
>>>
>>> Something that has been considered, but is out of scope for now, is that
>>> the parent/api cell doesn't need the same data model as the child cell.
>>> Since the majority of what it does is act as a cache for API requests,
>>> it does not need all the data that a cell needs and what data it does
>>> need could be stored in a form that's optimized for reads.
>>>
>>>
>>> Thoughts?
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev