[nova][kolla] questions on cells

Bogdan Dobrelya bdobreli at redhat.com
Thu Oct 3 07:35:16 UTC 2019

On 01.10.2019 12:00, Mark Goddard wrote:
> Thanks all for your responses. Replies to Dan inline.
> On Mon, 30 Sep 2019 at 18:27, Dan Smith <dms at danplanet.com> wrote:
>>> 1. Is there any benefit to not having a superconductor? Presumably
>>> it's a little more efficient in the single cell case? Also IIUC it
>>> only requires a single message queue so is a little simpler?
>> In a multi-cell case you need it, but you're asking about the case where
>> there's only one (real) cell yeah?
>> If the deployment is really small, then the overhead of having one is
>> probably measurable and undesirable. I dunno what to tell you about
>> where that cut-off is, unfortunately. However, once you're over a
>> certain number of nodes, that probably shakes out a bit. The
>> superconductor does things that the cell-specific ones won't have to do,
>> so there's about the same amount of total load, just a potentially
>> larger memory footprint for running extra services, which would be
>> measurable at small scales. For a tiny deployment there's also overhead
>> just in the complexity, but one of the goals of v2 has always been to
>> get everyone on the same architecture, so having a "small mode" and a
>> "large mode" brings with it its own complexity.
> Thanks for the explanation. We've built in a switch for single or
> super mode, and single mode keeps us compatible with existing
> deployments, so I guess we'll keep the switch.
>>> 2. Do console proxies need to live in the cells? This is what devstack
>>> does in superconductor mode. I did some digging through nova code, and
>>> it looks that way. Testing with novncproxy agrees. This suggests we
>>> need to expose a unique proxy endpoint for each cell, and configure
>>> all computes to use the right one via e.g. novncproxy_base_url,
>>> correct?
>> I'll punt this to Melanie, as she's the console expert at this point,
>> but I imagine you're right.
>>> 3. Should I upgrade the superconductor or conductor service first?
>> Superconductor first, although they all kinda have to go around the same
>> time. Superconductor, like the regular conductors, needs to look at the
>> cell database directly, so if you were to upgrade superconductor before
>> the cell database you'd likely have issues. I think probably the ideal
>> would be to upgrade the db schema everywhere (which you can do without
>> rolling code), then upgrade the top-level services (conductor,
>> scheduler, api) and then you could probably get away with doing
>> conductor in the cell along with computes, or whatever. If possible
>> rolling the cell conductors with the top-level services would be ideal.
> I should have included my strawman deploy and upgrade flow for
> context, but I'm still honing it. All DB schema changes will be done
> up front in both cases.
> In terms of ordering, the API-level services (superconductor, API,
> scheduler) are grouped together and will be rolled first - agreeing
> with what you've said. I think between Ansible's tags and limiting
> actions to specific hosts, the code can be written to support
> upgrading all cell conductors together, or at the same time as (well,
> immediately before) the cell's computes.
> The thinking behind upgrading one cell at a time is to limit the blast
> radius if something goes wrong. You suggest it would be better to roll
> all cell conductors at the same time though - do you think it's safer
> to run with the version disparity between conductor and computes
> rather than super- and cell- conductors?

I'd say upgrading one cell at a time may be an important consideration
for edge (DCN) multi-cell deployments, where it may be technically
impossible to roll the upgrade out across all of the remote sites at once.
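
To make the ordering discussed above concrete, here is a rough Python
sketch of a cell-at-a-time plan: schema changes everywhere first, then
the API-level services, then each cell's conductor immediately before
that cell's computes. The function name, step labels and cell/host names
are made up for illustration; this is not kolla-ansible or nova code.

```python
from typing import Dict, List


def upgrade_plan(cells: Dict[str, List[str]]) -> List[str]:
    """Return an ordered list of upgrade steps, one cell at a time.

    `cells` maps a hypothetical cell name to its compute hosts.
    """
    # DB schema changes are done up front, for the API DB and every cell DB.
    steps = ["db-schema: api"]
    steps += [f"db-schema: {cell}" for cell in cells]
    # API-level services are grouped together and rolled first.
    steps += ["superconductor", "api", "scheduler"]
    # Each cell conductor is rolled immediately before that cell's computes,
    # which limits the blast radius to a single cell.
    for cell, computes in cells.items():
        steps.append(f"conductor: {cell}")
        steps += [f"compute: {host}" for host in computes]
    return steps
```

Rolling all cell conductors together instead would just mean moving the
conductor steps ahead of the per-cell loop.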

>>> 4. Does the cell conductor need access to the API DB?
>> Technically it should not be allowed to talk to the API DB for
>> "separation of concerns" reasons. However, there are a couple of
>> features that still rely on the cell conductor being able to upcall to
>> the API database, such as the late affinity check. If you can only
>> choose one, then I'd say configure the cell conductors to talk to the
>> API DB, but if there's a knob for "isolate them" it'd be better.
> Knobs are easy to make, and difficult to keep working in all positions
> :) It seems worthwhile in this case.
>>> 5. What DB configuration should be used in nova.conf when running
>>> online data migrations? I can see some migrations that seem to need
>>> the API DB, and others that need a cell DB. If I just give it the API
>>> DB, will it use the cell mappings to get to each cell DB, or do I need
>>> to run it once for each cell?
>> The API DB has its own set of migrations, so you obviously need API DB
>> connection info to make that happen. There is no fanout to all the rest
>> of the cells (currently), so you need to run it with a conf file
>> pointing to the cell, for each cell you have. The latest attempt
>> at making this fan out was abandoned in July with no explanation, so it
>> dropped off my radar at least.
> That makes sense. The rolling upgrade docs could be a little clearer
> for multi-cell deployments here.
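
If I read Dan's answer right, that means one nova-manage invocation per
config file, since nothing fans out to the cells. A minimal sketch that
only builds the command lines (it does not run them); the config file
paths are hypothetical, and whether the API DB needs its own run depends
on how the conf files are laid out:

```python
from typing import Iterable, List


def migration_commands(conf_files: Iterable[str]) -> List[List[str]]:
    """Build one `nova-manage db online_data_migrations` call per conf file.

    Each conf file is assumed to carry the DB connection info for one cell
    (hypothetical layout, e.g. /etc/nova/nova-cell1.conf).
    """
    return [
        ["nova-manage", "--config-file", conf, "db", "online_data_migrations"]
        for conf in conf_files
    ]
```

Each resulting list could be handed to subprocess.run() or turned into an
Ansible task, one per cell.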
>>> 6. After an upgrade, when can we restart services to unpin the compute
>>> RPC version? Looking at the compute RPC API, it looks like the super
>>> conductor will remain pinned until all computes have been upgraded.
>>> For a cell conductor, it looks like I could restart it to unpin after
>>> upgrading all computes in that cell, correct?
>> Yeah.
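
The rule agreed on here can be written down as a small check: a cell
conductor may be restarted to unpin once every compute in its cell is
upgraded, while the superconductor has to wait for every compute in every
cell. This is only a sketch of that rule with made-up names and version
labels, not how nova itself determines the pin:

```python
from typing import Dict, List


def cells_safe_to_unpin(versions: Dict[str, List[str]], target: str) -> List[str]:
    """Return cells whose computes all run `target`.

    `versions` maps a hypothetical cell name to the release each of its
    computes is running.
    """
    return [
        cell
        for cell, compute_versions in versions.items()
        if all(v == target for v in compute_versions)
    ]


def superconductor_safe_to_unpin(versions: Dict[str, List[str]], target: str) -> bool:
    """True only when every compute in every cell runs `target`."""
    return all(v == target for vs in versions.values() for v in vs)
```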
>>> 7. Which services require policy.{yml,json}? I can see policy
>>> referenced in API, conductor and compute.
>> That's a good question. I would have thought it was just API, so maybe
>> someone else can chime in here, although it's not specific to cells.
> Yeah, unrelated to cells, just something I wondered while digging
> through our nova Ansible role.
> Here is the line that made me think policies are required in
> conductors: https://opendev.org/openstack/nova/src/commit/6d5fdb4ef4dc3e5f40298e751d966ca54b2ae902/nova/compute/api.py#L666.
> I guess this is only required for cell conductors though?
>> --Dan

Best regards,
Bogdan Dobrelya,
IRC: #bogdando
