[openstack-dev] [ironic] The scenario for a rolling upgrade of Ironic
jim at jimrollenhagen.com
Wed Oct 14 22:57:57 UTC 2015
On Wed, Oct 14, 2015 at 08:44:08AM +0000, Tan, Lin wrote:
> Hi guys,
> I am looking at https://bugs.launchpad.net/ironic/+bug/1502903, which
> is related to rolling upgrades, together with Jim's patch
> https://review.openstack.org/#/c/234450
> I have a concern, or rather a question, about how Ironic should do
> rolling upgrades. I may be mistaken, but I would like to discuss it
> here and get some feedback.
> I have manually done a rolling upgrade of a private OpenStack cloud
> before. An upgrade has three main tasks:
> 1. upgrade the service code
> 2. change the configuration
> 3. upgrade the DB schema, which is the most difficult and
>    time-consuming part
> The current rolling upgrade (or live upgrade) solution depends heavily
> on upgrading the services in place one by one, while keeping the new
> service A able to communicate with the old service B. Ideally, after we
> upgrade one service, the others keep working without breaking. This
> can be done with versioned objects and RPC versioning. For example, the
> new nova-api and new nova-conductor can talk to the old nova-compute.
> For the Nova services, the suggested steps are:
> 1. expand the DB schema
> 2. pin the RPC and object versions at the current release
> 3. upgrade all nova-conductor servers, because they talk to the DB
> 4. upgrade all Nova services on the controller nodes, like nova-api
> 5. upgrade all nova-compute nodes
> 6. unpin the RPC versions
> 7. shrink the DB schema
> This works well for Nova, because it has many nova-compute nodes and
> only a few nova-conductor and nova-api nodes. The nova-compute
> services, which are time-consuming to upgrade, do not all need to be
> upgraded at once.
> For Ironic, we only have ir-conductor and ir-api. So the question is:
> should we upgrade ir-conductor or ir-api first? In my opinion, the
> ideal case is that old and new ir-conductors can coexist, which means
> we should upgrade ir-api to the latest version first. But that is
> impossible at the moment, because ir-conductor talks to the DB
> directly and we only have one DB schema. That is a large difference
> between Ironic and Nova: we are missing a layer like nova-conductor.
> The second option is to upgrade the ir-conductors first. That means
> that if we upgrade the DB schema, we have to upgrade all ir-conductors
> at once, and during the upgrade we cannot provide the Ironic service
> at all.
> So I would suggest stopping all Ironic services, upgrading ir-api
> first, and then upgrading the ir-conductors one by one, enabling each
> ir-conductor only once its upgrade is done. Or we could upgrade ir-api
> and all ir-conductors at once, although that sounds a little stupid.
Hey Tan, thanks for bringing this up.
I've been thinking about this stuff a lot lately, and I'd like us to get
it working during the Mitaka cycle, so deployers can do a rolling
upgrade from Liberty to Mitaka.
Conductors will always need to talk to the database. APIs may not need
to talk to the database. I think we can just roll conductor
upgrades through, and then update ironic-api after that. This should
just work, as long as we're very careful about schema changes (this is
where the expand/contract thing comes into play). Different versions of
conductors are only a problem if the database schema is not compatible
with one of the versions.
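To make the expand/contract point concrete, here is a toy Python model (hypothetical, not Ironic's migration code). Each schema stage is a set of columns, each flagged when it is NOT NULL without a default, and each conductor version is the set of columns it reads and writes. The table layout and the column names `old_field` and `new_field` are illustrative assumptions.

```python
# Hypothetical expand/contract model. Each schema stage maps column name ->
# True if the column is NOT NULL with no default (every INSERT must set it).
SCHEMAS = {
    "before": {"id": True, "old_field": False},
    "expanded": {"id": True, "old_field": False, "new_field": False},  # additive only
    "contracted": {"id": True, "new_field": False},  # old column dropped
}

# Columns each (hypothetical) conductor version reads and writes.
CODE = {
    "old_conductor": {"id", "old_field"},
    "new_conductor": {"id", "new_field"},
}


def compatible(code_columns, schema):
    """Code works if every column it uses exists and it fills every required one."""
    present = set(schema)
    required = {col for col, not_null in schema.items() if not_null}
    return code_columns <= present and required <= code_columns


def can_coexist(stage):
    """Old and new conductors can share one DB only if both fit its schema."""
    return all(compatible(cols, SCHEMAS[stage]) for cols in CODE.values())


for stage in SCHEMAS:
    print(stage, can_coexist(stage))
# before False
# expanded True
# contracted False
```

Under this model the expanded stage is the only one where old and new conductors can share the database. That is why the expand step should be purely additive with nullable or defaulted columns: if `new_field` were added as NOT NULL without a default, old conductors could no longer insert rows.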
We also need to remote the objects layer from the API service to the
conductor, so that the API service no longer talks to the DB, and we
need to allow RPC version pinning.
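One way to picture pinning together with a remoted objects layer is object backporting: when a newer service sends an object to an older one, it serializes the object at the pinned version and drops fields the older side does not know about. The following is a minimal hypothetical sketch; the `Node` class, its fields, and its version numbers are assumptions, not Ironic's real object. oslo.versionedobjects provides hooks for this, such as `obj_to_primitive(target_version=...)` and `obj_make_compatible()`.

```python
# Hypothetical sketch of versioned-object backporting under a version pin.
# Field names and version numbers are illustrative only.


def parse(version):
    """Turn "1.2" into (1, 2) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


class Node:
    VERSION = "1.2"
    # The object version in which each field first appeared.
    FIELD_ADDED_IN = {"uuid": "1.0", "driver": "1.0", "new_field": "1.2"}

    def __init__(self, **fields):
        self.fields = fields

    def to_primitive(self, target_version=None):
        """Serialize, dropping fields the target version does not know about."""
        target = parse(target_version or self.VERSION)
        if target > parse(self.VERSION):
            raise ValueError("cannot serialize to a newer version than our own")
        data = {
            name: value
            for name, value in self.fields.items()
            if parse(self.FIELD_ADDED_IN[name]) <= target
        }
        return {"version": "%d.%d" % target, "data": data}


node = Node(uuid="abc", driver="fake", new_field=42)
print(node.to_primitive())                      # version 1.2, all three fields
print(node.to_primitive(target_version="1.1"))  # version 1.1, new_field dropped
```

With an operator-set pin of "1.1", a new service would pass that value as `target_version` until every service is upgraded, then drop the pin.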
Beyond that, I think the Nova model should work fine for us. There's
some work to do in our objects layer, and then lots of documentation for
developers, reviewers, and deployers. I think it's totally reasonable to
complete this during Mitaka, though.
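Because the Nova model depends on pinning before the service upgrades and unpinning only after all of them, a toy check of that ordering may help reviewers. This is a hypothetical Python illustration, not Nova or Ironic code: the version numbers, service names, and helper functions are assumptions, and the schema expand/contract steps are not modeled.

```python
# Hypothetical model of the pin -> upgrade -> unpin ordering.
# Versions are (major, minor) tuples; "old" code speaks RPC up to 1.0 and
# "new" code up to 1.1. These numbers are illustrative assumptions.
OLD, NEW = (1, 0), (1, 1)


def sent_version(service_max, pin):
    """A sender emits its newest RPC version, capped by the pin if one is set."""
    return min(service_max, pin) if pin is not None else service_max


def cluster_ok(services, pin):
    """Every service must be able to decode what every other service sends."""
    for sender in services.values():
        for receiver in services.values():
            if sent_version(sender, pin) > receiver:
                return False
    return True


def run_upgrade(pin_first=True):
    # One service of each kind is enough to show the ordering problem.
    services = {"conductor": OLD, "api": OLD, "compute": OLD}
    pin = OLD if pin_first else None  # step 2: pin at the current release
    checks = []
    for name in ("conductor", "api", "compute"):  # steps 3-5
        services[name] = NEW
        checks.append(cluster_ok(services, pin))
    pin = None  # step 6: unpin once every service runs the new code
    checks.append(cluster_ok(services, pin))
    return checks


print(run_upgrade(pin_first=True))   # [True, True, True, True]
print(run_upgrade(pin_first=False))  # [False, False, True, True]
```

Without the pin, the first upgraded service starts sending 1.1 messages that the remaining old services cannot decode; with it, every intermediate state stays compatible.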
I opened this blueprint yesterday to track this work. I'd like to get
the developer/reviewer docs done first, so we don't accidentally land
any changes that break assumptions here (for example, the bug you linked
before). Is this something you're willing to take the lead on?