[openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

Michael Still mikal at stillhq.com
Tue Aug 26 20:20:11 UTC 2014


On Tue, Aug 26, 2014 at 7:59 PM, Tim Bell <Tim.Bell at cern.ch> wrote:
>
>
> > From: Michael Still [mailto:mikal at stillhq.com]
> > Sent: 25 August 2014 23:38
> > To: OpenStack Development Mailing List (not for usage questions)
> > Subject: Re: [openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds
>
> ...
>
> > Mark McClain and I discussed a possible plan for nova-network to neutron upgrades at the Ops Meetup today, and it seemed generally acceptable. It defines a "cold migration" as
> > freezing the ability to create or destroy instances during the upgrade, and then requiring a short network outage for each instance in the cell.
> > This is why I'm trying to understand the "no downtime" use case better. Is it literally no downtime, ever? Or is it a more simple "no simultaneous downtime for instances"?
> > Michael
>
> The simultaneous downtime across the cloud is the one we really need to avoid. Short network outages (depending on how you define short) can be handled along with blocking API operations for short periods.
>
> The other item was how to stage the upgrade. With a cloud of a significant size and some concerns about scalability, we would like to be able to do the migration as a set of steps rather than a big bang. During the gap between the steps, we'd like to open the APIs for usage, so that new VMs get created on Neutron hypervisors. Would that be a possibility?

Mark and I finally got a chance to sit down and write out a basic
proposal. It looks like this:

== neutron step 0 ==
configure neutron to reverse proxy calls to Nova (part to be written)

== nova-compute restart one ==
Freeze nova's network state (probably by stopping nova-api, but we
could be smarter than that if required)
Update all nova-compute nodes to point to Neutron, and replace the
nova-net agent with the Neutron Nova-aware L2 agent
Enable the Neutron Layer 2 agent on each node. This might have the
side effect of causing the network configuration to be rebuilt for
some instances
API can be unfrozen at this time until ready for step 2

== neutron restart two ==
Freeze nova's network state (probably by stopping nova-api, but we
could be smarter than that if required)
Dump/translate/restore data from Nova-Net to Neutron
Configure Neutron to point to its own database
Unfreeze Nova API

*** Stopping point for linuxbridge to linuxbridge translation, or
continue for rollout of new tech

== nova-compute restart two ==
Configure OVS or new technology, ensure that proper ML2 driver is installed
Restart Layer2 agent on each hypervisor where next gen networking
should be enabled
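
As a rough illustration only (not part of the proposal itself), the
phases above can be modelled in a few lines of Python to show where
the nova API would be frozen. The names Phase, PLAN and api_outages
are hypothetical, and the freezes_api flags are an assumption that
marks only the phases where the plan explicitly says nova's network
state is frozen.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phase:
    """One stage of the staged migration (hypothetical model)."""
    name: str
    freezes_api: bool  # True only where the plan explicitly freezes nova's network state
    steps: List[str] = field(default_factory=list)


# Phases in the order given in the proposal. Step wording is paraphrased.
PLAN = [
    Phase("neutron step 0", False, [
        "configure neutron to reverse proxy calls to Nova",
    ]),
    Phase("nova-compute restart one", True, [
        "point nova-compute nodes at Neutron",
        "enable the Neutron Layer 2 agent on each node",
    ]),
    Phase("neutron restart two", True, [
        "dump/translate/restore data from nova-network to Neutron",
        "configure Neutron to point to its own database",
    ]),
    Phase("nova-compute restart two", False, [
        "configure OVS or new technology",
        "restart the Layer 2 agent on selected hypervisors",
    ]),
]


def api_outages(plan: List[Phase]) -> List[str]:
    """Return the names of the phases during which the nova API is frozen."""
    return [phase.name for phase in plan if phase.freezes_api]


if __name__ == "__main__":
    for phase in PLAN:
        state = "API frozen" if phase.freezes_api else "API open"
        print(f"{phase.name}: {state}")
        for step in phase.steps:
            print(f"  - {step}")
    print("API outages:", len(api_outages(PLAN)))
```

Under these assumptions the model gives two separate API freezes with
the API open in between, and the final phase can be skipped entirely
for a linuxbridge to linuxbridge translation.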


So, I want to stop using the word "cold" to describe this. It's more of
a rolling upgrade than a cold migration. So... Would two shorter nova
API outages be acceptable?

Michael

-- 
Rackspace Australia


