[openstack-dev] [nova][neutron] Migration from nova-network to Neutron for large production clouds

Joe Harrison joehazzers at gmail.com
Fri Aug 29 14:12:42 UTC 2014





On 27/08/14 12:59, Tim Bell wrote:
>> -----Original Message-----
>> From: Michael Still [mailto:mikal at stillhq.com]
>> Sent: 26 August 2014 22:20
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova][neutron] Migration from
>> nova-network to Neutron for large production clouds
> ...
>> 
>> Mark and I finally got a chance to sit down and write out a basic
>> proposal. It looks like this:
>> 
> 
> Thanks... I've put a few questions inline and I'll ask the experts
> to review the steps when they're back from holidays
> 
>> == neutron step 0 ==
>> Configure Neutron to reverse proxy calls to Nova (part to be written)
>> 
>> == nova-compute restart one ==
>> Freeze nova's network state (probably by stopping nova-api, but we
>> could be smarter than that if required)
>> Update all nova-compute nodes to point at Neutron, replacing the
>> nova-net agent with the Neutron nova-aware L2 agent
>> Enable the Neutron Layer 2 agent on each node; this might have the
>> side effect of causing the network configuration to be rebuilt for
>> some instances
>> The API can be unfrozen at this time until ready for step 2
>> 
> 
> - Would it be possible to only update some of the compute nodes ?
> We'd like to stage the upgrade if we can in view of scaling risks.
> Worst case, we'd look to do it cell by cell but those are quite
> large already (200+ hypervisors)

I have a few what-ifs when it comes to this:

- What if the migration fails halfway through? How do we administer
nova in this situation?

Unfortunately, Tim, the last time I checked, Neutron had no awareness
of Nova's cells (and only "recently" became aware of Nova regions), so
I don't see how cells would be taken into account in a migration.

> 
>> == neutron restart two ==
>> Freeze nova's network state (probably by stopping nova-api, but we
>> could be smarter than that if required)
>> Dump/translate/restore data from Nova-Net to Neutron
>> Configure Neutron to point to its own database
>> Unfreeze Nova API
>> 

I think it's a good idea to be smarter.
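To make the "dump/translate/restore" step above concrete, here is a minimal sketch of what translating one nova-network record into the Neutron model might look like. The field names (`label`, `cidr`, `gateway`, `project_id`) are illustrative stand-ins, not the real database schema.

```python
# Hypothetical sketch: map one nova-network row onto Neutron's
# network + subnet resources. Field names are illustrative only.

def translate_network(nova_net):
    """Translate a nova-network record into Neutron network/subnet dicts."""
    network = {
        "name": nova_net["label"],
        "admin_state_up": True,
        # nova-network networks without a project are shared.
        "shared": nova_net.get("project_id") is None,
    }
    subnet = {
        "cidr": nova_net["cidr"],
        "gateway_ip": nova_net.get("gateway"),
        "enable_dhcp": nova_net.get("enable_dhcp", True),
        "ip_version": 4,
    }
    return network, subnet

net, sub = translate_network(
    {"label": "private", "cidr": "10.0.0.0/24", "gateway": "10.0.0.1"}
)
```

The real step would of course have to cover fixed/floating IPs, VLANs and per-project allocations as well, which is where staging per cell gets hard.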

> 
> - Linked with the point above, we'd like to do the nova-net to
> neutron in stages if we can

Again, this sounds like a nightmare if it fails. It is meant to behave
like one big transaction, but it is anything but.

For this to be done safely in a production cloud (which is one of the
few reasons to actually do a replacement instead of just swapping out
the component), we need to be able to run Neutron and Nova-net at the
same time or it *does* have to become a transactional migration.

If the migration fails at some stage, you're left in limbo. Does Nova
work? Does Neutron work?

There needs to be some sort of fault tolerance or rollback feature if
you're going down the "all or nothing" route, to stop a cloud being
left in an inconsistent (and impossible to administer or operate via
APIs) state.

If the two of them (Nova-network and Neutron) could coexist and
operate at the same time in a cloud, it wouldn't have to be a one-shot
migration. If some nodes fail, that's fine: you could just let them
fall back to Nova-net and fix them whilst your cloud still works and,
more importantly, nova-api is up and running.
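The per-node fallback I have in mind is roughly this: attempt each compute node independently and let failures revert to nova-net rather than aborting the whole run. `switch_to_neutron` and `revert_to_nova_net` are hypothetical hooks, not real OpenStack calls.

```python
# Sketch of a staged per-node migration with fallback to nova-network.
# The switch/revert callables are hypothetical operator-supplied hooks.

def migrate_nodes(nodes, switch_to_neutron, revert_to_nova_net):
    """Attempt each node; return (migrated, fell_back) lists."""
    migrated, fell_back = [], []
    for node in nodes:
        try:
            switch_to_neutron(node)
            migrated.append(node)
        except Exception:
            # This node stays on nova-network; operators fix it later
            # while the rest of the cloud keeps running.
            revert_to_nova_net(node)
            fell_back.append(node)
    return migrated, fell_back
```

This only works if Neutron and nova-net can genuinely serve different hypervisors in the same cloud at the same time, which is exactly the capability I'm asking about.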

> 
>> *** Stopping point for linuxbridge to linuxbridge translation, or
>> continue for rollout of new tech
>> 
>> == nova-compute restart two ==
>> Configure OVS or the new technology, ensuring that the proper ML2
>> driver is installed
>> Restart the Layer 2 agent on each hypervisor where next-gen
>> networking should be enabled
>> 
>> 
>> So, I want to stop using the word "cold" to describe this. It's
>> more of a rolling upgrade than a cold migration. So... would two
>> shorter nova API outages be acceptable?
>> 
> 
> Two Nova API outages would be OK for us.

I think the Nova API outages are the least of our concerns compared to
being left in a "halfway" state in a production environment. Hopefully
these concerns can be addressed.

> 
>> Michael
>> 
>> -- Rackspace Australia

Whilst I wholeheartedly agree that this migration plan seems like a
good idea (and reminds me of a Raiders of the Lost Ark-esque scene),
I'm afraid of what would happen if something went wrong in the middle
of this swap.

It wouldn't be a good idea to simply restart nova-api to fix a failed
migration, as users and services would be able to use it again while
the cloud is still inconsistent.

Perhaps we should change the policy on nova-api during this migration
to only allow access to a special "migration" role or the like? This
would prevent services and users from accessing Nova's API while the
migration policy is applied, but allow administrators to continue
monitoring via the API and fix any problems. This seems like a
must-have that is currently absent.
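As a rough illustration, such a temporary lockdown might look something like this in nova's policy.json; the "migration" role name is hypothetical, and a real deployment would need to consider every API action, not just the default rule:

```json
{
    "default": "role:admin or role:migration"
}
```

Restoring the original policy.json after the migration completes would return the API to normal operation.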

I like the idea of the migration, but I hope that any and all "what
if?" questions have been addressed and the problems are mitigated.

I wish you and Mark lots of luck with this migration, but please make
sure it's not fragile and ensure it's fault tolerant!

Cheers,
Joe


