[openstack-dev] [nova][neutron] massive overhead processing "network-changed" events during live migration
Chris Friesen
chris.friesen at windriver.com
Fri May 19 18:40:25 UTC 2017
Recently we noticed failures in Newton when we attempted to live-migrate an
instance with 16 vifs. We tracked them down to an RPC timeout in nova caused
by waiting too long for the 'refresh_cache-%s' lock in get_instance_nw_info().
This led to a few other discoveries.
First, we have no fair locking in OpenStack. The live migration code path was
blocked waiting for the lock, but the code processing the incoming
"network-changed" events kept acquiring the lock first, even though those
events arrived after the live migration code had already started waiting.
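For clarity, by "fair" I mean first-come-first-served: the waiter that has
been blocked the longest gets the lock next. A minimal sketch of that
behaviour, purely illustrative and not oslo.concurrency or nova code, could
look like this:

import collections
import threading

class FairLock:
    """Hand the lock to waiters in strict arrival (FIFO) order."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._waiters = collections.deque()

    def acquire(self):
        ticket = threading.Event()
        with self._mutex:
            self._waiters.append(ticket)
            if len(self._waiters) == 1:
                ticket.set()            # nobody ahead of us, take the lock now
        ticket.wait()                   # otherwise block until we reach the head

    def release(self):
        with self._mutex:
            self._waiters.popleft()     # drop our own ticket from the head
            if self._waiters:
                self._waiters[0].set()  # wake the next waiter in arrival order

With the non-fair locks we have today, the "network-changed" handlers can keep
jumping the queue ahead of the already-blocked live-migration thread.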
Second, it turns out the cost of processing the "network-changed" events is
astronomical.
1) In Newton nova commit 5de902a was merged to fix evacuate bugs, but it meant
both source and dest compute nodes got the "network-changed" events. This
doubled the number of neutron API calls during a live migration.
2) Neutron sends a "network-changed" event each time something changes, and
there are several such events per vif during a live migration. In the current
upstream code the only information passed with the event is the instance ID,
so nova loops over all the ports on the instance and rebuilds all of the
subnet/floating IP/fixed IP/etc. information for that instance. This results
in O(N^2) neutron API calls, where N is the number of vifs on the instance
(see the sketch after this list).
3) mriedem has proposed a patch series (https://review.openstack.org/#/c/465783
and https://review.openstack.org/#/c/465787) that would change neutron to
include the port ID, and allow nova to update just that port. This reduces the
cost to O(N), but it's still significant.
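To make the scaling concrete, here is a rough back-of-the-envelope model of
the neutron API call counts for the two behaviours described above (the
function names are made up for illustration, this is not the actual nova
code):

def calls_full_refresh(num_vifs, events_per_vif=1):
    # Today: each "network-changed" event rebuilds the whole info cache,
    # costing roughly one neutron call per port on the instance.
    events = num_vifs * events_per_vif
    return events * num_vifs          # O(N^2)

def calls_per_port_refresh(num_vifs, events_per_vif=1):
    # Proposed: the event names the changed port, so each event refreshes
    # only that one port.
    events = num_vifs * events_per_vif
    return events                     # O(N)

for n in (1, 4, 16):
    print(n, calls_full_refresh(n), calls_per_port_refresh(n))
# e.g. with 16 vifs: 256 calls versus 16 calls per round of events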
In a hardware lab with 4 compute nodes I created 4 boot-from-volume instances,
each with 16 vifs. I then live-migrated them all in parallel. (The one on
compute-0 was migrated to compute-1, the one on compute-1 was migrated to
compute-2, etc.) The aggregate CPU usage for a few critical components on the
controller node is shown below. Note in particular the CPU usage for
neutron--it's using most of 10 CPUs for ~10 seconds, spiking to 13 CPUs. This
seems like an absurd amount of work to do just to update the cache in nova.
Labels:
L0: neutron-server
L1: nova-conductor
L2: beam.smp
L3: postgres
date         time            dt     L0 occ  L1 occ  L2 occ  L3 occ
(yyyy-mm-dd) (hh:mm:ss.dec)  (s)    (%)     (%)     (%)     (%)
2017-05-19 17:51:38.710 2.173 19.75 1.28 2.85 1.96
2017-05-19 17:51:40.012 1.302 1.02 1.75 3.80 5.07
2017-05-19 17:51:41.334 1.322 2.34 2.66 5.25 1.76
2017-05-19 17:51:42.681 1.347 91.79 3.31 5.27 5.64
2017-05-19 17:51:44.035 1.354 40.78 7.27 3.48 7.34
2017-05-19 17:51:45.406 1.371 7.12 21.35 8.66 19.58
2017-05-19 17:51:46.784 1.378 16.71 196.29 6.87 15.93
2017-05-19 17:51:48.133 1.349 18.51 362.46 8.57 25.70
2017-05-19 17:51:49.508 1.375 284.16 199.30 4.58 18.49
2017-05-19 17:51:50.919 1.411 512.88 17.61 7.47 42.88
2017-05-19 17:51:52.322 1.403 412.34 8.90 9.15 19.24
2017-05-19 17:51:53.734 1.411 320.24 5.20 10.59 9.08
2017-05-19 17:51:55.129 1.396 304.92 2.27 10.65 10.29
2017-05-19 17:51:56.551 1.422 556.09 14.56 10.74 18.85
2017-05-19 17:51:57.977 1.426 979.63 43.41 14.17 21.32
2017-05-19 17:51:59.382 1.405 902.56 48.31 13.69 18.59
2017-05-19 17:52:00.808 1.425 1140.99 74.28 15.12 17.18
2017-05-19 17:52:02.238 1.430 1013.91 69.77 16.46 21.19
2017-05-19 17:52:03.647 1.409 964.94 175.09 15.81 27.23
2017-05-19 17:52:05.077 1.430 838.15 109.13 15.70 34.12
2017-05-19 17:52:06.502 1.425 525.88 79.09 14.42 11.09
2017-05-19 17:52:07.954 1.452 614.58 38.38 12.20 17.89
2017-05-19 17:52:09.380 1.426 763.25 68.40 12.36 16.08
2017-05-19 17:52:10.825 1.445 901.57 73.59 15.90 41.12
2017-05-19 17:52:12.252 1.427 966.15 42.97 16.76 23.07
2017-05-19 17:52:13.702 1.450 902.40 70.98 19.66 17.50
2017-05-19 17:52:15.173 1.471 1023.33 59.71 19.78 18.91
2017-05-19 17:52:16.605 1.432 1127.04 64.19 16.41 26.80
2017-05-19 17:52:18.046 1.442 1300.56 68.22 16.29 24.39
2017-05-19 17:52:19.517 1.471 1055.60 71.74 14.39 17.09
2017-05-19 17:52:20.983 1.465 845.30 61.48 15.24 22.86
2017-05-19 17:52:22.447 1.464 1027.33 65.53 15.94 26.85
2017-05-19 17:52:23.919 1.472 1003.08 56.97 14.39 28.93
2017-05-19 17:52:25.367 1.448 702.50 45.42 11.78 20.53
2017-05-19 17:52:26.814 1.448 558.63 66.48 13.22 29.64
2017-05-19 17:52:28.276 1.462 620.34 206.63 14.58 17.17
2017-05-19 17:52:29.749 1.473 555.62 110.37 10.95 13.27
2017-05-19 17:52:31.228 1.479 436.66 33.65 9.00 21.55
2017-05-19 17:52:32.685 1.456 417.12 87.44 13.44 12.27
2017-05-19 17:52:34.128 1.443 368.31 87.08 11.95 14.70
2017-05-19 17:52:35.558 1.430 171.66 11.67 9.28 13.36
2017-05-19 17:52:36.976 1.417 231.82 10.57 7.03 14.08
2017-05-19 17:52:38.413 1.438 241.14 77.78 6.86 15.34
2017-05-19 17:52:39.827 1.413 85.01 63.72 5.85 14.01
2017-05-19 17:52:41.200 1.373 3.31 3.43 7.18 1.78
2017-05-19 17:52:42.556 1.357 60.68 2.94 6.51 6.16
2017-05-19 17:52:44.019 1.463 24.23 5.94 3.45 3.15
2017-05-19 17:52:45.376 1.356 0.93 3.91 5.13 0.83
2017-05-19 17:52:46.699 1.323 7.68 4.12 5.43 0.45
2017-05-19 17:52:48.033 1.334 5.85 1.70 6.00 1.91
2017-05-19 17:52:49.373 1.341 66.28 2.37 4.49 16.40
2017-05-19 17:52:50.715 1.342 31.67 3.03 3.66 6.91
2017-05-19 17:52:52.023 1.308 2.80 2.35 3.30 10.76
2017-05-19 17:52:53.330 1.307 6.94 5.78 3.25 2.30
2017-05-19 17:52:54.699 1.368 3.11 2.67 8.34 1.01
2017-05-19 17:52:56.049 1.351 23.14 2.28 2.83 5.30
2017-05-19 17:52:57.434 1.384 46.86 5.02 6.27 11.93
2017-05-19 17:52:58.803 1.370 3.78 10.26 3.08 2.08
2017-05-19 17:53:00.206 1.403 66.09 8.20 4.07 1.27
2017-05-19 17:53:01.542 1.336 63.71 9.70 3.17 4.89
2017-05-19 17:53:02.855 1.312 21.53 3.99 4.33 5.03
It seems like it should be possible to reduce the amount of work involved here.
One possibility would be to partially revert nova commit 5de902a for the
live-migration case, since it seemed to work fine in Mitaka and earlier.
Another possibility would be to include additional information about what
changed in the "network-changed" event, which would reduce the number of
queries that nova needs to make.
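For example, the event could name the changed port so nova only has to refresh
that one entry in the info cache. Something along these lines (the exact field
names are my assumption, not an agreed neutron/nova contract):

event = {
    "name": "network-changed",
    "server_uuid": "<instance-uuid>",   # all the event carries today
    "tag": "<changed-port-uuid>",       # the proposed addition
}

# If "tag" is present, nova can refresh just that port's cache entry
# instead of looping over every port on the instance.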
In a larger cloud this is going to cause major issues for NFV-type workloads,
where instances with many vifs will be relatively common.
Chris