[Openstack] kvm suspend vs. the kernel

Andrew Bogott abogott at wikimedia.org
Fri Jul 31 18:25:56 UTC 2015


Most of my virt nodes run the standard Trusty kernel, 3.13.0-52-generic 
or similar.  Recently I had cause to shut down one of them, so I started 
by running a scripted 'nova suspend' of all instances.  A couple of 
instances into the script, the kernel locked up and the whole system 
died.  Further investigation on a test node confirmed:  any time I 
suspended more than a couple of instances, the entire system was a goner 
and required a reboot.
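
For anyone trying to reproduce this, a scripted suspend along these lines can be sketched roughly as below.  This is not the actual script from the affected nodes; the function names, the optional delay, and the injectable `run` parameter are assumptions made for illustration.  The only nova CLI call it relies on is `nova suspend <server>`.

```python
import subprocess
import time


def build_suspend_command(server_id):
    """Return the argv that suspends one instance via the nova CLI."""
    return ["nova", "suspend", server_id]


def suspend_all(server_ids, run=subprocess.run, delay=0.0):
    """Suspend instances one at a time; return the IDs whose suspend failed.

    `run` is injectable (a hypothetical convenience, not part of nova) so
    the loop can be exercised without touching a real cloud.
    """
    failed = []
    for server_id in server_ids:
        result = run(build_suspend_command(server_id))
        if result.returncode != 0:
            failed.append(server_id)
        # Optional pause between suspends, which makes it easier to see
        # which suspend is the one that takes the host down.
        time.sleep(delay)
    return failed
```

Suspending one instance at a time with a nonzero `delay` should make it easier to count how many suspends the host survives before it locks up.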

So... today I've been investigating alternative kernels.  My first 
attempt was 3.19.  With 3.19 I can suspend and resume instances as much 
as I want, and the server stays up.  But once an instance is resumed, 
its clock is garbled.  A simple 'sleep 1' on a resumed instance causes 
it to hang forever.
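
The 'sleep 1' symptom can be checked automatically with something like the sketch below.  It is only an illustration: `command_completes` is a made-up helper name, and the timeout value is arbitrary.  It is meant to be run from a machine whose clock is known to be good (for example, over ssh from the host), since a timeout enforced inside the affected instance would presumably depend on the same broken clock.

```python
import subprocess


def command_completes(argv, timeout=10.0):
    """Return True if the command finishes within `timeout` seconds.

    The exit status is ignored; only a hang (timeout) counts as failure.
    On timeout, subprocess.run kills the child before raising.
    """
    try:
        subprocess.run(argv, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return True
```

A call such as `command_completes(["ssh", "some-instance", "sleep", "1"])`, where `some-instance` is a placeholder hostname, would return False for an instance showing the hang described above.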

The sweet spot is in the middle.  3.16 doesn't hang with suspend/resume, 
and the instances actually work once resumed.  So, I have my solution.

What gives?  Is suspend/resume just generally considered harmful? Am I 
encountering a nasty hardware interaction such that these kernels work 
for others?

Issues like this make me think that I'm the only person in the world who 
is actually using this stuff :(

-A




