[Openstack] kvm suspend vs. the kernel
Andrew Bogott
abogott at wikimedia.org
Fri Jul 31 18:25:56 UTC 2015
Most of my virt nodes run the standard Trusty kernel, 3.13.0-52-generic
or similar. Recently I had cause to shut down one of them, so I started
by running a scripted 'nova suspend' of all instances. A couple of
instances into the script, the kernel locked up and the whole system
died. Further investigation on a test node confirmed: any time I
suspended more than a couple of instances, the entire system was a goner
and required a reboot.
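
For anyone wanting to reproduce this, a scripted suspend along these lines can be sketched in Python. This is not the script used above, only a minimal illustration: the function names, the explicit list of instance IDs, and the `pause_seconds` knob are all assumptions. It shells out to 'nova suspend <server>' one instance at a time and stops at the first failure.

```python
import subprocess
import time


def run_nova_suspend(instance_id):
    """Suspend one instance via the nova CLI and return its exit code."""
    return subprocess.call(["nova", "suspend", instance_id])


def suspend_all(instance_ids, suspend=run_nova_suspend, pause_seconds=0.0):
    """Suspend instances one at a time; return the IDs that succeeded.

    Stops at the first non-zero exit code so a struggling host is not
    sent further requests. `pause_seconds` is a hypothetical knob for
    spacing out the calls; it is not something the nova CLI provides.
    """
    done = []
    for instance_id in instance_ids:
        if suspend(instance_id) != 0:
            break
        done.append(instance_id)
        if pause_seconds:
            time.sleep(pause_seconds)
    return done
```

Passing the suspend function in as a parameter makes it possible to dry-run the loop without touching a real host.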
So... today I've been investigating alternative kernels. My first
attempt was 3.19. With 3.19 I can suspend and resume instances as much
as I want, and the server stays up. But once an instance is resumed,
its clock is garbled. A simple 'sleep 1' on a resumed instance causes
it to hang forever.
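
A check for the 'sleep 1' hang can be sketched as a command run under a timeout. This is only an illustration under assumptions: the function name is made up, and the check should be launched from a machine whose own clock is healthy (for example over ssh into the guest), since a timeout enforced inside the affected guest may never fire.

```python
import subprocess


def completes_within(command, timeout_seconds):
    """Return True if `command` exits within `timeout_seconds`, else False.

    subprocess.run kills the child and raises TimeoutExpired when the
    timeout is exceeded, which is treated here as a hang.
    """
    try:
        subprocess.run(command, timeout=timeout_seconds, check=False)
    except subprocess.TimeoutExpired:
        return False
    return True
```

For example, completes_within(["ssh", "resumed-guest", "sleep", "1"], 10) would return False if the sleep never comes back; "resumed-guest" is a placeholder host name.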
The sweet spot is in the middle. 3.16 doesn't hang with suspend/resume,
and the instances actually work once resumed. So, I have my solution.
What gives? Is suspend/resume just generally considered harmful? Am I
encountering a nasty hardware interaction such that these kernels work
for others?
Issues like this make me think that I'm the only person in the world who
is actually using this stuff :(
-A