On Thu, Mar 19, 2020, at 1:25 AM, Arnaud Morin wrote:
Hey Melanie, all,
About the OVH case (the company I work for): we are digging into the issue.
First, we no longer limit IOPS. I don't remember when we removed this limit, but it is not a recent change.
However, the hypervisors are quite old now, and our policy on these old servers was to use some swap. We think the host may slow down when RAM is overcommitted and it starts swapping to disk.
Anyway, we also know that we can get better latency by upgrading QEMU. We are currently in the middle of testing a new QEMU version, and I will push to upgrade your hypervisors first, so we will see whether the latency improvement on the QEMU side helps the gate.
Finally, we plan to change the hardware and stop overcommitting RAM (and swapping to disk). I have no ETA for that, but it will certainly improve the IOPS.
You all likely know far more about this than I do, but our use case is likely ideal for kernel same-page merging (KSM) because we boot a relatively small number of identical images that rotate relatively slowly (every 24 hours). Turning that on, if it is not already enabled, could potentially reduce memory pressure.
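For what it's worth, here is a rough sketch of how one could check on a hypervisor whether KSM is running and roughly how much memory it is saving. It only reads the standard Linux sysfs counters under /sys/kernel/mm/ksm; the helper names and the 4 KiB page size default are my own assumptions for illustration, not anything taken from OVH's setup.

```python
from pathlib import Path

# Standard sysfs location for KSM on Linux; absent if the kernel lacks KSM.
KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm_stats(ksm_dir=KSM_DIR):
    """Return the KSM counters as ints, or None if KSM is not exposed."""
    ksm_dir = Path(ksm_dir)
    if not ksm_dir.is_dir():
        return None
    stats = {}
    for name in ("run", "pages_shared", "pages_sharing"):
        path = ksm_dir / name
        if path.is_file():
            stats[name] = int(path.read_text().strip())
    return stats


def estimated_saved_bytes(stats, page_size=4096):
    """Rough saving: pages_sharing counts the extra mappings that were deduplicated.

    page_size is an assumption (4 KiB); adjust it for the host in question.
    """
    return stats.get("pages_sharing", 0) * page_size


if __name__ == "__main__":
    stats = read_ksm_stats()
    if stats is None:
        print("KSM is not available on this host")
    else:
        state = "running" if stats.get("run") == 1 else "not running"
        saved_mib = estimated_saved_bytes(stats) / (1024 * 1024)
        print(f"KSM is {state}; roughly {saved_mib:.1f} MiB saved")
```

If the run value is 0, KSM can be turned on as root by writing 1 to /sys/kernel/mm/ksm/run. Whether it helps in practice depends on how similar the guests' memory really is, so the numbers above are only a starting point.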
I'll keep you posted.
Cheers,
-- Arnaud Morin