Hi Sean, Thanks, as always, for your reply!
Hi. Up until recently, the advice from our virt team was that iothreads were not really needed for OpenStack; however, in the last six weeks they have actually asked us to consider enabling them.
I don't have the data to know whether iothreads improve performance or not. Rather, I made the assumption that a dedicated core for I/O would likely perform much better than going without. If someone has any data on this, it would be extremely useful. The issue we are trying to resolve is high-speed local storage performance that is literally 10x, and sometimes 15x, slower in a VM than on the host. The local storage can reach upwards of 8 GiB/s and 1 million IOPS. It's not necessarily throughput we're after, though; it is latency, and the latency overhead in QEMU/KVM is simply too high to get adequate storage performance inside a VM. If iothread(s) do not help, then the point of implementing the parameter in Nova is probably moot.
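For context, a queue-depth-1 fio run like the one below, executed on the host and then inside a guest, is the sort of test that exposes the latency gap we care about (the device paths are just placeholders for our NVMe devices; adjust for your environment):

    # 4k random reads at queue depth 1 to isolate per-I/O latency
    fio --name=lat-test --filename=/dev/vdb --direct=1 --ioengine=libaio \
        --rw=randread --bs=4k --iodepth=1 --numjobs=1 \
        --runtime=30 --time_based --group_reporting

Comparing the completion-latency percentiles from the host and guest runs shows the overhead more clearly than the bandwidth numbers do.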
So, work will be happening in QEMU/libvirt to always create at least one iothread going forward, and to pin its affinity to the same set of cores as the emulator threads by default.
That sounds like a good idea, although I did read somewhere in the QEMU docs that not all drivers support iothreads, and trying to use them with unsupported drivers will likely crash QEMU - but I don't know how old those docs were. It seems plausible, since the iothread(s) run outside the "big QEMU lock".
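For reference, this is roughly what an iothread with emulator-style pinning looks like in libvirt domain XML today (a hand-written sketch, not something Nova generates; the cpuset values and device path are placeholders):

    <domain type='kvm'>
      ...
      <iothreads>1</iothreads>
      <cputune>
        <emulatorpin cpuset='0-1'/>
        <!-- pin the iothread to the same cores as the emulator threads -->
        <iothreadpin iothread='1' cpuset='0-1'/>
      </cputune>
      <devices>
        <disk type='block' device='disk'>
          <!-- virtio-blk accepts an iothread attribute; not every bus/driver does -->
          <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
          <source dev='/dev/nvme0n1'/>
          <target dev='vda' bus='virtio'/>
        </disk>
      </devices>
    </domain>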
We don't have a downstream RFE currently filed for iothreads specifically, but we do have one for virtio-scsi multi-queue support: https://bugzilla.redhat.com/show_bug.cgi?id=1880273
I found this old blueprint and implementation (that apparently was never accepted due to tests failing in various environments): https://blueprints.launchpad.net/nova/+spec/libvirt-iothreads https://review.opendev.org/c/openstack/nova/+/384871/
Those two would make sense to do together. My understanding is that without iothreads, multi-queue virtio-scsi does not provide as much of a performance boost as it does with iothreads.
I can imagine that being the case, since a spinning/polling loop has much lower latency than waiting on an interrupt.
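If I am reading the libvirt docs correctly, the multi-queue virtio-scsi plus iothread combination is expressed along these lines (a sketch only; the queue count is arbitrary, and it assumes an <iothreads> element is already defined in the domain):

    <controller type='scsi' index='0' model='virtio-scsi'>
      <!-- multi-queue virtio-scsi with the controller bound to iothread 1 -->
      <driver queues='4' iothread='1'/>
    </controller>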
If you or others have capacity to work on this, I would be happy to work on a spec with you to enable it.
I wish I had the bandwidth to learn how, but since I'm not a Python developer, nor do I have a development environment ready to go (I'm mostly performing cloud operator and business support functions), I probably couldn't help much other than provide feedback.
Effectively, what I was planning to propose, if we got around to it, is adding a new config option, cpu_iothread_set, which would default to the same value as cpu_shared_set. This would ensure that, without any config updates, all existing deployments would start benefiting from iothreads, while still allowing you to dedicate a set of cores to running the iothreads separate from cpu_shared_set if you want this to also benefit floating VMs, not just pinned VMs.
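If I understand the proposal correctly, the nova.conf side would look roughly like this (cpu_iothread_set is the hypothetical new option; the CPU ranges are just placeholders):

    [compute]
    # existing options on a compute node
    cpu_shared_set = 0-3
    cpu_dedicated_set = 8-31

    # hypothetical new option from the proposal; if unset it would
    # default to the value of cpu_shared_set
    cpu_iothread_set = 4-7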
I would first suggest asking the QEMU folks whether there are incompatibilities between iothreads and any storage drivers that could cause issues if iothreads were enabled by default. I would take a more cautious approach: leave the default as-is and let operators enable iothreads themselves. The default could always be changed later if there is no negative feedback from those who have tried them.
In addition to that, a new flavor extra spec / image property would be added, similar to hw:cpu_emulator_threads.
I'm not quite sure exactly how that extra spec should work, but hw:cpu_iothread_policy would support the same values as hw:cpu_emulator_threads: hw:cpu_iothread_policy=shared would allocate an iothread that floats over cpu_iothread_set (which is the same as cpu_shared_set by default), and hw:cpu_iothread_policy=isolate would allocate an additional iothread from the cpu_dedicated_set. hw:cpu_iothread_policy=shared would be the default behaviour if cpu_shared_set or cpu_iothread_set was defined in the config and no flavor extra spec or image property was defined. Basically, all VMs would have at least one iothread floating over the shared pool if a shared pool was configured on the host.
I will have to review this more carefully when I have a bit more time.
That is option A. Option B would be to also support hw:cpu_iothread_count, so you could ask for N iothreads, either from the shared/iothread set or from the dedicated set depending on the value of hw:cpu_iothread_policy.
I'm not really sure there is a need for more than one iothread. My understanding is that once you have at least one there are diminishing returns: more iothreads will improve your performance, provided you have multiple disks/volumes attached, but not by as much as having the initial iothread does.
I would guess that multiple iothreads would benefit multiple VMs, where each VM would use its own I/O thread on a dedicated core. So I think providing the possibility of multiple iothreads, with assignment of those threads to individual VMs, should be considered. However, this brings up a significantly more complex resource allocation requirement, not to mention resource allocation during live migration.
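For concreteness, I'm picturing the flavor side looking something like this (both properties are hypothetical at this point; the syntax just mirrors how hw:cpu_emulator_threads is set today):

    # option A: policy only (hypothetical property)
    openstack flavor set iotest.flavor --property hw:cpu_iothread_policy=isolate

    # option B: policy plus an explicit thread count (both hypothetical)
    openstack flavor set iotest.flavor \
        --property hw:cpu_iothread_policy=shared \
        --property hw:cpu_iothread_count=2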
Is this something you would be willing to work on and implement? I would be happy to review any spec in this area, and I can bring it up downstream again, but I can't commit to working on this in the Z release. It would require some minor RPC changes to ensure live migration works properly, as the iothread set or cpu_shared_set could be different on different hosts, but beyond that the feature is actually pretty simple to enable.
I think we need to do some testing to prove the performance benefits before spending the time to implement it.
No, there is no way to enable them out of band of Nova today. You technically could wrap the QEMU binary with a script that injects parameters, but that obviously would not be supported upstream. It would be a workaround if you really needed it, though.
https://review.opendev.org/c/openstack/devstack/+/817075 is an example of such a script. It breaks AppArmor and SELinux, but you could probably make it work with enough effort, although I would suggest just implementing the feature upstream and doing a downstream backport instead.
Interesting - maybe I can hack this for testing and proof-of-concept purposes. Thanks for the suggestion! I'll see if we can figure out how to test iothreads in our environment where the high-speed local storage exists.

Eric
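P.S. If we do try the wrapper hack for a proof of concept, I'm picturing something along these lines (a rough sketch only - the paths, the virtio-blk assumption, and the single-iothread choice are all mine, and as you note AppArmor/SELinux would still have to be dealt with):

    #!/bin/bash
    # Hypothetical wrapper installed in place of qemu-system-x86_64 after
    # renaming the real binary to qemu-system-x86_64.real. It prepends one
    # iothread object and attaches it to every virtio-blk-pci device.
    args=()
    for a in "$@"; do
        case "$a" in
            virtio-blk-pci,*) args+=("${a},iothread=iothread0") ;;
            *)                args+=("$a") ;;
        esac
    done
    exec /usr/bin/qemu-system-x86_64.real \
        -object iothread,id=iothread0 "${args[@]}"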