[nova] providing a local disk to a Nova instance
Hi Nova team!
tl;dr: we would like to contribute a feature giving instances direct access to physical block devices on the compute hosts. Would this be accepted?
Longer version:
About 3 or 4 years ago, someone wrote a spec so that we'd be able to provide a compute host's local disk directly to a VM. It was rejected because, at the time, Cinder had the blockdevice driver, which achieved more or less the same thing. Unfortunately, because nobody was maintaining the blockdevice driver in Cinder, and because there was no CI that could test it, the driver got removed.
We've investigated how we could otherwise implement it. One solution would be to use Cinder, but then we'd be going through an iSCSI export, which would drastically reduce performance.
Another solution would be to manage KVM instances by hand, without touching anything in libvirt and/or Open vSwitch, but then we would lose the ease of using the Nova API, so we would prefer to avoid this direction.
So we (i.e. employees in my company) need to ask the Nova team: would you consider a spec to do what was rejected before, since there's now no other good enough alternative?
Our current goal is to be able to provide a disk directly to a VM, so that we could build Ceph clusters with a hyper-converged model (i.e. storage hosted on the compute nodes). In this model, we wouldn't need live-migration of a VM with an attached physical block device (though the feature could be added at a later stage).
Before we start investigating how this can be done, I need to know whether this has at least some chance of being accepted. If it does, then we'll probably start an experimental patch locally, then write a spec to properly start this project. So please let us know.
Cheers,
Thomas Goirand (zigo)
IIUC one of our (Red Hat's) customer-facing folks brought us a similar question recently. In their case they wanted to use PCI passthrough to pass an NVMe disk to an instance. This is technically possible, but there would be major privacy concerns in a multi-tenant cloud, as Nova currently has no way of cleaning up a disk after a VM has left it. Either the guest OS would have to do it itself, or any subsequent VM using that disk would have access to all of the previous VM's data (this could be mitigated by full-disk encryption, though). Cleaning up disks after VMs would probably fall more within Cyborg's scope...
There's also the question of instance move operations like live and cold migrations - what happens to the passed-through disk in those cases? Does Nova have to copy it to the destination? I think those would be fairly easily addressable though (there are no major technical or political challenges, it's just a matter of someone writing the code and reviewing it).
The disk cleanup thing is going to be harder, I suspect - more politically than technically. It's a bit of a chicken and egg problem with Nova and Cyborg, at the moment. Nova can refuse features as being out of scope and punt them to Cyborg, but I'm not sure how production-ready Cyborg is...
On Tue, Sep 1, 2020 at 10:41 AM Thomas Goirand <zigo@debian.org> wrote:
Hi Nova team!
tl;dr: we would like to contribute giving instances access to physical block devices directly on the compute hosts. Would this be accepted?
On 1 Sep 2020, at 17:15, Artom Lifshitz <alifshit@redhat.com> wrote:
The disk cleanup thing is going to be harder, I suspect - more politically than technically. It's a bit of a chicken and egg problem with Nova and Cyborg, at the moment. Nova can refuse features as being out of scope and punt them to Cyborg, but I'm not sure how production-ready Cyborg is...
Does the LVM pass-through option help for direct attach of a local disk?
https://cloudnull.io/2017/12/nova-lvm-an-iop-love-story/
Tim
On Tue, 2020-09-01 at 19:47 +0200, Tim Bell wrote:
On 1 Sep 2020, at 17:15, Artom Lifshitz <alifshit@redhat.com> wrote:
The disk cleanup thing is going to be harder, I suspect - more politically than technically. It's a bit of a chicken and egg problem with Nova and Cyborg, at the moment. Nova can refuse features as being out of scope and punt them to Cyborg, but I'm not sure how production-ready Cyborg is...
On the Cyborg front, I suggested adding an LVM driver to Cyborg a few years ago for this use case. I wanted to use it for testing in the gate, since it would require no hardware and would also support "programming" by copying a Glance image to the volume, and cleaning by erasing the volume when the VM is deleted. It would solve the locally attached disk use case too, in two ways: first, by providing LVM volumes as disks to the guest; and second, by extending Nova to support host block devices from Cyborg, which would enable other drivers (for example, one that just allocated an entire block device instead of a volume) to be written without needing Nova changes.
But the production readiness was the catch-22 there. Because it was not considered production ready, I canned the idea of an LVM driver: there was no path to delivering this to customers downstream, so I did not spend time working on the idea. I still think it's the right way to go, but currently that is not aligned with the feature I'm working on.
Does the LVM pass-through option help for direct attach of a local disk?
https://cloudnull.io/2017/12/nova-lvm-an-iop-love-story/
Not really, since the LVM image backend for Nova just uses LVM volumes to back the root/ephemeral disks in the flavor. It's not actually providing a way to attach a local disk or partition, so it is not really the same use case. At least from a downstream perspective, it's also not supported in the Red Hat product. From an upstream perspective, it is not tested in the gate as far as I am aware, so the maintenance of that backend is questionable, although if we get bug reports we will fix them. But Nova's LVM backend is not really a solution here, I suspect. That said, it did in the past outperform the default flat/qcow backend for write-intensive workloads. Not sure if that has changed over time, but there were pros and cons to it.
participants (4)
- Artom Lifshitz
- Sean Mooney
- Thomas Goirand
- Tim Bell