Customization of scheduler manager
Hi all,
as part of my thesis, I modified the OpenStack Train release with the intention of sharing resources that are not in use with other users to ensure maximum utilization. So far everything is working fine except for the last step. Let me first explain my work.
The system works according to the following rules:
1. Users who own compute hosts within the private cloud have the highest priority on their hosts.
2. Users who do not own hosts within the private cloud are low-priority users who can instantiate their virtual machines on unused resources (on the hosts that have an owner).
3. If the owner wants to use his resources that are currently occupied, the foreign VM must be suspended to free up resources for the owner's VM.
4. An owner is a low-priority user on foreign hosts.
Everything works automatically and generically, but in step 3 I do not suspend those VMs, I delete them. I want the VMs to be suspended so that they can be restarted later and paused processes can continue, and I believe there may be a REST API function that provides this functionality. A user should be able to continue his work after resources become free again. It would be annoying if long-running processes were killed.
My question is this: Is the suspend function the right choice? Are the resources released when I use the suspend function?
Thank you & Best regards,
Levon Melikbekjan
On Tue, 2022-07-26 at 15:59 +0000, Levon Melikbekjan wrote:
My question is this: Is the suspend function the right choice? Are the resources released when I use the suspend function?
No, the resources are not released when you suspend.
If I were to do this, I would shelve the instance so that the user can unshelve it to a different host if needed. What you are describing is something we have previously considered, called preemptible instances, or spot instances to use AWS terminology. Shelve preserves the VM's ports, volumes and root disk by creating a snapshot and storing it in Glance. When the user wants to resume their low-priority instance, they can unshelve it and it will go to a different host.
Note that, due to how Nova and Placement work, you can't share resources in Nova the way you are trying to, because Placement will still prevent the oversubscription, and in Train Placement is not optional. So you will never exceed the overallocation ratio unless you have altered that by, say, setting it very high or not creating allocations for the low-priority instances.
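As a rough illustration of the shelve/unshelve suggestion above: the Nova compute API exposes both as server actions, sent as a POST to /servers/{server_id}/action with a body of {"shelve": null} or {"unshelve": null}. The sketch below only builds such a request and does not send it. The helper name build_server_action, the compute URL and the token are placeholders chosen for this example and are not taken from the thread.

```python
import json

# Server actions covered by this sketch; both take a null body value.
SUPPORTED_ACTIONS = ("shelve", "unshelve")


def build_server_action(compute_url, server_id, action, token):
    """Describe the HTTP request for a Nova server action.

    compute_url and token are assumptions: in a real deployment they
    would come from the Keystone service catalog and an auth request.
    """
    if action not in SUPPORTED_ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    return {
        "method": "POST",
        "url": "%s/servers/%s/action" % (compute_url.rstrip("/"), server_id),
        "headers": {
            "X-Auth-Token": token,
            "Content-Type": "application/json",
        },
        # json.dumps turns None into null, which is what the API expects.
        "body": json.dumps({action: None}),
    }


if __name__ == "__main__":
    request = build_server_action(
        "http://controller:8774/v2.1",  # placeholder endpoint
        "11111111-2222-3333-4444-555555555555",  # placeholder server id
        "shelve",
        "placeholder-token",
    )
    print(request["method"], request["url"])
    print(request["body"])
```

The returned dictionary can be handed to whatever HTTP client the extension already uses; inside Nova itself the same operations are available through the compute API layer, so the REST call is mainly useful for an external controller.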
Amazing! Thank you for the hint. The shelve function is exactly what I was looking for.
I have already created a workflow architecture that describes the new functionality of your select_destinations Python function, which is located in manager.py. The file can be found under the path “/usr/lib/python2.7/site-packages/nova/scheduler”. My intention is to extend manager.py with a priority queue. The extension will automatically look for unused resources (best match: the hosts with the most unused resources) in order to reallocate shelved VMs from the priority queue.
Only the user himself can delete his VMs completely. For me it is important not to lose the calculations performed by processes on these VMs when the VMs are automatically shelved by my extension.
The automated process knows whether a user is an owner and which hosts he owns, because the host aggregate ID is always read from the description attribute of the user object. If this field is empty, the user is not an owner. This is how my process determines the priority status of a user.
From: Sean Mooney <smooney@redhat.com>
Date: Tuesday, 26 July 2022 at 18:43
To: Levon Melikbekjan <levonmelikbekjan@yahoo.de>, openstack@lists.openstack.org <openstack@lists.openstack.org>
Subject: Re: Customization of scheduler manager
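The ownership rule and the priority queue described in the reply above could be sketched roughly as follows. This is a standalone Python 3 illustration, not code from Nova or from the thesis: the names Host, ShelvedVM, owned_aggregate_id, push, best_host and reallocate are hypothetical, the queue is assumed to be ordered by shelve time (oldest first), and "most unused resources" is assumed to mean most free vCPUs, with free RAM as a tie-breaker. It also ignores Placement allocations, which a real extension would have to respect.

```python
import heapq
from dataclasses import dataclass
from itertools import count


def owned_aggregate_id(user_description):
    """Return the host aggregate ID stored in the user's description,
    or None if the field is empty (the user is then not an owner)."""
    text = (user_description or "").strip()
    return text or None


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_ram_mb: int


@dataclass
class ShelvedVM:
    vm_id: str
    vcpus: int
    ram_mb: int


_sequence = count()  # unique tie-breaker so VM objects are never compared


def push(queue, vm, shelved_at):
    """Queue a shelved VM; assumption: the oldest shelved VM is served first."""
    heapq.heappush(queue, (shelved_at, next(_sequence), vm))


def best_host(hosts, vm):
    """Pick the fitting host with the most free vCPUs, then most free RAM."""
    fitting = [h for h in hosts
               if h.free_vcpus >= vm.vcpus and h.free_ram_mb >= vm.ram_mb]
    if not fitting:
        return None
    return max(fitting, key=lambda h: (h.free_vcpus, h.free_ram_mb))


def reallocate(queue, hosts):
    """Assign queued VMs to hosts; VMs that fit nowhere stay in the queue."""
    placements, waiting = [], []
    while queue:
        entry = heapq.heappop(queue)
        vm = entry[2]
        host = best_host(hosts, vm)
        if host is None:
            waiting.append(entry)
            continue
        host.free_vcpus -= vm.vcpus
        host.free_ram_mb -= vm.ram_mb
        placements.append((vm.vm_id, host.name))
    for entry in waiting:
        heapq.heappush(queue, entry)
    return placements
```

Each pair returned by reallocate names a shelved VM and the host chosen for it; the actual unshelve call, and any check that the target host's owner is not about to reclaim it, are left out of this sketch.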
participants (2)
- Levon Melikbekjan
- Sean Mooney