Good morning, I currently have an OpenStack cluster made up of 3 nodes: an iSCSI bay (10 TB), 576 GB of RAM, 10 TB of disk, and 288 vCPUs. This cluster is used by around 150 students, but it is reaching its limits. Having obtained a budget to set up a larger cluster, I am wondering about the number of nodes, their roles (how many controllers, network nodes, compute nodes, etc.) and, above all, which storage solution to choose. Let's imagine a budget to buy 6 servers with good capacities: is Ceph storage (with Cinder and RBD?) on the OpenStack cluster nodes the right choice? Do we need 3 servers for a Ceph cluster and 3 for the OpenStack part (in which case I lose capacity for the compute part)? I don't know what the right choices are and, above all, I am a little afraid of going in the wrong direction. Could any of you guide me, or give me links to sites that could help me (and that I haven't already seen)? Thanks in advance

Franck VEDEL
Dép. Réseaux Informatiques & Télécoms
IUT1 - Univ GRENOBLE Alpes
0476824462
Stages, Alternance, Emploi.
Hello Franck,

it's not an easy question to answer, so I'll just write up a few of my thoughts. In general, Ceph is a good idea for OpenStack, yes. But keep in mind that when a server fails in a 3-node cluster, the cluster stays degraded: there is no recovery target left until the server comes back online. So my recommendation would be at least 4 nodes for a "real" production Ceph cluster, but 3 would work as long as your infrastructure is stable enough (no regular power outages or anything like that).

Colocating Ceph and OpenStack on the same hardware can work (I read about it once in a while), but it means more services become unavailable (or degraded) in case of maintenance or failure. And I'm not sure how deployment tools like kolla-ansible deal with it; I've never installed such a mixed infrastructure. If you colocated the compute service with the Ceph servers, you would have to migrate VMs every time a Ceph server needs maintenance, or they would become unavailable if a server fails unexpectedly (and you'd have to tweak the database to migrate them to a different compute node). So from a maintenance/failure point of view, colocation is not the best idea.

We ran a single-control-node OpenStack for years without any incident, but updating was disruptive, of course, at least for self-service networks; provider networks are directly available on the compute nodes, so most of the infrastructure was not impacted. We then added a second control node with a Galera tie-breaker to get an HA cluster.

The question is what your requirements actually are with respect to (high) availability. How is your current setup with 3 nodes? Are all 3 nodes both control and compute nodes? What is the current storage backend, the local filesystem of the compute nodes? Is it an option to buy more (smaller) nodes so that you could have a dedicated Ceph cluster?

Regards, Eugen

Zitat von Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr>:
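The 3-node vs. 4-node tradeoff above can be put in numbers. This is only a sketch, assuming replica size 3 and the usual ~85% fill limit; the function name and figures are illustrative, not from any Ceph tool:

```python
# Rough Ceph capacity sketch. Assumptions: replication factor 3 and an
# ~85% usable fill ratio. Self-healing after one host failure needs at
# least `replicas` hosts to remain, so a 3-node replica-3 cluster stays
# degraded until the failed host returns.
def usable_tb(nodes, tb_per_node, replicas=3, fill_ratio=0.85):
    can_self_heal = nodes - 1 >= replicas
    # If the cluster can self-heal, reserve one host's worth of raw
    # space so recovery has somewhere to re-replicate the lost copies.
    raw = (nodes - 1 if can_self_heal else nodes) * tb_per_node
    return raw * fill_ratio / replicas, can_self_heal

print(usable_tb(3, 10))  # 3 nodes: no recovery target after a failure
print(usable_tb(4, 10))  # 4 nodes: same usable space, plus recovery headroom
```

Note that with 10 TB per node the fourth node buys no extra usable capacity here; it buys the headroom that lets the cluster heal itself.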
Hello, and thanks for your help. It's very interesting to read your response. First, excuse my English; Google Translate helps me a lot.

I understand the difficulty of the question (building a new cluster, it's difficult to understand the different solutions for configuring a cluster; there are so many parameters). What is certain is that I don't need HA. Actually, in my 3-node cluster, 2 nodes are controller and network nodes, all 3 nodes are compute nodes, and one is a storage node (with an iSCSI bay). All vCPUs are in use, so I need to delete some projects before starting a new lab with students.

I just tried to build a test OpenStack cluster with 6 nodes, with a Ceph cluster (so OpenStack on the same servers). Ceph is used with Cinder. Instance creation is slow. For example, if I create a 20 GB Windows instance (with its volume on the Ceph cluster), it takes 6 minutes (so if I put 30 students in parallel doing this operation, it is very long, too long). If I don't use a volume, it's the same thing, because the Ceph cluster has a "vms" pool in use.

On my current production cluster, the same instance creation without a volume (ephemeral disk) is fast, but I do not have enough disk (800 GB) on each server, and no possibility to add disks. What I need is a solution that lets me quickly create instances (including Windows) of 20 to 40 GB on ephemeral storage, but that I can also use for certain projects to create images from snapshots, so I need a solution with volumes as well.

In short... it's still complicated, because I do all this on top of my regular work, and I don't have all the time I would like for it. Let's imagine a budget of 100,000 euros. How would you build a cluster for 250 students doing labs on network configuration and the creation and connection of instances (so nothing complicated), but where instance creation needs to be fast? How many nodes, and what distribution of roles? Just to get some ideas…

Franck VEDEL
Le 11 mars 2024 à 14:04, Eugen Block <eblock@nde.ag> a écrit :
On 12.03.24 07:30, Franck VEDEL wrote:
I just tried to build a test OpenStack cluster with 6 nodes, with a Ceph cluster (so OpenStack on the same servers). Ceph is used with Cinder. Instance creation is slow. For example, if I create a 20 GB Windows instance (with its volume on the Ceph cluster), it takes 6 minutes (so if I put 30 students in parallel doing this operation, it is very long, too long). If I don't use a volume, it's the same thing, because the Ceph cluster has a "vms" pool in use.
Which image formats are you using? I dimly recall issues with qcow images (especially v1): they need to be converted to raw first, which can take some time.
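To illustrate the point above: with an RBD-backed Glance, raw images can be cloned copy-on-write so instances boot almost immediately, while qcow2 images are downloaded and converted on every boot. A toy helper (the names are illustrative, not a real OpenStack API) for spotting which images would benefit from a one-time `qemu-img convert -O raw` before re-upload:

```python
# Hedged sketch: flag Glance images that are not in raw format, since
# those are the ones a Ceph/RBD backend cannot clone copy-on-write and
# must instead download and convert at every instance creation.
def images_to_convert(images):
    """images: iterable of (name, disk_format) pairs, e.g. parsed from
    `openstack image list` output."""
    return [name for name, fmt in images if fmt != "raw"]

print(images_to_convert([("win10-template", "qcow2"), ("cirros", "raw")]))
# ['win10-template']
```

With raw images (and `show_image_direct_url = True` in glance-api.conf, if I recall correctly), Cinder and Nova can create volumes as near-instant RBD clones instead of full copies.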
In short... it's still complicated, because I do all this on top of my regular work, and I don't have all the time I would like for it. Let's imagine a budget of 100,000 euros. How would you build a cluster for 250 students doing labs on network configuration and the creation and connection of instances (so nothing complicated), but where instance creation needs to be fast? How many nodes, and what distribution of roles? Just to get some ideas…
As a general comment, for a small cluster I'd recommend a hyperconverged setup; it's typically more resource-efficient. Plug: have you looked at https://microstack.run/ ? It's an all-in-one OpenStack package that aims for low resource requirements.
Franck VEDEL
Hi Franck,

sorry for my late response, the last days have been quite busy. Indeed, several minutes to spawn a new VM is long; as others already suggested, it makes sense to verify where that time is spent. I don't have many insights into budget and hardware planning, so I can't really help with that. But if HA is not an issue, you could go for a hyperconverged setup and colocate all the services. That would require some powerful servers; I'm not sure whether that fits into your budget, though. Some of our customers have hardware vendors with a catalog for different use cases (storage, hypervisors, etc.) to choose from; do you have such an option as well?

Zitat von Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr>:
Hello Franck,

a possible solution might come from virtualization, to mitigate the hyperconvergence issues: e.g. install a hypervisor on two or three physical servers and use it to host multiple VMs for identity/messaging/networking etc., leaving only compute and Ceph on bare metal, thereby separating the components into dedicated VMs.

Also, I have been studying Ceph requirements for my workplace, and for a production deployment the requirements are substantial:

* at least three nodes running as Monitor (ceph-mon) and Manager (ceph-mgr); they must be an odd number for quorum, so the next step up is five nodes
* at least two Metadata Server (ceph-mds) nodes, one active and the others in standby, ready to take over
* at least three Object Storage Daemon (ceph-osd) nodes, with one OSD daemon per physical device (be it HDD, SSD, RAID...), so OSD daemon count >= OSD node count

Hardware requirements:

* Monitor/Manager: at least 64 GB RAM, better >= 128 GB
* Metadata Server: CPU-intensive, single-threaded, at least 1 GB RAM
* Object Storage Daemon: at least 4 GB RAM per running daemon (so per physical device)

Probably the OSDs are the only components that must run on bare metal; the others could be virtualized. Take all of this with a grain of salt, as we're still in the design phase.

References:

* https://docs.ceph.com/en/reef/start/intro/
* https://docs.ceph.com/en/reef/start/hardware-recommendations/
* https://www.ibm.com/docs/en/storage-ceph/7?topic=considerations-colocating-c...
* https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/ce....
* https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-si...
* https://docs.mirantis.com/mcp/q4-18/mcp-ref-arch/openstack-environment-plan/....
* https://tracker.ceph.com/projects/ceph/wiki/How_Many_OSDs_Can_I_Run_per_Host
* https://docs.ceph.com/en/reef/start/os-recommendations/
* https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/7/html-si...
* https://docs.ceph.com/en/latest/rbd/rbd-openstack/

Best regards
Francesco Di Nucci

On 19/03/24 08:05, Eugen Block wrote:
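For what it's worth, the RAM figures Francesco lists can be turned into a quick per-node budget for a hyperconverged node. This is only back-of-the-envelope arithmetic with illustrative names; the 64 GB mon/mgr and 4 GB-per-OSD numbers are the ones quoted above:

```python
# Back-of-the-envelope RAM budget for one hyperconverged node
# (assumptions: 64 GB for colocated mon+mgr, 1 GB minimum for an MDS,
# 4 GB per OSD daemon, i.e. per physical device, plus an allowance for
# the host OS and for guest VMs).
def node_ram_gb(osd_count, mon_mgr=True, mds=False, vm_ram_gb=0, host_os_gb=8):
    ram = host_os_gb + vm_ram_gb
    if mon_mgr:
        ram += 64
    if mds:
        ram += 1
    ram += 4 * osd_count  # 4 GB per OSD device
    return ram

# e.g. a node with 6 OSD disks, mon+mgr, and 128 GB reserved for guests:
print(node_ram_gb(osd_count=6, vm_ram_gb=128))  # 224
```

Numbers like this make it easy to see how quickly colocation eats into the RAM you wanted to give to student VMs.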
Hi Franck, sorry for my late response, the last days have been quite busy. Indeed, several minutes to spawn a new VM is long, as others already suggested it makes sense to verify where that time is spent. I don't have too many insights in budget and hardware planning so I can't really help with that. But if HA is not an issue you could go for a hyperconverged setup and colocate all the services. That would require some powerful servers, not sure if that fits into your budget though. Some of our customers have their hardware vendors which have a catalog for different use-cases (storage, hypervisors, etc) to choose from, do you have such an option as well?
Zitat von Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr>:
Hello and thanks for your help. Its very interesting to see your response.
first, excuse my english… google translate help me a lot…
I understand the difficulty of the question (build a ne cluster, it's difficult to understand the different solutions for configuring a cluster, there are so many parameters. What is certain is that I don't need HA. Actually, with my 3 nodes cluster, 2 nodes are controllers and network, 3 nodes are compute, one is storage (with an iscsi bay). All vcpus are used so I need to delete some projects before starting a new lab with students.
I just tried to build a test open stack cluster with 6 nodes, with a Ceph cluster (so openstack on the same servers). Ceph is used with cinder. Instance creation is slow. For example, if I create a 20G Windows instance (with volume on a Ceph cluster), it takes 6 minutes (so if I put 30 students in parallel doing this operation, it is very long, too long). If I don't use a volume, same thing because the Ceph cluster has a "vms" pool in use.
On my current production cluster, the same instance creation operation without volume (ephemeral disk) is fast, but I do not have enough disks (800G) on each server. And no possibilities to add disks. What I need is a solution that allows me to quickly create instances (including Windows) from 20 to 40G, in ephemeral, but that I can use for certain projects to create images from snapshots so I also need a solution with volumes.
In short... it's still complicated because I do all this in addition to my work, and I don't have all the time I would like for that. Let's imagine a budget of 100,000 euros. How would you build a cluster for 250 students who would do labs to build networks configuration, creation and connection of instances, so nothing complicated, wanting instance creations to be fast. How many nodes, and what distribution of roles? Just to get some ideas…
Franck VEDEL
Le 11 mars 2024 à 14:04, Eugen Block <eblock@nde.ag> a écrit :
Hello Franck,
it's not an easy question to answer, I'll just write up a few of my thoughts. In general, ceph is a good idea for openstack, yes. But you have to keep in mind that when a server fails in a 3 node cluster it becomes degraded as there's no target for recovery left until the server comes back online. So my recommendation would be at least 4 nodes for a "real" production ceph cluster, but 3 would work as long as your infrastructure is stable enough (no regular power outages or anything). Colocating ceph and openstack on the same hardware can work (I read about it once in a while), but that means more services become unavailable (or are degraded) in case of maintenance or failure. And I'm not sure how deployment tools like kolla-ansible deal with it, I've never installed such a mixed infrastructure. If you colocated the compute service with the ceph servers you would have to migrate VMs every time a ceph server needs maintenance, or they become unavailable if a server fails unexpectedly (and you'd have to tweak the database to migrate them to a different compute node). So from a maintenance/failure point of view colocation is not the best idea.
We've had a single control node openstack running for years without any incident, but updating was disruptive, of course, at least for self-service networks, provider networks are directly available on the compute nodes, so most of the infrastructure was not impacted. We then added a second control node with a galera tie-breaker to have a HA cluster.
The question is what your requirements actually are wrt (high) availability. How is your current setup with 3 nodes? Are all 3 nodes both control and compute nodes? What is the current storage backend, local filesystem of the compute nodes?
Is it an option to buy more (smaller) nodes so that you could have a dedicated ceph cluster?
Regards, Eugen
Zitat von Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr>:
Good morning, I currently have an Openstack cluster made up of 3 nodes, an iscsi bay (10T), 576 G of Ram, 10T, 288vcpus. This cluster is used by around 150 students, but is reaching its limits. Having obtained a budget to set up a larger cluster, I am wondering about the choice of the number of nodes, their role (how many controllers, network, compute, etc.) and above all what solution for storage. Let's imagine a budget to buy 6 servers with good capacities, is the right choice Ceph storage (with cinder and rdb?) on the Openstack cluster nodes? Do we need 3 servers for a Ceph cluster and 3 for the Openstack part (in this case I lose capacity for the "compute" part)... I don't know what the right choices are and above all, I have a little afraid of going in the wrong directions. Could any of you guide me, or give me links to sites that could help me (and that I haven't seen). Thanks in advance
Franck VEDEL Dép. Réseaux Informatiques & Télécoms IUT1 - Univ GRENOBLE Alpes 0476824462 Stages, Alternance, Emploi.
Hello Francesco. Thanks a lot for all this help. A lot of documents to read… a lot of information to understand. I will test… Franck
Le 19 mars 2024 à 08:39, Francesco Di Nucci <francesco.dinucci@na.infn.it> a écrit :
Hello Frank,
a possible solution might come from virtualization, to mitigate the hyper-convergence issues (eg install an hypervisor on two/three physical servers and use it to host multiple VMs for identity/messaging/networking etc, leaving only compute and CEPH on bare metal) by separating the components in separate VMs.
Also I was studying CEPH requirements for my workplace, and for a production deployment the requirements are intensive:
at least three nodes working as Monitor (ceph-mon) and Manager (ceph-mgr). Must be in odd number for quorum, so next is five nodes. At least two MetaData Sever (ceph-mds) nodes, one active and the others in standby, ready to take over At least three Object Storage Daemon (ceph-osd) nodes, a OSD daemon for each physical device (be it HDD, SSD, RAID...), so OSD daemon count >= OSD nodes count HW requirements:
Monitor/Manager - at least 64GB RAM, better >= 128 GB Metadata Server - CPU-intensive, single threaded, at least 1GB RAM Object Storage Daemon - at least 4GB RAM for running daemon (so for each physical device) Probably the OSD are the only one that must run on bare metal, but the other one may be virtualized. Take all the info with a grain of salt, as we're still in the design phase
References
https://docs.ceph.com/en/reef/start/intro/ https://docs.ceph.com/en/reef/start/hardware-recommendations/ https://www.ibm.com/docs/en/storage-ceph/7?topic=considerations-colocating-c... https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/ce.... https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-si... https://docs.mirantis.com/mcp/q4-18/mcp-ref-arch/openstack-environment-plan/.... https://tracker.ceph.com/projects/ceph/wiki/How_Many_OSDs_Can_I_Run_per_Host https://docs.ceph.com/en/reef/start/os-recommendations/ https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/7/html-si... https://docs.ceph.com/en/latest/rbd/rbd-openstack/ Best regards
Francesco Di Nucci
On 19/03/24 08:05, Eugen Block wrote:
Hi Franck, sorry for my late response, the last days have been quite busy. Indeed, several minutes to spawn a new VM is long, as others already suggested it makes sense to verify where that time is spent. I don't have too many insights in budget and hardware planning so I can't really help with that. But if HA is not an issue you could go for a hyperconverged setup and colocate all the services. That would require some powerful servers, not sure if that fits into your budget though. Some of our customers have their hardware vendors which have a catalog for different use-cases (storage, hypervisors, etc) to choose from, do you have such an option as well?
Zitat von Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr> <mailto:franck.vedel@univ-grenoble-alpes.fr>:
Hello and thanks for your help. Its very interesting to see your response.
first, excuse my english… google translate help me a lot…
I understand the difficulty of the question (build a ne cluster, it's difficult to understand the different solutions for configuring a cluster, there are so many parameters. What is certain is that I don't need HA. Actually, with my 3 nodes cluster, 2 nodes are controllers and network, 3 nodes are compute, one is storage (with an iscsi bay). All vcpus are used so I need to delete some projects before starting a new lab with students.
I just tried to build a test open stack cluster with 6 nodes, with a Ceph cluster (so openstack on the same servers). Ceph is used with cinder. Instance creation is slow. For example, if I create a 20G Windows instance (with volume on a Ceph cluster), it takes 6 minutes (so if I put 30 students in parallel doing this operation, it is very long, too long). If I don't use a volume, same thing because the Ceph cluster has a "vms" pool in use.
On my current production cluster, the same instance creation operation without a volume (ephemeral disk) is fast, but I do not have enough disk (800G) on each server, and no possibility to add disks. What I need is a solution that allows me to quickly create instances (including Windows) from 20 to 40G on ephemeral storage, but that I can also use for certain projects to create images from snapshots, so I also need a solution with volumes.
In short... it's still complicated, because I do all this in addition to my work, and I don't have all the time I would like for it. Let's imagine a budget of 100,000 euros. How would you build a cluster for 250 students who would do labs building network configurations and creating and connecting instances — so nothing complicated — wanting instance creation to be fast? How many nodes, and what distribution of roles? Just to get some ideas…
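[Editor's note: a hypothetical sizing sketch for the "250 students" question. Every number below — VMs per student, flavor sizes, oversubscription ratios, the compute-node spec — is an assumption to adjust for the actual courses, not a recommendation.]

```python
import math

# Assumed per-student lab footprint
students = 250
vms_per_student = 3       # e.g. a router plus two hosts per lab
vcpu_per_vm = 2
ram_gib_per_vm = 4
disk_gib_per_vm = 30      # ephemeral disk, for 20-40G Windows images

cpu_ratio = 4.0           # assumed nova cpu_allocation_ratio; lab VMs idle a lot
ram_ratio = 1.0           # RAM is rarely safe to oversubscribe

threads_needed = students * vms_per_student * vcpu_per_vm / cpu_ratio
ram_needed = students * vms_per_student * ram_gib_per_vm / ram_ratio
disk_needed = students * vms_per_student * disk_gib_per_vm

# Assumed compute node: 2x 32-core CPUs (128 threads), 512 GiB RAM
node_threads, node_ram = 128, 512
nodes = max(math.ceil(threads_needed / node_threads),
            math.ceil(ram_needed / node_ram))

print(f"physical threads needed: {threads_needed:.0f}")
print(f"RAM needed: {ram_needed:.0f} GiB, local disk: {disk_needed / 1024:.1f} TiB")
print(f"=> at least {nodes} compute nodes of this size (plus control/storage)")
```

With these assumptions the design is RAM-bound (about 3 TiB total), not CPU-bound — which is typical for idle lab VMs and a useful thing to check before choosing server configurations.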
Franck VEDEL
Hi Eugen, thanks for your help. Over the last 2 weeks, I have set up a small cluster with a few servers (Dell T350, so really small servers), completely changing the way storage and image management work, using ephemeral disks, because for most projects it's enough. And there, in a few seconds, I deploy 30 Windows 10 or 2019 Server instances. I now have to properly distribute the resources of the future cluster — that is to say the control, network, compute and storage roles — on bare-metal servers or VMs... but I am happy, I am moving forward. Thanks again. Franck VEDEL
That's good news! Zitat von Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr>:
Hi Franck, check out this spreadsheet in LibreOffice; it might give you some idea of the sizing for your infrastructure. https://github.com/noslzzp/cloud-resource-calculator/tree/master I'm not sure if there are similar tools in non-spreadsheet form to help with the sizing; it would be nice to know, in case. Best, M.
Hi… interesting… thanks a lot!! Franck
Bonjour Franck! It would be interesting to narrow down where you are reaching the limits of OpenStack:
- Are you running out of vCPU/RAM due to too many VMs?
- Are you seeing CPU steal time due to oversubscribing the hosts?
- Are the VMs oversized?
- Are you running out of capacity on the network side?
- Are you running out of capacity on the storage side?
- Is the OpenStack control plane reaching its limits? That would be surprising considering the number of computes you have.
At such a "small" scale, some of the OpenStack overhead might be tricky to deal with.
- Running Controllers + Compute + Storage on the same server typically means a "hyperconverged" setup.
- I am not sure if there are community-run deployments that will support it out of the box. It's typically complex to deploy/support/maintain, and often part of a commercial support package.
Have you looked at other virtualization platforms like Proxmox?
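[Editor's note: Laurent's first two questions — vCPU exhaustion and steal time — come down to one number, the effective oversubscription ratio. A minimal sketch; the allocation counts in the loop are hypothetical, and 16.0 was Nova's long-standing default `cpu_allocation_ratio`.]

```python
def oversubscription(allocated_vcpus: int, physical_threads: int) -> float:
    """Ratio of vCPUs promised to VMs vs. hardware threads actually available."""
    return allocated_vcpus / physical_threads

# Franck's cluster reports 288 vCPUs and "all vcpus are used". If those 288
# are physical threads, Nova's traditional default ratio of 16.0 would let
# the scheduler promise far more than the hardware can deliver:
print(288 * 16.0)  # theoretical ceiling with the default ratio

# Ratios well above ~4-5 on busy lab VMs tend to show up as steal time
# inside the guests (the %st column in top).
for allocated in (288, 576, 1440):
    r = oversubscription(allocated, 288)
    print(f"{allocated} vCPUs allocated on 288 threads -> ratio {r:.1f}")
```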
In terms of hyperconverged environments, OpenStack-Helm is a good candidate if you want to do it. --Karl.
From: Laurent Dumont <laurentfdumont@gmail.com> Date: Tuesday, 12 March 2024 at 2:19 pm To: Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr> Cc: openstack-discuss@lists.openstack.org Subject: Re: Questi
Bonjour Laurent, and thanks a lot for your help.
Are you running out of vCPU/RAM due to too many VMs?
Yes… it’s one of my problems.
Have you looked at other virtualization platforms like Proxmox?
I know Proxmox well, I also use it for other labs. But here, it's about creating networks, routers, connecting them, taking control remotely, playing with security groups, using Designate. I'm already going to stop using volumes so much, which are not useful to me most of the time. What I'm looking for, in addition to the ability to manage instances for my 200 students, is for it to be fast, for example I'm looking to optimize the time it takes to create Windows instances. When there are around thirty of them at the same time, that poses problems for me. I'm still studying all of this. Thank you anyway. Franck VEDEL Dép. Réseaux Informatiques & Télécoms IUT1 - Univ GRENOBLE Alpes 0476824462 Stages, Alternance, Emploi.
Bonjour Franck, good to know! If the end goal is a lab for folks to get familiar with OpenStack, it makes sense to keep the platform.
From an image creation perspective, you might be overloading the IO on the SAN. Like Peter mentioned, .raw and .qcow2 images behave differently when it comes to creation/clone.
- Are you creating an instance from a snapshot?
- Are you creating an instance from an image?
- What is the format used?
From a speed perspective:
- If storage reliability/redundancy is not a concern.
- If losing data is not the end of the world.
- If you have local storage on the compute and it's based on SSD/NVMe.
You could try ephemeral storage?
- It's basically placing the qemu instance disk directly on the compute.
- No need for ceph/iscsi or more servers, just add a couple of drives to the compute itself.
- There is some level of instance caching --> https://docs.openstack.org/nova/latest/user/support-matrix.html#operation_ca...
Laurent
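[Editor's note: a rough model of the instance-cache effect Laurent mentions. With local qcow2 ephemeral disks, only the first instance on a compute node pays the image download; later instances are thin copy-on-write overlays on the cached base image. The download time and per-overlay cost below are assumed figures, consistent with pulling a 20G image over 1 GbE.]

```python
def boot_wall_time(n_instances: int, n_computes: int,
                   download_s: float, overlay_s: float = 2.0) -> float:
    """Worst-case wall time: one image download per compute node (in parallel
    across nodes), then one cheap qcow2 overlay per instance on that node."""
    per_compute = -(-n_instances // n_computes)   # ceil division
    return download_s + per_compute * overlay_s

download_20g = 180.0  # assumed: ~3 min to pull a 20G image over 1 GbE

print(boot_wall_time(30, 5, download_20g))  # cold cache: 30 students, 5 computes
print(boot_wall_time(30, 5, 0.0))           # warm cache: image already local
```

Under these assumptions the first lab session of the day costs roughly the image download, and every later mass-boot is seconds — which matches Franck's later report of deploying 30 Windows instances "in a few seconds" once he switched to ephemeral disks.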
Laurent, thank you very much for all this help. My current OpenStack platform is built for labs, so not saving data is not a problem. We show a lot of interesting things on this cluster; the students really appreciate it. They also appreciate labs (virtual or on physical servers) where we build a small OpenStack (first all-in-one, with Keystone and an AD link, with TLS, then multinode, Designate, etc.). But making the right hardware choices for a new cluster... it's not the same thing at all. It's really difficult: if we don't make the right choices, we live with them for a long time. As explained in your message, I will turn to ephemeral storage on SSD disks on the cluster servers. I know the principles of raw and qcow2 images, and I also have a cache set up for using these images. But I'm using it wrong, I know. This will be corrected on the next cluster. Franck
participants (7)
- Eugen Block
- Francesco Di Nucci
- Franck VEDEL
- Karl Kloppenborg
- Laurent Dumont
- Matteo Piccinini
- Peter Sabaini