[ops][largescale-sig] How many compute nodes in a single cluster ?
Hi everyone,

As part of the Large Scale SIG[1] activities, I'd like to quickly poll our community on the following question:

How many compute nodes do you feel comfortable fitting in a single-cluster deployment of OpenStack, before you need to scale it out to multiple regions/cells/..?

Obviously this depends on a lot of deployment-dependent factors (type of activity, choice of networking...) so don't overthink it: a rough number is fine :)

[1] https://wiki.openstack.org/wiki/Large_Scale_SIG

Thanks in advance,

-- 
Thierry Carrez (ttx)
Hey all,

I will start the answers :)

At OVH, our hard limit is around 1500 hypervisors per region. It also depends a lot on the number of instances (and Neutron ports). The effects if we try to go above this number:
- load on the control plane (DB/RabbitMQ) increases a lot
- "burst" load is hard to manage (e.g. restarting all neutron agents or nova-compute services puts high pressure on the control plane)
- and of course, the failure domain is bigger

Note that we don't use cells. We are deploying multiple regions, but this is painful for our clients to manage and understand. We are looking for a solution to unify the regions, but we have not found anything that fits our needs so far.

Cheers,

-- 
Arnaud Morin

On 28.01.21 - 14:24, Thierry Carrez wrote:
> How many compute nodes do you feel comfortable fitting in a single-cluster deployment of OpenStack, before you need to scale it out to multiple regions/cells/..?
On Tue, 2021-02-02 at 17:37 +0000, Arnaud Morin wrote:
> At OVH, our hard limit is around 1500 hypervisors per region. It also depends a lot on the number of instances (and Neutron ports). [...]
> Note that we don't use cells. We are deploying multiple regions, but this is painful for our clients to manage and understand. [...]
I assume you do not see cells v2 as a replacement for multiple regions because they do not provide independent fault domains, and also because they are only a Nova feature, so they do not solve scaling issues in other services like Neutron, which are stretched across all cells.

Cells are a scaling mechanism, but the larger the cloud, the harder it is to upgrade, and cells do not help with that; in fact, by adding more controllers they hinder upgrades. Separate regions can all be upgraded independently and can be fault tolerant, if you don't share services between regions and use federation to avoid sharing Keystone.

Glad to hear you can manage 1500 compute nodes, by the way. The old value of 500 nodes max has not been true for a very long time; RabbitMQ and the DB still tend to be the bottleneck when scaling beyond 1500 nodes, however, outside of the operational overhead.
On Tuesday, February 2, 2021 9:50 AM, Sean Mooney <smooney@redhat.com> wrote:
> the old value of 500 nodes max has not been true for a very long time; RabbitMQ and the DB still tend to be the bottleneck to scale beyond 1500 nodes, though, outside of the operational overhead.
We manage our scale with regions as well. With 1k nodes our RabbitMQ isn't breaking a sweat, and there are no signs that the database would be hitting any limits. Our issues have been limited to scaling Neutron, and to VM scheduling in Nova, mostly due to NUMA pinning.
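For context on why NUMA pinning makes scheduling heavier: pinned guests are requested through flavor extra specs and placed by the NUMATopologyFilter, which has to evaluate per-host NUMA topologies for every candidate. A rough sketch follows (flavor name, sizes and the exact filter list are illustrative, not from this thread):

    # request dedicated (pinned) CPUs confined to a single NUMA node
    openstack flavor create numa.pinned --vcpus 8 --ram 16384 --disk 40
    openstack flavor set numa.pinned \
        --property hw:cpu_policy=dedicated \
        --property hw:numa_nodes=1

    # nova.conf on the scheduler - NUMATopologyFilter must be in the filter list
    # (merge with whatever filters you already enable)
    [filter_scheduler]
    enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,NUMATopologyFilter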
I am not sure simply going off the number of compute nodes is a good representation of scaling issues. I think it has a lot more to do with density/networks/ports and the rate of churn in the environment, but I could be wrong. For example, I only have 80 high-density computes (64 or 128 CPUs, with ~400 instances per compute) and I run into the same scaling issues described in the Large Scale SIG, and I have to do a lot of tuning to keep the environment stable. My environment is also kind of unique in the way it gets used, as I have 2k to 4k instances torn down and rebuilt within an hour or two quite often, so my APIs are constantly bombarded.

On Tue, Feb 2, 2021 at 3:15 PM Erik Olof Gunnar Andersson <eandersson@blizzard.com> wrote:
> We manage our scale with regions as well. With 1k nodes our RabbitMQ isn't breaking a sweat, and no signs that the database would be hitting any limits. [...]
Yes, totally agree with that. On our side, we are used to monitoring the number of Neutron ports (and especially the number of ports in BUILD state).

As an instance usually has one port in our cloud, the number of instances is close to the number of ports.

About cells v2: we are mostly struggling on the Neutron side, so cells are not helping us.

-- 
Arnaud Morin

On 03.02.21 - 09:05, David Ivey wrote:
> I am not sure simply going off the number of compute nodes is a good representation of scaling issues. I think it has a lot more to do with density/networks/ports and the rate of churn in the environment. [...]
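On the port-monitoring point above, one crude way to watch for ports stuck in BUILD is to count them from the API; a sketch, assuming the Status column is present in your client's port list output:

    # count Neutron ports currently in BUILD state (run periodically, alert on growth)
    openstack port list -f value -c Status | grep -c BUILD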
On Wed, 2021-02-03 at 14:24 +0000, Arnaud Morin wrote:
> On our side, we are used to monitoring the number of Neutron ports (and especially the number of ports in BUILD state). [...]
> About cells v2: we are mostly struggling on the Neutron side, so cells are not helping us.
Ack, that makes sense.

There are some things you can do to help scale Neutron. One semi-simple step, if you are using ml2/ovs, ml2/linux-bridge or ml2/sriov-nic-agent, is to move Neutron to its own RabbitMQ instance. Neutron with the default ML2 drivers tends to be quite chatty, so placing it on its own rabbit instance can help. While it is in conflict with HA requirements, ensuring that clustering is not used, and instead load balancing with something like Pacemaker to a single RabbitMQ server, can also help. RabbitMQ's clustering, while improving HA by removing a single point of failure, decreases rabbit's performance, so if you have good monitoring and can simply restart or redeploy rabbit quickly (using k8s, or an active/backup deployment mediated by Pacemaker), that can work much better than actually clustering.

If you use ml2/ovn, that allows you to remove the need for the DHCP agent and the L3 agent, as well as the per-compute-host L2 agent. That significantly reduces Neutron RPC impact; however, OVN does have some parity gaps and scaling issues of its own. If it works for you, and you can use a new enough version that allows the OVN daemon on the compute nodes to subscribe to only the subset of north/southbound DB updates relevant to that node, it can help with scaling Neutron.

I'm not sure how the use of features like DVR or routed provider networks impacts this, as I mostly work on Nova now, but at least from a data plane point of view it can reduce contention on the networking nodes (where the L3 agents run) that do routing and NAT on behalf of all compute nodes.

At some point it might make sense for Neutron to take a similar cells approach in its own architecture, but given its ability to delegate some or all of the networking to an external network controller like OVN/ODL, it has never been clear that an in-tree sharding mechanism like cells was actually required.

One thing that I hope someone will have time to investigate at some point is whether we can replace RabbitMQ in general with NATS. This general topic comes up with different technologies from time to time; NATS, however, looks like it would actually be a good match in terms of features and intended use, while being much lighter weight than RabbitMQ, and it actually improves in performance the more NATS server instances you cluster, since that was a design constraint from the start.

I don't actually think Neutron's architecture, or Nova's for that matter, is inherently flawed, but a more modern messaging bus might help all the distributed services scale with fewer issues than they have today.
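As a minimal sketch of the "own RabbitMQ instance" suggestion: Neutron (server and agents) can simply be pointed at a different broker through its transport_url while Nova keeps the original one. Hostnames, credentials and vhosts below are placeholders:

    # neutron.conf (and the agents' config on compute/network nodes)
    [DEFAULT]
    transport_url = rabbit://neutron:SECRET@rabbit-neutron-01:5672/neutron

    # nova.conf stays on the original broker, e.g.
    # [DEFAULT]
    # transport_url = rabbit://nova:SECRET@rabbit-main-01:5672/nova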
Thanks for your reply, a lot of useful info!

We had already identified that using a separate rabbit cluster for Neutron could improve scalability.

About NATS, I have never tried this piece of software, but it definitely sounds like a good fit for large clouds.

On the RabbitMQ side, they have been working on a new kind of queue called "quorum" queues, which are HA by design. The documentation now recommends using quorum queues instead of classic queues with HA. Does anyone know if there is a chance that oslo_messaging will manage this kind of queue?

Besides rabbit, we also monitor our database cluster (we are using MariaDB with Galera) very carefully. There as well, we think that splitting the cluster into multiple deployments could help, but while that is easy to say, it is time-consuming to move an already running cloud to a new architecture :)

Regards,

-- 
Arnaud Morin

On 03.02.21 - 14:55, Sean Mooney wrote:
> one semi-simple step, if you are using ml2/ovs, ml2/linux-bridge or ml2/sriov-nic-agent, is to move Neutron to its own RabbitMQ instance. [...]
> one thing that I hope someone will have time to investigate at some point is whether we can replace RabbitMQ in general with NATS. [...]
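On the quorum-queue question: at the RabbitMQ level a quorum queue is just a durable queue declared with the x-queue-type argument, so the question is really whether the oslo.messaging driver declares its queues that way. A hedged sketch (the rabbit_quorum_queue option name is an assumption to verify against the oslo.messaging release in use):

    # RabbitMQ side: quorum queues are selected at declaration time and must be durable
    rabbitmqadmin declare queue name=demo_quorum durable=true \
        arguments='{"x-queue-type": "quorum"}'

    # oslo.messaging side (assumption - check whether your release exposes this option):
    # [oslo_messaging_rabbit]
    # rabbit_quorum_queue = true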
On Wed, 2021-02-03 at 09:05 -0500, David Ivey wrote:
> I am not sure simply going off the number of compute nodes is a good representation of scaling issues. I think it has a lot more to do with density/networks/ports and the rate of churn in the environment. [...] My environment is also kind of unique in the way it gets used, as I have 2k to 4k instances torn down and rebuilt within an hour or two quite often, so my APIs are constantly bombarded.

Actually, your environment sounds like a pretty typical CI cloud, where you often have short instance lifetimes, high density and large turnover. But you are correct: compute node scale alone is not a good indicator. Port, volume and instance counts are definitely factors, as is the workload profile. I'm just assuming your cloud is a CI cloud because, in terms of generic workload profiles, that seems to be the closest approximation I'm aware of to that type of creation and deletion within an hour.

400 instances per compute, while a lot, is really not that unreasonable, assuming your typical host has 1+ TB of RAM and you typically have fewer than 4-8 cores per guest. With only 128 CPUs, going much above that would be oversubscribing the CPUs quite heavily; we generally don't recommend exceeding about 4x oversubscription for CPUs, even though the default is 16, which is based on legacy reasoning that assumes web-hosting type workloads where the bottleneck is not CPU but disk and network I/O.

With 400 instances per host, that also equates to at least 400 Neutron ports. If you are using iptables, that's actually at least 1200 ports on the host, which definitely has scaling issues on agent restart or host reboot. Using the Python bindings for OVS can help a lot, as can changing to the OVS firewall driver, as that removes the Linux bridge and veth pair created for each Neutron port when doing hybrid plug.
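A rough sketch of the knobs referred to above, assuming ml2/ovs (section and option names follow upstream defaults; the ovsdb_interface option only exists on releases that still offer a choice between vsctl and native):

    # /etc/neutron/plugins/ml2/openvswitch_agent.ini on compute nodes
    [securitygroup]
    # native OVS firewall driver: no per-port linux bridge + veth pair (hybrid plug)
    firewall_driver = openvswitch

    [ovs]
    # use the python/native OVSDB bindings instead of shelling out to ovs-vsctl
    ovsdb_interface = native

    # nova.conf - keep CPU oversubscription modest instead of the legacy 16:1 default
    [DEFAULT]
    cpu_allocation_ratio = 4.0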
I never thought about it being like a CI cloud, but it would be very similar in usage. I should clarify that it is actually physical cores (AMD Epycs), so it's 128 and 256 threads, and yes, at least 1 TB of RAM per host, with Ceph shared storage. That 400 actually caps out at about 415 instances per compute (same cap for 64 and 128 CPUs), where I run into kernel/libvirt issues and nf_conntrack hits its limits and crashes. I don't have specifics to give at the moment regarding that issue; I will have to try to recreate/reproduce it once my other environment is freed up to let me test that again. I was in a hurry the last time it happened and did not get a chance to gather all the information for a bug.

Switching to the Python bindings with OVS, and some tuning of MariaDB, RabbitMQ, HAProxy and memcached, is how I got to be able to accommodate that rate of turnover.

On Wed, Feb 3, 2021 at 9:40 AM Sean Mooney <smooney@redhat.com> wrote:
> with 400 instances per host, that also equates to at least 400 Neutron ports; if you are using iptables, that's actually at least 1200 ports on the host, which definitely has scaling issues on agent restart or host reboot. [...]
> using the Python bindings for OVS can help a lot, as can changing to the OVS firewall driver [...]
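On the nf_conntrack crashes mentioned above: with iptables-based security groups, every tracked flow consumes a conntrack entry, so dense hosts usually need the table raised well above the default. Values below are illustrative only:

    # /etc/sysctl.d/99-conntrack.conf on compute nodes
    net.netfilter.nf_conntrack_max = 1048576

    # watch usage vs. the limit
    sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max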
We also have three different types of cloud deployments:

1. A large deployment with 12,000+ nodes.
2. A smaller deployment with much higher-density VMs (over-provisioned).
3. A small deployment with a high number of security groups.

All three have very different issues. In deployment one, the major issue is just the sheer number of updates from the nova-compute and Neutron agents. In deployment two, we suffer more from the sheer number of changes to things like Neutron ports. The third deployment struggles with the scalability of security groups.

Also worth mentioning is that things like Kubernetes (and highly parallel Terraform deployments, to some degree) posed new issues for our deployments, as either one can trigger millions of API calls per day, especially in cases where Kubernetes has gone "rogue" trying to recover from an unexpected state (e.g. a bad volume or a bad load balancer).

Best Regards, Erik Olof Gunnar Andersson
Technical Lead, Senior Cloud Engineer
Hi,

at CERN we have 3 regions with a total of 75 cells (>8000 compute nodes). In the past we had a cell with almost 2000 compute nodes. Now we try not to have more than 200 compute nodes per cell. We prefer to manage more, but smaller, cells.

Belmiro
CERN

On Thu, Jan 28, 2021 at 2:29 PM Thierry Carrez <thierry@openstack.org> wrote:
> How many compute nodes do you feel comfortable fitting in a single-cluster deployment of OpenStack, before you need to scale it out to multiple regions/cells/..?
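For readers wondering how additional cells like CERN's are carved out: a minimal sketch of registering a new cell v2 and mapping its hosts with nova-manage (names, URLs and credentials are placeholders):

    # run on a controller with admin credentials in nova.conf
    nova-manage cell_v2 create_cell \
        --name cell0815 \
        --transport-url rabbit://nova:SECRET@rabbit-cell0815:5672/ \
        --database_connection mysql+pymysql://nova:SECRET@db-cell0815/nova_cell0815

    # map new compute hosts into their cells
    nova-manage cell_v2 discover_hosts --verbose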
Hi,

in our case the limit is 280 compute nodes; maybe a bit less would be more comfortable, but that depends on the usage profile.

Br,
- Eki -
participants (7)
- Arnaud Morin
- Belmiro Moreira
- David Ivey
- Erik Olof Gunnar Andersson
- Peura, Erkki (Nokia - FI/Espoo)
- Sean Mooney
- Thierry Carrez