properly sizing openstack controlplane infrastructure
Hi everyone,

A colleague and I have recently been tasked with redesigning my employer's OpenStack infrastructure, and it turned out that starting over will be easier than fixing the existing stack, as we've managed to paint ourselves into a corner quite thoroughly.

The requirements we've been given are basically "here are 50 compute nodes, make sure whatever you're building scales upwards from there". We have the existing Pike stack as a reference, but we don't really know how the different services scale with the number of compute nodes they have to handle. The Pike stack has three servers as control plane, each of them with 96 GB of RAM, and they don't seem to have much room left when coordinating 14 compute nodes.

We're thinking about splitting the control nodes into infrastructure (db/rabbit/memcache) and API. What would I want to look for when sizing those control nodes? I've not been able to find any references for this at all, just rather nebulous "8G RAM should do", which is around what our rabbit currently inhales.

Also: we're currently running Neutron in an OVS-DVR-VXLAN configuration. Does that properly scale up to and above 50+ nodes, or should we look into offloading the networking onto something else?

I'm aware that these questions may sound a bit funny, but we have to spec out the control-plane hardware before we can start testing, and we'd prefer not to retrofit the servers because we goofed up.

Thanks for your time,

--
Cheers, Hardy
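There is no general formula for this, but a spreadsheet-style estimate is a common starting point. The Python sketch below illustrates the shape of such a back-of-the-envelope calculation; every per-service figure and the per-compute-node increment are placeholder assumptions for illustration, not measured or recommended values, so substitute numbers observed in your own cloud.

#!/usr/bin/env python3
"""Back-of-the-envelope control-plane RAM estimate.

All figures below are illustrative assumptions, not official
guidance -- replace them with measurements from your own cloud.
"""

# Assumed resident memory per service group, in GiB (placeholders).
BASE_SERVICES_GIB = {
    "mysql/galera": 8.0,
    "rabbitmq": 8.0,
    "memcached": 2.0,
}
API_SERVICES_GIB = {
    "nova (api/scheduler/conductor)": 6.0,
    "neutron-server": 4.0,
    "keystone/glance/placement": 4.0,
}

# Assumption: API-layer memory grows roughly linearly with compute
# node count, driven by RPC traffic and DB connections.
PER_COMPUTE_NODE_GIB = 0.25
HEADROOM = 1.5  # 50% headroom for spikes and upgrades


def estimate_controller_gib(n_compute: int) -> float:
    """Estimate RAM (GiB) per controller for n_compute compute nodes."""
    fixed = sum(BASE_SERVICES_GIB.values()) + sum(API_SERVICES_GIB.values())
    return (fixed + PER_COMPUTE_NODE_GIB * n_compute) * HEADROOM


if __name__ == "__main__":
    for n in (14, 50, 100, 200):
        print(f"{n:4d} compute nodes -> ~{estimate_controller_gib(n):.0f} GiB per controller")

The point of writing it down this way is less the absolute numbers than making the scaling assumptions explicit, so they can be checked against reality once test hardware exists.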
I've run that same network config at about 70 nodes with no problems. I've run the same without DVR at 150 nodes.

Your memory usage seems very high. I ran 150 nodes with a small 16 GB server ages ago. Might double-check that.

Thanks,
Kevin
On 30.04.2019, Fox, Kevin M wrote:
I've run that same network config at about 70 nodes with no problems. I've run the same without DVR at 150 nodes.
Your memory usage seems very high. I ran 150 nodes with a small 16 GB server ages ago. Might double-check that.
That's what I was thinking as well, but it doesn't match up with what we currently have at all. I'll need to figure out what went wrong here.

--
Cheers, Hardy
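A quick way to see where the memory on a controller actually goes is to sum resident set size per process name. Below is a minimal sketch using ps(1); note that RSS double-counts pages shared between worker processes, so the figures are an upper bound:

#!/usr/bin/env python3
"""Group resident memory by process name on a controller node."""
import subprocess
from collections import defaultdict


def rss_by_command() -> dict:
    """Return total RSS in KiB per command name, as reported by ps."""
    out = subprocess.check_output(["ps", "-eo", "rss=,comm="], text=True)
    totals = defaultdict(int)
    for line in out.splitlines():
        rss, _, comm = line.strip().partition(" ")
        if rss.isdigit():
            totals[comm.strip()] += int(rss)
    return totals


if __name__ == "__main__":
    top = sorted(rss_by_command().items(), key=lambda kv: -kv[1])[:15]
    for comm, kib in top:
        print(f"{kib / 1024 / 1024:8.2f} GiB  {comm}")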
On Tuesday, April 30, 2019, Hartwig Hauschild <openstack@hauschild.it> wrote:
The requirements we've been given are basically "here are 50 compute nodes, make sure whatever you're building scales upwards from there".
It depends on your end goal: 100? 500? More than 1000 nodes? At some point things like Nova Cells will help (or become a necessity).
The Pike stack has three servers as control plane, each of them with 96 GB of RAM, and they don't seem to have much room left when coordinating 14 compute nodes.
96 GB of RAM per controller is much more than enough for 14 compute nodes. There's room for improvement in configuration.
We're thinking about splitting the control nodes into infrastructure (db/rabbit/memcache) and API.
What would I want to look for when sizing those control nodes? I've not been able to find any references for this at all, just rather nebulous "8G RAM should do", which is around what our rabbit currently inhales.
You might want to check out the Performance Docs: https://docs.openstack.org/developer/performance-docs/

For configuration tips, I'd suggest looking at what openstack-ansible or similar projects provide as "battle-tested" configuration. It's a good baseline to reference before doing your own tuning.

-Daniel
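One concrete knob those baseline configurations usually touch: most OpenStack API services default their worker counts to the number of CPU cores, so on a many-core controller each service forks dozens of processes and memory use balloons. Here is a sketch that prints capped worker settings; the cores/4 heuristic is an assumption for illustration, not a recommendation, though the option names (osapi_compute_workers, metadata_workers, api_workers, rpc_workers) are real:

#!/usr/bin/env python3
"""Print capped worker-count settings for a many-core controller."""
import os


def capped_workers(cores: int) -> int:
    # Assumed heuristic: a quarter of the cores, but at least two.
    return max(2, cores // 4)


if __name__ == "__main__":
    cores = os.cpu_count() or 8
    w = capped_workers(cores)
    print(f"# {cores} cores detected -> {w} workers per service")
    print("# nova.conf")
    print(f"[DEFAULT]\nosapi_compute_workers = {w}\nmetadata_workers = {w}")
    print(f"[conductor]\nworkers = {w}")
    print("# neutron.conf")
    print(f"[DEFAULT]\napi_workers = {w}\nrpc_workers = {w}")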
On 30.04.2019, Daniel Speichert wrote:
The requirements we've been given are basically "here are 50 compute nodes, make sure whatever you're building scales upwards from there".
It depends on your end goal: 100? 500? More than 1000 nodes? At some point things like Nova Cells will help (or become a necessity).
I really hope it won't go that high, but splitting into cells or AZs/regions is definitely planned if it does.
The Pike stack has three servers as control plane, each of them with 96 GB of RAM, and they don't seem to have much room left when coordinating 14 compute nodes.
96 GB of RAM per controller is much more than enough for 14 compute nodes. There's room for improvement in configuration.
We're thinking about splitting the control nodes into infrastructure (db/rabbit/memcache) and API.
What would I want to look for when sizing those control nodes? I've not been able to find any references for this at all, just rather nebulous "8G RAM should do", which is around what our rabbit currently inhales.
You might want to check out the Performance Docs: https://docs.openstack.org/developer/performance-docs/
For configuration tips, I'd suggest looking at what openstack-ansible or similar projects provide as "battle-tested" configuration. It's a good baseline to reference before doing your own tuning.
Problem is: for all I know this is a non-tuned openstack-ansible setup. I guess I'll have to figure out why it's using way more memory than it should (and runs out every now and then).

Thanks,

--
cheers, Hardy
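Since RabbitMQ is the suspected memory hog, its management plugin gives a direct answer: the /api/nodes endpoint reports each node's mem_used against its configured high watermark (mem_limit). A minimal sketch; the host and credentials are placeholders for whatever the deployment actually uses:

#!/usr/bin/env python3
"""Check RabbitMQ memory use via the management plugin's HTTP API."""
import requests

RABBIT_URL = "http://controller1:15672"  # placeholder host
AUTH = ("guest", "guest")                # placeholder credentials


def print_node_memory(base_url: str = RABBIT_URL) -> None:
    resp = requests.get(f"{base_url}/api/nodes", auth=AUTH, timeout=10)
    resp.raise_for_status()
    for node in resp.json():
        used = node["mem_used"] / 1024 ** 3
        limit = node["mem_limit"] / 1024 ** 3
        print(f"{node['name']}: {used:.1f} GiB used / {limit:.1f} GiB watermark")


if __name__ == "__main__":
    print_node_memory()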
On 4/30/19 5:30 PM, Hartwig Hauschild wrote:
Also: we're currently running Neutron in an OVS-DVR-VXLAN configuration. Does that properly scale up to and above 50+ nodes
It does, that's not the bottleneck.
In my experience, 3 heavy control nodes are really enough to handle 200+ compute nodes, though what you're suggesting (moving db & rabbitmq-server onto separate nodes) is a very good idea.
Cheers, Thomas Goirand (zigo)
On 01.05.2019, Thomas Goirand wrote:
On 4/30/19 5:30 PM, Hartwig Hauschild wrote:
Also: we're currently running Neutron in an OVS-DVR-VXLAN configuration. Does that properly scale up to and above 50+ nodes
It does, that's not the bottleneck.
Oh, OK. I've read that OVS-DVR-VXLAN will produce a lot of load on the messaging system, at least if you enable l2-pop and don't run broadcast.
In my experience, 3 heavy control nodes are really enough to handle 200+ compute nodes, though what you're suggesting (moving db & rabbitmq-server onto separate nodes) is a very good idea.
Ah, cool. Then I'll head that way and see how it works out (and how many add-on services it can take).

--
cheers, Hardy
On 5/2/19 4:21 PM, Hartwig Hauschild wrote:
On 01.05.2019, Thomas Goirand wrote:
On 4/30/19 5:30 PM, Hartwig Hauschild wrote:
Also: we're currently running Neutron in an OVS-DVR-VXLAN configuration. Does that properly scale up to and above 50+ nodes
It does, that's not the bottleneck.
Oh, OK. I've read that OVS-DVR-VXLAN will produce a lot of load on the messaging system, at least if you enable l2-pop and don't run broadcast.
Yes, but that's really not a big problem for a 200+ node setup, especially if you dedicate 3 nodes to messaging.
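For anyone who wants to see that messaging load rather than take it on faith, the management API's /api/overview endpoint exposes aggregate publish and deliver rates, which makes it easy to watch what toggling l2-pop or adding compute nodes does. A minimal polling sketch, again with placeholder host and credentials:

#!/usr/bin/env python3
"""Poll aggregate RabbitMQ message rates via the management API."""
import time

import requests

RABBIT_URL = "http://controller1:15672"  # placeholder host
AUTH = ("guest", "guest")                # placeholder credentials


def sample_rates(base_url: str = RABBIT_URL) -> tuple:
    resp = requests.get(f"{base_url}/api/overview", auth=AUTH, timeout=10)
    resp.raise_for_status()
    # Counters that have never ticked are omitted, hence the defaults.
    stats = resp.json().get("message_stats", {})
    pub = stats.get("publish_details", {}).get("rate", 0.0)
    dlv = stats.get("deliver_get_details", {}).get("rate", 0.0)
    return pub, dlv


if __name__ == "__main__":
    while True:
        pub, dlv = sample_rates()
        print(f"publish: {pub:8.1f} msg/s   deliver: {dlv:8.1f} msg/s")
        time.sleep(5)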
participants (4)
- Daniel Speichert
- Fox, Kevin M
- Hartwig Hauschild
- Thomas Goirand