[largescale-sig]scaling story
Hi everyone, we want to share our scaling story. Please refer to the link: https://etherpad.opendev.org/p/large-scale-inspur Thanks, Alex Song
Hi Alex,
Thanks for the great write-up! I would love to see more of this.
How did you adjust the max number of connections for RabbitMQ? And for the relay, I assume you used https://docs.ovn.org/en/latest/tutorials/ovn-ovsdb-relay.html ?
Thanks
Mohammed

From: Alex Song (宋文平) <songwenping@inspur.com>
Date: Wednesday, June 12, 2024 at 4:43 AM
To: openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>
Subject: [largescale-sig]scaling story
That sounds very valuable for large-scale deployments! Thank you for this story.
Would you mind sharing more details / config params / code changes you made?
I am surprised by the max_connections = 100000, for example. We found on our side that having too many connections to the DB resulted in memory/fd exhaustion.
Also, one question about the Placement 1000-host limit: is it because you request to spawn 3k instances in one request?
Cheers,
Arnaud

On 13.06.24 - 11:47, Mohammed Naser wrote:
> Thanks for the great write up! I would love to see more of this.
Hi Arnaud,
For the first question: at the operating-system level, maintaining a connection typically takes 10-100 KB of memory, so 100,000 connections may use 1-10 GB, which will not exhaust memory. During our 3000-node large-scale test, the number of database connections peaked at around 15,000, and in the steady state it stayed around 4,500, so the connections themselves consume less than 1 GB of memory. Including the query cache, temporary tables and buffer pools, we calculated that the maximum memory usage of the database service process is around 25 GB, which is less than 10% of the total memory of the control node. Therefore, it will not cause memory exhaustion.
For the second question: yes, it's because we create 3k instances in one request. Placement returns 1000 allocation candidates by default, so we needed to increase the limit to 3000.
For deployment optimization, we modified the Ansible module on top of OpenStack-Helm to support user-defined configuration, making it more convenient to change OpenStack configuration parameters. Additionally, we optimized kube-apiserver load balancing in large-scale scenarios by adjusting the long-lived connection strategy of the kubelet client so that it reconnects randomly, keeping the load balanced across all management nodes.
The etherpad https://etherpad.opendev.org/p/large-scale-inspur shows more details / config params / code changes.
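The connection-memory arithmetic in the reply above can be checked with a short Python sketch. It only multiplies connection counts (the 100,000 max_connections setting, the ~15,000 peak and the ~4,500 steady state from the reply) by the quoted 10-100 KB per connection, read here as KiB. It illustrates the estimate rather than measuring anything, and it deliberately leaves out the query cache, temporary tables and buffer pools.

```python
"""Back-of-the-envelope estimate of memory held by DB connection state.

Uses the 10-100 KB per-connection range quoted in the thread (read as KiB).
Illustrative only: per-query buffers, caches and the buffer pool are ignored.
"""

KIB = 1024
GIB = 1024 ** 3


def connection_memory_gib(connections: int, per_conn_kib: float) -> float:
    """Approximate memory (GiB) for `connections` at `per_conn_kib` KiB each."""
    return connections * per_conn_kib * KIB / GIB


def main() -> None:
    scenarios = [
        ("max_connections", 100_000),
        ("observed peak", 15_000),
        ("steady state", 4_500),
    ]
    for label, conns in scenarios:
        low = connection_memory_gib(conns, 10)
        high = connection_memory_gib(conns, 100)
        print(f"{label:>16}: {conns:>7} conns -> {low:.2f} to {high:.2f} GiB")


if __name__ == "__main__":
    main()
```

With these inputs, max_connections maps to roughly 0.95-9.5 GiB, the peak to roughly 0.14-1.43 GiB, and the steady state to roughly 0.04-0.43 GiB, depending on which end of the per-connection range applies.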
Hello Alex,
thanks for the writeup! A few comments and questions from me:

*Database settings*
It seems dangerous to me to set 'max_connections' so high (max 64K if you only connect to one IP).
Every database connection requires resources on the operating system side at the very least. As far as I know, MariaDB allocates additional memory per active thread (e.g. sort_buffer_size <https://mariadb.com/docs/server/ref/mdb/system-variables/sort_buffer_size/>, join_buffer_size <https://mariadb.com/docs/server/ref/mdb/system-variables/join_buffer_size/>). In extreme situations, this can cause MariaDB to run out of memory because of cgroup resource limits (with the MariaDB defaults, 10k connections need ~21 GB for the threads alone), or cause the memory requirement to grow so large that the OOM killer is triggered on the node. That is not a good thing for a database, especially when the OOM killer terminates it with SIGKILL.
Furthermore, other limitations in the setup (I/O hardware limits, ulimits or other configuration parameters) could mean that many thousands of connections are active in parallel but are slowed down or even blocked/starved as a result. Ultimately, this will at least hurt response times, but it can also cause more serious problems, for example the server blocking its operation. These horror scenarios only occur in extreme situations, but it is precisely in those situations that these settings are particularly dangerous, in my opinion.

*RabbitMQ*
I have also wondered what the limit for RabbitMQ is and whether there are potential difficulties here. As far as I know, the maximum number is set automatically, for example by the ulimit or the Erlang port limit applicable to the process (see also https://www.rabbitmq.com/docs/networking#tuning-for-large-number-of-connections). What was your initial limit?
In my setup, there are already a lot:
$ docker exec -ti rabbitmq /bin/bash -c 'pgrep beam.smp | xargs -I PID grep -H "Max open files" /proc/PID/limits'
/proc/22/limits:Max open files 1048576 1048576 files
$ docker exec -ti rabbitmq rabbitmqctl eval 'erlang:system_info(port_limit).'
65536
Apart from the few needed for internal file system access, almost all of these can be used for TCP connections.
What I would like to know: how many connections can it handle in times of very high load? Do you have monitoring data from the situation you resolved?
The RabbitMQ documentation mentioned above describes some interesting approaches; it probably makes sense to discuss this in more detail in a dedicated mail thread.
Regards
Marc
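Marc's two shell checks can also be scripted. The Python sketch below is an assumed equivalent of the `pgrep beam.smp | xargs ... grep "Max open files"` pipeline: it scans /proc for processes named beam.smp and parses the soft and hard "Max open files" values. The `rough_connection_ceiling` helper is hypothetical and rests on the simplifying assumption that each AMQP connection consumes one file descriptor and one Erlang port, so the smaller of the two limits bounds the connection count. It is Linux-only and would need to run inside the RabbitMQ container.

```python
"""Inspect the open-file limit of RabbitMQ's Erlang VM (beam.smp) on Linux.

A sketch of the `pgrep beam.smp | xargs ... grep "Max open files"` check,
reading /proc directly instead of shelling out.
"""
from pathlib import Path
from typing import Iterator, Optional, Tuple

PREFIX = "Max open files"


def _to_limit(value: str) -> Optional[int]:
    """Convert a /proc limits field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) 'Max open files' values from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(PREFIX):
            soft, hard = line[len(PREFIX):].split()[:2]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError("no 'Max open files' line found")


def beam_pids() -> Iterator[int]:
    """Yield PIDs whose command name is exactly beam.smp."""
    for comm in Path("/proc").glob("[0-9]*/comm"):
        try:
            if comm.read_text().strip() == "beam.smp":
                yield int(comm.parent.name)
        except OSError:
            continue  # process exited or is not readable


def rough_connection_ceiling(fd_soft_limit: int, erlang_port_limit: int) -> int:
    """Hypothetical bound, assuming one fd and one Erlang port per connection."""
    return min(fd_soft_limit, erlang_port_limit)


if __name__ == "__main__":
    for pid in beam_pids():
        try:
            limits = Path(f"/proc/{pid}/limits").read_text()
        except OSError:
            continue  # process went away between listing and reading
        print(pid, parse_max_open_files(limits))
```

With the values from Marc's setup (1048576 open files, port limit 65536), the helper returns 65536, so under that assumption the Erlang port limit rather than the fd limit would be reached first.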
Hi Marc,
*Database settings*
10k DB connections consume up to 21 GB of memory, which accounts for only 10% of the server's memory in our environment, so there is no OOM risk.
*RabbitMQ*
The maximum number of RabbitMQ connections is 20000, a figure we obtained by testing in the 3000-node environment.
Thanks
Alex

From: Marc Schoechlin [mailto:ms@256bit.org]
Sent: 17 October 2024 20:02
To: openstack-discuss@lists.openstack.org
Subject: Re: [largescale-sig]scaling story
Hello Alex,
On 21.10.24 at 08:06, Alex Song (宋文平) wrote:
*Database settings*
10k DB connections consume up to 21G of memory, which only accounts for 10% of the server's memory in our env and will not cause OOM risk.
I think that's a lot, and I would size it differently if it were my setup. The calculation only covers the working memory that the connections roughly use. It does not include, for example, the buffer pool of the database itself. In most cases, there are also processes outside the database that could use a lot of memory.
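Both the ~21 GB figure for 10k connections and Marc's point that the buffer pool comes on top can be made concrete with the Python sketch below. It assumes the MariaDB defaults of 2 MiB for sort_buffer_size and 256 KiB for join_buffer_size, counts only those two per-thread buffers, and treats every connection as having both allocated at once, so it gives a worst-case upper bound rather than a prediction. The 16 GiB buffer pool in the example is a hypothetical value, not one from the thread.

```python
"""Worst-case MariaDB memory estimate: global buffer pool + per-thread buffers.

Assumptions (check against your own server variables):
  * sort_buffer_size = 2 MiB   (MariaDB default)
  * join_buffer_size = 256 KiB (MariaDB default)
  * every connection has both buffers allocated at the same time.
Other per-thread allocations (thread_stack, read buffers, ...) are ignored.
"""

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

SORT_BUFFER_SIZE = 2 * MIB
JOIN_BUFFER_SIZE = 256 * KIB


def worst_case_gib(connections: int,
                   buffer_pool_bytes: int = 0,
                   per_thread_bytes: int = SORT_BUFFER_SIZE + JOIN_BUFFER_SIZE) -> float:
    """Upper bound (GiB) on buffer pool plus per-connection buffers."""
    return (buffer_pool_bytes + connections * per_thread_bytes) / GIB


if __name__ == "__main__":
    print(f"threads only, 10k conns: {worst_case_gib(10_000):.2f} GiB")
    # Hypothetical 16 GiB innodb_buffer_pool_size, for illustration only.
    print(f"with 16 GiB pool:        {worst_case_gib(10_000, 16 * GIB):.2f} GiB")
```

Under these assumptions, 10k connections come to about 21.97 GiB for the two buffers alone, and a 16 GiB buffer pool raises the bound to about 37.97 GiB. Because MariaDB allocates these buffers on demand, real usage is normally far lower; the value of the calculation is in showing how quickly the bound grows with max_connections.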
*RabbitMQ*
The maximum number of RabbitMQ connections is 20000, which is obtained by test in the 3000 node environment.
That is interesting. What was the limiting factor? Usually the number of available file descriptors is significantly higher for the RabbitMQ process (see /proc/<pid>/limits).
Incidentally, I have had good experiences with RabbitMQ PerfTest during performance tests recently (https://perftest.rabbitmq.com/).
An example:

URL="amqp://openstack:mypassword@10.10.21.12:5672"
docker run -it --net host --rm pivotalrabbitmq/perf-test:latest \
  --queue-pattern-from 1 --queue-pattern-to 500 \
  --producers 500 --consumers 15 \
  --variable-size 1000:30 \
  --variable-size 10000:20 \
  --variable-size 5000:45 \
  --quorum-queue --queue perftest \
  --uri "$URL"

Respectful regards
Marc
I think that if the number of connections increases, it makes sense to run a separate server dedicated to the database itself, within the same network; disk write performance does matter.
On Mon, 21 Oct 2024, 23:59 Marc Schoechlin, <ms@256bit.org> wrote:
> I think that's a lot and I would size it differently if it were my setup.
0 *H÷ 010 +0 *H÷ $ åContent-Type: multipart/alternative; boundary="----=_NextPart_000_01ED_01DB2710.4F8B4990" This is a multipart message in MIME format. ------=_NextPart_000_01ED_01DB2710.4F8B4990 Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: 8bit Hello Marc, Sorry for late response. Database: One DB connection consumes about 100KB memory, in actual testing, the max connections is up to 15K, and including the buffer pool, the database consumes up to 20G of memory. RabbitMQ: yeah, the number of available file descriptors is significantly higher for the rabbitmq process, but in the test of concurrent create 3000 virtual machines case, we find nova/cinder/neutron components connections to rabbitmq is up to 10K on the management webpage, so we set the max rabbitmq connection to 20K. ·¢ŒþÈË: Marc Schoechlin [mailto:ms@256bit.org] ·¢ËÍʱŒä: 2024Äê10ÔÂ22ÈÕ 2:29 ÊÕŒþÈË: Alex Song (ËÎÎÄÆœ) <songwenping@inspur.com>; openstack-discuss@lists.openstack.org Ö÷Ìâ: Re: ŽðžŽ: [largescale-sig]scaling story Hello Alex, Am 21.10.24 um 08:06 schrieb Alex Song (ËÎÎÄÆœ): Database settings 10k DB connections consume up to 21G of memory, which only accounts for 10% of the server's memory in our env and will not cause OOM risk. I think that's a lot and I would size it differently if it were my setup. The calculation only refers to the working memory that the Connections roughly use. Not included in this calculation is, for example, the buffer pool of the database itself. In most cases, there are also processes that could use a lot of memory outside the database itself. RabbitMQ The maximum number of RabbitMQ connections is 20000, which is obtained by test in the 3000 node environment. That is interesting. What was the limiting factor? Usually the number of available file descriptors is significantly higher for the RabbitMQ process. (see /proc/<pid>/limits) Incidentally, I have had good experiences with RabbitMQ Perftest during performance tests these days. 
(https://perftest.rabbitmq.com/) A example: URL= <amqp://openstack:mypassword@10.10.21.12:5672> "amqp://openstack:mypassword@10.10.21.12:5672" docker run -it --net host --rm pivotalrabbitmq/ <perf-test:latest> perf-test:latest \ --queue-pattern-from 1 --queue-pattern-to 500 \ --producers 500 --consumers 15 \ --variable-size 1000:30 \ --variable-size 10000:20 \ --variable-size 5000:45 \ --quorum-queue --queue perftest \ --uri "$URL" Respectful regards Marc ------=_NextPart_000_01ED_01DB2710.4F8B4990 Content-Type: text/html; charset="gb2312" Content-Transfer-Encoding: quoted-printable <html xmlns:v=3D"urn:schemas-microsoft-com:vml" = xmlns:o=3D"urn:schemas-microsoft-com:office:office" = xmlns:w=3D"urn:schemas-microsoft-com:office:word" = xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" = xmlns=3D"http://www.w3.org/TR/REC-html40"><head><meta = http-equiv=3DContent-Type content=3D"text/html; charset=3Dgb2312"><meta = name=3DGenerator content=3D"Microsoft Word 15 (filtered = medium)"><style><!-- /* Font Definitions */ @font-face {font-family:=CB=CE=CC=E5; panose-1:2 1 6 0 3 1 1 1 1 1;} @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;} @font-face {font-family:"\@=CB=CE=CC=E5"; panose-1:2 1 6 0 3 1 1 1 1 1;} @font-face {font-family:=CE=A2=C8=ED=D1=C5=BA=DA; panose-1:2 11 5 3 2 2 4 2 2 4;} @font-face {font-family:"\@=CE=A2=C8=ED=D1=C5=BA=DA"; panose-1:2 11 5 3 2 2 4 2 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0cm; margin-bottom:.0001pt; font-size:12.0pt; font-family:=CB=CE=CC=E5;} a:link, span.MsoHyperlink {mso-style-priority:99; color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {mso-style-priority:99; color:purple; text-decoration:underline;} p {mso-style-priority:99; mso-margin-top-alt:auto; margin-right:0cm; mso-margin-bottom-alt:auto; margin-left:0cm; font-size:12.0pt; font-family:=CB=CE=CC=E5;} span.--l 
{mso-style-name:--l;} span.--r {mso-style-name:--r;} span.EmailStyle20 {mso-style-type:personal; font-family:"Calibri",sans-serif; color:#1F497D;} span.EmailStyle21 {mso-style-type:personal-reply; font-family:"Calibri",sans-serif; color:#1F497D;} .MsoChpDefault {mso-style-type:export-only; font-size:10.0pt;} @page WordSection1 {size:612.0pt 792.0pt; margin:72.0pt 90.0pt 72.0pt 90.0pt;} div.WordSection1 {page:WordSection1;} --></style><!--[if gte mso 9]><xml> <o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" /> </xml><![endif]--><!--[if gte mso 9]><xml> <o:shapelayout v:ext=3D"edit"> <o:idmap v:ext=3D"edit" data=3D"1" /> </o:shapelayout></xml><![endif]--></head><body lang=3DZH-CN link=3Dblue = vlink=3Dpurple><div class=3DWordSection1><p class=3DMsoNormal><span = lang=3DEN-US = style=3D'font-size:10.5pt;font-family:"Calibri",sans-serif;color:#1F497D'= style=3D'text-indent:21.0pt'><b><span lang=3DEN-US>Database: = </span></b><span class=3D--l><span lang=3DEN-US>One DB connection = consumes about 100KB memory, in actual testing, the max connections is = up to 15K, and including the buffer pool, the database consumes up to = 20G of memory. 
the management webpage, so we set the maximum number of RabbitMQ connections to 20K.

From: Marc Schoechlin [mailto:ms@256bit.org]
Sent: October 22, 2024 2:29
To: Alex Song (宋文平) <songwenping@inspur.com>; openstack-discuss@lists.openstack.org
Subject: Re: RE: [largescale-sig]scaling story

Hello Alex,

On 21.10.24 at 08:06, Alex Song (宋文平) wrote:
> Database settings
>
> 10k DB connections consume up to 21 GB of memory, which is only 10% of the server's memory in our environment, so they pose no OOM risk.

I think that's a lot, and I would size it differently if it were my setup.
The calculation only covers the working memory that the connections roughly use. It does not include, for example, the buffer pool of the database itself.
In most cases there are also processes outside the database that can use a lot of memory.

> RabbitMQ
>
> The maximum number of RabbitMQ connections is 20000, a value we determined by testing in the 3000-node environment.

That is interesting. What was the limiting factor? Usually the RabbitMQ process has significantly more file descriptors available
(see /proc/<pid>/limits).

Incidentally, I have had good experiences with RabbitMQ PerfTest in performance tests recently
(https://perftest.rabbitmq.com/):

    URL="amqp://openstack:mypassword@10.10.21.12:5672"
    docker run -it --net host --rm pivotalrabbitmq/perf-test:latest \
        --queue-pattern-from 1 --queue-pattern-to 500 \
        --producers 500 --consumers 15 \
        --variable-size 1000:30 \
        --variable-size 10000:20 \
        --variable-size 5000:45 \
        --quorum-queue --queue perftest \
        --uri "$URL"

Respectful regards
Marc
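One way to make the database sizing discussion above concrete is to separate per-connection memory from global buffers. Alex's figure of about 21 GB for 10k connections corresponds to roughly 2 MB per connection; Marc's point is that global buffers such as the InnoDB buffer pool come on top of that. The Python sketch below computes a common back-of-the-envelope worst case (global buffers plus max_connections times the per-connection buffers), assuming a MySQL/MariaDB backend. The class name and all values in `EXAMPLE` are hypothetical and chosen for easy arithmetic; they are not defaults and not the configuration used in the Inspur test.

```python
from dataclasses import dataclass

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


@dataclass
class DbMemorySettings:
    """Hypothetical MySQL/MariaDB-style memory settings, all sizes in bytes."""
    max_connections: int
    # Global buffers: allocated once for the whole server.
    innodb_buffer_pool_size: int
    innodb_log_buffer_size: int
    key_buffer_size: int
    # Per-connection buffers: in the worst case every connection uses all of them.
    sort_buffer_size: int
    join_buffer_size: int
    read_buffer_size: int
    read_rnd_buffer_size: int
    thread_stack: int
    binlog_cache_size: int

    def per_connection(self) -> int:
        return (self.sort_buffer_size + self.join_buffer_size
                + self.read_buffer_size + self.read_rnd_buffer_size
                + self.thread_stack + self.binlog_cache_size)

    def all_connections(self) -> int:
        # The part a "memory per connection" estimate covers.
        return self.max_connections * self.per_connection()

    def global_buffers(self) -> int:
        # The part such an estimate leaves out.
        return (self.innodb_buffer_pool_size + self.innodb_log_buffer_size
                + self.key_buffer_size)

    def worst_case_total(self) -> int:
        # Rough upper bound: global buffers plus every connection at full use.
        return self.global_buffers() + self.all_connections()


# Illustrative values only (not defaults, not the Inspur configuration).
EXAMPLE = DbMemorySettings(
    max_connections=10_000,
    innodb_buffer_pool_size=32 * GiB,
    innodb_log_buffer_size=64 * MiB,
    key_buffer_size=32 * MiB,
    sort_buffer_size=512 * KiB,
    join_buffer_size=512 * KiB,
    read_buffer_size=256 * KiB,
    read_rnd_buffer_size=256 * KiB,
    thread_stack=256 * KiB,
    binlog_cache_size=256 * KiB,
)

if __name__ == "__main__":
    print(f"connections: {EXAMPLE.all_connections() / GiB:.1f} GiB")   # 19.5 GiB
    print(f"global:      {EXAMPLE.global_buffers() / GiB:.1f} GiB")    # 32.1 GiB
    print(f"worst case:  {EXAMPLE.worst_case_total() / GiB:.1f} GiB")  # 51.6 GiB
```

Actual usage is normally well under this bound, because many per-session buffers are only allocated when a query needs them. Even so, the gap between the connection term and the total is the part Marc is warning about.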
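Marc's suggestion to look at /proc/<pid>/limits can be scripted. This minimal, Linux-only Python sketch parses the "Max open files" soft and hard limits and estimates how many descriptors remain for a target connection count. It assumes each AMQP client connection holds one socket descriptor; the helper names and the `reserve` allowance are hypothetical. To check a broker, pass the PID of the RabbitMQ Erlang VM (typically the beam.smp process) instead of "self".

```python
import os
from typing import Optional, Tuple, Union

LIMIT_LABEL = "Max open files"


def _to_limit(value: str) -> Optional[int]:
    """Convert a limits field to int; None stands for 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) 'Max open files' limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(LIMIT_LABEL):
            # Remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(LIMIT_LABEL):].split()[:2]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError(f"no '{LIMIT_LABEL}' line found")


def read_max_open_files(pid: Union[int, str] = "self") -> Tuple[Optional[int], Optional[int]]:
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


def fd_headroom(soft_limit: Optional[int], target_connections: int,
                reserve: int = 1000) -> Optional[int]:
    """Descriptors left after one socket per connection plus a reserve.

    'reserve' is a hypothetical allowance for the broker's own files and
    other sockets. Returns None when the soft limit is unlimited.
    """
    if soft_limit is None:
        return None
    return soft_limit - target_connections - reserve


if __name__ == "__main__" and os.path.exists("/proc/self/limits"):
    # Replace "self" with the RabbitMQ PID to inspect the broker instead.
    soft, hard = read_max_open_files()
    print(f"soft={soft} hard={hard} headroom={fd_headroom(soft, 20_000)}")
```

For example, a soft limit of 65536 with a 20000-connection target leaves 44536 descriptors after the default reserve; a negative result would indicate that the descriptor limit, rather than something else, caps the connection count.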
participants (6)
- Alex Song (宋文平)
- Arnaud Morin
- engineer2024
- Marc Schoechlin
- Mohammed Naser
- songwenping@inspur.com