[largescale-sig]scaling story

newer
The CFP for OpenInfra Days North...

older
[i18n] i18n SIG PTG Schedule & Key...

Alex Song (宋文平)

12 Jun 2024 12 Jun '24

3:39 a.m.

Hi everyone, we want to share our scaling story. Please refer the link: https://etherpad.opendev.org/p/large-scale-inspur Thanks, Alex Song

Attachments:

attachment.html (text/html — 2.6 KB)
smime.p7s (application/pkcs7-signature — 3.7 KB)

Show replies by date

Hi Alex, Thanks for the great write up! I would love to see more of this. How did you adjust the max number of conns for RabbitMQ and for the relay I assume you used https://docs.ovn.org/en/latest/tutorials/ovn-ovsdb-relay.html ? Thanks Mohammed From: Alex Song (宋文平) <songwenping@inspur.com> Date: Wednesday, June 12, 2024 at 4:43 AM To: openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org> Subject: [largescale-sig]scaling story Hi everyone, we want to share our scaling story. Please refer the link: https://etherpad.opendev.org/p/large-scale-inspur Thanks, Alex Song

Arnaud Morin

14 Jun 14 Jun

6:17 a.m.

That sounds a great value for large scale! Thank you for this story. Would you mind sharing more details / config params / change you did on code? I am surprised about the max_connections = 100000 for example. We identified on our side that having too much connection to the DB resulted in memory/fd exhaustion. Also, one question about the placement 1000 host limit, is it because you request to spawn 3k instances in one request? Cheers, Arnaud On 13.06.24 - 11:47, Mohammed Naser wrote:

...

Hi Alex,

Thanks for the great write up! I would love to see more of this.

How did you adjust the max number of conns for RabbitMQ and for the relay I assume you used https://docs.ovn.org/en/latest/tutorials/ovn-ovsdb-relay.html ?

Thanks Mohammed

From: Alex Song (宋文平) <songwenping@inspur.com> Date: Wednesday, June 12, 2024 at 4:43 AM To: openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org> Subject: [largescale-sig]scaling story Hi everyone,

we want to share our scaling story. Please refer the link: https://etherpad.opendev.org/p/large-scale-inspur Thanks,

Alex Song

songwenping＠inspur.com

26 Jun 26 Jun

5:36 a.m.

Hi Arnaud， For the first question, in the operating system, maintaining a link typically takes up 10-100KB memory space, 100000 connections may use 1G-10G memory space, the memory will not be exhausted. During our 3000 nodes large scale test, we calculated that the maximum number of database connections reached around 1.5W, and in the stable state, the number of connections remained around 4500. thus the connection links comsume less than 1GB memory space, includes the size of query cache, temporary tables, or buffer pools, we calculated that the maximum memory usage of the database service process is around 25GB, which is less than 10% of the total memory usage of the control node. Therefore, it will not cause memory depletion. For the second question, yeah, it's because we spwan to create 3k instance in one request, the placement default return 1000 allocate candidates, so we need to increase the limit to 3000. For the deployment optimization, we modified the ansible module on the basis of the openstack helm to support user-defined configuration, making it more convenient to modify openstack configuration parameters. Additonally, we optimize the load balance problem of Kubeapi in large-scale scenarios, adjust the long connection strategy of Kubelet client to make it randomly reconnect and ensure the overall load balance of all management nodes. The etherpad https://etherpad.opendev.org/p/large-scale-inspur shows the more details / config params/ change on code.

Marc Schoechlin

17 Oct 17 Oct

7:02 a.m.

Hello Alex, thanks for the writeup! A few comments and questions from me: *Database settings* It seems dangerous to me to define the number of connections 'max_connections' so high (max 64K if you only connect to one ip). Every database connection requires resources on the operating system side at the very least. As far as I know, MariaDB allocates additional memory per active thread (e.g. sort_buffer_size <https://mariadb.com/docs/server/ref/mdb/system-variables/sort_buffer_size/>,join_buffer_size <https://mariadb.com/docs/server/ref/mdb/system-variables/join_buffer_size/>). In extreme situations, this can cause MariaDB to either run out of memory (10k connections this is ~21GB only for threads when you use the mariadb defaults) due to CGroup resource limits or cause the memory requirement to grow so large that the OOM killer may even be activated on the node. Not a good thing for a database, especially when OOMKILL terminates the database using a SIGKILL. Furthermore, it could be that other limitations in the setup (IO hardware limits, ulimits or other configuration parameters) cause many thousands of connections to be active in parallel, but these are slowed down or even blocked/starved as a result. Ultimately, this will at least have a negative impact on response times, but it can also cause more serious problems that could, for example, cause the server to block its operation. These horror scenarios only occur in extreme situations, but it is precisely in these situations that these settings are particularly dangerous in my opinion. *RabbitMQ * I have also wondered what the limit for RabbitMQ is and whether there are potential difficulties here. As far as I know, the maximum number is automatically set, for example, by the ULimit or Erlang port limit applicable to the process (see also https://www.rabbitmq.com/docs/networking#tuning-for-large-number-of-connecti...). What was your initial limit? In my setup, there are already a lot: $ docker exec -ti rabbitmq /bin/bash -c 'pgrep beam.smp|xargs -I PID grep -H “Max open files” /proc/PID/limits' /proc/22/limits:Max open files 1048576 1048576 files $ docker exec -ti rabbitmq rabbitmqctl eval 'erlang:system_info(port_limit).' 65536 Almost everything about the maximum possible connections in TCP can be used here for internal file system access. What I would like to know: How many connections can it handle in times of very high load? Do you have monitoring data from the situation you resolved? The RabbitMQ documentation mentioned above describes some interesting approaches - it probably makes sense to discuss this in more detail in a dedicated mail thread. Regards Marc

Alex Song (宋文平)

21 Oct 21 Oct

1:06 a.m.

New subject: 答复: [largescale-sig]scaling story

...

<o:p> </o:p>Hi = Marc,<o:p></o:p><o:p> </o:p>Database settings<o:p></o:p>10k DB connections consume up to 21G of memory, which only = accounts for 10% of the server's memory in our env and will not cause = OOM risk.<o:p></o:p><o:p> </o:p>RabbitMQ<o:p></o:p><o:p> </o:p>The maximum number of RabbitMQ = connections is 20000, which is obtained by test in the 3000 node = environment.<o:p></o:p><span = lang=3DEN-US =

...

<o:p> </o:p>Thanks<o:p></o:p>Alex<o:p></o:p><span = lang=3DEN-US =

...

<o:p> </o:p><div><div =

0 *H÷ 010 +0 *H÷ $Content-Type: multipart/alternative; boundary="----=_NextPart_000_0285_01DB23C2.5F292080" This is a multipart message in MIME format. ------=_NextPart_000_0285_01DB23C2.5F292080 Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: 8bit Hi Marc, Database settings 10k DB connections consume up to 21G of memory, which only accounts for 10% of the server's memory in our env and will not cause OOM risk. RabbitMQ The maximum number of RabbitMQ connections is 20000, which is obtained by test in the 3000 node environment. Thanks Alex ·¢ŒþÈË: Marc Schoechlin [mailto:ms@256bit.org] ·¢ËÍÊ±Œä: 2024Äê10ÔÂ17ÈÕ 20:02 ÊÕŒþÈË: openstack-discuss@lists.openstack.org Ö÷Ìâ: Re: [largescale-sig]scaling story Hello Alex, thanks for the writeup! A few comments and questions from me: Database settings It seems dangerous to me to define the number of connections 'max_connections' so high (max 64K if you only connect to one ip). Every database connection requires resources on the operating system side at the very least. As far as I know, MariaDB allocates additional memory per active thread (e.g. sort_buffer_size <https://mariadb.com/docs/server/ref/mdb/system-variables/sort_buffer_size/> , join_buffer_size <https://mariadb.com/docs/server/ref/mdb/system-variables/join_buffer_size/> ). In extreme situations, this can cause MariaDB to either run out of memory (10k connections this is ~21GB only for threads when you use the mariadb defaults) due to CGroup resource limits or cause the memory requirement to grow so large that the OOM killer may even be activated on the node. Not a good thing for a database, especially when OOMKILL terminates the database using a SIGKILL. Furthermore, it could be that other limitations in the setup (IO hardware limits, ulimits or other configuration parameters) cause many thousands of connections to be active in parallel, but these are slowed down or even blocked/starved as a result. Ultimately, this will at least have a negative impact on response times, but it can also cause more serious problems that could, for example, cause the server to block its operation. These horror scenarios only occur in extreme situations, but it is precisely in these situations that these settings are particularly dangerous in my opinion. RabbitMQ I have also wondered what the limit for RabbitMQ is and whether there are potential difficulties here. As far as I know, the maximum number is automatically set, for example, by the ULimit or Erlang port limit applicable to the process (see also https://www.rabbitmq.com/docs/networking#tuning-for-large-number-of-connecti ons). What was your initial limit? In my setup, there are already a lot: $ docker exec -ti rabbitmq /bin/bash -c 'pgrep beam.smp|xargs -I PID grep -H ¡°Max open files¡± /proc/PID/limits' /proc/22/ <limits:Max> limits:Max open files 1048576 1048576 files $ docker exec -ti rabbitmq rabbitmqctl eval ' <erlang:system_info(port_limit)> erlang:system_info(port_limit).' 65536 Almost everything about the maximum possible connections in TCP can be used here for internal file system access. What I would like to know: How many connections can it handle in times of very high load? Do you have monitoring data from the situation you resolved? The RabbitMQ documentation mentioned above describes some interesting approaches - it probably makes sense to discuss this in more detail in a dedicated mail thread. Regards Marc ------=_NextPart_000_0285_01DB23C2.5F292080 Content-Type: text/html; charset="gb2312" Content-Transfer-Encoding: quoted-printable <html xmlns:v=3D"urn:schemas-microsoft-com:vml" = xmlns:o=3D"urn:schemas-microsoft-com:office:office" = xmlns:w=3D"urn:schemas-microsoft-com:office:word" = xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" = xmlns=3D"http://www.w3.org/TR/REC-html40"><head><meta = http-equiv=3DContent-Type content=3D"text/html; charset=3Dgb2312"><meta = name=3DGenerator content=3D"Microsoft Word 15 (filtered = medium)"><style></style></head><body lang=3DZH-CN link=3Dblue = vlink=3Dpurple><div class=3DWordSection1><span = lang=3DEN-US = style=3D'font-size:10.5pt;font-family:"Calibri",sans-serif;color:#1F497D'= style=3D'font-size:10.5pt;font-family:"Calibri",sans-serif;color:#1F497D'= style=3D'font-size:10.5pt;font-family:"Calibri",sans-serif;color:#1F497D'= style=3D'border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm = 0cm 0cm'>=B7=A2=BC=FE=C8=CB: Marc Schoechlin [mailto:ms@256bit.org] =B7=A2=CB=CD=CA=B1=BC=E4: 2024=C4=EA10=D4=C217=C8=D5 = 20:02 =CA=D5=BC=FE=C8=CB: = openstack-discuss@lists.openstack.org =D6=F7=CC=E2: Re: [largescale-sig]scaling = story<o:p></o:p></div></div><o:p> </o:p>Hello = Alex,<o:p></o:p>thanks for the = writeup!<o:p></o:p>A few comments and = questions from me:<o:p></o:p>Database settings<o:p></o:p>It seems = dangerous to me to define the number of connections 'max_connections' so = high (max 64K if you only connect to one ip). Every database = connection requires resources on the operating system side at the very = least. As far as I know, MariaDB allocates additional memory per active = thread (e.g. <a = href=3D"https://mariadb.com/docs/server/ref/mdb/system-variables/sort_buf= fer_size/">sort_buffer_size</a>,<a = href=3D"https://mariadb.com/docs/server/ref/mdb/system-variables/join_buf= fer_size/"> join_buffer_size</a>). In extreme situations, this can cause = MariaDB to either run out of memory (10k connections this is ~21GB only = for threads when you use the mariadb defaults) due to CGroup resource = limits or cause the memory requirement to grow so large that the OOM = killer may even be activated on the node. Not a good thing for a = database, especially when OOMKILL terminates the database using a = SIGKILL. Furthermore, it could be that other limitations in the = setup (IO hardware limits, ulimits or other configuration parameters) = cause many thousands of connections to be active in parallel, but these = are slowed down or even blocked/starved as a result. Ultimately, this = will at least have a negative impact on response times, but it can also = cause more serious problems that could, for example, cause the server to = block its operation. These horror scenarios only occur in extreme = situations, but it is precisely in these situations that these settings = are particularly dangerous in my = opinion. RabbitMQ<o:p></o:p>I have also wondered what the limit for = RabbitMQ is and whether there are potential difficulties here. = <o:p></o:p>As far as I know, the maximum number is = automatically set, for example, by the ULimit or Erlang port limit = applicable to the process (see also <a = href=3D"https://www.rabbitmq.com/docs/networking#tuning-for-large-number-= of-connections">https://www.rabbitmq.com/docs/networking#tuning-for-large= -number-of-connections</a>).<o:p></o:p>What was your initial limit? <o:p></o:p>In my setup, there are already a lot:<o:p></o:p>$ docker exec -ti = rabbitmq /bin/bash -c 'pgrep beam.smp|xargs -I PID grep -H =A1=B0Max = open files=A1=B1 /proc/PID/limits' /proc/22/<a = href=3D"limits:Max">limits:Max</a> open files 1048576 1048576 = files<o:p></o:p>$ = docker exec -ti rabbitmq rabbitmqctl eval '<a href=3D"erlang:system_info(port_limit)">erlang:system_info(port_limit)</a>.' 65536<o:p></o:p>Almost everything about the maximum possible connections in = TCP can be used here for internal file system access. What I would like to know: How = many connections can it handle in times of very high load? Do you have = monitoring data from the situation you resolved? The RabbitMQ documentation mentioned above describes some = interesting approaches - it probably makes sense to discuss this in more = detail in a dedicated mail thread.<o:p></o:p>Regards Marc<o:p></o:p></div></body></html> ------=_NextPart_000_0285_01DB23C2.5F292080-- 0É0± xðáwIÊèë`{0 *H÷ 0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA0 170109092830Z 340511122004Z0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA0"0 *H÷ 0 «ä5ïc$Œ©æ'µ¯Þ6>úUKÛdÔ²Áe9Î~{BîÒLgD÷*wvVÊŠ/DýUj_xá\m/ óž=kÐ·éGÙœQ€ýx~Wùgk ÛÜøãÔ7É6NçÏ*?n°Ê²mhùè{Ã¯ôÌÆ 7üF-Î<@ÃÓÍ¬WçÅyåLZrF 6~føÈ×T~$0d¡ýL|zšøW=ötÚ%ýq,¥Ã~Ÿ"ÀýŸÑö2T,QÕÔ,dºÂÅ^§ÈôïåJ)ëVvp Ó£00 +7CA0U0Uÿ0ÿ0U^YŠŽLX`Nöµ¥9Š2Á5j0 +70# +7&a°$öz(¶o§K0 *H÷ daòYÙ~×ì×NÑ3ŽËlP±i!¿ýÅòsÝLºP;I4hžØþ¡L¶äŠJêà·Õ:1àEÕs®9UÛ:8Mh{Ä²ìºŒ£·ÚŒó×÷C_ŸhoÏúÆËYM&UØ³?9êèr·ìgP_÷mÇ#YÖpîw0º¹«0$p³jºnº¥¯/Ôä¬÷Ú ŠÖ5uþQÝ¶h(y¶ènnã%E,øú&Zâ"dKØ³ãi5VúP9±œ/ñ õkØŸsá/Xðï»¯€µ³O÷Í-ó¿Mè³3Óž{G(ªŸßŒG0³0 ~d¢LÅ££j¿d¢0 *H÷ 0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA0 241015081309Z 291014081309Z0¯10 &ò,dcom10 &ò,dlangchao10 &ò,dhome1-0+U$æµåæµªæœ®æ°æ®ææ¯æéå¬åž10U å®æå¹³1%0# *H÷ songwenping@inspur.com0"0 *H÷ 0 ÓÀ3²\ž¹JÇUê² x| ËÃê«T\]µmK¿§'ËXÙ*çö Z;9ìÒhmçIæF©[.« õ9ò¯¶:üœ:Õ0ËÃ°7w-bŽô¿ýËÙßéèã&Æ®µ³«Îá1q×\Ï ÜÀosô5vÓI[£Ìª:Pè¥~%Ú³%¡øþzæ]go+or{Ø Üðë(o»#»ÓÌÞ2£ÿìöŒw? úE¡Àþí±à ±Œë±÷0cÃ1,«oã3?FpBšn'QÃ]÷y{èÖ#dŒ±£00= +700.&+7ò©×z©=÷Ø\Jý&§Mda0)U%"0 ++ +7 0U 05 +7 (0&0 +0 +0 +7 0D *H÷ 7050*H÷ 0*H÷ 0+0 *H÷ 0IUB0@ & +7 songwenping@inspur.comsongwenping@inspur.com0UÝµ>}6ÞúÏÀoÄÌ¹P0U#0^YŠŽLX`Nöµ¥9Š2Á5j0U00ÿ ü ùºldap:///CN=INSPUR-CA,CN=JTCA2012,CN=CDP,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=home,DC=langchao,DC=com?certificateRevocationList?base?objectClass=cRLDistributionPoint:http://JTCA2012.home.langchao.com/CertEnroll/INSPUR-CA.crl0,+00±+0€ldap:///CN=INSPUR-CA,CN=AIA,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=home,DC=langchao,DC=com?cACertificate?base?objectClass=certificationAuthority0d+0Xhttp://JTCA2012.home.langchao.com/CertEnroll/JTCA2012.home.langchao.com_INSPUR-CA(1).crt0S +7F0D B +7 42S-1-5-21-1606980848-706699826-1801674531-2274525320 *H÷ sMIú;E}auÅÚ'õÛ<ÿ!b27óu"ãìïwªBÉî9¶zÎ |Ý»H+4šÏçÉxè §!Ó·-ÏâoU ŽlÍààV«"Ý5¢lzÝdcŽäøw^¯uâ+ßkäÃZì§q×rKéOæþä},öõ}:«óàuSºËuÏ±³ûËõ._À0é[¢'m°@eV>Ì-Ú÷8Š$a>á^iiªßçüÏø(Ç;bÌºkMÝé ÌrðÂm1MvhL£ÈC|?100p0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA~d¢LÅ££j¿d¢0 + ø0 *H÷ 1 *H÷ 0 *H÷ 1 241021060606Z0# *H÷ 1a î7,\nöæÿ+ÈÌ^_<)(0 +71r0p0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA~d¢LÅ££j¿d¢0*H÷ 1r p0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA~d¢LÅ££j¿d¢0 *H÷ 100 `He*0 `He0 *H÷ 0 `He0*H÷ 0 *H÷ @0+0 `He0 `He0 `He0 *H÷ ŸöŽCºSf3×H¥5ÌÂó«VBçÇtEÅ>:Ä¢RA,þÓÈŽÍÛêIià âœ¬<Á/-MäB7(þA¢p^÷%B 3â©ÍàÈµ²ðT!Ðå9ŒkÑ§ÚSŠÅœôzN*|J µÊ³ùËjt¡²[õ°ÇSADÊÇ÷ â#£¹F^®SµêÉ5»-ÖHxÄ¹H\(á<PëÌlôéGÿÜN·: »A1bÀÄëRØÝ`c'P2ÈÉ~þpEHüê >DÛmÎ¥Ž|ÝuR°ÃªÄ\A;Þ×òZmuH

Marc Schoechlin

1:28 p.m.

New subject: 答复: [largescale-sig]scaling story

Hello Alex, Am 21.10.24 um 08:06 schrieb Alex Song (宋文平):

...

*Database settings*

10k DB connections consume up to 21G of memory, which only accounts for 10% of the server's memory in our env and will not cause OOM risk.

I think that's a lot and I would size it differently if it were my setup. The calculation only refers to the working memory that the Connections roughly use. Not included in this calculation is, for example, the buffer pool of the database itself. In most cases, there are also processes that could use a lot of memory outside the database itself.

...

*RabbitMQ*

The maximum number of RabbitMQ connections is 20000, which is obtained by test in the 3000 node environment.

That is interesting. What was the limiting factor? Usually the number of available file descriptors is significantly higher for the RabbitMQ process. (see /proc/<pid>/limits) Incidentally, I have had good experiences with RabbitMQ Perftest during performance tests these days. (https://perftest.rabbitmq.com/) A example: URL="amqp://openstack:mypassword@10.10.21.12:5672" docker run -it --net host --rm pivotalrabbitmq/perf-test:latest \ --queue-pattern-from 1 --queue-pattern-to 500 \ --producers 500 --consumers 15 \ --variable-size 1000:30 \ --variable-size 10000:20 \ --variable-size 5000:45 \ --quorum-queue --queue perftest \ --uri "$URL" Respectful regards Marc

engineer2024

1:48 p.m.

New subject: 答复: [largescale-sig]scaling story

I think if the connections increase, it makes sense to run a separate server dedicated to database itself within the same network and disk writes performance does matter On Mon, 21 Oct 2024, 23:59 Marc Schoechlin, <ms@256bit.org> wrote:

...

Hello Alex, Am 21.10.24 um 08:06 schrieb Alex Song (宋文平):

*Database settings*

10k DB connections consume up to 21G of memory, which only accounts for 10% of the server's memory in our env and will not cause OOM risk.

I think that's a lot and I would size it differently if it were my setup. The calculation only refers to the working memory that the Connections roughly use. Not included in this calculation is, for example, the buffer pool of the database itself. In most cases, there are also processes that could use a lot of memory outside the database itself.

*RabbitMQ*

The maximum number of RabbitMQ connections is 20000, which is obtained by test in the 3000 node environment.

That is interesting. What was the limiting factor? Usually the number of available file descriptors is significantly higher for the RabbitMQ process. (see /proc/<pid>/limits)

Incidentally, I have had good experiences with RabbitMQ Perftest during performance tests these days. (https://perftest.rabbitmq.com/)

A example:

URL="amqp://openstack:mypassword@10.10.21.12:5672" docker run -it --net host --rm pivotalrabbitmq/perf-test:latest \ --queue-pattern-from 1 --queue-pattern-to 500 \ --producers 500 --consumers 15 \ --variable-size 1000:30 \ --variable-size 10000:20 \ --variable-size 5000:45 \ --quorum-queue --queue perftest \ --uri "$URL"

Respectful regards Marc

Alex Song (宋文平)

25 Oct 25 Oct

6:01 a.m.

New subject: 答复: 答复: [largescale-sig]scaling story

...

<o:p> </o:p>Hello = Marc,<o:p></o:p> Sorry for late = response.<o:p></o:p><p class=3DMsoNormal =

...

RabbitMQ: yeah, the number of available file descriptors is = significantly higher for the rabbitmq process, but in the test of = concurrent create 3000 virtual machines case, we find = nova/cinder/neutron components connections to rabbitmq is up to 10K on =

...

<o:p> </o:p><span lang=3DEN-US =

...

<o:p> </o:p><div><div =

...

) A example: URL=3D<a =

0 *H÷ 010 +0 *H÷ $ åContent-Type: multipart/alternative; boundary="----=_NextPart_000_01ED_01DB2710.4F8B4990" This is a multipart message in MIME format. ------=_NextPart_000_01ED_01DB2710.4F8B4990 Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: 8bit Hello Marc, Sorry for late response. Database: One DB connection consumes about 100KB memory, in actual testing, the max connections is up to 15K, and including the buffer pool, the database consumes up to 20G of memory. RabbitMQ: yeah, the number of available file descriptors is significantly higher for the rabbitmq process, but in the test of concurrent create 3000 virtual machines case, we find nova/cinder/neutron components connections to rabbitmq is up to 10K on the management webpage, so we set the max rabbitmq connection to 20K. ·¢ŒþÈË: Marc Schoechlin [mailto:ms@256bit.org] ·¢ËÍÊ±Œä: 2024Äê10ÔÂ22ÈÕ 2:29 ÊÕŒþÈË: Alex Song (ËÎÎÄÆœ) <songwenping@inspur.com>; openstack-discuss@lists.openstack.org Ö÷Ìâ: Re: ŽðžŽ: [largescale-sig]scaling story Hello Alex, Am 21.10.24 um 08:06 schrieb Alex Song (ËÎÎÄÆœ): Database settings 10k DB connections consume up to 21G of memory, which only accounts for 10% of the server's memory in our env and will not cause OOM risk. I think that's a lot and I would size it differently if it were my setup. The calculation only refers to the working memory that the Connections roughly use. Not included in this calculation is, for example, the buffer pool of the database itself. In most cases, there are also processes that could use a lot of memory outside the database itself. RabbitMQ The maximum number of RabbitMQ connections is 20000, which is obtained by test in the 3000 node environment. That is interesting. What was the limiting factor? Usually the number of available file descriptors is significantly higher for the RabbitMQ process. (see /proc/<pid>/limits) Incidentally, I have had good experiences with RabbitMQ Perftest during performance tests these days. (https://perftest.rabbitmq.com/) A example: URL= <amqp://openstack:mypassword@10.10.21.12:5672> "amqp://openstack:mypassword@10.10.21.12:5672" docker run -it --net host --rm pivotalrabbitmq/ <perf-test:latest> perf-test:latest \ --queue-pattern-from 1 --queue-pattern-to 500 \ --producers 500 --consumers 15 \ --variable-size 1000:30 \ --variable-size 10000:20 \ --variable-size 5000:45 \ --quorum-queue --queue perftest \ --uri "$URL" Respectful regards Marc ------=_NextPart_000_01ED_01DB2710.4F8B4990 Content-Type: text/html; charset="gb2312" Content-Transfer-Encoding: quoted-printable <html xmlns:v=3D"urn:schemas-microsoft-com:vml" = xmlns:o=3D"urn:schemas-microsoft-com:office:office" = xmlns:w=3D"urn:schemas-microsoft-com:office:word" = xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" = xmlns=3D"http://www.w3.org/TR/REC-html40"><head><meta = http-equiv=3DContent-Type content=3D"text/html; charset=3Dgb2312"><meta = name=3DGenerator content=3D"Microsoft Word 15 (filtered = medium)"><style></style></head><body lang=3DZH-CN link=3Dblue = vlink=3Dpurple><div class=3DWordSection1>Database: = One DB connection = consumes about 100KB memory, in actual testing, the max connections is = up to 15K, and including the buffer pool, the database consumes up to = 20G of memory. <o:p></o:p><o:p></o:p><span lang=3DEN-US = style=3D'font-size:10.5pt;font-family:"Calibri",sans-serif;color:#1F497D'= style=3D'font-size:10.5pt;font-family:"Calibri",sans-serif;color:#1F497D'= style=3D'border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm = 0cm 0cm'>=B7=A2=BC=FE=C8=CB: Marc Schoechlin [mailto:ms@256bit.org] =B7=A2=CB=CD=CA=B1=BC=E4: 2024=C4=EA10=D4=C222=C8=D5 = 2:29 =CA=D5=BC=FE=C8=CB: Alex Song (=CB=CE=CE=C4=C6=BD) = <songwenping@inspur.com>; = openstack-discuss@lists.openstack.org =D6=F7=CC=E2: Re: = =B4=F0=B8=B4: [largescale-sig]scaling = story<o:p></o:p></div></div><o:p> </o:p>Hello = Alex,<o:p></o:p><div>Am 21.10.24 um 08:06 schrieb Alex Song = (=CB=CE=CE=C4=C6=BD):<o:p></o:p></div><blockquote = style=3D'margin-top:5.0pt;margin-bottom:5.0pt'>Database settings<o:p></o:p>10k DB connections consume up to 21G of memory, which only = accounts for 10% of the server's memory in our env and will not cause = OOM risk.<o:p></o:p></blockquote>I think that's a lot and I would size it differently if it = were my setup. The calculation only refers to the working memory that = the Connections roughly use. Not included in this calculation is, for = example, the buffer pool of the database itself. In most cases, there = are also processes that could use a lot of memory outside the database = itself. <o:p></o:p><blockquote = style=3D'margin-top:5.0pt;margin-bottom:5.0pt'> <o:p></o:p>RabbitMQ<o:p></o:p> <o:p></o:p>The maximum number of RabbitMQ connections is 20000, which = is obtained by test in the 3000 node = environment.<o:p></o:p></blockquote>That is = interesting. What was the limiting factor? Usually the number of = available file descriptors is significantly higher for the RabbitMQ = process. (see /proc/<pid>/limits) Incidentally, I have = had good experiences with RabbitMQ Perftest during performance tests = these days. (<a = href=3D"https://perftest.rabbitmq.com/">https://perftest.rabbitmq.com/</a= href=3D"amqp://openstack:mypassword@10.10.21.12:5672">"amqp://openstack:mypassword@10.10.21.12:5672"</a>= docker run -it --net host --rm = pivotalrabbitmq/<a = href=3D"perf-test:latest">perf-test:latest</a> = \ = --queue-pattern-f|rom 1 --queue-pattern-to 500 = \ --producers = 500 --consumers 15 = \ = --variable-size 1000:30 = \ = --variable-size 10000:20 = \ = --variable-size 5000:45 = \ = --quorum-queue --queue perftest = \ --uri = "$URL" <o:p></o:p>Respectful regards Marc <o:p></o:p></div></body></html> ------=_NextPart_000_01ED_01DB2710.4F8B4990-- 0É0± xðáwIÊèë`{0 *H÷ 0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA0 170109092830Z 340511122004Z0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA0"0 *H÷ 0 «ä5ïc$Œ©æ'µ¯Þ6>úUKÛdÔ²Áe9Î~{BîÒLgD÷*wvVÊŠ/DýUj_xá\m/ óž=kÐ·éGÙœQ€ýx~Wùgk ÛÜøãÔ7É6NçÏ*?n°Ê²mhùè{Ã¯ôÌÆ 7üF-Î<@ÃÓÍ¬WçÅyåLZrF 6~føÈ×T~$0d¡ýL|zšøW=ötÚ%ýq,¥Ã~Ÿ"ÀýŸÑö2T,QÕÔ,dºÂÅ^§ÈôïåJ)ëVvp Ó£00 +7CA0U0Uÿ0ÿ0U^YŠŽLX`Nöµ¥9Š2Á5j0 +70# +7&a°$öz(¶o§K0 *H÷ daòYÙ~×ì×NÑ3ŽËlP±i!¿ýÅòsÝLºP;I4hžØþ¡L¶äŠJêà·Õ:1àEÕs®9UÛ:8Mh{Ä²ìºŒ£·ÚŒó×÷C_ŸhoÏúÆËYM&UØ³?9êèr·ìgP_÷mÇ#YÖpîw0º¹«0$p³jºnº¥¯/Ôä¬÷Ú ŠÖ5uþQÝ¶h(y¶ènnã%E,øú&Zâ"dKØ³ãi5VúP9±œ/ñ õkØŸsá/Xðï»¯€µ³O÷Í-ó¿Mè³3Óž{G(ªŸßŒG0³0 ~d¢LÅ££j¿d¢0 *H÷ 0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA0 241015081309Z 291014081309Z0¯10 &ò,dcom10 &ò,dlangchao10 &ò,dhome1-0+U$æµåæµªæœ®æ°æ®ææ¯æéå¬åž10U å®æå¹³1%0# *H÷ songwenping@inspur.com0"0 *H÷ 0 ÓÀ3²\ž¹JÇUê² x| ËÃê«T\]µmK¿§'ËXÙ*çö Z;9ìÒhmçIæF©[.« õ9ò¯¶:üœ:Õ0ËÃ°7w-bŽô¿ýËÙßéèã&Æ®µ³«Îá1q×\Ï ÜÀosô5vÓI[£Ìª:Pè¥~%Ú³%¡øþzæ]go+or{Ø Üðë(o»#»ÓÌÞ2£ÿìöŒw? úE¡Àþí±à ±Œë±÷0cÃ1,«oã3?FpBšn'QÃ]÷y{èÖ#dŒ±£00= +700.&+7ò©×z©=÷Ø\Jý&§Mda0)U%"0 ++ +7 0U 05 +7 (0&0 +0 +0 +7 0D *H÷ 7050*H÷ 0*H÷ 0+0 *H÷ 0IUB0@ & +7 songwenping@inspur.comsongwenping@inspur.com0UÝµ>}6ÞúÏÀoÄÌ¹P0U#0^YŠŽLX`Nöµ¥9Š2Á5j0U00ÿ ü ùºldap:///CN=INSPUR-CA,CN=JTCA2012,CN=CDP,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=home,DC=langchao,DC=com?certificateRevocationList?base?objectClass=cRLDistributionPoint:http://JTCA2012.home.langchao.com/CertEnroll/INSPUR-CA.crl0,+00±+0€ldap:///CN=INSPUR-CA,CN=AIA,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=home,DC=langchao,DC=com?cACertificate?base?objectClass=certificationAuthority0d+0Xhttp://JTCA2012.home.langchao.com/CertEnroll/JTCA2012.home.langchao.com_INSPUR-CA(1).crt0S +7F0D B +7 42S-1-5-21-1606980848-706699826-1801674531-2274525320 *H÷ sMIú;E}auÅÚ'õÛ<ÿ!b27óu"ãìïwªBÉî9¶zÎ |Ý»H+4šÏçÉxè §!Ó·-ÏâoU ŽlÍààV«"Ý5¢lzÝdcŽäøw^¯uâ+ßkäÃZì§q×rKéOæþä},öõ}:«óàuSºËuÏ±³ûËõ._À0é[¢'m°@eV>Ì-Ú÷8Š$a>á^iiªßçüÏø(Ç;bÌºkMÝé ÌrðÂm1MvhL£ÈC|?100p0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA~d¢LÅ££j¿d¢0 + ø0 *H÷ 1 *H÷ 0 *H÷ 1 241025110134Z0# *H÷ 1ôÓâÙ"2Ë}SçÅø :ò0 +71r0p0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA~d¢LÅ££j¿d¢0*H÷ 1r p0Y10 &ò,dcom10 &ò,dlangchao10 &ò,dhome10U INSPUR-CA~d¢LÅ££j¿d¢0 *H÷ 100 `He*0 `He0 *H÷ 0 `He0*H÷ 0 *H÷ @0+0 `He0 `He0 `He0 *H÷ /GIÓE¯bxb¥£U5:bMØéÂÌQ$]Ãx«áXîLOË4 œaPÑíYìCUøUOõž^±nV«_]9m1€iñÂ}p;Ô¥¶¥æÔŸi¶Æ·á§,ÿÖ6DaöÓ6;:jçŸÍ{ã¥Æ+tö*(gÀ1f°Ê²:ÇHä©xqŸC°@ètÍáTãurÐæ6*ÙÓ1Pr ÓGÐGÞ¥µ¶«Ø~e¿4 }j/ç-öIJÔår@î±ÙßÂ{0I.g,ÀÞÔ²2sM§

284

Age (days ago)

419

Last active (days ago)

List overview

Download

8 comments

6 participants

participants (6)

Alex Song (宋文平)
Arnaud Morin
engineer2024
Marc Schoechlin
Mohammed Naser
songwenping＠inspur.com

[largescale-sig]scaling story

tags

participants (6)