To fix an RBD (RADOS Block Device) IOPS bottleneck on the client side in OpenStack, you can try the following:
- Monitor CPU and memory usage on the client machine to ensure it has sufficient resources. Tools like top or htop show real-time usage, and a per-thread view of the QEMU process can reveal whether a single core is saturated (see the first example after this list).
- Check the network path between the client and the storage system to rule it out as a bottleneck. iperf3 can measure achievable throughput, while tcpdump is useful for inspecting the traffic itself rather than measuring bandwidth (see the iperf3 example below).
- Review the storage system's configuration to ensure it is optimized for the workload. This may include the number and type of disks used and, where a RAID layer is involved, the RAID level and chunk size.
- Consider a storage system with a higher IOPS rating. This may mean upgrading to faster disks or moving to a solution with more disks or SSDs.
- Try a client machine with more resources (e.g., a faster CPU and more memory) to see whether it can issue more I/O requests.
- Consider a different network connection between the client and the storage system, such as a faster network card or a direct connection rather than one through a network switch.
- If you are using Ceph as the underlying storage system, adjusting its configuration may improve performance. This can include the number of placement groups, the RBD object size, or the number of OSDs (object storage daemons); see the Ceph commands after this list.
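For the first check, a quick way to see whether one QEMU thread is saturating a single core is to watch per-thread CPU usage of the guest's process. A minimal sketch; the process-name filter is illustrative and may need adjusting on your host:

  # Find the QEMU process for the guest (name filter is an assumption)
  QEMU_PID=$(pgrep -f qemu-system | head -n 1)

  # Per-thread view: look for librbd/emulator threads stuck on one core
  top -Hp "$QEMU_PID"

  # Or sample per-thread CPU usage once per second (requires sysstat)
  pidstat -t -p "$QEMU_PID" 1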
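For the network check, running iperf3 between the client/compute node and a storage node gives a throughput baseline independent of Ceph. A sketch, assuming iperf3 is installed on both ends; the address is a placeholder:

  # On a storage node
  iperf3 -s

  # On the client/compute node (replace the address; -P 4 uses parallel streams)
  iperf3 -c 192.0.2.10 -t 30 -P 4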
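For the Ceph-side review, these commands show cluster health, per-pool placement group settings, and OSD utilization. The pool name is a placeholder, and the pg_num change is an example only; validate any change against the documentation for your Ceph release before running it:

  ceph -s                    # overall health and client I/O summary
  ceph osd df                # per-OSD utilization and PG distribution
  ceph osd pool ls detail    # pg_num and other per-pool settings

  # Example only: raise the PG count of a pool named "volumes"
  # (releases before Nautilus also require setting pgp_num to match)
  ceph osd pool set volumes pg_num 256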
It's also worth noting that an IOPS bottleneck can occur on the server side (i.e., within the storage system itself). In that case, you may need to adjust the storage system's configuration or add resources (e.g., disks or SSDs) to improve performance. Benchmarking the cluster directly, as sketched below, helps tell the two cases apart.
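To separate client-side from server-side bottlenecks, it can help to benchmark the cluster from a storage or monitor node, bypassing QEMU/librbd in the guest entirely. A sketch using rados bench; the pool name is a placeholder, and --no-cleanup keeps the written objects around for the read pass:

  # 30-second 4 KiB write test with 32 concurrent operations
  rados bench -p testpool 30 write -b 4096 -t 32 --no-cleanup

  # Random-read pass over the objects written above
  rados bench -p testpool 30 rand -t 32

  # Remove the benchmark objects afterwards
  rados -p testpool cleanup

If rados bench numbers are far above what fio sees inside the guest, the limit is likely on the client/virtualization side; if they are similar, look at the cluster itself.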
BR,
Kerem Çeliker
keremceliker.medium.com
IBM | Red Hat Champion
On 28 Dec 2022, at 08:13, openstack-discuss-request@lists.openstack.org wrote:
Today's Topics:

  1. [ops][nova] RBD IOPS bottleneck on client-side (Can Özyurt)
  2. [keystone][Meeting] Reminder Keystone meeting is cancelled today (Dave Wilde)
  3. Nova libvirt/kvm sound device (Zakhar Kirpichenko)
  4. Re: [Tacker][SRBAC] Update regarding implementation of project personas in Tacker (Yasufumi Ogawa)
  5. Re: [Tacker][SRBAC] Update regarding implementation of project personas in Tacker (manpreet kaur)

----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Dec 2022 15:33:56 +0300
From: Can Özyurt <acozyurt@gmail.com>
To: OpenStack Discuss <openstack-discuss@lists.openstack.org>
Subject: [ops][nova] RBD IOPS bottleneck on client-side

Hi everyone,

I hope you are all doing well. We are trying to pinpoint an IOPS problem with RBD and decided to ask you for your take on it.

1 control plane
1 compute node
5 storage nodes with 8 SSD disks each
OpenStack Stein/Ceph Mimic deployed with kolla-ansible on ubuntu-1804 (kernel 5.4)
isolcpus 4-127 on compute
vcpu_pin_set 4-127 in nova.conf

image_metadatas:
  hw_scsi_model: virtio-scsi
  hw_disk_bus: scsi
flavor_metadatas:
  hw:cpu_policy: dedicated

What we have tested:

fio --directory=. --ioengine=libaio --direct=1 --name=benchmark_random_read_write --filename=test_rand --bs=4k --iodepth=32 --size=1G --readwrite=randrw --rwmixread=50 --time_based --runtime=300s --numjobs=16

1. First we run the fio test above on a guest VM; we see an average of 5K/5K read/write IOPS consistently. What we realize is that during the test, one single core on the compute host is used at max, which is the first of the pinned CPUs of the guest. 'top -Hp $qemupid' shows that some threads (notably tp_librbd) share that very same core throughout the test (also, emulatorpin set = vcpupin set, as expected).
2. We remove isolcpus and every other configuration stays the same. The fio tests now show 11K/11K read/write IOPS. No single bottlenecked CPU on the host; the observed threads seem to visit all emulatorpins.
3. We bring isolcpus back and redeploy the cluster with Train/Nautilus on ubuntu-1804. Observations are identical to #1.
4. We tried replacing vcpu_pin_set with cpu_shared_set and cpu_dedicated_set, to be able to pin the emulator cpuset to 0-4, to no avail. Multiple guests on a host can easily deplete resources and IOPS drops.
5. Isolcpus are still in place and we deploy Ussuri with kolla-ansible and Train (to limit the moving parts) with ceph-ansible, both on ubuntu-1804. Now we see 7K/7K read/write IOPS.
6. We destroy only the compute node and boot it with ubuntu-2004 with isolcpus set. We add it back to the existing cluster and fio shows slightly above 10K/10K read/write IOPS.

What we think happens:

1. Since isolcpus disables scheduling between the given CPUs, the qemu process and its threads are stuck on the same CPU, which creates the bottleneck. They should be runnable on any of the given emulatorpin CPUs.
2. Ussuri is more performant despite isolcpus, with the improvements made over time.
3. Ubuntu-2004 is more performant despite isolcpus, with the improvements made over time in the kernel.

Now the questions are:

1. What else are we missing here?
2. Are any of those assumptions false?
3. If all true, what can we do to solve this issue, given that we cannot upgrade OpenStack or Ceph in production overnight?
4. Has anyone dealt with this issue before?

We welcome any opinions and suggestions at this point, as we need to make sure that we are on the right path regarding the problem and that an upgrade is not the only solution. Thanks in advance.

------------------------------

Message: 2
Date: Tue, 27 Dec 2022 07:00:30 -0800
From: Dave Wilde <dwilde@redhat.com>
To: openstack-discuss@lists.openstack.org
Subject: [keystone][Meeting] Reminder Keystone meeting is cancelled today

Just a quick reminder that there won't be the keystone weekly meeting today. We'll resume our regularly scheduled programming on 03-Jan-2023. Please update the agenda if you have anything you'd like to discuss. The reviewathon is also cancelled this week, to be resumed on 06-Jan-2023.

/Dave

------------------------------

Message: 3
Date: Tue, 27 Dec 2022 17:40:47 +0200
From: Zakhar Kirpichenko <zakhar@gmail.com>
To: openstack-discuss@lists.openstack.org
Subject: Nova libvirt/kvm sound device

Hi!

I'd like to have the following configuration added to every guest on a specific host managed by Nova and libvirt/kvm:

  <sound model='bla'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
  </sound>

When I add the device manually to the instance XML, it works as intended, but the instance configuration gets overwritten on instance stop/start or hard reboot via Nova.

What is the currently supported / proper way to add a virtual sound device without having to modify libvirt or Nova code? I would appreciate any advice.

Best regards,
Zakhar

------------------------------

Message: 4
Date: Wed, 28 Dec 2022 04:42:16 +0900
From: Yasufumi Ogawa <yasufum.o@gmail.com>
To: manpreet kaur <kaurmanpreet2620@gmail.com>
Cc: openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [Tacker][SRBAC] Update regarding implementation of project personas in Tacker

Hi Manpreet-san,

Thanks for your notice. I've started to review, and I understood that this change is considered backward-compatible, from the suggestions on the etherpad. Although it's LGTM, I'd like to ask whether Yuta has any comment on the proposal, because he has also proposed a policy management feature in this release.

As for the deadline, let us discuss rescheduling if we cannot merge it in time.

Thanks,
Yasufumi

On 2022/12/26 14:07, manpreet kaur wrote:
> Hi Ogawa san and Tacker team,
> This mail is regarding the SRBAC implementation happening in Tacker.
>
> In the Tacker release 2023.1 virtual PTG [1], the Tacker community decided to partially implement the project personas (project-reader role) in the current release, and to implement the remaining project-member role in upcoming releases.
>
> To address the above requirement, I have prepared a specification [2] and pushed it to Gerrit for community review.
>
> Ghanshyam san reviewed the specification and shared the TC's opinion and suggestion to implement both the project-reader and project-member roles. The complete persona implementation will deprecate the 'owner' rule and help restrict any other role from accessing project-based resources.
>
> Additionally, the legacy admin (current admin) is kept intact and works in the same way, so that we do not break things; the project personas are introduced as additional options for operators to adopt.
>
> Current status: Incorporated the new requirement and uploaded a new patch set to address the review comments.
>
> Note: The Tacker spec freeze date is 28th Dec 2022; there might be some delay in merging the specification within the shared timelines.
>
> [1] https://etherpad.opendev.org/p/tacker-antelope-ptg#L186
> [2] https://review.opendev.org/c/openstack/tacker-specs/+/866956
>
> Thanks & Regards,
> Manpreet Kaur
------------------------------

Message: 5
Date: Wed, 28 Dec 2022 10:35:32 +0530
From: manpreet kaur <kaurmanpreet2620@gmail.com>
To: Yasufumi Ogawa <yasufum.o@gmail.com>
Cc: openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [Tacker][SRBAC] Update regarding implementation of project personas in Tacker

Hi Ogawa san,

Thanks for accepting the new RBAC proposal; please find the latest patch set 7 [1] as the final version. We would try to merge the specification within the proposed timelines.

@Ghanshyam san,
Thanks for adding clarity to the proposed changes and for a quick review.

[1] https://review.opendev.org/c/openstack/tacker-specs/+/866956

Best Regards,
Manpreet Kaur