I haven't used iSCSI as a backend yet, but for HDDs the speed looks
comparable: on a system with an HDD ceph backend, creating a volume
from a 2 GB image takes about 40 seconds. As you see, the download is
quite slow; the conversion is a little faster:
Image download 541.00 MB at 28.14 MB/s
Converted 2252.00 MB image at 172.08 MB/s
Scaled by a factor of 10 (20 GB) I would probably end up with
creation times similar to yours. Just for comparison, this is almost
the same image (also 2 GB) in a different ceph cluster where the
cinder conversion path is mounted from CephFS (an SSD pool):
Image download 555.12 MB at 41.34 MB/s
Converted 2252.00 MB image at 769.17 MB/s
This volume was created within 20 seconds. You might also want to
tweak these nova options:
block_device_allocate_retries = 300
block_device_allocate_retries_interval = 10
These are the defaults:
block_device_allocate_retries = 60
block_device_allocate_retries_interval = 3
This would fit your error message:
> Volume be0f28eb-1045-4687-8bdb-5a6d385be6fa did not finish being
> created even after we waited 187 seconds or 61 attempts. And its
> status is downloading.
It tried 60 times with a 3 second interval; apparently that's not
enough. Can you see any bottlenecks in network or disk utilization
that would slow down the download?
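
If you raise them, here is a minimal sketch for a kolla-ansible
deployment (the option names are nova's; the override path is my
assumption based on kolla's default node_custom_config directory,
adjust to your setup):

# /etc/kolla/config/nova.conf, merged into the containers' nova.conf
[DEFAULT]
block_device_allocate_retries = 300
block_device_allocate_retries_interval = 10

and then apply it with something like
"kolla-ansible -i multinode reconfigure --tags nova". For the
bottleneck question, watching "iostat -x 5" on the storage node and
the NIC counters (e.g. "ip -s link") while a volume is being created
should show whether disk or network is the limit.
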
Quoting Franck VEDEL <franck.vedel(a)univ-grenoble-alpes.fr>:
> Hi Eugen,
> thanks for your help
> We have 3 servers (s1, s2, s3) and an iSCSI bay attached to s3.
> Multinode:
> [control]
> s1
> s2
>
> [compute]
> s1
> s2
> s3
>
> [storage]
> s3
>
> on s1: more /etc/kolla/globals.yml
> ...
> enable_cinder: "yes"
> enable_cinder_backend_iscsi: "yes"
> enable_cinder_backend_lvm: "yes"
> enable_iscsid: "yes"
> cinder_volume_group: "cinder-volumes"
> ...
> enable_glance_image_cache: "yes"
> glance_cache_max_size: "21474836480"
> glance_file_datadir_volume: "/images/"
> ...
>
> on s3, /images is on the iSCSI bay:
> mount |grep images
> /dev/mapper/VG--IMAGES-LV--IMAGES on /images type xfs
> (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,sunit=1024,swidth=1024,noquota)
>
> lsblk
> sdf                            8:80  0 500G 0 disk
> └─mpathc                     253:3  0 500G 0 mpath
>   └─mpathc1                  253:4  0 500G 0 part
>     └─VG--IMAGES-LV--IMAGES 253:5  0 500G 0 lvm  /images
>
>
> ls -l /images:
> drwxr-x---. 5 42415 42415 4096 6 févr. 18:40 image-cache
> drwxr-x---. 2 42415 42415 4096 4 févr. 15:16 images
> drwxr-x---. 2 42415 42415 6 22 nov. 12:03 staging
> drwxr-x---. 2 42415 42415 6 22 nov. 12:03 tasks_work_dir
>
> ls -l /images/image-cache
> total 71646760
> -rw-r-----. 1 42415 42415   360841216  2 déc.  11:52 3e3aada8-7610-4c55-b116-a12db68f8ea4
> -rw-r-----. 1 42415 42415   237436928 28 nov.  16:56 6419642b-fcbd-4e5d-9c77-46a48d2af93f
> -rw-r-----. 1 42415 42415 10975379456 26 nov.  14:59 7490e914-8001-4d56-baea-fabf80f425e1
> -rw-r-----. 1 42415 42415 21474836480 22 nov.  16:46 7fc7f9a6-ab0e-45cf-9c29-7e59f6aa68a5
> -rw-r-----. 1 42415 42415  2694512640 15 déc.  18:07 890fd2e8-2fac-42c6-956b-6b10f2253a56
> -rw-r-----. 1 42415 42415 12048400384  1 déc.  17:04 9a235763-ff0c-40fd-9a8d-7cdca3d3e9ce
> -rw-r-----. 1 42415 42415  5949227008 15 déc.  20:41 9cbba37b-1de1-482a-87f2-631d2143cd46
> -rw-r-----. 1 42415 42415   566994944  6 déc.  12:32 b6e29dd9-a66d-4569-a222-6fc0bd9b1b11
> -rw-r-----. 1 42415 42415   578748416  2 déc.  11:24 c40953ee-4b39-43a5-8f6c-b48a046c38e9
> -rw-r-----. 1 42415 42415    16300544 27 janv. 12:19 c88630c7-a7c6-44ff-bfa0-e5af4b1720e3
> -rw-r-----. 1 42415 42415       12288  6 févr. 18:40 cache.db
> -rw-r-----. 1 42415 42415 12324503552  1 déc.  07:50 e0d4fddd-5aa7-4177-a1d6-e6b4c56f12e8
> -rw-r-----. 1 42415 42415  6139084800 22 nov.  15:05 eda93204-9846-4216-a6e8-c29977fdcf2f
> -rw-r-----. 1 42415 42415           0 22 nov.  12:03 image_cache_db_init
> drwxr-x---. 2 42415 42415           6 27 janv. 12:19 incomplete
> drwxr-x---. 2 42415 42415           6 22 nov.  12:03 invalid
> drwxr-x---. 2 42415 42415           6 22 nov.  12:03 queue
>
> on s1
> openstack image list
> +--------------------------------------+-----------------------------+--------+
> | ID                                   | Name                        | Status |
> +--------------------------------------+-----------------------------+--------+
> | ...                                  | ...                         | ...    |
> | 7fc7f9a6-ab0e-45cf-9c29-7e59f6aa68a5 | rocky8.4                    | active |
> | ...                                  | ...                         | ...    |
> | 7490e914-8001-4d56-baea-fabf80f425e1 | win10_2104                  | active |
> +--------------------------------------+-----------------------------+--------+
>
>
> openstack image show 7fc7f9a6-ab0e-45cf-9c29-7e59f6aa68a5
> disk_format | raw
>
> when I try to create an instance from this image (2 GB RAM, 40 GB disk):
> [Error : Build of instance baa06bef-9628-407f-8bae-500ef7bce065
> aborted: Volume be0f28eb-1045-4687-8bdb-5a6d385be6fa did not finish
> being created even after we waited 187 seconds or 61 attempts. And
> its status is downloading.
>
> it's impossible. I have to create the volume from the image first,
> and then create the instance from that volume.
>
> Is that normal?
>
>
> Franck
>
>> On 7 Feb 2022 at 10:55, Eugen Block <eblock(a)nde.ag> wrote:
>>
>> Hi Franck,
>>
>> although it's a different topic than your original question, I
>> wanted to comment on the volume creation time (maybe a new thread
>> would make sense). What is your storage back end? If it is ceph,
>> are your images in raw format? Otherwise cinder has to download the
>> image from glance (to /var/lib/cinder/conversion), convert it, and
>> then upload it back to ceph. It's similar with nova: nova stores
>> base images in /var/lib/nova/instances/_base to prevent the compute
>> nodes from downloading them every time. This may save some time on
>> the download, but the upload has to happen anyway. And if you don't
>> use shared storage for nova (e.g. for live-migration) you may find
>> that some compute nodes create an instance more quickly because
>> they only have to upload, while others first have to download,
>> convert and then upload.
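>>
>> If your images are not raw yet, converting them once before the
>> upload to glance avoids the per-volume conversion entirely. A rough
>> sketch (assuming a qcow2 source; file and image names are
>> placeholders):
>>
>> # convert locally, then upload the raw file to glance
>> qemu-img convert -p -f qcow2 -O raw image.qcow2 image.raw
>> openstack image create --disk-format raw --container-format bare \
>>   --file image.raw my-image-raw
>>
>> The raw file is larger to store in glance, but with a ceph backend
>> cinder/nova can then clone it instead of converting it.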
>>
>> You would see the conversion in the cinder-volume log (with kolla
>> typically under /var/log/kolla/cinder/):
>>
>> INFO cinder.image.image_utils
>> [req-f2062570-4006-464b-a1f5-d0d5ac34670d
>> d71f59600f1c40c394022738d4864915 31b9b4900a4d4bdaabaf263d0b4021be -
>> - -] Converted 2252.00 MB image at 757.52 MB/s
>>
>> Hope this helps.
>>
>> Eugen
>>
>>
>> Quoting Franck VEDEL <franck.vedel(a)univ-grenoble-alpes.fr>:
>>
>>> Sunday morning: my openstack works…. Phew.
>>> The "kolla-ansible -i multinode mariadb_recovery" command (which
>>> is magic anyway) fixed the problem, and then the mariadb and nova
>>> containers started.
>>> Once the problems between serv3 and the iSCSI bay were solved and
>>> the glance container restarted, everything seems to work.
>>>
>>>> 4 minutes to create a 20GB empty volume seems too long to me. For
>>>> an actual 20GB image, it's going to depend on the speed of the
>>>> backing storage tech.
>>> The 4 minutes is for a volume created from an image. I will look
>>> at this problem again next summer; I will retry changing the MTU
>>> value then.
>>>
>>> Thanks a lot, really
>>>
>>>
>>> Franck
>>>
>>>> On 5 Feb 2022 at 17:08, Laurent Dumont
>>>> <laurentfdumont(a)gmail.com> wrote:
>>>>
>>>> Any chance to revert both switches and servers? That would
>>>> confirm that MTU was the issue.
>>>> Don't ping the iSCSI bay; ping between the controllers in
>>>> OpenStack, they are the ones running mariadb/galera.
>>>> Since ICMP packets are small, they might not trigger the MTU
>>>> issues. Can you try something like "ping -s 8972 -M do -c 4
>>>> $mariadb_host_2" from $mariadb_host_1?
>>>> What is your network setup on the servers? Two ports in a bond?
>>>> Did you change both the physical interface MTU and the bond
>>>> interface itself?
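>>>> For example (a sketch; interface names assumed, adjust to your
>>>> team/bond devices):
>>>>
>>>> ip -d link show team0 | grep mtu    # MTU actually active on the device
>>>> ping -M do -s 8972 -c 4 $mariadb_host_2   # 8972 = 9000 minus 28 bytes of IP+ICMP headers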
>>>>
>>>> 4 minutes to create a 20GB empty volume seems too long to me. For
>>>> an actual 20GB image, it's going to depend on the speed of the
>>>> backing storage tech.
>>>>
>>>> On Sat, Feb 5, 2022 at 1:51 AM Franck VEDEL
>>>> <franck.vedel(a)univ-grenoble-alpes.fr
>>>> <mailto:franck.vedel@univ-grenoble-alpes.fr>> wrote:
>>>> Thanks for your help.
>>>>
>>>>
>>>>> What was the starting value for MTU?
>>>> 1500
>>>>> What was the starting value changed to for MTU?
>>>> 9000
>>>>> Can ping between all your controllers?
>>>> Yes; all containers start except nova-conductor, nova-scheduler and mariadb.
>>>>
>>>>
>>>>> Do you just have two controllers running mariadb?
>>>> yes
>>>>> How did you change MTU?
>>>>
>>>> On the 3 servers:
>>>> nmcli connection modify team0-port1 802-3-ethernet.mtu 9000
>>>> nmcli connection modify team1-port2 802-3-ethernet.mtu 9000
>>>> nmcli connection modify team0 team.runner lacp 802-3-ethernet.mtu 9000
>>>> nmcli con down team0
>>>> nmcli con down team1
>>>>
>>>>
>>>>> Was the change reverted at the network level as well? (Switches
>>>>> need to be configured at a higher or the same MTU value as the
>>>>> servers.)
>>>> I didn't change the MTU on the network (switches), but ping to
>>>> 10.0.5.117 (the iSCSI bay) was working from serv3.
>>>>
>>>> I changed the MTU value because volume creation takes quite a
>>>> long time (4 minutes for 20 GB, which is too long for what I want
>>>> to do; the patience of students decreases with the years).
>>>>
>>>> Franck
>>>>
>>>>> On 4 Feb 2022 at 23:12, Laurent Dumont
>>>>> <laurentfdumont(a)gmail.com <mailto:laurentfdumont@gmail.com>>
>>>>> wrote:
>>>>>
>>>>> What was the starting value for MTU?
>>>>> What was the starting value changed to for MTU?
>>>>> Can ping between all your controllers?
>>>>> Do you just have two controllers running mariadb?
>>>>> How did you change MTU?
>>>>> Was the change reverted at the network level as well? (Switches
>>>>> need to be configured at a higher or the same MTU value as the
>>>>> servers.)
>>>>> 4567 seems to be the port for galera (clustering for mariadb)
>>>>> On Fri, Feb 4, 2022 at 11:52 AM Franck VEDEL
>>>>> <franck.vedel(a)univ-grenoble-alpes.fr
>>>>> <mailto:franck.vedel@univ-grenoble-alpes.fr>> wrote:
>>>>> Hello,
>>>>> I am in an emergency, quite a catastrophic situation, because I
>>>>> do not know what to do.
>>>>>
>>>>> I have an OpenStack cluster with 3 servers (serv1, serv2,
>>>>> serv3). It was doing so well…
>>>>>
>>>>>
>>>>> A network admin came to me and told me to change an MTU on the
>>>>> cards. I knew it shouldn't be done...I shouldn't have done it.
>>>>> I did it.
>>>>> Of course, it didn't work as expected. I went back to my
>>>>> starting configuration and there I have a big problem with
>>>>> mariadb which is set up on serv1 and serv2.
>>>>>
>>>>> Here are my errors:
>>>>>
>>>>>
>>>>> 2022-02-04 17:40:36 0 [ERROR] WSREP: failed to open gcomm
>>>>> backend connection: 110: failed to reach primary view: 110
>>>>> (Connection timed out)
>>>>> at gcomm/src/pc.cpp:connect():160
>>>>> 2022-02-04 17:40:36 0 [ERROR] WSREP:
>>>>> gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend
>>>>> connection: -110 (Connection timed out)
>>>>> 2022-02-04 17:40:36 0 [ERROR] WSREP:
>>>>> gcs/src/gcs.cpp:gcs_open():1475: Failed to open channel
>>>>> 'openstack' at 'gcomm://10.0.5.109:4567,10.0.5.110:4567':
>>>>> -110 (Connection timed out)
>>>>> 2022-02-04 17:40:36 0 [ERROR] WSREP: gcs connect failed:
>>>>> Connection timed out
>>>>> 2022-02-04 17:40:36 0 [ERROR] WSREP:
>>>>> wsrep::connect(gcomm://10.0.5.109:4567,10.0.5.110:4567)
>>>>> failed: 7
>>>>> 2022-02-04 17:40:36 0 [ERROR] Aborting
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I do not know what to do. My installation was done with
>>>>> kolla-ansible; the mariadb container restarts every 30 seconds.
>>>>>
>>>>> Can the "kolla-ansible reconfigure mariadb" command be a solution?
>>>>> Could the command "kolla-ansible mariadb recovery" be a solution?
>>>>>
>>>>> Thanks in advance if you can help me.
>>>>>
>>>>>
>>>>>
>>>>> Franck
>>>>>
>>>>>
>>>>