[kolla][cinder][glance][ceph] Corrupt image while downloading from Ceph
Hi community, I need your help.

*>>> Background <<<*

I'm using kolla-ansible 8.0.0 to deploy a 1+3 "Stein" cluster, with Ceph as the backend. The configuration is a bit peculiar: the controller runs on a VM hosted in a different network from the one where the bare-metal servers hosting the OS Compute services are.

On the Compute hosts, we have the following services:
- glance_api
- neutron_metadata_agent
- neutron_l3_agent
- neutron_dhcp_agent
- neutron_openvswitch_agent
- openvswitch_vswitchd
- openvswitch_db
- nova_compute
- nova_libvirt
- nova_ssh
- cinder_backup
- cinder_volume
- chrony
- cron
- kolla_toolbox
- fluentd

The service APIs and authentication run on the controller. In a standard "lab configuration" everything works fine.

*>>> Fault Scenario <<<*

We are trying to verify possible issues (and ways to work around them) in case the latency between the Controller and the Compute hosts increases, and we found one quite quickly. Basically, if you try to create a volume from a RAW image (stored in Ceph), it fails.
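A note on the error itself: the "Corrupt image download. Hash was X expected Y" text in the cinder-volume log appears to come from an integrity check that the Glance client runs while streaming the image data, and the 128-hex-digit values are consistent with SHA-512, Glance's default multihash algorithm. If the transfer stops part-way (the "Connection reset by peer" on the glance_api side points that way), the client ends up hashing a truncated stream, so the failure can surface as a hash mismatch rather than as a network error. The sketch below illustrates that mechanism only; `verified_chunks` and `stream` are hypothetical helpers written for illustration, not the python-glanceclient or Cinder code.

```python
import errno
import hashlib
from typing import Iterable, Iterator, Optional


def verified_chunks(chunks: Iterable[bytes], expected_hex: str,
                    algo: str = "sha512") -> Iterator[bytes]:
    """Pass chunks through while hashing them; raise if the final digest differs.

    Hypothetical helper modelled on a streaming integrity check,
    not the actual python-glanceclient implementation.
    """
    hasher = hashlib.new(algo)
    for chunk in chunks:
        hasher.update(chunk)
        yield chunk
    # The comparison can only happen once the stream has ended.
    actual = hasher.hexdigest()
    if actual != expected_hex:
        raise IOError(errno.EPIPE,
                      "Corrupt image download. Hash was %s expected %s"
                      % (actual, expected_hex))


def stream(data: bytes, chunk_size: int,
           cut_at: Optional[int] = None) -> Iterator[bytes]:
    """Simulate a server sending `data`; stop after `cut_at` bytes to mimic a dropped connection."""
    end = len(data) if cut_at is None else cut_at
    for offset in range(0, end, chunk_size):
        yield data[offset:min(offset + chunk_size, end)]


if __name__ == "__main__":
    image = bytes(range(256)) * 4096  # 1 MiB of sample "image" data
    expected = hashlib.sha512(image).hexdigest()

    # Complete transfer: the digest matches and all bytes come through.
    assert b"".join(verified_chunks(stream(image, 65536), expected)) == image

    # Truncated transfer: the mismatch is only detected at the very end.
    try:
        b"".join(verified_chunks(stream(image, 65536, cut_at=300000), expected))
    except IOError as exc:
        print("errno", exc.errno, "-", exc.strerror[:32])
```

Because the digest can only be compared after the last chunk, a stream that ends early is reported at the end of the transfer, with errno 32 (EPIPE), which matches the "IOError: 32" in the Cinder message.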
From glance-api.log on the controller:

2019-12-03 16:00:11.840 27 INFO eventlet.wsgi.server [req-225aae45-ad93-40f5-835d-027f93e3307d 615252134b844dbeb7acc34219e431e6 0049baebd0f742de915b11ec18509803 - default default] Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/wsgi.py", line 572, in handle_one_response
    write(b''.join(towrite))
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/wsgi.py", line 518, in write
    wfile.writelines(towrite)
  File "/usr/lib64/python2.7/socket.py", line 334, in writelines
    self.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/greenio/base.py", line 401, in sendall
    tail = self.send(data, flags)
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/greenio/base.py", line 395, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/greenio/base.py", line 382, in _send_loop
    return send_method(data, *args)
error: [Errno 104] Connection reset by peer

From cinder-volume.log on the compute hosts:

019-12-03 16:00:15.932 34 ERROR oslo_messaging.rpc.server None, None)
2019-12-03 16:00:15.932 34 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/cinder/image/image_utils.py", line 410, in fetch
2019-12-03 16:00:15.932 34 ERROR oslo_messaging.rpc.server reason=reason)
2019-12-03 16:00:15.932 34 ERROR oslo_messaging.rpc.server ImageDownloadFailed: Failed to download image 6e7bb902-917e-4c9e-ba9f-3ee811a2502a, reason: IOError: 32 Corrupt image download. Hash was 88b062103e34c9824d7172afaa9a80befd00e1bef86d16a362572f01bd887a0551c188e98526eecdeedca262d3364175d384352c10d203bdb6a5b87b0593f231 expected adc29d5ce6129337e1e9bf00cc3f0798682c021c6f1a0aab46213438a6de8c6b027180389aa21196e7f708214815221a9a0c6029a96badafefca624bf58e4bff

*>>> Troubleshooting <<<*

At first glance, it seems to be a problem related to the size of the image. We have tried with:
- Cirros RAW (39 MB) => it works
- Ubuntu 18 QCOW2 (328 MB) => it works
- Ubuntu 18 RAW (2.2 GB) => IT FAILS!

Any suggestions on where to focus our effort?

Many thanks in advance.

BR
/Giuseppe
Giuseppe Sannino