[kolla][cinder][glance][ceph] Corrupt image while downloading from Ceph

Giuseppe Sannino km.giuseppesannino at gmail.com
Tue Dec 3 16:24:10 UTC 2019


Hi community,
I need your help.

*>>> Background <<<*
I'm using kolla-ansible 8.0.0 to deploy a 1+3 "Stein" cluster.
Ceph is used as the backend.

The configuration is a bit peculiar. The controller runs on a VM hosted
in a different network from the one where the bare-metal servers hosting
the OpenStack compute services are located.

On the Compute Hosts, we have the following services:
glance_api
neutron_metadata_agent
neutron_l3_agent
neutron_dhcp_agent
neutron_openvswitch_agent
openvswitch_vswitchd
openvswitch_db
nova_compute
nova_libvirt
nova_ssh
cinder_backup
cinder_volume
chrony
cron
kolla_toolbox
fluentd


Service APIs and authentication run on the controller.

In a standard "lab configuration" everything works fine.

*>>> Fault Scenario <<<*
We are trying to identify possible issues (and ways to work around them)
when the latency between the controller and the computes increases,
and we found one quite quickly.

Basically, if you try to create a volume from a RAW image (stored in Ceph)
it will fail.

From glance-api.log on the controller:

2019-12-03 16:00:11.840 27 INFO eventlet.wsgi.server [req-225aae45-ad93-40f5-835d-027f93e3307d 615252134b844dbeb7acc34219e431e6 0049baebd0f742de915b11ec18509803 - default default] Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/wsgi.py", line 572, in handle_one_response
    write(b''.join(towrite))
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/wsgi.py", line 518, in write
    wfile.writelines(towrite)
  File "/usr/lib64/python2.7/socket.py", line 334, in writelines
    self.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/greenio/base.py", line 401, in sendall
    tail = self.send(data, flags)
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/greenio/base.py", line 395, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/var/lib/kolla/venv/lib/python2.7/site-packages/eventlet/greenio/base.py", line 382, in _send_loop
    return send_method(data, *args)
error: [Errno 104] Connection reset by peer



From the cinder-volume.log on the computes:
:
2019-12-03 16:00:15.932 34 ERROR oslo_messaging.rpc.server     None, None)
2019-12-03 16:00:15.932 34 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python2.7/site-packages/cinder/image/image_utils.py", line 410, in fetch
2019-12-03 16:00:15.932 34 ERROR oslo_messaging.rpc.server reason=reason)
2019-12-03 16:00:15.932 34 ERROR oslo_messaging.rpc.server ImageDownloadFailed: Failed to download image 6e7bb902-917e-4c9e-ba9f-3ee811a2502a, reason: IOError: 32 Corrupt image download. Hash was
88b062103e34c9824d7172afaa9a80befd00e1bef86d16a362572f01bd887a0551c188e98526eecdeedca262d3364175d384352c10d203bdb6a5b87b0593f231
expected
adc29d5ce6129337e1e9bf00cc3f0798682c021c6f1a0aab46213438a6de8c6b027180389aa21196e7f708214815221a9a0c6029a96badafefca624bf58e4bff
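
For context, the two hashes in the error are 128 hex characters long, which
matches SHA-512. The sketch below is only an illustration of how a streaming
hash check of this kind can behave; it is not the actual Cinder or
glanceclient code, and the names verify_stream and split are made up for
the example. It assumes the digest is computed over the bytes as they
arrive, so a transfer that is cut short (for example by a connection reset)
produces a mismatch only once the stream ends.

```python
import hashlib


def verify_stream(chunks, expected_hex, algo="sha512"):
    """Yield chunks unchanged, then raise IOError if the digest differs."""
    hasher = hashlib.new(algo)
    for chunk in chunks:
        hasher.update(chunk)
        yield chunk
    actual = hasher.hexdigest()
    if actual != expected_hex:
        raise IOError("Corrupt image download. Hash was %s expected %s"
                      % (actual, expected_hex))


def split(data, size=100):
    """Cut a bytes object into fixed-size chunks (hypothetical helper)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Hypothetical payload standing in for the image data.
image = b"x" * 1000
expected = hashlib.sha512(image).hexdigest()

# A complete transfer passes the check.
complete = b"".join(verify_stream(split(image), expected))

# A transfer that stops early fails once the stream is exhausted.
try:
    b"".join(verify_stream(split(image[:700]), expected))
    truncated_ok = True
except IOError:
    truncated_ok = False
```

If the real check works along these lines, a reset connection on the
glance-api side and a hash mismatch on the cinder-volume side would be two
views of the same truncated transfer, but that is an assumption, not
something the logs above prove.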


*>>> Troubleshooting <<<*
At first glance it seems to be a problem related to the size of the image.
We have tried with:
Cirros Raw (39MB) => It works
Ubuntu18 QCOW2 (328MB) => It works
Ubuntu18 Raw (2.2GB) => IT FAILS !!!!
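
One possible way to narrow this down is to save the failing image to a
local file on a compute host and compare its size and hash with the values
recorded for the image. The following is a minimal sketch under that
assumption: summarize_file and classify are hypothetical helper names, the
file is assumed to have been saved by some other means, and the expected
size and hash have to be looked up from the image record separately.

```python
import hashlib


def summarize_file(path, algo="sha512", chunk_size=1024 * 1024):
    """Return (size_in_bytes, hex_digest) for a file, read in chunks."""
    hasher = hashlib.new(algo)
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size, hasher.hexdigest()


def classify(size, digest, expected_size, expected_digest):
    """Distinguish a short transfer from a same-size content mismatch."""
    if digest == expected_digest:
        return "ok"
    if size < expected_size:
        return "truncated"
    return "corrupted"
```

A "truncated" result would point at the transfer being interrupted, while
"corrupted" would suggest the bytes themselves changed on the way.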



Any suggestions on where to focus our efforts?


Many thanks in advance

BR
/Giuseppe