[Swift][Ussuri] Erasure Coding Quarantines
Hello,

I'm not sure if it is related, but I am testing reimaging a node from Ubuntu 18.04 to 20.04 (Cloud Archive Ussuri 2.25.1 to Focal Repo Ussuri 2.25.2) in a 3-node cluster. I have a lot of servers to upgrade, and the environments will be mixed for a while, so I have left this one mixed for a week. Now the 18.04 servers are starting to accumulate quarantined EC objects.

The 20.04 server has log messages indicating that it is unable to get enough responses from the reconstructor.

The 18.04 servers have a couple EC messages.
"...liberasurecode[30807]: Invalid fragment header information!"
followed by
"...object-server: Quarantined object /srv/node/d9/objects-4/10321/b2e/50a3dce515aafc6f222824ec5c7dfb2e/1664002866.33100#8#d.data: Invalid EC metadata at offset 0x0"

Is there anything you can think of that would cause this issue?

My steps to reproduce:
1. Create a container in an EC policy
2. Upload an object to the EC container
3. Wait a bit of time (it looks like the object-server is quarantining, but I'm not sure what the trigger is)
4. Try to download the file and it will give a 504

Looking at the python3-swift dependencies, these are the package versions I thought may affect this:

20.04
libjerasure2 = 2.0.0+2017.04.10.git.de1739cc84-1
python3-pyeclib = 1.6.0-6build1
python3-xattr = 0.9.6-1.1

18.04
libjerasure2 = 2.0.0+2017.04.10.git.de1739cc84-1
python3-pyeclib = 1.3.1-1ubuntu3
python3-xattr = 0.9.2-0ubuntu1

Thanks!
Reid

Important Notice: This email is intended to be received only by persons entitled to receive the confidential and legally privileged information it presumptively contains, and this notice constitutes identification as such. Any reading, disclosure, copying, distribution or use of this information by or to someone who is not the intended recipient, is prohibited. If you received this email in error, please notify us immediately at legal@kaseya.com, and then delete it. To opt-out of receiving emails Please click here<https://info.kaseya.com/email-subscription-center.html>.
The term 'this e-mail' includes any and all attachments.
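For anyone decoding the quarantine message in the report above: the logged path follows Swift's on-disk layout for EC fragments, roughly <device>/objects-<policy index>/<partition>/<suffix>/<hash>/<timestamp>#<fragment index>#d.data. The Python sketch below pulls those fields out of such a path. The names ECFragmentPath and parse_fragment_path are made up for illustration and are not part of Swift, and the pattern only covers the plain timestamp form seen in the log line.

```python
import re
from typing import NamedTuple


class ECFragmentPath(NamedTuple):
    """Fields encoded in an EC fragment's .data path (illustrative type)."""
    device: str
    policy_index: int
    partition: int
    suffix: str
    object_hash: str
    timestamp: str
    frag_index: int
    durable: bool


# Assumption: timestamps have the plain "seconds.fraction" form shown in the
# log line; other variants are not handled by this sketch.
_PATH_RE = re.compile(
    r"/(?P<device>[^/]+)/objects(?:-(?P<policy>\d+))?"
    r"/(?P<partition>\d+)/(?P<suffix>[0-9a-f]{3})/(?P<hash>[0-9a-f]{32})"
    r"/(?P<timestamp>\d+\.\d+)#(?P<frag>\d+)(?P<durable>#d)?\.data$"
)


def parse_fragment_path(path: str) -> ECFragmentPath:
    """Split a fragment path into device, policy, partition, and so on."""
    match = _PATH_RE.search(path)
    if match is None:
        raise ValueError(f"not an EC fragment path: {path!r}")
    return ECFragmentPath(
        device=match["device"],
        # A bare "objects" directory (no suffix) means policy index 0.
        policy_index=int(match["policy"] or 0),
        partition=int(match["partition"]),
        suffix=match["suffix"],
        object_hash=match["hash"],
        timestamp=match["timestamp"],
        frag_index=int(match["frag"]),
        durable=match["durable"] is not None,
    )


if __name__ == "__main__":
    logged = ("/srv/node/d9/objects-4/10321/b2e/"
              "50a3dce515aafc6f222824ec5c7dfb2e/1664002866.33100#8#d.data")
    print(parse_fragment_path(logged))
```

For the path in the log line this gives device d9, storage policy index 4, partition 10321 and fragment index 8, which is useful when checking which policy and which fragment a given quarantine belongs to.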
On Tue, 27 Sep 2022 13:35:47 +0000 Reid Guyett <rguyett@datto.com> wrote:
The 20.04 server has log messages indicating that it is unable to get enough responses from the reconstructor.
The 18.04 servers have a couple EC messages. "...liberasurecode[30807]: Invalid fragment header information!" followed by "...object-server: Quarantined object /srv/node/d9/objects-4/10321/b2e/50a3dce515aafc6f222824ec5c7dfb2e/1664002866.33100#8#d.data: Invalid EC metadata at offset 0x0"
Ouch, that is unfortunate. Unfortunately, I'm not familiar with the exact details of this. There was a window where depending on how linker worked, our code could get linked with an incorrect zlib crc routine randomly. I'll try to raise someone on IRC about this.

-- Pete
On Fri, Sep 30, 2022 at 4:56 PM Pete Zaitcev <zaitcev@redhat.com> wrote:
Unfortunately, I'm not familiar with the exact details of this. There was a window where depending on how linker worked, our code could get linked with an incorrect zlib crc routine randomly.
# When upgrading from liberasurecode<=1.5.0, you may want to continue writing
# legacy CRCs until all nodes are upgraded and capable of reading fragments
# with zlib CRCs. liberasurecode>=1.6.2 checks for the environment variable
# LIBERASURECODE_WRITE_LEGACY_CRC; if set (value doesn't matter), it will use
# its legacy CRC. Set this option to true or false to ensure the environment
# variable is or is not set. Leave the option blank or absent to not touch
# the environment (default). For more information, see
# https://bugs.launchpad.net/liberasurecode/+bug/1886088
# write_legacy_ec_crc =

https://github.com/NVIDIA/swift/blob/master/etc/proxy-server.conf-sample#L32...

set it in your object-server [DEFAULT] confs too

-- Clay Gerrard
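The three-way behaviour described in that config comment (true sets the variable, false clears it, blank or absent leaves the environment alone) can be sketched in Python as follows. This is only an illustration of the semantics as the comment states them, not Swift's actual code: the function name apply_write_legacy_ec_crc and the set of strings treated as "true" are assumptions.

```python
import os

ENV_VAR = "LIBERASURECODE_WRITE_LEGACY_CRC"

# Assumption: which strings count as "true" is a guess for this sketch.
TRUE_VALUES = {"true", "1", "yes", "on", "t", "y"}


def apply_write_legacy_ec_crc(conf, environ=None):
    """Set, clear, or leave alone the env var based on a conf mapping."""
    if environ is None:
        environ = os.environ
    raw = conf.get("write_legacy_ec_crc")
    if raw is None or not str(raw).strip():
        # Blank or absent: do not touch the environment (the default).
        return
    if str(raw).strip().lower() in TRUE_VALUES:
        # liberasurecode only checks for presence, so the value is arbitrary.
        environ[ENV_VAR] = "1"
    else:
        # Explicit false: make sure the variable is not set.
        environ.pop(ENV_VAR, None)


if __name__ == "__main__":
    env = {}
    apply_write_legacy_ec_crc({"write_legacy_ec_crc": "true"}, env)
    print(env)
```

Note that, per the comment, the variable only has an effect with liberasurecode>=1.6.2, and it has to be present in the environment of the process that loads the library.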
Hi,

Thanks for the follow-up. I was able to find this cause in the IRC channel. I ultimately upgraded the other nodes to 20.04 in our test clusters and moved the quarantined objects back to where they belonged. From there the files were downloadable again.

We are going to try to create a new liberasurecode package 1.6.2 for 20.04 so we can set the environment variable to write legacy CRC headers until all the nodes in the cluster can be upgraded.

It is hard to find the information about the bug pre-upgrade. I didn't see it in the release notes for 2.25.2 (well, they don't exist) and I don't see anything about it in the main Ubuntu release notes.

This is why we have testing environments.

Reid
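The thread does not say how the quarantined objects were moved back. As a rough aid for anyone attempting the same thing, the Python sketch below recomputes the hash directory an object originally lived in from its hash. It relies on two things that are not stated in the thread: Swift takes the partition from the top bits of the object's MD5 hash, and the part power here is assumed to be 15, chosen only because it reproduces partition 10321 from the logged path. The real part power comes from the policy's ring, and the function names are hypothetical.

```python
import posixpath


def partition_for_hash(object_hash: str, part_power: int) -> int:
    """Top `part_power` bits of the first 32 bits of the hex MD5 hash."""
    return int(object_hash[:8], 16) >> (32 - part_power)


def original_hash_dir(devices_root: str, device: str, policy_index: int,
                      object_hash: str, part_power: int) -> str:
    """Directory that held the object's fragment files before quarantine."""
    # Policy 0 uses a bare "objects" directory; others get an index suffix.
    datadir = "objects" if policy_index == 0 else f"objects-{policy_index}"
    partition = partition_for_hash(object_hash, part_power)
    # The suffix directory is the last three hex characters of the hash.
    suffix = object_hash[-3:]
    return posixpath.join(devices_root, device, datadir,
                          str(partition), suffix, object_hash)


if __name__ == "__main__":
    # part_power=15 is an assumption; read the real value from the ring.
    print(original_hash_dir("/srv/node", "d9", 4,
                            "50a3dce515aafc6f222824ec5c7dfb2e", 15))
```

The object-server log line already records the original path, so this is mainly a cross-check. Moving fragments written with the new CRC back onto a node that still cannot read them would presumably just get them quarantined again, which fits with the nodes being upgraded first.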
On Mon, Oct 3, 2022 at 3:37 PM Reid Guyett <rguyett@datto.com> wrote:
Thanks for the follow-up. [...] From there the files were downloadable again.
Nice work!
We are going to try to create a new liberasurecode package 1.6.2 for 20.04 so we can set the environment variable to write legacy CRC headers until all the nodes in the cluster can be upgraded.
I'm not sure if you need a new package, I think you have to set the env at runtime - but there's also a swift config option that will force the env to get set that you can turn off after full upgrade.
This is why we have testing environments.
This is why *competent* deployers and operators have testing environments - and it's the only thing that makes the terrible terrible reality of building and releasing software actually a net good. Couldn't do it without you; go FOSS! -- Clay Gerrard
I'm not sure how to create a double quote in Outlook web app...

We are going to try to create a new liberasurecode package 1.6.2 for 20.04 so we can set the environment variable to write legacy CRC headers until all the nodes in the cluster can be upgraded.

I'm not sure if you need a new package, I think you have to set the env at runtime - but there's also a swift config option that will force the env to get set that you can turn off after full upgrade.

In the IRC response, the env var only works in 1.6.2 but 20.04 ships with 1.6.1. The application setting you mentioned is in Swift 2.27 and we are still on Ussuri (2.25.2), and it still requires the compatible liberasurecode1 package.

I'm not sure how to go about requesting this version to be available in the Focal repos. It seems like it should belong there since upgrading from 18.04 to 20.04 is a contributor to this problem.
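The version constraints mentioned in this exchange can be written down as a small Python check. The thresholds are the ones stated in the thread (the environment variable is honoured by liberasurecode 1.6.2 and later, and the config option is reported to be in Swift 2.27). The function names are illustrative, and the parser assumes plain dotted upstream versions, not full Debian package version strings.

```python
def version_tuple(text: str) -> tuple:
    """Turn a dotted upstream version such as '1.6.1' into (1, 6, 1)."""
    return tuple(int(part) for part in text.split("."))


# Thresholds as stated in the thread.
MIN_LIBERASURECODE_FOR_ENV_VAR = (1, 6, 2)
MIN_SWIFT_FOR_CONFIG_OPTION = (2, 27)


def legacy_crc_options(liberasurecode: str, swift: str) -> dict:
    """Report which legacy-CRC mechanisms a node could use."""
    env_var_ok = version_tuple(liberasurecode) >= MIN_LIBERASURECODE_FOR_ENV_VAR
    # The config option only sets the env var, so it needs both versions.
    config_ok = env_var_ok and version_tuple(swift) >= MIN_SWIFT_FOR_CONFIG_OPTION
    return {"env_var": env_var_ok, "config_option": config_ok}


if __name__ == "__main__":
    # Focal as described in the thread: liberasurecode 1.6.1, Swift 2.25.2.
    print(legacy_crc_options("1.6.1", "2.25.2"))
```

With the Focal versions from the thread neither mechanism is available, which is why a newer liberasurecode package is being discussed.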
On Wed, Oct 5, 2022 at 7:58 AM Reid Guyett <rguyett@datto.com> wrote:
the env var only works in 1.6.2 but 20.04 ships with 1.6.1.
Oh shoot, yeah I have no idea what version is packaged downstream. Maybe we can get Thomas to backport the jammy package https://packages.ubuntu.com/jammy/liberasurecode1 to focal https://packages.ubuntu.com/focal/liberasurecode1 -- Clay Gerrard
participants (3)
- Clay Gerrard
- Pete Zaitcev
- Reid Guyett