[swift] EC data left in old location after rebalance
Hello,

We started using EC policies in a new cluster a few months ago and added more capacity. During the rebalance (started June 30), it seems that all the data was copied to the new locations but it didn't clean up the old locations. This was identified through our handoff monitoring.

OS: Ubuntu 18.04
Swift: 2.17.1

Example: list of devices for partition 14242:

~$ swift-get-nodes /etc/swift/object-4.ring.gz -p 14242
... removed ...
Server:Port Device    x.x.x.31:6031 d31
Server:Port Device    x.x.x.66:6030 d30
Server:Port Device    x.x.x.25:6029 d29
Server:Port Device    x.x.x.33:6027 d27
Server:Port Device    x.x.x.36:6020 d20
Server:Port Device    x.x.x.29:6018 d18
Server:Port Device    x.x.x.21:6033 d33
Server:Port Device    x.x.x.27:6025 d25
Server:Port Device    x.x.x.35:6022 d22
Server:Port Device    x.x.x.39:6031 d31
Server:Port Device    x.x.x.28:6032 d32
Server:Port Device    x.x.x.23:6021 d21
Server:Port Device    x.x.x.26:6022 d22
Server:Port Device    x.x.x.34:6023 d23
Server:Port Device    x.x.x.37:6019 d19
Server:Port Device    x.x.x.30:6017 d17
Server:Port Device    x.x.x.22:6027 d27
Server:Port Device    x.x.x.24:6031 d31
Server:Port Device    x.x.x.32:6032 d32

The partitions look to have the correct data on them:

~$ ssh root@x.x.x.31 "ls -lah ${DEVICE:-/srv/node*}/d31/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.66 "ls -lah ${DEVICE:-/srv/node*}/d30/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.25 "ls -lah ${DEVICE:-/srv/node*}/d29/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.33 "ls -lah ${DEVICE:-/srv/node*}/d27/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.36 "ls -lah ${DEVICE:-/srv/node*}/d20/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.29 "ls -lah ${DEVICE:-/srv/node*}/d18/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.21 "ls -lah ${DEVICE:-/srv/node*}/d33/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.27 "ls -lah ${DEVICE:-/srv/node*}/d25/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.35 "ls -lah ${DEVICE:-/srv/node*}/d22/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.39 "ls -lah ${DEVICE:-/srv/node*}/d31/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.28 "ls -lah ${DEVICE:-/srv/node*}/d32/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.23 "ls -lah ${DEVICE:-/srv/node*}/d21/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.26 "ls -lah ${DEVICE:-/srv/node*}/d22/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.34 "ls -lah ${DEVICE:-/srv/node*}/d23/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.37 "ls -lah ${DEVICE:-/srv/node*}/d19/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.30 "ls -lah ${DEVICE:-/srv/node*}/d17/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.22 "ls -lah ${DEVICE:-/srv/node*}/d27/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.24 "ls -lah ${DEVICE:-/srv/node*}/d31/objects-4/14242 | wc -l"
664
~$ ssh root@x.x.x.32 "ls -lah ${DEVICE:-/srv/node*}/d32/objects-4/14242 | wc -l"
664

From one of the nodes that does not belong to the list above. This partition should not exist on this node after the rebalance:

x.x.x.20:~# ls /srv/node/d28/objects-4/14242 | wc -l
627

The reconstructor is throwing a lot of these unexpected response errors in the logs. Manually running it from the node that should not have the partition, I can reproduce the error. x.x.y.0/24 is the replication network.

x.x.x.20:~# swift-object-reconstructor /etc/swift/object-server.conf -d d28 -p 14242 -o -v
object-reconstructor: x.x.y.42:6200/d30/14242 Unexpected response: ":ERROR: 500 'ERROR: With :UPDATES: 36 failures to 0 successes'"

There looked to have been some partition locations cleaned up around July 11. Our expectation is that the old partition locations should have been cleaned up gradually since June 30, but we're not seeing that. I was hoping for some ideas on what the problem may be (if any) and how we can make sure that the old partitions are cleaned up.

Thanks,
Reid
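As a rough illustration of the kind of handoff check described above (a sketch only, not the actual monitoring; it assumes the default /srv/node layout, uses the device, policy index and node IP from the examples above, and assumes swift-get-nodes tags any handoff nodes it prints with "[Handoff]"): any partition directory on a device whose partition does not list this node as a primary is a handoff that the reconstructor should eventually revert and remove.

# Sketch: list partitions on d28 (policy 4) that x.x.x.20 does not serve
# as a primary, i.e. handoff partitions still waiting to be reverted.
# Slow (one swift-get-nodes call per partition); illustrative only.
for part in $(ls /srv/node/d28/objects-4); do
    case "$part" in *[!0-9]*) continue ;; esac   # skip non-partition entries
    if ! swift-get-nodes /etc/swift/object-4.ring.gz -p "$part" \
         | grep 'Server:Port' | grep -v 'Handoff' | grep -q 'x.x.x.20:'; then
        echo "handoff partition: $part"
    fi
done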
You could try enabling handoffs_first for a reconstruction cycle or two, as this will prioritise the partitions sitting on handoffs. But make sure you turn it off afterwards, as leaving it on will stop normal reconstruction from happening.

I am working on some code to build better old-primary handoff handling into the reconstructor, but that code hasn't landed yet, and I'm not sure when it will.

Regards,
Matt
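A minimal sketch of what that temporary override might look like (the device and the run command are reused from the examples earlier in this thread; later messages in the thread use the related handoffs_only option):

# Temporarily, in the [object-reconstructor] section of
# /etc/swift/object-server.conf (remove it again once the handoffs are
# gone -- leaving it on stops normal reconstruction):
#
#   [object-reconstructor]
#   handoffs_first = True     # later messages in this thread use handoffs_only
#
# then run a single pass against the device holding the stray partition:
swift-object-reconstructor /etc/swift/object-server.conf -d d28 -o -v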
> handoffs_first for a reconstruction cycle or two

Does this mean to set handoffs_first/handoffs_only = true and run the reconstructor twice with `-o`?

Thanks!
Reid
Yeah, running it "once" would be a single cycle. Or you can watch the logs; there should be something like: "Object reconstruction complete. (x minutes)"

Usually you could also watch swift-recon and wait until the oldest completion timestamp is later than the date you restarted the reconstructors, but it seems the reconstructor still isn't plumbed into swift-recon. I do have an old patch for that (https://review.opendev.org/c/openstack/swift/+/541141 - oops, 3 years old), so I'll dust it off and shepherd it through so it'll work upstream.

Matt
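For example, assuming the object services log to syslog as on a stock Ubuntu install (the exact log file and facility depend on the local logging setup):

# Show the most recent reconstruction-pass completion lines on this node:
grep "Object reconstruction complete" /var/log/syslog | tail -n 5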
Hi,

We have tried running the object-reconstructor a few times on a couple of servers, with a new configuration file where handoffs_only = True. We only ran it on a couple of servers to make sure it was working as desired and that no handoffs would be left behind. Each time it reports that there are still handoffs remaining, but running more passes doesn't make any progress:

...
object-reconstructor: x.x.y.42:6200/d12/22520 Unexpected response: ":ERROR: 500 'ERROR: With :UPDATES: 1 failures to 0 successes'"
object-reconstructor: x.x.y.46:6200/d14/15843 Unexpected response: ":ERROR: 500 'ERROR: With :UPDATES: 31 failures to 0 successes'"
...
object-reconstructor: 153/1861 (8.22%) partitions reconstructed in 62.53s (2.45/sec, 11m remaining)
object-reconstructor: Handoffs only mode still has handoffs remaining. Next pass will continue to revert handoffs.
object-reconstructor: Object reconstruction complete (once). (1.04 minutes)

We increased debug logging for a little while, and another message we are seeing a lot on the newer servers is:

"local2.warning hostname object-server: ssync subrequest failed with 400: POST /device/partition/account/container/object" (the real message contains actual object data).

Searching all devices in the cluster for the above object+fragment returns 1 fragment on 2 devices (the same fragment in both). The first copy is in the location where it should have been before the rebalance, and the second copy is in the location it should be in after the rebalance. It does look like we have dark data, as we have many fragments that are the only part available, like the one above. Would this affect the above? Also, how did we get into a situation where we have only 1 fragment? None of the servers has been down for longer than the time it takes for a reboot, and we haven't reintroduced a bad disk with old data on it.

More background on our environment: the software team is expiring objects, but we are not running the object-expirer. They are also deleting some objects, but it seems that some of the fragments don't get deleted. We doubled the size of the cluster, and the rebalance that brought these objects/fragments into view said we were moving 100% of partitions.

Reid
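A sketch of that kind of per-object search (the account, container and object names below are placeholders): swift-get-nodes with -a/--all prints the handoff locations as well as the primaries, and the ssh/ls lines it prints can then be run against each node to see which ones still hold a .data fragment, the same way the partition was checked earlier in this thread.

# List every primary and handoff location for one object in policy 4;
# AUTH_account/container/object is a placeholder, not a real name.
swift-get-nodes -a /etc/swift/object-4.ring.gz AUTH_account container object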
participants (2)
- Matthew Oliver
- Reid Guyett