<div dir="ltr">Hi folks,<div><br></div><div>This is Emre from <a href="http://GROU.PS">GROU.PS</a> -- we have been operating an OpenStack Swift cluster since 2011, and it's been great.</div><div><br></div><div>We have a standard installation with a single proxy server (proxy1) and three storage servers (storage1, storage2, storage3), each with 5x1TB disks.</div>

<div><br></div><div>Following a chain of mistakes initiated by our hosting provider, which replaced the wrong disk on one of our OpenStack Swift storage servers, we ended up with the following situation:</div><div>


<p class="">root@proxy1:/etc/swift# swift-ring-builder container.builder <br>container.builder, build version 41<br>1048576 partitions, 3 replicas, 3 zones, 14 devices, 60.00 balance<br>The minimum number of hours before a partition can be reassigned is 1<br>

Devices:    id  zone      ip address  port      name weight partitions balance meta<br>             0     1     192.168.1.3  6001    c0d1p1  80.00     262144   37.50 <br>             1     1     192.168.1.3  6001    c0d2p1  80.00     262144   37.50 <br>

             2     1     192.168.1.3  6001    c0d3p1  80.00     262144   37.50 <br>             3     2     192.168.1.4  6001    c0d1p1 100.00     238312   -0.00 <br>             4     2     192.168.1.4  6001    c0d2p1 100.00     238312   -0.00 <br>

             5     2     192.168.1.4  6001    c0d3p1 100.00     238312   -0.00 <br>             6     3     192.168.1.5  6001    c0d1p1 100.00     209715  -12.00 <br>             7     3     192.168.1.5  6001    c0d2p1 100.00     209715  -12.00 <br>

             8     3     192.168.1.5  6001    c0d3p1 100.00     209715  -12.00 <br>            10     2     192.168.1.4  6001    c0d5p1 100.00     238312   -0.00 <br>            11     3     192.168.1.5  6001    c0d5p1 100.00     209716  -12.00 <br>

            14     3     192.168.1.5  6001    c0d6p1 100.00     209715  -12.00 <br>            15     1     192.168.1.3  6001    c0d5p1  80.00     262144   37.50 <br>            16     2     192.168.1.4  6001    c0d6p1 100.00      95328  -60.00</p>
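<p class="">For reference, the balance column above can be reproduced from the weights and partition counts. This is a minimal sketch, assuming that balance is the percentage by which a device's assigned partitions deviate from its weight-proportional share of all partition replicas. The function name balance_by_device is made up for illustration; the numbers are copied from the listing.</p>

```python
# Values copied from the `swift-ring-builder container.builder` output above.
PARTITIONS = 1048576
REPLICAS = 3

# (device id, weight, assigned partitions)
DEVICES = [
    (0, 80.0, 262144), (1, 80.0, 262144), (2, 80.0, 262144),
    (3, 100.0, 238312), (4, 100.0, 238312), (5, 100.0, 238312),
    (6, 100.0, 209715), (7, 100.0, 209715), (8, 100.0, 209715),
    (10, 100.0, 238312), (11, 100.0, 209716), (14, 100.0, 209715),
    (15, 80.0, 262144), (16, 100.0, 95328),
]


def balance_by_device(devices, partitions, replicas):
    """Return {device id: balance in percent}.

    Assumption: balance = 100 * assigned / desired - 100, where `desired`
    is the device's weight-proportional share of partitions * replicas.
    """
    total_weight = sum(weight for _, weight, _ in devices)
    parts_per_weight = partitions * replicas / total_weight
    result = {}
    for dev_id, weight, assigned in devices:
        desired = weight * parts_per_weight
        result[dev_id] = 100.0 * assigned / desired - 100.0
    return result


if __name__ == "__main__":
    balances = balance_by_device(DEVICES, PARTITIONS, REPLICAS)
    for dev_id, bal in balances.items():
        print(f"device {dev_id:2d}: balance {bal:7.2f}")
    # The worst absolute deviation corresponds to the overall figure in the header.
    print(f"overall: {max(abs(b) for b in balances.values()):.2f}")
```

<p class="">Under that assumption the weight-80 devices on storage1 come out at about +37.5, i.e. they are assigned noticeably more partitions than their weight alone would suggest.</p>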

<p class=""><br></p><p class="">root@proxy1:/etc/swift# ssh storage1 df -h<br>Filesystem            Size  Used Avail Use% Mounted on<br>/dev/cciss/c0d0p5     1.8T   38G  1.7T   3% /<br>none                  3.9G  220K  3.9G   1% /dev<br>

none                  4.0G     0  4.0G   0% /dev/shm<br>none                  4.0G   60K  4.0G   1% /var/run<br>none                  4.0G     0  4.0G   0% /var/lock<br>none                  4.0G     0  4.0G   0% /lib/init/rw<br>

/dev/cciss/c0d1p1     1.9T  1.9T  239M 100% /srv/node/c0d1p1<br>/dev/cciss/c0d2p1     1.9T  1.9T  210M 100% /srv/node/c0d2p1<br>/dev/cciss/c0d3p1     1.9T  1.9T  104K 100% /srv/node/c0d3p1<br>/dev/cciss/c0d5p1     1.9T  1.2T  643G  66% /srv/node/c0d5p1<br>

/dev/cciss/c0d0p2      92M   51M   37M  59% /boot<br>/dev/cciss/c0d0p3     1.9G   35M  1.8G   2% /tmp</p><p class=""><br></p><p class="">root@proxy1:/etc/swift# ssh storage2 df -h<br>Filesystem            Size  Used Avail Use% Mounted on<br>

/dev/cciss/c0d0p5     1.8T   33G  1.7T   2% /<br>none                  3.9G  220K  3.9G   1% /dev<br>none                  4.0G     0  4.0G   0% /dev/shm<br>none                  4.0G  108K  4.0G   1% /var/run<br>none                  4.0G     0  4.0G   0% /var/lock<br>

none                  4.0G     0  4.0G   0% /lib/init/rw<br>/dev/cciss/c0d0p3     1.9G   35M  1.8G   2% /tmp<br>/dev/cciss/c0d0p2      92M   51M   37M  59% /boot<br>/dev/cciss/c0d1p1     1.9T  1.5T  375G  80% /srv/node/c0d1p1<br>

/dev/cciss/c0d2p1     1.9T  1.5T  385G  80% /srv/node/c0d2p1<br>/dev/cciss/c0d3p1     1.9T  1.5T  382G  80% /srv/node/c0d3p1<br>/dev/cciss/c0d4p1     1.9T  1.5T  377G  80% /srv/node/c0d5p1<br>/dev/cciss/c0d5p1     1.9T  519G  1.4T  28% /srv/node/c0d6p1</p>

<p class=""><br></p><p class="">root@proxy1:/etc/swift# ssh storage3 df -h<br>Filesystem            Size  Used Avail Use% Mounted on<br>/dev/cciss/c0d0p5     1.8T   90G  1.7T   6% /<br>none                  3.9G  224K  3.9G   1% /dev<br>

none                  4.0G     0  4.0G   0% /dev/shm<br>none                  4.0G  112K  4.0G   1% /var/run<br>none                  4.0G     0  4.0G   0% /var/lock<br>none                  4.0G     0  4.0G   0% /lib/init/rw<br>

/dev/cciss/c0d1p1     1.9T  1.1T  741G  61% /srv/node/c0d1p1<br>/dev/cciss/c0d2p1     1.9T  1.1T  741G  61% /srv/node/c0d2p1<br>/dev/cciss/c0d3p1     1.9T  1.1T  758G  60% /srv/node/c0d3p1<br>/dev/cciss/c0d5p1     1.9T  1.1T  765G  59% /srv/node/c0d5p1<br>

/dev/cciss/c0d6p1     1.9T  1.1T  772G  59% /srv/node/c0d6p1<br>/dev/cciss/c0d0p2      92M   51M   37M  59% /boot<br>/dev/cciss/c0d0p3     1.9G   35M  1.8G   2% /tmp</p><div><br></div><p class="">As you can see:</p><p class="">

* Balances are messed up and they don't return to a normal state no matter how long we wait, although the behavior for the end user is still stable.</p><p class="">* We tried erasing the contents of a disk on storage1 (/dev/cciss/c0d5p1) that was 100% full before all the others (which were still at 95%). This disk filled up again pretty quickly, while the others quickly caught up to 100%. We were expecting all disks to balance to the same level because storage2 and storage3 (with 5 disks each) are set to 100 in weight, whereas storage1 (with only 4 disks) is set to 80 in weight.</p>

<p class="">* There was a failing disk on storage2, so we replaced it (/dev/cciss/c0d5p1), but the new disk is not filling up as quickly. Storage2 is almost 80% full.</p><p class="">* Storage3 is healthy.</p><p class="">* Storage1 is currently offline because it has been failing constantly and its disk space doesn't balance.</p>
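<p class="">To put numbers on the weight expectation described above, here is a small sketch that compares each zone's share of the total weight with the partitions it actually holds in the ring listing. The helper names are hypothetical and the data is copied from the swift-ring-builder output. That each zone sums to exactly 1048576 partitions is only an observation from these numbers; whether this Swift version forces one replica per zone is an assumption I have not verified.</p>

```python
from collections import defaultdict

PARTITIONS = 1048576

# (zone, device name, weight, assigned partitions), copied from the ring listing.
ROWS = [
    (1, "c0d1p1", 80.0, 262144), (1, "c0d2p1", 80.0, 262144),
    (1, "c0d3p1", 80.0, 262144), (1, "c0d5p1", 80.0, 262144),
    (2, "c0d1p1", 100.0, 238312), (2, "c0d2p1", 100.0, 238312),
    (2, "c0d3p1", 100.0, 238312), (2, "c0d5p1", 100.0, 238312),
    (2, "c0d6p1", 100.0, 95328),
    (3, "c0d1p1", 100.0, 209715), (3, "c0d2p1", 100.0, 209715),
    (3, "c0d3p1", 100.0, 209715), (3, "c0d5p1", 100.0, 209716),
    (3, "c0d6p1", 100.0, 209715),
]


def partitions_per_zone(rows):
    """Sum the assigned partitions of all devices in each zone."""
    totals = defaultdict(int)
    for zone, _name, _weight, assigned in rows:
        totals[zone] += assigned
    return dict(totals)


def weight_share_per_zone(rows):
    """Fraction of the total ring weight that each zone contributes."""
    total_weight = sum(weight for _, _, weight, _ in rows)
    shares = defaultdict(float)
    for zone, _name, weight, _assigned in rows:
        shares[zone] += weight / total_weight
    return dict(shares)


if __name__ == "__main__":
    held = partitions_per_zone(ROWS)
    share = weight_share_per_zone(ROWS)
    for zone in sorted(held):
        print(
            f"zone {zone}: weight share {share[zone]:.1%}, "
            f"holds {held[zone]} partitions "
            f"({held[zone] / PARTITIONS:.2f} replicas)"
        )
```

<p class="">With these figures, zone 1 contributes roughly a quarter of the total weight but holds as many partitions as each of the other zones, spread over four disks instead of five.</p>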

<p class=""><br></p><p class="">What is the best course of action in this scenario? I believe we can either:</p><p class="">1) Completely dump storage1. Remove zone-1 from the proxy. Get a new server with a similar setup and add it as a new zone on the proxy accordingly. </p>

<p class="">2) Stop storage1. Erase the contents of full disks on storage1. Switch to proxy, remove the full disks from the cluster, then add them as new devices. (again, a delay may </p><p class="">3) Or something completely different?</p>

<p class=""><br></p><p class="">My fear is that, with both the first and second alternatives, if there's a delay between removing the zones or disks and adding new ones, the other zones/disks would fill up. Therefore I would need to choose the alternative with the least delay.</p>

<p class="">Last but not least, please note that this Swift installation is outdated; it has never been updated since installation. (I am to blame!)</p><p class="">Thanks in advance for your suggestions and directions.</p><p class="">

Cheers,</p></div><div><br clear="all"><div><br></div>-- <br><div dir="ltr"><span style="border-collapse:collapse;color:rgb(136,136,136)"><font face="courier new, monospace"><font color="#000000">Emre</font></font></span></div>


</div></div>