[Openstack] Expanding Storage - Rebalance Extreeemely Slow (or Stalled?)

Samuel Merritt sam at swiftstack.com
Mon Oct 22 19:03:46 UTC 2012


On 10/22/12 9:38 AM, Emre Sokullu wrote:
> Hi folks,
>
> At GROU.PS, we've been an OpenStack SWIFT user for more than 1.5 years
> now. Currently, we hold about 18TB of data on 3 storage nodes. Since
> we hit 84% utilization, we recently decided to expand the
> storage with more disks.
>
> In order to do that, after creating a new c0d4p1 partition in each of
> the storage nodes, we ran the following commands on our proxy server:
>
> swift-ring-builder account.builder add z1-192.168.1.3:6002/c0d4p1 100
> swift-ring-builder container.builder add z1-192.168.1.3:6002/c0d4p1 100
> swift-ring-builder object.builder add z1-192.168.1.3:6002/c0d4p1 100
> swift-ring-builder account.builder add z2-192.168.1.4:6002/c0d4p1 100
> swift-ring-builder container.builder add z2-192.168.1.4:6002/c0d4p1 100
> swift-ring-builder object.builder add z2-192.168.1.4:6002/c0d4p1 100
> swift-ring-builder account.builder add z3-192.168.1.5:6002/c0d4p1 100
> swift-ring-builder container.builder add z3-192.168.1.5:6002/c0d4p1 100
> swift-ring-builder object.builder add z3-192.168.1.5:6002/c0d4p1 100
>
> [snip]
>
> So right now, the problem is: the disk growth in each of the storage
> nodes seems to have stalled,

So you've added 3 new devices to each ring and assigned a weight of 100 
to each one. What are the weights of the other devices in the ring? If 
they're much larger than 100, the new devices will end up holding only a 
small fraction of the data you want on them.

Running "swift-ring-builder <thing>.builder" will show you information, 
including weights, of all the devices in the ring.
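
For example, here's a rough sketch of checking and then raising the new 
devices' weights (the target weight of 1000 is purely illustrative; use 
whatever the existing devices actually report):

# show every device in the ring, its weight, and its partition count
swift-ring-builder object.builder

# bring a new device up to the same weight as the old ones, then rebalance
swift-ring-builder object.builder set_weight z1-192.168.1.3:6002/c0d4p1 1000
swift-ring-builder object.builder rebalance

Repeat for the account and container builders (and for the new devices 
in zones 2 and 3), then push the rebuilt .ring.gz files out to the nodes.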


> * Bonus question: why do we copy ring.gz files to storage nodes, and
> how critical are they? To me it's not clear how Swift can afford to
> wait (even if it's just a few seconds) for the .ring.gz files to reach
> the storage nodes after rebalancing, if those files are so critical.

The ring.gz files contain the mapping from Swift partitions to disks. As 
you know, the proxy server uses that mapping to determine which backends 
have the data for a given request. The replicators also use the ring to 
determine where data belongs so that they can ensure the right number of 
replicas, etc.
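
If you want to see that mapping for yourself, swift-get-nodes will read 
a ring file and print where a given path lands (this assumes your rings 
live in /etc/swift; /cof/fee/cup is just the example path used below):

swift-get-nodes /etc/swift/object.ring.gz cof fee cup

That prints the partition number plus the IP, port, and device of every 
node that should hold a replica, which is the same lookup the proxy and 
the replicators perform.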

When two storage nodes have different versions of a ring.gz file, you 
can get replicator fights. They look like this:

- node1's (old) ring says that the partition for a replica of 
/cof/fee/cup belongs on node2's /dev/sdf.
- node2's (new) ring says that the same partition belongs on node1's 
/dev/sdd.

When the replicator on node1 runs, it will see that it has the partition 
for /cof/fee/cup on its disk. It will then consult the ring, push that 
partition's contents to node2, and then delete its local copy (since 
node1's ring says that this data does not belong on node1).

When the replicator on node2 runs, it will do the converse: push to 
node1, then delete its local copy.

If you leave the rings out of sync for a long time, then you'll end up 
consuming disk and network IO ping-ponging a set of data around. If 
they're out of sync for a few seconds, then it's not a big deal.
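
The usual pattern is to rebalance on whichever box holds the builder 
files (your proxy, in this case) and then copy the fresh .ring.gz files 
to every node in one quick pass. Something like the following, assuming 
scp as root and /etc/swift on every node, which may not match your setup:

for host in 192.168.1.3 192.168.1.4 192.168.1.5; do
    scp /etc/swift/*.ring.gz root@$host:/etc/swift/
done

The Swift daemons notice the changed ring files and reload them on their 
own, so no restarts are needed; the goal is just to keep the window in 
which nodes disagree as short as possible.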



