[Openstack] Create initial rings swift

John Dickinson me at not.mn
Thu Dec 11 18:51:10 UTC 2014


Great questions, but there aren't short answers. Let me do my best to keep mine concise.


> On Dec 10, 2014, at 11:00 PM, dhanesh1212121212 <dhanesh1212 at gmail.com> wrote:
> 
> Hi All,
> 
> The below details are mentioned in the OpenStack documentation for configuring Object Storage.
> 
> For simplicity, this guide uses one region and
> zone with 2^10 (1024) maximum partitions, 3 replicas of each object, and 1 hour minimum
> time between moving a partition more than once. For Object Storage, a partition indicates
> a directory on a storage device rather than a conventional partition table.

The partition isn't really a directory. Well, it happens to have an on-disk layout where partitions are represented as directories, but conceptually that doesn't matter. A partition is one part of the overall keyspace of the hash function used by the ring. Specifically, a partition is a certain number of prefix bits of the output of md5. The number of prefix bits used is called the part_power--in your example, it's 10.

When a request comes to Swift, the URI is hashed, resulting in a long number. The prefix of that number is the partition. So, for example, if you hashed a URI and it started with the bits 011011000101010010111011... and you have a part power of 4, then the partition is 0110, or 6 in decimal. Swift keeps a lookup table, called "the ring", that maps partitions to storage volumes (read: drives). So in this case, Swift will find the right drives for partition 6 in the lookup table and then send the data to the right servers that have those drives.
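As a rough sketch of that lookup (illustrative only; Swift's real ring also mixes per-cluster hash prefix/suffix values into the hash and stores the table in a builder file), computing a partition from a path looks like this:

```python
import hashlib

def get_partition(path, part_power):
    """Return the top `part_power` bits of the md5 hash of the path."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Interpret the first 4 bytes as a big-endian integer, then keep
    # only the top `part_power` bits.
    top32 = int.from_bytes(digest[:4], "big")
    return top32 >> (32 - part_power)

# With a part power of 4 there are 2**4 = 16 partitions, numbered 0-15.
part = get_partition("/AUTH_test/container/object", part_power=4)
```

The ring itself is then just a table mapping that partition number to the set of drives holding its replicas.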

Keeping that lookup table right (so that replicas are distributed evenly and across failure domains) is what happens when you rebalance a ring.


(This is a lot of detail, but it's important to answer the questions below.)


> 
> 
> Please Clarify my doubts.
> 
> 1. Does it mean we can create only 1024 folders (directories) if we select 2^10? Is it possible to expand the partitions later?

The part_power cannot be changed after the ring is created! [1] The reason is that if the part power changes, all the data currently in the cluster would have to be rehashed and moved. At best, that would cause a massive internal network load. At worst, rehashing all the objects would simply take an enormous amount of time and you'd essentially be left with a long period of downtime.

[1] There have been a few conversations over the years about how to allow changing the part power, but it's not been solved yet.
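To see why, here's a hedged sketch (using an illustrative md5-prefix partition function, not Swift's actual code): raising the part power from 10 to 11 splits every partition k into partitions 2k and 2k+1, so every object's placement has to be recomputed and its data potentially relocated.

```python
import hashlib

def partition(path, part_power):
    # Top `part_power` bits of the md5 of the path (illustrative only).
    top32 = int.from_bytes(hashlib.md5(path.encode()).digest()[:4], "big")
    return top32 >> (32 - part_power)

paths = ["/AUTH_test/photos/img%04d.jpg" % i for i in range(1000)]
for path in paths:
    old = partition(path, 10)
    new = partition(path, 11)
    # Every object lands in one of the two children of its old partition,
    # so the whole cluster's placement table must be rebuilt.
    assert new in (2 * old, 2 * old + 1)
```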

> 
> 
> 
> 2. We started the Swift configuration with three replicas. Is it possible to add more replicas later?

Yes, absolutely! With the `swift-ring-builder` command, you can change the number of replicas. Below is the help/usage message for that command:

swift-ring-builder <builder_file> set_replicas <replicas>
    Changes the replica count to the given <replicas>. <replicas> may
    be a floating-point value, in which case some partitions will have
    floor(<replicas>) replicas and some will have ceiling(<replicas>)
    in the correct proportions.

    A rebalance is needed to make the change take effect.
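As a worked example of the floor/ceiling proportioning described above (plain arithmetic, not the builder's internal code), a replica count of 3.25 on 1024 partitions means a quarter of them carry a fourth replica:

```python
import math

def replica_split(replicas, num_parts):
    """How many partitions get ceil(replicas) copies vs floor(replicas)."""
    base = math.floor(replicas)
    # Partitions that carry one extra replica, in the right proportion.
    extra = round((replicas - base) * num_parts)
    return {base + 1: extra, base: num_parts - extra}

split = replica_split(3.25, 1024)
# {4: 256, 3: 768}: 256 partitions hold 4 replicas, 768 hold 3,
# for an average of exactly 3.25 replicas per partition.
```

After `set_replicas`, remember that a rebalance (and any min_part_hours wait) is still needed before the new count takes effect.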



> 
> 
> 3. what is the meaning of "moving a partition more than once in one hour".

Swift is optimized for very high availability. If you were to add a lot of capacity to your cluster all at once, you may end up with a lot of data needing to move around. However, because Swift is a distributed system, you might have different versions of the ring deployed in the cluster at the same time. Also, if all replicas were assigned to a new location but the data hadn't been moved yet, any requests for that data would fail.

To solve this, Swift will only move one replica of a partition in a given rebalance operation. This means that your existing data is still available (even if that third replica's data hasn't moved yet). The amount of time you must wait between rebalances is called the min_part_hours. It should be set to something longer than how long it takes your cluster to run a complete replication pass (i.e., replication has completed on all nodes).

You can adjust the min_part_hours at any time. It's a safety valve for you and your end-users, so use it instead of circumventing it.
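A minimal model of that safety valve (hypothetical class and names; the real builder tracks last-moved times per partition-replica inside the builder file):

```python
import time

class PartitionMoveGate:
    """Refuse to move a partition again until min_part_hours have elapsed."""

    def __init__(self, min_part_hours=1):
        self.min_part_seconds = min_part_hours * 3600
        self.last_moved = {}  # partition number -> timestamp of its last move

    def can_move(self, partition, now=None):
        now = time.time() if now is None else now
        last = self.last_moved.get(partition)
        return last is None or now - last >= self.min_part_seconds

    def record_move(self, partition, now=None):
        self.last_moved[partition] = time.time() if now is None else now

gate = PartitionMoveGate(min_part_hours=1)
gate.record_move(6, now=0)
assert not gate.can_move(6, now=1800)  # 30 minutes later: still locked
assert gate.can_move(6, now=3600)      # a full hour later: allowed again
```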




I hope the answers help. There's a ton of detail about how the rings work in Swift. Here are some links for further reading/watching:

http://docs.openstack.org/developer/swift/overview_ring.html
https://swiftstack.com/blog/2012/11/21/how-the-ring-works-in-openstack-swift/
https://swiftstack.com/blog/2012/09/13/how-openstack-swift-handles-hardware-failures/
http://mirror.int.linux.conf.au/linux.conf.au/2013/mp4/Playing_with_OpenStack_Swift.mp4



--John






> 
> 
> 
> Regards,
> Dhanesh M.


