[Openstack] Failure and Membership detection in Swift

John Dickinson me at not.mn
Fri Jan 30 21:41:59 UTC 2015


To give you some more specific answers to your questions...


Nodes are "discovered" by being added to the ring. Simply put, this means an operator or an external (to Swift) management system is responsible for adding the device to the ring. Swift does not do this automatically, although a provisioning system could do it on the operator's behalf.

The ring has two parts: the ring file and the builder file. The ring file is what's distributed to nodes in the cluster, and it is created from the builder file. Rings are updated out-of-band from the Swift processes. That is, there isn't an API command to modify rings. Instead, a new ring file is generated from the builder file and then deployed to the nodes.
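
To make the builder/ring split a bit more concrete, here is a toy sketch in Python. It is not Swift's actual ring code: the names build_ring, get_part and get_nodes, the part power of 4, and the round-robin placement are all assumptions made up for illustration. The only point is that the "builder" step runs offline and produces a static table, and each node then just does lookups against its copy of that table.

```python
import hashlib
import struct

PART_POWER = 4   # hypothetical value: 2**4 = 16 partitions
REPLICAS = 3     # hypothetical replica count


def build_ring(devices, part_power=PART_POWER, replicas=REPLICAS):
    """Toy "builder" step: produce a static partition -> devices table.

    Placement here is plain round-robin, chosen only to keep the sketch
    short. It is an assumption, not how a real builder balances devices.
    """
    part_count = 2 ** part_power
    return [
        [devices[(part + r) % len(devices)] for r in range(replicas)]
        for part in range(part_count)
    ]


def get_part(path, part_power=PART_POWER):
    """Map an object path to a partition using the top bits of its MD5."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Take the first 4 bytes as a big-endian integer, keep the top bits.
    return struct.unpack_from(">I", digest)[0] >> (32 - part_power)


def get_nodes(ring, path, part_power=PART_POWER):
    """Toy "ring file" lookup: which devices hold this object's replicas."""
    return ring[get_part(path, part_power)]


# A new device only becomes visible once the table is rebuilt and the new
# table is copied out to the nodes; nothing discovers it at runtime.
old_ring = build_ring(["d0", "d1", "d2", "d3"])
new_ring = build_ring(["d0", "d1", "d2", "d3", "d4"])
```

In this sketch, get_nodes(old_ring, "/account/container/object") can never return "d4", no matter how long the device has been plugged in; only nodes holding new_ring will route to it.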

Some of the failure detection is done by timeouts. There are timeouts on connecting to another node and timeouts on data transmitted between nodes. Other failure detection mechanisms include scanning system messages (swift-drive-audit), detecting full or failed drives with mount checks and fallocate calls, and calculating checksums to detect bit rot.
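
As a rough illustration of two of those mechanisms (a connect timeout and a checksum comparison), here is a minimal Python sketch. It is not taken from the Swift code base: the function names, the 0.5 second default, and the use of a bare TCP connect are assumptions for the example.

```python
import hashlib
import socket


def node_is_reachable(host, port, conn_timeout=0.5):
    """Return True if a TCP connection opens within conn_timeout seconds.

    Both a refused connection and a timed-out one count as a failure,
    since socket.timeout is a subclass of OSError.
    """
    try:
        with socket.create_connection((host, port), timeout=conn_timeout):
            return True
    except OSError:
        return False


def object_is_intact(data, expected_md5):
    """Return True if the data still hashes to the checksum stored for it.

    A mismatch is how silent corruption (bit rot) would show up.
    """
    return hashlib.md5(data).hexdigest() == expected_md5
```

A caller that gets False from node_is_reachable would simply try another node holding the same data, and a False from object_is_intact would mark that copy as bad so that a good copy can replace it.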


--John

> On Jan 30, 2015, at 1:21 PM, John Dickinson <me at not.mn> wrote:
> 
> There's a ton of info available at https://swiftstack.com/openstack-swift/. Specifically, take a look at https://swiftstack.com/openstack-swift/architecture/ for how Swift solves some of these issues.
> 
> You might also find the info at http://swift.openstack.org useful, especially http://docs.openstack.org/developer/swift/admin_guide.html.
> 
> For more info on failure handling, see https://www.youtube.com/watch?v=mKZ7kDDPSIU where I went through several failure conditions and show how Swift recovers from them. There's also a much older blog post at https://swiftstack.com/blog/2012/09/13/how-openstack-swift-handles-hardware-failures/.
> 
> While it might not be the academic paper you're looking for, this should give a good groundwork of info. If you have other questions, feel free to ask.
> 
> --John
> 
> 
>> On Jan 30, 2015, at 12:56 PM, Behrooz Shafiee <shafiee01 at gmail.com> wrote:
>> 
>> Hello All,
>> 
>> Unfortunately, I could not find any academic paper that describes Swift in academic terms; most of the documents are user manuals and general overviews. I need to know some details about Swift that I could not find in the public documents.
>> 1) How does membership detection work in Swift (discovering newly added nodes)?
>> Digging through this link, I realized that, unlike many other systems (Dynamo, Cassandra, ...), Swift does not use a gossip-based protocol. It seems that this is handled by the concept of rings; however, what is a ring builder? Is it a central service or a distributed protocol? What if it fails? How are ring info and its changes shared among different nodes?
>> 
>> 2) How does failure detection work in Swift?
>> Going through the OpenStack Swift book by Joe Arnold, I realized that a timeout-based mechanism by health-monitor is in charge of finding failures. Is that correct?
>> 
>> 
>> Thanks in advance,
>> --
>> Behrooz
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to     : openstack at lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
