[Openstack] [Swift] : Internal Working of PUT Command

John Dickinson me at not.mn
Thu Jan 5 17:55:13 UTC 2017

On 4 Jan 2017, at 20:44, Sameer Kulkarni wrote:

> Hi All,
> I was eager to know the data-flow of an object on Openstack Swift PUT
> command. I know a high-level overview from various posts, which I am not
> sure of.
>    - Client Initiates PUT command specifying the object path on local
>    storage to that on swift cloud.
>    - The Object is transferred to proxy-server over HTTP request.
>    - If (REPLICAS = 3), then three primary nodes are found out using Ring
>    Algorithm.
>    - Then the object is transferred to these three primary nodes in parallel
>    from proxy server.
>    - Then after majority successful ACK (two here), the client is sent back
>    ACK.
> I will be happy if someone can confirm the above sequence of steps is
> correct.

Yep, that's a pretty good summary.
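
For the "three primary nodes are found using the ring" step, here is a rough sketch of the idea, not Swift's actual ring code. It assumes the object path is hashed with MD5, the top bits of the hash pick a partition, and a precomputed table maps each (replica, partition) pair to a device. The device names, the partition power, and the table contents are made up for illustration, and details of the real ring (such as the hash path salt, zones, and weights) are left out.

```python
import hashlib
import struct

PART_POWER = 4   # hypothetical: 2**4 = 16 partitions
REPLICAS = 3
DEVICES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical

# Table built ahead of time: replica2part2dev[replica][partition] -> device index.
# The round-robin fill below is only a stand-in for a real ring builder.
replica2part2dev = [
    [(part + replica) % len(DEVICES) for part in range(2 ** PART_POWER)]
    for replica in range(REPLICAS)
]


def get_partition(account, container, obj):
    """Hash the object path and keep the top PART_POWER bits."""
    path = "/%s/%s/%s" % (account, container, obj)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack(">I", digest[:4])[0] >> (32 - PART_POWER)


def get_primary_nodes(account, container, obj):
    """Return one device per replica for the object's partition."""
    part = get_partition(account, container, obj)
    return [DEVICES[row[part]] for row in replica2part2dev]


if __name__ == "__main__":
    print(get_primary_nodes("AUTH_test", "photos", "cat.jpg"))
```

The point of the table is that the lookup is deterministic and cheap: every proxy with the same ring computes the same three nodes for a given path, without asking anyone.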

> My follow-up questions are
>    - What happens when there is NO ACK from the 3rd node?

The client will still get a success (assuming the other two completed just fine).

As you said, the data is sent to all replicas at the same time. After an object server has fsync()'d the data to disk, it can return success to the proxy. The proxy collects these responses and determines what the proper response to the client is.

Let's suppose that a three-replica system has two normal storage nodes and one that is very slow. The proxy sends the object to each server concurrently, in chunks (64k bytes by default), as the data is read from the client. Each of these sends has a timeout, so a failing object server may error out. In that case, if there are still enough active nodes to make a quorum, the proxy keeps sending data to them and stops sending to the failing node. This handles the case of "what happens when hardware fails during a write request".

However, what if the third node is just slow, but never quite times out? Once the proxy has a successful response from a quorum of replicas and the third one hasn't responded yet, we know the client will get a successful response no matter what, since we already have a quorum of successes. The proxy starts a new (much shorter) timer to give the third replica a chance to finish, and when the timer fires or the third replica responds (whichever comes first), the client gets the response.
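
To make the quorum-then-short-timer behavior concrete, here is a small simulation. It is a sketch under assumptions, not the proxy's real code: the function and parameter names (`collect_put_responses`, `node_timeout`, `post_quorum_timeout`) are illustrative, quorum is taken to be a simple majority, each backend is modeled as a callable that returns True once its write is durable, and the 201/503 status codes are what I'd expect for success and for a failed quorum.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def quorum_size(replicas):
    # Assumption: simple majority, i.e. two out of three replicas.
    return replicas // 2 + 1


def collect_put_responses(backend_calls, node_timeout=1.0,
                          post_quorum_timeout=0.1):
    """Run every backend write concurrently and pick the client status.

    backend_calls: zero-argument callables; True means the write succeeded.
    Returns (client_status, number_of_successes_seen).
    """
    quorum = quorum_size(len(backend_calls))
    pool = ThreadPoolExecutor(max_workers=len(backend_calls))
    pending = {pool.submit(call) for call in backend_calls}
    successes = 0
    deadline = time.monotonic() + node_timeout
    try:
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # timer already expired
            done, pending = wait(pending, timeout=remaining,
                                 return_when=FIRST_COMPLETED)
            if not done:
                break  # timer fired while waiting
            for future in done:
                if future.exception() is None and future.result():
                    successes += 1
            if successes >= quorum:
                # Outcome is already decided; give stragglers only a
                # short grace period before answering the client.
                deadline = min(deadline,
                               time.monotonic() + post_quorum_timeout)
    finally:
        pool.shutdown(wait=False)  # do not block on the slow backend
    return (201 if successes >= quorum else 503), successes


if __name__ == "__main__":
    def slow():
        time.sleep(0.5)
        return True

    print(collect_put_responses([lambda: True, lambda: True, slow]))
```

With two fast backends and one slow one, the client gets its success after roughly the short grace period rather than after the slow node finishes, which is the trade-off described above.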

>    - How is rsync used, when there is node failure?

And the above gets us to the "so what about that third node?" question.

The object replication process runs in the background. When it detects that a replica is missing from where it should be, it triggers rsync to copy the data over. (Note: this explanation is greatly simplified.)

So, when the third node fails during a write, the replication process will ensure that the correct data ends up in the correct place, but this is not done as part of the client-facing data path.
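
As a toy illustration of that background repair, the sketch below treats each storage node as a local directory and uses a plain file copy where the real replicator would use rsync. Everything here is hypothetical (the `replicate_once` name, the directory layout, comparing by file existence only); it just shows the "find what is missing, copy it from a node that has it" loop running outside the client request.

```python
import os
import shutil


def replicate_once(node_dirs, object_names):
    """One background pass: copy each object to any node that lacks it.

    node_dirs: directories standing in for the primary nodes.
    Returns the list of (node_dir, object_name) pairs that were repaired.
    """
    repaired = []
    for name in object_names:
        holders = [d for d in node_dirs
                   if os.path.exists(os.path.join(d, name))]
        if not holders:
            continue  # nothing to copy from; a real system would log this
        source = os.path.join(holders[0], name)
        for node in node_dirs:
            target = os.path.join(node, name)
            if not os.path.exists(target):
                # Stand-in for the rsync transfer between storage nodes.
                shutil.copy2(source, target)
                repaired.append((node, name))
    return repaired
```

Running a second pass after a successful one should find nothing left to repair.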

> Thank you
> Sameer Kulkarni
