[Openstack] [Swift] On-demand replication

John Dickinson me at not.mn
Mon Dec 5 21:46:28 UTC 2016



On 5 Dec 2016, at 12:56, John Dickinson wrote:

> On 5 Dec 2016, at 12:39, Mark Kirkwood wrote:
>
>> Thanks, John - increasing the partition coverage is a great idea (I hadn't considered doing that).
>>
>>
>> Now, with respect to the lack-of-durability implication - I should have said we are using a 2-region topology with (region) affinity set. So although the initial object placement will generally be durable, in the case where someone was (perhaps overly) concerned about an object living in both regions, we were looking at how to 'encourage' that to happen (ahead of the steady progress the replicator is making).
>>
>
> In a multi-region setup, I really like read-affinity, and I'm not a huge fan of write-affinity. Read-affinity gives you a lot of what you're looking for: you get responses from the "closest" servers that have the data, lowering time-to-first-byte on reads. Write-affinity intentionally forces Swift to place data, durably, in handoff locations, thus creating replication work that happens later. Obviously, this means that if your inbound traffic to one region is higher than the cross-region background replication, your cluster will never "catch up" on that deferred work. In other words, you're intentionally expanding and exposing an eventual consistency window to clients that they are able to detect on reads. Some of this can be hidden by also having read affinity that matches the write affinity setting, but it's not possible to hide all of the behavior changes that the end-user will see.
>
> Write-affinity can be great if you've got a bursty workload (i.e. periods of lower activity when the cluster catches up) or if the use case doesn't have much in the way of reading the data immediately after writing it. So I'm not totally against it. I just don't think in most cases the benefits are worth the cost, and oftentimes it can lead to confusion.
>
> However, all that being said, you're hinting at something that would be a pretty cool feature: policy-specific affinity. That's a great idea! Now it's just a matter of prioritization...


Actually, it has been written. Just needs to be reviewed/landed.

https://review.openstack.org/#/c/382766/
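
For anyone following along, the existing (non-policy-specific) affinity knobs live in the proxy server config. A rough sketch, with placeholder region numbers that you'd adjust to match your own rings:

[app:proxy-server]
use = egg:swift#proxy
sorting_method = affinity
# prefer servers in region 1 when reading
read_affinity = r1=100
# put initial writes in region 1; background replication moves copies to region 2 later
write_affinity = r1
write_affinity_node_count = 2 * replicas

The patch above is about letting these be set per storage policy rather than once for the whole proxy.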


>
>
> --John
>
>
>
>
>
>>
>> regards
>>
>>
>> Mark
>>
>>
>> On 06/12/16 05:41, John Dickinson wrote:
>>> I'd suggest monitoring overall replication status with a combination of log monitoring and swift-dispersion-report. If you find something that is under-replicated, you can run the replicator process and give it a list of partitions to prioritize. See http://docs.openstack.org/developer/swift/admin_guide.html#dispersion-report; I'd recommend running the dispersion report with 100% coverage.
>>>
>>> However, your question implies that when an object is created it is not durable in the system. That's not the case. Swift will not return with a success unless the data has been durably persisted in the cluster (a quorum of writes). Quorum is determined per storage policy; in a replicated policy, quorum is half + 1 for odd numbers of replicas and half for even numbers of replicas. This means that when a client gets a 201 response to an object PUT request, that object has been stored at least 2 times in a 3-replica storage policy (and it's extraordinarily likely it was stored a full 3 times).
>>>
>>> Let me know if you want more details about this.
>>>
>>> --John
>>>
>>>
>>>
>>>
>>> On 4 Dec 2016, at 14:56, Mark Kirkwood wrote:
>>>
>>>> Suppose you have a newly created object and want to ensure it has been replicated (i.e. this object is 'special' in some way). If the replication process is taking a while, is it sensible to 'encourage' Swift replication via direct HTTP calls using the 'replicate' method (or is this dangerous)?
>>>>
>>>>
>>>> E.g.:
>>>>
>>>> markir@obj4:/srv/node/vdb/objects/5$ ls 55e
>>>> ls: cannot access '55e': No such file or directory
>>>>
>>>> markir@obj1:~$ curl -v -X REPLICATE "http://obj4:6000/vdb/5/55e"
>>>>
>>>> markir@obj4:/srv/node/vdb/objects/5$ sleep 30; ls 55e
>>>> 5c38bfdd63f01a8e56260105fc68555e
>>>>
>>>>
>>>> regards
>>>>
>>>>
>>>> Mark
>>>>
>>>>
>>>>
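
To expand a bit on the dispersion-report / prioritized-replication suggestion above, the commands would look roughly like this (config paths, device names, and partition numbers here are only placeholders):

# /etc/swift/dispersion.conf -- set dispersion_coverage = 100 for full coverage,
# then (re)populate and report on objects
swift-dispersion-populate
swift-dispersion-report --object-only

# if the report shows under-replicated partitions, run the replicator once
# against just those partitions on the affected node
swift-object-replicator /etc/swift/object-server.conf once --partitions=5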
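
And to put numbers on the quorum rule above (half + 1 for odd replica counts, half for even):

  2 replicas -> quorum 1
  3 replicas -> quorum 2
  4 replicas -> quorum 2
  5 replicas -> quorum 3

So a 201 on a 3-replica policy means at least two copies were on disk before the client got its response.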