Open Stack

Tue Apr 26 10:57:50 UTC 2016

Hi Mark,

Having pondered this a bit, I realized that capacity management for 
Swift deployed at enterprise scale (let's say 25+ storage nodes) might 
be a bit of a challenge for a sysadmin.

Would it be an idea to consider the following addition to the 
ringbuilder tools for future releases?

Implement a mechanism to automatically distribute (optionally, via 
command line argument(s)) the newly created swift ring build structures 
to all the participating storage nodes.

Options for specifying the storage nodes to copy to could be:

1) Via command line arg

2) From the IP part of the disks belonging to the ring build structure

3) A merge of the above two (in case storage servers share attached 
storage from an external subsystem (using a shared filesystem) and the 
IP part contains localhost)
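
A rough sketch of how the three options might combine into one target list. Everything here is hypothetical and not part of the existing ringbuilder tools: the function names are made up, and the sketch assumes each device is a dict with an "ip" key. Loopback entries are dropped, which covers the shared-storage case in option 3.

```python
import ipaddress


def is_loopback(host):
    """Return True for 'localhost' or a loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so treat it as a regular hostname.
        return False


def target_nodes(cli_hosts=(), devices=()):
    """Merge hosts given on the command line with the IPs of ring devices.

    Hypothetical helper: 'devices' is assumed to be a list of dicts that
    each carry an "ip" key. Duplicates and loopback entries are skipped,
    and first-seen order is preserved.
    """
    merged = []
    for host in list(cli_hosts) + [dev["ip"] for dev in devices]:
        if not is_loopback(host) and host not in merged:
            merged.append(host)
    return merged
```

Passing only cli_hosts gives option 1, passing only devices gives option 2, and passing both gives the merge of option 3.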

This way the inconsistency between hash tables on different swift 
storage nodes can be easily controlled, and the window during which 
they differ can be kept to a minimum.
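
To illustrate what "controlled" could mean in practice, here is a minimal sketch that compares the md5 of a local ring file with digests reported by other nodes. How the digests are collected from the nodes is left out, and both function names are assumptions made for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of the file at 'path'."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hosts_out_of_sync(reference_md5, reported):
    """List hosts whose reported ring digest differs from the reference.

    'reported' is assumed to map a host name to the md5 that host
    returned for the same ring file.
    """
    return sorted(host for host, md5 in reported.items()
                  if md5 != reference_md5)
```

An empty result would mean every node that reported back holds the same ring file as the reference copy.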

A next step for this feature could be to pause accepting client 
traffic until all hash tables have been copied.
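
The pause-then-copy idea could be sketched roughly as below. This is only an outline under stated assumptions: copy_ring, pause_traffic and resume_traffic are hypothetical callables supplied by the caller, and nothing like them exists in the current tools as far as I know.

```python
def distribute_with_pause(nodes, copy_ring, pause_traffic, resume_traffic):
    """Pause client traffic, copy the ring to every node, then resume.

    All three callables are placeholders for whatever mechanism a
    deployment would use. Returns a list of (node, error message)
    pairs for the copies that failed.
    """
    failed = []
    pause_traffic()
    try:
        for node in nodes:
            try:
                copy_ring(node)
            except Exception as exc:
                # Keep going so one unreachable node does not block the rest.
                failed.append((node, str(exc)))
    finally:
        # Always resume, even if something unexpected went wrong.
        resume_traffic()
    return failed
```

Traffic is resumed even when some copies fail, so the operator can decide what to do with the returned list rather than leaving the cluster paused.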

Peter

On 20/04/2016 05:41, Mark Kirkwood wrote:
> The proxies and the storage nodes all have a copy of the ring 
> structure(s): e.g:
>
> $ ls -l /etc/swift/*.ring.gz
> -rw-r--r-- 1 root  nagios 1316 Apr 20 00:31 account.ring.gz
> -rw-r--r-- 1 root  nagios 1299 Apr 20 00:31 container.ring.gz
> -rw-r--r-- 1 root  nagios 1287 Apr 20 00:31 object.ring.gz
>
> but yeah, suppose you make changes to the ring on (say) one of the 
> proxies, go make a coffee, then distribute the new rings to the 
> various machines. There is a period of time when the rings are 
> different on some machines from others.
>
> So it is possible that a request for an object initiated by the proxy 
> where you modified the rings may look for an object on the newly added 
> device (which does not have anything yet) - it will be served from a 
> handoff or replica instead (you might get a not found if you have num 
> replicas = 1...haven't tried that out tho).
>
> I *think* if you modify the ring on a proxy then it won't be able to 
> force the storage nodes to move an object somewhere where it can't be 
> found (they will look at their own ring version). However, subsequent 
> replication runs (where storage servers chatter to their next and prev 
> neighbours) once you have all the new rings distributed will 
> reorganise anything that did get moved incorrectly.
>
> John can hopefully give you fuller details (I haven't read up on or 
> tried out all the various scenarios you can clearly dream up). However 
> I did do some pretty horrific things (on purpose):
>
> - changing the number of partitions and installing this everywhere 
> (ahem - do not do this in a cluster you care about)
> - checking that it utterly breaks everything :-(
> - copying back the old rings (do back these up)!
> - checking that the cluster is working again :-)
>
> So in general, seems pretty robust!
>
> Also our friend swift-recon can alert you about any problems with non 
> matching rings:
>
> markir@proxy1:~$ swift-recon --md5
> ===============================================================================
> --> Starting reconnaissance on 4 hosts
> ===============================================================================
> [2016-04-20 16:35:10] Checking ring md5sums
> 4/4 hosts matched, 0 error[s] while checking hosts.
> ===============================================================================
> [2016-04-20 16:35:10] Checking swift.conf md5sum
> 4/4 hosts matched, 0 error[s] while checking hosts.
> ===============================================================================
>
>
> Cheers
>
> Mark
>
>
>
> On 20/04/16 02:37, Peter Brouwer wrote:
>>
>> Hello All
>>
>> Followup question.
>> Assume a swift cluster with a number of swift proxy nodes, each node
>> needs to hold a copy of the ring structure, right?
>> What happens when a disk is added to the ring? After the change is made
>> on the first proxy node, the ring config files need to be copied to the
>> other proxy nodes, right?
>> Is there a risk, during the period that the new ring builder files are
>> being copied, that a file can be stored using the new structure on one
>> proxy node and retrieved via another node that still holds the old
>> structure, returning object not found? Or the odd chance that an object
>> has already been moved by the re-balance process while being accessed
>> by a proxy that still has the old ring structure?
>>
>

-- 
Regards,

Peter Brouwer, Principal Software Engineer,
Oracle Application Integration Engineering.
Phone:  +44 1506 672767, Mobile +44 7720 598 226
E-Mail: Peter.Brouwer at Oracle.com

[Openstack] Ring rebuild, multiple copies of ringbuilder file, was Re: swift ringbuilder and disk size/capacity relationship