[Openstack] Ring rebuild, multiple copies of ringbuilder file, was Re: swift ringbuilder and disk size/capacity relationship
Peter Brouwer
peter.brouwer at oracle.com
Tue Apr 19 14:37:11 UTC 2016
Hello All
Follow-up question.
Assume a Swift cluster with a number of proxy nodes; each node
needs to hold a copy of the ring structure, right?
What happens when a disk is added to the ring? After the change is made
on the first proxy node, the ring files need to be copied to the
other proxy nodes, right?
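For reference, I am assuming the usual workflow here, roughly as below
(the builder file name, IP, port, device and weight values are just
made-up examples):

proxy1 $ cd /etc/swift
proxy1 $ swift-ring-builder object.builder add r1z1-192.168.122.66:6000/vdb 100
proxy1 $ swift-ring-builder object.builder rebalance
proxy1 $ rsync object.ring.gz proxy2:/etc/swift/   # and likewise to every other node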
Is there a risk, during the period in which the new ring files are being
copied, that an object stored via one proxy node using the new structure
is then retrieved via another node that still holds the old structure,
with the request returning "object not found"? Or the odd chance that an
object has already been moved by the rebalance process while it is being
accessed by a proxy that still has the old ring structure?
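I assume something like swift-recon's ring checksum comparison (if the
recon middleware is enabled) could at least detect a node that is still
running a stale ring during that window, e.g.

proxy1 $ swift-recon --md5

but that only detects the mismatch; it does not close the window itself.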
Regards
Peter
On 16/03/2016 00:23, Mark Kirkwood wrote:
> On 16/03/16 00:51, Peter Brouwer wrote:
>
>> Ah, good info. Follow-up question: assume the worst case (just to emphasise
>> the situation), one copy (replication = 1), a disk approaching its max
>> capacity.
>> How can you monitor this situation, i.e. avoid the disk-full scenario,
>> and if the disk is full, what type of error is returned?
>>
>
> Let's do an example: 4 storage nodes (obj1...obj4), each with 1 disk
> (vdb) added to the ring. Replication set to 1.
>
> First, write a 1G object to see where it is going to go (it lands on
> host obj1, disk vdb, partition 1003):
>
> obj1 $ ls -l
> /srv/node/vdb/objects/1003/d31/fae796287c852f0833316a3dadfb3d31/
> total 1048580
> -rw------- 1 swift swift 1073741824 Mar 16 10:15 1458079557.01198.data
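>
> (As a side note, if you want to predict where an object should land you
> can point swift-get-nodes at the ring; the account/container/object
> names here are the ones used in the PUT below, output omitted:
>
> proxy1 $ swift-get-nodes /etc/swift/object.ring.gz \
>     AUTH_9a428d5a6f134f829b2a5e4420f512e7 con0 obj0
>
> It prints the partition plus the primary and handoff devices for that
> object.)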
>
>
> Then remove it
>
> obj1 $ ls -l
> /srv/node/vdb/objects/1003/d31/fae796287c852f0833316a3dadfb3d31/
> total 4
> -rw------- 1 swift swift 0 Mar 16 10:47 1458078463.80396.ts
>
>
> ...and use up space on obj1/vdb (dd a 29G file into /srv/node/vdb
> somewhere)
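> For example (the exact path and filename under /srv/node/vdb are
> arbitrary; 29696 MiB is roughly 29G):
>
> obj1 $ dd if=/dev/zero of=/srv/node/vdb/filler bs=1M count=29696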
>
> obj1 $ df -m|grep vdb
> /dev/vdb 30705 29729 977 97% /srv/node/vdb
>
>
> Add the object again (it ends up on obj4 instead... a handoff node):
>
> obj4 $ ls -l
> /srv/node/vdb/objects/1003/d31/fae796287c852f0833316a3dadfb3d31/
> total 1048580
> -rw------- 1 swift swift 1073741824 Mar 16 11:06 1458079557.01198.data
>
>
> So Swift is coping with the obj1/vdb disk being too full. Remove the
> object again and exhaust space on all disks (dd again):
>
> @obj[1-4] $ df -h|grep vdb
> /dev/vdb 30G 30G 977M 97% /srv/node/vdb
>
>
> Now attempt to write the 1G object again:
>
> swiftclient.exceptions.ClientException:
> Object PUT failed:
> http://192.168.122.61:8080/v1/AUTH_9a428d5a6f134f829b2a5e4420f512e7/con0/obj0
> 503 Service Unavailable
>
>
> So we get an HTTP 503 to show that the PUT has failed.
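>
> If you are driving the API directly, the same condition shows up as the
> status code; a rough curl sketch, assuming $TOKEN holds a valid auth
> token and ./obj0 is the 1G file:
>
> proxy1 $ curl -s -o /dev/null -w '%{http_code}\n' \
>     -H "X-Auth-Token: $TOKEN" -T obj0 \
>     http://192.168.122.61:8080/v1/AUTH_9a428d5a6f134f829b2a5e4420f512e7/con0/obj0
>
> which should print 503 in the disk-full case above.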
>
>
> Now, regarding monitoring: out of the box, swift-recon covers this:
>
> proxy1 $ swift-recon -dv
> ===============================================================================
>
> --> Starting reconnaissance on 4 hosts
> ===============================================================================
>
> [2016-03-16 13:16:54] Checking disk usage now
> -> http://192.168.122.63:6000/recon/diskusage: [{u'device': u'vdc',
> u'avail': 32162807808, u'mounted': True, u'used': 33718272, u'size':
> 32196526080}, {u'device': u'vdb', u'avail': 1024225280, u'mounted':
> True, u'used': 31172300800, u'size': 32196526080}]
> -> http://192.168.122.64:6000/recon/diskusage: [{u'device': u'vdc',
> u'avail': 32162807808, u'mounted': True, u'used': 33718272, u'size':
> 32196526080}, {u'device': u'vdb', u'avail': 1024274432, u'mounted':
> True, u'used': 31172251648, u'size': 32196526080}]
> -> http://192.168.122.62:6000/recon/diskusage: [{u'device': u'vdc',
> u'avail': 32162807808, u'mounted': True, u'used': 33718272, u'size':
> 32196526080}, {u'device': u'vdb', u'avail': 1024237568, u'mounted':
> True, u'used': 31172288512, u'size': 32196526080}]
> -> http://192.168.122.65:6000/recon/diskusage: [{u'device': u'vdc',
> u'avail': 32162807808, u'mounted': True, u'used': 33718272, u'size':
> 32196526080}, {u'device': u'vdb', u'avail': 1024221184, u'mounted':
> True, u'used': 31172304896, u'size': 32196526080}]
> Distribution Graph:
> 0%  4 *********************************************************************
> 96% 4 *********************************************************************
> Disk usage: space used: 124824018944 of 257572208640
> Disk usage: space free: 132748189696 of 257572208640
> Disk usage: lowest: 0.1%, highest: 96.82%, avg: 48.4617574245%
> ===============================================================================
>
>
>
> So integrating swift-recon into regular monitoring/alerting
> (collectd/nagios or whatever) is one approach (mind you, most folk
> already monitor disk usage data... and there is nothing overly special
> about ensuring you don't run out of space)!
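>
> As a rough sketch of what such a check could look like (the 90%
> threshold is arbitrary, and it just scrapes the "highest" figure from
> the swift-recon summary shown above):
>
> #!/bin/sh
> # Alert when the fullest Swift object disk reported by swift-recon
> # exceeds a threshold (nagios-style exit codes).
> HIGHEST=$(swift-recon -d | sed -n 's/.*highest: \([0-9.]*\)%.*/\1/p')
> if [ "$(echo "$HIGHEST > 90" | bc)" -eq 1 ]; then
>     echo "CRITICAL: fullest Swift disk at ${HIGHEST}%"
>     exit 2
> fi
> echo "OK: fullest Swift disk at ${HIGHEST}%"
> exit 0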
>
>
>> BTW, thanks for the patience for sticking with me in this.
>
> No worries - a good question (once I finally understood it).
>
> regards
>
> Mark
--
Regards,
Peter Brouwer, Principal Software Engineer,
Oracle Application Integration Engineering.
Phone: +44 1506 672767, Mobile +44 7720 598 226
E-Mail: Peter.Brouwer at Oracle.com