[Openstack] [OpenStack][Swift][Replicator] Does the object replicator push an "existing" object to a handoff node when a node/disk/network fails ?

Kuo Hugo tonytkdk at gmail.com
Thu Sep 6 16:37:53 UTC 2012


Thanks for your quick reply John ~

Seems that I missed something in my earlier umount-disk test.

I'll try it again later.
Appreciate it ~
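
For the record, here is roughly the umount test I plan to re-run. This is just a sketch based on the layout in my mail below, and it assumes mount_check = true in object-server.conf so the unmounted drive gets reported as bad (507) to the other replicators:

    # On primary node 192.168.1.101 : take the device offline
    sudo umount /srv/node/DISK1

    # On another primary (e.g. 192.168.1.102) : run a single replication pass
    sudo swift-init object-replicator once

    # On the first handoff node 192.168.1.104 : check whether Obj1 arrived
    ls -lah /srv/node/DISK1/objects/3430/e1e/6b342ac122448ef16bf1655d652bfe1e/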

2012/9/7 John Dickinson <me at not.mn>

> you can force a replicator to push to a handoff node by unmounting the
> drive one of the primary replicas is on.
>
> --John
>
>
> On Sep 6, 2012, at 9:00 AM, Kuo Hugo <tonytkdk at gmail.com> wrote:
>
> > Hi folks , John and Chmouel ,
> >
> > I did post a question about this a long time ago, and my test result
> > matches Chmouel's answer.
> >
> > https://answers.launchpad.net/swift/+question/191924
> > "The object replicator will push an object to a handoff node if another
> > primary node returns that the drive the object is supposed to go on is bad.
> > We don't push to handoff nodes on general errors, otherwise things like
> > network partitions or rebooting machines would cause storms of unneeded
> > handoff traffic."
> >
> > But I read something different from John (or maybe it's just my
> > misunderstanding), so I want to clarify it.
> >
> > Assumption :
> > Storage Nodes :  5 (each for one zone)
> > Zones :   5
> > Replica :  3
> > Disks :   2*5   ( 1 disk per node )
> >
> > Account           AUTH_test
> > Container        Con_1
> > Object              Obj1
> >
> >
> > Partition       3430
> > Hash            6b342ac122448ef16bf1655d652bfe1e
> >
> > Server:Port Device      192.168.1.101:36000 DISK1
> > Server:Port Device      192.168.1.102:36000 DISK1
> > Server:Port Device      192.168.1.103:36000 DISK1
> > Server:Port Device      192.168.1.104:36000 DISK1        [Handoff]
> > Server:Port Device      192.168.1.105:36000 DISK1        [Handoff]
> >
> >
> > curl -I -XHEAD "http://192.168.1.101:36000/DISK1/3430/AUTH_test/Con_1/Obj1"
> > curl -I -XHEAD "http://192.168.1.102:36000/DISK1/3430/AUTH_test/Con_1/Obj1"
> > curl -I -XHEAD "http://192.168.1.103:36000/DISK1/3430/AUTH_test/Con_1/Obj1"
> > curl -I -XHEAD "http://192.168.1.104:36000/DISK1/3430/AUTH_test/Con_1/Obj1"   # [Handoff]
> > curl -I -XHEAD "http://192.168.1.105:36000/DISK1/3430/AUTH_test/Con_1/Obj1"   # [Handoff]
> >
> >
> > ssh 192.168.1.101 "ls -lah /srv/node/DISK1/objects/3430/e1e/6b342ac122448ef16bf1655d652bfe1e/"
> > ssh 192.168.1.102 "ls -lah /srv/node/DISK1/objects/3430/e1e/6b342ac122448ef16bf1655d652bfe1e/"
> > ssh 192.168.1.103 "ls -lah /srv/node/DISK1/objects/3430/e1e/6b342ac122448ef16bf1655d652bfe1e/"
> > ssh 192.168.1.104 "ls -lah /srv/node/DISK1/objects/3430/e1e/6b342ac122448ef16bf1655d652bfe1e/"   # [Handoff]
> > ssh 192.168.1.105 "ls -lah /srv/node/DISK1/objects/3430/e1e/6b342ac122448ef16bf1655d652bfe1e/"   # [Handoff]
> >
> > Case :
> > Obj1 has already been uploaded to the 3 primary devices properly. What kind
> > of failure on "192.168.1.101:36000 DISK1" will trigger the replicator to
> > push a copy to the "192.168.1.104:36000 DISK1 [Handoff]" device ?
> >
> > In my past tests, the replicator did not push a copy to a handoff node for
> > an "existing" object. Whether it was a network failure / machine reboot /
> > unmounted disk, I think these are the general errors Chmouel mentioned
> > before. But I'm not so sure about the meaning of "the replicator will push
> > an object to a handoff node if another primary node returns that the drive
> > the object is supposed to go on is bad". How does the object-replicator
> > know that the drive the object is supposed to go on is bad ? (I think the
> > replicator will never know it by itself. Should it work together with the
> > object-auditor ?)
> >
> > How do I produce a failure that triggers the replicator to push an object to a handoff node ?
> >
> > In my understanding, for the replicator to push an object to a handoff
> > node, the condition is that a primary device does not have the object and
> > the replicator also cannot push it into that device
> > (192.168.1.101:36000 DISK1). The object might have been moved to
> > quarantine because the object-auditor found it was broken.
> >
> > So even though the disk (192.168.1.101:36000 DISK1) is still mounted, the
> > target partition 3430 does not have Obj1. Another node's object-replicator
> > tries to push its Obj1 to "192.168.1.101:36000 DISK1", but unluckily
> > "192.168.1.101:36000 DISK1" is bad, so the object-replicator will push the
> > object to "192.168.1.104:36000 DISK1 [Handoff]" now.
> >
> > That's my inference, please feel free to correct it. I'm really confused
> > about how to produce the kind of failure that makes the replicator push an
> > object to a handoff node.
> > Any ideas would be great.
> >
> >
> > Cheers
> > --
> > +Hugo Kuo+
> > tonytkdk at gmail.com
> > +886 935004793
> >
>
>
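
For anyone who finds this thread later : the way I now read John's and Chmouel's answers, the decision looks roughly like the pseudocode below. This is NOT the actual swift.obj.replicator code, just a sketch of my understanding, and every name in it is made up :

    # Rough sketch of how I understand the handoff decision.
    # NOT real Swift code; the names here are invented for illustration.

    class DriveNotMounted(Exception):
        """The remote object-server answered 507: its drive is unmounted/bad."""

    class GeneralError(Exception):
        """Network partition, rebooting machine, timeout, ..."""

    def push_partition(partition, primary_nodes, handoff_nodes, sync_to):
        """Push one partition to its primary nodes, falling back to handoffs."""
        handoffs = iter(handoff_nodes)
        for node in primary_nodes:
            try:
                sync_to(node, partition)
            except DriveNotMounted:
                # Only this case falls through to a handoff device.
                sync_to(next(handoffs), partition)
            except GeneralError:
                # General errors are simply skipped -- no handoff push,
                # otherwise network partitions or reboots would cause
                # storms of unneeded handoff traffic.
                continue

So I believe an unmounted/bad drive on 192.168.1.101 should be enough, as John says, as long as the object-server there actually reports the drive as bad (507) instead of just being unreachable.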


-- 
+Hugo Kuo+
tonytkdk at gmail.com
+886 935004793