[Openstack] Swift Consistency Guarantees?

Nikolaus Rath Nikolaus at rath.org
Fri Jan 20 20:18:39 UTC 2012


Hmm, but if there are e.g. 4 replicas, two of which are up-to-date but
offline, and two of which are online but out-of-date, would Swift serve
the old version?
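
To make the scenario concrete, here is a toy Python model of the rule
quoted further down the thread ("return the newest available version").
The names Replica and newest_available are made up for illustration, and
this is not Swift's actual code. It only shows why I would expect the old
version to come back if that rule is all there is:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for one stored copy of an object."""
    timestamp: float  # time of the PUT that wrote this copy
    online: bool      # whether the node holding it can be reached


def newest_available(replicas: Sequence[Replica]) -> Optional[Replica]:
    """Pick the newest copy among the replicas that are reachable.

    Copies on offline nodes are invisible, so a newer version stored
    only on offline nodes cannot be detected by this rule.
    """
    reachable = [r for r in replicas if r.online]
    if not reachable:
        return None  # nothing reachable: the GET would have to fail
    return max(reachable, key=lambda r: r.timestamp)


# The case from this mail: two new copies offline, two old copies online.
replicas = [
    Replica(timestamp=2.0, online=False),
    Replica(timestamp=2.0, online=False),
    Replica(timestamp=1.0, online=True),
    Replica(timestamp=1.0, online=True),
]
served = newest_available(replicas)
```

In this model `served` is one of the old copies (timestamp 1.0), and
nothing tells the client that a newer one exists.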

-Niko
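
For anyone following along, a GET that carries the X-Newest header
discussed further down could be built roughly like this. It is a sketch
using only Python's standard library; the URL, the token and the helper
name build_newest_get are placeholders I made up, and the request is
only constructed, not sent:

```python
import urllib.request


def build_newest_get(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET that asks the proxy for the newest copy."""
    return urllib.request.Request(
        url,
        headers={
            "X-Auth-Token": token,  # placeholder credential
            "X-Newest": "true",     # ask the proxy to check the replicas
        },
        method="GET",
    )


# Hypothetical endpoint and token, for illustration only.
req = build_newest_get(
    "http://swift.example.com/v1/AUTH_test/container/object",
    "hypothetical-token",
)
```

Passing `req` to `urllib.request.urlopen` would perform the request
against a real cluster.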


On 01/20/2012 03:06 PM, Chmouel Boudjnah wrote:
> As Stephen mentioned, if there is only one replica left, Swift would not
> serve it.
> 
> Chmouel.
> 
> On Fri, Jan 20, 2012 at 1:58 PM, Nikolaus Rath <Nikolaus at rath.org> wrote:
> 
>     Hi,
> 
>     Sorry for being so persistent, but I'm still not sure what happens if
>     the 2 servers that carry the new replica are down, but the 1 server that
>     has the old replica is up. Will GET fail or return the old replica?
> 
>     Best,
>     Niko
> 
>     On 01/20/2012 02:52 PM, Stephen Broeker wrote:
>     > By default there are 3 replicas.
>     > A PUT Object will return after 2 replicas are done.
>     > So if all nodes are up then there are at least 2 replicas.
>     > If all replica nodes are down, then the GET Object will fail.
>     >
>     > On Fri, Jan 20, 2012 at 11:21 AM, Nikolaus Rath <Nikolaus at rath.org> wrote:
>     >
>     >     Hi,
>     >
>     >     So if an object update has not yet been replicated on all nodes, and all
>     >     nodes that have been updated are offline, what will happen? Will swift
>     >     recognize this and give me an error, or will it silently return the
>     >     older version?
>     >
>     >     Thanks,
>     >     Nikolaus
>     >
>     >
>     >     On 01/20/2012 02:14 PM, Stephen Broeker wrote:
>     >     > If a node is down, then it is ignored.
>     >     > That is the whole point about 3 replicas.
>     >     >
>     >     > On Fri, Jan 20, 2012 at 10:43 AM, Nikolaus Rath <Nikolaus at rath.org> wrote:
>     >     >
>     >     >     Hi,
>     >     >
>     >     >     What happens if one of the nodes is down? Especially if that node holds
>     >     >     the newest copy?
>     >     >
>     >     >     Thanks,
>     >     >     Nikolaus
>     >     >
>     >     >     On 01/20/2012 12:33 PM, Stephen Broeker wrote:
>     >     >     > The X-Newest header can be used by a GET Operation to ensure that all of the
>     >     >     > Storage Nodes (3 by default) are queried for the latest copy of the Object.
>     >     >     > The COPY Object operation already has this functionality.
>     >     >     >
>     >     >     > On Fri, Jan 20, 2012 at 9:12 AM, Nikolaus Rath <Nikolaus at rath.org> wrote:
>     >     >     >
>     >     >     >     Hi,
>     >     >     >
>     >     >     >     No one able to further clarify this?
>     >     >     >
>     >     >     >     Does Swift offer read-after-create consistency like
>     >     >     >     non-US-Standard S3? What are the precise syntax and semantics of
>     >     >     >     the X-Newest header?
>     >     >     >
>     >     >     >     Best,
>     >     >     >     Nikolaus
>     >     >     >
>     >     >     >
>     >     >     >     On 01/18/2012 10:15 AM, Nikolaus Rath wrote:
>     >     >     >     > Michael Barton <mike-launchpad at weirdlooking.com> writes:
>     >     >     >     >> On Tue, Jan 17, 2012 at 4:55 PM, Nikolaus Rath <Nikolaus at rath.org> wrote:
>     >     >     >     >>> Amazon S3 and Google Storage make very explicit (non-)consistency
>     >     >     >     >>> guarantees for stored objects. I'm looking for similar documentation
>     >     >     >     >>> about OpenStack's Swift, but haven't had much success.
>     >     >     >     >>
>     >     >     >     >> I don't think there's any documentation on this, but it would probably
>     >     >     >     >> be good to write up.  Consistency in Swift is very similar to S3.
>     >     >     >     >> That is, there aren't many non-eventual consistency guarantees.
>     >     >     >     >>
>     >     >     >     >> Listing updates can happen asynchronously (especially under load), and
>     >     >     >     >> older versions of files can show up in requests (deletes are just a
>     >     >     >     >> new "deleted" version of the file).
>     >     >     >     >
>     >     >     >     > Ah, ok. Thanks a lot for stating this so explicitly. There seems to be a
>     >     >     >     > lot of confusion about this, now I can at least point people to
>     >     >     >     > something.
>     >     >     >     >
>     >     >     >     >> Swift can generally be relied on for read-after-write consistency,
>     >     >     >     >> like S3's regions other than the US Standard region.  The reason
>     >     >     >     >> S3 in US Standard doesn't have this guarantee is because it's more
>     >     >     >     >> geographically widespread - something Swift isn't good at yet.  I can
>     >     >     >     >> imagine we'll have the same limitation when we get there.
>     >     >     >     >
>     >     >     >     > Do you mean read-after-create consistency? Because below you say about
>     >     >     >     > read-after-write:
>     >     >     >     >
>     >     >     >     >>> - If I receive a (non-error) response to a PUT request, am I guaranteed
>     >     >     >     >>> that the object will be immediately included in all object listings in
>     >     >     >     >>> every possible situation?
>     >     >     >     >>
>     >     >     >     >> Nope.
>     >     >     >     >
>     >     >     >     > ..so is there such a guarantee for PUTs of *new* objects (like S3 non
>     >     >     >     > us-classic), or does "can generally be relied on" just mean that the
>     >     >     >     > chances for new puts are better?
>     >     >     >     >
>     >     >     >     >> Also like S3, Swift can't make any strong guarantees about
>     >     >     >     >> read-after-update or read-after-delete consistency.  We do have an
>     >     >     >     >> "X-Newest" header that can be added to GETs and HEADs to make the
>     >     >     >     >> proxy do a quorum of backend servers and return the newest available
>     >     >     >     >> version, which greatly improves these, at the cost of latency.
>     >     >     >     >
>     >     >     >     > That sounds very interesting. Could you give some more details on what
>     >     >     >     > exactly is guaranteed when using this header? What happens if the server
>     >     >     >     > having the newest copy is down?
>     >     >     >     >
>     >     >     >     >>> - If the swift server loses an object, will the object name still be
>     >     >     >     >>> returned in object listings? Will attempts to retrieve it result in 404
>     >     >     >     >>> errors (as if it never existed) or a different error?
>     >     >     >     >>
>     >     >     >     >> It will show up in listings, but give a 404 when you attempt to
>     >     >     >     >> retrieve it.  I'm not sure how we can improve that with Swift's
>     >     >     >     >> general model, but feel free to make suggestions.
>     >     >     >     >
>     >     >     >     > From an application programmer's point of view, it would be very helpful
>     >     >     >     > if lost objects could be distinguished from non-existing objects by a
>     >     >     >     > different HTTP error. Trying to access a non-existing object may
>     >     >     >     > indicate a bug in the application, so it would be nice to know when it
>     >     >     >     > happens.
>     >     >     >     >
>     >     >     >     > Also, it would be very helpful if there was a way to list all lost
>     >     >     >     > objects without having to issue HEAD requests for every stored object.
>     >     >     >     > Could this information be added to the XML and JSON output of container
>     >     >     >     > listings? Then an application would have the chance to periodically
>     >     >     >     > check for lost data, rather than having to handle all lost objects at
>     >     >     >     > the instant they're required.
>     >     >     >     >
>     >     >     >     >
>     >     >     >     > I am working on a swift backend for S3QL
>     >     >     >     > (http://code.google.com/p/s3ql/), a program that exposes online cloud
>     >     >     >     > storage as a local UNIX file system. To prevent data corruption, there
>     >     >     >     > are two requirements that I'm currently struggling to provide with the
>     >     >     >     > swift backend:
>     >     >     >     >
>     >     >     >     > - There needs to be a way to reliably check if one object (holding the
>     >     >     >     >   file system metadata) is the newest version.
>     >     >     >     >
>     >     >     >     >   The S3 backend does this by requiring storage in the non us-classic
>     >     >     >     >   regions and using list-after-create consistency with a marker object
>     >     >     >     >   that has a "generation number" of the metadata embedded in its
>     >     >     >     >   name.
>     >     >     >     >
>     >     >     >     >   I'm not yet sure if this would work with swift as well (the google
>     >     >     >     >   storage backend just relies on the strong read-after-write
>     >     >     >     >   consistency).
>     >     >     >     >
>     >     >     >     > - The file system checker needs a way to identify lost objects.
>     >     >     >     >
>     >     >     >     >   Here the S3 backend just relies on the durability guarantee that
>     >     >     >     >   effectively no object will ever be lost.
>     >     >     >     >
>     >     >     >     >   Again, I'm not sure how to implement this for swift.
>     >     >     >     >
>     >     >     >     >
>     >     >     >     > Any suggestions?
>     >     >     >     >
>     >     >     >     >
>     >     >     >     >
>     >     >     >     > Best,
>     >     >     >     >
>     >     >     >     >    -Nikolaus
>     >     >     >     >
>     >     >     >
>     >     >     >
>     >     >     >       -Nikolaus
>     >     >     >
>     >     >     >     --
>     >     >     >      »Time flies like an arrow, fruit flies like a Banana.«
>     >     >     >
>     >     >     >      PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C
>     >     >     >
>     >     >     >     _______________________________________________
>     >     >     >     Mailing list: https://launchpad.net/~openstack
>     >     >     >     Post to     : openstack at lists.launchpad.net
>     >     >     >     Unsubscribe : https://launchpad.net/~openstack
>     >     >     >     More help   : https://help.launchpad.net/ListHelp
>     >     >     >
>     >     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >
>     >
> 
> 
> 
> 
> 


   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C



