[Openstack] shuffle(nodes) in Swift

Samuel Merritt spam at andcheese.org
Thu Jul 5 21:48:20 UTC 2012


On 7/5/12 9:21 AM, Anatoly Legkodymov wrote:
> Good day,
>
> File proxy/server.py contains the following construction several times
> (simplified):
>      nodes = ring.get_nodes()
>      shuffle(nodes)
>      for node in nodes:
>          ...
>
> At first sight I found it useful for load balancing, but deeper
> investigation shows that it doesn't help. Moreover, iterating without
> shuffle (always reading the 1st replica first) would improve the
> performance of the cloud.
>
> The problem can be split into 2 scenarios: multiple clients reading the
> same object simultaneously, and multiple clients reading the same object
> occasionally (periodically).
>
> During a simultaneous read, the object is read from disk only once;
> later reads are served from the memory cache. With shuffle(), 3 disk
> read operations must be done. The object gets cached three times across
> the servers, consuming 3 times more cache memory. The same network
> bandwidth is used with shuffle() and without it.

That is not necessarily true. By default, only objects that are <= 5 MiB 
in size and readable without authentication (i.e. in a public container) 
are allowed to remain in the kernel's buffer-cache. Private and larger 
files are evicted as they are read. See 
swift.obj.server.DiskFile.__iter__ for the details.
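
For illustration only, here is a minimal sketch of that idea. It is not 
the actual Swift code (which uses its own drop-cache helper); it assumes 
Python 3's os.posix_fadvise, and the names read_object, KEEP_CACHE_SIZE 
and CHUNK_SIZE are made up:

    import os

    KEEP_CACHE_SIZE = 5 * 1024 * 1024   # objects <= 5 MiB may stay cached
    CHUNK_SIZE = 64 * 1024               # illustrative read size

    def read_object(path, size, is_public):
        # Yield the object's bytes; for large or private objects, tell
        # the kernel to drop the pages we just read from its buffer-cache.
        keep_cache = is_public and size <= KEEP_CACHE_SIZE
        with open(path, 'rb') as fp:
            bytes_read = 0
            while True:
                chunk = fp.read(CHUNK_SIZE)
                if not chunk:
                    break
                bytes_read += len(chunk)
                if not keep_cache:
                    os.posix_fadvise(fp.fileno(), 0, bytes_read,
                                     os.POSIX_FADV_DONTNEED)
                yield chunk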

With that in mind, consider a 250 MiB object. Since the object in 
question does not remain in buffer-cache after it is read from disk, the 
object server is limited by available disk IO. Thus, when there are 
multiple simultaneous GET requests for a single object, reading from 3 
disks* will be 3x faster than reading from one disk, so the shuffle() 
increases throughput.
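
A quick way to see what the shuffle buys you is to simulate which disk 
serves the first GET attempt. This is just an illustration with made-up 
node names, not Swift code:

    import random
    from collections import Counter

    # Hypothetical replica list for one object (replica count 3).
    replicas = ['node-a', 'node-b', 'node-c']

    first_choice = Counter()
    for _ in range(9000):            # 9000 simulated GET requests
        nodes = list(replicas)
        random.shuffle(nodes)        # same call the proxy makes
        first_choice[nodes[0]] += 1

    print(first_choice)              # ~3000 first-attempt reads per node

Without the shuffle, all 9000 first attempts would land on the same disk.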

For small (<= 5 MiB), public objects, it is true that 3 copies will live 
in the caches of 3 different machines. However, given the throughput 
increase in the many-simultaneous-readers case, it's a worthwhile tradeoff.

* or whatever the replica count is
