[Openstack] shuffle(nodes) in Swift
Samuel Merritt
spam at andcheese.org
Thu Jul 5 21:48:20 UTC 2012
On 7/5/12 9:21 AM, Anatoly Legkodymov wrote:
> Good day,
>
> File proxy/server.py contains the following construction several times
> (simplified):
> nodes = ring.get_nodes()
> shuffle(nodes)
> for node in nodes:
>     ...
>
> At first sight I found it useful for balancing, but deeper
> investigation shows that it doesn't happen. Moreover, iterating without
> shuffle (always reading the 1st replica first) will improve the
> performance of the cloud.
>
> The problem can be split into 2 scenarios: multiple clients reading the
> same object simultaneously, and multiple clients reading the same object
> occasionally (periodically).
>
> During simultaneous reads, the object will be read from disk only once;
> later reads will be served from the memory cache. With shuffle(), 3
> disk reads are needed, and the object will be cached three times, once
> on each server, consuming 3 times more cache memory. The same network
> bandwidth will be used with or without shuffle().

That is not necessarily true. By default, only objects of size <= 5 MiB
and readable without authentication (i.e. in a public container) are
allowed to remain in the kernel's buffer-cache. Private files and larger
files are evicted as they are read. See
swift.obj.server.DiskFile.__iter__ for the details.
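
As a rough illustration of that rule (a simplified sketch, not the actual
code in swift.obj.server; the function name, the constant, and the
`authenticated` flag are made up for this example):

```python
# Hypothetical sketch of the default caching rule described above.
# KEEP_CACHE_LIMIT and should_keep_in_buffer_cache are illustrative names,
# not Swift identifiers.
KEEP_CACHE_LIMIT = 5 * 1024 * 1024  # 5 MiB


def should_keep_in_buffer_cache(object_size, authenticated):
    """Return True if the object's pages may stay in the kernel buffer cache.

    object_size   -- size of the object in bytes
    authenticated -- True if the request needed credentials (private object)
    """
    return object_size <= KEEP_CACHE_LIMIT and not authenticated
```

Under this sketch a small public object stays cached, while a private
object or a 250 MiB object is dropped from the cache as it is read.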

With that in mind, consider a 250 MiB object. Since the object in
question does not remain in buffer-cache after it is read from disk, the
object server is limited by available disk IO. Thus, when there are
multiple simultaneous GET requests for a single object, reading from 3
disks* will be 3x faster than reading from one disk, so the shuffle()
increases throughput.
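
To make the load-spreading argument concrete, here is a toy simulation
(only a sketch: replicas are plain integers rather than ring nodes, and
first_replica_counts is a made-up helper, not part of Swift). It counts
which replica each request would try first, with and without the shuffle:

```python
import random
from collections import Counter


def first_replica_counts(num_requests, replica_count=3, shuffle=True, seed=0):
    """Count how many requests hit each replica first.

    Replicas are represented by their index; a real proxy would get node
    dicts from the ring instead.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter()
    for _ in range(num_requests):
        nodes = list(range(replica_count))
        if shuffle:
            rng.shuffle(nodes)
        counts[nodes[0]] += 1  # the proxy tries the first node in the list
    return counts


without_shuffle = first_replica_counts(3000, shuffle=False)
with_shuffle = first_replica_counts(3000, shuffle=True)
print("no shuffle:", dict(without_shuffle))
print("shuffle:   ", dict(with_shuffle))
```

Without the shuffle every request lands on the first replica's disk; with
it, each disk serves roughly a third of the requests, which is where the
3x figure comes from when disk IO is the bottleneck.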

For small (<= 5 MiB), public objects, it is true that 3 copies will live
in the caches of 3 different machines. However, given the throughput
increase in the many-simultaneous-readers case, it's a worthwhile tradeoff.

* or whatever the replica count is