[Openstack] shuffle(nodes) in Swift

Anatoly Legkodymov anatoly at nexenta.com
Thu Jul 5 16:21:00 UTC 2012


Good day,

File proxy/server.py contains the following construction several times 
(simplified):
     nodes = ring.get_nodes()
     shuffle(nodes)
     for node in nodes:
         ...

At first sight it looks useful for load balancing, but deeper 
investigation shows that it does not actually balance anything. 
Moreover, iterating without shuffle (always reading the 1st replica 
first) would improve the performance of the cloud.

The problem can be split into 2 scenarios: multiple clients read the 
same object simultaneously, or multiple clients read the same object 
occasionally (periodically).

During simultaneous reads without shuffle(), the object is read from 
disk only once; later reads are served from the memory cache. With 
shuffle(), up to 3 disk reads are needed, and the object ends up cached 
on all 3 servers, consuming 3 times more cache memory. The same network 
bandwidth is used with and without shuffle().
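
A minimal sketch of this scenario, not taken from proxy/server.py. It 
assumes 3 replicas, hypothetical node names, and a cache that never 
evicts; simulate_reads() is an illustrative helper. It counts how many 
disk reads and cached copies result from repeated reads of one object:

```python
import random

# Hypothetical replica servers holding one object (assumption: 3 replicas).
REPLICAS = ["node-a", "node-b", "node-c"]


def simulate_reads(num_reads, use_shuffle, seed=0):
    """Return (disk_reads, cached_copies) after num_reads of one object."""
    rng = random.Random(seed)
    cached_on = set()   # servers that already hold the object in memory
    disk_reads = 0
    for _ in range(num_reads):
        nodes = list(REPLICAS)
        if use_shuffle:
            rng.shuffle(nodes)
        first = nodes[0]            # the replica the proxy tries first
        if first not in cached_on:  # cache miss: the server reads from disk
            disk_reads += 1
            cached_on.add(first)
    return disk_reads, len(cached_on)


if __name__ == "__main__":
    print("fixed order:", simulate_reads(100, use_shuffle=False))
    print("shuffled:   ", simulate_reads(100, use_shuffle=True))
```

With a fixed order only the first replica ever caches the object. With 
shuffle, every replica is eventually picked first and caches its own 
copy.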

During occasional reads the situation is nearly the same. If the object 
is cached, the read is faster, and without shuffle() it is more likely 
to be cached, since every read goes to the same server. If the object 
is not cached, shuffle() does not change read behavior.

When it comes to reading a huge number of objects, we have double 
randomization. The first happens with consistent hashing, when the 
partition is chosen. The second happens with shuffle(). There is no 
need for the second one.
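
A rough illustration of the first randomization, using a toy ring and 
not Swift's ring implementation. The node names, partition count, and 
the helpers partition_for(), replicas_for(), and first_replica_load() 
are all assumptions made for this sketch. It always reads the 1st 
replica and counts the load per node:

```python
import hashlib
from collections import Counter

# Toy ring parameters (assumptions, not Swift defaults).
NODES = [f"node-{i}" for i in range(6)]
PARTITIONS = 1024
REPLICAS = 3


def partition_for(name):
    """Map an object name to a partition by hashing the name."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PARTITIONS


def replicas_for(partition):
    """Return the replica nodes of a partition in a fixed order."""
    start = partition % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]


def first_replica_load(num_objects):
    """Count reads per node when the 1st replica is always read first."""
    load = Counter()
    for i in range(num_objects):
        nodes = replicas_for(partition_for(f"object-{i}"))
        load[nodes[0]] += 1     # no shuffle: always the 1st replica
    return load


if __name__ == "__main__":
    for node, hits in sorted(first_replica_load(10000).items()):
        print(node, hits)
```

In this toy setup the hash alone spreads first-replica reads roughly 
evenly over the nodes, which is the point of the paragraph above.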

I propose removing shuffle(nodes) from the proxy server. This would 
make memory caching up to 3 times more efficient, without losing 
anything else.

Thank you in advance,
Anatoly Legkodymov.



