[openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

Robert Collins robertc at robertcollins.net
Wed Sep 3 20:13:01 UTC 2014


On 3 September 2014 23:50, Nejc Saje <nsaje at redhat.com> wrote:

Forgive my slowness :).

> Disclaimer: in Ironic terms, node = conductor, key = host

Sadly not inside the hash_ring code :/. host == conductor, key == data.

> The test I linked fails with Ironic hash ring code (specifically the part
> that tests consistency). With 1000 keys being mapped to 10 nodes, when you
> add a node:
> - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
> - Ironic code remaps > 90% of the keys

OK, that's pretty definitive, and definitely not intended. However,
remapping only ~7% when adding 10% more capacity is also a bit off:
ideally we'd remap about 1/11 of the keys, i.e. roughly 9%, so the new
conductor picks up its fair share.
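To make that concrete, the check being described boils down to something
like the sketch below (illustrative only, not the actual Ceilometer test;
get_host stands in for whichever ring implementation is under test):

    def remap_fraction(get_host, keys, hosts_before, hosts_after):
        """Fraction of keys that move to a different host."""
        before = dict((k, get_host(k, hosts_before)) for k in keys)
        after = dict((k, get_host(k, hosts_after)) for k in keys)
        moved = sum(1 for k in keys if before[k] != after[k])
        return moved / float(len(keys))

Going from 10 to 11 hosts we'd want that to come out near 1/11 (~9%); the
figures above are ~7% for the Ceilometer ring and >90% for Ironic's.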

> The problem lies in the way you build your hash ring[1]. You initialize a
> statically-sized array and divide its fields among nodes. When you do a
> lookup, you check which field in the array the key hashes to and then return
> the node that that field belongs to. This is the wrong approach because when
> you add a new node, you will resize the array and assign the fields
> differently, but the same key will still hash to the same field and will
> therefore map to a different node.

You're referring to part2host, where we round-robin using mod to map a
partition (the hash value of a key) to a host (conductor). Then when we
add a conductor the modulus changes and almost the entire map gets
reshuffled - yup, I see the issue.
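The effect is easy to demonstrate with a toy mod-based lookup (an
illustration of the failure mode, not the actual Ironic code):

    def mod_lookup(partition, hosts):
        # a partition is a key's hash; assign round-robin by modulus
        return hosts[partition % len(hosts)]

    hosts_10 = ['conductor-%d' % i for i in range(10)]
    hosts_11 = ['conductor-%d' % i for i in range(11)]
    moved = sum(1 for p in range(1000)
                if mod_lookup(p, hosts_10) != mod_lookup(p, hosts_11))
    print(moved)  # 900: ~90% of partitions change conductor

Because the divisor changes, almost every partition lands on a different
conductor, which lines up with the >90% figure above.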

Have you filed a bug for this?

> Nodes must be hashed onto the ring as well; statically chopping up the ring
> and dividing it among nodes isn't enough for consistency.
>
> Cheers,
> Nejc
>
>
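For reference, the "hash the nodes onto the ring too" approach looks
roughly like this (a sketch in the spirit of the hash_ring library linked
below; the class name, replica count and helper are illustrative, not any
project's actual API):

    import bisect
    import hashlib
    import struct

    def _hash(value):
        # same 32-bits-from-md5 trick as the Ironic code quoted below
        md5 = hashlib.md5(value.encode('utf-8')).digest()
        return struct.unpack_from('>I', md5)[0]

    class ConsistentHashRing(object):
        def __init__(self, hosts, replicas=100):
            # place every host on the ring at many pseudo-random points
            self._ring = sorted((_hash('%s-%d' % (host, r)), host)
                                for host in hosts
                                for r in range(replicas))
            self._points = [point for point, _host in self._ring]

        def get_host(self, key):
            # walk clockwise from the key's hash to the next host point
            idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
            return self._ring[idx][1]

With hosts placed on the ring like that, adding a conductor only claims
the keys that fall just before its new points, which is where the
~1/(n+1) remapping bound comes from.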
>>
>> The Ironic hash_ring implementation uses a hash:
>>      def _get_partition(self, data):
>>          try:
>>              return (struct.unpack_from('>I',
>>                      hashlib.md5(data).digest())[0]
>>                      >> self.partition_shift)
>>          except TypeError:
>>              raise exception.Invalid(
>>                      _("Invalid data supplied to HashRing.get_hosts."))
>>
>>
>> so I don't see the fixed-size thing you're referring to. Could you
>> point me at it a little more specifically? Thanks!
>>
>> -Rob
>>
>> On 1 September 2014 19:48, Nejc Saje <nsaje at redhat.com> wrote:
>>>
>>> Hey guys,
>>>
>>> in Ceilometer we're using consistent hash rings to do workload
>>> partitioning[1]. We've considered generalizing your hash ring
>>> implementation and moving it up to oslo, but unfortunately your
>>> implementation is not actually consistent, which is our requirement.
>>>
>>> Since you divide your ring into a number of equal-sized partitions
>>> instead of hashing hosts onto the ring, when you add a new host an
>>> unbounded number of keys gets re-mapped to different hosts (instead
>>> of the 1/#nodes remapping guaranteed by a consistent hash ring). I've
>>> confirmed this with the test in the aforementioned patch[2].
>>>
>>> If this is good enough for your use case, great; otherwise we can get
>>> a generalized hash ring implementation into oslo for use in both
>>> projects, or we can both use an external library[3].
>>>
>>> Cheers,
>>> Nejc
>>>
>>> [1] https://review.openstack.org/#/c/113549/
>>> [2]
>>> https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
>>> [3] https://pypi.python.org/pypi/hash_ring
>>>



-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


