[openstack-dev] [nova] question about e41fb84 "fix anti-affinity race condition on boot"

John Garbutt john at johngarbutt.com
Mon Mar 17 17:59:02 UTC 2014


On 17 March 2014 17:54, John Garbutt <john at johngarbutt.com> wrote:
> On 15 March 2014 18:39, Chris Friesen <chris.friesen at windriver.com> wrote:
>> Hi,
>>
>> I'm curious why the specified git commit chose to fix the anti-affinity race
>> condition by aborting the boot and triggering a reschedule.
>>
>> It seems to me that it would have been more elegant for the scheduler to do
>> a database transaction that would atomically check that the chosen host was
>> not already part of the group, and then add the instance (with the chosen
>> host) to the group.  If the check fails then the scheduler could update the
>> group_hosts list and reschedule.  This would prevent the race condition in
>> the first place rather than detecting it later and trying to work around it.
>>
>> This would require setting the "host" field in the instance at the time of
>> scheduling rather than the time of instance creation, but that seems like it
>> should work okay.  Maybe I'm missing something though...
>
> We deal with memory races in the same way as this today, when they
> race against the scheduler.
>
> Given the scheduler split, writing that value into the nova db from
> the scheduler would be a step backwards, and it probably breaks lots
> of code that assumes the host is not set until much later.

I forgot to mention, I am starting to be a fan of a two-phase commit
approach, which could deal with these kinds of things in a more
explicit way, before starting the main boot process.

It's not as elegant as a database transaction, but that doesn't seem
possible in the long run. There could well be something I am missing
here too.
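
To make the idea a bit more concrete, here is a rough sketch of what a
prepare/commit step for anti-affinity could look like. It is only an
illustration under my own assumptions: GroupHostClaims, prepare, commit,
abort and place are hypothetical names, not nova code, and the in-memory
store stands in for whatever service would really own the group state.
It is also looser than a textbook two-phase commit, since there is a
single participant and no coordinator log.

```python
import threading
import uuid


class GroupHostClaims:
    """Hypothetical in-memory store of which hosts a server group occupies."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}       # claim_id -> (group, host), phase 1 only
        self._confirmed = set()  # (group, host) pairs that finished phase 2

    def prepare(self, group, host):
        """Phase 1: tentatively claim host for group.

        Returns a claim id, or None if the group already holds or is
        claiming that host.
        """
        key = (group, host)
        with self._lock:
            if key in self._confirmed or key in self._pending.values():
                return None
            claim_id = str(uuid.uuid4())
            self._pending[claim_id] = key
            return claim_id

    def commit(self, claim_id):
        """Phase 2: make a prepared claim permanent."""
        with self._lock:
            self._confirmed.add(self._pending.pop(claim_id))

    def abort(self, claim_id):
        """Drop a prepared claim so the host is free for the group again."""
        with self._lock:
            self._pending.pop(claim_id, None)


def place(claims, group, candidate_hosts):
    """Return (host, claim_id) for the first host that can be claimed, else None."""
    for host in candidate_hosts:
        claim_id = claims.prepare(group, host)
        if claim_id is not None:
            return host, claim_id
    return None
```

The point is that two requests for the same group cannot both get past
prepare() for the same host, so the conflict shows up before the main
boot process starts rather than part way through it. The caller would
commit() once the boot is under way and abort() if it gives up.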

John


