[openstack-dev] [nova] question about e41fb84 "fix anti-affinity race condition on boot"

Sylvain Bauza sylvain.bauza at gmail.com
Tue Mar 18 10:16:35 UTC 2014


Hi Chris,


2014-03-18 0:36 GMT+01:00 Chris Friesen <chris.friesen at windriver.com>:

> On 03/17/2014 05:01 PM, Sylvain Bauza wrote:
>
>
>> There are 2 distinct cases :
>> 1. there are multiple schedulers involved in the decision
>> 2. there is one single scheduler but there is a race condition on it
>>
>
>
>  About 1., I agree we need to see how the scheduler (and later on Gantt)
>> could address decision-making based on distributed engines. At least, I
>> consider the no-db scheduler blueprint responsible for using memcache
>> instead of a relational DB could help some of these issues, as memcached
>> can be distributed efficiently.
>>
>
> With a central database we could do a single atomic transaction that looks
> something like "select the first host A from list of hosts L that is not in
> the list of hosts used by servers in group G and then set the host field
> for server S to A".  In that context simultaneous updates can't happen
> because they're serialized by the central database.
>
> How would one handle the above for simultaneous scheduling operations
> without a centralized data store?  (I've never played with memcached, so
> I'm not really familiar with what it can do.)
>
>
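(For context, the single-transaction claim described above could look roughly
like the sketch below. It's only an illustration: sqlite3 stands in for the
central database, and the table and column names are invented, not Nova's
actual schema.)

import sqlite3

def claim_host(conn, group_id, instance_id, candidate_hosts):
    """Pick a host not already used by the group and record the choice,
    all inside one write transaction so concurrent claims are serialized
    by the database."""
    placeholders = ",".join("?" for _ in candidate_hosts)
    query = ("SELECT name FROM hosts"
             " WHERE name IN (%s)"
             " AND name NOT IN (SELECT host FROM instances"
             "                  WHERE group_id = ? AND host IS NOT NULL)"
             " LIMIT 1" % placeholders)
    conn.execute("BEGIN IMMEDIATE")   # take the write lock before reading
    try:
        row = conn.execute(query,
                           tuple(candidate_hosts) + (group_id,)).fetchone()
        if row is None:
            raise RuntimeError("anti-affinity policy cannot be satisfied")
        conn.execute("UPDATE instances SET host = ? WHERE id = ?",
                     (row[0], instance_id))
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise

# Example wiring: isolation_level=None puts sqlite3 in autocommit mode, so
# the explicit BEGIN IMMEDIATE / COMMIT above really delimit the transaction.
# conn = sqlite3.connect("/tmp/sketch.db", isolation_level=None)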
See the rationale for the memcached-based scheduler here:
https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
The idea is to leverage the capabilities of distributed memcached servers,
together with synchronization, so that decision-making scales out. As said in
the blueprint, another way would be to make use of RPC fanouts, but that's
something OpenStack in general tries to avoid.
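To give a flavour of it, here is a minimal sketch of how memcached's atomic
add() could serialize the claim across several scheduler workers. The key
layout, the TTL and the helper names are invented for the example; they are
not what the blueprint actually specifies.

import memcache  # python-memcached

mc = memcache.Client(['127.0.0.1:11211'])

def try_claim(group_id, host, ttl=60):
    """add() stores the key only if it does not already exist, and that
    check is atomic on the memcached server, so only one scheduler worker
    can win the claim for a given (group, host) pair."""
    key = 'antiaffinity:%s:%s' % (group_id, host)
    return bool(mc.add(key, 'claimed', time=ttl))

def pick_host(group_id, candidate_hosts):
    """Return the first host not yet claimed for the group, or None."""
    for host in candidate_hosts:
        if try_claim(group_id, host):
            return host
    return None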



>
>  About 2., that's a concurrency issue which can be addressed thanks to
>> common practices for synchronizing actions. IMHO, a local lock can be
>> enough for ensuring isolation
>>
>
> It's not that simple though.  Currently the scheduler makes a decision,
> but the results of that decision aren't actually kept in the scheduler or
> written back to the db until much later when the instance is actually
> spawned on the compute node.  So when the next scheduler request comes in
> we violate the scheduling policy.  Local locking wouldn't help this.
>
>
>
Uh, you're right, I missed that crucial point. That said, we should treat
this as a classical placement problem with a deferred action. One possibility
would be to consider the host locked to this group at scheduling-decision
time, even if the first instance hasn't booted yet. Consider it as a "cache"
entry with a TTL, if you wish. That implies the scheduler would need a
feedback value from the compute node saying that the instance really booted;
if no ACK comes from the compute node before the TTL expires, the lock is
freed. A rough sketch of that bookkeeping follows below.
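Very roughly, and purely as an in-memory illustration (the class, the method
names and the 60-second default TTL are all invented for this sketch), the
bookkeeping could look like:

import time

class GroupHostClaims(object):
    """Tracks "host H is taken for group G" claims made at scheduling time.
    A claim expires after ttl seconds unless the compute node ACKs that
    the instance really booted."""

    def __init__(self, ttl=60):
        self.ttl = ttl
        self._claims = {}  # (group_id, host) -> [expires_at, confirmed]

    def claim(self, group_id, host):
        """Called by the scheduler when it picks the host, before boot."""
        self._expire()
        key = (group_id, host)
        if key in self._claims:
            return False                 # already claimed for this group
        self._claims[key] = [time.time() + self.ttl, False]
        return True

    def ack(self, group_id, host):
        """Called on the feedback from the compute node: instance booted."""
        entry = self._claims.get((group_id, host))
        if entry is not None:
            entry[1] = True              # confirmed claims no longer expire

    def _expire(self):
        now = time.time()
        for key, (expires_at, confirmed) in list(self._claims.items()):
            if not confirmed and expires_at < now:
                del self._claims[key]    # no ACK before the TTL: free the lock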

-Sylvain



> Chris