Open Stack

Thu Nov 29 14:37:24 UTC 2018

On 11/29/18 2:00 PM, Jay Pipes wrote:
> On 11/29/2018 04:28 AM, Bogdan Dobrelya wrote:
>> On 11/28/18 8:55 PM, Doug Hellmann wrote:
>>> I thought the preferred solution for more complex settings was config 
>>> maps. Did that approach not work out?
>>>
>>> Regardless, now that the driver work is done if someone wants to take 
>>> another stab at etcd integration it’ll be more straightforward today.
>>>
>>> Doug
>>>
>>
>> While sharing configs is a feasible option to consider for large scale 
>> configuration management, Etcd only provides a strong consistency, 
>> which is also known as "Unavailable" [0]. For edge scenarios, to 
>> configure 40,000 remote computes over WAN connections, we'd rather 
>> want instead weaker consistency models, like "Sticky Available" [0]. 
>> That would allow services to fetch their configuration either from a 
>> central "uplink" or locally as well, when the latter is not accessible 
>> from remote edge sites. Etcd cannot provide 40,000 local endpoints to 
>> fit that case I'm afraid, even if those would be read only replicas. 
>> That is also something I'm highlighting in the paper [1] drafted for 
>> ICFC-2019.
>>
>> But had we such a sticky available key value storage solution, we 
>> would indeed have solved the problem of multiple configuration 
>> management system execution for thousands of nodes as James describes it.
> 
> It's not that etcd is incapable of providing something like this. It's 
> that a *single* etcd KVS used by 40K compute nodes across a 
> disaggregated control plane would not be available to all of those nodes 
> simultaneously.
> 
> But you could certainly use etcd as the data store to build a sticky 
> available configuration data store. If, for example, you had many local 
> [1] etcd KVS that stored local data and synchronized the local data set 
> with other etcd KVS endpoints when a network partition was restored, you 
> could get such a system that was essentially "sticky available" for all 
> intents and purposes.
> 
> Come to think of it, you could do the same with a SQLite DB, ala Swift's 
> replication of SQLite DBs via rsync.
> 
> But, at the risk of sounding like a broken record, at the end of the 
> day, many of OpenStack's core services -- notably Nova -- were not 
> designed for disaggregated control planes. They were designed for the 
> datacenter, with tightly-packed compute resources and low-latency links 
> for the control plane.
> 
> The entire communication bus and state management system would need to 
> be redesigned from the nova-compute to the nova-conductor for (far) edge 
> case clouds to be a true reality.
> 
> Instead of sending all data updates synchronously from each nova-compute 
> to nova-conductor, the communication bus needs to be radically 
> redesigned so that the nova-compute uses a local data store *as its 
> primary data storage* and then asynchronously sends batched updates to 
> known control plane endpoints when those regular network partitions 
> correct themselves.
> 
> The nova-compute manager will need to be substantially hardened to keep 
> itself up and running (and writing to that local state storage) for long 
> periods of time and contain all the logic to resync itself when network 
> uplinks become available again.
> 
> Finally, if those local nova-computes need to actually *do* anything 
> other than keep existing VMs/baremetal machines up and running, then a 
> local Compute API service needs to be made available in the far edge 
> sites themselves -- offering some subset of Compute API functionality to 
> control the VMs in that local site. Otherwise, the whole "multiple 
> department stores running an edge OpenStack site that can tolerate the 
> Mother Ship being down" isn't a thing that will work.
> 
> Like I said, pretty much a complete redesign of the nova control plane...

We drifted a little off topic... but all of that is valid for the 
post-MVP Edge architecture phases [0], which target multiple (aka 
disaggregated/autonomous/local vs central) control planes, indeed.
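
To make the "local store first, batched async updates later" idea that 
Jay describes above a bit more concrete, here is a minimal sketch. It is 
not Nova code: LocalFirstReporter, send_batch and batch_size are 
hypothetical names, and an in-memory journal stands in for a real local 
data store.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class LocalFirstReporter:
    """Hypothetical sketch: local state is primary, uplink sync is best effort."""

    def __init__(self, send_batch: Callable[[List[dict]], None],
                 batch_size: int = 100) -> None:
        self.state: Dict[str, str] = {}         # stands in for the local store
        self._journal: Deque[dict] = deque()    # updates not yet sent upstream
        self._send_batch = send_batch           # assumed uplink call
        self._batch_size = batch_size

    def record(self, key: str, value: str) -> None:
        # Always succeeds locally, whether or not the uplink is reachable.
        self.state[key] = value
        self._journal.append({"key": key, "value": value})

    def flush(self) -> int:
        """Try to drain the journal in batches; return how many updates went out."""
        sent = 0
        while self._journal:
            batch = list(self._journal)[: self._batch_size]
            try:
                self._send_batch(batch)
            except ConnectionError:
                break                           # still partitioned, retry later
            for _ in batch:
                self._journal.popleft()         # drop only what was acknowledged
            sent += len(batch)
        return sent
```

A caller would invoke flush() periodically; while the uplink is down it 
sends nothing and keeps the journal intact.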

There are more options than that complete redesign, though. IIUC, does 
the latter assume supporting alternatives to SQL/AMQP-ish data/messaging 
backends for Nova and OpenStack in general? That is only one option (see 
examples of such backends [1][2]), though I love it the most :)

Other options may be creating client libraries that act on top of the 
APIs or the existing DB/MQ backends and perform low-level data 
synchronization, or act as API re-translators, over multiple control 
planes. AFAICT that would *not* require a complete redesign of the 
supported backends or of the types of transactions in Nova et al. And 
for MQ, a brokerless qdr or something (there was a nice presentation at 
the summit)...

But in the end, indeed, multiple R&D papers, like [3],[4], make the case 
that causal sticky consistent synchronization with advanced conflict 
resolution [5] is the best Edge-y/Fog-y choice for both such client 
libraries and causal consistent DB/KVS/MQ backends. I think that is 
something similar to what you (Jay) described for multiple Etcd clusters 
exchanging their data? So for that example, such client libraries should 
maintain sticky sessions to groups of those Etcd clusters and replicate 
data around in the best of causal consistent ways.
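
As a rough illustration of what "causal consistent" replication means 
for such a client library, below is a minimal sketch of causal delivery 
with vector clocks: a replica buffers a remote update until everything 
it causally depends on has been applied. This is a textbook technique, 
not something Etcd provides; CausalReplica and its methods are 
hypothetical names, and conflict resolution and duplicate handling are 
left out.

```python
from typing import Dict, List, Tuple

Clock = Dict[str, int]
Update = Tuple[str, Clock, str, str]  # (sender, vector clock, key, value)


class CausalReplica:
    """Hypothetical sketch of a replica that applies updates in causal order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clock: Clock = {}
        self.data: Dict[str, str] = {}
        self._pending: List[Update] = []

    def local_put(self, key: str, value: str) -> Update:
        # A local write bumps our own clock entry and yields an update to ship.
        self.clock[self.name] = self.clock.get(self.name, 0) + 1
        self.data[key] = value
        return (self.name, dict(self.clock), key, value)

    def _deliverable(self, sender: str, ts: Clock) -> bool:
        # Must be the next update from the sender...
        if ts.get(sender, 0) != self.clock.get(sender, 0) + 1:
            return False
        # ...and everything the sender had seen must already be applied here.
        return all(ts[n] <= self.clock.get(n, 0) for n in ts if n != sender)

    def receive(self, update: Update) -> None:
        # Buffer the update, then apply whatever has become deliverable.
        self._pending.append(update)
        progress = True
        while progress:
            progress = False
            for pending in list(self._pending):
                sender, ts, key, value = pending
                if self._deliverable(sender, ts):
                    self.data[key] = value
                    self.clock[sender] = ts[sender]
                    self._pending.remove(pending)
                    progress = True
```

With three replicas A, B and C, if B overwrites a key after seeing A's 
write and C hears from B first, C holds B's update back until A's write 
arrives, so C never exposes the newer value without its cause.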

PS. That nice SQLite & rsync combo would not give us the best of the 
eventual consistency world, no. It would rather be something of a "Total 
Available" [6] thing, at the lowest level of it, like Read Uncommitted or 
Monotonic Writes, and would be a very (very) poor choice IMO.

[0] 
https://wiki.openstack.org/w/index.php?title=OpenStack_Edge_Discussions_Dublin_PTG#Features_2
[1] https://www.ronpub.com/OJDB_2015v2i1n02_Elbushra.pdf
[2] http://rainbowfs.lip6.fr/data/RainbowFS-2016-04-12.pdf
[3] https://www.cs.cmu.edu/~dga/papers/cops-sosp2011.pdf
[4] http://www.cs.cornell.edu/lorenzo/papers/cac-tr.pdf
[5] https://ieeexplore.ieee.org/document/8104644
[6] https://jepsen.io/consistency

> 
> Best,
> -jay
> 
> [1] or local-ish, think POPs or even local to the compute node itself...
> 
>> [0] https://jepsen.io/consistency
>> [1] 
>> https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf 

-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando

[openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes
