On 11/29/18 2:00 PM, Jay Pipes wrote:
On 11/29/2018 04:28 AM, Bogdan Dobrelya wrote:
On 11/28/18 8:55 PM, Doug Hellmann wrote:
I thought the preferred solution for more complex settings was config maps. Did that approach not work out?
Regardless, now that the driver work is done, if someone wants to take another stab at etcd integration, it'll be more straightforward today.
Doug
While sharing configs is a feasible option to consider for large scale configuration management, Etcd provides only strong consistency, which is also known as "Unavailable" [0]. For edge scenarios, to configure 40,000 remote computes over WAN connections, we would instead want weaker consistency models, like "Sticky Available" [0]. That would allow services to fetch their configuration either from a central "uplink" or locally, when the former is not accessible from remote edge sites. Etcd cannot provide 40,000 local endpoints to fit that case, I'm afraid, even if those were read-only replicas. That is also something I'm highlighting in the paper [1] drafted for ICFC-2019.
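To illustrate what I mean by sticky availability for configuration fetching, here is a rough sketch (the endpoint URL, the cache path and the JSON layout are made-up assumptions, not a real interface): prefer the central uplink, but keep serving the last locally cached copy whenever the uplink is unreachable from the edge site:

    # A minimal illustration (not real OpenStack code): prefer the central
    # config endpoint, fall back to the last locally cached copy when the
    # WAN uplink to the central site is down.
    import json
    import urllib.request

    CENTRAL_URL = "https://central.example.org/config/nova-compute"  # assumed
    LOCAL_CACHE = "/var/lib/nova/config-cache.json"                  # assumed

    def fetch_config():
        try:
            with urllib.request.urlopen(CENTRAL_URL, timeout=5) as resp:
                cfg = json.load(resp)
            # Refresh the local cache while the uplink is reachable.
            with open(LOCAL_CACHE, "w") as f:
                json.dump(cfg, f)
            return cfg
        except OSError:
            # Uplink unreachable: stay "sticky" on the local copy.
            with open(LOCAL_CACHE) as f:
                return json.load(f)

The service keeps working off its local copy for as long as the partition lasts, which is exactly what a single strongly consistent Etcd cluster cannot give us at that scale.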
But if we had such a sticky-available key-value storage solution, we would indeed have solved the problem of running configuration management for thousands of nodes, as James describes it.
It's not that etcd is incapable of providing something like this. It's that a *single* etcd KVS used by 40K compute nodes across a disaggregated control plane would not be available to all of those nodes simultaneously.
But you could certainly use etcd as the data store to build a sticky available configuration data store. If, for example, you had many local [1] etcd KVS that stored local data and synchronized the local data set with other etcd KVS endpoints when a network partition was restored, you could get such a system that was essentially "sticky available" for all intents and purposes.
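A rough sketch of that idea, assuming the python-etcd3 client library and purely hypothetical endpoints: writes always land on the local etcd member first, and a journal of not-yet-replicated keys gets flushed to the central/peer etcd whenever the WAN link happens to be up:

    # Rough sketch, assuming the python-etcd3 client; the endpoint names
    # are hypothetical. Writes go to the local etcd first, and a journal
    # of pending keys is pushed to the peer etcd when the link is back.
    import etcd3

    local = etcd3.client(host="127.0.0.1", port=2379)
    peer = etcd3.client(host="central.example.org", port=2379)  # assumed

    pending = set()  # keys written locally but not yet replicated

    def put(key, value):
        local.put(key, value)
        pending.add(key)
        try_flush()

    def try_flush():
        try:
            for key in list(pending):
                value, _meta = local.get(key)
                peer.put(key, value)
                pending.discard(key)
        except Exception:
            # Partitioned from the peer: keep the journal and retry later.
            pass

Reads and writes stay local and available; only the replication step depends on the uplink.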
Come to think of it, you could do the same with a SQLite DB, a la Swift's replication of SQLite DBs via rsync.
But, at the risk of sounding like a broken record, at the end of the day, many of OpenStack's core services -- notably Nova -- were not designed for disaggregated control planes. They were designed for the datacenter, with tightly-packed compute resources and low-latency links for the control plane.
The entire communication bus and state management system, from nova-compute to nova-conductor, would need to be redesigned for (far) edge clouds to become a true reality.
Instead of sending all data updates synchronously from each nova-compute to nova-conductor, the communication bus needs to be radically redesigned so that the nova-compute uses a local data store *as its primary data storage* and then asynchronously sends batched updates to known control plane endpoints when those regular network partitions correct themselves.
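In rough pseudo-Python terms (none of these names exist in Nova today; this is only an illustration of the flow): every state change commits to a local store first, and a periodic task drains the accumulated batch to the conductor only when the uplink allows it:

    # Illustrative only; none of these names are real Nova interfaces.
    # State changes commit to a local SQLite journal first (the primary
    # store), then drain in batches to the conductor when the WAN link
    # allows it.
    import sqlite3

    db = sqlite3.connect("/var/lib/nova/local-state.db")
    db.execute("CREATE TABLE IF NOT EXISTS updates "
               "(id INTEGER PRIMARY KEY, payload TEXT, synced INTEGER DEFAULT 0)")

    def record_update(payload):
        # The local commit is the source of truth; no conductor round-trip.
        db.execute("INSERT INTO updates (payload) VALUES (?)", (payload,))
        db.commit()

    def flush_to_conductor(send_batch):
        # send_batch stands in for whatever RPC the redesigned conductor
        # would expose (hypothetical). A failure simply leaves the rows
        # unsynced until the next periodic attempt.
        rows = db.execute(
            "SELECT id, payload FROM updates WHERE synced = 0").fetchall()
        if not rows:
            return
        try:
            send_batch([payload for _id, payload in rows])
        except Exception:
            return  # still partitioned; retry on the next tick
        db.executemany("UPDATE updates SET synced = 1 WHERE id = ?",
                       [(row_id,) for row_id, _ in rows])
        db.commit()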
The nova-compute manager will need to be substantially hardened to keep itself up and running (and writing to that local state storage) for long periods of time, and to contain all the logic needed to resync itself when network uplinks become available again.
Finally, if those local nova-computes need to actually *do* anything other than keep existing VMs/baremetal machines up and running, then a local Compute API service needs to be made available in the far edge sites themselves -- offering some subset of Compute API functionality to control the VMs in that local site. Otherwise, the whole "multiple department stores running an edge OpenStack site that can tolerate the Mother Ship being down" isn't a thing that will work.
Like I said, pretty much a complete redesign of the nova control plane...
We drifted a little bit off topic... but all of that is valid for the post-MVP Edge architecture phases [0] targeted at multiple (aka disaggregated/autonomous/local vs central) control planes, indeed. Although there are more options than that complete redesign. IIUC, does the latter assume supporting alternatives to SQL/AMQP-ish data/messaging backends for Nova and OpenStack in general? That is only one option (see examples of such backends [1][2]), though I love it the most :)

Other options may be creating client libraries that act on top of the APIs or the existing DB/MQ backends and perform low-level data synchronization, or act as API re-translators, over multiple control planes. And AFAICT that would *not* require a complete redesign of the supported backends nor of the types of transactions in Nova et al. And for MQ, a brokerless qdr or something like it (there was a nice presentation at the summit)...

But in the end, indeed, it is kinda proven in multiple R&D papers, like [3][4], that only causal, sticky-consistent synchronization with advanced conflict resolution [5] is the best Edge-y/Fog-y choice, both for such client libraries and for causal-consistent DB/KVS/MQ backends. I think that is similar to what you (Jay) described for multiple Etcd clusters exchanging their data? So for that example, such client libraries should maintain sticky sessions to groups of those Etcd clusters and replicate data around in the best causal-consistent way (see the toy sketch after the references below).

PS. That nice SQLite & rsync combo would not give us the best of the eventual consistency world; no, it would rather be something of a "Total Available" [6] thing, at the lowest of it, like Read Uncommitted or Monotonic Writes, and would be a very (very) poor choice IMO.

[0] https://wiki.openstack.org/w/index.php?title=OpenStack_Edge_Discussions_Dubl...
[1] https://www.ronpub.com/OJDB_2015v2i1n02_Elbushra.pdf
[2] http://rainbowfs.lip6.fr/data/RainbowFS-2016-04-12.pdf
[3] https://www.cs.cmu.edu/~dga/papers/cops-sosp2011.pdf
[4] http://www.cs.cornell.edu/lorenzo/papers/cac-tr.pdf
[5] https://ieeexplore.ieee.org/document/8104644
[6] https://jepsen.io/consistency
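To make the "causal, sticky-consistent with conflict resolution" part a bit more concrete, here is a toy vector-clock sketch (nothing here maps to a real Etcd or OpenStack API; all names are made up). Each site bumps its own clock entry on a local write; when two sites sync, causally ordered updates win outright, while concurrent ones are handed to an application-level conflict resolver:

    # Toy illustration of causal ordering with vector clocks; not a real
    # Etcd or OpenStack interface. Each site bumps its own entry on a
    # local write and merges clocks on sync; concurrent writes are
    # detected and handed to an application-level conflict resolver.
    def bump(clock, site):
        clock = dict(clock)
        clock[site] = clock.get(site, 0) + 1
        return clock

    def happened_before(a, b):
        return all(a.get(k, 0) <= b.get(k, 0) for k in a) and a != b

    def merge(local, remote, resolve):
        l_clock, l_val = local
        r_clock, r_val = remote
        if happened_before(l_clock, r_clock):
            return remote                      # remote is causally newer
        if happened_before(r_clock, l_clock):
            return local                       # local is causally newer
        # Concurrent updates: merge the clocks, let the app resolve values.
        merged = {k: max(l_clock.get(k, 0), r_clock.get(k, 0))
                  for k in set(l_clock) | set(r_clock)}
        return merged, resolve(l_val, r_val)

The point is only that the resolver, not the storage layer, decides what a "merge" of two concurrent edge-site updates means.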
Best, -jay
[1] or local-ish, think POPs or even local to the compute node itself...
[0] https://jepsen.io/consistency
[1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position...
-- Best regards, Bogdan Dobrelya, Irc #bogdando