[Openstack-operators] New networking solution for Cloud Native apps....
Chris Marino
chris at romana.io
Wed Feb 3 19:07:18 UTC 2016
Hi Tomas, all good questions. Stepping back a bit before I address each
of them individually, the idea behind Romana is to build a simple, high
performance solution that is easy to operate. The primary way we do this is
to use the actual physical network wherever possible. This way operators
can just build whatever kind of physical network they want and get the
visibility and performance of the native network. For OpenStack operators,
this physical network would be whatever kind of underlay they have (or are
building).
What Romana provides on top of the physical network is a layer 3 tenancy
model that avoids all of the virtual networking software that runs the
overlay (thereby reclaiming performance, visibility, etc.). So, the simple
answer to all your questions is going to be: to the extent that a user needs
an overlay, they're *not* going to find it on the Romana network.
That said, it has been our experience that the overwhelming use of a tenant
overlay network is to provide *isolation*, not private addressing. The
Romana tenancy model will give tenants the isolation they're looking for,
but not private addressing.
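To make that isolation model concrete, here's a minimal sketch (illustrative
only, not Romana's actual code or rule set) of how L3 tenant isolation can be
expressed as ordinary host-level filter rules once each tenant owns a known
CIDR; the tenant names and CIDRs below are assumptions for the example:

import ipaddress

tenant_cidrs = {
    "tenant-A": ipaddress.ip_network("10.1.0.0/16"),
    "tenant-B": ipaddress.ip_network("10.2.0.0/16"),
}

def isolation_rules(tenants):
    """Build iptables-style rule strings: allow traffic that stays inside
    a tenant's own CIDR, drop anything that crosses tenant boundaries."""
    rules = [f"-A FORWARD -s {cidr} -d {cidr} -j ACCEPT   # {name} intra-tenant"
             for name, cidr in tenants.items()]
    # Everything else between tenant blocks is dropped (default deny).
    rules.append("-A FORWARD -s 10.0.0.0/8 -d 10.0.0.0/8 -j DROP   # cross-tenant")
    return rules

for rule in isolation_rules(tenant_cidrs):
    print(rule)

The point is simply that nothing here needs an overlay: isolation is just
routing and filtering on addresses the physical network already understands.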
As for the difference between the L3 Romana style isolation and an
overlay's L2 isolation I'd just like to point out that AWS has been quite
successful with their L3 isolation model. In fact, it is the whole new
'Amazon style' application architecture that is the primary focus of the
Romana design. That's why we call it a Cloud Native network.
Now, I don't want to dismiss the difference between L2 and L3 networking,
but again I want to point out that the Cloud Native style includes all kinds
of deployment patterns that overcome the network limitations they might
encounter when using an L3 only network (DNS, Service Discovery, replicated
service endpoints, cattle not pets, etc.).
But back to the private addressing question because it is an important
one.......
A consequence of tenants mainly using overlays for isolation is that
they're typically going to NAT to get on and off their tenant networks to
an external network, either to a local network or public IP [incidentally,
this also means that they've got to manually manage the OpenStack
addresses/CIDRs of each of their tenant networks, which is a major pain].
So, if a tenant has to NAT (from a public IP, etc.), they can do that just
as easily to the Romana IP (something from the Romana 10/8) as they could
to the local (rfc1918) CIDR they set up in OpenStack. So, this is the
direction Romana will be going with regard to access to external networks.
Some kind of NAT'ing service, still to be integrated/implemented.....
The question of where to run the NAT'ing service is also a good one. Maybe
we'll use the Neutron Node. That's possible, or maybe some dedicated
NAT'ing nodes in a DMZ (something like CloudScaling). Whatever we do we'll
need to support a variety of deployment options. We like to think of NAT
as just one of many possible Service Functions that Romana will need to
access, insert, chain, etc.
It is also worth mentioning that whatever we do for NAT (and more generally
for service insertion) is going to require that Romana configures routes to
and from these service nodes. Once we get all this built, it should also
be possible with Romana to bring up an interface with the external IP on
the desired endpoint and configure the routes directly to it, avoiding NAT
entirely.
This way a tenant could get some of their own IPs into the OpenStack
project, but would also let Romana use its own 10/8 for local traffic.
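As a concrete illustration of the kind of NAT'ing service described above (a
sketch under assumptions, not a committed design; the addresses and interface
name are hypothetical), mapping an external or floating IP onto a Romana 10/8
endpoint is just ordinary DNAT/SNAT on whichever node holds the external
address:

# Sketch only: hypothetical floating-IP mapping for a NAT service node.
# The addresses below are examples; Romana's actual NAT integration is TBD.

def nat_rules(external_ip: str, romana_ip: str, ext_iface: str = "eth0"):
    """Return iptables-style DNAT/SNAT rules that map one external
    (floating) IP to one endpoint IP on the Romana 10/8 network."""
    return [
        # Inbound: traffic arriving on the external IP is rewritten
        # to the endpoint's Romana address.
        f"-t nat -A PREROUTING -d {external_ip} -j DNAT --to-destination {romana_ip}",
        # Outbound: traffic leaving the endpoint is sourced from the
        # external IP so replies come back to this node.
        f"-t nat -A POSTROUTING -s {romana_ip} -o {ext_iface} -j SNAT --to-source {external_ip}",
    ]

for r in nat_rules("203.0.113.10", "10.2.4.17"):
    print("iptables", r)

Whether these rules live on the Neutron node, dedicated NAT nodes in a DMZ, or
somewhere else is exactly the deployment question discussed above.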
With that as the broader context, see responses in line below....
On Wed, Feb 3, 2016 at 6:55 AM, Tomas Vondra <vondra at czech-itc.cz> wrote:
> Hi Chris!
>
> It's a very interesting project, I like the clean L3 design, without mixing
> OpenVSwitches and namespaces.
>
> But as for myself I'd lean more towards a full L2 implementation. I for
> example liked the Dragonflow blog posts I have read. I like the way
> Provider
> Networks in Neutron allow me to connect physical boxes to VMs in OpenStack.
>
> My concerns are:
> 1) I'm a service provider. As long as I'm giving out my IP addresses to
> tenants, all is fine. But what if someone wants to bring their own? There
> should be a way out of the fixed addressing scheme in exceptional cases.
>
See above comment for more complete context. We want to be able to support
a tenant bringing their own IPs, but only for service endpoints. Fundamental
to Romana's approach is to use IPs that are part of the physical
infrastructure. If you abandon that, you've basically built an overlay.
> 2) What about some legacy distributed apps, which scan the network to find
> all nodes? Think MPI.
>
Not all that familiar with MPI, but my quick scan of this page
<https://en.wikipedia.org/wiki/Message_Passing_Interface> says it's a
higher-level protocol, so it should work on an L3 network. However, there are many
legacy apps that do need L2 adjacency (vMotion, CIFS, discovery, etc.),
which would not work on a Romana network. This is an explicit tradeoff that
comes with a Romana network.
> 3) What if I already have the 10.0.0.0/8 network used in the datacenter? (I
> think it will be, in most of them.)
>
The Romana network would either need to be its own VLAN and get a full
10/8, or a smaller deployment could live on the existing network. There is
no requirement with Romana that all the Hosts be part of a contiguous CIDR.
*HOWEVER*, each host must have a large contiguous block to use for all the
local endpoints so they can share a single gateway address.
There are all kinds of deployment options, the only real requirement is
that each host have a large CIDR that the Romana IPAM can pull addresses
from for each endpoint.
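As a rough sketch of that IPAM requirement (the /16-per-host and
/24-per-segment split below is an assumption for illustration, not Romana's
actual address layout), the idea is simply hierarchical carving of the 10/8:

# Illustrative IPAM sketch -- the bit split is an example, not Romana's layout.
import ipaddress
from itertools import islice

ROMANA_NET = ipaddress.ip_network("10.0.0.0/8")

# Each host gets one large contiguous block (here a /16) from the 10/8.
host_blocks = {f"host-{i}": blk
               for i, blk in enumerate(islice(ROMANA_NET.subnets(new_prefix=16), 4))}

def endpoint_ip(host: str, tenant_segment: int, endpoint: int):
    """Allocate an endpoint address: carve the host's block into
    per-tenant-segment /24s, then index an endpoint within it."""
    segment_net = list(host_blocks[host].subnets(new_prefix=24))[tenant_segment]
    return list(segment_net.hosts())[endpoint]

# All endpoints on host-1 come from host-1's block (so they can share one
# gateway), while the tenant/segment structure is still visible in the address.
print(host_blocks["host-1"])                                # 10.1.0.0/16
print(endpoint_ip("host-1", tenant_segment=3, endpoint=5))  # 10.1.3.6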
It has been our experience that OpenStack clusters have their own data
network (i.e. underlay) and ideally, Romana would run on that network. In the
default configuration of the demo, the hosts each have a single IP on a
192.168.0.0/24 Amazon VPC. The host is configured with a gateway to the 10/8
network for the tenant VMs (here's the route table after an install
<http://romana.io/try_romana/openstack/#what-you-can-do>).
Running a full 10/8 on a separate VLAN brings up the NAT'ing problem, but
see my comments above for how Romana expects that to be deployed.
> 4) What about floating IPs? Or will there be dynamic routing for internet
> access of individual instances?
>
Floating IPs are just one instance of the NATing issue. See comments above.
Routing to some kind of service node that has the Floating IP is where
we'll start; from there, there are all kinds of interesting things to further simplify.
> 5) Do you have a roadmap / feature matrix of what would you like to have
> and
> how it compares to other OpenStack solutions?
>
Nothing very specific right now. Just what you see here
<http://romana.io/roadmap/>. We're still trying to get the basics down.
Near term we're working on how to get Kubernetes to run on OpenStack
without encaps. Kubernetes networking is advancing rapidly to support
multi-tenancy and network policy. As far as OpenStack specific features,
we'd like to be sure that Romana can peacefully co-exist on the underlay
with the other OpenStack network alternatives (it's complicated). We don't
want this to be an either/or kind of deployment choice, but rather something
that's there that lets Cloud Native apps (specifically Kubernetes) run.
> 6) If you ever do another benchmark, focus more on throughput than latency.
> The difference between 1.5 ms and 0.24 ms is not interesting. The one
> between, say, 5 and 10 Gb/s would be. And please document the hardware
> setup
>
Benchmarking is hard. Especially for network performance. Our thinking is
that CPUs today are REALLY, REALLY FAST. There are credible benchmarks that
show hosts running VXLAN VTEPs that easily fill 10Gb/s. More info here
<http://blog.ipspace.net/2015/02/performance-of-hypervisor-based-overlay.html>,
here <http://packetpushers.net/vxlan-udp-ip-ethernet-bandwidth-overheads/> and
here
<http://chinog.org/wp-content/uploads/2015/05/Optimizing-Your-Virtual-Switch-for-VXLAN.pdf>.
So we don't really want to claim anything about throughput because it
typically means measuring the CPU load when filling up the pipe. And that's
highly dependent on app, packet size, MTU, etc.
I would like to point out that latency is extremely important for chatty apps.
RTT is all based on latency, so those apps are really going to benefit from
a direct path from source to dest (avoiding VXLAN routing).
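Using the RTT numbers from the question above and an assumed number of
sequential round trips (the request count is illustrative), the cumulative
effect for a chatty app is easy to see:

# Quick arithmetic on the RTTs quoted above; the request count is illustrative.
rtt_neutron_ms = 1.5    # routed through the Neutron network node
rtt_romana_ms = 0.24    # direct host-to-host path

sequential_round_trips = 200   # e.g. a chatty request/response exchange

print(f"via Neutron node: {rtt_neutron_ms * sequential_round_trips:.0f} ms")  # 300 ms
print(f"direct (Romana):  {rtt_romana_ms * sequential_round_trips:.0f} ms")   # 48 ms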
As for my setup, it was pretty simple. I ran OpenStack Compute Nodes on EC2
(large instances) and pinged between tenant VMs on different segments.
First when the VMs were on the same host, then when they were on different hosts.
Did this for standard OpenStack VXLAN (routing through Neutron Network
node) and Romana.
You can run the demo <http://romana.io/try_romana/openstack/> we've got and
reproduce the results. EC2 networking is notoriously unpredictable, so the
data published is representative of the median results from the runs. They
did vary quite a lot higher (worse), but almost never lower. Which makes
sense since the latency improvements are derived by eliminating the Neutron
Node routing hop <http://romana.io/how/performance/>, not because there is
no encapsulation.
Tomas, hope this helps. Please let me know if you have any questions.
Thanks
CM
> ;-).
> Tomas