[openstack-dev] [Neutron][Heat] The Neutron API and orchestration

Zane Bitter zbitter at redhat.com
Tue Apr 8 00:28:19 UTC 2014

The Neutron API is a constant cause of pain for us as Heat developers, 
but afaik we've never attempted to bring up the issues we have found in 
a cross-project forum. I've recently been doing some more investigation 
and I want to document the exact ways in which the current Neutron API 
breaks orchestration, both in the hope that a future version of it might 
be better and as a guide for other API authors.

BTW it's my contention that an API that is bad for orchestration is also 
hard to use for the ordinary user. When you're trying to figure 
out the order of operations you need to do, there are two times at which 
you could find out you've got it wrong:

1) Before you run the command, when you realise you don't have all of 
the required data yet; or
2) After you run the command, when you get a cryptic error message.

Not only is (1) *mandatory* for a data-driven orchestration system like 
Heat, it offers orders-of-magnitude better user experience for everyone.

I should say at the outset that I know next to nothing about Neutron, 
and one of the goals of this message is to find out which parts I am 
completely wrong about. I did know a little bit about traditional 
networking at one time, and even remember some of it ;)

Neutron has a little documentation on workflow, so let's begin there: 

(1) Create a network
Instinctively, I want a Network to be something like a virtual VRF 
(VVRF?): a separate namespace with its own route table, within which 
subnet prefixes are not overlapping, but which is completely independent 
of other Networks that may contain overlapping subnets. As far as I can 
tell, this basically seems to be the case. The difference, of course, is 
that instead of having to configure a VRF on every switch/router and 
make sure they're all in sync and connected up in the right ways, I just 
define it in one place globally and Neutron does the rest. I call this 
#winning. Nice work, Neutron.

(2) Associate a subnet with the network
Slightly odd choice of words, because you're actually creating a new 
Subnet (there's no such thing as a Subnet not associated with a 
Network), but this is probably just a minor documentation nit. 
Instinctively, I want a Subnet to be something like a virtual VLAN 
(VVLAN?): at its most basic level, just a group of ports that share a 
broadcast domain, but also having other properties (e.g. if L3 is in 
use, all IP addresses in the subnet should be in the same CIDR). This 
doesn't seem to be the case, though; it's just a CIDR prefix, which 
leaves me wondering how L2 traffic will be treated, as well as how I 
would do things like use both IPv4 and IPv6 on a single port (by 
assigning a port to multiple Subnets?). Looking at the docs, there is a 
much bigger emphasis on DHCP client settings than I expected - surely I 
might want to give two sets of ports in the same Subnet 
different DHCP configs? Still, this is not bad - the DHCP configuration 
is done by the time the Subnet is created, so there's no problem in 
connecting stuff to it immediately after.

(3) Boot a VM and attach it to the network
Here's where you completely lost me. I just created a Subnet - maybe a 
bunch of Subnets. I don't want to attach my VM just anywhere in the 
*Network*, I want to attach it to a *particular* Subnet. It's not at all 
obvious where my instance will get attached (at random?), because this 
API just plain takes the Wrong data type. As a user, I'm irritated and 

The situation for orchestration, though, is much, much worse. Because 
the server takes a reference to a network, the dependency graph 
generated from my template will look like this:

    Network <---------- Subnet
            <---------- Server

And yet if the Server is created before the Subnet (as will happen ~50% 
of the time), it will fail. And vice-versa on delete, where the server 
must be removed before the subnet. The dependency graph we needed to 
create was this:

    Network <---------- Subnet <---------- Server
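
To make the difference between those two graphs concrete, here is a minimal sketch using Python's standard `graphlib` module. It is not how Heat computes its ordering; the resource names and the `creation_waves` helper are made up for illustration. It groups resources into "waves" whose members may be created in any order relative to each other.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
# "as_declared" mirrors what the API arguments imply (Server -> Network only);
# "as_required" is the ordering that actually has to hold (Server -> Subnet).
as_declared = {
    "Network": set(),
    "Subnet": {"Network"},
    "Server": {"Network"},
}
as_required = {
    "Network": set(),
    "Subnet": {"Network"},
    "Server": {"Subnet"},
}


def creation_waves(graph):
    """Group resources into waves; one wave's members may run in any order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


declared_waves = creation_waves(as_declared)
required_waves = creation_waves(as_required)
print(declared_waves)
print(required_waves)
```

With the declared graph, Server and Subnet land in the same wave, so nothing stops the Server from going first. With the required graph, each resource gets its own wave and the order is forced.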

The solution used here was to jury-rig the resource types in Heat with a 
hidden dependency. We can't know which Subnet the server will end up 
attached to, so we create hidden dependencies on all of the ones defined 
in the same template. There's nothing we can do about Subnets defined in 
different templates (Heat allows a tree of templates to be instantiated 
with a single command) - I'm not sure, but it may be possible even now 
to create a tree of stacks that in practice could never be successfully 

The Neutron models in Heat are so riddled with these kinds of invisible 
special-case hacks that all of our public documentation about how Heat 
can be expected to respond to a particular template is rendered 
effectively meaningless with respect to Neutron.
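
For the curious, the hidden-dependency workaround for Servers looks roughly like the sketch below. This is not Heat's actual code: the template layout, the `hidden_subnet_dependencies` name, and the choice to match only Subnets that reference the same Network are all assumptions made to keep the example small.

```python
def hidden_subnet_dependencies(resources):
    """Return {server_name: [subnet_names]} of implicit dependencies.

    `resources` is a hypothetical, simplified template: a dict mapping each
    resource name to {"type": ..., "network": ...}.
    """
    hidden = {}
    for name, res in resources.items():
        if res["type"] != "Server":
            continue
        # We cannot know which Subnet the Server will land on, so depend on
        # every Subnet in this template that references the same Network.
        hidden[name] = sorted(
            other
            for other, candidate in resources.items()
            if candidate["type"] == "Subnet"
            and candidate["network"] == res["network"]
        )
    return hidden


template = {
    "net": {"type": "Network", "network": None},
    "subnet_a": {"type": "Subnet", "network": "net"},
    "subnet_b": {"type": "Subnet", "network": "net"},
    "server": {"type": "Server", "network": "net"},
}
hidden = hidden_subnet_dependencies(template)
print(hidden)
```

Note that the function can only see resources in the one template it is handed, which is exactly why Subnets defined in a different template are out of reach.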

I should add that we can't blame Nova here, because explicitly creating 
a Port doesn't help - it too takes only a network argument, despite 
_requiring_ a Subnet that it will be attached to, presumably at random. 
In fact using a Port makes things even worse, because although there is 
an API for it, Nova and Neutron seem to assume that nobody would ever use 
it, and therefore even if you create a port explicitly and pass it to 
Nova to connect a Server, when you disconnect the Server again the Port 
will be deleted at the same time as if you had let Nova create it 
implicitly for you. This issue is currently breaking stack updates 
because we tend to assume that once we've explicitly created something, 
it stays created.

Evidently there is a mechanism for associating a Port with a Subnet, and 
that's by assigning a fixed IP - which is hardly ever what I want. 
There's no middle ground that I can find between specifying the exact, 
fixed IP for a port and just letting it end up somewhere - anywhere - on 
the network, entirely at random.

Let's move on to the L3 extension, starting with Routers. There's kind 
of an inconsistency here, because Routers are virtual devices that I 
need to manage. Hitherto, the point of Neutron was to free me from 
managing individual devices and let me manage the network as a whole. Is 
there a reason I wouldn't want all of the Subnets in the Network to just 
do the Right Thing and make sure everywhere is reachable efficiently 
from everywhere else? If I want something separate, wouldn't I use a 
different Network? (It's not like I have any control over where in a 
Network ports get attached anyway.)

Nonetheless, Routers exist and it appears I have to create one to route 
packets between Subnets. From an orchestration perspective, I'd like 
Router to take a list of Ports to attach to (and of course I'd like each 
Port to be explicitly associated with a Subnet!). I'd be out of luck 
though, because even though the Port list is a property of a Router, you 
can't set it at creation time, only through an update. This is by 
definition possible to do at creation time (if I can do a create call 
immediately followed by an update call then the Neutron API can 
certainly do this internally), so it's very strange to see it 
disallowed. Following this API led us to implement it wrong in Heat as 
well, leading to headaches with floating IPs, about which more later. We 
also mistakenly used a similar design for the Router's external gateway, 
but later corrected it by making it a property of the Router, as it is 
in the API (though we still have to live with a lengthy deprecation 
period). We'll probably end up doing the same with the interfaces.
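
The "create followed by update" argument can be sketched in a few lines. `FakeRouterAPI` below is a made-up in-memory stand-in, not the Neutron client, and its method names are assumptions; the point is only that a create call accepting a port list is trivially expressible in terms of the two calls that already exist.

```python
import itertools


class FakeRouterAPI:
    """Hypothetical stand-in for an API that only accepts ports via update."""

    def __init__(self):
        self.routers = {}
        self._ids = itertools.count(1)

    def create_router(self, name):
        router_id = f"router-{next(self._ids)}"
        self.routers[router_id] = {"name": name, "ports": []}
        return router_id

    def update_router(self, router_id, ports):
        self.routers[router_id]["ports"] = list(ports)


def create_router_with_ports(api, name, ports):
    """Create-with-ports expressed as create followed by update."""
    router_id = api.create_router(name)
    api.update_router(router_id, ports)
    return router_id


api = FakeRouterAPI()
router_id = create_router_with_ports(api, "r1", ["port-a", "port-b"])
print(api.routers[router_id])
```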

Of course it goes without saying that the router gateway is just a 
reference to another network and, once again, requires a hidden 
dependency on all of the Subnets in the hopes of picking up the right 
one. BTW I'm just assuming that the definition of the gateway is 
"interface to another Network over which I will do NAT"? I assume that 
because of the generic way in which Floating IPs are handled, with a 
reference to an external network (I guess the operator provides the user 
with the Network UUID for the Internet?) It's not exactly clear why the 
external gateway is special enough that you can have only one interface 
of this type on a Router, but not so special that it would be considered 
a separate thing. There is also a separate Network Gateway, and I have 
no idea what that is...

The big problem with Floating IPs is that you can't create them until 
all the necessary hops in the internetwork have been set up. And, once 
again, there's nothing in the creation parameters that would naturally 
order them - you just pass a reference to the external network. We still 
have a bug open on this, but what we will have to do is create a hidden 
dependency on any RouterInterfaces that connect any Routers whose 
external gateway is the same network where the floating IP is allocated. 
That's about as horrible as it sounds. A Floating IP needs to take as an 
argument a reference to the Router/Gateway which does the NAT:

External       External
Network  <---- Subnet   <---- (gateway)
                             Router <---- Floating IP
Internal                     /               /
Network  <---- Subnet <------<---- Port <----
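
Here is a rough sketch of the hidden dependency a Floating IP would need under the current API, as described above. The data layout and the function name are hypothetical and chosen for illustration; this is not the code in Heat.

```python
def floating_ip_hidden_dependencies(external_network, routers, router_interfaces):
    """Return the RouterInterfaces a Floating IP has to wait for.

    routers:           {router_name: external_gateway_network}
    router_interfaces: {interface_name: router_name}
    """
    # Routers whose external gateway is the network the floating IP comes from.
    gateway_routers = {
        name for name, gateway in routers.items() if gateway == external_network
    }
    # Any interface on one of those routers might be the hop we need.
    return sorted(
        iface
        for iface, router in router_interfaces.items()
        if router in gateway_routers
    )


routers = {"router_1": "public", "router_2": "other_ext"}
router_interfaces = {
    "iface_a": "router_1",
    "iface_b": "router_1",
    "iface_c": "router_2",
}
deps = floating_ip_hidden_dependencies("public", routers, router_interfaces)
print(deps)
```

If the Floating IP took a reference to the Router directly, none of this searching would be needed: the dependency would fall out of the creation parameters.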

The bane of my existence during Icehouse development has been the 
ExtraRoutes table. First off, this is broken in a way completely 
unrelated to orchestration: you can't add, remove or change an entry in 
the table without rewriting the whole table, so the whole API is a giant 
race condition waiting to happen. (This can, and IMHO should, be fixed - 
at least for those using the official client - with an ETags header and 
the 409 return code.) Everything about this API, though, is strange. 
It's another one of those only-on-update properties of a Router, though 
in this case that's forced by the fact that you can't attach the Router 
to its Subnets during its creation. An extra route doesn't behave at all 
like a static RIB entry (with a weight and an administrative distance), 
but much like a FIB entry (i.e. it's for routes that have already been 
selected to be active). That part makes sense, but the next hop for a 
FIB entry is a layer 2 address and this takes an IP address. That makes 
no sense to me, since the IP address(es) assigned to the nexthop play no 
part in how packets are forwarded. And, of course, it creates massive 
dependency issues, because we don't know which ports are going to end up 
with the IP addresses required. This API should take a reference to a 
Port as the nexthop. I've been told we can't even simulate this in Heat 
at the moment because a VPN connection doesn't have a port associated 
with it. (If the API accepted _either_ a Port or a VPN connection, that 
would be fine by me though.) So far we've been unable to merge 
ExtraRoutes into Heat, except for a plugin in /contrib, for want of a 
way to make this reliably work in the correct dependency order without 
resorting to progressively worse hacks.
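
To illustrate the race-condition fix suggested above, here is a minimal in-memory sketch of a whole-table replace guarded by an ETag. `RouteTable` and `Conflict` are invented for this example and do not reflect anything in Neutron; `Conflict` stands in for the 409 response, and the ETag is simply a hash of the table contents.

```python
import hashlib
import json


class Conflict(Exception):
    """Stands in for an HTTP 409 response."""


class RouteTable:
    """In-memory stand-in for a router's extra-routes table."""

    def __init__(self):
        self._routes = []

    @staticmethod
    def _etag(routes):
        payload = json.dumps(routes, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def get(self):
        """Return a copy of the table and the ETag for that exact content."""
        return list(self._routes), self._etag(self._routes)

    def put(self, routes, if_match):
        """Replace the whole table only if the caller saw the current version."""
        if if_match != self._etag(self._routes):
            raise Conflict("table changed since it was read")
        self._routes = list(routes)


table = RouteTable()

# Two clients read the same version of the table.
routes_a, etag_a = table.get()
routes_b, etag_b = table.get()

# Client A adds a route and writes back; its ETag still matches.
route_a = {"destination": "10.1.0.0/24", "nexthop": "10.0.0.5"}
table.put(routes_a + [route_a], etag_a)

# Client B's write is based on stale data, so it is rejected instead of
# silently discarding A's route.
route_b = {"destination": "10.2.0.0/24", "nexthop": "10.0.0.6"}
try:
    table.put(routes_b + [route_b], etag_b)
    b_rejected = False
except Conflict:
    b_rejected = True

final_routes, _ = table.get()
print(b_rejected, final_routes)
```

Client B then has to re-read the table and retry, which is the behaviour you want from anything that rewrites shared state wholesale.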

I'm sure fresh horrors await in corners I have not yet dug into. I must 
say that the VPN Service, happily, is one that seems to have done things 
right. Firewall looks pretty good in itself, although the fact that it 
is completely disjoint from any other configuration - i.e. you can't 
even specify which network it applies to, let alone which gateway - is 

Over the past couple of development cycles, we've seen a number of 
proposals to push orchestration-like features into Neutron itself. It is 
now clear to me why: because the Neutron API is illegible to external 
orchestration tools, people want to do an end run around it.

I don't expect that the current API can be fixed without breaking 
backwards compatibility, but I hope that folks will take these concepts 
into account the next time the Neutron API gets revised. (I also hope we 
won't see any more proposals to effectively reimplement Heat behind the 
Neutron API ;) Please feel free to include [Heat] in any discussion 
along those lines, we'd be happy to give feedback on any given API 
designs. In exchange, if any Neutron folks are able to explain the exact 
ways in which my ideas about how the current Neutron API does and/or 
should work are wrong and/or crazy, I would be most appreciative :)

