Hi, Dnia piątek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze:
Hi all,
(I've tagged the thread with [ovn] because this question was raised in the context of OVN, but it really is about the intent of neutron stateless SG API.)
Neutron API supports 'stateless' field for security groups: https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-g...
The API reference doesn't explain the intent of the API, merely walking through the field mechanics, as in
"The stateful security group extension (stateful-security-group) adds the stateful field to security groups, allowing users to configure stateful or stateless security groups for ports. The existing security groups will all be considered as stateful. Update of the stateful attribute is allowed when there is no port associated with the security group."
The meaning of the API is left for users to deduce. It's customary understood as something like
"allowing to bypass connection tracking in the firewall, potentially providing performance and simplicity benefits" (while imposing additional complexity onto rule definitions - the user now has to explicitly define rules for both directions of a duplex connection.) [This is not an official definition, nor it's quoted from a respected source, please don't criticize it. I don't think this is an important point here.]
Either way, the definition doesn't explain what should happen with basic network services that a user of Neutron SG API is used to rely on. Specifically, what happens for a port related to a stateless SG when it trying to fetch metadata from 169.254.169.254 (or its IPv6 equivalent), or what happens when it attempts to use SLAAC / DHCPv6 procedure to configure its IPv6 stack.
As part of our testing of stateless SG implementation for OVN backend, we've noticed that VMs fail to configure via metadata, or use SLAAC to configure IPv6.
metadata: https://bugs.launchpad.net/neutron/+bug/2009053 slaac: https://bugs.launchpad.net/neutron/+bug/2006949
We've noticed that adding explicit SG rules to allow 'returning' communication for 169.254.169.254:80 and RA / NA fixes the problem.
I figured that these services are "base" / "basic" and should be provided to ports regardless of the stateful-ness of SG. I proposed patches for this here:
metadata series: https://review.opendev.org/q/topic:bug%252F2009053 RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049
Discussion in the patch that adjusts the existing stateless SG test scenarios to not create explicit SG rules for metadata and ICMP replies suggests that it's not a given / common understanding that these "base" services should work by default for stateless SGs.
See discussion in comments here: https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692
While this discussion is happening in the context of OVN, I think it should be resolved in a broader context. Specifically, a decision should be made about what Neutron API "means" by stateless SGs, and how "base" services are supposed to behave. Then backends can act accordingly.
There's also an open question of how this should be implemented. Whether Neutron would like to create explicit SG rules visible in API that would allow for the returning traffic and that could be deleted as needed, or whether backends should do it implicitly. We already have "default" egress rules, so there's a precedent here. On the other hand, the egress rules are broad (allowing everything) and there's more rationale to delete them and replace them with tighter filters. In my OVN series, I implement ACLs directly in OVN database, without creating SG rules in Neutron API.
So, questions for the community to clarify: - whether Neutron API should define behavior of stateless SGs in general, - if so, whether Neutron API should also define behavior of stateless SGs in terms of "base" services like metadata and DHCP, - if so, whether backends should implement the necessary filters themselves, or Neutron will create default SG rules itself.
I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user. We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it. If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there?
I hope I laid the problem out clearly, let me know if anything needs clarification or explanation.
Yes :) At least for me.
Yours, Ihar
-- Slawek Kaplonski Principal Software Engineer Red Hat