On Mon, Mar 20, 2023 at 12:03 PM Slawek Kaplonski <skaplons@redhat.com> wrote:
Hi,
Dnia piątek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze:
Hi all,
(I've tagged the thread with [ovn] because this question was raised in
the context of OVN, but it really is about the intent of neutron
stateless SG API.)
Neutron API supports 'stateless' field for security groups:
https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-g...
The API reference doesn't explain the intent of the API, merely
walking through the field mechanics, as in
"The stateful security group extension (stateful-security-group) adds
the stateful field to security groups, allowing users to configure
stateful or stateless security groups for ports. The existing security
groups will all be considered as stateful. Update of the stateful
attribute is allowed when there is no port associated with the
security group."
The meaning of the API is left for users to deduce. It's customary
understood as something like
"allowing to bypass connection tracking in the firewall, potentially
providing performance and simplicity benefits" (while imposing
additional complexity onto rule definitions - the user now has to
explicitly define rules for both directions of a duplex connection.)
[This is not an official definition, nor it's quoted from a respected
source, please don't criticize it. I don't think this is an important
point here.]
Either way, the definition doesn't explain what should happen with
basic network services that a user of Neutron SG API is used to rely
on. Specifically, what happens for a port related to a stateless SG
when it trying to fetch metadata from 169.254.169.254 (or its IPv6
equivalent), or what happens when it attempts to use SLAAC / DHCPv6
procedure to configure its IPv6 stack.
As part of our testing of stateless SG implementation for OVN backend,
we've noticed that VMs fail to configure via metadata, or use SLAAC to
configure IPv6.
We've noticed that adding explicit SG rules to allow 'returning'
communication for 169.254.169.254:80 and RA / NA fixes the problem.
I figured that these services are "base" / "basic" and should be
provided to ports regardless of the stateful-ness of SG. I proposed
patches for this here:
metadata series: https://review.opendev.org/q/topic:bug%252F2009053
RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049
Discussion in the patch that adjusts the existing stateless SG test
scenarios to not create explicit SG rules for metadata and ICMP
replies suggests that it's not a given / common understanding that
these "base" services should work by default for stateless SGs.
See discussion in comments here:
https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692
While this discussion is happening in the context of OVN, I think it
should be resolved in a broader context. Specifically, a decision
should be made about what Neutron API "means" by stateless SGs, and
how "base" services are supposed to behave. Then backends can act
accordingly.
There's also an open question of how this should be implemented.
Whether Neutron would like to create explicit SG rules visible in API
that would allow for the returning traffic and that could be deleted
as needed, or whether backends should do it implicitly. We already
have "default" egress rules, so there's a precedent here. On the other
hand, the egress rules are broad (allowing everything) and there's
more rationale to delete them and replace them with tighter filters.
In my OVN series, I implement ACLs directly in OVN database, without
creating SG rules in Neutron API.
So, questions for the community to clarify:
- whether Neutron API should define behavior of stateless SGs in general,
- if so, whether Neutron API should also define behavior of stateless
SGs in terms of "base" services like metadata and DHCP,
- if so, whether backends should implement the necessary filters
themselves, or Neutron will create default SG rules itself.
I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user.
We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it.
If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there?
Thanks for clarifying the rationale on picking SG rules and not per-backend implementation. What would be your answer to the two other questions in the list above, specifically, "whether Neutron API should define behavior of stateless SGs in general" and "whether Neutron API should define behavior of stateless SGs in relation to metadata / RA / NA". Once we have agreement on these points, we can discuss the exact mechanism - whether to implement in backend or in API. But these two questions are first order in my view. (To give an idea of my thinking, I believe API definition should not only define fields and their mechanics but also semantics, so - yes, api-ref should define the meaning ("behavior") of stateless SG in general, and - yes, api-ref should also define the meaning ("behavior") of stateless SG in relation to "standard" services like ipv6 addressing or metadata. As to the last question - whether it's up to ml2 backend to implement the behavior, or up to the core SG database plugin - I don't have a strong opinion. I lean to "backend" solution just because it allows for more granular definition because SG rules may not express some filter rules, e.g. source port for metadata replies (an unfortunate limitation of SG API that we inherited from AWS?). But perhaps others prefer paying the price for having neutron ml2 plugin enforcing the behavior consistently across all backends.
I hope I laid the problem out clearly, let me know if anything needs
clarification or explanation.
Yes :) At least for me.
Yours,
Ihar
--
Slawek Kaplonski
Principal Software Engineer
Red Hat