Hello:

I agree with having a single API meaning for all backends. We currently support stateless SGs in iptables and ML2/OVN and both backends provide the same behaviour: a rule won't create an opposite direction counterpart by default, the user needs to define it explicitly.

The discussion here could be the default behaviour for standard services:
* DHCP service is currently supported in iptables, native OVS and OVN. This should be supported even without any rule allowed (as is now). Of course, we need to explicitly document that.
* DHCPv6 [1]: unlike Slawek, I'm in favor of allowing this traffic by default, as part of the DHCP protocol traffic allowance.
* Metadata service: this is not a network protocol and we should not consider it. Actually this service is working now (with stateful SGs) because of the default SG egress rules we add. So I'm not in favor of [2]

Regards.

[1]https://review.opendev.org/c/openstack/neutron/+/877049
[2]https://review.opendev.org/c/openstack/neutron/+/876659

On Mon, Mar 20, 2023 at 10:19 PM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
On Mon, Mar 20, 2023 at 12:03 PM Slawek Kaplonski <skaplons@redhat.com> wrote:
>
> Hi,
>
>
> Dnia piątek, 17 marca 2023 16:07:44 CET Ihar Hrachyshka pisze:
>
> > Hi all,
>
> >
>
> > (I've tagged the thread with [ovn] because this question was raised in
>
> > the context of OVN, but it really is about the intent of neutron
>
> > stateless SG API.)
>
> >
>
> > Neutron API supports 'stateless' field for security groups:
>
> > https://docs.openstack.org/api-ref/network/v2/index.html#stateful-security-groups-extension-stateful-security-group
>
> >
>
> > The API reference doesn't explain the intent of the API, merely
>
> > walking through the field mechanics, as in
>
> >
>
> > "The stateful security group extension (stateful-security-group) adds
>
> > the stateful field to security groups, allowing users to configure
>
> > stateful or stateless security groups for ports. The existing security
>
> > groups will all be considered as stateful. Update of the stateful
>
> > attribute is allowed when there is no port associated with the
>
> > security group."
>
> >
>
> > The meaning of the API is left for users to deduce. It's customary
>
> > understood as something like
>
> >
>
> > "allowing to bypass connection tracking in the firewall, potentially
>
> > providing performance and simplicity benefits" (while imposing
>
> > additional complexity onto rule definitions - the user now has to
>
> > explicitly define rules for both directions of a duplex connection.)
>
> > [This is not an official definition, nor it's quoted from a respected
>
> > source, please don't criticize it. I don't think this is an important
>
> > point here.]
>
> >
>
> > Either way, the definition doesn't explain what should happen with
>
> > basic network services that a user of Neutron SG API is used to rely
>
> > on. Specifically, what happens for a port related to a stateless SG
>
> > when it trying to fetch metadata from 169.254.169.254 (or its IPv6
>
> > equivalent), or what happens when it attempts to use SLAAC / DHCPv6
>
> > procedure to configure its IPv6 stack.
>
> >
>
> > As part of our testing of stateless SG implementation for OVN backend,
>
> > we've noticed that VMs fail to configure via metadata, or use SLAAC to
>
> > configure IPv6.
>
> >
>
> > metadata: https://bugs.launchpad.net/neutron/+bug/2009053
>
> > slaac: https://bugs.launchpad.net/neutron/+bug/2006949
>
> >
>
> > We've noticed that adding explicit SG rules to allow 'returning'
>
> > communication for 169.254.169.254:80 and RA / NA fixes the problem.
>
> >
>
> > I figured that these services are "base" / "basic" and should be
>
> > provided to ports regardless of the stateful-ness of SG. I proposed
>
> > patches for this here:
>
> >
>
> > metadata series: https://review.opendev.org/q/topic:bug%252F2009053
>
> > RA / NA: https://review.opendev.org/c/openstack/neutron/+/877049
>
> >
>
> > Discussion in the patch that adjusts the existing stateless SG test
>
> > scenarios to not create explicit SG rules for metadata and ICMP
>
> > replies suggests that it's not a given / common understanding that
>
> > these "base" services should work by default for stateless SGs.
>
> >
>
> > See discussion in comments here:
>
> > https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/876692
>
> >
>
> > While this discussion is happening in the context of OVN, I think it
>
> > should be resolved in a broader context. Specifically, a decision
>
> > should be made about what Neutron API "means" by stateless SGs, and
>
> > how "base" services are supposed to behave. Then backends can act
>
> > accordingly.
>
> >
>
> > There's also an open question of how this should be implemented.
>
> > Whether Neutron would like to create explicit SG rules visible in API
>
> > that would allow for the returning traffic and that could be deleted
>
> > as needed, or whether backends should do it implicitly. We already
>
> > have "default" egress rules, so there's a precedent here. On the other
>
> > hand, the egress rules are broad (allowing everything) and there's
>
> > more rationale to delete them and replace them with tighter filters.
>
> > In my OVN series, I implement ACLs directly in OVN database, without
>
> > creating SG rules in Neutron API.
>
> >
>
> > So, questions for the community to clarify:
>
> > - whether Neutron API should define behavior of stateless SGs in general,
>
> > - if so, whether Neutron API should also define behavior of stateless
>
> > SGs in terms of "base" services like metadata and DHCP,
>
> > - if so, whether backends should implement the necessary filters
>
> > themselves, or Neutron will create default SG rules itself.
>
>
> I think that we should be transparent and if we need any SG rules like that to allow some traffic, those rules should be be added in visible way for user.
>
> We also have in progress RFE https://bugs.launchpad.net/neutron/+bug/1983053 which may help administrators to define set of default SG rules which will be in each new SG. So if we will now make those additional ACLs to be visible as SG rules in SG it may be later easier to customize it.
>
> If we will hard code ACLs to allow ingress traffic from metadata server or RA/NA packets there will be IMO inconsistency in behaviour between stateful and stateless SGs as for stateful user will be able to disallow traffic between vm and metadata service (probably there's no real use case for that but it's possible) and for stateless it will not be possible as ingress rules will be always there. Also use who knows how stateless SG works may even treat it as bug as from Neutron API PoV this traffic to/from metadata server would work as stateful - there would be rule to allow egress traffic but what actually allows ingress response there?
>

Thanks for clarifying the rationale on picking SG rules and not
per-backend implementation.

What would be your answer to the two other questions in the list
above, specifically, "whether Neutron API should define behavior of
stateless SGs in general" and "whether Neutron API should define
behavior of stateless SGs in relation to metadata / RA / NA". Once we
have agreement on these points, we can discuss the exact mechanism -
whether to implement in backend or in API. But these two questions are
first order in my view.

(To give an idea of my thinking, I believe API definition should not
only define fields and their mechanics but also semantics, so

- yes, api-ref should define the meaning ("behavior") of stateless SG
in general, and
- yes, api-ref should also define the meaning ("behavior") of
stateless SG in relation to "standard" services like ipv6 addressing
or metadata.

As to the last question - whether it's up to ml2 backend to implement
the behavior, or up to the core SG database plugin - I don't have a
strong opinion. I lean to "backend" solution just because it allows
for more granular definition because SG rules may not express some
filter rules, e.g. source port for metadata replies (an unfortunate
limitation of SG API that we inherited from AWS?). But perhaps others
prefer paying the price for having neutron ml2 plugin enforcing the
behavior consistently across all backends.

>
> >
>
> > I hope I laid the problem out clearly, let me know if anything needs
>
> > clarification or explanation.
>
>
> Yes :) At least for me.
>
>
> >
>
> > Yours,
>
> > Ihar
>
> >
>
> >
>
> >
>
>
>
> --
>
> Slawek Kaplonski
>
> Principal Software Engineer
>
> Red Hat