Open Stack

Tue Jul 29 22:03:39 UTC 2014

Hi all,

As I mentioned in a previous IRC meeting, when writing our first few policies I had trouble using the tables we currently use to represent external data sources like Nova and Neutron.

The main problem is that wide tables (those with many columns) are hard to use: (a) it is hard to remember what all the columns are, (b) it is easy to mistakenly use the same variable in two different tables in the body of a rule, creating an accidental join, and (c) changes to the datasource drivers can require tedious, error-prone modifications to policy.
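To make point (b) concrete, here is a minimal sketch in Python. It is not Congress code: the evaluate function, the two-column tables and the variable names are all hypothetical. It evaluates a rule body by naive nested loops, where any variable shared between atoms must take the same value in each, which is how Datalog expresses a join. Reusing "name" in both atoms silently adds the condition "port name equals network name".

```python
from typing import Dict, List, Tuple

Row = Tuple[str, ...]
Atom = Tuple[str, Tuple[str, ...]]  # (table name, variables in column order)

def evaluate(body: List[Atom], db: Dict[str, List[Row]]) -> List[Dict[str, str]]:
    """Naive nested-loop evaluation of a conjunctive rule body.

    A variable that appears in several atoms must take the same value in
    each of them; this is how Datalog expresses a join."""
    bindings: List[Dict[str, str]] = [{}]
    for table, variables in body:
        extended: List[Dict[str, str]] = []
        for binding in bindings:
            for row in db[table]:
                if len(row) != len(variables):
                    continue  # arity mismatch: this atom cannot match the row
                candidate = dict(binding)
                # setdefault binds a new variable, or returns its existing value
                if all(candidate.setdefault(v, val) == val
                       for v, val in zip(variables, row)):
                    extended.append(candidate)
        bindings = extended
    return bindings

# Hypothetical tables: ports(port_id, name, network_id), networks(network_id, name)
db = {
    "ports": [("p1", "web", "n1")],
    "networks": [("n1", "prod")],
}

# Intended: join on the network id only.
intended = [("ports", ("port", "port_name", "net")),
            ("networks", ("net", "net_name"))]
# Mistake: reusing "name" in both atoms adds an unintended join condition.
accidental = [("ports", ("port", "name", "net")),
              ("networks", ("net", "name"))]

print(evaluate(intended, db))    # one binding
print(evaluate(accidental, db))  # []
```

With sixteen positional columns per table, this kind of collision is easy to make and hard to spot, because the rule still parses and simply returns fewer results.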

I see several options.  Once we choose something, I’ll write up a spec and include the other options as alternatives.

1) Add a preprocessor to the policy engine that makes it easier to deal with large tables via named-argument references.

Instead of writing a rule like

p(port_id, name) :-
    neutron:ports(port_id, addr_pairs, security_groups, extra_dhcp_opts, binding_cap, status, name, admin_state_up, network_id, tenant_id, binding_vif, device_owner, mac_address, fixed_ips, router_id, binding_host)

we would write

p(id, nme) :-
    neutron:ports(port_id=id, name=nme)

The preprocessor would fill in fresh variables for all the missing columns and hand the expanded rule off to the Datalog engine.
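A minimal sketch of that expansion step in Python, assuming the datasource driver publishes the positional column order of each table (the SCHEMAS registry below is hypothetical) and assuming fresh variables use a reserved "_x" prefix that policy writers never use. Unknown column names are rejected rather than silently ignored.

```python
import itertools
from typing import Dict, Iterator, List

# Hypothetical schema registry: positional column order of each wide table.
SCHEMAS: Dict[str, List[str]] = {
    "neutron:ports": [
        "port_id", "addr_pairs", "security_groups", "extra_dhcp_opts",
        "binding_cap", "status", "name", "admin_state_up", "network_id",
        "tenant_id", "binding_vif", "device_owner", "mac_address",
        "fixed_ips", "router_id", "binding_host",
    ],
}

def expand_named_atom(table: str, named: Dict[str, str],
                      fresh: Iterator[int]) -> str:
    """Rewrite table(col=var, ...) as a positional atom.

    Every column the policy writer did not mention gets a fresh variable
    (assumed prefix "_x", reserved so it cannot clash with user variables)."""
    columns = SCHEMAS[table]
    unknown = sorted(set(named) - set(columns))
    if unknown:
        raise ValueError(f"{table} has no column(s) {unknown}")
    # Conditional expression, so a fresh number is drawn only when needed.
    args = [named[col] if col in named else f"_x{next(fresh)}" for col in columns]
    return f"{table}({', '.join(args)})"

# One counter per rule keeps fresh variables distinct across all its atoms.
fresh_vars = itertools.count()
print("p(id, nme) :-\n    " +
      expand_named_atom("neutron:ports", {"port_id": "id", "name": "nme"}, fresh_vars))
```

Because the rule names only port_id and name, adding or reordering other columns in the driver changes SCHEMAS but not the policy text.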

Pros: (i) leveraging vanilla database technology under the hood
      (ii) policy is robust to changes in the fields of the original data b/c the Congress data model is different than the Nova/Neutron data models
Cons: (i) we will need to invert the preprocessor when showing rules/traces/etc. to the user
      (ii) a layer of translation makes debugging difficult
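Con (i) means every place that shows a rule back to the user needs an inverse of the expansion. A minimal sketch, assuming the same hypothetical reserved "_x" prefix for preprocessor-generated variables and a shortened three-column schema for brevity:

```python
from typing import List

def collapse_positional_atom(table: str, columns: List[str], args: List[str],
                             fresh_prefix: str = "_x") -> str:
    """Undo named-argument expansion for display: hide every column whose
    variable the preprocessor invented, and print the rest as col=var."""
    if len(columns) != len(args):
        raise ValueError(f"{table}: expected {len(columns)} arguments, got {len(args)}")
    named = [f"{col}={arg}" for col, arg in zip(columns, args)
             if not arg.startswith(fresh_prefix)]
    return f"{table}({', '.join(named)})"

print(collapse_positional_atom(
    "neutron:ports", ["port_id", "status", "name"], ["id", "_x0", "nme"]))
# neutron:ports(port_id=id, name=nme)
```

This only works if generated variables are recognizable, and traces that show bound values rather than variables need their own mapping, which is part of the debugging cost noted in Con (ii).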

2) Be disciplined about writing narrow tables and write tutorials/recommendations demonstrating how.

Instead of a table like...
neutron:ports(port_id, addr_pairs, security_groups, extra_dhcp_opts, binding_cap, status, name, admin_state_up, network_id, tenant_id, binding_vif, device_owner, mac_address, fixed_ips, router_id, binding_host)

we would have many tables...
neutron:ports(port_id)
neutron:ports.addr_pairs(port_id, addr_pairs)
neutron:ports.security_groups(port_id, security_groups)
neutron:ports.extra_dhcp_opts(port_id, extra_dhcp_opts)
neutron:ports.name(port_id, name)
...

People writing policy would write rules such as ...

p(x) :- neutron:ports.name(port, name), ...

[Here the period, e.g. in ports.name, is not an operator; it is just a convenient way to spell the table name.]

To do this, Congress would need to know which columns in a table are sufficient to uniquely identify a row, which in most cases is just the ID.
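A minimal sketch of how a datasource driver might split one wide table into narrow ones, given the key column. The function name, the three-column schema and the rows are hypothetical. The assumption that list-valued cells (like security_groups) become one row per element is mine, not part of the proposal above.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[object, ...]

def to_narrow_tables(table: str, columns: Sequence[str], rows: List[Row],
                     key: str) -> Dict[str, List[Row]]:
    """Split one wide table into table(key) plus table.col(key, value) for every
    other column. List-valued cells become one row per element, so a port with
    no security groups simply has no rows in table.security_groups."""
    k = list(columns).index(key)
    narrow: Dict[str, List[Row]] = {table: []}
    narrow.update({f"{table}.{col}": [] for col in columns if col != key})
    for row in rows:
        narrow[table].append((row[k],))
        for col, value in zip(columns, row):
            if col == key:
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                narrow[f"{table}.{col}"].append((row[k], v))
    return narrow

ports = to_narrow_tables(
    "neutron:ports", ["port_id", "name", "security_groups"],
    [("p1", "web", ["sg-a", "sg-b"]), ("p2", "db", [])],
    key="port_id")
for name, rows in ports.items():
    print(name, rows)
```

The key is the only thing tying the narrow rows back together: if it did not uniquely identify a port, values from different ports would become indistinguishable, which is why the primary-key information is required.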

Pros: (i) this requires only changes in the datasource drivers; everything else remains the same
      (ii) still leveraging database technology under the hood
      (iii) policy is robust to changes in fields of original data
Cons: (i) datasource driver can force policy writer to use wide tables
      (ii) this data model is much different than the original data models
      (iii) we need primary-key information about tables

3) Enhance the Congress policy language to handle objects natively.

Instead of writing a rule like the following ...

p(port_id, name, group) :-
    neutron:ports(port_id, addr_pairs, security_groups, extra_dhcp_opts, binding_cap, status, name, admin_state_up, network_id, tenant_id, binding_vif, device_owner, mac_address, fixed_ips, router_id, binding_host),
    neutron:ports.security_groups(security_groups, group)

we would write a rule such as
p(port_id, name, group) :-
    neutron:ports(port),
    port.name(name),
    port.id(port_id),
    port.security_groups(group)

The big difference here is that the period (.) is an operator in the language, just as in C++/Java.
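To illustrate the intended semantics (not how Congress would evaluate it), here is a minimal sketch in Python where a variable is bound to a whole object and the (.) operator looks up a field. The PORTS records are hypothetical stand-ins for Neutron port objects. The assumptions that a list-valued field yields one binding per element and a missing field yields none are mine.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical in-memory objects, loosely shaped like Neutron port records.
PORTS: List[Dict[str, Any]] = [
    {"id": "p1", "name": "web", "security_groups": ["sg-a", "sg-b"]},
    {"id": "p2", "name": "db", "security_groups": []},
]

def dot(obj: Dict[str, Any], field: str) -> Iterator[Any]:
    """obj.field(X): yield each value of the field; list-valued fields yield
    one binding per element, and a missing field yields nothing."""
    value = obj.get(field)
    if value is None:
        return
    if isinstance(value, list):
        yield from value
    else:
        yield value

def evaluate_p() -> Iterator[Tuple[Any, Any, Any]]:
    """Body: neutron:ports(port), port.name(name), port.id(port_id),
    port.security_groups(group); yields (port_id, name, group)."""
    for port in PORTS:                       # neutron:ports(port)
        for name in dot(port, "name"):       # port.name(name)
            for port_id in dot(port, "id"):  # port.id(port_id)
                for group in dot(port, "security_groups"):
                    yield (port_id, name, group)

print(list(evaluate_p()))
```

Unlike a relational engine, where variables range only over atomic values, here "port" ranges over objects, which is the departure from traditional database technology discussed in the cons below.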

Pros:
(i) The data model we use in Congress is almost exactly the same as the data model we use in Neutron/Nova.

(ii) Policy is robust to changes in the Neutron/Nova data model as long as those changes only ADD fields.

(iii) Programmers may be slightly more comfortable with this language.

Cons:

(i) The obvious implementation (changing the engine to implement the (.) operator directly) is quite a departure from traditional database technology.  At this point, that seems risky.

(ii) It is unclear how to implement this via a preprocessor (thereby leveraging database technology).  The key problem I see is that we would need to translate port.name(...) into something like option (2) above, e.g. neutron:ports.name(port, ...).  The difficulty is that the variable to the left of the period could sometimes be bound to a port, sometimes to a network, sometimes to a subnet, etc., so the preprocessor cannot always tell which table to translate it into.

(iii) Requires some extra syntactic restrictions to ensure we don't lose decidability.

(iv) Because the Congress and Nova/Neutron models are the same, changes to the Nova/Neutron model can require rewriting policy.

Thoughts?
Tim

[openstack-dev] [Congress] data-source renovation