[openstack-dev] [Congress] data-source renovation
Alex Yip
ayip at vmware.com
Mon Aug 4 19:47:03 UTC 2014
Hi all,
I favor the first approach because it solves the usability problem of wide tables without limiting Congress' ability to use wide tables, or adding extra complexity.
There are legitimate uses for wide tables, so Congress should be able to support them. For example, Congress will need to support very large data sources in the future (TB in size). It is best if Congress uses those databases in place, without creating a local copy of the database, so supporting wide tables and making them easy to use in the policy language will be a win for the future.
For Con (i) (we will need to invert the preprocessor when showing rules/traces/etc. to the user), we can keep the translated policies hidden from the user. The user should see only the policies they wrote.
For Con (ii) (a layer of translation makes debugging difficult), the translation layer would be akin to a C preprocessor. It will be possible to match up items on both sides of the translation layer.
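One way to picture "matching up items on both sides" is a lookup table that remembers, for each translated rule, the text the user originally wrote. The sketch below is only an illustration; `RuleMap`, `record`, and `user_view` are hypothetical names, not part of Congress.

```python
class RuleMap:
    """Remember which user-written rule each translated rule came from.

    Hypothetical helper for illustration; not an existing Congress class.
    """

    def __init__(self):
        self._original_by_expanded = {}

    def record(self, original, expanded):
        # Called by the preprocessor each time it translates a rule.
        self._original_by_expanded[expanded] = original

    def user_view(self, expanded):
        # Rules that were never translated are shown unchanged.
        return self._original_by_expanded.get(expanded, expanded)


rule_map = RuleMap()
written = "p(id) :- t(port_id=id)"
translated = "p(id) :- t(id, _x0, _x1)"
rule_map.record(written, translated)
print(rule_map.user_view(translated))
```

With something like this in place, traces and error messages produced by the engine could be passed through `user_view` before being displayed, much as a C compiler reports errors against the pre-preprocessing source.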
- Alex
> Option 2 looks like a better idea keeping in mind the data model
> consistency with Neutron/Nova.
> Could we write something similar to a view which becomes a layer on top of
> this data model?
________________________________________
From: Tim Hinrichs
Sent: Tuesday, July 29, 2014 3:03 PM
To: openstack-dev at lists.openstack.org
Cc: Alex Yip
Subject: [Congress] data-source renovation
Hi all,
As I mentioned in a previous IRC meeting, when writing our first few policies I had trouble using the tables we currently use to represent external data sources like Nova/Neutron.
The main problem is that wide tables (those with many columns) are hard to use: (a) it is hard to remember what all the columns are; (b) it is easy to mistakenly use the same variable in two different tables in the body of the rule, i.e. to create an accidental join; (c) changes to the datasource drivers can require tedious/error-prone modifications to policy.
I see several options. Once we choose something, I’ll write up a spec and include the other options as alternatives.
1) Add a preprocessor to the policy engine that makes it easier to deal with large tables via named-argument references.
Instead of writing a rule like
p(port_id, name) :-
neutron:ports(port_id, addr_pairs, security_groups, extra_dhcp_opts, binding_cap, status, name, admin_state_up, network_id, tenant_id, binding_vif, device_owner, mac_address, fixed_ips, router_id, binding_host)
we would write
p(id, nme) :-
neutron:ports(port_id=id, name=nme)
The preprocessor would fill in all the missing variables and hand the resulting full-width rule (the first form above) off to the Datalog engine.
Pros: (i) leveraging vanilla database technology under the hood
(ii) policy is robust to changes in the fields of the original data b/c the Congress data model is different than the Nova/Neutron data models
Cons: (i) we will need to invert the preprocessor when showing rules/traces/etc. to the user
(ii) a layer of translation makes debugging difficult
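A rough sketch of what the preprocessor in option 1 might do for a single atom is shown below. It assumes the column order from the neutron:ports example above; the names `SCHEMAS`, `expand_named_args`, and `render_atom`, and the `_x` prefix for generated variables, are made up for illustration and are not existing Congress code.

```python
# Hypothetical schema registry; column order copied from the example above.
SCHEMAS = {
    "neutron:ports": [
        "port_id", "addr_pairs", "security_groups", "extra_dhcp_opts",
        "binding_cap", "status", "name", "admin_state_up", "network_id",
        "tenant_id", "binding_vif", "device_owner", "mac_address",
        "fixed_ips", "router_id", "binding_host",
    ],
}


def expand_named_args(table, named_args, schemas=SCHEMAS):
    """Return the full positional argument list for a named-argument atom.

    Columns the policy writer did not mention get fresh variables
    (_x0, _x1, ...). Assumption: user variables never start with "_x".
    """
    columns = schemas[table]
    unknown = set(named_args) - set(columns)
    if unknown:
        raise KeyError(f"{table} has no column(s): {sorted(unknown)}")
    args = []
    fresh = 0
    for col in columns:
        if col in named_args:
            args.append(named_args[col])
        else:
            args.append(f"_x{fresh}")
            fresh += 1
    return args


def render_atom(table, args):
    """Format a table name and argument list as a Datalog atom."""
    return f"{table}({', '.join(args)})"


expanded = expand_named_args("neutron:ports", {"port_id": "id", "name": "nme"})
print(render_atom("neutron:ports", expanded))
```

For the rule above this prints neutron:ports(id, _x0, _x1, _x2, _x3, _x4, nme, _x5, ..., _x13) with all sixteen positions filled. A real preprocessor would need to share the fresh-variable counter across every atom in a rule so that two expanded atoms never reuse a generated variable, which would recreate the accidental-join problem.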
2) Be disciplined about writing narrow tables and write tutorials/recommendations demonstrating how.
Instead of a table like...
neutron:ports(port_id, addr_pairs, security_groups, extra_dhcp_opts, binding_cap, status, name, admin_state_up, network_id, tenant_id, binding_vif, device_owner, mac_address, fixed_ips, router_id, binding_host)
we would have many tables...
neutron:ports(port_id)
neutron:ports.addr_pairs(port_id, addr_pairs)
neutron:ports.security_groups(port_id, security_groups)
neutron:ports.extra_dhcp_opts(port_id, extra_dhcp_opts)
neutron:ports.name(port_id, name)
...
People writing policy would write rules such as ...
p(x) :- neutron:ports.name(port, name), ...
[Here, the period e.g. in ports.name is not an operator--just a convenient way to spell the tablename.]
To do this, Congress would need to know which columns in a table are sufficient to uniquely identify a row, which in most cases is just the ID.
Pros: (i) this requires only changes in the datasource drivers; everything else remains the same
(ii) still leveraging database technology under the hood
(iii) policy is robust to changes in fields of original data
Cons: (i) datasource driver can force policy writer to use wide tables
(ii) this data model is much different than the original data models
(iii) we need primary-key information about tables
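To make option 2 concrete, here is a minimal sketch of how a datasource driver might split wide rows into narrow tables once it knows the key column. The function name `split_wide_rows` and the three-column sample data are assumptions for illustration only; it also assumes every value is hashable, so list-valued columns such as fixed_ips would need flattening first.

```python
def split_wide_rows(table, columns, key, rows):
    """Split rows of a wide table into narrow (key, value) tables.

    Returns a dict mapping table names to sets of tuples: one
    single-column table of keys, plus one two-column table per
    non-key column, named "<table>.<column>".
    """
    key_idx = columns.index(key)
    narrow = {table: set()}
    for col in columns:
        if col != key:
            narrow[f"{table}.{col}"] = set()
    for row in rows:
        k = row[key_idx]
        narrow[table].add((k,))
        for col, value in zip(columns, row):
            if col != key:
                narrow[f"{table}.{col}"].add((k, value))
    return narrow


# Made-up sample data with only three of the sixteen columns.
columns = ["port_id", "status", "name"]
rows = [("p1", "ACTIVE", "web"), ("p2", "DOWN", "db")]
tables = split_wide_rows("neutron:ports", columns, "port_id", rows)
for name in sorted(tables):
    print(name, sorted(tables[name]))
```

The `key` argument is where the primary-key requirement from Con (iii) shows up: without it the driver cannot tell which column ties the narrow tables back together.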
3) Enhance the Congress policy language to handle objects natively.
Instead of writing a rule like the following ...
p(port_id, name, group) :-
neutron:ports(port_id, addr_pairs, security_groups, extra_dhcp_opts, binding_cap, status, name, admin_state_up, network_id, tenant_id, binding_vif, device_owner, mac_address, fixed_ips, router_id, binding_host),
neutron:ports.security_groups(security_groups, group)
we would write a rule such as
p(port_id, name, group) :-
neutron:ports(port),
port.name(name),
port.id(port_id),
port.security_groups(group)
The big difference here is that the period (.) is an operator in the language, just as in C++/Java.
Pros:
(i) The data model we use in Congress is almost exactly the same as the data model we use in Neutron/Nova.
(ii) Policy is robust to changes in the Neutron/Nova data model as long as those changes only ADD fields.
(iii) Programmers may be slightly more comfortable with this language.
Cons:
(i) The obvious implementation (changing the engine to implement the (.) operator directly) is quite a change from traditional database technology. At this point, that seems risky.
(ii) It is unclear how to implement this via a preprocessor (thereby leveraging database technology). The key problem I see is that we would need to translate port.name(...) into something like TABLE.name(...) from option (2) above. The difficulty is that TABLE could sometimes be a port, sometimes be a network, sometimes be a subnet, etc.
(iii) Requires some extra syntactic restrictions to ensure we don't lose decidability.
(iv) Because the Congress and Nova/Neutron models are the same, changes to the Nova/Neutron model can require rewriting policy.
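To illustrate Con (ii) of option 3, the sketch below handles only the easy case: each object variable is bound by a single-argument table atom in the same rule body, so the preprocessor can tell which table it belongs to. The function `desugar_dots`, the atom-string format, and the generated table names are all hypothetical, and the sketch deliberately gives up when the table cannot be inferred, which is exactly the hard case described above.

```python
import re

# Matches atoms such as "neutron:ports(port)" or "port.name(name)".
ATOM = re.compile(r"([\w:.]+)\(([^()]*)\)")


def desugar_dots(body_atoms):
    """Rewrite var.field(x) into TABLE.field(var, x) where TABLE is known.

    Assumption: a variable's table is given by an atom of the form
    "namespace:table(var)" somewhere in the same rule body.
    """
    parsed = []
    for atom in body_atoms:
        match = ATOM.fullmatch(atom.strip())
        if match is None:
            raise ValueError(f"cannot parse atom: {atom!r}")
        args = [a.strip() for a in match.group(2).split(",") if a.strip()]
        parsed.append((match.group(1), args))

    # Pass 1: record which table each object variable ranges over.
    var_table = {}
    for name, args in parsed:
        if ":" in name and "." not in name and len(args) == 1:
            var_table[args[0]] = name

    # Pass 2: rewrite dotted references using that information.
    rewritten = []
    for name, args in parsed:
        prefix, dot, field = name.partition(".")
        if dot and prefix in var_table:
            new_args = ", ".join([prefix] + args)
            rewritten.append(f"{var_table[prefix]}.{field}({new_args})")
        elif dot and ":" not in prefix:
            raise ValueError(f"cannot infer the table for variable {prefix!r}")
        else:
            rewritten.append(f"{name}({', '.join(args)})")
    return rewritten


body = ["neutron:ports(port)", "port.name(name)",
        "port.id(port_id)", "port.security_groups(group)"]
for atom in desugar_dots(body):
    print(atom)
```

On the example rule this yields neutron:ports.name(port, name) and similar atoms in the style of option 2. If the variable were bound somewhere the preprocessor cannot see (another rule, or a table that mixes ports and networks), the second pass raises an error, so the sketch does not resolve the difficulty; it only shows where it arises.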
Thoughts?
Tim