[openstack-dev] [Keystone][Fernet]multisite identity service management
joehuang
joehuang at huawei.com
Mon Aug 3 03:12:29 UTC 2015
Hi,
Glad to know you are discussing key distribution and rotation for Fernet tokens. Hans and I built a prototype for multisite identity service management and ran into a similar issue.
The use case is: a user should be able to manage virtual resources spread over multiple OpenStack regions through a single authentication point (https://etherpad.opnfv.org/p/multisite_identity_management).
We prototyped Fernet tokens in a multi-KeyStone-cluster setup, for multiple OpenStack instances installed across multiple sites. Writes are only allowed in the master KeyStone cluster; the slave KeyStone cluster is read-only. The prototype is at https://github.com/hafe/dockers and implements candidate solution 2. Remember that the slave Galera cluster should be configured with replicate_do_db=KeyStone, not binlog_do_db=KeyStone; Hans may not have updated the script yet.
From the prototype, we found that Fernet token validation can be done successfully by a local KeyStone server backed by the async-replicated DB. This means that if we have many sites with OpenStack installed, we can deploy a fully distributed KeyStone service in each site and validate tokens locally, which gives both high performance and high availability.
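To illustrate why local validation needs no write to the DB, here is a minimal sketch. It is not Keystone code: HMAC-SHA256 from the standard library stands in for Fernet, and the names `SHARED_KEY`, `issue_token`, `validate_token` and `replica_users` are hypothetical. The only assumptions are that every site holds the same key material and a read-only replica of the identity data.

```python
import base64
import hashlib
import hmac
import json
import time

# Stand-in for a Fernet key (assumption: the same secret is present at every site).
SHARED_KEY = b"demo-key-distributed-to-every-site"


def issue_token(user_id, project_id, ttl=3600, key=SHARED_KEY):
    """Issuing site: pack the claims into the token itself and sign them."""
    payload = json.dumps(
        {"user": user_id, "project": project_id, "exp": time.time() + ttl}
    ).encode()
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig + payload).decode()


def validate_token(token, local_users, key=SHARED_KEY):
    """Any site: verify the signature, then consult read-only replicated data."""
    raw = base64.urlsafe_b64decode(token.encode())
    sig, payload = raw[:32], raw[32:]  # SHA-256 digest is 32 bytes
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(payload)
    if claims["exp"] < time.time() or claims["user"] not in local_users:
        return None
    return claims


# Hypothetical read-only replica of the user table at a slave site.
replica_users = {"alice"}
token = issue_token("alice", "demo")
claims = validate_token(token, replica_users)
```

Nothing is stored when the token is issued, so the slave site only needs the key and its replicated copy of the identity data to answer a validation request.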
After the prototype, I think candidate solution 3 would be the better solution for multisite identity service management.
“Candidate solution 3”: KeyStone service (distributed) with Fernet token + async replication (star mode).
One master KeyStone cluster with Fernet tokens spans two sites (for site-level high availability). Every other site is installed with at least 2 slave nodes, each configured with DB async replication from a member of the master cluster: one slave's master node is in site 1, and the other slave's master node is in site 2.
Only the master cluster nodes are allowed to write; the slave nodes wait for replication from their master cluster member (with very little delay).
Pros.
1) Why a cluster in the master sites? The cluster has many master nodes, so async replication to a larger number of slaves can be served in parallel.
2) Why two sites for the master cluster? To provide higher (site-level) reliability for write requests.
3) Why multiple slaves in the other sites? A slave has no knowledge of other slaves, so multiple slaves in one site are easier to manage than a cluster. They work independently but still provide multi-instance redundancy (like a cluster, but independent).
Cons. Key distribution and rotation have to be managed.
------------------------------------
I very much appreciate the newly introduced Fernet token for addressing multi-site cloud identity management, but it brings a new challenge: how to handle key distribution and rotation in a multi-site cloud. Should key distribution/rotation management be the responsibility of a new service or of KeyStone itself? It is tough to depend on scripts to manage many sites (lots of sites, not only 3 or 5).
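For a single node, the atomic replacement of the key directory that Dolph mentions in the quoted thread below could look roughly like this. It is only a sketch under assumptions: the function name `publish_key_set` is hypothetical, the configured key repository path is assumed to be a symlink that the script is free to repoint, and the host is assumed to be POSIX. Getting the key set to every site in the first place is the part that remains open.

```python
import os
import tempfile


def publish_key_set(keys, repo_link):
    """Write keys into a fresh directory, then atomically repoint repo_link at it.

    keys maps a key index (0 = staged) to the key bytes.
    repo_link is assumed to be a symlink, or not to exist yet.
    """
    parent = os.path.dirname(os.path.abspath(repo_link))
    new_dir = tempfile.mkdtemp(prefix="fernet-keys-", dir=parent)
    for index, key in keys.items():
        with open(os.path.join(new_dir, str(index)), "wb") as f:
            f.write(key)
    tmp_link = repo_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_dir, tmp_link)
    # rename(2) swaps the symlink in one step, so a reader sees either the
    # old key set or the new one, never a half-written directory.
    os.replace(tmp_link, repo_link)
    return new_dir
```

The previous directory is left in place by this sketch; a real script would remove it once no request can still be reading from it.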
Best Regards
Chaoyi Huang ( Joe Huang )
From: Dolph Mathews [mailto:dolph.mathews at gmail.com]
Sent: Tuesday, July 28, 2015 3:31 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Keystone][Fernet] HA SQL backend for Fernet keys
On Mon, Jul 27, 2015 at 2:03 PM, Clint Byrum <clint at fewbar.com> wrote:
Excerpts from Dolph Mathews's message of 2015-07-27 11:48:12 -0700:
> On Mon, Jul 27, 2015 at 1:31 PM, Clint Byrum <clint at fewbar.com> wrote:
>
> > Excerpts from Alexander Makarov's message of 2015-07-27 10:01:34 -0700:
> > > Greetings!
> > >
> > > I'd like to discuss pro's and contra's of having Fernet encryption keys
> > > stored in a database backend.
> > > The idea itself emerged during discussion about synchronizing rotated
> > keys
> > > in HA environment.
> > > Now Fernet keys are stored in the filesystem that has some availability
> > > issues in unstable cluster.
> > > OTOH, making SQL highly available is considered easier than that for a
> > > filesystem.
> > >
> >
> > I don't think HA is the root of the problem here. The problem is
> > synchronization. If I have 3 keystone servers (n+1), and I rotate keys on
> > them, I must very carefully restart them all at the exact right time to
> > make sure one of them doesn't issue a token which will not be validated
> > on another. This is quite a real possibility because the validation
> > will not come from the user, but from the service, so it's not like we
> > can use simple persistence rules. One would need a layer 7 capable load
> > balancer that can find the token ID and make sure it goes back to the
> > server that issued it.
> >
>
> This is not true (or if it is, I'd love see a bug report). keystone-manage
> fernet_rotate uses a three phase rotation strategy (staged -> primary ->
> secondary) that allows you to distribute a staged key (used only for token
> validation) throughout your cluster before it becomes a primary key (used
> for token creation and validation) anywhere. Secondary keys are only used
> for token validation.
>
> All you have to do is atomically replace the fernet key directory with a
> new key set.
>
> You also don't have to restart keystone for it to pickup new keys dropped
> onto the filesystem beneath it.
>
That's great news! Is this documented anywhere? I dug through the
operators guides, security guide, install guide, etc. Nothing described
this dance, which is impressive and should be written down!
(BTW, your original assumption would normally have been an accurate one!)
I don't believe it's documented in any of those places, yet. The best explanation of the three phases in tree I'm aware of is probably this (which isn't particularly accessible..):
https://github.com/openstack/keystone/blob/6a6fcc2/keystone/cmd/cli.py#L208-L223
Lance Bragstad and I also gave a small presentation at the Vancouver summit on the behavior and he mentions the same on one of his blog posts:
https://www.youtube.com/watch?v=duRBlm9RtCw&feature=youtu.be
http://lbragstad.com/?p=133
I even tried to discern how it worked from the code but it actually
looks like it does not work the way you describe on casual investigation.
I don't blame you! I'll work to improve the user-facing docs on the topic.
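The three-phase rotation described in the thread above (staged -> primary -> secondary) can be modelled in a few lines. This is a simplified sketch, not the code behind keystone-manage fernet_rotate: the key repository is a plain dict, the helper names are hypothetical, and it assumes the convention that index 0 is the staged key and the highest index is the primary key.

```python
import os


def new_repo():
    """Fresh repository: index 0 is the staged key, the highest index is primary."""
    return {0: os.urandom(32), 1: os.urandom(32)}


def rotate(repo, max_active_keys=3):
    """Promote the staged key to primary, stage a new key, prune old secondaries."""
    rotated = dict(repo)
    rotated[max(rotated) + 1] = rotated[0]  # staged -> primary
    rotated[0] = os.urandom(32)             # new staged key
    while len(rotated) > max_active_keys:
        del rotated[min(i for i in rotated if i != 0)]  # drop oldest secondary
    return rotated


def primary(repo):
    """Key used to create tokens: the highest-numbered one."""
    return repo[max(repo)]


def can_validate(repo, key):
    """Every key in the repository (staged, primary, secondary) can validate."""
    return key in repo.values()


before = new_repo()     # key set every node starts with
after = rotate(before)  # key set after one rotation, not yet on every node

# A node still on `before` accepts tokens from a node already on `after` ...
print(can_validate(before, primary(after)))
# ... and a node on `after` accepts tokens from a node still on `before`.
print(can_validate(after, primary(before)))
# Two rotations without distributing in between break that guarantee.
print(can_validate(before, primary(rotate(after))))
```

The first two checks print True and the last prints False: nodes that are one rotation apart can validate each other's tokens, so each new key set has to reach every site before the next rotation happens.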