<div dir="ltr"><div>The keystone team is pursuing a trigger-based approach to support rolling, zero-downtime upgrades. The proposed operator experience is documented here:<br></div><div><br></div><div>  <a href="http://docs.openstack.org/developer/keystone/upgrading.html">http://docs.openstack.org/developer/keystone/upgrading.html</a></div><div><br></div><div>This differs from Nova's and Neutron's approach to rolling upgrades, which uses oslo.versionedobjects. However, Keystone is one of the few services that doesn't need to manage communication between multiple releases of multiple service components talking over the message bus (which is the original use case for oslo.versionedobjects, and one for which it is well suited). Keystone simply scales horizontally, and every node talks directly to the database.</div><div><br></div><div>Database triggers are a new challenge for developers to write and are honestly challenging to debug (being side effects). They are made even more difficult by having to hand-write triggers for MySQL, PostgreSQL, and SQLite independently (SQLAlchemy offers no assistance in this case), as seen in this patch:</div><div><br></div><div>  <a href="https://review.openstack.org/#/c/355618/">https://review.openstack.org/#/c/355618/</a></div><div><br></div><div>However, implementing an application-layer solution with oslo.versionedobjects is not an easy task either; refer to Neutron's implementation:</div><div><br></div><div>  <a href="https://review.openstack.org/#/q/topic:bp/adopt-oslo-versioned-objects-for-db">https://review.openstack.org/#/q/topic:bp/adopt-oslo-versioned-objects-for-db</a></div><div><br></div><div>Our primary concern at this point is how to effectively test the triggers we write against our supported database systems and their various deployment variations. 
We might be able to easily drop SQLite support (as it's only supported for our own test suite), but should we expect variation in support and/or actual behavior of triggers across the MySQLs, MariaDBs, Perconas, etc. of the world that would make it necessary to test each of them independently? If you have operational experience working with triggers at scale: are there landmines that we need to be aware of? What is it going to take for us to say we support *zero* downtime upgrades with confidence?</div><div><br></div><div>Steve &amp; Dolph</div></div>