Hi all,

we have a kolla-ansible deployed "Queens" release of OpenStack with 8 compute nodes and an external Percona XtraDB Cluster (with read-write splitting via haproxy). New VMs are currently always scheduled to the same compute node, even though manual live migration to other compute nodes works fine.

We're not sure what the issue is, but perhaps someone can spot it from our config:

# nova.conf scheduler config
default_availability_zone = az1
...
[filter_scheduler]
available_filters = nova.scheduler.filters.all_filters
enabled_filters = RetryFilter, AvailabilityZoneFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, AggregateInstanceExtraSpecsFilter, AggregateMultiTenancyIsolation, DifferentHostFilter, RamFilter, SameHostFilter, NUMATopologyFilter

The database is an external Percona XtraDB Cluster (version 5.7.24) with haproxy for read-write splitting (currently only one write node).

We do see MySQL errors in the nova-scheduler.log on the write DB node when an instance is created:

2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db [-] Unexpected error while reporting service status: OperationalError: (pymysql.err.OperationalError) (1213, u'WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction') (Background on this error at: http://sqlalche.me/e/e3q8)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db Traceback (most recent call last):
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 91, in _report_state
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     service.service_ref.save()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     return fn(self, *args, **kwargs)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/objects/service.py", line 397, in save
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     db_service = db.service_update(self._context, self.id, updates)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/db/api.py", line 183, in service_update
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     return IMPL.service_update(context, service_id, values)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     ectxt.value = e.inner_exc
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self.force_reraise()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     six.reraise(self.type_, self.value, self.tb)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     return f(*args, **kwargs)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     return f(context, *args, **kwargs)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self.gen.next()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1043, in _transaction_scope
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     yield resource
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self.gen.next()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 653, in _session
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self.session.rollback()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self.force_reraise()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     six.reraise(self.type_, self.value, self.tb)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 650, in _session
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self._end_session_transaction(self.session)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 678, in _end_session_transaction
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     session.commit()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 943, in commit
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self.transaction.commit()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 471, in commit
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     t[1].commit()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1643, in commit
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self._do_commit()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1674, in _do_commit
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self.connection._commit_impl()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 726, in _commit_impl
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self._handle_dbapi_exception(e, None, None, None, None)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     util.raise_from_cause(newraise, exc_info)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     reraise(type(exception), exception, tb=exc_tb, cause=cause)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 724, in _commit_impl
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self.engine.dialect.do_commit(self.connection)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1765, in do_commit
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     dbapi_connection.commit()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/connections.py", line 422, in commit
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     self._read_ok_packet()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/connections.py", line 396, in _read_ok_packet
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     pkt = self._read_packet()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/connections.py", line 683, in _read_packet
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     packet.check_error()
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/protocol.py", line 220, in check_error
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     err.raise_mysql_exception(self._data)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db     raise errorclass(errno, errval)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db OperationalError: (pymysql.err.OperationalError) (1213, u'WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction') (Background on this error at: http://sqlalche.me/e/e3q8)
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db
2019-04-15 16:52:20.020 24 INFO nova.servicegroup.drivers.db [-] Recovered from being unable to report status.

The deadlock message is quite strange, as we have haproxy configured so that all write requests are handled by one node.

There are NO errors in the mysqld.log WHILE creating an instance, but from time to time we see aborted connections from nova:

2019-04-15T14:22:36.232108Z 30616972 [Note] Aborted connection 30616972 to db: 'nova' user: 'nova' host: '10.x.y.z' (Got an error reading communication packets)

As I said, all instances are allocated to the same compute node. nova-compute.log doesn't show an error while creating the instance.

Besides that, we also see messages like the following from nova.scheduler.host_manager on all other nodes (but those messages are _not_ triggered when an instance is spawned!):

2019-04-15 16:28:47.771 22 INFO nova.scheduler.host_manager [req-f92e340e-a88a-44a0-8cad-588390c25bc2 - - - - -] The instance sync for host 'xxx' did not match. Re-created its InstanceList.

I don't know whether this is relevant, but somehow our (currently single) AZ is listed several times:
# openstack availability zone list
+------------+-------------+
| Zone Name  | Zone Status |
+------------+-------------+
| internal   | available   |
| az1        | available   |
| az1        | available   |
| az1        | available   |
| az1        | available   |
+------------+-------------+

Could that be related somehow?

Thanks for any consideration and support!

Kind regards
Nicolas