[nova][scheduler] scheduler spawns to the same compute node only
Hi all,

we have a kolla-ansible deployed "Queens" release of OpenStack with 8 compute nodes and an external Percona XtraDB Cluster (with read-write split via haproxy). New VMs are currently always scheduled to the same compute node, even though a manual live-migration to other compute nodes works fine. We're not sure what the issue is, but perhaps someone may spot it from our config:

# nova.conf scheduler config
default_availability_zone = az1
...
[filter_scheduler]
available_filters = nova.scheduler.filters.all_filters
enabled_filters = RetryFilter, AvailabilityZoneFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, AggregateInstanceExtraSpecsFilter, AggregateMultiTenancyIsolation, DifferentHostFilter, RamFilter, SameHostFilter, NUMATopologyFilter

The database is an external Percona XtraDB Cluster (version 5.7.24) with haproxy for read-write splitting (currently only one write node).

We do see mysql errors in the nova-scheduler.log on the write DB node when an instance is created:

2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db [-] Unexpected error while reporting service status: OperationalError: (pymysql.err.OperationalError) (1213, u'WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction') (Background on this error at: http://sqlalche.me/e/e3q8)
Traceback (most recent call last):
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 91, in _report_state
    service.service_ref.save()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
    return fn(self, *args, **kwargs)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/objects/service.py", line 397, in save
    db_service = db.service_update(self._context, self.id, updates)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/db/api.py", line 183, in service_update
    return IMPL.service_update(context, service_id, values)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper
    ectxt.value = e.inner_exc
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper
    return f(*args, **kwargs)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
    return f(context, *args, **kwargs)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1043, in _transaction_scope
    yield resource
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 653, in _session
    self.session.rollback()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 650, in _session
    self._end_session_transaction(self.session)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 678, in _end_session_transaction
    session.commit()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 943, in commit
    self.transaction.commit()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 471, in commit
    t[1].commit()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1643, in commit
    self._do_commit()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1674, in _do_commit
    self.connection._commit_impl()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 726, in _commit_impl
    self._handle_dbapi_exception(e, None, None, None, None)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
    util.raise_from_cause(newraise, exc_info)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 724, in _commit_impl
    self.engine.dialect.do_commit(self.connection)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1765, in do_commit
    dbapi_connection.commit()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/connections.py", line 422, in commit
    self._read_ok_packet()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/connections.py", line 396, in _read_ok_packet
    pkt = self._read_packet()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/connections.py", line 683, in _read_packet
    packet.check_error()
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
OperationalError: (pymysql.err.OperationalError) (1213, u'WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction') (Background on this error at: http://sqlalche.me/e/e3q8)
2019-04-15 16:52:20.020 24 INFO nova.servicegroup.drivers.db [-] Recovered from being unable to report status.

The deadlock message is quite strange, as we have haproxy configured so that all write requests are handled by one node.

There are NO errors in the mysqld.log WHILE creating an instance, but we see aborted connections from nova from time to time:

2019-04-15T14:22:36.232108Z 30616972 [Note] Aborted connection 30616972 to db: 'nova' user: 'nova' host: '10.x.y.z' (Got an error reading communication packets)

As I said, all instances are allocated to the same compute node. nova-compute.log doesn't show an error while creating the instance.

Besides that, we also see messages from nova.scheduler.host_manager on all other nodes like the following (but those messages are _not_ triggered when an instance is spawned!):

2019-04-15 16:28:47.771 22 INFO nova.scheduler.host_manager [req-f92e340e-a88a-44a0-8cad-588390c25bc2 - - - - -] The instance sync for host 'xxx' did not match. Re-created its InstanceList.

Don't know if that may be relevant, but somehow our (currently single) AZ is listed several times.
# openstack availability zone list
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| internal  | available   |
| az1       | available   |
| az1       | available   |
| az1       | available   |
| az1       | available   |
+-----------+-------------+

May that be related somehow?

Thanks for any consideration and support!

kind regards

Nicolas
--
On 4/15/2019 10:36 AM, Nicolas Ghirlanda wrote:
New VMs are currently always scheduled to the same compute node, even though a manual live-migration to other compute nodes works fine.
How are you doing the live migration? If you're using the openstack command line and defaulting to the 2.1 compute API microversion, you're forcing the server to another host and bypassing the scheduler, which may be why live migration is "working" but server create never uses the other computes.
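To illustrate the difference on the command line (a sketch; "compute-02" and "my-server" are placeholder names, and the exact force semantics depend on the client version):

```shell
# Default (2.1) microversion: naming a target host forces the server
# onto it, bypassing the scheduler and its filters entirely.
openstack server migrate --live compute-02 my-server

# Requesting microversion 2.30 or later lets the scheduler validate
# the named target host instead of blindly forcing it.
openstack --os-compute-api-version 2.30 server migrate --live compute-02 my-server
```

If the forced form is what you've been testing with, it would explain why live migration "works" while scheduled creates do not.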
We're not sure what the issue is, but perhaps someone may spot it from our config:
# nova.conf scheduler config
default_availability_zone = az1
How many computes are in az1? All 8?
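One quick way to check zone membership from the CLI (the --compute flag limits the listing to nova's zones, and --long includes the hosts in each zone):

```shell
openstack availability zone list --compute --long
```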
...
[filter_scheduler]
available_filters = nova.scheduler.filters.all_filters
enabled_filters = RetryFilter, AvailabilityZoneFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, AggregateInstanceExtraSpecsFilter, AggregateMultiTenancyIsolation, DifferentHostFilter, RamFilter, SameHostFilter, NUMATopologyFilter
Not really related to this, probably, but you can remove RamFilter since placement does the MEMORY_MB filtering; the RamFilter was deprecated in Stein as a result.

It looks like you're getting the default host_subset_size value:

https://docs.openstack.org/nova/queens/configuration/config.html#filter_sche...

which means your scheduler is "packing" by default. If you have multiple computes and you want to spread instances across them, you can adjust the host_subset_size value.
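For example, a nova.conf fragment for the scheduler (the value 8 is only illustrative): host_subset_size makes the scheduler pick randomly among the N best-weighed hosts rather than always taking the single top host.

```
[filter_scheduler]
host_subset_size = 8
```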
Database is an external Percona XtraDB Cluster (Version 5.7.24) with haproxy for read-write-splitting (currently only one write node).
We do see mysql errors in the nova-scheduler.log on the write DB node when an instance is created.
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db [-] Unexpected error while reporting service status: OperationalError: (pymysql.err.OperationalError) (1213, u'WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction') (Background on this error at: http://sqlalche.me/e/e3q8)
[...]
2019-04-15 16:52:20.020 24 INFO nova.servicegroup.drivers.db [-] Recovered from being unable to report status.
This is a service update operation, which could indicate that the other computes are reported as 'down' and that's why nothing is getting scheduled to them. Have you checked the "openstack compute service list" output to make sure those computes are all reporting as "up"?

https://docs.openstack.org/python-openstackclient/latest/cli/command-objects...

There is a retry_on_deadlock decorator on that service_update DB API though, so I'm kind of surprised to still see the deadlock errors, unless those just get logged while retrying?

https://github.com/openstack/nova/blob/stable/queens/nova/db/sqlalchemy/api....
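Concretely, something like this (restricting to the compute services):

```shell
# Every nova-compute service should show Status "enabled" and State "up";
# any compute stuck "down" is invisible to the scheduler.
openstack compute service list --service nova-compute
```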
The deadlock message is quite strange, as we have haproxy configured so that all write requests are handled by one node.
There are NO errors in the mysqld.log WHILE creating an instance, but we see aborted connections from nova from time to time.
2019-04-15T14:22:36.232108Z 30616972 [Note] Aborted connection 30616972 to db: 'nova' user: 'nova' host: '10.x.y.z' (Got an error reading communication packets)
As I said, all instances are allocated to the same compute node. nova-compute.log doesn't show an error while creating the instance.
Besides that, we also see messages from nova.scheduler.host_manager on all other nodes like the following (but those messages are _not_ triggered when an instance is spawned!)
2019-04-15 16:28:47.771 22 INFO nova.scheduler.host_manager [req-f92e340e-a88a-44a0-8cad-588390c25bc2 - - - - -] The instance sync for host 'xxx' did not match. Re-created its InstanceList.
Are there any instances on these other hosts? My guess is you're seeing that after the live migration to another host.
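You can confirm with something along these lines (the --host filter is admin-only; the hostname is a placeholder):

```shell
# List every instance the API knows about on a given compute host.
openstack server list --all-projects --host compute-02
```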
Don't know if that may be relevant, but somehow our (currently single) AZ is listed several times.
# openstack availability zone list
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| internal  | available   |
| az1       | available   |
| az1       | available   |
| az1       | available   |
| az1       | available   |
+-----------+-------------+
May that be related somehow?
I believe those are the AZs for other services as well (cinder/neutron). Specify the --compute option to filter that.

Another thing to check is placement - are there 8 compute node resource providers reporting into placement? You can check using the CLI:

https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-prov...

In Queens, there should be one resource provider per working compute node in the cell database's compute_nodes table (the UUIDs should match as well).

--

Thanks,

Matt
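A rough check, assuming the osc-placement plugin is installed alongside the client:

```shell
# One resource provider should exist per working compute node.
openstack resource provider list

# Compare against the cell database (table/column names per the Queens schema):
#   SELECT uuid, hypervisor_hostname FROM nova.compute_nodes WHERE deleted = 0;
```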
For what it's worth, we had a discussion about this in November last year:

http://lists.openstack.org/pipermail/openstack-discuss/2018-November/000209....

I made a comment at the end of that thread about a 'workaround' we have used. It still happens here on Queens and the workaround doesn't solve it permanently.

--
MC

On Tue, Apr 16, 2019 at 3:22 AM Matt Riedemann <mriedemos@gmail.com> wrote:
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py",
line 1765, in do_commit 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db dbapi_connection.commit() 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db File
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/connections.py",
line 422, in commit 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db self._read_ok_packet() 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db File
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/connections.py",
line 396, in _read_ok_packet 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db pkt = self._read_packet() 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db File
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/connections.py",
line 683, in _read_packet 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db packet.check_error() 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db File
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/protocol.py",
line 220, in check_error 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db err.raise_mysql_exception(self._data) 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db raise errorclass(errno, errval) 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db OperationalError: (pymysql.err.OperationalError) (1213, u'WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction') (Background on this error at: http://sqlalche.me/e/e3q8) 2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db 2019-04-15 16:52:20.020 24 INFO nova.servicegroup.drivers.db [-] Recovered from being unable to report status.
This is a service update operation which could indicate that the other computes are reported as 'down' and that's why nothing is getting scheduled to them. Have you checked the "openstack compute service list" output to make sure those computes are all reporting as "up"?
https://docs.openstack.org/python-openstackclient/latest/cli/command-objects...
There is a retry_on_deadlock decorator on that service_update DB API though so I'm kind of surprised to still see the deadlock errors, unless those just get logged while retrying?
https://github.com/openstack/nova/blob/stable/queens/nova/db/sqlalchemy/api....
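If the deadlock errors are indeed logged per attempt while a retry eventually succeeds, that would fit the "Recovered from being unable to report status" line ten seconds later. A minimal, self-contained sketch of that retry behaviour, similar in spirit to the @oslo_db.api.wrap_db_retry(retry_on_deadlock=True) decorator nova uses (DBDeadlock and retry_on_deadlock here are illustrative stand-ins, not nova's or oslo_db's actual code):

```python
# Illustrative deadlock-retry sketch; names are stand-ins, not nova code.
import functools
import time


class DBDeadlock(Exception):
    """Stand-in for oslo_db.exception.DBDeadlock (MySQL error 1213)."""


def retry_on_deadlock(max_retries=3, delay=0.0):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise  # retries exhausted, propagate
                    time.sleep(delay)  # back off, then restart the transaction
        return wrapper
    return decorator


attempts = []


@retry_on_deadlock(max_retries=3)
def service_update():
    """Fails twice with a simulated WSREP conflict, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise DBDeadlock("WSREP detected deadlock/conflict")
    return "updated"
```

Under this model each failed attempt can still surface an ERROR line in the log even though the service update ultimately goes through.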
The deadlock message is quite strange, as we have haproxy configured so all write requests are handled by one node.
There are NO errors in the mysqld.log WHILE creating an instance, but we see from time to time aborted connections from nova.
2019-04-15T14:22:36.232108Z 30616972 [Note] Aborted connection 30616972 to db: 'nova' user: 'nova' host: '10.x.y.z' (Got an error reading communication packets)
As I said, all instances are allocated to the same compute node. nova-compute.log doesn't show an error while creating the instance.
Besides that, we also see messages like the following from nova.scheduler.host_manager for all other nodes (note that these messages are _not_ triggered when an instance is spawned!):
2019-04-15 16:28:47.771 22 INFO nova.scheduler.host_manager [req-f92e340e-a88a-44a0-8cad-588390c25bc2 - - - - -] The instance sync for host 'xxx' did not match. Re-created its InstanceList.
Are there any instances on these other hosts? My guess is you're seeing that after the live migration to another host.
I don't know whether this is relevant, but somehow our (currently single) AZ is listed several times.
# openstack availability zone list
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| internal  | available   |
| az1       | available   |
| az1       | available   |
| az1       | available   |
| az1       | available   |
+-----------+-------------+
May that be related somehow?
I believe those are the AZs for other services as well (cinder/neutron). Specify the --compute option to filter that.
--
Another thing to check is placement - are there 8 compute node resource providers reporting into placement? You can check using the CLI:
https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-prov...
In Queens, there should be one resource provider per working compute node in the cell database's compute_nodes table (the UUIDs should match as well).
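One way to cross-check those UUIDs is to compare the placement resource provider list against the cell database directly; a hypothetical query (column names as in the Queens compute_nodes schema, so verify against your deployment):

```sql
-- List non-deleted compute nodes and their UUIDs in the cell DB;
-- these UUIDs should match the resource providers reported to placement.
SELECT uuid, hypervisor_hostname
FROM compute_nodes
WHERE deleted = 0;
```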
--
Thanks,
Matt
Oh, I see the November archive truncates the mailing list thread. Here's the post from December with the terrible workaround: http://lists.openstack.org/pipermail/openstack-discuss/2018-December/001156....

--
MC
On 4/15/2019 4:46 PM, Mike Carden wrote:
For what it's worth, we had a discussion about this in November last year:
http://lists.openstack.org/pipermail/openstack-discuss/2018-November/000209....
I made a comment at the end of that thread about a 'workaround' we have used. It still happens here on Queens and the workaround doesn't solve it permanently.
OK so specifically setting [placement]/randomize_allocation_candidates=True in your config that's running the placement service, correct? I can always remember host_subset_size but always forget randomize_allocation_candidates. At least they default to the same behavior (packing).

https://docs.openstack.org/nova/queens/configuration/config.html#placement.r...

--
Thanks,
Matt
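For reference, the two options as they might appear in nova.conf (a sketch; the value 2 for host_subset_size is only an example, and in Queens the placement service reads nova's config):

```ini
# nova.conf on the host(s) running the placement service
[placement]
randomize_allocation_candidates = true

# nova.conf for nova-scheduler
[filter_scheduler]
# Default is 1 (pure packing); >1 randomizes among the top N hosts.
host_subset_size = 2
```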
Hi Matt, thanks for your answers. Find mine below.
On 15.04.19 19:04, Matt Riedemann wrote:
On 4/15/2019 10:36 AM, Nicolas Ghirlanda wrote:
New VMs are just currently always scheduled to the same compute node, even though a manual live-migration is working fine to other compute nodes.
How are you doing the live migration? If you're using the openstack command line and defaulting to the 2.1 compute API microversion, you're forcing the server to another host by bypassing the scheduler which is maybe why live migration is "working" but server create is not ever using the other computes.
Sounds reasonable, and yes, I used nova live-migration and specified the target machine. When I used "openstack server migrate --live", it seemed that all VMs were transferred to one specific other compute node (but I need to confirm that).
We're not sure, what the issue is, but perhaps someone may spot it from our config:
# nova.conf scheduler config
default_availability_zone = az1
How many computes are in az1? All 8?
yes, in 2 hostgroups.
...
[filter_scheduler]
available_filters = nova.scheduler.filters.all_filters
enabled_filters = RetryFilter, AvailabilityZoneFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, AggregateInstanceExtraSpecsFilter, AggregateMultiTenancyIsolation, DifferentHostFilter, RamFilter, SameHostFilter, NUMATopologyFilter
Not really related to this probably but you can remove RamFilter since placement does the MEMORY_MB filtering and the RamFilter was deprecated in Stein as a result.
It looks like you're getting the default host_subset_size value:
https://docs.openstack.org/nova/queens/configuration/config.html#filter_sche...
Which means your scheduler is "packing" by default. If you have multiple computes and you want to spread instances across them, you can adjust the host_subset_size value.
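An illustrative sketch (not nova's actual FilterScheduler code) of how host_subset_size affects placement: hosts are sorted best-first by the weighers, then one is picked at random from the top N entries.

```python
# Simplified model of the host_subset_size behaviour, for intuition only.
import random


def select_host(weighed_hosts, host_subset_size=1):
    """Pick a host from the first host_subset_size entries of a
    best-first-sorted host list."""
    subset = weighed_hosts[:max(1, host_subset_size)]
    return random.choice(subset)


hosts = ["compute1", "compute2", "compute3"]  # already sorted best-first

# With the default of 1, every request lands on the top host ("packing").
assert select_host(hosts, 1) == "compute1"

# With a larger subset, requests spread across the best few hosts.
assert select_host(hosts, 3) in hosts
```

This is why the default of 1 packs everything onto the single best-weighed host until its weight drops.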
Thanks, I will try.
Database is an external Percona XtraDB Cluster (Version 5.7.24) with haproxy for read-write-splitting (currently only one write node).
We do see mysql errors in the nova-scheduler.log on the write DB node when an instance is created.
2019-04-15 16:52:10.016 24 ERROR nova.servicegroup.drivers.db [-] Unexpected error while reporting service status: OperationalError: (pymysql.err.OperationalError) (1213, u'WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction') (Background on this error at: http://sqlalche.me/e/e3q8)
[traceback snipped; quoted in full earlier in the thread]
2019-04-15 16:52:20.020 24 INFO nova.servicegroup.drivers.db [-] Recovered from being unable to report status.
This is a service update operation which could indicate that the other computes are reported as 'down' and that's why nothing is getting scheduled to them. Have you checked the "openstack compute service list" output to make sure those computes are all reporting as "up"?
yes, all compute nodes are up in the "openstack compute service list"
https://docs.openstack.org/python-openstackclient/latest/cli/command-objects...
There is a retry_on_deadlock decorator on that service_update DB API though so I'm kind of surprised to still see the deadlock errors, unless those just get logged while retrying?
https://github.com/openstack/nova/blob/stable/queens/nova/db/sqlalchemy/api....
Yep, it's pretty unclear why this is happening. Our cloud is not used that much, so it was very likely the only instance spawned in that timeframe, and since we have a single writer node in the Percona cluster, I can't imagine why any deadlock situation should occur.
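For context, a single-writer split of a Galera/XtraDB cluster is often done in haproxy by marking all but one node as backup; a hypothetical fragment (names, addresses, and timeouts invented for illustration). Note that even with a single writer, Galera can still report error 1213 if any write reaches another node, e.g. via a misrouted connection or during a failover:

```
listen mysql-write
    bind 10.0.0.100:3306
    mode tcp
    option tcpka
    timeout client 3600s
    timeout server 3600s
    server galera1 10.0.0.11:3306 check
    server galera2 10.0.0.12:3306 check backup
    server galera3 10.0.0.13:3306 check backup
```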
The deadlock message is quite strange, as we have haproxy configured so all write requests are handled by one node.
There are NO errors in the mysqld.log WHILE creating an instance, but we see from time to time aborted connections from nova.
2019-04-15T14:22:36.232108Z 30616972 [Note] Aborted connection 30616972 to db: 'nova' user: 'nova' host: '10.x.y.z' (Got an error reading communication packets)
As I said, all instances are allocated to the same compute node. nova-compute.log doesn't show an error while creating the instance.
Besides that, we also see messages like the following from nova.scheduler.host_manager for all other nodes (note that these messages are _not_ triggered when an instance is spawned!):
2019-04-15 16:28:47.771 22 INFO nova.scheduler.host_manager [req-f92e340e-a88a-44a0-8cad-588390c25bc2 - - - - -] The instance sync for host 'xxx' did not match. Re-created its InstanceList.
Are there any instances on these other hosts? My guess is you're seeing that after the live migration to another host.
That may be true, as I manually reallocated lots of VMs around those timestamps. Thanks for the explanation.
I don't know whether this is relevant, but somehow our (currently single) AZ is listed several times.
# openstack availability zone list
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| internal  | available   |
| az1       | available   |
| az1       | available   |
| az1       | available   |
| az1       | available   |
+-----------+-------------+
May that be related somehow?
I believe those are the AZs for other services as well (cinder/neutron). Specify the --compute option to filter that.
You're right: when I specify --compute, only one AZ is shown. Again, thanks for the clarification! :-)
--
Another thing to check is placement - are there 8 compute node resource providers reporting into placement? You can check using the CLI:
https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-prov...
In Queens, there should be one resource provider per working compute node in the cell database's compute_nodes table (the UUIDs should match as well).
I do not have an "openstack resource provider" command, but in "openstack hypervisor list" I can see all compute nodes with state "up".

--
EveryWare AG
Nicolas Ghirlanda
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich
T +41 44 466 60 00
F +41 44 466 60 10
nicolas.ghirlanda@everyware.ch
www.everyware.ch
On Tue, 16 Apr 2019, Nicolas Ghirlanda wrote:
Another thing to check is placement - are there 8 compute node resource providers reporting into placement? You can check using the CLI:
https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-prov...
In Queens, there should be one resource provider per working compute node in the cell database's compute_nodes table (the UUIDs should match as well).
I do not have "openstack resource provider"? but in "openstack hypervisor list" I can see all compute nodes with state "up".
To get the placement-related commands in the openstack client you need to install the osc-placement plugin: https://pypi.org/project/osc-placement/

--
Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
participants (4)
- Chris Dent
- Matt Riedemann
- Mike Carden
- Nicolas Ghirlanda