On Mon, May 12, 2025 at 9:01 PM Sean Mooney <smooney@redhat.com> wrote:

On 12/05/2025 19:27, Kamil Madac wrote:
> I have deployed openstack 2024.2 with kolla-ansible in HA setup with 3
> control nodes. Everything works without issues, but when I stop
> memcached node with IP address 192.168.56.12 (or when node goes
> down), it is not possible to login to Horizon with error message:
>
> Something went wrong!
> An unexpected error has occurred. Try refreshing the page. If that
> doesn't help, contact your local administrator.
>
> In horizon log I have an error message:
>
> 2025-05-12 18:05:37.406588 Internal Server Error: /
> 2025-05-12 18:05:37.406613 Traceback (most recent call last):
> 2025-05-12 18:05:37.406615 File
> "/var/lib/kolla/venv/lib64/python3.9/site-packages/django/core/handlers/exception.py",
> line 55, in inner
> 2025-05-12 18:05:37.406617 response = get_response(request)
> 2025-05-12 18:05:37.406619 File
> "/var/lib/kolla/venv/lib/python3.9/site-packages/horizon/middleware/simultaneous_sessions.py",
> line 30, in __call__
> 2025-05-12 18:05:37.406621 self._process_request(request)
> 2025-05-12 18:05:37.406623 File
> "/var/lib/kolla/venv/lib/python3.9/site-packages/horizon/middleware/simultaneous_sessions.py",
> line 37, in _process_request
> 2025-05-12 18:05:37.406625 cache_value = cache.get(cache_key)
> 2025-05-12 18:05:37.406627 File
> "/var/lib/kolla/venv/lib64/python3.9/site-packages/django/core/cache/backends/memcached.py",
> line 75, in get
> 2025-05-12 18:05:37.406628 return self._cache.get(key, default)
> 2025-05-12 18:05:37.406630 File
> "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/hash.py",
> line 347, in get
> 2025-05-12 18:05:37.406632 return self._run_cmd("get", key,
> default, default=default, **kwargs)
> 2025-05-12 18:05:37.406634 File
> "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/hash.py",
> line 322, in _run_cmd
> 2025-05-12 18:05:37.406636 return self._safely_run_func(client,
> func, default_val, *args, **kwargs)
> 2025-05-12 18:05:37.406637 File
> "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/hash.py",
> line 199, in _safely_run_func
> 2025-05-12 18:05:37.406639 result = func(*args, **kwargs)
> 2025-05-12 18:05:37.406640 File
> "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/base.py",
> line 687, in get
> 2025-05-12 18:05:37.406642 return self._fetch_cmd(b"get", [key],
> False, key_prefix=self.key_prefix).get(
> 2025-05-12 18:05:37.406644 File
> "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/base.py",
> line 1133, in _fetch_cmd
> 2025-05-12 18:05:37.406645 self._connect()
> 2025-05-12 18:05:37.406647 File
> "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/base.py",
> line 424, in _connect
> 2025-05-12 18:05:37.406648 sock.connect(sockaddr)
> 2025-05-12 18:05:37.406650 ConnectionRefusedError: [Errno 111]
> Connection refused
>
> so horizon tries to connect to memcached node which is down. I have
> default kolla-ansible config with enabled memcached and horizon config
> is following:
>
>
> SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
> CACHES['default']['LOCATION'] = ['192.168.56.11:11211',
> '192.168.56.12:11211', '192.168.56.21:11211']
>
> When I stop any other memcached node, horizon is working without issues.
>
> Why is that exact node important for the horizon?

It's not just important for Horizon.

In most services we take the list of memcached servers and then
distribute the cache keys across them.

memcached is not a clustered solution like a DB, where you can write to
one peer, read from another, and expect to get consistent results.

If one instance goes down, all keys associated with that instance are
unavailable and typically lost.
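
To illustrate the key distribution, here is a minimal sketch using rendezvous
hashing. It is not pymemcache's actual code, and the session key names are
made up; only the server list comes from the config quoted above. The point is
that each key always maps to exactly one server, so when that server is down
the client has nowhere else to look for it.

```python
import hashlib

# Server list taken from the LOCATION setting quoted above.
SERVERS = ["192.168.56.11:11211", "192.168.56.12:11211", "192.168.56.21:11211"]


def pick_server(key, servers=SERVERS):
    """Return the server with the highest hash score for this key."""
    def score(server):
        digest = hashlib.sha256(f"{server}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=score)


# Hypothetical cache keys, for illustration only.
keys = [f"session-{i}" for i in range(9)]
placement = {key: pick_server(key) for key in keys}

# Keys that lived on the stopped node can no longer be read.
down = "192.168.56.12:11211"
unreachable = [key for key, server in placement.items() if server == down]
print(f"{len(unreachable)} of {len(keys)} keys lived on {down}")
```

Keys placed on the other two servers are unaffected, which would be consistent
with only some lookups failing when a single node is stopped.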

Most OpenStack services will either catch the connection error and
treat it internally as a cache miss, or have oslo do that for them.
But this looks like the caching here is not using oslo but Django.
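
The "treat it as a cache miss" pattern looks roughly like the sketch below.
This is only an illustration of the idea, not code from oslo or any OpenStack
service; `DownClient` and `tolerant_get` are hypothetical names.

```python
class DownClient:
    """Stand-in for a memcached client whose server is unreachable."""

    def get(self, key):
        raise ConnectionRefusedError(111, "Connection refused")


def tolerant_get(client, key, default=None):
    """Return the cached value, or `default` if the server cannot be reached."""
    try:
        return client.get(key)
    except OSError:
        # ConnectionRefusedError is a subclass of OSError; treat it as a miss.
        return default


value = tolerant_get(DownClient(), "some-key")
```

With this kind of handling the caller simply recomputes or refetches the
value instead of failing the whole request.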

Django can also be configured to do that:

```
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
        "OPTIONS": {
            "no_delay": True,
            "ignore_exc": True,
            "max_pool_size": 4,
            "use_pooling": True,
        },
    }
}
```

"ignore_exc": True seems to be the relevant parameter. That is one of
the examples in https://docs.djangoproject.com/en/5.2/topics/cache/#cache-arguments
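
Applied to the Horizon config quoted above, the override might look like the
sketch below. It is untested and makes two assumptions: that the backend is
PyMemcacheCache (the traceback goes through pymemcache, so that seems likely),
and that CACHES is already defined by the time your override is loaded. The
first dict is only a stand-in so the snippet runs on its own. Where exactly
the override goes in a kolla-ansible deployment is something to check in the
kolla-ansible docs.

```python
# Stand-in for what Horizon's settings already define (assumption).
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": [
            "192.168.56.11:11211",
            "192.168.56.12:11211",
            "192.168.56.21:11211",
        ],
    }
}

# The actual override: keep the existing LOCATION list and add ignore_exc,
# which Django passes through to the pymemcache client.
CACHES["default"].setdefault("OPTIONS", {})["ignore_exc"] = True
```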

It would appear that Django, at least as it's used by Horizon, is not
fault tolerant to memcached outages, so if there is a connection issue
it will break. I'm not sure whether that means Horizon is also not
fault tolerant to cache misses, but it's worth a try.

Perhaps a more fault-tolerant cache backend supported by Django is also
an option:

https://docs.djangoproject.com/en/5.2/topics/cache/#django-s-cache-framework

If you have Redis or Valkey, then perhaps
django.core.cache.backends.redis.RedisCache or one of the DB caches
would be an option, but I would first test adding the
"ignore_exc": True parameter to your config.

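If you do end up trying the Redis backend, a minimal settings sketch could
look like this. The address is hypothetical, I have not tested this with
Horizon, and it assumes a Django version that ships RedisCache (4.0 or later)
plus the redis Python package being available inside the Horizon container.

```python
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        # Hypothetical address; replace with your Redis or Valkey endpoint.
        "LOCATION": "redis://192.168.56.11:6379",
    }
}
```
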
> Does anyone else have the same experience?
>
> Thanks for any advice.
>
> --
> Kamil Madac