On Tue, May 13, 2025 at 8:23 AM Eugen Block <eblock@nde.ag> wrote:

Hi,

we were facing the same thing when we reinstalled our cloud to be
highly available. We had a list of memcached servers for all the
required services, and then noticed that a failed control node would
disrupt our services. We could pinpoint it to memcached not being
highly-available despite having a list of servers. So we decided to
point all services to localhost only:

# nova
root@controller02:~# grep memcached /etc/nova/nova.conf
memcached_servers = localhost:11211

# Dashboard
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
    },
}

This has been working great for years now.

Regards,
Eugen

Quoting Sean Mooney <smooney@redhat.com>:

> On 12/05/2025 19:27, Kamil Madac wrote:
>> I have deployed OpenStack 2024.2 with kolla-ansible in an HA setup
>> with 3 control nodes. Everything works without issues, but when I
>> stop the memcached node with IP address 192.168.56.12 (or when that
>> node goes down), it is not possible to log in to Horizon; I get the
>> error message:
>>
>> Something went wrong!
>> An unexpected error has occurred. Try refreshing the page. If that
>> doesn't help, contact your local administrator.
>>
>> In the Horizon log I have this error message:
>>
>> 2025-05-12 18:05:37.406588 Internal Server Error: /
>> 2025-05-12 18:05:37.406613 Traceback (most recent call last):
>> 2025-05-12 18:05:37.406615   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/django/core/handlers/exception.py", line 55, in inner
>> 2025-05-12 18:05:37.406617     response = get_response(request)
>> 2025-05-12 18:05:37.406619   File "/var/lib/kolla/venv/lib/python3.9/site-packages/horizon/middleware/simultaneous_sessions.py", line 30, in __call__
>> 2025-05-12 18:05:37.406621     self._process_request(request)
>> 2025-05-12 18:05:37.406623   File "/var/lib/kolla/venv/lib/python3.9/site-packages/horizon/middleware/simultaneous_sessions.py", line 37, in _process_request
>> 2025-05-12 18:05:37.406625     cache_value = cache.get(cache_key)
>> 2025-05-12 18:05:37.406627   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/django/core/cache/backends/memcached.py", line 75, in get
>> 2025-05-12 18:05:37.406628     return self._cache.get(key, default)
>> 2025-05-12 18:05:37.406630   File "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/hash.py", line 347, in get
>> 2025-05-12 18:05:37.406632     return self._run_cmd("get", key, default, default=default, **kwargs)
>> 2025-05-12 18:05:37.406634   File "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/hash.py", line 322, in _run_cmd
>> 2025-05-12 18:05:37.406636     return self._safely_run_func(client, func, default_val, *args, **kwargs)
>> 2025-05-12 18:05:37.406637   File "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/hash.py", line 199, in _safely_run_func
>> 2025-05-12 18:05:37.406639     result = func(*args, **kwargs)
>> 2025-05-12 18:05:37.406640   File "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/base.py", line 687, in get
>> 2025-05-12 18:05:37.406642     return self._fetch_cmd(b"get", [key], False, key_prefix=self.key_prefix).get(
>> 2025-05-12 18:05:37.406644   File "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/base.py", line 1133, in _fetch_cmd
>> 2025-05-12 18:05:37.406645     self._connect()
>> 2025-05-12 18:05:37.406647   File "/var/lib/kolla/venv/lib/python3.9/site-packages/pymemcache/client/base.py", line 424, in _connect
>> 2025-05-12 18:05:37.406648     sock.connect(sockaddr)
>> 2025-05-12 18:05:37.406650 ConnectionRefusedError: [Errno 111] Connection refused
>>
>> So Horizon tries to connect to the memcached node which is down. I
>> have the default kolla-ansible config with memcached enabled, and
>> the Horizon config is the following:
>>
>>
>> SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
>> CACHES['default']['LOCATION'] = ['192.168.56.11:11211', '192.168.56.12:11211', '192.168.56.21:11211']
>>
>> When I stop any other memcached node, horizon is working without issues.
>>
>> Why is that exact node important for Horizon?
>
> It's not just important for Horizon.
>
> In most services we take the list of memcached servers and then
> distribute the cache keys across them.
>
> memcached is not a clustered solution like a database, where you can
> write to one peer, read from another, and expect to get consistent
> results.
>
> If one instance goes down, all keys associated with that instance are
> unavailable and typically lost.
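
To make the key distribution described above concrete, here is a minimal sketch of how a client could map each key to exactly one server in the list. The hashing scheme (a rendezvous-style "highest score wins" choice) and the names `pick_server` and `keys_by_server` are illustrative assumptions, not the actual pymemcache or oslo.cache implementation; the server addresses are the ones from the question.

```python
import hashlib

# Server list taken from the Horizon config in the original question.
SERVERS = ["192.168.56.11:11211", "192.168.56.12:11211", "192.168.56.21:11211"]


def pick_server(key, servers):
    """Pick one server for a key (hypothetical helper, illustration only).

    Every (server, key) pair gets a score and the highest score wins,
    so a given key always maps to the same server.
    """
    def score(server):
        digest = hashlib.sha256(f"{server}|{key}".encode()).hexdigest()
        return int(digest, 16)

    return max(servers, key=score)


def keys_by_server(keys, servers):
    """Group keys by the server that owns them."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[pick_server(key, servers)].append(key)
    return placement


if __name__ == "__main__":
    sample_keys = [f"session:{i}" for i in range(12)]
    for server, owned in keys_by_server(sample_keys, SERVERS).items():
        print(server, owned)
```

Because no key is replicated, any key that is looked up on every request always lands on the same server. If that server is the one that is down, every request fails, while stopping either of the other two servers goes unnoticed for that key.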
>
> Most OpenStack services will either catch the connection issue and
> internally tolerate it as if it were a cache miss, or have oslo do
> that for them. But this looks like the caching is not using oslo but
> Django.
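
A minimal sketch of what "tolerate the connection issue as if it were a cache miss" means in practice. `DownBackend` and `TolerantCache` are hypothetical stand-ins written for this illustration; they are not oslo.cache, Django, or pymemcache classes, and no real memcached is contacted.

```python
class DownBackend:
    """Stand-in for a memcached client whose server refuses connections."""

    def get(self, key, default=None):
        raise ConnectionRefusedError(111, "Connection refused")


class TolerantCache:
    """Wrapper that can turn connection failures into cache misses."""

    def __init__(self, backend, ignore_exc=True):
        self.backend = backend
        self.ignore_exc = ignore_exc

    def get(self, key, default=None):
        try:
            return self.backend.get(key, default)
        except OSError:
            # ConnectionRefusedError is a subclass of OSError.
            if not self.ignore_exc:
                raise
            # Behave exactly like a miss: hand back the caller's default.
            return default


if __name__ == "__main__":
    cache = TolerantCache(DownBackend())
    print(cache.get("some_key"))  # falls back to the default (None)
```

With `ignore_exc=False` the same call raises `ConnectionRefusedError`, which is what the traceback in the question shows.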
>
> Django's cache backend can also be configured to tolerate
> connection errors:
>
> ```
> CACHES = {
>     "default": {
>         "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
>         "LOCATION": "127.0.0.1:11211",
>         "OPTIONS": {
>             "no_delay": True,
>             "ignore_exc": True,
>             "max_pool_size": 4,
>             "use_pooling": True,
>         },
>     }
> }
> ```
>
> "ignore_exc": True seems to be the relevant parameter; this is one
> of the examples in
> https://docs.djangoproject.com/en/5.2/topics/cache/#cache-arguments
> It would appear that Django, at least as it's used by Horizon, is not
> fault tolerant to memcached outages, so if there is a connection
> issue it will break. I'm not sure if that means Horizon is also not
> fault tolerant to cache misses, but it's worth a try.
>
> Perhaps a more fault-tolerant cache backend supported by Django is
> also an option:
>
> https://docs.djangoproject.com/en/5.2/topics/cache/#django-s-cache-framework
>
> If you have Redis or Valkey, then perhaps
> django.core.cache.backends.redis.RedisCache
> or one of the database caches would be an option, but I would first
> test adding the "ignore_exc": True parameter to your config.
>
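Applied to the three-server LOCATION list from the original question, the suggestion might look like the sketch below. This is an untested assumption about how the pieces combine: it keeps the backend and addresses quoted in this thread and adds only the "ignore_exc" option. How such an override is best injected into a kolla-ansible deployment is not covered here.

```python
# Sketch of Django cache settings for Horizon; untested in a kolla-ansible deployment.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        # Same three memcached servers as in the original question.
        "LOCATION": [
            "192.168.56.11:11211",
            "192.168.56.12:11211",
            "192.168.56.21:11211",
        ],
        "OPTIONS": {
            # Intended to make an unreachable server look like a cache miss
            # instead of raising ConnectionRefusedError.
            "ignore_exc": True,
        },
    }
}
```

Note that sessions are stored in the cache here, so any session whose key lives on the unreachable server would presumably look like a missing session rather than an error, and the affected users may have to log in again.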
>> Does anyone else have the same experience?
>>
>> Thanks for any advice.
>>
>> --
>> Kamil Madac