Hi Team,

Could anybody please spare some time to look at this and guide me in identifying the root cause and fixing it?

We have an OpenStack-Ansible private cloud environment with 3 controllers and 35 compute host nodes. The environment is around 4 years old.

DISTRIB_RELEASE: 21.2.1
DISTRIB_CODENAME: Ussuri
DISTRIB_DESCRIPTION: Openstack-Ansible

Issue Description: The dashboard page often does not load. When it does load, it is too slow to work with, and we are not able to create any resources.

Observations:
The MariaDB service was down on Controller-1 and Controller-3. It was up on Controller-2, but the cluster was broken. It has been restored now.
Controller-2 is not part of the RabbitMQ cluster.
A very large number of MySQL connections are going into the Sleep state.
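
In case it helps anyone reproduce the Galera check on each controller, here is a minimal Python sketch that reads the tab-separated output of SHOW GLOBAL STATUS LIKE 'wsrep_%' (for example from mysql -N -B -e "...") and lists what looks unhealthy. The function name galera_problems, the expected cluster size of 3 and the sample values are illustrative assumptions, not output from our environment.

```python
def galera_problems(status_text, expected_size=3):
    """Return a list of problems found in wsrep status output.

    status_text is assumed to hold one "name<TAB>value" pair per line.
    """
    status = {}
    for line in status_text.strip().splitlines():
        parts = line.split(None, 1)  # variable name, then its value
        if len(parts) == 2:
            status[parts[0]] = parts[1].strip()

    problems = []
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("node is not in the Primary component")
    size = status.get("wsrep_cluster_size", "0")
    if not size.isdigit() or int(size) < expected_size:
        problems.append("cluster size is %s, expected %d" % (size, expected_size))
    if status.get("wsrep_local_state_comment") != "Synced":
        problems.append("node is not Synced")
    if status.get("wsrep_ready") != "ON":
        problems.append("wsrep_ready is not ON")
    return problems


# Hypothetical sample for a node that dropped out of the cluster.
sample = (
    "wsrep_cluster_size\t1\n"
    "wsrep_cluster_status\tnon-Primary\n"
    "wsrep_local_state_comment\tInitialized\n"
    "wsrep_ready\tOFF\n"
)
for problem in galera_problems(sample):
    print(problem)
```

An empty list from all three controllers would suggest the cluster has really recovered, not just that the service is running.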

Action Taken:
1. We fixed the MariaDB cluster issues and recovered the cluster; it seems to be fine now.
2. We stopped the Horizon service on Controller-2.
3. As we saw an issue where HAProxy was taking Controller-2 down, we diverted the HAProxy traffic to Controller-3 and made Controller-2 the secondary.
4. We disabled IPv6 on all the controller nodes.
5. We restarted the following Neutron services on Controller-2: neutron-l3-agent.service, neutron-linuxbridge-agent.service, neutron-metadata-agent.service.
6. The RabbitMQ container on Controller-2 is currently stopped.

Current Observation:
The dashboard is still too slow to work with. We also cannot create any resources, such as VMs or volumes.

BR//
Sudeb Ghosh
7044064878
9332034788

On Sunday 25 August, 2024 at 06:48:45 pm IST, sudeb ghosh <sudeb_ece@yahoo.co.in> wrote:


Hi Team,

We have an OpenStack-Ansible private cloud environment with 3 controllers and 35 compute host nodes. The environment is around 4 years old.

DISTRIB_RELEASE: 21.2.1
DISTRIB_CODENAME: Ussuri
DISTRIB_DESCRIPTION: Openstack-Ansible

On top of this, there are Linux containers on all 3 controllers in which the services run, such as neutron, galera, horizon, keystone, etcd, nova-api, utility etc.

Current Setup:
In the HAProxy config file we find that only controller2 and controller3 are configured.
So the galera containers, along with the MariaDB services, are running fine on controller2 (primary) and controller3.
But the galera container on controller1 is stopped and the MariaDB service is not running there.

Issue Started with:
Five days ago, users were suddenly unable to log in to the dashboard. The dashboard page was not loading.


Action Taken:
We found that MariaDB on controller2 was non-primary and the DB on controller-3 was down.
We made MariaDB on controller2 primary and brought the DB on controller-3 up, after which the page loaded and we could log in.

Current Issue:
1) The dashboard page is too slow to work with. It sometimes throws a 504 timed-out error.
2) The VM console (VNC) is not working for any VM.
3) We are not able to create any VM (it shows 'Scheduling' continuously).
4) 25 of the 35 hypervisors are shown as down in the dashboard under the 'Hypervisor list' tab, but they are up as per the CLI.
5) We are not able to create any volume.
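
To compare what the dashboard shows with the CLI view, the nova-compute service states can be listed from the JSON output of openstack compute service list -f json. The sketch below is only an illustration: it assumes the JSON rows carry the same "Binary", "Host" and "State" keys as the table columns, and the helper name down_compute_hosts and the host names in the sample are hypothetical.

```python
import json


def down_compute_hosts(service_list_json):
    """Return the sorted hosts whose nova-compute service is not 'up'."""
    services = json.loads(service_list_json)
    return sorted(
        svc["Host"]
        for svc in services
        if svc.get("Binary") == "nova-compute" and svc.get("State") != "up"
    )


# Hypothetical sample; real input would come from the CLI command above.
sample = json.dumps([
    {"Binary": "nova-compute", "Host": "compute01", "State": "up"},
    {"Binary": "nova-compute", "Host": "compute02", "State": "down"},
    {"Binary": "nova-scheduler", "Host": "controller2", "State": "down"},
])
print(down_compute_hosts(sample))
```

If this list is empty while the dashboard still shows hypervisors as down, the mismatch is more likely on the API or messaging side than on the compute hosts themselves.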


I request you to go through the problem description. Can anybody help me with a solution for this?

Please let me know if any more information is required.


Regards,
Sudeb Ghosh
sudeb_ece@yahoo.co.in
