Cannot work on Dashboard (throwing 504 timed out) in OpenStack-Ansible environment
Hi Team,

We have an OpenStack-Ansible private cloud environment with 3 controllers and 35 compute host nodes. The environment is around 4 years old.

DISTRIB_RELEASE: 21.2.1
DISTRIB_CODENAME: Ussuri
DISTRIB_DESCRIPTION: Openstack-Ansible

On top of this, Linux containers run the services (neutron, galera, horizon, keystone, etcd, nova-api, utility, etc.) on all 3 controllers.

Current setup: In the haproxy config file we find that only controller2 and controller3 are configured. The galera containers and MariaDB services are running fine on controller2 (primary) and controller3, but the galera container on controller1 is stopped and the MariaDB service is not running there.

Issue started with: Five days ago, users were suddenly unable to log in to the dashboard. The dashboard page was not loading.

Action taken: We found that MariaDB on controller2 was non-primary and the DB on controller3 was down. We made MariaDB on controller2 primary and brought the DB on controller3 up, after which the page loaded and login worked.

Current issues:
1) The dashboard page is too slow to work with and sometimes throws a 504 timed out error.
2) The VM console (VNC) is not working for any VM.
3) We are not able to create any VM (it shows 'Scheduling' continuously).
4) 25 of 35 hypervisors are shown as down in the dashboard's 'Hypervisor list' tab, but they are up according to the CLI.
5) We are not able to create any volume.

Please go through the problem description. Can anybody help with a solution? Please let me know if any more information is required.

Regards,
Sudeb Ghosh
sudeb_ece@yahoo.co.in
7044064878 9332034788
Hey,

My very quick guess would be to check that memcached is running inside its container on controller1, or to make sure that this backend is not configured for keystone to use. If a configured memcached backend is not available, it causes significant slowdowns during auth and can result in the 504 errors you mention.

You can also check these docs for alternative ways to deal with memcached: https://docs.openstack.org/openstack-ansible-memcached_server/ussuri/configu...

On Sun, Aug 25, 2024, 15:19 sudeb ghosh <sudeb_ece@yahoo.co.in> wrote:
[...]
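
The memcached check suggested in the reply above could be sketched roughly as follows. This is only an illustration, not a tool that ships with OpenStack-Ansible: the function name `check_memcached` is made up for this example, port 11211 is assumed to be the memcached port in use, and the backend addresses have to be taken from the actual keystone configuration. It connects to each backend, sends the memcached text-protocol `version` command, and reports whether a `VERSION` reply came back and how long that took.

```python
import socket
import sys
import time


def check_memcached(host, port=11211, timeout=2.0):
    """Return (ok, seconds, reply) for one memcached backend.

    Sends the text-protocol "version" command; a healthy server normally
    answers with a single line starting with "VERSION".
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"version\r\n")
            reply = sock.recv(128).decode("ascii", "replace").strip()
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return False, time.perf_counter() - start, str(exc)
    return reply.startswith("VERSION"), time.perf_counter() - start, reply


if __name__ == "__main__":
    # Backends are passed as host:port arguments; nothing is hard-coded
    # because the real addresses are not known here.
    for target in sys.argv[1:]:
        host, _, port = target.rpartition(":")
        ok, seconds, reply = check_memcached(host, int(port))
        state = "OK" if ok else "FAILED"
        print(f"{target}: {state} in {seconds:.2f}s ({reply})")
```

A backend that is reported as failed, or that only answers after the timeout, would be a candidate for the slow token handling described in the reply.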
Hi Team,

Could anybody please spare some time on this and guide me in identifying the root cause and fixing it?

We have an OpenStack-Ansible private cloud environment with 3 controllers and 35 compute host nodes. The environment is around 4 years old.

DISTRIB_RELEASE: 21.2.1
DISTRIB_CODENAME: Ussuri
DISTRIB_DESCRIPTION: Openstack-Ansible

Issue description: The dashboard page is not loading. When it does load, it is too slow to work with, and we are not able to create any resources.

Observations:
* The MariaDB service was down on controller1 and controller3. It was shown as up on controller2, but the cluster was broken. It is restored now.
* Controller2 is not part of the RabbitMQ cluster.
* A large number of MySQL queries are going into sleep mode.

Action taken:
1. We fixed the MariaDB cluster issues and recovered the cluster; the cluster seems to be fine now.
2. We stopped the Horizon service on controller2.
3. Since we saw an issue with haproxy taking controller2 down, we diverted the haproxy traffic to controller3 and made controller2 secondary.
4. Disabled IPv6 on all controller nodes.
5. Restarted the following neutron services on controller2: neutron-l3-agent.service, neutron-linuxbridge-agent.service, neutron-metadata-agent.service.
6. The RabbitMQ container on controller2 is stopped now.

Current observation: The dashboard is still too slow to work with. We also cannot create any resources, such as VMs or volumes.

BR//
Sudeb Ghosh
7044064878 9332034788

On Sunday 25 August, 2024 at 06:48:45 pm IST, sudeb ghosh <sudeb_ece@yahoo.co.in> wrote:
[...]
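
Whether the recovered MariaDB cluster really is "fine" can be checked from the Galera `wsrep_*` status variables rather than judged by eye. The sketch below is an assumption-laden illustration, not something from the thread: the helper names are invented, and the expected cluster size of 3 assumes that all three controllers are meant to run a galera node. It parses `name value` lines, such as the output of `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"` run on one node, and lists anything that does not look healthy.

```python
def parse_wsrep_status(text):
    """Parse whitespace-separated 'name value' lines into a dict."""
    status = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            status[parts[0]] = parts[1].strip()
    return status


def galera_problems(status, expected_size=3):
    """Return a list of human-readable problems; empty means healthy.

    expected_size=3 is an assumption (one galera node per controller).
    """
    expected = {
        "wsrep_cluster_status": "Primary",
        "wsrep_local_state_comment": "Synced",
        "wsrep_ready": "ON",
        "wsrep_cluster_size": str(expected_size),
    }
    problems = []
    for name, wanted in expected.items():
        actual = status.get(name)
        if actual != wanted:
            problems.append(f"{name} is {actual!r}, expected {wanted!r}")
    return problems
```

Running this against each node separately matters, because a node can report `Primary` while another node still sees a smaller cluster or is not yet `Synced`.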
Hi,

In the actions taken you did not mention memcached at all. Have you verified that all memcached backends are operational?

Also, do the timeouts happen only with Horizon, or is the issue the same if you attempt to create a VM through the CLI / API?

If the issue also occurs through the CLI/API, I'd suggest first checking how long it takes individual APIs to reply with some data, starting with keystone. So I would do a series of API requests to find which specific service might be holding performance back (if not all of them). For instance, check timings for:
* Simply issuing a new token in keystone
* Listing ports in neutron
* Listing flavors in nova
* Listing images in glance
* Listing volumes in cinder

But I'm still inclined to think that this might be related to one of the memcached backends being down, which makes keystone wait a very long time when trying to cache a token after issuing it.

Mon, 2 Sep 2024 at 09:50, sudeb ghosh <sudeb_ece@yahoo.co.in> wrote:
[...]
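
The series of timing checks listed in the reply above could be scripted along these lines. This is a minimal sketch under assumptions: it shells out to the standard `openstack` command-line client, expects credentials to be loaded in the environment already, and the names `CHECKS`, `time_command` and `run_checks` as well as the 120-second timeout are choices made for this example.

```python
import subprocess
import time

# One representative read-only request per service, as listed in the reply.
CHECKS = [
    ("keystone: issue token", ["openstack", "token", "issue"]),
    ("neutron: list ports", ["openstack", "port", "list"]),
    ("nova: list flavors", ["openstack", "flavor", "list"]),
    ("glance: list images", ["openstack", "image", "list"]),
    ("cinder: list volumes", ["openstack", "volume", "list"]),
]


def time_command(cmd, timeout=120):
    """Run cmd and return (exit_code, seconds).

    exit_code is None if the command timed out or could not be found.
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = proc.returncode
    except (subprocess.TimeoutExpired, FileNotFoundError):
        code = None
    return code, time.perf_counter() - start


def run_checks(checks=CHECKS, timeout=120):
    """Time every check in order and return (label, exit_code, seconds)."""
    results = []
    for label, cmd in checks:
        code, seconds = time_command(cmd, timeout)
        results.append((label, code, seconds))
    return results
```

Printing the tuples returned by `run_checks()` gives one timing per service. Every CLI call also authenticates against keystone first, so if the token request alone is already slow, the other numbers mostly reflect that delay rather than the individual service.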
participants (2)
- Dmitriy Rabotyagov
- sudeb ghosh