[kolla][nova][neutron] Access to VMs is slow when running on a remote compute host
Hi community,
I need your help, tips, and advice.

*> Environment <*
I have deployed OpenStack "Stein" using the latest kolla-ansible on the following topology:
1) OS Controller running as a VM at a "cloud" location
2) OS Compute running on a bare-metal server at a location remote from the OS Controller
3) Network node running on the Compute host
As per the above, the Controller and Compute run on two different networks. Kolla-Ansible is not really designed for such a scenario, but after adjusting globals.yml and the inventory files (basically I had to move node-specific network settings from globals.yml into the inventory file), the deployment eventually works fine.

*> Problem <*
I have no specific issue working with this deployment except the following: "SSH connection to the VM is quite slow."
It takes around 20 seconds for me to log into the VM (Ubuntu, CentOS, whatever).

*> Observations <*
- Except for the slowness during the SSH login, I don't have any further specific issue working with this environment
- With the Network node on the Compute host I can turn the OS Controller off with no impact on the VM; the connection is still slow
- I tried different types of images (Ubuntu, CentOS, Windows), always with the same result
- The SSH connection is slow even if I try to log into the VM from within the IP namespace

From the ssh -vvv output, I can see that the session gets stuck here:

debug1: Authentication succeeded (publickey).
Authenticated to *****
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
debug1: Requesting no-more-sessions@openssh.com
debug3: send packet: type 80
debug1: Entering interactive session.
debug1: pledge: network

[10 to 15 seconds later]

debug3: receive packet: type 80
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug3: receive packet: type 91
debug2: callback start
debug2: fd 3 setting TCP_NODELAY
debug3: ssh_packet_set_tos: set IP_TOS 0x10
debug2: client_session2_setup: id 0
debug2: channel 0: request pty-req confirm 1

Have you ever experienced such an issue? Any suggestions?

Many thanks
/Giuseppe
On 7/12/19 9:57 AM, Giuseppe Sannino wrote:
> *> Problem <*
> "SSH connection to the VM is quite slow."
> It takes around 20 seconds for me to log into the VM (Ubuntu, CentOS, whatever).
But once logged in, things are OK? For example, does an scp stall the same way while the transfer itself is fast?
> [10 to 15 seconds later]
What is sshd doing at this time? Have you tried enabling debug or running tcpdump when a new connection is attempted? At first glance I'd say it's a DNS issue, since it eventually succeeds; the logs would help to point in a direction.

-Brian
On Fri, 12 Jul 2019 at 15:24, Brian Haley <haleyb.dev@gmail.com> wrote:
> What is sshd doing at this time? Have you tried enabling debug or running tcpdump when a new connection is attempted? At first glance I'd say it's a DNS issue since it eventually succeeds, the logs would help to point in a direction.
+1 - a ~30s timeout on SSH login is normally a DNS issue.
Hi,
I suspect a problem with name resolution. Can you check whether you also see such a delay when running e.g. "sudo" commands after you ssh to the instance?
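A slow "sudo" usually means the guest cannot resolve its own hostname. A minimal sketch of that check, run inside the VM (the suggested /etc/hosts entry is just an example):

```shell
#!/bin/sh
# If sudo also stalls, the guest is probably failing to resolve
# its own hostname. Check whether it maps to a local entry:
h=$(hostname 2>/dev/null || uname -n)
if getent hosts "$h" >/dev/null 2>&1; then
    echo "$h resolves locally - hostname lookup is not the problem"
else
    echo "$h does not resolve - consider adding it to /etc/hosts, e.g.: 127.0.1.1 $h"
fi
```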
—
Slawek Kaplonski
Senior software engineer
Red Hat
Hi!
First of all, thanks for the fast replies. I do appreciate that.

I did some more tests trying to figure out the issue:
- Set UseDNS to "no" in sshd_config => issue persists
- Installed and configured Telnet => Telnet login is slow as well
From "top" and "auth.log" nothing specific popped up. I can see sshd taking some CPU for a short while, but nothing more than that.
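Since both ssh and telnet stall, the resolver configuration inside the guest is worth a look. A sketch (paths are the standard glibc ones):

```shell
#!/bin/sh
# Both ssh and telnet stalling points away from sshd itself and
# toward name resolution. Dump the guest's resolver setup:
resolv=$(cat /etc/resolv.conf 2>/dev/null || echo "(no resolv.conf)")
hosts_line=$(grep '^hosts' /etc/nsswitch.conf 2>/dev/null || echo "(no nsswitch.conf)")
echo "resolv.conf:"
echo "$resolv"
echo "hosts lookup order: $hosts_line"
```

An unreachable nameserver listed in resolv.conf would explain a fixed multi-second delay on any login-time lookup.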
Once logged in, the VM is not too slow; the CLI doesn't get stuck or similar. One thing worth mentioning: the write throughput on the disk seems a bit slow, 67 MB/s versus around 318 MB/s for another VM running on a "datacenter" OpenStack installation. The cinder-volume container is running on the Compute host, and Cinder is using the filesystem as backend.

BR
/Giuseppe
On Mon, 2019-07-15 at 16:29 +0200, Giuseppe Sannino wrote:
> One thing worthwhile to mention, it seems like the writing throughput on the disk is a bit slow: 67MB/s wrt around 318MB/s of another VM running on a "datacenter" Openstack installation.

Unless you see iowait in the guest, it's likely not related to disk speed. You might be able to improve the disk performance by changing the cache mode, but unless you are seeing iowait that is just an optimisation to try later.
When you are logged into the VM, have you tried ssh-ing again via localhost to determine whether the long login time is related to the network or to the VM? If it's related to the network, it will be fast over localhost; if it's related to the VM, e.g. because of disk, CPU load, memory load, or ssh server configuration, then the local ssh will be slow too.
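To separate a genuinely slow disk from everything else, a crude write test inside the guest (a sketch; the 100 MB size and temp path are arbitrary choices, not from the thread):

```shell
#!/bin/sh
# Rough write-throughput check: write 100 MB with an fdatasync at
# the end so the page cache does not inflate the number. Watch the
# %wa (iowait) column in top or vmstat while this runs.
testfile=/tmp/ddtest.$$
dd if=/dev/zero of="$testfile" bs=1M count=100 conv=fdatasync 2>&1 | tail -n1
rm -f "$testfile"
```

The last line of dd's output reports the effective MB/s, comparable to the 67 MB/s figure above.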
Hi Sean,
the ssh to localhost is slow as well. "telnet localhost" is also slow.

/Giuseppe
On Mon, Jul 15, 2019 at 10:40 AM Giuseppe Sannino <km.giuseppesannino@gmail.com> wrote:
> Hi Sean,
> the ssh to localhost is slow as well. "telnet localhost" is also slow.
Are you having DNS issues? Historically, if you have UseDNS enabled and your DNS servers are bad, it can just be slow to connect as sshd tries to do the reverse lookup.
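One way to see whether that reverse (PTR) lookup is the culprit is to time it directly inside the guest. A sketch (127.0.0.1 is a stand-in; use the address your SSH client actually connects from):

```shell
#!/bin/sh
# Time the reverse lookup sshd would perform for a connecting
# client. A multi-second result here would match the observed
# 10-20 second login delay.
ip=127.0.0.1
start=$(date +%s)
getent hosts "$ip" >/dev/null 2>&1
end=$(date +%s)
elapsed=$((end - start))
echo "reverse lookup of $ip took ${elapsed}s"
```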
Hi Alex,
yeah, it was the first suspect, also based on various research on the internet. I currently have UseDNS set to "no", but the issue still persists.

/G
On Mon, Jul 15, 2019 at 10:40 AM Giuseppe Sannino < km.giuseppesannino@gmail.com> wrote:
Hi Sean, the ssh to localhost is slow as well. "telnet localhost" is also slow.
Are you having dns issues? Historically if you have UseDNS set to true and your dns servers are bad it can just be slow to connect as it tries to do the reverse lookup.
/Giuseppe
On Mon, 15 Jul 2019 at 18:18, Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2019-07-15 at 16:29 +0200, Giuseppe Sannino wrote:
Hi! first of all, thanks for the fast replies. I do appreciate that.
I did some more test trying to figure out the issue. - Set UseDNS to "no" in sshd_config => Issue persists - Installed and configured Telnet => Telnet login is slow as well
From the "top" or "auth.log"nothing specific popped up. I can sshd taking some cpu for a short while but nothing more than that.
Once logged in the VM is not too slow. CLI doesn't get stuck or similar. One thing worthwhile to mention, it seems like the writing throughput on the disk is a bit slow: 67MB/s wrt around 318MB/s of another VM running on a "datacenter" Openstack installation. unless you see iowait in the guest its likely not related to the disk speed. you might be able to improve the disk performace by changeing the chache mode but unless you are seeing io wait that is just an optimisation to try later.
when you are logged into the vm have you tried ssh again via localhost to determin if the long login time is related to the network or the vm.
if its related to the network it will be fast over localhost if its related to the vm, e.g. because of disk, cpu load, memory load or ssh server configuration then the local ssh will be slow.
The Cinder Volume docker is running on the Compute Host and Cinder is
using
the filesystem as backend.
BR /Giuseppe
On Fri, 12 Jul 2019 at 17:41, Slawek Kaplonski <skaplons@redhat.com> wrote:
Hi,
I suspect some problems with name resolution. Can you check whether you also see such a delay when running e.g. "sudo" commands after you ssh to the instance?

> On 12 Jul 2019, at 16:23, Brian Haley <haleyb.dev@gmail.com> wrote:
>
> On 7/12/19 9:57 AM, Giuseppe Sannino wrote:
>> "SSH connection to the VM is quite slow". It takes around 20 seconds
>> for me to log into the VM (Ubuntu, CentOS, whatever).
>
> But once logged in, things are OK? For example, an scp stalls the same
> way, but the transfer is fast?
>
>> [ssh -vvv output and observations trimmed; see the original post above]
>> 10 to 15 seconds later
>
> What is sshd doing at this time? Have you tried enabling debug or
> running tcpdump when a new connection is attempted? At first glance I'd
> say it's a DNS issue since it eventually succeeds; the logs would help
> to point in a direction.
>
> -Brian

—
Slawek Kaplonski
Senior software engineer
Red Hat
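Brian's DNS theory is easy to test from inside the guest itself. Below is a minimal sketch (Python; the `127.0.0.1` address is just a placeholder for the client IP that sshd would look up, and the 5-second budget is an arbitrary threshold) that times the same kind of reverse lookup sshd performs when UseDNS is enabled:

```python
import socket
import time

def timed_reverse_lookup(ip, budget=5.0):
    """Time a reverse DNS lookup of `ip`; return (name_or_None, seconds).

    sshd with UseDNS enabled performs exactly this kind of lookup on the
    client address before the session opens, so an unreachable resolver
    shows up here as a multi-second stall.
    """
    start = time.monotonic()
    try:
        name = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        name = None  # no PTR record; a fast failure is fine
    except OSError:
        name = None  # resolver unreachable; this is the slow case
    elapsed = time.monotonic() - start
    if elapsed > budget:
        print(f"reverse lookup of {ip} took {elapsed:.1f}s - suspect DNS")
    return name, elapsed

name, elapsed = timed_reverse_lookup("127.0.0.1")
print(name, round(elapsed, 3))
```

If this stalls for roughly the same 10-15 seconds as the SSH login, the resolver configured in the guest's /etc/resolv.conf is the likely culprit.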
For completeness - always also ensure that 'GSSAPIAuthentication' is set to 'no', because in the default config it might require DNS lookups too. (Obviously you can run GSSAPIAuthentication and avoid DNS lookups by configuring GSSAPI appropriately ;-) )

Kind regards,
Radek

pon., 15 lip 2019 o 19:14 Giuseppe Sannino <km.giuseppesannino@gmail.com> napisał(a):

> Hi Alex,
> yeah, it was the first suspect, also based on various research on the
> internet. I currently have "UseDNS" set to no, but the issue still
> persists.
>
> /G
>
> On Mon, 15 Jul 2019 at 18:46, Alex Schultz <aschultz@redhat.com> wrote:
>
>> Are you having DNS issues? Historically, if you have UseDNS set to true
>> and your DNS servers are bad, it can be slow to connect as it tries to
>> do the reverse lookup.
>>
>> On Mon, Jul 15, 2019 at 10:40 AM Giuseppe Sannino
>> <km.giuseppesannino@gmail.com> wrote:
>>
>>> Hi Sean,
>>> the ssh to localhost is slow as well. "telnet localhost" is also slow.
>>>
>>> /Giuseppe
>>>
>>> On Mon, 15 Jul 2019 at 18:18, Sean Mooney <smooney@redhat.com> wrote:
>>>
>>>> On Mon, 2019-07-15 at 16:29 +0200, Giuseppe Sannino wrote:
>>>>> Hi! First of all, thanks for the fast replies. I do appreciate that.
>>>>> I did some more tests trying to figure out the issue:
>>>>> - Set UseDNS to "no" in sshd_config => issue persists
>>>>> - Installed and configured telnet => telnet login is slow as well
>>>>> From "top" or "auth.log" nothing specific popped up. I can see sshd
>>>>> taking some CPU for a short while, but nothing more than that.
>>>>> Once logged in, the VM is not too slow; the CLI doesn't get stuck or
>>>>> similar. One thing worth mentioning: the write throughput on the disk
>>>>> seems a bit slow, 67 MB/s vs. around 318 MB/s for another VM running
>>>>> on a "datacenter" OpenStack installation.
>>>>
>>>> Unless you see iowait in the guest, it's likely not related to the
>>>> disk speed. You might be able to improve the disk performance by
>>>> changing the cache mode, but unless you are seeing iowait that is just
>>>> an optimisation to try later.
>>>>
>>>> When you are logged into the VM, have you tried ssh again via
>>>> localhost to determine whether the long login time is related to the
>>>> network or to the VM? If it's related to the network, it will be fast
>>>> over localhost; if it's related to the VM, e.g. because of disk, CPU
>>>> load, memory load or ssh server configuration, then the local ssh will
>>>> be slow too.
>>>>
>>>>> The Cinder volume docker is running on the compute host and Cinder is
>>>>> using the filesystem as backend.
>>>>>
>>>>> BR
>>>>> /Giuseppe
>>>>>
>>>>> [earlier quotes from Slawek Kaplonski and Brian Haley trimmed]
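For reference, the two sshd_config knobs discussed in this thread would look like the fragment below (a sketch of the relevant lines only; sshd must be restarted for changes to take effect):

```
# /etc/ssh/sshd_config (in the guest VM)

# Skip the reverse DNS lookup of the connecting client address
UseDNS no

# Skip GSSAPI/Kerberos negotiation, which may also trigger DNS lookups
GSSAPIAuthentication no
```

As the thread shows, these settings did not resolve this particular case, but they rule out the most common causes of a fixed multi-second SSH login stall.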
Hi Radoslaw, all,
I applied the GSSAPIAuthentication setting along with the UseDNS one. No luck.

One thing I want to share, and that maybe points in a "no network" direction: I disabled the execution of the motd scripts during the login phase via "chmod -x /etc/update-motd.d/*", and that reduced the login time from ~12 s down to ~3 s. To me, it looks like the issue is with read/write access to the guest VM filesystem, which takes quite a while.

One more thing: in the tcpdump trace taken during the SSH login, I can see that when the procedure gets stuck (at that "pledge: network" log in ssh) it is the server (so the guest OS) that takes time to reply with a server packet. However, I can see a TCP ACK sent back from the server immediately. This makes me quite sure that, from a networking point of view, things are handled with no delay.

/G

On Mon, 15 Jul 2019 at 19:27, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:

> For completeness - always also ensure that 'GSSAPIAuthentication' is set
> to 'no', because in the default config it might require DNS lookups too.
>
> [rest of quoted thread trimmed]
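Giuseppe's motd finding can be narrowed down per script rather than disabling them all at once. A sketch of a helper for this (Python; `time_scripts` is a hypothetical name, and /etc/update-motd.d is Ubuntu's convention for motd scripts, which run on every interactive login):

```python
import os
import subprocess
import time

def time_scripts(directory):
    """Run each executable file in `directory` and record its wall time.

    Pointing this at /etc/update-motd.d inside the guest shows which
    motd script is responsible for most of the login delay.
    """
    timings = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        start = time.monotonic()
        # Discard output; only the elapsed time matters here.
        subprocess.run([path], stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        timings[name] = time.monotonic() - start
    return timings

# Example usage on an Ubuntu guest:
# for name, secs in time_scripts("/etc/update-motd.d").items():
#     print(f"{name}: {secs:.2f}s")
```

A script that spends seconds here (e.g. one that phones home for update or news checks) would explain most of the ~12 s to ~3 s difference Giuseppe observed.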
On 2019-07-15 16:29:47 +0200 (+0200), Giuseppe Sannino wrote:
[...]
> Once logged in, the VM is not too slow; the CLI doesn't get stuck or
> similar. One thing worth mentioning: the write throughput on the disk
> seems a bit slow, 67 MB/s vs. around 318 MB/s for another VM running
> on a "datacenter" OpenStack installation.
[...]

Have you checked dmesg in the guest instance to see if there is any I/O problem reported by the kernel? The login process will block on updating /var/log/wtmp or similar, so if writes to whatever backing store that lives on are delayed, that can explain the symptom.
--
Jeremy Stanley
Ciao Jeremy,
dmesg reports no error on the guest. syslog and auth.log look clean as well.

/G

On Mon, 15 Jul 2019 at 18:30, Jeremy Stanley <fungi@yuggoth.org> wrote:

> Have you checked dmesg in the guest instance to see if there is any I/O
> problem reported by the kernel? The login process will block on updating
> /var/log/wtmp or similar, so if writes to whatever backing store that
> lives on are delayed, that can explain the symptom.
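Jeremy's stalled-writes theory can be quantified in the guest with a small synchronous-write probe. A sketch (Python; the sizes are arbitrary choices, and the fsync-per-chunk pattern is an assumption meant to approximate small synchronous login bookkeeping writes such as the wtmp update, rather than the large buffered writes a plain copy would measure):

```python
import os
import tempfile
import time

def sync_write_throughput(nbytes=4 * 1024 * 1024, chunk=64 * 1024):
    """Write `nbytes` to a temp file with an fsync after each chunk;
    return the achieved throughput in MB/s.
    """
    buf = b"\0" * chunk
    with tempfile.NamedTemporaryFile() as f:
        start = time.monotonic()
        written = 0
        while written < nbytes:
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())  # force each chunk to the backing store
            written += chunk
        elapsed = time.monotonic() - start
    return (written / (1024 * 1024)) / elapsed

print(f"{sync_write_throughput():.1f} MB/s")
```

If this number is dramatically lower inside the VM than on the compute host itself, the filesystem-backed Cinder volume path (rather than the network) is the bottleneck, which would fit the motd and tcpdump observations above.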
participants (8)
- Alex Schultz
- Brian Haley
- Giuseppe Sannino
- Jeremy Stanley
- Mark Goddard
- Radosław Piliszek
- Sean Mooney
- Slawek Kaplonski