[Openstack-operators] attaching network cards to VMs taking a very long time

Radu Popescu | eMAG, Technology radu.popescu at emag.ro
Wed May 23 10:08:18 UTC 2018


Hi,

Actually, I didn't know about that option. I'll enable it right now.
Testing is done every morning at about 4:00 AM, so I'll know tomorrow morning whether it changed anything.

Thanks,
Radu

On Tue, 2018-05-22 at 15:30 +0200, Saverio Proto wrote:

Sorry, the email went out incomplete.

Read this:

https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/


Make sure that OpenStack rootwrap is configured to work in daemon mode.
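For reference, daemon mode is typically enabled with something like the
following; file locations vary by deployment, so treat this as a sketch:

  # nova.conf on the compute nodes
  [DEFAULT]
  use_rootwrap_daemon = True

  # openvswitch_agent.ini (or the relevant neutron L2 agent config)
  [agent]
  root_helper_daemon = sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf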


Thank you


Saverio



2018-05-22 15:29 GMT+02:00 Saverio Proto <zioproto at gmail.com>:

Hello Radu,


Do you have OpenStack rootwrap configured to work in daemon mode?

Please read this article:


2018-05-18 10:21 GMT+02:00 Radu Popescu | eMAG, Technology <radu.popescu at emag.ro>:

Hi,


So, nova says the VM is ACTIVE and it actually boots, but with no
network. We set some metadata that we use later on, and we have
cloud-init for different tasks.

So the VM is up and the OS is running, but the network only starts
working after a random amount of time, which can reach around 45
minutes. It's not happening to all VMs in that test (around 300), but it
is happening to a fair amount - around 25%.


I can see the callback coming a few seconds after the neutron
openvswitch agent says it has completed the setup. My question is: why
is it taking the neutron openvswitch agent so long to configure the
port? I can see the port up in both the host OS and openvswitch. I would
assume it's doing the whole namespace and iptables setup. But still, 30
minutes? That seems like a lot!
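For reference, that check looks roughly like this on the compute node
(the port ID and tap device name are placeholders):

  # is the tap device attached to the integration bridge?
  ovs-vsctl list-ports br-int | grep tap<port-id-prefix>

  # what does neutron report for the port?
  openstack port show <port-id> -c status -c binding_host_id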


Thanks,
Radu


On Thu, 2018-05-17 at 11:50 -0400, George Mihaiescu wrote:


We have other scheduled tests that perform end-to-end checks (assign a
floating IP, ssh, ping outside) and have never had an issue.

I think we turned it off because the callback code was initially buggy
and nova would wait forever while things were in fact ok, but I'll
change "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300"
and run another large test, just to confirm.
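For reference, both options live in nova.conf on the compute nodes; a
minimal sketch:

  [DEFAULT]
  # fail the boot instead of going ACTIVE if the event never arrives
  vif_plugging_is_fatal = True
  # seconds to wait for neutron's network-vif-plugged event
  vif_plugging_timeout = 300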


We usually run these large tests after a version upgrade to test the
APIs under load.




On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann <mriedemos at gmail.com> wrote:


On 5/17/2018 9:46 AM, George Mihaiescu wrote:


and large rally tests of 500 instances complete with no issues.



Sure, except you can't ssh into the guests.


The whole reason the vif-plugging fatal/timeout and callback code exists
is that the upstream CI was unstable without it. The server would report
as ACTIVE but the ports weren't wired up, so ssh would fail. Having an
ACTIVE guest that you can't actually do anything with is kind of
pointless.
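The callback in question is neutron calling nova's external events API
once the L2 agent has wired the port; roughly this request (UUIDs are
placeholders):

  POST /v2.1/os-server-external-events
  {
    "events": [
      {
        "name": "network-vif-plugged",
        "server_uuid": "<instance-uuid>",
        "tag": "<port-uuid>",
        "status": "completed"
      }
    ]
  }

Nova holds the boot in the plugging phase until this event arrives or
vif_plugging_timeout expires.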



_______________________________________________

OpenStack-operators mailing list

OpenStack-operators at lists.openstack.org

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
