<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">We tested this in dev and qa and then implemented in production and it did make a difference, but 2 weeks later we started seeing an issue, first in dev, and then in qa. In syslog we see neutron-linuxbridge-agent.service stopping and starting[1].
In neutron-linuxbridge-agent.log we see a rootwrap error[2]: “Exception: Failed to spawn rootwrap process.”<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If I comment out ‘root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"’ and restart neutron services then the error goes away.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">How can I use the root_helper_daemon setting without creating this new error?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Message with logs got moderated so logs are here: <o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">http://paste.openstack.org/show/782622/<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><b>From:</b> Chris Apsey <bitskrieg@bitskrieg.net> <br>
<b>Sent:</b> Friday, September 27, 2019 9:34 AM<br>
<b>To:</b> Albert Braden <albertb@synopsys.com><br>
<b>Cc:</b> openstack-discuss@lists.openstack.org<br>
<b>Subject:</b> Re: Port creation times out for some VMs in large group<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Albert,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Do this: <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__cloudblog.switch.ch_2017_08_28_starting-2D1000-2Dinstances-2Don-2Dswitchengines_&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=izFVaT90rpGh_939STWbrLr4vnSwK2KBtqFKv_J8Gfs&s=UO9bh6wArCWKbxRWfWjt8egNaw9cxrbDwCZ8-2t0GmE&e=">
https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/</a><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">The problem will go away. I'm of the opinion that daemon mode for rootwrap should be the default since the performance improvement is an order of magnitude, but privsep may obviate that concern once its fully implemented.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Either way, that should solve your problem.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<div>
<p class="MsoNormal">r<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Chris Apsey<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">On Friday, September 27, 2019 12:17 PM, Albert Braden <<a href="mailto:Albert.Braden@synopsys.com">Albert.Braden@synopsys.com</a>> wrote:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p>When I create 100 VMs in our prod cluster:<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>openstack server create --flavor s1.tiny --network it-network --image cirros-0.4.0-x86_64 --min 100 --max 100 alberttest<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>Most of them build successfully in about a minute. 5 or 10 will stay in BUILD status for 5 minutes and then fail with “ BuildAbortException: Build of instance <UUID> aborted: Failed to allocate the network(s), not rescheduling.”<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>If I build smaller numbers, I see less failures, and no failures if I build one at a time. This does not happen in dev or QA; it appears that we are exhausting a resource in prod. I tried reducing various config values in dev but am not able to duplicate
the issue. The neutron servers don’t appear to be overloaded during the failure.<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>What config variables should I be looking at?<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>Here are the relevant log entries from the HV:<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>2019-09-26 10:10:43.001 57008 INFO os_vif [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:8b:45:07,bridge_name='brq49cbe55d-51',has_traffic_filtering=True,id=18f4e419-b19c-4b62-b6e4-152ec78e72bc,network=Network(49cbe55d-5188-4183-b5ad-e65f9b46f8f2),plugin='linux_bridge',port_profile=<?>,preserve_on_delete=False,vif_name='tap18f4e419-b1')<o:p></o:p></p>
<p>2019-09-26 10:15:44.029 57008 WARNING nova.virt.libvirt.driver [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] [instance: dc58f154-00f9-4c45-8986-94b10821cbc9]
Timeout waiting for [('network-vif-plugged', u'18f4e419-b19c-4b62-b6e4-152ec78e72bc')] for instance with vm_state building and task_state spawning.: Timeout: 300 seconds<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>More logs and data:<o:p></o:p></p>
<p> <o:p></o:p></p>
<p><a href="http://paste.openstack.org/show/779524/">http://paste.openstack.org/show/779524/</a><o:p></o:p></p>
<p> <o:p></o:p></p>
</div>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>