<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; ">
<div>
<div>Hi all,</div>
<div><br>
</div>
<div>In case anyone else runs into this problem, here's an update: we've solved most of it.</div>
<div><br>
</div>
<div>Long story short: when Neutron's get_security_groups API is called with an admin context, it attempts to fetch
<i>all</i> security groups. Because we have so many security groups, this effectively hangs neutron-server. We did three things to mitigate or fix this:</div>
<div><br>
</div>
<ol>
<li>If neutron.db.securitygroups_db.SecurityGroupDbMixin#get_security_groups is called in a way that we know will cause it to hang, we fail fast and return an error. This lets "normal" calls to that method complete without issue. The obvious downside
is that the caller gets an error, but the caller would previously have gotten a timeout, so this is no worse and neutron-server no longer hangs. We don't intend to upstream this, as it is a bit of a hack.</li><li>In neutron.db.securitygroups_db.SecurityGroupDbMixin#_get_security_groups_on_port (which is called when creating a port), we made sure get_security_groups is called with a proper tenant_id filter. Previously it wasn't, and because this method is called
with an admin context from nova-scheduler, it would try to fetch all security groups, which it doesn't need.</li><li>We found that this commit (which isn't in a maintenance release yet) fixed one of the problem areas:
<ul>
<li><a href="https://github.com/openstack/nova/commit/19fdaa225abd007a13cd38c742e27c5ee620186c">https://github.com/openstack/nova/commit/19fdaa225abd007a13cd38c742e27c5ee620186c</a></li><li><a href="https://review.openstack.org/#/c/30048/">https://review.openstack.org/#/c/30048/</a></li><li>We cherry-picked it and are now applying it as a patch via Anvil. It has already been backported to stable/havana, so once it gets into a maintenance release, we'll be able to remove the patch.</li></ul>
</li></ol>
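<div>A minimal sketch of fixes #1 and #2, assuming a simplified in-memory model. <code>Context</code>, <code>UnscopedQueryError</code>, and these versions of <code>get_security_groups</code> and <code>get_security_groups_on_port</code> are hypothetical stand-ins, not Neutron's actual code or signatures. The "admin context with no tenant_id filter" check is an assumed approximation of "called in a way that we know will cause it to hang".</div>
<pre>
```python
from dataclasses import dataclass

# Hypothetical stand-ins: names and signatures are illustrative, not Neutron's.
@dataclass
class Context:
    tenant_id: str
    is_admin: bool = False


class UnscopedQueryError(Exception):
    """Raised instead of running a query that would scan every tenant's groups."""


def get_security_groups(context, groups, filters=None):
    filters = filters or {}
    if context.is_admin:
        # Fix #1 (fail fast): an admin call without a tenant_id filter would
        # load every security group, so return an error instead of hanging.
        if "tenant_id" not in filters:
            raise UnscopedQueryError("admin query has no tenant_id filter")
        tenants = set(filters["tenant_id"])
    else:
        # Non-admin callers are implicitly limited to their own tenant.
        own = [context.tenant_id]
        tenants = set(own).intersection(filters.get("tenant_id", own))
    return [g for g in groups if g["tenant_id"] in tenants]


def get_security_groups_on_port(context, groups, port):
    # Fix #2: always pass the port's tenant_id, even under an admin context,
    # so the lookup stays scoped to one tenant and never trips the guard.
    return get_security_groups(
        context, groups, filters={"tenant_id": [port["tenant_id"]]})


# Illustrative data at the scale from the original report: 1000 tenants x 10 groups.
groups = [{"id": f"sg-{t}-{g}", "tenant_id": f"tenant-{t}"}
          for t in range(1000) for g in range(10)]
admin = Context("admin", is_admin=True)
port = {"id": "port-1", "tenant_id": "tenant-7"}

print(len(get_security_groups_on_port(admin, groups, port)))  # 10
try:
    get_security_groups(admin, groups)  # unscoped admin call
except UnscopedQueryError as exc:
    print("rejected:", exc)
```
</pre>
<div>With the tenant_id filter, the port path only touches that tenant's groups; the unfiltered admin call is rejected up front rather than walking every group.</div>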
<div>We think #2 still exists as an upstream bug in master. Will investigate further and submit a bug and patch if someone else hasn't already addressed it.</div>
<div><br>
</div>
<div>/Craig J</div>
</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION">
<div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">
<span style="font-weight:bold">From: </span>Mike Dorman <<a href="mailto:mdorman@godaddy.com">mdorman@godaddy.com</a>><br>
<span style="font-weight:bold">Date: </span>Wednesday, February 5, 2014 5:36 PM<br>
<span style="font-weight:bold">To: </span>"<a href="mailto:openstack@lists.openstack.org">openstack@lists.openstack.org</a>" <<a href="mailto:openstack@lists.openstack.org">openstack@lists.openstack.org</a>><br>
<span style="font-weight:bold">Subject: </span>[Openstack] [neutron] neutron-server iterating over all security groups, not just those in the project<br>
</div>
<div><br>
</div>
<div>
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; ">
<div>We're seeing an issue where neutron-server (Havana) iterates over all security groups (with an individual SELECT query for each), rather than just the security groups in the tenant. We can trigger this by creating a port using the default security group.
If we specify no security groups, or a specific security group, it works fine.</div>
<div><br>
</div>
<div>We have ~1000 tenants and 10 security groups in each tenant in this environment. So this ultimately results in 10k SQL queries, which tanks neutron-server for a few minutes. Note that all the tenants are in the same network.</div>
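<div>To illustrate why the per-group pattern hurts at this scale, here is a small sqlite3 sketch (not Neutron's ORM code) of the N+1 query pattern versus a single tenant-scoped JOIN. The <code>sg</code>/<code>sg_rule</code> schema, the <code>CountingDB</code> wrapper, and the helper functions are hypothetical, and each group gets one rule to keep the example small.</div>
<pre>
```python
import sqlite3


class CountingDB:
    """Hypothetical helper: wraps a sqlite3 connection and counts SELECTs issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def select(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_tenants=1000, groups_per_tenant=10):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sg (id TEXT PRIMARY KEY, tenant_id TEXT)")
    conn.execute("CREATE TABLE sg_rule (id INTEGER PRIMARY KEY, sg_id TEXT)")
    conn.execute("CREATE INDEX sg_tenant ON sg (tenant_id)")
    conn.execute("CREATE INDEX sg_rule_sg ON sg_rule (sg_id)")
    groups = [(f"sg-{t}-{g}", f"tenant-{t}")
              for t in range(n_tenants) for g in range(groups_per_tenant)]
    conn.executemany("INSERT INTO sg VALUES (?, ?)", groups)
    # One rule per group keeps the example small.
    conn.executemany("INSERT INTO sg_rule (sg_id) VALUES (?)",
                     [(sg_id,) for sg_id, _ in groups])
    conn.commit()
    return CountingDB(conn)


def rules_per_group(db, tenant_id=None):
    """N+1 pattern: list the groups, then one SELECT per group for its rules."""
    if tenant_id is None:  # unscoped, as with an unfiltered admin context
        group_ids = db.select("SELECT id FROM sg")
    else:
        group_ids = db.select("SELECT id FROM sg WHERE tenant_id = ?", (tenant_id,))
    rules = []
    for (sg_id,) in group_ids:
        rules.extend(db.select("SELECT id FROM sg_rule WHERE sg_id = ?", (sg_id,)))
    return rules


def rules_joined(db, tenant_id):
    """Scoped and batched: a single JOIN restricted to one tenant."""
    return db.select(
        "SELECT r.id FROM sg_rule r JOIN sg g ON r.sg_id = g.id WHERE g.tenant_id = ?",
        (tenant_id,))


db = make_db()
unscoped_rules = rules_per_group(db)
unscoped_queries = db.queries

db.queries = 0
scoped_rules = rules_per_group(db, "tenant-42")
scoped_queries = db.queries

db.queries = 0
joined_rules = rules_joined(db, "tenant-42")
joined_queries = db.queries

print(unscoped_queries, scoped_queries, joined_queries)  # 10001 11 1
```
</pre>
<div>Unscoped, the per-group loop issues 1 + 10,000 queries; scoped to one tenant it issues 11, and the JOIN issues just 1.</div>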
<div><br>
</div>
<div>We're still trying to track down where in the code this happens, but I've been able to trace the SQL queries up to the point where the iteration starts: <a href="http://pastebin.com/ZkP5idkJ">http://pastebin.com/ZkP5idkJ</a></div>
<div><br>
</div>
<div>You can see that the first two queries fetch the groups/rules for the specific tenant only. After that, the same queries are repeated, but for groups/rules in
<span style="font-weight: bold; ">all</span> tenants.</div>
<div><br>
</div>
<div>We will continue looking into it to see what we can find, but any suggestions or ideas would be appreciated.</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Mike</div>
</div>
</div>
</span>
</body>
</html>