<div dir="ltr">><span style="font-size:12.8000001907349px"> If you had created a second network </span><span style="font-size:12.8000001907349px">and subnet this would have been dropped (different broadcast domain).</span><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">Well that update wouldn't have been allowed at the API. You can't use a fixed IP from a subnet on a network that your port isn't attached to. Changing a neutron port to a different network is not what we are talking about here.</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">></span><span style="font-size:12.8000001907349px"> I said that's a bad design because other things can cause it to go offline, for </span><span style="font-size:12.8000001907349px">example:</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">Yet people do it anyway, which is why I referenced the EC2 example. People can deal with outages caused by unexpected failures. The outage we are talking about is part of a normal API call and it doesn't make any sense to the user. </span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">></span><span style="font-size:12.8000001907349px"> </span><span style="font-size:12.8000001907349px">If it </span><span style="font-size:12.8000001907349px">takes 10 minutes for them to re-create their instance elsewhere that cannot be </span><span style="font-size:12.8000001907349px">blamed on neutron, even if it was our API call that caused it to go offline.</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">The outage can still be blamed on Neutron. 
What you are implying here is that instead of improving the usability of Neutron, we just give up and tell users that they should have known better. I don't like supporting a project with that kind of approach to usability. It leads to unhappy users and it reflects poorly on the quality of the project. </span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">></span><span style="font-size:12.8000001907349px">The difference in a port IP change API call is that it requires action on the </span><span style="font-size:12.8000001907349px">VM's part that neutron can't trigger immediately. </span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">We know why these are different because we understand how Neutron works internally, but there is no reason to think that a user would know why these are different. From a user's perspective, one API call to change an IP (floating IP) works as expected; the other has a huge, variable delay (port IP).</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">></span><span style="font-size:12.8000001907349px">How is warning the user about </span><span style="font-size:12.8000001907349px">this a bad thing?</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">We can and should make a note of this behavior, but it's not enough IMO. Users don't read the documentation for these kinds of things until they hit an issue. We can update the Neutron server to return the DHCP interval to the Neutron client and update the client</span><span style="font-size:12.8000001907349px"> to output these warnings, but it's still a bit late at that point since we are telling the user, "You just broke your VM for 0-$(1/2 dhcp lease) hours. 
If you need it sooner, hopefully you have console access or are fine with a forced restart." </span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">></span><span style="font-size:12.8000001907349px">There is no delay in the API call here, the port was updated just as the user </span><span style="font-size:12.8000001907349px">requested.</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">I never said there was a delay in the API call. I am talking about how long it takes for that to take effect on the data plane. For it to take full effect, the VMs need to get the information from the DHCP server. The long default lease we have now means they won't get the information for hours on average, which is the long delay I am referring to.</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><br></div><div>><span style="font-size:12.8000001907349px">And adding a DHCP option to tell them to renew more frequently doesn't fix the </span><span style="font-size:12.8000001907349px">problem, it only lessens it to ~(interval/2) - </span><span style="font-size:12.8000001907349px">that might not be acceptable to </span><span style="font-size:12.8000001907349px">users and they need to know the danger.</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">In the very first email in this thread, I pointed out that this is only reducing the time. I don't think that was ever up for debate. 
</span><span style="font-size:12.8000001907349px">The danger exists already and warning them with whatever mechanism you had in mind is orthogonal</span><span style="font-size:12.8000001907349px"> to my proposal to reduce the downtime.</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">></span><span style="font-size:12.8000001907349px">This is the one point I've been trying </span><span style="font-size:12.8000001907349px">to get across in this whole discussion - these are advanced options that users </span><span style="font-size:12.8000001907349px">need to take caution with, neutron can only do so much.</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">Neutron is completely responsible for the management of the DHCP server in this case. We have a lot of room for improvement here. I don't think we should throw in the towel yet.</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 3, 2015 at 8:53 AM, Brian Haley <span dir="ltr"><<a href="mailto:brian.haley@hp.com" target="_blank">brian.haley@hp.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 02/03/2015 05:10 AM, Kevin Benton wrote:<br>
>>The unicast DHCP will make it to the "wire", but if you've renumbered the<br>
> subnet either a) the DHCP server won't respond because its IP has changed as<br>
> well; or b) the DHCP server won't respond because there is no mapping for the VM<br>
> on its old subnet.<br>
><br>
> We aren't changing the DHCP server's IP here. The process that I saw was to add<br>
> a subnet and start moving VMs over. It's not 'b' either, because the server<br>
> generates a DHCPNAK in response, which will immediately cause the client to<br>
> release/renew. I have verified this behavior already and recorded a packet<br>
> capture for you.[1]<br>
><br>
> In the capture, the renewal value is 4 seconds. I captured one renewal before<br>
> the IP address change from 99.99.99.5 to 10.0.0.25 took place. You can see on<br>
> the next renewal, the DHCP server immediately generates a NAK. The client then<br>
> releases its address, requests a new one, assigns it and ACKs within a couple of<br>
> seconds.<br>
<br>
</span>Thanks for the trace. So one thing I noticed is that this unicast DHCP only got<br>
to the server since you created a second subnet on this network (dest MAC of<br>
packet was that of same router interface). If you had created a second network<br>
and subnet this would have been dropped (different broadcast domain). These<br>
little differences are things users need to know because they lead to heads<br>
banging on desks :(<br>
<span class=""><br>
>>This would happen if the AZ their VM was in went offline as well, at which<br>
> point they would change their design to be more cloud-aware than it was. Let's<br>
> not heap all the blame on neutron - the user is tasked with vetting that<br>
> their decisions meet the requirements they desire by thoroughly testing it.<br>
><br>
> An availability zone going offline is not the same as an API operation that<br>
> takes a day to apply. In an internal cloud, maintenance for AZs can be<br>
> advertised and planned around by tenants running single-AZ services. Even if you<br>
> want to reference a public cloud, look how much of the Internet breaks when<br>
> Amazon's us-east-1a or us-east-1d AZs have issues. Even though people are<br>
> supposed to be bringing cattle to the cloud, a huge portion already have pets<br>
> that they are attached to or that they can't convert into cattle.<br>
<br>
</span>You completely missed the context of my reply, Kevin - an AZ failure is not a<br>
planned event. You said people bring pets along, and rebooting them is painful.<br>
I said that's a bad design because other things can cause it to go offline, for<br>
example:<br>
<br>
1. Compute node failure<br>
2. Network node failure<br>
3. Router/switch failure<br>
4. Internet failure<br>
...<br>
99. API call<br>
<br>
All the user knows is they can't reach their VM - the cause doesn't matter when<br>
they can't sell their widgets to customers because their site is down. If it<br>
takes 10 minutes for them to re-create their instance elsewhere that cannot be<br>
blamed on neutron, even if it was our API call that caused it to go offline.<br>
<span class=""><br>
> If our floating IP 'associate' action took 12 hours to take effect on a running<br>
> instance, would telling users to reboot their instances to apply floating IPs<br>
> faster be okay? I would certainly heap the blame on Neutron there.<br>
<br>
</span>The difference in a port IP change API call is that it requires action on the<br>
VM's part that neutron can't trigger immediately. It's still asynchronous like a<br>
floating IP call, but the delay is typically going to be longer. All we can say<br>
is it will take from (0 -> interval) seconds. How is warning the user about<br>
this a bad thing?<br>
<span class=""><br>
>>How about a big (*) next to all the things that could cause issues? :)<br>
><br>
> You want to put it next to all of the API calls to put the burden on the users.<br>
> I want to put it next to the DHCP renewal interval in the config files to put<br>
> the burden on the operators. :)<br>
><br>
> (*) Increasing this value will increase the delay between API calls and when<br>
> they take effect on the data plane for any that depend on DHCP to relay the<br>
> information. (e.g. port IP/subnet changes, port dhcp option changes, subnet<br>
> gateways, subnet routes, subnet DNS servers, etc)<br>
<br>
</span>There is no delay in the API call here, the port was updated just as the user<br>
requested. Since they can't see into my config file (unless they look at their<br>
lease info or run a tcpdump trace) they are essentially making a blind change<br>
that immediately affects their instance.<br>
<br>
And adding a DHCP option to tell them to renew more frequently doesn't fix the<br>
problem, it only lessens it to ~(interval/2) - that might not be acceptable to<br>
users and they need to know the danger. This is the one point I've been trying<br>
to get across in this whole discussion - these are advanced options that users<br>
need to take caution with, neutron can only do so much.<br>
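<br>
To put numbers on the "(0 -> interval)" and "~(interval/2)" figures discussed in this exchange, here is a minimal Python sketch. It assumes the RFC 2131 default timers (T1 at 50% of the lease, T2 at 87.5%) and assumes the API call lands at a uniformly random point in the renewal cycle. The function names are made up for illustration and are not part of Neutron or dnsmasq.<br>

```python
# Sketch: how long a VM can keep using a stale address after a port update.
# Assumption: RFC 2131 defaults apply when the server sends no explicit timers,
# i.e. T1 (renewal, unicast) = 50% of the lease and T2 (rebinding, broadcast) = 87.5%.


def default_timers(lease_s):
    """Return (T1, T2) in seconds for a lease of lease_s seconds."""
    return 0.5 * lease_s, 0.875 * lease_s


def stale_window(lease_s, renewal_s=None):
    """Return (worst_case, average) seconds until the next renewal attempt.

    renewal_s is an explicit renewal time, such as the proposed option 58
    value; when omitted, the default T1 is used. The average assumes the
    API call happens at a uniformly random point within the renewal cycle.
    """
    t1 = renewal_s if renewal_s is not None else default_timers(lease_s)[0]
    return t1, t1 / 2


if __name__ == "__main__":
    print(stale_window(86400))                 # 1-day lease, default T1
    print(stale_window(86400, renewal_s=240))  # 1-day lease, 240 s renewal
    print(default_timers(480))                 # 8-minute lease
```

Under those assumptions, the 1-day default lease gives a worst case of 12 hours and an average of 6 hours, while an explicit 240-second renewal gives 4 minutes and 2 minutes. An 8-minute lease puts T2 at 420 seconds, which is the 7-minute figure quoted earlier in the thread.<br>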
<span class="HOEnZb"><font color="#888888"><br>
-Brian<br>
</font></span><span class="im HOEnZb"><br>
<br>
> 1. <a href="http://paste.openstack.org/show/166048/" target="_blank">http://paste.openstack.org/show/166048/</a><br>
><br>
><br>
> On Mon, Feb 2, 2015 at 8:57 AM, Brian Haley <<a href="mailto:brian.haley@hp.com">brian.haley@hp.com</a>><br>
</span><div class="HOEnZb"><div class="h5">> wrote:<br>
><br>
> Kevin,<br>
><br>
> I think we are finally converging. One of the points I've been trying to make<br>
> is that users are playing with fire when they start playing with some of these<br>
> port attributes, and given the tool we have to work with (DHCP), the<br>
> instantiation of these changes cannot be made seamlessly to a VM. That's life<br>
> in the cloud, and most of these things can (and should) be designed around.<br>
><br>
> On 02/02/2015 06:48 AM, Kevin Benton wrote:<br>
> >> The only thing this discussion has convinced me of is that allowing users<br>
> > to change the fixed IP address on a neutron port leads to a bad<br>
> > user-experience.<br>
> ><br>
> > Not as bad as having to delete a port and create another one on the same<br>
> > network just to change addresses though...<br>
> ><br>
> >> Even with an 8-minute renew time you're talking up to a 7-minute blackout<br>
> > (87.5% of lease time before using broadcast).<br>
> ><br>
> > I suggested 240 seconds renewal time, which is up to 4 minutes of<br>
> > connectivity outage. This doesn't have anything to do with lease time and<br>
> > unicast DHCP will work because the spoof rules allow DHCP client traffic<br>
> > before restricting to specific IPs.<br>
><br>
> The unicast DHCP will make it to the "wire", but if you've renumbered the subnet<br>
> either a) the DHCP server won't respond because its IP has changed as well; or<br>
> b) the DHCP server won't respond because there is no mapping for the VM on its<br>
> old subnet.<br>
><br>
> >> Most would have rebooted long before then, true? Cattle not pets, right?<br>
> ><br>
> > Only in an ideal world that I haven't encountered with customer deployments.<br>
> > Many enterprise deployments end up bringing pets along where reboots aren't<br>
> > always free. The time taken to relaunch programs and restore state can end<br>
> > up being 10 minutes+ if it's something like a VDI deployment or dev<br>
> > environment where someone spends a lot of time working on one VM.<br>
><br>
> This would happen if the AZ their VM was in went offline as well, at which point<br>
> they would change their design to be more cloud-aware than it was. Let's not<br>
> heap all the blame on neutron - the user is tasked with vetting that their<br>
> decisions meet the requirements they desire by thoroughly testing it.<br>
><br>
> >> Changing the lease time is just papering-over the real bug - neutron<br>
> > doesn't support seamless changes in IP addresses on ports, since it totally<br>
> > relies on the dhcp configuration settings a deployer has chosen.<br>
> ><br>
> > It doesn't need to be seamless, but it certainly shouldn't be useless.<br>
> > Connectivity interruptions can be expected with IP changes (e.g. I've seen<br>
> > changes in elastic IPs on EC2 interrupt connectivity to an instance for<br>
> > up to 2 minutes), but an entire day of downtime is awful.<br>
><br>
> Yes, I agree, an entire day of downtime is bad.<br>
><br>
> > One of the things I'm getting at is that a deployer shouldn't be choosing<br>
> > such high lease times and we are encouraging it with a high default. You are<br>
> > arguing for infrequent renewals to work around excessive logging, which is<br>
> > just an implementation problem that should be addressed with a patch to your<br>
> > logging collector (de-duplication) or to dnsmasq (don't log renewals).<br>
><br>
> My #1 deployment problem was around control-plane upgrade, not logging:<br>
><br>
> "During a control-plane upgrade or outage, having a short DHCP lease time will<br>
> take all your VMs offline. The old value of 2 minutes is not a realistic value<br>
> for an upgrade, and I don't think 8 minutes is much better. Yes, when DHCP is<br>
> down you can't boot a new VM, but as long as customers can get to their existing<br>
> VMs they're pretty happy and won't scream bloody murder."<br>
><br>
> >> Documenting a VM reboot is necessary, or even deprecating this (you won't<br>
> > like that) are sounding better to me by the minute.<br>
> ><br>
> > If this is an approach you really want to go with, then we should at least<br>
> > be consistent and deprecate the extra dhcp options extension (or at least<br>
> > the ability to update ports' dhcp options). Updating subnet attributes like<br>
> > gateway_ip, dns_nameservers, and host_routes should be thrown out as well. All<br>
> > of these things depend on the DHCP server to deliver updated information and<br>
> > are hindered by renewal times. Why discriminate against IP updates on a port?<br>
> > A failure to receive many of those other types of changes could result in<br>
> > just as severe a connection disruption.<br>
><br>
> How about a big (*) next to all the things that could cause issues? :) We've<br>
> completely "loaded the gun" exposing all these attributes to the general user<br>
> when only the network-aware power-user should be playing with them.<br>
><br>
> (*) Changing these attributes could cause VMs to become unresponsive for a long<br>
> period of time depending on the deployment settings, and should be used with<br>
> caution. Sometimes a VM reboot will be required to re-gain connectivity.<br>
><br>
> > In summary, the information the DHCP server gives to clients is not static.<br>
> > Unless we eliminate updates to everything in the Neutron API that results in<br>
> > different DHCP lease information, my suggestion is that we include a new<br>
> > option for the renewal interval and have the default set <5 minutes. We can<br>
> > leave the lease default to 1 day so the amount of time a DHCP server can be<br>
> > offline without impacting running clients can stay the same.<br>
><br>
> I'm fine with adding Option 58, even though it only lessens the effect of this<br>
> problem, doesn't truly fix it, and might not work with all clients (like in<br>
> Cirros).<br>
><br>
> -Brian<br>
><br>
> > On Fri, Jan 30, 2015 at 8:00 AM, Brian Haley <<a href="mailto:brian.haley@hp.com">brian.haley@hp.com</a>><br>
</div></div><div class="HOEnZb"><div class="h5">> > wrote:<br>
> ><br>
> > Kevin,<br>
> ><br>
> > The only thing this discussion has convinced me of is that allowing users to<br>
> > change the fixed IP address on a neutron port leads to a bad<br>
> > user-experience. Even with an 8-minute renew time you're talking up to a<br>
> > 7-minute blackout (87.5% of lease time before using broadcast). This is time<br>
> > that customers are paying for. Most would have rebooted long before then,<br>
> > true? Cattle not pets, right?<br>
> ><br>
> > Changing the lease time is just papering-over the real bug - neutron doesn't<br>
> > support seamless changes in IP addresses on ports, since it totally relies<br>
> > on the dhcp configuration settings a deployer has chosen. Bickering over the<br>
> > lease time doesn't fix that non-deterministic recovery for the VM.<br>
> > Documenting a VM reboot is necessary, or even deprecating this (you won't<br>
> > like that) are sounding better to me by the minute.<br>
> ><br>
> > Is there anyone else that has used, or has customers using, this part of the<br>
> > neutron API? Can they share their experiences?<br>
> ><br>
> > -Brian<br>
> ><br>
> ><br>
> > On 01/30/2015 07:26 AM, Kevin Benton wrote:<br>
> >>> But they will if we document it well, which is what Salvatore suggested.<br>
> >><br>
> >> I don't think this is a good approach, and it's a big part of why I started<br>
> >> this thread. Most of the deployers/operators I have worked with only read the<br>
> >> bare minimum documentation to get a Neutron deployment working and they<br>
> >> only adjust the settings necessary for basic functionality.<br>
> >><br>
> >> We have an overwhelming number of configuration options, and adding a note<br>
> >> specifying that a particular setting for DHCP leases has been optimized to<br>
> >> reduce logging at the cost of long downtimes during port IP address updates<br>
> >> is a waste of time and effort on our part.<br>
> >><br>
> >>> I think the current default value is also more indicative of something<br>
> >> you'd find in your house, or at work - i.e. stable networks.<br>
> >><br>
> >> Tenants don't care what the DHCP lease time is or that it matches what<br>
> >> they would see from a home router. They only care about connectivity.<br>
> >><br>
> >>> One solution is to disallow this operation.<br>
> >><br>
> >> I want this feature to be useful in deployments by default, not strip it<br>
> >> away. You can probably do this with /etc/neutron/policy.json without a<br>
> >> code change if you wanted to block it in a deployment like yours where you<br>
> >> have such a high lease time.<br>
> >><br>
> >>> Perhaps letting the user set it, but allow the admin to set the valid<br>
> >> range for min/max? And if they don't specify they get the default?<br>
> >><br>
> >> Tenants wouldn't have any reason to adjust this default. They would be even<br>
> >> less likely than the operator to know about this weird relationship between a<br>
> >> DHCP setting and the amount of time they lose connectivity after updating<br>
> >> their ports' IPs.<br>
> >><br>
> >>> It impacts anyone that hasn't changed from the default since July 2013<br>
> >> and later (Havana), since if they don't notice, they might get bitten by it.<br>
> >><br>
> >> Keep in mind that what I am suggesting with the lease-renewal-time would<br>
> >> be separate from the lease expiration time. The only difference that an<br>
> >> operator would see on upgrade (if using the defaults) is increased DHCP<br>
> >> traffic and more logs to syslog from dnsmasq. The lease time would still be<br>
> >> the same, so the downtime windows for DHCP agents would be maintained. That<br>
> >> is much less of an impact than many of the non-config changes we make<br>
> >> between cycles.<br>
> >><br>
> >> To clarify, even with the dhcp-renewal-time option I am proposing, you<br>
> >> are still opposed to setting it to anything low because of logging and the<br>
> >> ~24 bps background DHCP traffic per VM?<br>
> >><br>
> >> On Thu, Jan 29, 2015 at 7:11 PM, Brian Haley <<a href="mailto:brian.haley@hp.com">brian.haley@hp.com</a>><br>
</div></div><div class="HOEnZb"><div class="h5">> >> wrote:<br>
> >><br>
> >> On 01/29/2015 05:28 PM, Kevin Benton wrote:<br>
> >>>> How is Neutron breaking this? If I move a port on my physical switch to a<br>
> >>> different subnet, can you still communicate with the host sitting on it?<br>
> >>> Probably not since it has a view of the world (next-hop router) that no longer<br>
> >>> exists, and the network won't route packets for its old IP address to the new<br>
> >>> location. It has to wait for its current DHCP lease to tick down to the point<br>
> >>> where it will use broadcast to get a new one, after which point it will work.<br>
> >>><br>
> >>> That's not just moving to a different subnet. That's moving to a different<br>
> >>> broadcast domain. Neutron supports multiple subnets per network (broadcast<br>
> >>> domain). An address on either subnet will work. The router has two interfaces<br>
> >>> into the network, one on each subnet.[2]<br>
> >>><br>
> >>><br>
> >>>> Does it work on Windows VMs too? People run those in clouds too. The point is<br>
> >>> that if we don't know if all the DHCP clients will support it then it's a<br>
> >>> non-starter since there's no way to tell from the server side.<br>
> >>><br>
> >>> It appears they do.[1] Even for clients that don't, the worst case scenario is<br>
> >>> just that they are stuck where we are now.<br>
> >>><br>
> >>>> "... then the deployer can adjust the value upwards...", hmm, can they adjust it<br>
> >>> downwards as well? :)<br>
> >>><br>
> >>> Yes, but most people doing initial OpenStack deployments don't and wouldn't<br>
> >>> think to without understanding the intricacies of the security groups filtering<br>
> >>> in Neutron.<br>
> >><br>
> >> But they will if we document it well, which is what Salvatore suggested.<br>
> >><br>
> >>>> I'm glad you're willing to "boil the ocean" to try and get the default changed,<br>
> >>> but is all this really worth it when all you have to do is edit the config file<br>
> >>> in your deployment? That's why the value is there in the first place.<br>
> >>><br>
> >>> The default value is basically incompatible with port IP changes. We shouldn't<br>
> >>> be shipping defaults that lead to half-broken functionality. What I'm<br>
> >>> understanding is that the current default value is to work around shortcomings in<br>
> >>> dnsmasq. This is an example of implementation details leaking out and leading to<br>
> >>> bad UX.<br>
> >><br>
> >> I think the current default value is also more indicative of something you'd<br>
> >> find in your house, or at work - i.e. stable networks.<br>
> >><br>
> >> I had another thought on this Kevin, hoping that we could come to some<br>
> >> resolution, because sure, shipping broken functionality isn't great. But here's<br>
> >> the rub - how do we make a change in a fixed IP work in *all* deployments?<br>
> >> Since the end-user can't set this value, they'll run into this problem in my<br>
> >> deployment, or any other that has some not-very-short lease time. One solution<br>
> >> is to disallow this operation. The other is to fix neutron to make this work<br>
> >> better (I don't know what that involves, but there's bound to be a way).<br>
> >> Perhaps letting the user set it, but allow the admin to set the valid range for<br>
> >> min/max? And if they don't specify they get the default?<br>
> >><br>
> >>> If we had an option to configure how often iptables rules were refreshed to<br>
> >>> match their security group, there is no way we would have a default of 12 hours.<br>
> >>> This is essentially the same level of connectivity interruption, it just happens<br>
> >>> to be a narrow use case so it hasn't been getting any attention.<br>
> >>><br>
> >>> To flip your question around, why do you care if the default is lower? You<br>
> >>> already adjust it beyond the 1 day default in your deployment, so how would a<br>
> >>> different default impact you?<br>
> >><br>
> >> It impacts anyone that hasn't changed from the default since July 2013 and later<br>
> >> (Havana), since if they don't notice, they might get bitten by it.<br>
> >><br>
> >> -Brian<br>
> >><br>
> >><br>
> >>><br>
> >>> 1. <a href="http://support.microsoft.com/kb/121005" target="_blank">http://support.microsoft.com/kb/121005</a><br>
> >>> 2. Similar to using the "secondary" keyword on Cisco devices. Or just the "ip<br>
> >>> addr add" command on Linux.<br>
> >>><br>
> >>> On Thu, Jan 29, 2015 at 1:34 PM, Brian Haley <<a href="mailto:brian.haley@hp.com">brian.haley@hp.com</a>> wrote:<br>
> >>><br>
> >>> On 01/29/2015 03:55 AM, Kevin Benton wrote:<br>
> >>>>> Why would users want to change an active port's IP address anyway?<br>
> >>>><br>
> >>>> Re-addressing. It's not common, but the entire reason I brought this up is<br>
> >>>> because a user was moving an instance to another subnet on the same network<br>
> >>>> and stranded one of their VMs.<br>
> >>>><br>
> >>>>> I worry about setting a default config value to handle a very unusual use case.<br>
> >>>><br>
> >>>> Changing a static lease is something that works on normal networks so I don't<br>
> >>>> think we should break it in Neutron without a really good reason.<br>
> >>><br>
> >>> How is Neutron breaking this? If I move a port on my physical switch to a<br>
> >>> different subnet, can you still communicate with the host sitting on it?<br>
> >>> Probably not since it has a view of the world (next-hop router) that no longer<br>
> >>> exists, and the network won't route packets for its old IP address to the new<br>
> >>> location. It has to wait for its current DHCP lease to tick down to the point<br>
> >>> where it will use broadcast to get a new one, after which point it will work.<br>
> >>><br>
> >>>> Right now, the big reason to keep a high lease time that I agree with is that it<br>
> >>>> buys operators lots of dnsmasq downtime without affecting running clients. To<br>
> >>>> get the best of both worlds we can set DHCP option 58 (a.k.a. dhcp-renewal-time<br>
> >>>> or T1) to 240 seconds. Then the lease time can be left to be something large<br>
> >>>> like 10 days to allow for tons of DHCP server downtime without affecting running<br>
> >>>> clients.<br>
> >>>><br>
> >>>> There are two issues with this approach. First, some simple dhcp clients don't<br>
> >>>> honor that dhcp option (e.g. the one with Cirros), but it works with dhclient so<br>
> >>>> it should work on CentOS, Fedora, etc (I verified it works on Ubuntu). This<br>
> >>>> isn't a big deal because the worst case is what we have already (half of the<br>
> >>>> lease time). The second issue is that dnsmasq hardcodes that option, so a patch<br>
> >>>> would be required to allow it to be specified in the options file. I am happy to<br>
> >>>> submit the patch required there so that isn't a big deal either.<br>
> >>><br>
> >>> Does it work on Windows VMs too? People run those in clouds too. The point is<br>
> >>> that if we don't know if all the DHCP clients will support it then it's a<br>
> >>> non-starter since there's no way to tell from the server side.<br>
> >>><br>
> >>>> If we implement that fix, the remaining issue is Brian's other comment about too<br>
> >>>> much DHCP traffic. I've been doing some packet captures and the standard<br>
> >>>> request/reply for a renewal is 2 unicast packets totaling about 725 bytes.<br>
> >>>> Assuming 10,000 VMs renewing every 240 seconds, there will be an average of 242<br>
> >>>> kbps background traffic across the entire network. Even at a density of 50 VMs,<br>
> >>>> that's only 1.2 kbps per compute node. If that's still too much, then the<br>
> >>>> deployer can adjust the value upwards, but that's hardly a reason to have a high<br>
> >>>> default.<br>
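<br>
The traffic estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes renewals are spread evenly over the interval and counts just the roughly 725 bytes per request/reply pair quoted above; the function name is hypothetical.<br>
<pre>
```python
def renewal_traffic_bps(num_vms: int, bytes_per_renewal: int, interval_s: int) -> float:
    """Average bits per second generated by periodic DHCP renewals."""
    # Each VM sends one request/reply pair per interval; convert bytes to bits.
    return num_vms * bytes_per_renewal * 8 / interval_s


cloud_bps = renewal_traffic_bps(10_000, 725, 240)  # whole deployment
node_bps = renewal_traffic_bps(50, 725, 240)       # one compute node with 50 VMs

print(round(cloud_bps / 1000), "kbps across the network")    # 242
print(round(node_bps / 1000, 1), "kbps per compute node")    # 1.2
```
</pre>
Doubling the renewal interval halves both figures, which is the knob a deployer would turn if the background traffic were a concern.<br>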
> >>><br>
> >>> "... then the deployer can adjust the value upwards...", hmm, can they adjust it<br>
> >>> downwards as well? :)<br>
> >>><br>
> >>>> That just leaves the logging problem. Since we require a change to dnsmasq<br>
> >>>> anyway, perhaps we could also request an option to suppress logs from renewals?<br>
> >>>> If that's not adequate, I think 2 log entries per vm every 240 seconds is really<br>
> >>>> only a concern for operators with large clouds and they should have the<br>
> >>>> knowledge required to change a config file anyway. ;-)<br>
> >>><br>
> >>> I'm glad you're willing to "boil the ocean" to try and get the default changed,<br>
> >>> but is all this really worth it when all you have to do is edit the config file<br>
> >>> in your deployment? That's why the value is there in the first place.<br>
> >>><br>
> >>> Sorry, I'm still unconvinced we need to do anything more than document this.<br>
> >>><br>
> >>> -Brian<br>
> >>><br>
> >>><br>
> >>><br>
> >>><br>
> >>> __________________________________________________________________________<br>
> >>> OpenStack Development Mailing List (not for usage questions)<br>
> >>> Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
> >>> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
> >>><br>
> >>><br>
> >>><br>
> >>><br>
> >>> -- Kevin Benton<br>
<br>
<br>
__________________________________________________________________________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div>Kevin Benton</div></div>
</div>