<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div><div>On Aug 8, 2014, at 14:09 , Russell Bryant <<a href="mailto:rbryant@redhat.com">rbryant@redhat.com</a>> wrote:</div></div><div><div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">On 08/06/2014 01:41 PM, Jay Pipes wrote:<br><blockquote type="cite">On 08/06/2014 01:40 AM, Tom Fifield wrote:<br><blockquote type="cite">On 06/08/14 13:30, Robert Collins wrote:<br><blockquote type="cite">On 6 August 2014 17:27, Tom Fifield <<a href="mailto:tom@openstack.org">tom@openstack.org</a>> wrote:<br><blockquote type="cite">On 06/08/14 13:24, Robert Collins wrote:<br></blockquote><br><blockquote type="cite"><blockquote type="cite">What happened to your DB migrations then? :)<br></blockquote><br><br>Sorry if I misunderstood, I thought we were talking about running VM<br>downtime here?<br></blockquote><br>While DB migrations are running things like the nova metadata service<br>can/will misbehave - and user code within instances will be affected.<br>Thats arguably VM downtime.<br><br>OTOH you could define it more narrowly as 'VMs are not powered off' or<br>'VMs are not stalled for more than 2s without a time slice' etc etc -<br>my sense is that most users are going to be particularly concerned<br>about things for which they have to *do something* - e.g. 
VMs being<br>powered off or rebooted - but having no network for a short period<br>while vifs are replugged and the overlay network re-establishes itself<br>would be much less concerning.<br></blockquote><br>I think you've got it there, Rob - nicely put :)<br><br>In many cases the users I've spoken to who are looking for a live path<br>out of nova-network on to neutron are actually completely OK with some<br>"API service" downtime (metadata service is an API service by their<br>definition). A little 'glitch' in the network is also OK for many of<br>them.<br><br>Contrast that with the original proposal in this thread ("snapshot VMs<br>in old nova-network deployment, store in Swift or something, then launch<br>VM from a snapshot in new Neutron deployment") - it is completely<br>unacceptable and is not considered a migration path for these users.<br></blockquote><br>Who are these users? Can we speak with them? Would they be interested in<br>participating in the documentation and migration feature process?<br></blockquote><br>Yes, I'd really like to see some participation in the development of a<br>solution if it's an important requirement. Until then, it feels like a<br>case of an open question of "what do you want". Of course the answer is<br>"a pony".<br></div></blockquote><div><br></div><div><div>Sorry for coming late to the conversation, but it’s been a fairly busy week and I’m just now getting caught up.</div><div><div><br></div><div>The short answer is that if Metacloud were to migrate from nova-network to Neutron, we would absolutely require as non-disruptive a process as possible. While we espouse the ideals of cloud and cattle to our clients, we can’t control how they use their clouds, and we have to deal with the fact that legacy applications (aka pets) exist and run successfully today on top of our clouds. 
</div><div><br></div><div>Our overall approach is guided by the following two principles.</div><div><br></div><div><ol class="MailOutline"><li>Minimize the network disruption to individual VMs. Ideally this is measured in seconds, but during a major version upgrade (something like a conversion from nova-network to Neutron) 5 minutes could be tolerated. </li><li>Never disrupt the running VM. If we can avoid having to restart the container in any way, we do. This is by far the most disruptive action for our clients, especially for “pets”, so we avoid it.</li></ol></div><div><br></div><div>As previously mentioned in the thread, actual orchestration (aka API) outage doesn’t matter. If we have to take a 2-hour orchestration outage, while not ideal, it’s within the realm of possibility. As an example, our in-place major version upgrades involve 1 hour or less of orchestration outage, 5 minutes or less of network outage, and 0 VM downtime. It’s also important to note that these are not arbitrary requirements we’ve made up. This is what we see the vast majority of clients expect and in some cases require from us. I would think that most cloud deployments running production workloads would require a similar set of restrictions.</div><div><br></div><div>I’m really sorry I couldn’t be in Oregon last week to engage in these conversations. I’m happy to discuss our thoughts on this, our requirements, and any assistance we could provide via this list, on IRC, or in regular IRC meetings.</div></div><div><br></div><div>--<br>Chet Burgess<br>Chief Architect | Metacloud, Inc.<br>Email: <a href="mailto:cfb@metacloud.com">cfb@metacloud.com</a> | Tel: 855-638-2256, Ext. 2428</div></div></div></div></body></html>