<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 28 January 2015 at 17:32, Robert Collins <span dir="ltr"><<a href="mailto:robertc@robertcollins.net" target="_blank">robertc@robertcollins.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">E.g. its a call (not cast) out to Neutron, and Neutron returns when<br>
the VIF(s) are ready to use, at which point Nova brings the VM up. If<br>
the call times out, we error.<br></blockquote><div><br></div><div>I don't think this model really works with distributed systems, and it certainly doesn't work when you have a limited number of threads to play with, because they get consumed by anything that has to wait a long time for something to happen, and eventually you can't service requests any more. Also, it's the exact opposite of what Nova does. Does Nova return when the VM is running? No, it returns when the VM is requested, saying 'I note your request and will act on it in my own time'.<br><br>What does Neutron have to do to complete a call? That's entirely dependent on the driver, but it could be talking to one, ten or a thousand devices, any of which might be slow to respond: there is no upper bound on how long it takes to bind a port, for instance. So any REST call to Neutron should change its DB and return, and leave an asynchronous process to deal with making the network state change. Neutron should notify Nova when the network state has changed, and Nova can go on with its life doing other things till the notification comes in.<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Right now we have this mix of synchronous and async code, and its<br>
causing us to overlook things and have bugs. I'd be equally happy if<br>
we went all in with an async event driven approach, but we should<br>
decide if we're fish or fowl, not pick bits of both and hope reviewers<br>
can remember every little detail.<br></blockquote><div><br></div><div>On this much we agree, I just happen to like fowl.<br></div><div> <br>> > One other problem, not yet raised, is that Nova doesn't express its needs<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
> when it asks for a port to be bound [...]</span></blockquote><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
</span>+1, OTOH I don't think this is a structural problem - it doesn't<br>
matter what protocol or calling style we use, this is just the<br>
parameters in the call :).<br></blockquote><div><br></div><div>Agreed. We just need to make it a proper negotiation, and that's it done. No-one seems to have a problem with this, so I'll have a play with the idea (out of tree for now, given the time of the cycle).<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I think your desire and Salvatore's are compatible: an interface that<br>
is excellent for Nova can also be excellent for other users.<br></blockquote><div><br></div><div>Agreed. But if there's one interface for everything it doesn't really need to be a plugin. The question is whether one interface is enough.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Notifications aren't a complete solution to the orphaning issue unless<br>
the notification system is guaranteed non-lossy. Something like Kafka<br>
would be an excellent substrate for such a system, or we could look at<br>
per-service journalling (on either side of the integration point).<br></blockquote><div><br></div><div>I prefer lossy notification systems. RabbitMQ is non-lossy, and that means it will sit on messages for days and then deliver them long past the point at which they're useful, plus its queue depth is unbounded. It's not a great way to run an eventually consistent system, in my opinion.<br><br>The pattern I like is where you are notified (via an unreliable channel) when your operation has completed, but you must also have a background checking task that looks at the data model to see whether a notification has gone missing. The task doesn't have to trigger very often, and in fact you could hold it off indefinitely with a heartbeat, provided the communications channel keeps functioning, but it does have to exist. You don't have the problem of having to provide an infinite queue, and it's not a crisis when your messaging system loses a message.<br>-- <br></div><div>Ian.<br></div><br></div></div></div>
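To make the pattern above concrete, here is a minimal sketch in Python of a lossy notification backed by a background check. Everything in it is an assumption for illustration: `PortWatcher`, `fetch_status` and the `"ACTIVE"` status string are hypothetical names, not Nova or Neutron APIs, and the heartbeat hold-off is left out so that the check simply runs on a fixed interval.

```python
import time


class PortWatcher:
    """Tracks pending operations; completion is learned from a lossy
    notification (fast path) or from a periodic check of the data model."""

    def __init__(self, fetch_status, check_interval=30.0, clock=time.monotonic):
        # fetch_status is a hypothetical callable that reads the
        # authoritative state for a port id from the data model.
        self._fetch_status = fetch_status
        self._check_interval = check_interval
        self._clock = clock
        self._last_check = clock()
        self.pending = set()
        self.completed = set()

    def request(self, port_id):
        """Record that we asked for a port and are waiting on it."""
        self.pending.add(port_id)

    def on_notification(self, port_id):
        """Fast path. May never be called if the message is lost."""
        self._mark_done(port_id)

    def tick(self):
        """Call periodically. Returns True if the data model was checked."""
        if self._clock() - self._last_check >= self._check_interval:
            self._last_check = self._clock()
            # Slow path: catch anything whose notification went missing.
            for port_id in list(self.pending):
                if self._fetch_status(port_id) == "ACTIVE":
                    self._mark_done(port_id)
            return True
        return False

    def _mark_done(self, port_id):
        # Ignore duplicates and notifications for ports we never requested.
        if port_id in self.pending:
            self.pending.remove(port_id)
            self.completed.add(port_id)
```

In a real service `tick()` would be driven by a periodic timer, and the interval can be long, since it only bounds how late a lost notification is noticed. Nothing here needs a queue of unbounded depth.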