[openstack-dev] [oslo] rpc concurrency control rfc

Daniel P. Berrange berrange at redhat.com
Wed Nov 27 18:20:09 UTC 2013


On Wed, Nov 27, 2013 at 06:10:47PM +0000, Edward Hope-Morley wrote:
> On 27/11/13 17:43, Daniel P. Berrange wrote:
> > On Wed, Nov 27, 2013 at 05:39:30PM +0000, Edward Hope-Morley wrote:
> >> On 27/11/13 15:49, Daniel P. Berrange wrote:
> >>> On Wed, Nov 27, 2013 at 02:45:22PM +0000, Edward Hope-Morley wrote:
> >>>> Moving this to the ml as requested, would appreciate
> >>>> comments/thoughts/feedback.
> >>>>
> >>>> So, I recently proposed a small patch to the oslo rpc code (initially in
> >>>> oslo-incubator then moved to oslo.messaging) which extends the existing
> >>>> support for limiting the rpc thread pool so that concurrent requests can
> >>>> be limited based on type/method. The blueprint and patch are here:
> >>>>
> >>>> https://blueprints.launchpad.net/oslo.messaging/+spec/rpc-concurrency-control
> >>>>
> >>>> The basic idea is that if you have a server with limited resources you
> >>>> may want to restrict operations that would impact those resources e.g.
> >>>> live migrations on a specific hypervisor or volume formatting on a
> >>>> particular volume node. This patch allows you, admittedly in a very
> >>>> crude way, to apply a fixed limit to a set of rpc methods. I would like
> >>>> to know whether people think this sort of thing would be useful or
> >>>> whether it alludes to a more fundamental issue that should be dealt
> >>>> with in a different manner.
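> >>>>
> >>>> To make the idea concrete, here is a rough sketch of the mechanism
> >>>> (illustrative only - the method names and limits are invented, the
> >>>> real code is in the review linked above): a per-method semaphore is
> >>>> consulted before a request is dispatched to the thread pool.
> >>>>
> >>>>   import eventlet
> >>>>
> >>>>   # Fixed concurrency limits for selected rpc methods.
> >>>>   _method_limits = {
> >>>>       'live_migration': eventlet.semaphore.Semaphore(2),
> >>>>   }
> >>>>
> >>>>   def dispatch(method_name, func, *args, **kwargs):
> >>>>       sem = _method_limits.get(method_name)
> >>>>       if sem is None:
> >>>>           return func(*args, **kwargs)
> >>>>       # Blocks the calling greenthread while the limit is reached.
> >>>>       with sem:
> >>>>           return func(*args, **kwargs)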
> >>> Based on this description of the problem I have some observations
> >>>
> >>>  - I/O load from the guest OS itself is just as important to consider
> >>>    as I/O load from management operations Nova does for a guest. Both
> >>>    have the capability to impose denial-of-service on a host. IIUC, the
> >>>    flavour specs have the ability to express resource constraints for
> >>>    the virtual machines to prevent a guest-OS-initiated DOS attack
> >>>    (see the sketch after these points)
> >>>
> >>>  - I/O load from live migration is attributable to the running
> >>>    virtual machine. As such I'd expect that any resource controls
> >>>    associated with the guest (from the flavour specs) should be
> >>>    applied to control the load from live migration.
> >>>
> >>>    Unfortunately life isn't quite this simple with KVM/libvirt
> >>>    currently. For networking we've associated each virtual TAP
> >>>    device with traffic shaping filters. For migration you have
> >>>    to set a bandwidth cap explicitly via the API (also sketched
> >>>    after these points). For network
> >>>    based storage backends, you don't directly control network
> >>>    usage, but instead I/O operations/bytes. Ultimately though
> >>>    there should be a way to enforce limits on anything KVM does,
> >>>    and similarly I expect other hypervisors can do the same.
> >>>
> >>>  - I/O load from operations that Nova does on behalf of a guest
> >>>    that may be running, or is yet to be launched. These are not
> >>>    directly known to the hypervisor, so existing resource limits
> >>>    won't apply. Nova, however, should have some capability for
> >>>    applying resource limits to the I/O-intensive things it does,
> >>>    and somehow associating them with the flavour limits or perhaps
> >>>    some global per-user cap.
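> >>>
> >>>    To illustrate the first two points (a hedged sketch - the quota
> >>>    key, domain name and values are examples, not recommendations):
> >>>    guest disk I/O can be capped from the flavour side with extra
> >>>    specs such as quota:disk_read_bytes_sec, while the migration
> >>>    bandwidth cap is set per-domain via the libvirt API, e.g. with
> >>>    the python bindings:
> >>>
> >>>      import libvirt
> >>>
> >>>      # Cap live-migration bandwidth for one running guest.
> >>>      conn = libvirt.open('qemu:///system')
> >>>      dom = conn.lookupByName('instance-00000001')
> >>>      dom.migrateSetMaxSpeed(100)  # MiB/s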
> >>>
> >>>> Thoughts?
> >>> Overall I think that trying to apply caps on the number of API calls
> >>> that can be made is not really a credible way to avoid users inflicting
> >>> a DOS attack on the host OS, not least because it does nothing to
> >>> control what a guest OS itself may do. If you do caps based on the
> >>> number of API calls in a time period, you end up having to do an
> >>> extremely pessimistic calculation - you basically have to assume the
> >>> worst case for any single API call, even if most calls don't hit the
> >>> worst case. This is going to hurt scalability of the system as a
> >>> whole IMHO.
> >>>
> >>> Regards,
> >>> Daniel
> >> Daniel, thanks for this. These are all valid points and essentially
> >> tie in with the fundamental issue of dealing with DOS attacks, but
> >> for this bp I actually want to stay away from that area, i.e. this
> >> is not intended to solve any tenant-based attack issues in the rpc
> >> layer (although that definitely warrants a discussion, e.g. how do
> >> we stop a single tenant from consuming the entire thread pool with
> >> requests?). Rather, I'm thinking from a QOS perspective, i.e.
> >> allowing an admin to account for a resource bias, e.g. a slow RAID
> >> controller, on a given node (not necessarily Nova/HV), which could
> >> be alleviated with this sort of crude rate limiting. Of course, one
> >> problem with this approach is that blocked/limited requests still
> >> reside in the same pool as other requests, so if we did want to use
> >> this it may be worth considering offloading blocked requests or
> >> giving them their own pool altogether.
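> >>
> >> For instance (purely a sketch - the pool sizes and method names are
> >> made up), the limited methods could be spawned from their own green
> >> pool so that they cannot starve the main one:
> >>
> >>   import eventlet
> >>
> >>   main_pool = eventlet.GreenPool(64)
> >>   limited_pool = eventlet.GreenPool(4)
> >>   LIMITED = {'live_migration', 'create_volume'}
> >>
> >>   def spawn_handler(method_name, func, *args):
> >>       # Route rate-limited methods to the small dedicated pool.
> >>       pool = limited_pool if method_name in LIMITED else main_pool
> >>       pool.spawn_n(func, *args)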
> >>
> >> ...or maybe this is just pie in the sky after all.
> > I don't think it is valid to ignore tenant-based attacks in this. You
> > have a single resource here and it can be consumed by the tenant
> > OS, by the VM associated with the tenant or by Nova itself. As such,
> > IMHO adding rate limiting to Nova APIs alone is a non-solution because
> > you've still left it wide open to starvation by any number of other
> > routes which are arguably even more critical to address than the API
> > calls.
> >
> > Daniel
> Daniel, maybe I have misunderstood you here, but with this optional
> extension I am (a) not intending to solve DOS issues and (b) not
> "ignoring" DOS issues, since I do not expect to add any new ones or
> accentuate those that already exist. The issue here is QOS, not DOS.

I consider QOS & DOS to be two sides of the same coin here. A denial of
service is anything which affects the quality of service of the host.
It doesn't have to be done with malicious intent either. I don't think
your proposal provides significant QOS benefits except under some very
narrowly constrained scenarios, and I have yet to be convinced those
are applicable to the bigger picture / real-world deployments.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


