[Openstack-operators] [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder
chris.friesen at windriver.com
Tue May 24 15:20:00 UTC 2016
On 05/23/2016 08:46 PM, John Griffith wrote:
> On Mon, May 23, 2016 at 8:32 AM, Ivan Kolodyazhny <e0ne at e0ne.info> wrote:
> Hi developers and operators,
> I would like to get feedback from you on my idea before I start
> work on a spec.
> In Nova, we've got the max_concurrent_builds option to set the 'Maximum number
> of instance builds to run concurrently' on each compute node. There is no
> equivalent in Cinder.
> Why do we need it for Cinder? IMO, it could help us to address the following issues:
> * Creating N volumes at the same time greatly increases resource usage
> by the cinder-volume service. The image caching feature could help us a bit
> when we create volumes from an image, but we still have to upload N
> images to the volume backend at the same time.
> * Deleting N volumes in parallel. Usually this is not a very hard task for
> Cinder, but if you have to delete 100+ volumes at once, you can hit
> various issues with DB connections and with CPU and memory usage. In the case of
> LVM, it may also use the 'dd' command to clean up volumes.
> * It would act as a kind of load balancing in HA mode: if a cinder-volume
> process is busy with current operations, it will not pick up the message from
> RabbitMQ, and another cinder-volume service will handle it.
> * From the user's perspective, it seems better to create/delete N
> volumes a bit more slowly than to fail after X volumes were created/deleted.
> Ivan Kolodyazhny,
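
For illustration only, the kind of per-service cap proposed above could be
sketched with a counting semaphore. This is not Cinder or Nova code: the
names BoundedWorker, fake_create_volume and create_many are hypothetical,
and the standard-library threading module stands in for whatever
concurrency primitive a real service would use.

```python
import threading
import time


class BoundedWorker:
    """Run at most `max_concurrent` operations at once; extra callers wait."""

    def __init__(self, max_concurrent):
        # Hypothetical knob, analogous in spirit to max_concurrent_builds.
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.active = 0   # operations currently running
        self.peak = 0     # highest concurrency observed so far

    def run(self, operation, *args):
        # Block here until one of the slots is free.
        with self._slots:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return operation(*args)
            finally:
                with self._lock:
                    self.active -= 1


def fake_create_volume(name):
    # Stand-in for an expensive backend call (image copy, dd wipe, ...).
    time.sleep(0.01)
    return name


def create_many(worker, count):
    """Submit `count` create requests at once and wait for all of them."""
    results = []
    results_lock = threading.Lock()

    def task(index):
        created = worker.run(fake_create_volume, "vol-%d" % index)
        with results_lock:
            results.append(created)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With a cap of 3, twenty simultaneous requests all complete, just more slowly,
which is the "slower rather than failing" trade-off from the last bullet.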
> Just curious about a couple things: Is this attempting to solve a problem in
> the actual Cinder Volume Service or is this trying to solve problems with
> backends that can't keep up and deliver resources under heavy load? I get the
> copy-image to volume, that's a special case that certainly does impact Cinder
> services and the Cinder node itself, but there's already throttling going on
> there, at least in terms of IO allowed.
> Also, I'm curious... would the existing API Rate Limit configuration achieve the
> same sort of thing you want to do here? Granted it's not selective, but maybe
> it's worth mentioning.
> If we did do something like this I would like to see it implemented as a driver
> config; but that wouldn't help if the problem lies in the Rabbit or RPC space.
> That brings me back to wondering about exactly where we want to solve problems
> and exactly which. If delete is causing problems like you describe I'd suspect
> we have an issue in our DB code (too many calls to start with) and that we've
> got some overhead elsewhere that should be eradicated. Delete is a super simple
> operation on the Cinder side of things (and most back ends) so I'm a bit freaked
> out thinking that it's taxing resources heavily.
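
One possible way to see how a rate limit and a concurrency cap differ is the
toy model below. It is an assumption-laden sketch, not Cinder's rate-limit
middleware: RateLimiter and in_flight_after are hypothetical names, and the
limiter is a plain fixed-window counter driven by an explicit clock value.

```python
class RateLimiter:
    """Accept at most `limit` requests per `window` seconds (fixed window)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.window_start = None
        self.count = 0

    def allow(self, now):
        # Start a fresh window when none exists or the current one expired.
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


def in_flight_after(limiter, minutes, op_duration):
    """Offer one request per second, then count accepted operations that
    are still running at the end of the simulated period."""
    end = minutes * 60
    accepted = [second for second in range(end) if limiter.allow(second)]
    return sum(1 for start in accepted if start + op_duration > end)
```

In this model, a limit of 10 requests per minute with operations that each
take 300 seconds leaves 30 operations running after three minutes: the rate
limit bounds how fast work arrives, not how much of it runs at once.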
For what it's worth, with the LVM backend under heavy load we've run into cases
where cinder-volume ends up being blocked by disk I/O for over a minute.
Now this was pretty much a worst case, with cinder volumes on a single spinning
disk. But the fact that IO cgroups don't work with LVM (this is a Linux kernel
limitation) means that it's difficult to ensure that the cinder process doesn't
block indefinitely on disk IO.