[openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder

Michał Dulko michal.dulko at intel.com
Tue May 24 15:04:36 UTC 2016



On 05/24/2016 04:38 PM, Gorka Eguileor wrote:
> On 23/05, Ivan Kolodyazhny wrote:
>> Hi developers and operators,
>> I would like to get feedback from you about my idea before I start
>> work on a spec.
>>
>> In Nova, we've got the max_concurrent_builds option [1] to set the
>> 'Maximum number of instance builds to run concurrently' per compute
>> node. There is no equivalent in Cinder.
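>>
>> For reference, Nova declares this as a plain oslo.config integer option
>> (see [1]); a Cinder equivalent could be declared the same way. This is
>> only a sketch and the Cinder option name below is hypothetical:
>>
>>     from oslo_config import cfg
>>
>>     # Hypothetical twin of Nova's max_concurrent_builds, registered
>>     # by the cinder-volume service:
>>     volume_opts = [
>>         cfg.IntOpt('max_concurrent_creates',
>>                    default=10,
>>                    help='Maximum number of volume create operations '
>>                         'to run concurrently on this service'),
>>     ]
>>
>>     CONF = cfg.CONF
>>     CONF.register_opts(volume_opts)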
> Hi,
>
> First I want to say that I think this is a good idea because I know this
> message will get diluted once I start with my mumbling.  ;-)
>
> The first thing we should let operators control is the number of
> workers per service, since we currently only allow setting it for the
> API nodes, and all other nodes use a default of 1000.  I posted a patch
> [1] to allow this and it's been sitting there for the last 3 months.  :'-(
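>
> For context, the only knob of this kind today is the API one in
> cinder.conf; the patch would add a similar per-service setting (the
> exact option name it introduces is not shown here):
>
>     [DEFAULT]
>     # existing option: number of cinder-api worker processes
>     osapi_volume_workers = 8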
>
> As I see it, not all of the mentioned problems are equal; the main
> distinction comes from Cinder being not only in the control path but
> also in the data path.  As a result, some of the issues are backend
> specific limitations, which I believe should be addressed differently
> in the specs.
>
> For operations where Cinder is in the control path we should be
> limiting/queuing operations in the Cinder core code (for example the
> manager), whereas when the limitation only applies to some drivers it
> should be addressed by the drivers themselves.  That said, the spec
> should provide a clear mechanism/pattern to solve it in the drivers as
> well, so all drivers can use a similar pattern, which will provide
> consistency and make the code easier to review and maintain.
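>
> As a rough illustration of such a shared pattern (a sketch only, all
> names hypothetical): a small decorator that each driver applies to its
> expensive operations, sized from its own config option:
>
>     import functools
>     import threading
>
>     def throttled(semaphore):
>         """Limit how many calls to the decorated driver method may
>         run concurrently (hypothetical helper, sketch only)."""
>         def decorator(func):
>             @functools.wraps(func)
>             def wrapper(*args, **kwargs):
>                 with semaphore:
>                     return func(*args, **kwargs)
>             return wrapper
>         return decorator
>
>     # each driver would size this from its own config option
>     _DELETE_SEMAPHORE = threading.BoundedSemaphore(10)
>
>     class MyDriver(object):
>         @throttled(_DELETE_SEMAPHORE)
>         def delete_volume(self, volume):
>             pass  # backend-specific deletion goes here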
>
> The queuing should preserve the order of arrival of operations, which
> file locks from Oslo concurrency and Tooz don't do.
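>
> (In other words, something FIFO such as Python's standard queue, where
> workers pick up operations strictly in arrival order.  A sketch, not a
> proposed implementation:
>
>     import queue
>     import threading
>
>     work = queue.Queue()  # FIFO: preserves arrival order
>
>     def worker():
>         while True:
>             operation, args = work.get()  # blocks until work arrives
>             try:
>                 operation(*args)
>             finally:
>                 work.task_done()
>
>     # N workers bound the concurrency; the queue itself bounds nothing.
>     for _ in range(10):
>         threading.Thread(target=worker, daemon=True).start()
>
> With a file lock or a Tooz lock, by contrast, whichever waiter wakes up
> first wins, regardless of when it asked.)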

I would be seriously opposed to queuing done inside Cinder code. It
makes draining a service harder and increases the impact of a failure
of a single service. We already have a queuing system: whatever you're
running under oslo.messaging (RabbitMQ mostly). Making the number of
RPC workers configurable for each service sounds like the best shot to
me.
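
oslo.messaging already exposes a knob in this direction, although it is
per-process rather than per-operation-type (cinder.conf):

    [DEFAULT]
    # size of the executor thread pool that processes incoming RPC
    # requests; effectively caps concurrent operations per service
    executor_thread_pool_size = 64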

>> Why do we need it for Cinder? IMO, it could help us to address the
>> following issues:
>>
>>    - Creation of N volumes at the same time significantly increases
>>    resource usage by the cinder-volume service. The image caching feature
>>    [2] could help us a bit in the case where we create a volume from an
>>    image, but we still have to upload N images to the volume backend at
>>    the same time.
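>>
>>    (For reference, the cache from [2] is enabled per backend in
>>    cinder.conf; the size caps shown are examples:
>>
>>        [my-backend]
>>        image_volume_cache_enabled = True
>>        # optional caps on the cache size
>>        image_volume_cache_max_size_gb = 200
>>        image_volume_cache_max_count = 50
>>
>>    but the first request for each image still copies it to the backend.)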
> This is an example where we are in the data path.
>
>>    - Deletion of N volumes in parallel. Usually this is not a very hard
>>    task for Cinder, but if you have to delete 100+ volumes at once, you
>>    can hit various issues with DB connections, CPU and memory usage. In
>>    the case of LVM, it may also run the 'dd' command to clean up volumes.
> This is a case where it is a backend limitation and should be handled by
> the drivers.
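>
> (For the LVM 'dd' case specifically, existing options already let
> operators tune the cost of wiping volumes on delete, e.g. in the
> backend section of cinder.conf:
>
>     [lvm-backend]
>     # how to wipe volumes on delete: 'zero', 'shred' or 'none'
>     volume_clear = zero
>     # wipe only the first N MiB; 0 means the whole volume
>     volume_clear_size = 100
>     # run the wipe under ionice, e.g. the idle scheduling class
>     volume_clear_ionice = -c3
>
> so driver-level throttling would complement, not replace, these.)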
>
> I know some people say that deletion and attaching have problems when a
> lot of them are requested on the c-vol nodes and that Cinder cannot
> handle the workload properly, but in my experience these cases are
> always due to a suboptimal Cinder configuration, like a low number of
> DB connections configured in Cinder, which makes operations fight for a
> DB connection and introduces big delays in completing operations.
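>
> (Those limits come from the oslo.db settings in cinder.conf, for
> example; the values shown are illustrative:
>
>     [database]
>     # SQLAlchemy connection pool sizing; too low a value makes
>     # concurrent operations queue up waiting for a DB connection
>     max_pool_size = 20
>     max_overflow = 30
>     pool_timeout = 30
>
> and raising them is often enough to remove these symptoms.)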
>
>>    - It would provide a kind of load balancing in HA mode: if a
>>    cinder-volume process is busy with its current operations, it will not
>>    pick up messages from RabbitMQ, and another cinder-volume service will
>>    do it.
> I don't understand what you mean by this.  Do you mean that a Cinder
> service will stop listening to the message queue when it reaches a
> certain workload of "heavy" operations?  Then wouldn't it also stop
> processing "light" operations?
>
>>    - From the user's perspective, it is better to create/delete N volumes
>>    a bit more slowly than to fail after X volumes were created/deleted.
> I agree, it's better not to fail.  :-)
>
> Cheers,
> Gorka.
>
>>
>> [1]
>> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
>> [2]
>> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
>>
>> Regards,
>> Ivan Kolodyazhny,
>> http://blog.e0ne.info/



