[openstack-dev] Moving task flow to conductor - concern about scale

Day, Phil philip.day at hp.com
Fri Jul 19 10:53:36 UTC 2013


> -----Original Message-----
> From: Dan Smith [mailto:dms at danplanet.com]
> Sent: 16 July 2013 14:51
> To: OpenStack Development Mailing List
> Cc: Day, Phil
> Subject: Re: [openstack-dev] Moving task flow to conductor - concern about
> scale
> 
> > In the original context of using Conductor as a database proxy then
> > the number of conductor instances is directly related to the number of
> > compute hosts I need them to serve.
> 
> Just a point of note, as far as I know, the plan has always been to establish
> conductor as a thing that sits between the api and compute nodes. However,
> we started with the immediate need, which was the offloading of database
> traffic.
>

Like I said, I see the need both for a layer between the API and compute and for a layer between compute and the DB - I just don't see them as having to be part of the same thing.

 
> > What I'm not sure about is whether I would also want the same number
> > of conductor instances for task control flow - historically even
> > running 2 schedulers has been a problem, so the thought of having
> > tens of them makes me very concerned at the moment.  However I can't
> > see any way to specialise a conductor to only handle one type of
> > request.
> 
> Yeah, I don't think the way it's currently being done allows for specialization.
> 
> Since you were reviewing actual task code, can you offer any specifics about
> the thing(s) that concern you? I think that scaling conductor (and its tasks)
> horizontally is an important point we need to achieve, so if you see something
> that needs tweaking, please point it out.
> 
> Based on what is there now and proposed soon, I think it's mostly fairly safe,
> straightforward, and really no different than what two computes do when
> working together for something like resize or migrate.
>

There's nothing I've seen so far that causes me alarm, but then again we're in the very early stages and haven't moved anything really complex.
However, I think there's an inherent big difference between scaling something stateless like a DB proxy and scaling a stateful entity like a task workflow component.  I'd also suggest that so far there is no real experience with the latter within the current code base; compute nodes (which are the main scaled-out component so far) work on well-defined subsets of the data.
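To make the distinction concrete, here's a minimal sketch of the difference (illustrative Python only, not Nova code - every name in it is made up):

    import time

    def db_proxy_get_instance(context, instance_uuid):
        # Stateless: one self-contained round trip.  Nothing survives
        # past the return, so any worker can serve any call and a
        # crashed worker loses nothing in flight.
        return {"uuid": instance_uuid, "host": "compute-1"}  # stub

    def resize_task(context, instance_uuid, new_flavor):
        # Stateful: in-flight progress lives in this process.  If the
        # worker dies between steps the resize is stranded, unless its
        # progress was checkpointed somewhere other workers can see it.
        for step in ("claim_resources", "migrate_disk", "confirm"):
            time.sleep(0)  # stand-in for the real work at each step
            print("resize %s: finished %s" % (instance_uuid, step))

Scaling out the first is just adding workers; scaling out the second means deciding who owns each in-flight task, and what happens when an owner dies.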


> > So I guess my question is, given that it may have to address two
> > independent scale drivers, is putting task work flow and DB proxy
> > functionality into the same service really the right thing to do - or
> > should there be some separation between them.
> 
> I think that we're going to need more than one "task" node, and so it seems
> appropriate to locate one scales-with-computes function with another.
> 

I just don't buy into this line of thinking - I need more than one API node for HA as well, but that doesn't mean I therefore want to put anything else that needs more than one node in there.

I don't even think these scale with compute in the same way; the DB proxy scales with the number of compute hosts, because each new host introduces an amount of DB load through its periodic tasks.  Task workflow scales with the number of requests coming into the system to create / modify servers - and that's not directly related to the number of hosts.
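As a back-of-envelope illustration (all the numbers here are invented), the two drivers can point to very different worker counts:

    # Hypothetical figures purely to show the two scale drivers.
    compute_hosts = 1000
    db_calls_per_host_per_min = 30     # periodic-task load per host
    proxy_capacity_per_min = 6000      # throughput of one proxy worker

    api_requests_per_min = 200         # create / modify server requests
    task_capacity_per_min = 400        # throughput of one task worker

    proxies_needed = -(-compute_hosts * db_calls_per_host_per_min
                       // proxy_capacity_per_min)      # ceil -> 5
    task_workers_needed = -(-api_requests_per_min
                            // task_capacity_per_min)  # ceil -> 1

    print(proxies_needed, task_workers_needed)

Double the number of hosts and only the first figure moves; double the API request rate and only the second does.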

So rather than asking "what doesn't work / might not work in the future", I think the question should be "aside from them both being things that could be described as a conductor - what's the architectural reason for wanting these two separate groups of functionality in the same service?"

If it's really just because the concept of "conductor" got used for a DB proxy layer before the task workflow, then we should either think of a new name for the latter or rename the former.

If they were separate services and it turns out that I can/want/need to run the same number of both, then I can pretty easily do that - but the current approach removes what seems to me a very important degree of freedom around deployment on a large-scale system.
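For instance (purely a sketch of the idea - the topic names and the start_worker() helper are invented, and this is not how conductor is actually wired today), separation could be as simple as each role consuming from its own RPC topic, so each pool is sized independently:

    DB_PROXY_TOPIC = "conductor.db"       # hypothetical topic name
    TASK_FLOW_TOPIC = "conductor.tasks"   # hypothetical topic name

    def start_worker(topic):
        # Stand-in for spawning a service with an RPC consumer.
        print("worker listening on %s" % topic)

    # Sized to the compute-host count:
    for _ in range(5):
        start_worker(DB_PROXY_TOPIC)

    # Sized to the API request rate:
    for _ in range(2):
        start_worker(TASK_FLOW_TOPIC)

If the two do end up needing the same count in a given deployment, nothing is lost; if they don't, the operator keeps the choice.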

Cheers,
Phil



