[openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler

Mike Bayer mbayer at redhat.com
Wed Mar 4 21:38:27 UTC 2015



Attila Fazekas <afazekas at redhat.com> wrote:

> Hi,
> 
> I wonder what the planned future of scheduling is.
> 
> The scheduler runs many queries that return a large number of fields,
> which is CPU-expensive when you are using sqlalchemy-orm.
> Has anyone tried switching those operations to sqlalchemy-core?

An upcoming feature in SQLAlchemy 1.0 will remove the vast majority of CPU
overhead from the query side of SQLAlchemy ORM by caching all the work done
up until the SQL is emitted, including all the function overhead of building
up the Query object, producing a core select() object internally from the
Query, working out a large part of the object fetch strategies, and finally
compiling the select() into a SQL string and organizing the typing
information for result columns. With a query that is constructed
using the “Baked” feature, all of these steps are cached in memory and held
persistently; the same query can then be re-used at which point all of these
steps are skipped. The system produces the cache key based on the in-place
construction of the Query using lambdas, so no major changes to code
structure are needed; essentially, each Query modification just needs to be
preceded with “lambda q:”.
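
To make the cache-key idea concrete, here is a toy sketch in plain Python. It
is not SQLAlchemy's implementation and does not use SQLAlchemy at all; the
class BakedSketch, the function find_hosts_sql, and the table and column names
are all hypothetical. It only shows how a sequence of lambdas can identify a
query, so that the expensive construction runs once and is skipped afterwards.

```python
# Toy illustration only: NOT SQLAlchemy's implementation.
# BakedSketch, find_hosts_sql and the SQL strings are hypothetical.

class BakedSketch:
    """Collect query-building steps as lambdas and cache the built result."""

    _cache = {}        # tuple of code objects -> previously built SQL string
    build_count = 0    # how many times the expensive path actually ran

    def __init__(self, initial_fn):
        self.steps = [initial_fn]

    def __iadd__(self, fn):
        # "baked += lambda q: ..." appends one more construction step.
        self.steps.append(fn)
        return self

    def _cache_key(self):
        # A lambda written at a given place in the source keeps the same
        # code object on every call, so the sequence of code objects
        # identifies the query without running any of the steps.
        return tuple(fn.__code__ for fn in self.steps)

    def sql(self):
        key = self._cache_key()
        if key not in BakedSketch._cache:
            # Cache miss: run every step once and remember the result.
            BakedSketch.build_count += 1
            q = self.steps[0]()
            for fn in self.steps[1:]:
                q = fn(q)
            BakedSketch._cache[key] = q
        return BakedSketch._cache[key]


def find_hosts_sql():
    # Called repeatedly; the lambdas are recreated on each call, but their
    # code objects, and therefore the cache key, stay the same.
    baked = BakedSketch(lambda: "SELECT id, host FROM compute_nodes")
    baked += lambda q: q + " WHERE free_ram_mb >= :ram"
    baked += lambda q: q + " ORDER BY free_ram_mb DESC"
    return baked.sql()


first = find_hosts_sql()
second = find_hosts_sql()
```

In this sketch the second call never executes the lambdas; it only hashes
their code objects. Values that change between calls would have to be passed
as bound parameters (the ":ram" placeholder here), since anything computed
inside a lambda is frozen into the cached result.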

With this approach, the traditional session.query(Model) approach can go
from start to SQL being emitted with an order of magnitude fewer function
calls. On the fetch side, fetching individual columns instead of full
entities has always been an option with ORM and is about the same speed as a
Core fetch of rows. So using ORM with minimal changes to existing ORM code
you can get performance even better than you’d get using Core directly,
since caching of the string compilation is also added.

On the persist side, the new bulk insert / update features provide a bridge
from ORM-mapped objects to bulk inserts/updates without any unit of work
sorting going on. ORM-mapped objects are still more expensive to use, in that
instantiation and state changes still cost more, but bulk
insert/update accepts dictionaries as well, which again is competitive with
a straight Core insert.
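
The reason plain dictionaries are cheap can be shown below the ORM entirely.
The sketch below is not SQLAlchemy code: it uses Python's standard sqlite3
module, and the table, columns, and rows are hypothetical. It shows roughly
the shape of work that a bulk insert of dictionaries comes down to at the
driver level, which is one statement executed against many parameter sets.

```python
import sqlite3

# Hypothetical rows, for illustration only.
rows = [
    {"host": "node-1", "free_ram_mb": 2048},
    {"host": "node-2", "free_ram_mb": 4096},
    {"host": "node-3", "free_ram_mb": 1024},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compute_nodes ("
    "id INTEGER PRIMARY KEY, host TEXT NOT NULL, free_ram_mb INTEGER NOT NULL)"
)

# One statement, many parameter sets: the dictionaries go straight to the
# driver, with no per-row object construction or change tracking.
conn.executemany(
    "INSERT INTO compute_nodes (host, free_ram_mb) VALUES (:host, :free_ram_mb)",
    rows,
)
conn.commit()

inserted = conn.execute(
    "SELECT host, free_ram_mb FROM compute_nodes ORDER BY id"
).fetchall()
```

Nothing in this path instantiates a mapped object or tracks attribute state,
which is the overhead the paragraph above attributes to ORM-mapped objects.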

Both of these features are complete in the master branch; the “baked query”
feature just needs documentation, and I’m basically two or three tickets
away from beta releases of 1.0. The “Baked” feature itself lives as an
extension and if we really wanted, I could backport it into oslo.db as well
so that it works against 0.9.

So I’d ask that folks please hold off on any kind of migration from ORM to
Core for performance reasons. I’ve spent the past several months adding
features directly to SQLAlchemy that allow an ORM-based app to have routes
to operations that perform just as fast as that of Core without a rewrite of
code.

> The scheduler does a lot of things in the application, such as filtering,
> that could be done more efficiently at the DB level. Why is this not done
> on the DB side?
> 
> There are use cases where the scheduler would need to know even more data.
> Is there a plan for keeping `everything` in every scheduler process's memory up to date?
> (Maybe ZooKeeper)
> 
> The opposite approach would be to move most operations to the DB side,
> since the DB already knows everything.
> (stored procedures?)
> 
> Best Regards,
> Attila
> 
> 
> ----- Original Message -----
>> From: "Rui Chen" <chenrui.momo at gmail.com>
>> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
>> Sent: Wednesday, March 4, 2015 4:51:07 AM
>> Subject: [openstack-dev] [nova] blueprint about multiple workers supported in nova-scheduler
>> 
>> Hi all,
>> 
>> I want to make it easy to launch a bunch of scheduler processes on a host.
>> Multiple scheduler workers will make use of the host's multiple processors and
>> improve the performance of nova-scheduler.
>> 
>> I have registered a blueprint and committed a patch to implement it.
>> https://blueprints.launchpad.net/nova/+spec/scheduler-multiple-workers-support
>> 
>> This patch has been applied in our performance environment and passes some test
>> cases, such as booting multiple instances concurrently; so far we haven't found
>> any consistency issues.
>> 
>> IMO, nova-scheduler should be easy to scale horizontally, and
>> multiple workers should be supported as an out-of-the-box feature.
>> 
>> Please feel free to discuss this feature, thanks.
>> 
>> Best Regards
>> 
>> 
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


