[openstack-dev] [all] SQLAlchemy performance suite and upcoming features (was: [nova] blueprint about multiple workers)
Attila Fazekas
afazekas at redhat.com
Mon Mar 9 11:18:17 UTC 2015
----- Original Message -----
> From: "Mike Bayer" <mbayer at redhat.com>
> To: "Attila Fazekas" <afazekas at redhat.com>
> Cc: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> Sent: Friday, March 6, 2015 2:20:45 AM
> Subject: Re: [openstack-dev] [all] SQLAlchemy performance suite and upcoming features (was: [nova] blueprint about
> multiple workers)
>
>
>
> Attila Fazekas <afazekas at redhat.com> wrote:
>
> > I see a lot of improvements,
> > but cPython is still cPython.
> >
> > When you benchmark query-related things, please try to
> > get the actual data from the returned objects
>
> that goes without saying. I’ve been benching SQLAlchemy and DBAPIs for many
> years. New performance improvements tend to be the priority for pretty much
> every major release.
>
> > and try to do
> > something with the data that is not expected to be optimized out even by
> > a smarter compiler.
>
> Well I tend to favor breaking out the different elements into individual
> tests here, though I guess if you’re trying to trick a JIT then the more
> composed versions may be more relevant. For example, I could already tell
> you that the AttributeDict thing would perform terribly without having to
> mix it up with the DB access. __getattr__ is a poor performer (learned that
> in SQLAlchemy 0.1 about 9 years ago).
Equivalent things are also slower in Perl.
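(To illustrate, a minimal sketch of the problem, not from the original script: a dict subclass routing attribute access through __getattr__ pays for a failed normal lookup plus an extra Python-level call on every access.)

    class AttributeDict(dict):
        # __getattr__ runs only after normal attribute lookup fails,
        # so every attribute-style access takes this slow path.
        def __getattr__(self, name):
            try:
                return self[name]
            except KeyError:
                raise AttributeError(name)

    row = AttributeDict(id=1, A=2, B=3, C=4)
    row.A      # failed instance/class lookup, then a __getattr__ call
    row['A']   # a single C-level dict lookup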
>
> > Here is my play script and several numbers:
> > http://www.fpaste.org/193999/25585380/raw/
> > Is there any faster ORM way for the same op?
>
> Absolutely, as I’ve been saying for months all the way back in my wiki entry
> on forward, query for individual columns, also skip the session.rollback()
> and do a close() instead (the transaction is still rolled back, we just skip
> the bookkeeping we don’t need). You get the nice attribute access
> pattern too:
The script will probably be extended with explicit transaction management;
I agree my close / rollback usage is bad and ugly.
Also, thanks for the URL usage fix.
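A minimal sketch of how that part of the script could look with the close() suggestion applied (the Ints model and self.Session factory are the ones from the fpaste script; the exact structure is my assumption):

    def query_sqla_cols_closed(self):
        "SQLAlchemy yield(100) named tuples, close() instead of rollback()"
        session = self.Session()
        try:
            start = time.time()
            summary = 0
            for obj in session.query(
                    Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100):
                summary += obj.id + obj.A + obj.B + obj.C
            end = time.time()
            return [end - start, summary]
        finally:
            # close() still rolls the transaction back, but skips the
            # bookkeeping an explicit rollback() performs.
            session.close()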
>
> http://www.fpaste.org/194098/56040781/
>
> def query_sqla_cols(self):
>     "SQLAlchemy yield(100) named tuples"
>     session = self.Session()
>     start = time.time()
>     summary = 0
>     for obj in session.query(
>             Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100):
>         summary += obj.id + obj.A + obj.B + obj.C
>     session.rollback()
>     end = time.time()
>     return [end-start, summary]
>
> def query_sqla_cols_a3(self):
>     "SQLAlchemy yield(100) named tuples 3*access"
>     session = self.Session()
>     start = time.time()
>     summary = 0
>     for obj in session.query(
>             Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100):
>         summary += obj.id + obj.A + obj.B + obj.C
>         summary += obj.id + obj.A + obj.B + obj.C
>         summary += obj.id + obj.A + obj.B + obj.C
>     session.rollback()
>     end = time.time()
>     return [end-start, summary/3]
>
>
> Here’s that:
>
> 0 SQLAlchemy yield(100) named tuples: time: 0.635045 (data [183560260000L])
> 1 SQLAlchemy yield(100) named tuples: time: 0.630911 (data [183560260000L])
> 2 SQLAlchemy yield(100) named tuples: time: 0.641687 (data [183560260000L])
> 0 SQLAlchemy yield(100) named tuples 3*access: time: 0.807285 (data [183560260000L])
> 1 SQLAlchemy yield(100) named tuples 3*access: time: 0.814160 (data [183560260000L])
> 2 SQLAlchemy yield(100) named tuples 3*access: time: 0.829011 (data [183560260000L])
>
> compared to the fastest Core test:
>
> 0 SQlAlchemy core simple: time: 0.707205 (data [183560260000L])
> 1 SQlAlchemy core simple: time: 0.702223 (data [183560260000L])
> 2 SQlAlchemy core simple: time: 0.708816 (data [183560260000L])
>
>
> This is using 1.0’s named tuple which is faster than the one in 0.9. As I
> discussed in the migration notes I linked, over here
> http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html#new-keyedtuple-implementation-dramatically-faster
> is where I discuss how I came up with that named tuple approach.
>
> In 0.9, the tuples are much slower (but still faster than straight entities):
>
> 0 SQLAlchemy yield(100) named tuples: time: 1.083882 (data [183560260000L])
> 1 SQLAlchemy yield(100) named tuples: time: 1.097783 (data [183560260000L])
> 2 SQLAlchemy yield(100) named tuples: time: 1.113621 (data [183560260000L])
> 0 SQLAlchemy yield(100) named tuples 3*access: time: 1.204280 (data [183560260000L])
> 1 SQLAlchemy yield(100) named tuples 3*access: time: 1.245768 (data [183560260000L])
> 2 SQLAlchemy yield(100) named tuples 3*access: time: 1.258327 (data [183560260000L])
>
> Also note that the timings for full object fetches in 0.9 vs. 1.0 are
> quite different:
>
> 0.9.8:
>
> 0 SQLAlchemy yield(100): time: 2.802273 (data [183560260000L])
> 1 SQLAlchemy yield(100): time: 2.778059 (data [183560260000L])
> 2 SQLAlchemy yield(100): time: 2.841441 (data [183560260000L])
>
> 1.0:
>
> 0 SQLAlchemy yield(100): time: 2.019153 (data [183560260000L])
> 1 SQLAlchemy yield(100): time: 2.052810 (data [183560260000L])
> 2 SQLAlchemy yield(100): time: 2.000401 (data [183560260000L])
>
>
> The tests you have here favor row fetches over individual query time.
> The original speed complaints were about lots of queries, not so much
> about lots of rows. The baked strategy is appropriate for the
> lots-of-queries use case. Feel free to check out the performance examples
> I linked, which break all these out.
>
The lot of queries is an OpenStack illness; SQLAlchemy itself is not the reason
behind them.
After a why-why analysis, most of the zillion-query offenders usually turn out
not to be really required:
1. application redesign -> think more in terms of event-based design,
instead of just timed (polling).
2. group the individual query/RPC calls into multi-item requests/responses
(see the sketch below).
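(A minimal sketch of point 2, with a hypothetical Instance model whose primary key is uuid; the names are illustrative only:)

    # N round trips: one query per UUID.
    instances = [session.query(Instance).get(uuid) for uuid in uuids]

    # One round trip: a single multi-item query for the whole set.
    instances = session.query(Instance).filter(Instance.uuid.in_(uuids)).all()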
The first slowness users notice (usually not on a dev laptop)
is, for example, how slow instance listing is:
- Why can't I list 4k / 8k / 10k instances in 2 sec?
Nova list has multiple issues;
the implementation I checked half a year ago forced SQLAlchemy
to do so much work that it alone exceeded the time limit,
even after I deleted some expensive parts (for ex.: zillions of extra queries).
The `named/keyed tuple way` (with the C helper) is expected to move the time
below the threshold, and to leave the remaining chaos in nova definitely
looking like the primary enemy.
The `named/keyed tuple way` is also expected to be good because actually
much more data is processed than the `nova list` command shows.
In this case the client should also indicate what it really wants to see.
(Probably there is already a blueprint for a fields query-arg usage.)
Can the keyed tuple usage be made transparent in a cheap way when
someone has already marked the session as read-only? [1]
Or do we need to implement a different db api call for this?
>
>
> > Looks like it is still worth converting the results to dict
> > when you access the data multiple times.
> >
> > dict is also the typical input type for the json serializers.
> >
> > The plain dict is good enough if you do not want to manage
> > which part is changed, especially when you are not planning to `save` it.
> >
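(A sketch of that conversion path, assuming the session from the earlier script; KeyedTuple mimics collections.namedtuple, so _asdict() should give a plain dict per row, ready for a JSON serializer without any ORM bookkeeping:)

    import json

    rows = session.query(Ints.id, Ints.A, Ints.B, Ints.C).yield_per(100)
    payload = json.dumps([row._asdict() for row in rows])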
I have a question regarding the baked queries:
Are these the same query or different queries for the baker?
SELECT a FROM B WHERE c IN (?)
SELECT a FROM B WHERE c IN (?, ?, ?)
So if the set size varies from 1 to 100k, does it bake 100k queries,
or only one?
Does it consider the query-plan cache size on the RDBMS side?
Several RDBMSes use expensive planning, and each of the above queries
creates a new plan in the cache.
The usual application strategy to mitigate cache use and cache misses
is to send queries with a few fixed set sizes (powers of 2 or 5), in some
cases padded with NULL arguments; a sketch follows below.
[1] https://github.com/openstack/oslo-specs/blob/master/specs/kilo/make-enginefacade-a-facade.rst
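(A minimal sketch of that padding strategy; B is a hypothetical model, and the trick relies on `c IN (..., NULL)` never matching the NULL entries:)

    def pad_in_args(values):
        # Round the IN-list size up to the next power of two, so only a
        # handful of distinct statements (and server-side plans) exist.
        size = 1
        while size < len(values):
            size *= 2
        return list(values) + [None] * (size - len(values))

    args = pad_in_args([1, 5, 9])   # -> [1, 5, 9, None]
    rows = session.query(B.a).filter(B.c.in_(args)).all()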
> > ----- Original Message -----
> >> From: "Mike Bayer" <mbayer at redhat.com>
> >> To: "OpenStack Development Mailing List (not for usage questions)"
> >> <openstack-dev at lists.openstack.org>
> >> Sent: Wednesday, March 4, 2015 11:30:49 PM
> >> Subject: Re: [openstack-dev] [all] SQLAlchemy performance suite and
> >> upcoming features (was: [nova] blueprint about
> >> multiple workers)
> >>
> >>
> >>
> >> Mike Bayer <mbayer at redhat.com> wrote:
> >>
> >>> Attila Fazekas <afazekas at redhat.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I wonder what is the planned future of the scheduling.
> >>>>
> >>>> The scheduler does a lot of queries with a high number of fields,
> >>>> which is CPU-expensive when you are using sqlalchemy-orm.
> >>>> Has anyone tried to switch those operations to sqlalchemy-core?
> >>>
> >>> An upcoming feature in SQLAlchemy 1.0 will remove the vast majority of
> >>> CPU
> >>> overhead from the query side of SQLAlchemy ORM by caching all the work
> >>> done
> >>
> >> Just to keep the Openstack community informed of what’s upcoming, here’s
> >> some more detail
> >> on some of the new SQLAlchemy performance features, which are based on the
> >> goals I first set up last summer at
> >> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy.
> >>
> >> As 1.0 features a lot of new styles of doing things that are primarily in
> >> the name of performance, in order to help categorize and document these
> >> techniques, 1.0 includes a performance suite in examples/ which features a
> >> comprehensive collection of common database idioms run under timing and
> >> function-count profiling. These idioms are broken into major categories
> >> like
> >> “short selects”, “large resultsets”, “bulk inserts”, and serve not only as
> >> a
> >> way to compare the relative performance of different techniques, but also
> >> as
> >> a way to provide example code categorized into use cases that illustrate
> >> the
> >> variety of ways to achieve that case, including the tradeoffs for each,
> >> across Core and ORM. So in this case, we can see what the “baked” query
> >> looks like in the “short_selects” suite, which times how long it takes to
> >> perform 10000 queries, each of which returns one object or row:
> >>
> >> https://bitbucket.org/zzzeek/sqlalchemy/src/cc58a605d6cded0594f7db1caa840b3c00b78e5a/examples/performance/short_selects.py?at=ticket_3054#cl-73
> >>
> >> The results of this suite look like the following:
> >>
> >> test_orm_query : test a straight ORM query of the full entity. (10000 iterations); total time 7.363434 sec
> >> test_orm_query_cols_only : test an ORM query of only the entity columns. (10000 iterations); total time 6.509266 sec
> >> test_baked_query : test a baked query of the full entity. (10000 iterations); total time 1.999689 sec
> >> test_baked_query_cols_only : test a baked query of only the entity columns. (10000 iterations); total time 1.990916 sec
> >> test_core_new_stmt_each_time : test core, creating a new statement each time. (10000 iterations); total time 3.842871 sec
> >> test_core_reuse_stmt : test core, reusing the same statement (but recompiling each time). (10000 iterations); total time 2.806590 sec
> >> test_core_reuse_stmt_compiled_cache : test core, reusing the same statement + compiled cache. (10000 iterations); total time 0.659902 sec
> >>
> >> Where above, “test_orm” and “test_baked” are both using the ORM API
> >> exclusively. We can see that the “baked” approach, returning column tuples,
> >> is almost twice as fast as a naive Core approach, that is, one which
> >> constructs select() objects each time and does not attempt to use any
> >> compilation caching.
> >>
> >> For the use case of fetching large numbers of rows, we can look at the
> >> large_resultsets suite
> >> (https://bitbucket.org/zzzeek/sqlalchemy/src/cc58a605d6cded0594f7db1caa840b3c00b78e5a/examples/performance/large_resultsets.py?at=ticket_3054).
> >> This suite illustrates a single query which fetches 500K rows. The “Baked”
> >> approach isn’t relevant here as we are only emitting a query once, however
> >> the approach we use to fetch rows is significant. Here we can see that
> >> ORM-based “tuple” approaches are very close in speed to the fetching of
> >> rows
> >> using Core directly. We also have a comparison of Core against raw DBAPI
> >> access, where we see very little speed improvement; an example where we
> >> create a very simple object for each DBAPI row fetched is also present to
> >> illustrate how quickly even the most minimal Python function overhead adds
> >> up when we do something 500K times.
> >>
> >> test_orm_full_objects_list : Load fully tracked ORM objects into one big list(). (500000 iterations); total time 11.055097 sec
> >> test_orm_full_objects_chunks : Load fully tracked ORM objects a chunk at a time using yield_per(). (500000 iterations); total time 7.323350 sec
> >> test_orm_bundles : Load lightweight "bundle" objects using the ORM. (500000 iterations); total time 2.128237 sec
> >> test_orm_columns : Load individual columns into named tuples using the ORM. (500000 iterations); total time 1.585236 sec
> >> test_core_fetchall : Load Core result rows using fetchall. (500000 iterations); total time 1.187013 sec
> >> test_core_fetchmany_w_streaming : Load Core result rows using fetchmany/streaming. (500000 iterations); total time 0.945906 sec
> >> test_core_fetchmany : Load Core result rows using Core / fetchmany. (500000 iterations); total time 0.959626 sec
> >> test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI fetchall(), generate an object for each row. (500000 iterations); total time 1.168365 sec
> >> test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't make any objects. (500000 iterations); total time 0.835586 sec
> >>
> >> An ongoing document of new SQLAlchemy 1.0 features, performance
> >> enhancements, and behavior-changing fixes is at
> >> http://docs.sqlalchemy.org/en/latest/changelog/migration_10.html.
> >>
> >>
> >>
> >>
> >>> up until the SQL is emitted, including all the function overhead of
> >>> building
> >>> up the Query object, producing a core select() object internally from the
> >>> Query, working out a large part of the object fetch strategies, and
> >>> finally
> >>> the string compilation of the select() into a string as well as
> >>> organizing
> >>> the typing information for result columns. With a query that is
> >>> constructed
> >>> using the “Baked” feature, all of these steps are cached in memory and
> >>> held
> >>> persistently; the same query can then be re-used at which point all of
> >>> these
> >>> steps are skipped. The system produces the cache key based on the
> >>> in-place
> >>> construction of the Query using lambdas so no major changes to code
> >>> structure are needed; just the way the Query modifications are performed
> >>> needs to be preceded with “lambda q:”, essentially.
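(For readers following along: a short sketch of roughly what this looks like with the baked extension as eventually documented; Ints is the model from my script, and details may differ from the final 1.0 API:)

    from sqlalchemy import bindparam
    from sqlalchemy.ext import baked

    bakery = baked.bakery()

    def lookup(session, ident):
        # The cache key derives from the lambdas' code objects, so Query
        # construction and SQL compilation happen on the first call only;
        # later calls just bind new parameters.
        bq = bakery(lambda s: s.query(Ints.id, Ints.A, Ints.B, Ints.C))
        bq += lambda q: q.filter(Ints.id == bindparam('ident'))
        return bq(session).params(ident=ident).all()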
> >>>
> >>> With this approach, the traditional session.query(Model) approach can go
> >>> from start to SQL being emitted with an order of magnitude fewer function
> >>> calls. On the fetch side, fetching individual columns instead of full
> >>> entities has always been an option with ORM and is about the same speed
> >>> as
> >>> a
> >>> Core fetch of rows. So using ORM with minimal changes to existing ORM
> >>> code
> >>> you can get performance even better than you’d get using Core directly,
> >>> since caching of the string compilation is also added.
> >>>
> >>> On the persist side, the new bulk insert / update features provide a
> >>> bridge
> >>> from ORM-mapped objects to bulk inserts/updates without any unit of work
> >>> sorting going on. ORM-mapped objects are still more expensive to use, in
> >>> that instantiation and state change still carry overhead, but bulk
> >>> insert/update accepts dictionaries as well, which again is competitive
> >>> with
> >>> a straight Core insert.
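(A sketch of the dictionary form of the bulk API, per the migration notes; Ints is again the model from the earlier script:)

    session.bulk_insert_mappings(
        Ints,
        [{'id': i, 'A': i, 'B': i, 'C': i} for i in range(10000)])
    session.commit()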
> >>>
> >>> Both of these features are completed in the master branch; the “baked query”
> >>> feature just needs documentation, and I’m basically two or three tickets
> >>> away from beta releases of 1.0. The “Baked” feature itself lives as an
> >>> extension and if we really wanted, I could backport it into oslo.db as
> >>> well
> >>> so that it works against 0.9.
> >>>
> >>> So I’d ask that folks please hold off on any kind of migration from ORM
> >>> to
> >>> Core for performance reasons. I’ve spent the past several months adding
> >>> features directly to SQLAlchemy that allow an ORM-based app to have
> >>> routes
> >>> to operations that perform just as fast as that of Core without a rewrite
> >>> of
> >>> code.
> >>>
> >>>> The scheduler does a lot of things in the application, like filtering,
> >>>> which could be done on the DB level more efficiently. Why is it not done
> >>>> on the DB side?
> >>>>
> >>>> There are use cases when the scheduler would need to know even more data.
> >>>> Is there a plan for keeping `everything` in every scheduler process's
> >>>> memory up-to-date?
> >>>> (Maybe zookeeper.)
> >>>>
> >>>> The opposite way would be to move most operations to the DB side,
> >>>> since the DB already knows everything.
> >>>> (Stored procedures?)
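(An illustrative sketch of pushing one such filter into SQL; the ComputeNode columns here are my assumption, not necessarily nova's actual schema:)

    # Filter hosts in the database instead of loading them all and
    # filtering in Python.
    candidates = (
        session.query(ComputeNode)
        .filter(ComputeNode.free_ram_mb >= requested_ram_mb)
        .filter(ComputeNode.vcpus - ComputeNode.vcpus_used >= requested_vcpus)
        .all())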
> >>>>
> >>>> Best Regards,
> >>>> Attila
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Rui Chen" <chenrui.momo at gmail.com>
> >>>>> To: "OpenStack Development Mailing List (not for usage questions)"
> >>>>> <openstack-dev at lists.openstack.org>
> >>>>> Sent: Wednesday, March 4, 2015 4:51:07 AM
> >>>>> Subject: [openstack-dev] [nova] blueprint about multiple workers
> >>>>> supported in nova-scheduler
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I want to make it easy to launch a bunch of scheduler processes on a host;
> >>>>> multiple scheduler workers will make use of the host's multiple processors
> >>>>> and enhance the performance of nova-scheduler.
> >>>>>
> >>>>> I have registered a blueprint and committed a patch to implement it.
> >>>>> https://blueprints.launchpad.net/nova/+spec/scheduler-multiple-workers-support
> >>>>>
> >>>>> This patch has been applied in our performance environment and passed
> >>>>> some test cases, like concurrently booting multiple instances; so far we
> >>>>> haven't found any inconsistency issues.
> >>>>>
> >>>>> IMO, nova-scheduler should be easy to scale horizontally, and multiple
> >>>>> workers should be supported as an out-of-the-box feature.
> >>>>>
> >>>>> Please feel free to discuss this feature, thanks.
> >>>>>
> >>>>> Best Regards
> >>>>>
> >>>>>
>