[openstack-dev] [Ceilometer] time consuming of listing resource
jaypipes at gmail.com
Sat Dec 28 16:16:47 UTC 2013
On 12/28/2013 05:51 AM, 刘胜 wrote:
> Hi all：
> I have reported a bug about time consuming of “resource-list” in
> ceilometer CLI：
> In order to Identify the causes of this phenomenon, I have pdb the codes
> in my invironment(configured mysql as db driver):
> the most import part of process of listing resource is implemented in
> following codes:
> code of get_resources() in /ceilometer/storage/impl_sqlalchemy.py：
> for meter, first_ts, last_ts in query.all():
> yield api_models.Resource(
> for m in meter.resource.meters
> The method generate iterator of object of api_models.Resource for
> ceilometer API to show.
> 1.The operation “query.all()” will query the DB table “meter” with the
> expression generated forward,in my invironment the DB table “meter” have
> more than 300000 items, so this operation may consume about 30 seconds;
> 2.The operation"for m in meter.resource.meters" will circulate the
> meters of this resource . a resource of server may have more than 100000
> meter iterms in my invironment. So the time of whole process is too
> long. I think the meter of Resource object can be reduced and I have
> tested this modification, it is OK for listing resource,and reduce the
> most time consumption
> I have noticed that there are many methods of db operation may time
> ps: I have configured the ceilometer pulling interval from 600s to 60s
> in /etc/ceilometer/pipeline.yaml, but the invironment has just run 10 days!
> I'm a beginner of ceilometer,and want to fix this bug,but I haven't
> found a suitable way
> may be someone can help me with this?
Yep. The performance of the SQL driver in Ceilometer out-of-the-box with
that particular line is unusable in our experience. We have our Chef
cookbook literally patch Ceilometer's source code and comment out that
particular line because it makes performance of Ceilometer unusable.
I hate to say it, but the SQL driver in Ceilometer really needs an
overhaul, both at the schema level and the code level:
On the schema level:
* The indexes, especially on sourceassoc, are wrong:
** The order of the columns in the multi-column indexes like idx_sr,
idx_sm, idx_su, idx_sp is incorrect. Columns used in predicates should
*precede* columns (like source_id) that are used in joins. The way the
indexes are structured now makes them unusable by the query optimizer
for 99% of queries on the sourceassoc table, which means any queries on
sourceassoc trigger a full table scan of the hundreds of millions of
records in the table. Things are made worse by the fact that INSERT
operations are slowed for each index on a table, and the fact that none
of these indexes are used just means we're wasting cycles on each INSERT
for no reason.
** The indexes are across the entire VARCHAR(255) field width. This
isn't necessary (and I would argue that the base field type should be
smaller). Index width can be reduced (and performance increased) by
limiting the indexable width to 32 (or smaller).
The solution to the main indexing issues is to do the following:
DROP INDEX idx_sr ON sourceassoc;
CREATE INDEX idx_sr ON sourceassoc (resource_id(32), source_id(32));
DROP INDEX idx_sp ON sourceassoc;
CREATE INDEX idx_sp ON sourceassoc (project_id(32), source_id(32));
DROP INDEX idx_su ON sourceassoc;
CREATE INDEX idx_su ON sourceassoc (user_id(32), source_id(32));
DROP INDEX idx_sm ON sourceassoc;
CREATE INDEX idx_sm ON sourceassoc (meter_id, source_id(32));
Keep in mind if you have (hundreds of) millions of records in the
sourceassoc table, the above will take a long time to run. It will take
hours, but you'll be happy you did it. You'll see the database
performance increase dramatically.
* The columns that refer to IDs of various kinds should not be UTF8.
Changing these columns to a latin1 or even binary charset would cut the
space requirements for the data and index storage by 65%. This means you
can fit around 3x as many records in the same data and index pages. The
more records you fit into an index page, the faster seeks and scans will be.
* sourceassoc has no primary key.
* The meter table has the following:
KEY ix_meter_id (id)
which is entirely redundant (id is the primary key) and does nothing
but slow down insert operations for every record in the meter table.
* The meter table mixes frequently searched and aggregated fields (like
timestamp, counter_type, project_id) with infrequently accessed fields
(like resource_metadata, which is a VARCHAR(5000)). This leads to poorer
performance of aggregate queries on the meter table that use the
clustered index (primary key) in aggregation (for an example, see the
particular line of code that we comment out of Ceilometer above). A
better performing schema would consolidate slim, frequently accessed
fields into the main meter table and move infrequently accessed or
searched fields into a meter_extra table. This would mean many more
records of the main meter table can fit into a single InnoDB data page
(the clustered index), which means faster seeks and scans for 99% of
queries on that table.
On the code level there are a variety of inefficient queries that are
generated, and there are a number of places where using something like a
memcache caching layer for common lookup queries could help reduce load
on the DB server.
I'm hoping to push some patches in the early part of 2014 that address
performance and scalability issues in the SQL driver for Ceilometer.
More information about the OpenStack-dev