[nova] super long online_data_migrations

Jay Pipes jaypipes at gmail.com
Mon Apr 1 12:21:10 UTC 2019


On 03/31/2019 10:21 PM, Mohammed Naser wrote:
> Hi there,
> 
> During upgrades, I've noticed that when running online_data_migrations
> with "infinite-until-done" mode, it loops over all of the migrations
> one by one.
> 
> However, one of the online data migrations
> (instance_obj.populate_missing_availability_zones) makes a query that
> takes a really long time as it seems inefficient (which eventually
> results in 0, cause it already ran), which means as it loops in
> "blocks" of 50, there's almost a 2-3 to 8 minute wait in really large
> environments.
> 
> The question ends up in specific:
> 
> SELECT count(*) AS count_1
> FROM (SELECT instance_extra.created_at AS instance_extra_created_at,
> instance_extra.updated_at AS instance_extra_updated_at,
> instance_extra.deleted_at AS instance_extra_deleted_at,
> instance_extra.deleted AS instance_extra_deleted, instance_extra.id AS
> instance_extra_id, instance_extra.instance_uuid AS
> instance_extra_instance_uuid
> FROM instance_extra
> WHERE instance_extra.keypairs IS NULL AND instance_extra.deleted = 0) AS anon_1

Ugh. :(

The online data migration shouldn't be calling the above SQL statement 
at all.

Instead, the migration should be doing something like this:

SELECT ie.instance_uuid FROM instance_extra AS ie
WHERE ie.keypairs IS NULL AND ie.deletd = 0
LIMIT 100

and then while getting any rows returned from the above, perform the 
work of transforming the problematic data in the table for each matched 
instance_uuid.

I'm actually not sure what the above query has to do with availability 
zones, but I'll look into it later on this morning.

Can you report a bug about this and we'll get on it ASAP?

Best,
-jay

> The explain for the DB query in this example:
> 
> +------+-------------+----------------+------+---------------+------+---------+------+--------+-------------+
> | id   | select_type | table          | type | possible_keys | key  |
> key_len | ref  | rows   | Extra       |
> +------+-------------+----------------+------+---------------+------+---------+------+--------+-------------+
> |    1 | SIMPLE      | instance_extra | ALL  | NULL          | NULL |
> NULL    | NULL | 382473 | Using where |
> +------+-------------+----------------+------+---------------+------+---------+------+--------+-------------+
> 
> It's possible that it can be ever worse, as this number is from
> another very-long running environments.
> 
> +------+-------------+----------------+------+---------------+------+---------+------+---------+-------------+
> | id   | select_type | table          | type | possible_keys | key  |
> key_len | ref  | rows    | Extra       |
> +------+-------------+----------------+------+---------------+------+---------+------+---------+-------------+
> |    1 | SIMPLE      | instance_extra | ALL  | NULL          | NULL |
> NULL    | NULL | 3008741 | Using where |
> +------+-------------+----------------+------+---------------+------+---------+------+---------+-------------+
> 
> I'm not the SQL expert, could we not optimize this?  Alternatively,
> could we update the online data migrations code to "pop out" any of
> the migrations that return 0 for the next iteration, that way it only
> works on those online_data_migrations that *have* to be done, and
> ignore those it knows are done?
> 
> Thanks,
> Mohammed
> 



More information about the openstack-discuss mailing list