[Openstack-operators] MessagingTimeout in block live-migration due to long image fetch operation

Matt Riedemann mriedemos at gmail.com
Fri Dec 1 23:05:42 UTC 2017


On 11/28/2017 9:13 AM, Gustavo Randich wrote:
> (running Mitaka)
> 
> When doing block live migration, if the image / backing file is not
> present on the destination host, pre-live migration sometimes fails
> after 60 seconds. Retrying the migration to the same destination
> host succeeds.
> 
> It seems that an rpc_response_timeout of 60 seconds is not enough for
> this scenario, in which fetching the image takes 90 seconds. We'd
> rather not increase rpc_response_timeout to, say, 120 seconds just for
> this reason (for other kinds of errors we prefer to fail fast).
> 
> Given that migrations are usually long, shouldn't this operation be 
> under the scope of a configurable timeout such as 
> live_migration_progress_timeout or live_migration_completion_timeout 
> which overrides the default rpc timeout?

I think we've talked about adding a config option, or otherwise handling RPC
timeouts differently, for operations that we know are prone to timeouts,
so I don't think people would be against a config option for this. I
know there is at least one place in nova where we specify an RPC
response timeout other than the default.
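To make the idea of a per-operation override concrete, here is a minimal, hypothetical sketch: a fail-fast default that plays the role of rpc_response_timeout (60 seconds), plus overrides for calls known to be slow, such as pre_live_migration when the destination has to fetch an image. RpcTimeoutPolicy, timeout_for() and would_time_out() are illustrative names only, not nova or oslo.messaging API.

```python
"""Hypothetical sketch of per-operation RPC timeout overrides.

Nothing here is nova or oslo.messaging API; the class, methods and
override values are illustrative assumptions.
"""
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RpcTimeoutPolicy:
    # Plays the role of rpc_response_timeout (60 s by default):
    # the fail-fast value used for ordinary calls.
    default_timeout: int = 60
    # Hypothetical per-method overrides for calls known to be slow,
    # e.g. pre_live_migration when the image must be fetched first.
    overrides: Dict[str, int] = field(default_factory=dict)

    def timeout_for(self, method: str) -> int:
        """Return the timeout in seconds to use for an RPC method."""
        return self.overrides.get(method, self.default_timeout)


def would_time_out(policy: RpcTimeoutPolicy, method: str, duration: float) -> bool:
    """True if a call taking `duration` seconds exceeds its timeout."""
    return duration > policy.timeout_for(method)


if __name__ == "__main__":
    policy = RpcTimeoutPolicy(overrides={"pre_live_migration": 180})
    # A 90 s image fetch no longer trips the timeout for this one call...
    print(would_time_out(policy, "pre_live_migration", 90))
    # ...while every other call still fails fast after 60 s.
    print(would_time_out(policy, "some_other_call", 90))
```

In real code the selected value would be passed when preparing the call; oslo.messaging's RPCClient.prepare() accepts a timeout argument for exactly this kind of per-call override, while rpc_response_timeout keeps its fail-fast default everywhere else.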

-- 

Thanks,

Matt


