[openstack-dev] [nova] Migration progress

Bhandaru, Malini K malini.k.bhandaru at intel.com
Fri Feb 5 04:02:40 UTC 2016

I agree with Daniel: keep the update periods consistent, 5 seconds for both.

Another thought: for ephemeral, frequently changing data such as progress, why not keep the information in a cache, flush it to the database at a lower rate, and serve active listeners/UI from the cache? Once the migration completes or is aborted, of course, flush the cache.
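
To make the cache idea concrete, here is a minimal sketch, not Nova code. The class name ProgressCache, the db_write callable and the 30-second flush interval are all assumptions made up for illustration; a real deployment would more likely use a shared cache than a per-process dict.

```python
import time


class ProgressCache:
    """Keep the latest progress in memory; write to the DB at a lower rate."""

    def __init__(self, db_write, flush_interval=30.0, clock=time.monotonic):
        # db_write is a hypothetical callable: db_write(migration_id, progress)
        self._db_write = db_write
        self._flush_interval = flush_interval
        self._clock = clock
        self._latest = {}      # migration_id -> most recent progress sample
        self._last_flush = {}  # migration_id -> time of the last DB write

    def update(self, migration_id, progress):
        """Record a sample; hit the DB only if the flush interval has elapsed."""
        now = self._clock()
        self._latest[migration_id] = progress
        last = self._last_flush.get(migration_id)
        if last is None or now - last >= self._flush_interval:
            self._db_write(migration_id, progress)
            self._last_flush[migration_id] = now

    def get(self, migration_id):
        """Serve active listeners/UI from memory, without touching the DB."""
        return self._latest.get(migration_id)

    def finish(self, migration_id):
        """On completion or abort: write the final value and drop the entry."""
        progress = self._latest.pop(migration_id, None)
        self._last_flush.pop(migration_id, None)
        if progress is not None:
            self._db_write(migration_id, progress)
```

With this shape the monitor could still sample every few seconds, while the number of database writes is bounded by the flush interval plus one final write per migration.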

Also, should we provide a "verbose" flag, so that progress information is captured only when requested? That would be when a human user is issuing the command from the CLI or a GUI tool.


-----Original Message-----
From: Daniel P. Berrange [mailto:berrange at redhat.com] 
Sent: Wednesday, February 03, 2016 11:46 AM
To: Paul Carlton <paul.carlton2 at hpe.com>
Cc: Feng, Shaohe <shaohe.feng at intel.com>; OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [nova] Migration progress

On Wed, Feb 03, 2016 at 11:27:16AM +0000, Paul Carlton wrote:
> On 03/02/16 10:49, Daniel P. Berrange wrote:
> >On Wed, Feb 03, 2016 at 10:44:36AM +0000, Daniel P. Berrange wrote:
> >>On Wed, Feb 03, 2016 at 10:37:24AM +0000, Koniszewski, Pawel wrote:
> >>>Hello everyone,
> >>>
> >>>At yesterday's live migration meeting we had concerns that the 
> >>>interval of writing migration progress to the database is too short.
> >>>
> >>>Information about migration progress will be stored in the database 
> >>>and exposed through the API (/servers/<uuid>/migrations/<id>). In 
> >>>current proposition [1] migration progress will be updated every 2 
> >>>seconds. It basically means that every 2 seconds a call through RPC 
> >>>will go from compute to conductor to write migration data to the 
> >>>database. In case of parallel live migrations each migration will report progress by itself.
> >>>
> >>>Isn't 2 seconds interval too short for updates if the information 
> >>>is exposed through the API and it requires RPC and DB call to 
> >>>actually save it in the DB?
> >>>
> >>>Our default configuration allows only for 1 concurrent live 
> >>>migration [2], but it might vary between different deployments and 
> >>>use cases as it is configurable. Someone might want to trigger 10 
> >>>(or even more) parallel live migrations and each might take even a 
> >>>day to finish in case of block migration. Also if deployment is big enough rabbitmq might be fully-loaded.
> >>>I'm not sure whether updating each migration every 2 seconds makes 
> >>>sense in this case. On the other hand it might be hard to observe 
> >>>fast enough that migration is stuck if we increase this interval...
> >>Do we have any actual data that this is a real problem? I have a 
> >>pretty hard time believing that a database update of a single field 
> >>every 2 seconds is going to be what pushes Nova over the edge into a 
> >>performance collapse, even if there are 20 migrations running in 
> >>parallel, when you compare it to the amount of DB queries & updates 
> >>done across other areas of the code for pretty much every single API call and background job.
> >Also note that progress is rounded to the nearest integer. So even if 
> >the migration runs all day, there is a maximum of 100 possible 
> >changes in value for the progress field, so most of the updates 
> >should turn in to no-ops at the database level.
> >
> >Regards,
> >Daniel
> I agree with Daniel, these rpc and db access ops are a tiny percentage 
> of the overall load on rabbit and mysql and properly configured these 
> subsystems should have no issues with this workload.
> One correction, unless I'm misreading it, the existing 
> _live_migration_monitor code updates the progress field of the 
> instance record every 5 seconds.  However this value can go up and 
> down so an infinite number of updates are possible?

Oh yes, you are in fact correct. Technically you could have an unbounded number of updates if the migration goes backwards. As mitigation, if we see progress going backwards we'll actually abort the migration if it gets stuck for too long. We'll also be progressively increasing the permitted downtime. So except in pathological scenarios I think the number of updates should still be relatively small.
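
As a rough illustration of why the count stays small, here is a sketch in which a write happens only when the rounded integer value changes. ProgressRecorder and its save callable are hypothetical names, and skipping the write in the caller is an assumption made for the example; it only mimics the effect of updates becoming no-ops at the database level.

```python
class ProgressRecorder:
    """Persist integer progress only when the rounded value changes."""

    def __init__(self, save):
        # save is a hypothetical callable that persists the integer progress.
        self._save = save
        self._last = None
        self.writes = 0

    def report(self, fraction_done):
        """Take a 0.0-1.0 sample; return True if it caused a write."""
        value = int(round(fraction_done * 100))
        if value == self._last:
            return False  # same integer as last time: nothing to write
        self._save(value)
        self._last = value
        self.writes += 1
        return True


saved = []
recorder = ProgressRecorder(saved.append)
# Seven samples, with the last one going backwards.
for sample in (0.101, 0.104, 0.108, 0.112, 0.109, 0.111, 0.09):
    recorder.report(sample)
```

Small oscillations within the same integer cost nothing, but a sample that moves to a different integer, in either direction, is another write, which is where the unbounded case comes from.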

> However, the issue raised here is not with the existing implementation 
> but with the proposed change 
> https://review.openstack.org/#/c/258813/5/nova/virt/libvirt/driver.py
> This adds a save() operation on the migration object every 2 seconds

Ok, that is more heavyweight since it is recording the raw byte values and so it is guaranteed to do a database update pretty much every time.
It still shouldn't be too unreasonable a load though. FWIW I think it is worth being consistent in the update frequency between the progress value & the migration object save, so switching to every 5 seconds probably makes more sense, so we know both objects are reflecting the same point in time.
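
For a sense of scale, here is back-of-the-envelope arithmetic on the worst case, assuming every tick of every migration really does result in one database write. The function name is made up for this example, and the 20 parallel migrations figure is the one used earlier in this thread.

```python
def writes_per_second(parallel_migrations, interval_seconds):
    """Upper bound on DB writes/sec if every tick produces one write."""
    return parallel_migrations / interval_seconds


# 20 parallel migrations, saving the migration object on every tick.
rate_at_2s = writes_per_second(20, 2)  # 10.0 writes per second
rate_at_5s = writes_per_second(20, 5)  # 4.0 writes per second
```

Under that assumption, moving from 2 to 5 seconds cuts the upper bound by a factor of 2.5 and lines the save up with the existing progress update.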

|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
