[openstack-dev] [nova] Migration progress

Daniel P. Berrange berrange at redhat.com
Wed Feb 3 11:45:54 UTC 2016


On Wed, Feb 03, 2016 at 11:27:16AM +0000, Paul Carlton wrote:
> On 03/02/16 10:49, Daniel P. Berrange wrote:
> >On Wed, Feb 03, 2016 at 10:44:36AM +0000, Daniel P. Berrange wrote:
> >>On Wed, Feb 03, 2016 at 10:37:24AM +0000, Koniszewski, Pawel wrote:
> >>>Hello everyone,
> >>>
> >>>At yesterday's live migration meeting, concerns were raised that the
> >>>interval for writing migration progress to the database is too short.
> >>>
> >>>Information about migration progress will be stored in the database and
> >>>exposed through the API (/servers/<uuid>/migrations/<id>). In the current
> >>>proposal [1], migration progress will be updated every 2 seconds. It
> >>>basically means that every 2 seconds a call through RPC will go from compute
> >>>to conductor to write migration data to the database. In case of parallel
> >>>live migrations each migration will report progress by itself.
> >>>
> >>>Isn't a 2-second interval too short for updates, given that the information
> >>>is exposed through the API and each update requires an RPC call and a DB
> >>>write to save it?
> >>>
> >>>Our default configuration allows only for 1 concurrent live migration [2],
> >>>but it might vary between different deployments and use cases as it is
> >>>configurable. Someone might want to trigger 10 (or even more) parallel live
> >>>migrations and each might take even a day to finish in case of block
> >>>migration. Also, if the deployment is big enough, rabbitmq might be fully
> >>>loaded. I'm not sure whether updating each migration every 2 seconds makes
> >>>sense in this case. On the other hand, if we increase this interval it
> >>>might be hard to notice quickly enough that a migration is stuck...
> >>Do we have any actual data that this is a real problem? I have a pretty hard
> >>time believing that a database update of a single field every 2 seconds is
> >>going to be what pushes Nova over the edge into a performance collapse, even
> >>if there are 20 migrations running in parallel, when you compare it to the
> >>amount of DB queries & updates done across other areas of the code for pretty
> >>much every single API call and background job.
> >Also note that progress is rounded to the nearest integer. So even if the
> >migration runs all day, there is a maximum of 100 possible changes in value
> >for the progress field, so most of the updates should turn into no-ops at
> >the database level.
> >
> >Regards,
> >Daniel
> I agree with Daniel. These RPC and DB access ops are a tiny percentage
> of the overall load on rabbit and mysql and, properly configured, these
> subsystems should have no issues with this workload.
> 
> One correction: unless I'm misreading it, the existing
> _live_migration_monitor code updates the progress field of the instance
> record every 5 seconds.  However, this value can go up and down, so
> an infinite number of updates is possible?

Oh yes, you are in fact correct. Technically you could have an unbounded
number of updates if migration goes backwards. One mitigation is that if
progress goes backwards and the migration stays stuck for too long, we'll
abort it. We'll also be progressively increasing the permitted downtime.
So except in pathological scenarios I think the number of updates should
still be relatively small.
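
As a rough illustration of why rounding keeps the number of changes small
for a well-behaved migration, but not for one whose progress oscillates,
here is a minimal sketch. It is not the nova code: count_progress_writes
and the sample values are hypothetical, and it assumes a write is only
issued when the rounded value differs from the last one stored.

```python
def count_progress_writes(samples):
    """Count polls whose rounded progress differs from the last stored value.

    `samples` is a hypothetical sequence of raw progress percentages, one
    per monitor poll. Illustration only, not the nova implementation.
    """
    writes = 0
    last_stored = None
    for raw in samples:
        rounded = int(round(raw))
        # Only a changed integer value would need a real database update.
        if rounded != last_stored:
            writes += 1
            last_stored = rounded
    return writes


# Steadily increasing progress: 1001 polls from 0.0 to 100.0.
steady = [i / 10 for i in range(1001)]
# Progress that moves up and down, e.g. a guest dirtying memory quickly.
oscillating = [40.0, 42.0, 39.0, 43.0, 38.0, 44.0]

print(count_progress_writes(steady))
print(count_progress_writes(oscillating))
```

In the steady case the count is bounded by the number of distinct integer
values, however many polls there are. In the oscillating case every poll
changes the stored value, which is the unbounded scenario described above.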

> However, the issue raised here is not with the existing implementation
> but with the proposed change
> https://review.openstack.org/#/c/258813/5/nova/virt/libvirt/driver.py
> This adds a save() operation on the migration object every 2 seconds.

Ok, that is more heavyweight, since it is recording the raw byte values
and so it is guaranteed to do a database update pretty much every time.
It still shouldn't be an unreasonable load though. FWIW I think it is
worth being consistent in the update frequency between the progress
value & the migration object save, so switching to every 5 seconds
probably makes more sense; that way we know both objects are reflecting
the same point in time.
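
To make the "same point in time" idea concrete, here is a minimal sketch
of a monitor loop that takes one sample per tick and writes both records
from it. Everything in it is an assumption for illustration: the function
monitor_migration, its callable parameters and the stats dict layout are
hypothetical stand-ins, not nova or libvirt APIs.

```python
import time


def monitor_migration(get_stats, save_instance_progress, save_migration,
                      interval=5, sleep=time.sleep):
    """Poll until the job finishes, writing both records from one sample.

    Hypothetical callables (not nova or libvirt APIs):
      get_stats() -> dict with 'total' and 'remaining' (bytes), 'done' (bool)
      save_instance_progress(percent) -> stores the integer progress value
      save_migration(total=..., remaining=...) -> stores the raw byte values
    """
    while True:
        stats = get_stats()
        total = stats["total"]
        remaining = stats["remaining"]
        if total == 0:
            percent = 0
        else:
            percent = int(round(100.0 * (total - remaining) / total))
        # Both writes use the same sample, so they describe the same instant.
        save_instance_progress(percent)
        save_migration(total=total, remaining=remaining)
        if stats["done"]:
            return
        sleep(interval)


# Demo with canned samples and no real sleeping.
samples = iter([
    {"total": 1000, "remaining": 750, "done": False},
    {"total": 1000, "remaining": 0, "done": True},
])
progress_log, migration_log = [], []
monitor_migration(
    get_stats=lambda: next(samples),
    save_instance_progress=progress_log.append,
    save_migration=lambda **kw: migration_log.append(kw),
    sleep=lambda seconds: None,
)
print(progress_log)
print(migration_log)
```

For scale, with 20 migrations running in parallel a 5 second interval
works out to 4 migration saves per second in total, versus 10 per second
at a 2 second interval.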

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


