[openstack-dev] [nova] ops meetup feedback

Sean Dague sean at dague.net
Tue Sep 20 15:01:23 UTC 2016


On 09/20/2016 10:38 AM, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 09:20:15AM -0400, Sean Dague wrote:
>> This is a bit delayed due to the release rush, finally getting back to
>> writing up my experiences at the Ops Meetup.
>>
>> Nova Feedback Session
>> =====================
>>
>> We had a double session for feedback on Nova from operators; the raw
>> etherpad is here: https://etherpad.openstack.org/p/NYC-ops-Nova.
>>
>> The median release that people in the room were running was Kilo. Some
>> were upgrading to Liberty, and many had clouds older than Kilo. Remember
>> that these are the larger ops environments that are engaged enough with
>> the community to send people to the Ops Meetup.
>>
>>
>> Performance Bottlenecks
>> -----------------------
>>
>> * scheduling issues with Ironic (a bug we fixed in the week after
>>   the session)
>> * live snapshots actually end up being a performance issue for people
>>
>> The workarounds config group was not well known, and everyone in the
>> room wished we advertised it a bit more. The fix for snapshot
>> performance is in there.
>>
>> There were also general questions about what scale cells should be
>> considered at.
>>
>> ACTION: we should make sure workarounds are advertised better
> 
> Workarounds ought to be something that admins are rarely, if
> ever, having to deal with.
> 
> If the lack of live snapshot is such a major performance problem
> for ops, this tends to suggest that our default behaviour is wrong,
> rather than a need to publicise that operators should set this
> workaround.
> 
> e.g. instead of optimizing for the case of broken live snapshot
> support by default, we should optimize for the case of working
> live snapshot by default. The broken live snapshot behaviour was so
> rare that no one has ever reproduced it outside of the gate,
> AFAIK.
> 
> IOW, rather than hardcoding disable_live_snapshot=True in nova,
> we should just set it in the gate CI configs, and leave it set
> to False in Nova, so operators get good performance out of the
> box.
> 
> Also, it has been a while since we added the workaround, and IIRC
> we now have newer Ubuntu on at least some of the gate hosts, so we
> can test whether the problem still occurs on newer Ubuntu.

Here is my reconstruction of the snapshot issue from what I can remember
of the conversation.

Nova defaults to live snapshots. This uses the libvirt facility that
dumps both memory and disk, and then we throw away the memory. For
large-memory guests (especially volume-backed ones that might have a fast
path for the disk) this adds a lot of overhead for no gain. The
workaround got them past it.

Maybe there is another bug we should be addressing here, but it was a
real performance issue that people were seeing in the field.
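For operators who want to try the workaround, here is a minimal sketch of
setting it explicitly in nova.conf. It assumes the option is the boolean
`disable_libvirt_livesnapshot` in the `[workarounds]` group; check the
configuration reference for your release before relying on that name. The
helper `set_live_snapshot_workaround` is hypothetical and uses only Python's
standard configparser module.

```python
import configparser
import io


def set_live_snapshot_workaround(conf_text: str, disable: bool) -> str:
    """Return nova.conf text with the live-snapshot workaround set explicitly.

    Assumption: the option is [workarounds] disable_libvirt_livesnapshot.
    Note that configparser does not preserve comments when rewriting.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("workarounds"):
        parser.add_section("workarounds")
    # oslo.config accepts "True"/"False" for boolean options.
    parser.set("workarounds", "disable_libvirt_livesnapshot", str(disable))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    original = "[DEFAULT]\ndebug = False\n"
    print(set_live_snapshot_workaround(original, disable=True))
```

Other sections and settings are carried over unchanged; only the workaround
option is added or overwritten, so the same helper can flip it back to False
once the underlying issue is resolved.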

	-Sean

-- 
Sean Dague
http://dague.net
