[openstack-dev] [Openstack-operators] [Nova] Reconciling flavors and block device mappings

Tim Bell Tim.Bell at cern.ch
Fri Aug 26 16:10:52 UTC 2016


On 26 Aug 2016, at 17:44, Andrew Laski <andrew at lascii.com> wrote:




On Fri, Aug 26, 2016, at 11:01 AM, John Griffith wrote:


On Fri, Aug 26, 2016 at 7:37 AM, Andrew Laski <andrew at lascii.com> wrote:


On Fri, Aug 26, 2016, at 03:44 AM, Kostiantyn.Volenbovskyi at swisscom.com wrote:
> Hi,
> Option 1 (that's what the patches suggest) sounds totally fine.
> Option 3 ("Allow block device mappings, when present, to mostly determine
> instance packing")
> sounds like option 1 plus additional logic (keyword: 'mostly').
> I think I fail to understand the part about 'undermining the purpose of the
> flavor'.
> Why might the new behavior require one more parameter to limit the number of
> instances per host?
> Isn't it the case that those VMs will be under the control of other flavor
> constraints, such as CPU and RAM, anyway, and that those will be the ones
> controlling 'instance packing'?

Yes it is possible that CPU and RAM could be controlling instance
packing. But my understanding is that since those are often
oversubscribed
I don't understand why the oversubscription ratio matters here?


My experience is with environments where oversubscription was used to be a little loose with how many vCPUs or how much RAM was allocated, but disk was strictly controlled.




while disk is not, it's actually the disk amounts
that control the packing in some environments.
Maybe an explanation of what you mean by "packing" would help here.  Customers that I've worked with over the years have used CPU and memory as their levers, and those are the main things they care about in terms of how many Instances go on a Node.  I'd like to learn more about why that's wrong and why disk space is the mechanism that deployers use for this.


By packing I just mean the various ways that different flavors fit on a host. A host may be designed to hold 1 xlarge, or 2 larges, or 4 mediums, or 1 large and 2 mediums, etc. The challenge I see here is that the constraint can be managed by using CPU or RAM or disk or some combination of the three. For deployers using only disk, the above patches will change behavior.

It's not wrong to use CPU/RAM, but it's not what everyone is doing. One purpose of this email was to gauge if it would be acceptable to only use CPU/RAM for packing.




But that is a sub-option
here: just document that disk amounts should not be used to determine
flavor packing on hosts and that CPU and RAM must be used instead.

> Does option 3 cover the case where someone relied on e.g. the flavor root
> disk for instances booted from volume, and instance packing will now change
> once the patches are implemented?

That's the goal. In a simple case of having hosts with 16 CPUs, 128GB of
RAM and 2TB of disk and a flavor with VCPU=4, RAM=32GB, root_gb=500GB,
swap/ephemeral=0 the deployer is stating that they want only 4 instances
on that host.
How do you arrive at that logic?  What if they actually wanted a single VCPU=4, RAM=32GB, root_gb=500 instance but then wanted the remaining resources split among Instances that were all 1 VCPU, 1 GB RAM and a 1 GB root disk?

My example assumes the one stated flavor. But if they have a smaller flavor then more than 4 instances would fit.
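
To make the arithmetic in this exchange concrete, here is a minimal sketch of the packing calculation. It assumes no oversubscription and treats 2TB as 2000GB; the dictionary keys and both helper functions are hypothetical names made up for illustration, not Nova scheduler code.

```python
# Illustrative sketch only: resource names and helpers are hypothetical,
# not taken from Nova.
HOST = {"vcpus": 16, "ram_gb": 128, "disk_gb": 2000}  # 16 CPUs, 128GB RAM, 2TB disk
LARGE = {"vcpus": 4, "ram_gb": 32, "disk_gb": 500}    # the flavor from the example
SMALL = {"vcpus": 1, "ram_gb": 1, "disk_gb": 1}       # the smaller flavor from the reply


def instances_that_fit(capacity, flavor):
    """Count how many copies of one flavor fit; the tightest resource wins."""
    per_resource = [
        capacity[name] // amount
        for name, amount in flavor.items()
        if amount > 0  # a zero request never constrains packing
    ]
    return min(per_resource)


def remaining_after(capacity, flavor, count=1):
    """Return the capacity left once `count` instances of a flavor are placed."""
    return {name: capacity[name] - count * flavor.get(name, 0) for name in capacity}


print(instances_that_fit(HOST, LARGE))                          # 4
print(instances_that_fit(remaining_after(HOST, LARGE), SMALL))  # 12
```

With only the large flavor, every resource runs out at 4 instances. After placing one large instance, vCPUs become the limit for the small flavor, so 12 more fit, which is the mixed case raised in the question above.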


If there is CPU and RAM oversubscription enabled then by
using volumes a user could end up with more than 4 instances on that
host. So a max_instances=4 setting could solve that. However I don't
like the idea of adding a new config, and I think it's too simplistic to
cover more complex use cases. But it's an option.
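
The effect described above can be sketched as follows. The oversubscription ratios (2.0 for CPU, 1.5 for RAM) are assumed values chosen only for illustration, `packing_limit` is a hypothetical helper, and `instance_cap` stands in for the max_instances setting floated in this thread; none of this is Nova's actual implementation.

```python
# Illustrative sketch only: ratios, names, and the cap parameter are assumptions.
HOST = {"vcpus": 16, "ram_gb": 128, "disk_gb": 2000}
FLAVOR = {"vcpus": 4, "ram_gb": 32, "disk_gb": 500}
RATIOS = {"vcpus": 2.0, "ram_gb": 1.5, "disk_gb": 1.0}  # disk is not oversubscribed


def packing_limit(host, flavor, ratios, boot_from_volume=False, instance_cap=None):
    """Return how many instances of one flavor the host accepts.

    When boot_from_volume is True the root disk lives on a remote volume,
    so local disk is left out of the calculation (the behavior the patches propose).
    """
    limits = []
    for name, requested in flavor.items():
        if requested == 0:
            continue
        if name == "disk_gb" and boot_from_volume:
            continue
        effective = host[name] * ratios.get(name, 1.0)
        limits.append(int(effective // requested))
    fit = min(limits)
    # An explicit per-host cap, if set, overrides a looser resource-based limit.
    return fit if instance_cap is None else min(fit, instance_cap)


print(packing_limit(HOST, FLAVOR, RATIOS))                                        # 4
print(packing_limit(HOST, FLAVOR, RATIOS, boot_from_volume=True))                 # 6
print(packing_limit(HOST, FLAVOR, RATIOS, boot_from_volume=True, instance_cap=4)) # 4
```

Under these assumed ratios, local disk is what holds the host at 4 instances. Once root disks move to volumes, RAM becomes the limit and 6 instances fit, unless a single integer cap restores the old number.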

I would venture to guess that most Operators would be sad to read that.  So rather than give them an explicit lever that does exactly what they want, clearly and explicitly, we should make it as complex as possible and have it be the result of a 4 or 5 variable equation?  Not to mention it's completely dynamic (because it seems like lots of clouds have more than one flavor).

Is that lever exactly what they want? That's part of what I'd like to find out here. But currently it's possible to set up a situation where 1 large flavor or 4 small flavors fit on a host. So would the max_instances=4 setting be desired? Keep in mind that if the above patches merged, 4 large flavors could be put on that host if they only use remote volumes and aren't using proper CPU/RAM limits.

I probably was not clear enough in my original description or made some bad assumptions. The concern I have is that if someone is currently relying on disk sizes for their instance limits then the above patches change behavior for them and affect capacity limits and planning. Is this okay and if not what do we do?


From a single operator perspective, we’d prefer an option which would allow boot from volume with a larger size than the flavour. The quota for volumes would avoid abuse.

The use cases we encounter are a standard set of flavors with defined core/memory/disk ratios which correspond to the underlying hardware (now SSD based, so we are a little disk space constrained). A user wants to define a Windows VM which needs much more disk space than the equivalent Linux flavour. We therefore suggest booting from a volume. Given that we are disk constrained and want to maximise the CPU/memory usage, not using all the local space is less of an issue than asking them to choose a much larger flavour, which would lead to cores/memory being unused.

Tim



All I know is that the current state is broken.  It's not just the scheduling problem, I could live with that probably since it's too hard to fix... but keep in mind that you're reporting completely wrong information for the Instance in these cases.  My flavor says it's 5G, but in reality it's 200 or whatever.  Rather than make it perfect we should just fix it.  Personally I thought the proposals for a scheduler check and the addition of the Instances/Node option were a win-win for everyone.  What am I missing?  Would you rather have a custom filter scheduler so it wasn't a config option?

There is another effort in progress to address the reporting issue. If you poke around Nova specs or conversations you'll hear it referred to as Resource Providers, though it's actually a series of specs with various names. There's certainly a conversation that can be had about waiting for that effort vs trying to address resource tracking in a backportable manner, but that's not what I wanted to get into here.



>
> BR,
> Konstantin
>
> > -----Original Message-----
> > From: Andrew Laski [mailto:andrew at lascii.com]
> > Sent: Thursday, August 25, 2016 10:20 PM
> > To: openstack-dev at lists.openstack.org
> > Cc: openstack-operators at lists.openstack.org
> > Subject: [Openstack-operators] [Nova] Reconciling flavors and block device
> > mappings
> >
> > Cross posting to gather some operator feedback.
> >
> > There have been a couple of contentious patches gathering attention recently
> > about how to handle the case where a block device mapping supersedes flavor
> > information. Before moving forward on either of those I think we should have a
> > discussion about how best to handle the general case, and how to handle any
> > changes in behavior that result from that.
> >
> > There are two cases presented:
> >
> > 1. A user boots an instance using a Cinder volume as a root disk, however the
> > flavor specifies root_gb = x where x > 0. The current behavior in Nova is that the
> > scheduler is given the flavor root_gb info to take into account during scheduling.
> > This may disqualify some hosts from receiving the instance even though that disk
> > space is not necessary because the root disk is a remote volume.
> > https://review.openstack.org/#/c/200870/
> >
> > 2. A user boots an instance and uses the block device mapping parameters to
> > specify a swap or ephemeral disk size that is less than specified on the flavor.
> > This leads to the same problem as above, the scheduler is provided information
> > that doesn't match the actual disk space to be consumed.
> > https://review.openstack.org/#/c/352522/
> >
> > Now the issue: while it's easy enough to provide proper information to the
> > scheduler on what the actual disk consumption will be when using block device
> > mappings, that undermines one of the purposes of flavors, which is to control
> > instance packing on hosts. So the outstanding question is to what extent should
> > users have the ability to use block device mappings to bypass flavor constraints?
> >
> > One other thing to note is that while a flavor constrains how much local disk is
> > used it does not constrain volume size at all. So a user can specify an
> > ephemeral/swap disk <= to what the flavor provides but can have an arbitrary
> > sized root disk if it's a remote volume.
> >
> > Some possibilities:
> >
> > 1. Completely allow block device mappings, when present, to determine instance
> > packing. This is what the patches above propose and there's a strong desire for
> > this behavior from some folks. But it changes how many instances may fit on a
> > host, which could be undesirable to some.
> >
> > 2. Keep the status quo. It's clear that this is undesirable based on the bug
> > reports and proposed patches above.
> >
> > 3. Allow block device mappings, when present, to mostly determine instance
> > packing. By that I mean that the scheduler only takes into account local disk that
> > would be consumed, but we add additional configuration to Nova which limits
> > the number of instances that can be placed on a host. This is a compromise
> > solution but I fear that a single int value does not meet the needs of deployers
> > wishing to limit instances on a host. They want it to take into account cpu
> > allocations and ram and disk, in short a flavor :)
> >
> > And of course there may be some other unconsidered solution. That's where
> > you, dear reader, come in.
> >
> > Thoughts?
> >
> > -Andrew
> >
> >
> > _______________________________________________
> > OpenStack-operators mailing list
> > OpenStack-operators at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev