[openstack-dev] [tc][nova][ironic][mogan] Evaluate Mogan project
Zhenguo Niu
niu.zglinux at gmail.com
Tue Sep 26 05:15:52 UTC 2017
Thanks, Dmitry, for the feedback; please see my responses inline.
On Mon, Sep 25, 2017 at 8:35 PM, Dmitry Tantsur <dtantsur at redhat.com> wrote:
> Hi!
>
> Thanks for raising this. I was interested in the project for some time,
> but I never got a chance to wrap my head around it. I also have a few
> concerns - please see inline.
>
> On 09/25/2017 01:27 PM, Zhenguo Niu wrote:
>
>> Hi folks,
>>
>> First of all, thanks to the audience for the Mogan project update in the
>> TC room during the Denver PTG. We would like to get more suggestions
>> before we apply for inclusion.
>>
>> Speaking only for myself, I find the current direction of one
>> API+scheduler for VM/baremetal/container unfortunate. After container
>> management moved out into a separate project, Zun, bare metal with Nova
>> and Ironic continues to be a pain point.
>>
>> #. API
>> Only part of the Nova APIs and parameters apply to bare metal instances,
>> and to stay interoperable with the virtual drivers, bare-metal-specific
>> APIs such as deploy-time RAID and advanced partitioning cannot be
>> included. It's true that Nova can support various compute drivers, but
>> the reality is that support for each hypervisor is not equal, especially
>> for bare metal in a virtualization-centric world. I understand the
>> reasons for that, though, as Nova was designed to provide compute
>> resources (virtual machines) rather than bare metal.
>>
>
> A correction: any compute resources.
>
> Nova works okay with bare metals. It's never going to work perfectly
> though, because we always have to find a common subset of features between
> VM and BM. RAID is a good example indeed. We have a solution for the
> future, but it's not going to satisfy everyone.
>
> Now I have a question: to what extent do you plan to maintain the "cloud"
> nature of the API? Let's take RAID as an example. Ironic can apply a very
> generic or a very specific configuration. You can request "just RAID-5" or
> you can ask for specific disks to be combined in a specific way. I believe
> the latter is not something we want to expose to cloud users, as it would
> not be a cloud any more.
>
>
In fact, we don't have a clear spec for RAID support yet, but the team
leans toward a generic configuration, for exactly the reasons you raise.
That said, if we can track disk information in Mogan or Placement (as a
nested resource provider under the node), it's also possible for users to
specify disks with hints like "SSD 500GB"; Mogan can then match a disk and
pass a specific configuration down to Ironic. In any case, we should fully
discuss this with the Ironic team once a spec is proposed.
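To make that concrete, here is a minimal sketch (the hint format and the
helper below are hypothetical, since no spec exists yet; only the
target_raid_config structure follows Ironic's RAID interface):

    # Hypothetical: translate a matched disk hint into an Ironic
    # target_raid_config that Mogan would pass down before deployment.
    def hint_to_target_raid_config(disk_type, size_gb, raid_level="1"):
        return {
            "logical_disks": [{
                "size_gb": size_gb,        # e.g. 500
                "raid_level": raid_level,  # e.g. "1", "5", "10"
                "disk_type": disk_type,    # "ssd" or "hdd"
                "is_root_volume": True,
            }]
        }

    # A hint of "SSD 500GB" would become
    # hint_to_target_raid_config("ssd", 500).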
Besides RAID configuration, we have already added partition support when
claiming a server with a partition image. It is limited to root, ephemeral
and swap for now, as advanced partitioning such as LVM is not ready on the
Ironic side. We are interested in working with the Ironic team to get that
done this cycle.
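For reference, a rough sketch of the kind of partition specification we
mean when claiming a server with a partition image (field names here are
illustrative, not the exact Mogan API schema):

    # Illustrative only: partition spec on server creation. Today only
    # root/ephemeral/swap style layouts are possible; LVM and other
    # advanced layouts need Ironic-side support first.
    server_request = {
        "name": "bm-server-1",
        "image_uuid": "<partition image uuid>",
        "partitions": {
            "root_gb": 40,
            "ephemeral_gb": 100,
            "swap_mb": 4096,
        },
    }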
>
>> #. Scheduler
>> Bare metal doesn't fit into the model of 1:1 nova-compute to resource,
>> as nova-compute processes can't run on the inventory nodes themselves.
>> That is to say, host aggregates, availability zones and other concepts
>> based on the compute service (host) can't be applied to bare metal
>> resources. For grouping such as anti-affinity, the granularity is also
>> not the same as for virtual machines: bare metal users may want their HA
>> instances spread across failure domains rather than merely across nodes.
>> In short, we can only get a rigid, resource-class-only scheduling model
>> for bare metal.
>>
>
> It's not rigid. Okay, it's rigid, but it's not as rigid as what we used to
> have.
>
> If you're going back to the VCPUs-memory-disk triad, you're making it more
> rigid. Of these three, only memory has ever made practical sense for
> deployers. VCPUs is a bit subtle, as it depends on hyper-threading being
> enabled or disabled, and I've never seen people use it much.
>
> But our local_gb thing is an outright lie. Of the 20 disks a machine can
> easily have, which one do you report for local_gb? Well, in the best case
> people used ironic root device hints with ironic-inspector to figure it
> out. Which is great, but requires ironic-inspector. In the worst case
> people just put a random number there to make scheduling work. This is
> horrible; please make sure not to go back to it.
>
>
I don't mean to go back to the original VCPUs-memory-disk scheduling here.
Currently we just follow the "rigid" resource class scheduling, as Nova
does, but with node aggregates and affinity/anti-affinity grouping support.
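Concretely, that resource class scheduling boils down to a Placement query
like the following (a sketch using python-requests; the resource class name
is just an example and auth setup is omitted):

    # Sketch: ask Placement for allocation candidates that can provide one
    # unit of a bare metal resource class, which is how resource-class-only
    # scheduling works today.
    import requests

    PLACEMENT = "http://placement.example.com/placement"
    resp = requests.get(
        PLACEMENT + "/allocation_candidates",
        params={"resources": "CUSTOM_BAREMETAL_GOLD:1"},
        headers={
            "X-Auth-Token": "<token>",
            # allocation_candidates requires Placement microversion >= 1.10
            "OpenStack-API-Version": "placement 1.10",
        },
    )
    candidates = resp.json()["allocation_requests"]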
> What I would love to see from a bare metal scheduling project is
> scheduling based on inventory. I was thinking of being able to express
> things like "give me a node with 2 GPUs of at least 256 CUDA cores each".
> Do you plan on this kind of thing? This would truly mean flexible
> scheduling.
>
> Which brings me to one of my biggest reservations about Mogan: I don't
> think copying Nova's architecture is a good idea overall. In particular, I
> think you have flavors, which do not map at all onto the bare metal world
> IMO.
>
>
Yes, totally agree. Mogan is a relatively new project, and we are open to
all suggestions from the community, especially from the Ironic team, as you
know bare metal better. What truly flexible scheduling means, and whether
we need flavors to map onto the bare metal world, are things we can work
out together; that's why Mogan was created and why we would like to apply
for inclusion.
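For the "2 GPUs with at least 256 CUDA cores" style of request, one
direction worth exploring is nested resource providers plus traits in
Placement, something like the sketch below (the resource class and trait
names are made up, and it assumes Placement grows trait filtering for
allocation candidates):

    # Hypothetical inventory-based request: each GPU is a nested resource
    # provider under the node, decorated with a trait describing its
    # CUDA core count.
    params = {
        "resources": "CUSTOM_GPU:2",
        "required": "CUSTOM_CUDA_CORES_GE_256",
    }
    # i.e. GET /allocation_candidates?resources=CUSTOM_GPU:2
    #          &required=CUSTOM_CUDA_CORES_GE_256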
>
>>
>> And most of the cloud providers in the market offer virtual machines
>> and bare metal as separate resources, but unfortunately, it's hard to
>> achieve this with one compute service.
>>
>
> Do you have proof for the first statement? And do you mean public clouds?
> Our customers deploy hybrid environments, to the best of my knowledge.
> Nobody I know uses one compute service in the whole cloud anyway.
>
>
Yes, public clouds; please see the links below.
http://www.hwclouds.com/en-us/product/bms.html
https://www.ibm.com/cloud-computing/bluemix/bare-metal-servers
https://cloud.tencent.com/product/cpm
>> I heard people are deploying separate Nova instances for virtual
>> machines and bare metal, with many downstream hacks to the bare metal
>> single-driver Nova, but as the changes to Nova would be massive and
>> possibly invasive to virtual machines, it seems impractical to get them
>> upstream.
>>
>
> I think you're overestimating the problem. In TripleO we deploy separate
> virtual nova compute nodes. If ironic is enabled, its nova computes go to
> the controllers. Then you can use host aggregates to split flavors between
> VM and BM. With resource classes it's even more trivial: you get this
> split naturally.
>
>
I also mean the public cloud scenario, where we offer bare metal as a
first-class resource instead of a generic compute resource. Yes, it's true
that you can use host aggregates with flavors, or resource classes, to get
the VM and BM split naturally. But it's impossible to manage quotas
separately, and even worse, we don't have a filter to list BMs and VMs
separately, as they are just the same kind of resource.
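For context, the split Dmitry describes is typically done with flavor extra
specs like the following (names are examples); it works, but both kinds of
servers still sit behind the same API, quota and server listing:

    # Sketch: a bare metal flavor requests one unit of a custom resource
    # class and zeroes out the standard VCPU/RAM/disk resources, so it can
    # only land on Ironic nodes exposing that class.
    bm_flavor_extra_specs = {
        "resources:CUSTOM_BAREMETAL_GOLD": "1",
        "resources:VCPU": "0",
        "resources:MEMORY_MB": "0",
        "resources:DISK_GB": "0",
    }
    # A VM flavor carries no such overrides, so it keeps scheduling on
    # VCPU/MEMORY_MB/DISK_GB, yet quotas and server listing still treat
    # both as the same kind of instance.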
>
>> So we created Mogan [1] about a year ago, which aims to offer bare metal
>> as a first-class resource to users, with a set of bare-metal-specific
>> APIs and a baremetal-centric scheduler (built on the Placement service).
>> It started as something of an experimental project, but the outcome
>> makes us believe it's the right way. Mogan will fully embrace Ironic for
>> bare metal provisioning, and with RSD servers [2] introduced to
>> OpenStack, it will be a new world for bare metal, as we will be able to
>> compose hardware resources on the fly.
>>
>
> Good that you touched this topic, because I have a question here :)
>
> With ironic you *request* a node. With RSD and similar you *create* a
> node, which is closer to VMs than to traditional BMs. This gives a similar
> problem to what we have with nova now. Namely, exact vs non-exact filters.
> How do you solve it? Assuming you plan on using flavors (which I think is
> a bad idea), do you use exact or non-exact filters? How do you handle the
> difference between approaches?
>
>
Mogan will talk to the RSD Pod Manager to compose hardware instead of
doing the scheduling itself, then enroll the node/ports in Ironic and do
the provisioning with the redfish driver. So the exact vs. non-exact filter
problem mentioned above doesn't apply, as we don't do scheduling for such
servers at all. A detailed spec will be proposed soon.
https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/pod-manager-api-specification.pdf
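As a rough sketch of the flow (the Allocate action and payload below follow
my reading of the PODM API spec linked above, so treat the exact endpoint
and fields as assumptions rather than tested code):

    # Assumed flow: compose a node via the RSD Pod Manager, then enroll it
    # in Ironic with the redfish driver. Auth and error handling omitted.
    import requests

    PODM = "https://podm.example.com:8443"
    allocate = requests.post(
        PODM + "/redfish/v1/Nodes/Actions/Allocate",
        json={
            "Processors": [{"TotalCores": 8}],
            "Memory": [{"CapacityMiB": 65536}],
            "LocalDrives": [{"CapacityGiB": 500, "Type": "SSD"}],
        },
    )
    composed_node = allocate.headers["Location"]
    # Next (not shown): assemble the composed node, then enroll it as an
    # Ironic node with the redfish driver and create the matching ports.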
>
>> Also, I would like to clarify the overlap between Mogan and Nova. There
>> must be some users who want one API for compute resource management, as
>> they don't care whether it's a virtual machine or a bare metal server.
>> The baremetal driver with Nova is still the right choice for such users
>> to get raw-performance compute resources. In contrast, Mogan is for real
>> bare metal users and for cloud providers who want to offer bare metal as
>> a separate resource.
>>
>> Thank you for your time!
>>
>>
>> [1] https://wiki.openstack.org/wiki/Mogan
>> [2] https://www.intel.com/content/www/us/en/architecture-and-technology/rack-scale-design-overview.html
>>
>> --
>> Best Regards,
>> Zhenguo Niu
>>
--
Best Regards,
Zhenguo Niu