<div dir="ltr"><div>Thanks Dmitry for the feedback, please see my response inline.</div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 25, 2017 at 8:35 PM, Dmitry Tantsur <span dir="ltr"><<a href="mailto:dtantsur@redhat.com" target="_blank">dtantsur@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">Hi!<br>

<br>

Thanks for raising this. I have been interested in the project for some time, but I never got a chance to wrap my head around it. I also have a few concerns - please see inline.<span><br>

<br>

On 09/25/2017 01:27 PM, Zhenguo Niu wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

Hi folks,<br>

<br>

First of all, thanks to everyone who attended the Mogan project update in the TC room during the Denver PTG. We would like to gather more suggestions before we apply for inclusion.<br>

<br>

Speaking only for myself, I find the current direction of one API+scheduler for vm/baremetal/container unfortunate. After container management moved out into a separate project, Zun, bare metal with Nova and Ironic continues to be a pain point.<br>

<br>

#. API<br>

Only part of the Nova APIs and parameters apply to bare metal instances. Meanwhile, to stay interoperable with the virtual drivers, bare metal specific APIs such as deploy-time RAID and advanced partitioning cannot be included. It's true that Nova can support various compute drivers, but in reality the level of support is not equal across hypervisors, especially for bare metal in a virtualization world. I understand why this is the case: Nova was designed to provide compute resources (virtual machines) rather than bare metal.<br>

</blockquote>

<br></span>

A correction: any compute resources.<br>

<br>

Nova works okay with bare metals. It's never going to work perfectly though, because we always have to find a common subset of features between VM and BM. RAID is a good example indeed. We have a solution for the future, but it's not going to satisfy everyone.<br>

<br>

Now I have a question: to what extent do you plan to maintain the "cloud" nature of the API? Let's take RAID as an example. Ironic can apply a very generic or a very specific configuration. You can request "just RAID-5" or you can ask for specific disks to be combined in a specific way. I believe the latter is not something we want to expose to cloud users, as it's not going to be a cloud any more.<span><br>

<br></span></blockquote><div><br></div><div>In fact, we don't have a clear spec for RAID support yet, but the team is leaning towards a generic configuration, for the same reasons you raised. However, if we can track disk information in Mogan or Placement (as a nested resource provider under the node), users could also specify disks with hints like "SSD 500GB"; Mogan could then match the disks and pass a specific configuration down to Ironic. In any case, we should discuss this fully with the Ironic team once a spec is proposed.</div><div><br></div><div>Besides RAID configuration, we have already added partition support when claiming a server with partition images. It is currently limited to root, ephemeral and swap, because advanced partitioning such as LVM is not ready on the Ironic side. We are interested in working with the Ironic team to get that done this cycle.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><span>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

<br>

#. Scheduler<br>

Bare metal doesn't fit into the model of 1:1 nova-compute to resource, as nova-compute processes can't be run on the inventory nodes themselves. That is to say, host aggregates, availability zones and similar features based on the compute service (host) can't be applied to bare metal resources. For grouping such as anti-affinity, the granularity also differs from virtual machines: bare metal users may want their HA instances placed in different failure domains rather than merely on different nodes. In short, we can only get rigid, resource-class-only scheduling for bare metal.<br>
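<br>
To make the failure-domain point concrete, here is a minimal Python sketch of anti-affinity keyed on a failure domain rather than on the node. The function name, the (node, domain) tuple layout and the rack labels are hypothetical and chosen for illustration only; this is not Mogan or Nova scheduler code.<br>
<pre>
```python
def pick_anti_affine_node(candidates, group_domains):
    """Pick the first candidate whose failure domain is unused by the group.

    candidates: list of (node_name, failure_domain) tuples (hypothetical layout).
    group_domains: set of failure domains already used by the server group.
    Returns the node name, or None when every candidate shares a used domain.
    """
    for name, domain in candidates:
        if domain not in group_domains:
            return name
    return None


# node-1 and node-2 are different nodes but share one failure domain (rack-1).
candidates = [("node-1", "rack-1"), ("node-2", "rack-1"), ("node-3", "rack-2")]
used = {"rack-1"}
print(pick_anti_affine_node(candidates, used))  # node-3
```
</pre>
Node-level anti-affinity would accept node-2 here, even though it sits in the same rack as the existing instance.<br>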

</blockquote>

<br></span>

It's not rigid. Okay, it's rigid, but it's not as rigid as what we used to have.<br>

<br>

If you're going back to VCPUs-memory-disk triad, you're making it more rigid. Of these three, only memory has ever made practical sense for deployers. VCPUs is a bit subtle, as it depends on hyper-threading enabled/disabled, and I've never seen people using it too often.<br>

<br>

But our local_gb thing is an outright lie. Of 20 disks a machine can easily have, which one do you report for local_gb? Well, in the best case people used ironic root device hints with ironic-inspector to figure it out. Which is great, but requires ironic-inspector. In the worst case people just put a random number there to make scheduling work. This is horrible, please make sure not to go back to it.<br>

<br></blockquote><div><br></div><div>I don't mean to go back to the original VCPUs-memory-disk scheduling here. Currently we just follow the same "rigid" resource class scheduling that Nova does, but with support for node aggregates and affinity/anti-affinity grouping.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

What I would love to see from a bare metal scheduling project is scheduling based on inventory. I was thinking of being able to express things like "give me a node with 2 GPUs of at least 256 CUDA cores each". Do you plan on this kind of thing? This would truly mean flexible scheduling.<br>
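<br>
As an illustration of what such an inventory query could look like, here is a minimal Python sketch. The Gpu and Node classes, the matching_nodes helper and the sample inventory are all hypothetical, and "2 GPU" is read as "at least two"; none of this reflects an existing Mogan, Ironic or Placement API.<br>
<pre>
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gpu:
    cuda_cores: int


@dataclass
class Node:
    name: str
    gpus: List[Gpu] = field(default_factory=list)


def matching_nodes(nodes, gpu_count, min_cuda_cores):
    """Return nodes with at least gpu_count GPUs of min_cuda_cores or more."""
    selected = []
    for node in nodes:
        # Only GPUs that individually meet the per-device requirement count.
        big_enough = [g for g in node.gpus if g.cuda_cores >= min_cuda_cores]
        if len(big_enough) >= gpu_count:
            selected.append(node)
    return selected


# Hypothetical inventory, as it might be collected by hardware inspection.
inventory = [
    Node("node-a", [Gpu(128), Gpu(512)]),
    Node("node-b", [Gpu(256), Gpu(384)]),
    Node("node-c", []),
]
print([n.name for n in matching_nodes(inventory, gpu_count=2, min_cuda_cores=256)])
```
</pre>
The request is expressed against per-device properties rather than a single aggregate number, which a flat resource class cannot capture on its own.<br>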

<br>

Which brings me to one of my biggest reservations about Mogan: I don't think copying Nova's architecture is a good idea overall. In particular, I think you have flavors, which do not map at all onto the bare metal world IMO.<span><br>

<br></span></blockquote><div><br></div><div>Yes, totally agree. Mogan is a relatively new project, and we are open to all suggestions from the community, especially from the Ironic team, as you know bare metal better. What truly flexible scheduling means, and whether we need flavors at all in the bare metal world, are things we can work out together; that is why Mogan was created and why we would like to apply for inclusion.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><span>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

<br>

<br>

Most of the cloud providers in the market offer virtual machines and bare metal as separate resources, but unfortunately it's hard to achieve this with one compute service.<br>

</blockquote>

<br></span>

Do you have proof for the first statement? And do you mean public clouds? Our customers deploy hybrid environments, to the best of my knowledge. Nobody I know uses one compute service in the whole cloud anyway.<span><br>

<br></span></blockquote><div><br></div><div>Yes, public clouds, please check the links below.</div><div><br></div><div><a href="http://www.hwclouds.com/en-us/product/bms.html">http://www.hwclouds.com/en-us/product/bms.html</a><br><a href="https://www.ibm.com/cloud-computing/bluemix/bare-metal-servers">https://www.ibm.com/cloud-computing/bluemix/bare-metal-servers</a><br><a href="https://cloud.tencent.com/product/cpm">https://cloud.tencent.com/product/cpm</a><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><span>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

I heard people are deploying separate Nova installations for virtual machines and bare metal, with many downstream hacks to the bare metal single-driver Nova. But as the changes to Nova would be massive and possibly invasive to virtual machines, upstreaming them does not seem practical.<br>

</blockquote>

<br></span>

I think you're overestimating the problem. In TripleO we deploy separate virtual nova compute nodes. If ironic is enabled, its nova computes go to controllers. Then you can use host aggregates to split flavors between VM and BM. With resource classes it's even more trivial: you get this split naturally.<span><br>

<br></span></blockquote><div><br></div><div>I also mean the public cloud scenario, where we offer bare metal as a first class resource instead of a generic compute resource. Yes, it's true that you can use host aggregates with flavors, or resource classes, to split VMs and BMs naturally. But it's impossible to manage quota separately, and even worse, we don't have a filter to list BMs and VMs separately, as they are just the same kind of resource.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><span>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

<br>

So we created Mogan [1] about one year ago. It aims to offer bare metal as a first class resource to users, with a set of bare metal specific APIs and a baremetal-centric scheduler (with the Placement service). It started as an experimental project, but the outcome makes us believe it's the right way. Mogan will fully embrace Ironic for bare metal provisioning, and with RSD server [2] introduced to OpenStack, it will be a new world for bare metal, as with that we can compose hardware resources on the fly.<br>

</blockquote>

<br></span>

Good that you touched this topic, because I have a question here :)<br>

<br>

With ironic you *request* a node. With RSD and similar you *create* a node, which is closer to VMs than to traditional BMs. This gives a similar problem to what we have with nova now. Namely, exact vs non-exact filters. How do you solve it? Assuming you plan on using flavors (which I think is a bad idea), do you use exact or non-exact filters? How do you handle the difference between approaches?<br>

<br></blockquote><div><br></div><div>Mogan will talk to the RSD Pod Manager to compose hardware instead of doing the scheduling itself, then enroll the node/ports in Ironic and provision with the redfish driver. So the exact vs non-exact filter problem mentioned above does not arise, as we don't do scheduling for such servers at all. A detailed spec will be proposed soon. </div><div><br></div><div><a href="https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/pod-manager-api-specification.pdf">https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/pod-manager-api-specification.pdf</a></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><span>

<br>

Also, I would like to clarify the overlap between Mogan and Nova. I bet there are users who want to use one API for compute resource management, as they don't care whether it's a virtual machine or a bare metal server. The bare metal driver with Nova is still the right choice for such users to get raw-performance compute resources. By contrast, Mogan is for real bare metal users and cloud providers who want to offer bare metal as a separate resource.<br>

<br>

Thank you for your time!<br>

<br>

<br>

[1] <a href="https://wiki.openstack.org/wiki/Mogan" target="_blank" rel="noreferrer">https://wiki.openstack.org/wik<wbr>i/Mogan</a><br>

[2] <a href="https://www.intel.com/content/www/us/en/architecture-and-technology/rack-scale-design-overview.html" target="_blank" rel="noreferrer">https://www.intel.com/content/<wbr>www/us/en/architecture-and-tec<wbr>hnology/rack-scale-design-over<wbr>view.html</a><br>

<br>

-- <br>

Best Regards,<br>

Zhenguo Niu<br>

<br>

<br></span>

______________________________<wbr>______________________________<wbr>______________<br>

OpenStack Development Mailing List (not for usage questions)<br>

Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" target="_blank" rel="noreferrer">OpenStack-dev-request@lists.op<wbr>enstack.org?subject:unsubscrib<wbr>e</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank" rel="noreferrer">http://lists.openstack.org/cgi<wbr>-bin/mailman/listinfo/openstac<wbr>k-dev</a><br>

<br>

</blockquote>


</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr"><div>Best Regards,<br></div>Zhenguo Niu<br></div></div>

</div></div>