<div dir="ltr"><div>Hi Spyros,<br><br></div><div>Thanks for starting this thread. My initial understanding was that the planned session would be more about<br>heat performance/scalability issues with magnum.<br><br></div><div>As most of the additional items you mentioned are about heat best practices, I think the specs/reviews<br>would be a great place to start the discussion, and we can also squeeze them into the same session.<br></div><div><br></div><div>Some comments inline.<br><br></div><div class="gmail_extra">On Mon, Oct 10, 2016 at 9:24 PM, Spyros Trigazis <span dir="ltr"><<a href="mailto:strigazi@gmail.com" target="_blank">strigazi@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Sergey,<div><br></div><div>I have seen the session; I wanted to add more details to</div><div>start the discussion earlier and to be better prepared.</div><div><br></div><div>Thanks,</div><div>Spyros</div><div><br></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On 10 October 2016 at 17:36, Sergey Kraynev <span dir="ltr"><<a href="mailto:skraynev@mirantis.com" target="_blank">skraynev@mirantis.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Spyros,<div><br></div><div>AFAIK we already have a special session slot related to your topic.</div><div>So thank you for providing all the items here.</div><div>Rabi, can we add a link to this mail to the etherpad? 
(it will save us time during the session :) )</div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div>On 10 October 2016 at 18:11, Spyros Trigazis <span dir="ltr"><<a href="mailto:strigazi@gmail.com" target="_blank">strigazi@gmail.com</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div dir="ltr">Hi heat and magnum.<br><br>Apart from the scalability issues that have been observed, I'd like to<br>add a few more subjects to discuss during the summit.<br><br>1. One nested stack per node and linear scaling of cluster creation<br>time.<br><br>1.1<div>For large stacks, the creation of all nested stacks scales linearly. We</div><div>haven't run any tests using the convergence engine.<br></div></div></div></div></blockquote></div></div></blockquote></div></div></div></div></blockquote><div><br></div><div>From what I understand, magnum uses ResourceGroups and Template Resources<br></div><div>(e.g. Cluster->RGs->master/nodes) to build the cluster.<br><br></div><div>As the nested stack operations happen over RPC, they should be distributed across all available engines.<br></div>So, the finding that the build time increases linearly is not good. It would probably be worth providing more<br>details of the heat configuration (e.g. number of engine workers) on your test setup. It would also be useful<br>to do some tests with convergence enabled, as that is the default since Newton.<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">Magnum seems to use a collection of software configs (scripts) as a multipart MIME with server<br>user_data. So the build time for 'every node' would depend on the time taken by these scripts<br></div><div class="gmail_quote">at boot. 
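<br><br>As a rough illustration of the point above (not magnum's or heat's actual code), the sketch below packs two placeholder shell scripts into one multipart MIME document of the kind cloud-init accepts as user_data. It assumes cloud-init handles the user_data on the node; the script names, their contents and the build_user_data helper are hypothetical. Every part is a script that has to run at boot, so the per-node build time grows with the scripts.
<pre>
```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical boot scripts; names and bodies are placeholders, not magnum's.
SCRIPTS = {
    "configure-docker.sh": "#!/bin/sh\necho configuring docker\n",
    "start-services.sh": "#!/bin/sh\necho starting services\n",
}


def build_user_data(scripts):
    """Pack shell scripts into one cloud-init style multipart MIME document."""
    bundle = MIMEMultipart()
    for name, body in scripts.items():
        # cloud-init treats text/x-shellscript parts as scripts to run at boot.
        part = MIMEText(body, "x-shellscript")
        part.add_header("Content-Disposition", "attachment", filename=name)
        bundle.attach(part)
    return bundle.as_string()


user_data = build_user_data(SCRIPTS)

# Parse the document back to list the parts a node would have to process.
parts = message_from_string(user_data).get_payload()
part_names = [p.get_filename() for p in parts]
print(part_names)
```
</pre>
Timing each such script on a single node would show how much of the per-node build time is spent at boot rather than in heat.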
<br></div><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div dir="ltr"><div>1.2</div><div>For large stacks, e.g. 1000 nodes, the final call to heat to fetch the<br>IPs for all nodes takes 3 to 4 minutes. In heat, the stack has status<br>CREATE_COMPLETE, but magnum's state is only updated once this long final<br>call is done. Can we do better? Maybe fetch only the master IPs or<br>get the IPs in chunks.<br></div></div></div></div></blockquote></div></div></blockquote></div></div></div></div></blockquote><div><br><br></div><div>We seem to load the nested stacks into memory to retrieve their outputs. That would probably explain the<br></div><div>behaviour above, where you load all the nested stacks for the nodes to fetch their IPs. There is some<br></div><div>work[1][2] happening atm to change that. 
<br></div><div><br>[1] <a href="https://review.openstack.org/#/c/383839/" target="_blank">https://review.openstack.org/#<wbr>/c/383839/</a><br>[2] <a href="https://review.openstack.org/#/c/384718" target="_blank">https://review.openstack.org/#<wbr>/c/384718</a><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div dir="ltr"><div>1.3</div><div>After the stack create API call to heat, magnum's conductor<br>busy-waits on heat with a thread per cluster. (In case of a magnum conductor<br>restart, we lose that thread and we can't update the status in<br>magnum). Investigate better ways to sync the status between magnum<br>and heat.<br></div></div></div></div></blockquote></div></div></blockquote></div></div></div></div></blockquote><div>Rather than waiting/polling, you could probably implement an observer that consumes events<br>from heat/event-sink and updates magnum accordingly? Maybe there are better options too.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div dir="ltr"><div>2. 
Next generation magnum clusters<br><br>A need that comes up frequently in magnum is heterogeneous clusters.<br>* We want to be able to create clusters on different hardware (e.g. spawn<br>  vms on nodes with SSDs and nodes without SSDs, or on other special<br>  hardware available only in some nodes of the cluster, such as FPGAs or GPUs)<br>* Spawn clusters across different AZs<br><br>I'll briefly describe our plan here; for further information we have a<br>detailed spec under review. [1]<br><br>To address this issue we introduce the node-group concept in magnum.<br>Each node-group will correspond to a different heat stack. The master<br>nodes can be organized in one or more stacks, and so can the worker nodes.<br><br>We are investigating how to implement this feature and are considering the<br>following:<br>At the moment, we have three template files, cluster, master and<br>node, and all three template files create one stack. The new<br>generation of clusters will have a cluster stack containing<br>the resources in the cluster template, specifically networks, lbaas,<br>floating-ips etc. Then, the output of this stack would be passed as<br>input to create the master node stack(s) and the worker node<br>stack(s).<br></div></div></div></div></blockquote></div></div></blockquote></div></div></div></div></blockquote><div><br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div dir="ltr"><div>3. Use of heat-agent<br><br>A missing feature in magnum is lifecycle operations. 
For<br>restart of services and COE upgrades (upgrade docker, kubernetes and<br>mesos) we are considering using the heat-agent. Another option is to create a<br>magnum agent or daemon like trove.<br><br>3.1<div>For restart, a few systemctl restart or service restart commands will<br>be issued. [2]<br><br></div><div>3.2<br>For upgrades there are the following scenarios:<br>1. Upgrade a service which runs in a container. In this case, a small<br>   script that runs in each node is sufficient. No vm reboot required.<br>2. For an ubuntu based image or similar that requires a package upgrade,<br>   a similar small script is sufficient too. No vm reboot required.<br>3. For our fedora atomic images, we need to perform a rebase on the<br>   rpm-ostree file system, which requires a reboot.<br>4. Finally, a thought under investigation is replacing the nodes one<br>   by one using a different image, e.g. upgrade from fedora 24 to 25<br>   with new versions of packages all in a new qcow2 image. How could<br>   we update the stack for this?<br><br>Options 1. and 2. can be done by upgrading all worker nodes at once or<br>one by one. Options 3. and 4. should be done one by one.<br><br>I'm drafting a spec about upgrades; it should be ready by Wednesday.<br><br>Cheers,<br>Spyros<br><br>[1] <a href="https://review.openstack.org/#/c/352734/" target="_blank">https://review.openstack.org/#<wbr>/c/352734/</a><br>[2] <a href="https://review.openstack.org/#/c/368981/" target="_blank">https://review.openstack.org/#<wbr>/c/368981/</a><br></div></div></div>

<br></div></div>______________________________<wbr>______________________________<wbr>______________<br>

OpenStack Development Mailing List (not for usage questions)<br>

Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.op<wbr>enstack.org?subject:unsubscrib<wbr>e</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi<wbr>-bin/mailman/listinfo/openstac<wbr>k-dev</a><br>

<br></blockquote></div><span><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div><div dir="ltr">Regards,<div>Sergey.</div></div></div>

</font></span></div>


<br></blockquote></div><br></div>

</div></div>

<br></blockquote></div><br><br clear="all"><br>-- <br><div><div dir="ltr"><div>Regards,</div>Rabi Misra<div><br></div></div></div>

</div></div>