Open Stack

Wed Aug 31 21:02:25 UTC 2016

remove

-----Original Message-----
From: openstack-operators-request at lists.openstack.org [mailto:openstack-operators-request at lists.openstack.org] 
Sent: Wednesday, August 31, 2016 5:00 AM
To: openstack-operators at lists.openstack.org
Subject: OpenStack-operators Digest, Vol 70, Issue 36

Send OpenStack-operators mailing list submissions to
	openstack-operators at lists.openstack.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

or, via email, send a message with subject or body 'help' to
	openstack-operators-request at lists.openstack.org

You can reach the person managing the list at
	openstack-operators-owner at lists.openstack.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of OpenStack-operators digest..."

Today's Topics:

   1. Re: NYC Ops Meetup - Ubuntu packaging session summary
      (Corey Bryant)
   2. [scientific][scientific-wg] Reminder: Scientific WG meeting
      Wednesday 0900 UTC (Stig Telfer)
   3. Re: Update on Nova scheduler poor performance with Ironic
      (David Medberry)
   4. [UX] Horizon Searchlight Usability Study - Call for
      Participants (Danielle Mundle)
   5. Re: Update on Nova scheduler poor performance with Ironic
      (Matt Riedemann)
   6. Re: Update on Nova scheduler poor performance with Ironic
      (Joshua Harlow)
   7. python and nice utf ? ? :) (Saverio Proto)

----------------------------------------------------------------------

Message: 1
Date: Tue, 30 Aug 2016 08:50:55 -0400
From: Corey Bryant <corey.bryant at canonical.com>
To: Saverio Proto <zioproto at gmail.com>
Cc: OpenStack Operators <openstack-operators at lists.openstack.org>
Subject: Re: [Openstack-operators] NYC Ops Meetup - Ubuntu packaging
	session	summary
Message-ID:
	<CADn0iZ05STTEVc2cSLPnnUz1HaxDv=AKenwBGbL7fwV+ecz2WA at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Tue, Aug 30, 2016 at 4:07 AM, Saverio Proto <zioproto at gmail.com> wrote:

> > Most of the topics that were covered in the Ubuntu packaging session
> > are summarized in our wiki, which I've updated based on our discussions:
> > https://wiki.ubuntu.com/OpenStack
>
> Hello Corey,
>
>
Hello Saverio,

> thanks for updating the wiki so quickly.
>

Np, thanks for the input.

> I am trying to rebuild stable/liberty cinder with
> https://review.openstack.org/#/c/306610/ applied.
>
> I am building for Ubuntu Trusty at the moment.
>
> First of all, this line makes no sense to people not familiar with
> Launchpad:
>
> git clone lp:~ubuntu-server-dev/ubuntu/+source/nova
>
> I would write:
>
> git clone https://git.launchpad.net/~ubuntu-server-dev/ubuntu/+source/nova

I added this.  Note there's a section at the top of that page called 'Git Configuration' linking to lp configuration.

>
>
> I had to look up my shell history from NYC to understand what to
> replace 'lp' with :) Alternatively, we could link to
> https://help.launchpad.net/Code/Git,
> which explains how to set up the gitconfig.
>
> Also, there is a part that says: "if you have added an appropriate changelog
> comment then: debcommit".
> We could improve it by mentioning that the changelog can be edited with the
> command 'dch -i'.
>

Good point, I've made that update.

>
> I tried to submit a merge request for the cinder package:
> https://code.launchpad.net/~zioproto/ubuntu/+source/cinder/+git/cinder/+merge/304341
>
> I have already spotted an error in my patch: a malformed email address
> was added to the debian changelog, probably by debcommit.
>
> What is the review workflow? Should I amend the commit or just add
> commits on top of this branch?
>
>
First, thanks for contributing!

You can just fix that in a follow-on commit. For the most part, though, I want to see a clean git history. When I merge I use --ff-only, so your git history will get merged into the main branch intact.

Thanks!
>
> Saverio
>

--
Regards,
Corey
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-operators/attachments/20160830/7c4080c6/attachment-0001.html>

------------------------------

Message: 2
Date: Tue, 30 Aug 2016 17:20:27 +0100
From: Stig Telfer <stig.openstack at telfer.org>
To: user-committee at lists.openstack.org, "openstack-oper."
	<openstack-operators at lists.openstack.org>
Subject: [Openstack-operators] [scientific][scientific-wg] Reminder:
	Scientific WG meeting Wednesday 0900 UTC
Message-ID: <B6F13498-5C71-4E37-855D-89530F89E089 at telfer.org>
Content-Type: text/plain; charset=utf-8

Hi all - 

We have a Scientific WG IRC meeting on Wednesday at 0900 UTC on channel #openstack-meeting.

The agenda is available here[1] and full IRC meeting details are here[2].

This week we'll be gathering top picks from the conference schedule for a selection themed on scientific computing. We'll also review progress so far on the OpenStack/HPC white papers and seek volunteer experts!

Best wishes,
Stig

[1] https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_August_31st_2016
[2] http://eavesdrop.openstack.org/#Scientific_Working_Group

------------------------------

Message: 3
Date: Tue, 30 Aug 2016 10:49:32 -0600
From: David Medberry <openstack at medberry.net>
To: Mathieu Gagné <mgagne at calavera.ca>
Cc: "openstack-operators at lists.openstack.org"
	<openstack-operators at lists.openstack.org>
Subject: Re: [Openstack-operators] Update on Nova scheduler poor
	performance with Ironic
Message-ID:
	<CAJhvMSueOsD19WFykhPks9HyZDZxuETGK82RaHMTxzRGJu74Yg at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Great writeup @Mathieu and thanks @sean and @jrolls!

-d

On Mon, Aug 29, 2016 at 3:34 PM, Mathieu Gagné <mgagne at calavera.ca> wrote:

> Hi,
>
> For those that attended the OpenStack Ops meetup, you probably heard 
> me complaining about a serious performance issue we had with Nova 
> scheduler (Kilo) with Ironic.
>
> Thanks to Sean Dague and Matt Riedemann, we found the root cause.
>
> It was caused by this block of code [1] which is hitting the database 
> for each node loaded by the scheduler. This block of code is called if 
> no instance info is found in the scheduler cache.
>
> I found that this instance info is only populated if the
> scheduler_tracks_instance_changes config option [2] is enabled, which
> it is by default. But being a good operator (wink wink), I followed the
> Ironic install guide, which recommends disabling it [3], unknowingly
> getting myself into deep trouble.
>
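A minimal Python sketch of the failure mode described above, under loud assumptions: `FakeDB`, `load_host_states` and the plain-dict cache are hypothetical stand-ins, not Nova's host manager code. It only illustrates how an unpopulated instance-info cache turns one scheduling pass into one database query per node.

```python
# Hypothetical sketch, not Nova's code: shows why an empty instance-info
# cache costs one database round trip per node on every load.


class FakeDB:
    """Stand-in for the Nova database that counts the queries it serves."""

    def __init__(self, instances_by_node):
        self.instances_by_node = instances_by_node
        self.queries = 0

    def instances_on(self, node):
        self.queries += 1  # one call models one round trip to the database
        return self.instances_by_node.get(node, [])


def load_host_states(nodes, db, instance_info_cache):
    """Build a node -> instances map, querying the DB only on cache misses."""
    states = {}
    for node in nodes:
        if node in instance_info_cache:
            states[node] = instance_info_cache[node]
        else:
            # Cache miss: the per-node query that dominates load time.
            states[node] = db.instances_on(node)
    return states


nodes = [f"node-{i}" for i in range(600)]

cold_db = FakeDB({})
load_host_states(nodes, cold_db, {})  # cache never populated
warm_db = FakeDB({})
load_host_states(nodes, warm_db, {n: [] for n in nodes})  # cache populated

print(cold_db.queries, warm_db.queries)  # 600 0
```

With the cache left empty, every node pays a query; once the cache is populated, the same pass issues none.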
> There isn't much information about the purpose of this config in the 
> kilo branch. Fortunately, you can find more info in the master branch 
> [4], thanks to the config documentation effort. This instance info 
> cache is used by filters which rely on instance location to perform 
> affinity/anti-affinity placement or anything that cares about the 
> instances running on the destination node.
>
> Enabling this option makes the Nova scheduler load instance
> info asynchronously at startup. Depending on the number of
> hypervisors and instances, this can take several minutes (in our
> case, 10-15 minutes with 600+ Ironic nodes, or ~1s per node).
>
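The quoted figures fit a simple serial model: total time is roughly the node count times the per-node cost. The sketch below only does that arithmetic; `serial_load_minutes` is a hypothetical helper, and the 900-node case is an illustrative assumption, since the post only says 600+.

```python
def serial_load_minutes(node_count, seconds_per_node=1.0):
    """Estimated wall time, in minutes, of loading nodes one after another."""
    return node_count * seconds_per_node / 60


print(serial_load_minutes(600))  # 10.0
print(serial_load_minutes(900))  # 15.0
```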
> Jim Rollenhagen then jumped into the discussion on IRC and found a
> bug [5] he had opened and fixed in Liberty. The fix makes the Nova
> scheduler never populate the instance info cache if the Ironic host
> manager is loaded. For those running Nova with Ironic, you will agree
> that there is no known use case for affinity/anti-affinity (please
> reply if you know of one).
>
> To summarize, the poor Nova scheduler performance will only show up
> if you are running the Kilo version of Nova and have disabled
> scheduler_tracks_instance_changes, which might be the case if you are
> running Ironic too.
>
> For those curious about our Nova scheduler + Ironic setup, we have 
> done the following to get nova scheduler to ludicrous speed:
>
> 1) Use CachingScheduler
>
> There was a great talk at the OpenStack Summit about why you would 
> want to use it. [6]
>
> By default, the Nova scheduler loads ALL nodes (hypervisors) from the
> database into memory before each scheduling request. If you have A LOT
> of hypervisors, this process can take a while, and scheduling won't
> happen until this step is completed. It could also mean that
> scheduling always fails if you have a lot of hypervisors and don't
> tweak service_down_time (see 3 below).
>
> This driver loads nodes (hypervisors) into memory every ~60 seconds.
> Since the information is pre-cached, scheduling can happen right
> away, and it is super fast.
>
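A minimal sketch of the caching idea, assuming a single process and an injectable clock so it can be exercised without waiting. `CachingHostList` and its methods are hypothetical names, not Nova's CachingScheduler API; `refresh_period` merely plays the role that scheduler_driver_task_period plays in the post.

```python
import time


class CachingHostList:
    """Hypothetical sketch: hosts are reloaded by a periodic task, and
    scheduling requests only ever read the cached copy."""

    def __init__(self, load_hosts, refresh_period=60.0, clock=time.monotonic):
        self._load_hosts = load_hosts  # the expensive call (e.g. a DB scan)
        self._refresh_period = refresh_period  # seconds between reloads
        self._clock = clock
        self._hosts = []
        self._loaded_at = None

    def run_periodic_task(self):
        """Reload hosts if the cache is empty or older than refresh_period."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._refresh_period:
            self._hosts = self._load_hosts()
            self._loaded_at = now

    def hosts_for_request(self):
        """Scheduling path: no database access, just the cached list."""
        return list(self._hosts)


load_calls = []


def load_from_db():
    load_calls.append(1)  # count how often the expensive load runs
    return ["node-1", "node-2"]


fake_now = [0.0]
cache = CachingHostList(load_from_db, clock=lambda: fake_now[0])
cache.run_periodic_task()      # initial load at t=0
for _ in range(100):
    cache.hosts_for_request()  # 100 requests, no reloads
fake_now[0] = 30.0
cache.run_periodic_task()      # too early, skipped
fake_now[0] = 60.0
cache.run_periodic_task()      # due, reloads
print(len(load_calls))  # 2
```

Anything that changes between two reloads, such as a freed node, stays invisible to `hosts_for_request` until the next reload, which is the ~1m delay described below.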
> There are a lot of side effects to using it, though. For example:
> - You can only run ONE nova-scheduler process, since cache state isn't
> shared between processes and you don't want instances to be
> scheduled twice to the same node/hypervisor.
> - It can take ~1m before new capacity (new or freed nodes) is recognized
> by the scheduler. The cache is refreshed every 60 seconds by a
> periodic task (this can be changed with scheduler_driver_task_period).
>
> In the context of Ironic, it is a compromise we are willing to accept.
> We don't add Ironic nodes that often, and nodes aren't
> created/deleted as often as virtual machines.
>
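Why only one scheduler process is safe with this driver can be shown with a toy model, assuming each process keeps a private copy of the same cached free-node list and picks the first free entry. `pick_node` and the node names are hypothetical.

```python
def pick_node(free_nodes):
    """Naive placement: claim the first node this process believes is free."""
    node = free_nodes[0]
    free_nodes.remove(node)  # only this process's private cache is updated
    return node


snapshot = ["ironic-node-7", "ironic-node-8"]  # what both periodic tasks loaded
cache_a = list(snapshot)  # scheduler process A
cache_b = list(snapshot)  # scheduler process B

choice_a = pick_node(cache_a)
choice_b = pick_node(cache_b)
print(choice_a, choice_b)  # ironic-node-7 ironic-node-7
```

Because neither process sees the other's claim, both requests land on the same bare metal node.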
> 2) Run a single nova-compute service
>
> I strongly suggest you DO NOT run multiple nova-compute services. If 
> you do, you will have duplicated hypervisors loaded by the scheduler 
> and you could end up with conflicting scheduling. You will also have 
> twice as many hypervisors to load in the scheduler.
>
> Note: I have heard about multiple compute host support in Nova for
> Ironic using a hash ring, but I don't have many details about it. So
> this recommendation might not apply to you if you are using a recent
> version of Nova.
>
> 3) Increase service_down_time
>
> If you have a lot of nodes, you might have to increase this value,
> which is set to 60 seconds by default. This value is used by the
> ComputeFilter filter to exclude nodes it hasn't heard from. If it
> takes more than 60 seconds to load the list of nodes, you can guess
> what will happen: the scheduler will reject all of them, since node
> info is already outdated when it finally reaches the filtering step. I
> strongly suggest you tweak this setting, whether or not you use
> CachingScheduler.
>
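The effect of service_down_time can be sketched as a freshness check, assuming the filter compares a node's last heartbeat (as read when loading started) with the time the filter runs. `is_up` is a hypothetical helper, not ComputeFilter's implementation, and the 90-second load time is illustrative.

```python
from datetime import datetime, timedelta, timezone


def is_up(last_heartbeat, now, service_down_time=60):
    """Treat a node as alive only if its last report is at most
    service_down_time seconds old."""
    return (now - last_heartbeat).total_seconds() <= service_down_time


heartbeat = datetime(2016, 8, 29, 12, 0, 0, tzinfo=timezone.utc)
# Loading every node took 90 s, so filtering starts 90 s after the
# heartbeat values were read.
filter_time = heartbeat + timedelta(seconds=90)

print(is_up(heartbeat, filter_time))                         # False
print(is_up(heartbeat, filter_time, service_down_time=180))  # True
```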
> 4) Tweak scheduler to only load empty nodes/hypervisors
>
> This is a hack [7] we made before finding out about the bug [5]
> described earlier. While investigating our performance issue, we
> enabled debug logging and saw that the periodic task was taking
> forever to complete (10-15m) with the CachingScheduler driver.
>
> We knew (strongly suspected) that the Nova scheduler was spending a
> huge amount of time loading nodes/hypervisors. We (unfortunately)
> didn't push our investigation further and jumped right away to the
> optimization phase.
>
> So we came up with the idea of only loading empty nodes/hypervisors.
> Remember, we are still in the context of Ironic, not cloud and virtual 
> machines. So it made perfect sense for us to stop spending time 
> loading nodes/hypervisors we would discard anyway.
>
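The idea behind that hack can be sketched in a few lines. This is not the code in the linked gist [7]; `candidate_nodes`, `load_selected` and the instance counts are hypothetical, and the sketch assumes that, with bare metal, a node hosting any instance is fully used and can be skipped before the expensive per-node load.

```python
def candidate_nodes(instance_counts):
    """Keep only nodes that currently host no instances."""
    return [node for node, count in instance_counts.items() if count == 0]


def load_selected(nodes, load_one):
    """Run the expensive per-node load only for the given nodes."""
    return {node: load_one(node) for node in nodes}


loaded = []


def expensive_load(node):
    loaded.append(node)  # record which nodes actually paid the load cost
    return {"name": node}


counts = {"node-1": 1, "node-2": 0, "node-3": 1, "node-4": 0}
states = load_selected(candidate_nodes(counts), expensive_load)
print(loaded)  # ['node-2', 'node-4']
```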
> Thanks to everyone who helped us debug our scheduling performance
> issues; it is now crazy fast. =)
>
> [1] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L589-L592
> [2] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L65-L68
> [3] http://docs.openstack.org/developer/ironic/deploy/install-guide.html#configure-compute-to-use-the-bare-metal-service
> [4] https://github.com/openstack/nova/blob/282c257aff6b53a1b6bb4b4b034a670c450d19d8/nova/conf/scheduler.py#L166-L185
> [5] https://bugs.launchpad.net/nova/+bug/1479124
> [6] https://www.youtube.com/watch?v=BcHyiOdme2s
> [7] https://gist.github.com/mgagne/1fbeca4c0b60af73f019bc2e21eb4a80
>
> --
> Mathieu
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>

------------------------------

Message: 4
Date: Tue, 30 Aug 2016 15:41:23 -0500
From: Danielle Mundle <danielle.m.mundle at gmail.com>
To: openstack-operators at lists.openstack.org
Subject: [Openstack-operators] [UX] Horizon Searchlight Usability
	Study -	Call for Participants
Message-ID:
	<CADoMQ1UB8CwjfO5f+6C9TZTxYVFvuXUi6V6uap4gyORJcYUSug at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello operators!

Our next UX project is a usability evaluation of a proposed search function in Horizon. We're looking to talk to operators who have some familiarity with Horizon and use it occasionally in their role.

The sessions would last ~45 minutes and be conducted remotely online through WebEx between September 12th and 21st. I will be moderating the one-on-one sessions, and up to 3-4 observers/notetakers from the community may be present on the call.

If you'd like to help with this initiative, please indicate your availability on this Doodle poll: http://doodle.com/poll/g6pv2iucktuemgyy and include your full name and email address. I will then follow up with a meeting invitation for your scheduled time. Feel free to contact me with any questions.

Thanks for supporting UX research in the OpenStack community!

--Danielle
IRC: uxdanielle

------------------------------

Message: 5
Date: Tue, 30 Aug 2016 20:57:07 -0500
From: Matt Riedemann <mriedem at linux.vnet.ibm.com>
To: openstack-operators at lists.openstack.org
Subject: Re: [Openstack-operators] Update on Nova scheduler poor
	performance with Ironic
Message-ID: <5f59bd09-1cf6-c8a4-9971-072b7ee06dbe at linux.vnet.ibm.com>
Content-Type: text/plain; charset=windows-1252; format=flowed

On 8/29/2016 4:34 PM, Mathieu Gagné wrote:
> Hi,
>
> For those that attended the OpenStack Ops meetup, you probably heard 
> me complaining about a serious performance issue we had with Nova 
> scheduler (Kilo) with Ironic.
>
> Thanks to Sean Dague and Matt Riedemann, we found the root cause.
>
> It was caused by this block of code [1] which is hitting the database 
> for each node loaded by the scheduler. This block of code is called if 
> no instance info is found in the scheduler cache.
>
> I found that this instance info is only populated if the
> scheduler_tracks_instance_changes config option [2] is enabled, which
> it is by default. But being a good operator (wink wink), I followed the
> Ironic install guide, which recommends disabling it [3], unknowingly
> getting myself into deep trouble.
>
> There isn't much information about the purpose of this config in the 
> kilo branch. Fortunately, you can find more info in the master branch 
> [4], thanks to the config documentation effort. This instance info 
> cache is used by filters which rely on instance location to perform 
> affinity/anti-affinity placement or anything that cares about the 
> instances running on the destination node.
>
> Enabling this option makes the Nova scheduler load instance
> info asynchronously at startup. Depending on the number of
> hypervisors and instances, this can take several minutes (in our
> case, 10-15 minutes with 600+ Ironic nodes, or ~1s per node).
>
> Jim Rollenhagen then jumped into the discussion on IRC and found a
> bug [5] he had opened and fixed in Liberty. The fix makes the Nova
> scheduler never populate the instance info cache if the Ironic host
> manager is loaded. For those running Nova with Ironic, you will agree
> that there is no known use case for affinity/anti-affinity (please
> reply if you know of one).
>
> To summarize, the poor Nova scheduler performance will only show up
> if you are running the Kilo version of Nova and have disabled
> scheduler_tracks_instance_changes, which might be the case if you are
> running Ironic too.
>
> For those curious about our Nova scheduler + Ironic setup, we have 
> done the following to get nova scheduler to ludicrous speed:

But have you gone plaid? :)

>
> 1) Use CachingScheduler
>
> There was a great talk at the OpenStack Summit about why you would 
> want to use it. [6]
>
> By default, the Nova scheduler loads ALL nodes (hypervisors) from the
> database into memory before each scheduling request. If you have A LOT
> of hypervisors, this process can take a while, and scheduling won't
> happen until this step is completed. It could also mean that
> scheduling always fails if you have a lot of hypervisors and don't
> tweak service_down_time (see 3 below).
>
> This driver loads nodes (hypervisors) into memory every ~60 seconds.
> Since the information is pre-cached, scheduling can happen right
> away, and it is super fast.
>
> There are a lot of side effects to using it, though. For example:
> - You can only run ONE nova-scheduler process, since cache state isn't
> shared between processes and you don't want instances to be
> scheduled twice to the same node/hypervisor.
> - It can take ~1m before new capacity (new or freed nodes) is recognized
> by the scheduler. The cache is refreshed every 60 seconds by a
> periodic task (this can be changed with scheduler_driver_task_period).
>
> In the context of Ironic, it is a compromise we are willing to accept.
> We don't add Ironic nodes that often, and nodes aren't
> created/deleted as often as virtual machines.
>
> 2) Run a single nova-compute service
>
> I strongly suggest you DO NOT run multiple nova-compute services. If 
> you do, you will have duplicated hypervisors loaded by the scheduler 
> and you could end up with conflicting scheduling. You will also have 
> twice as many hypervisors to load in the scheduler.
>
> Note: I have heard about multiple compute host support in Nova for
> Ironic using a hash ring, but I don't have many details about it. So
> this recommendation might not apply to you if you are using a recent
> version of Nova.

The spec for the hash ring stuff that landed fairly recently in Newton is here:

http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/ironic-multiple-compute-hosts.html

It's very early and there isn't CI for it yet, but jroll was saying he was going to be experimenting with it at Rackspace (or maybe it was with OSIC...).
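For readers unfamiliar with the approach, here is a generic consistent-hashing sketch, assuming each Ironic node should be managed by exactly one nova-compute service. It is an illustration only, not the Nova or Ironic implementation described in the spec; `HashRing`, the replica count and the host names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring mapping node IDs to compute hosts."""

    def __init__(self, hosts, replicas=32):
        # Each host is placed at several points on the ring to even out load.
        points = sorted(
            (self._hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [key for key, _ in points]
        self._hosts = [host for _, host in points]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def host_for(self, node_id):
        """Return the first host clockwise from the node's hash."""
        idx = bisect.bisect(self._keys, self._hash(node_id)) % len(self._keys)
        return self._hosts[idx]


nodes = [f"ironic-node-{i}" for i in range(200)]
ring = HashRing(["compute-1", "compute-2", "compute-3"])
before = {n: ring.host_for(n) for n in nodes}

bigger = HashRing(["compute-1", "compute-2", "compute-3", "compute-4"])
moved = [n for n in nodes if bigger.host_for(n) != before[n]]
# Only nodes taken over by the new host change owner.
print(all(bigger.host_for(n) == "compute-4" for n in moved))  # True
```

Adding a fourth host moves only the nodes the new host takes over, which is what makes this family of techniques attractive for spreading Ironic nodes across several compute services.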

>
> 3) Increase service_down_time
>
> If you have a lot of nodes, you might have to increase this value,
> which is set to 60 seconds by default. This value is used by the
> ComputeFilter filter to exclude nodes it hasn't heard from. If it
> takes more than 60 seconds to load the list of nodes, you can guess
> what will happen: the scheduler will reject all of them, since node
> info is already outdated when it finally reaches the filtering step. I
> strongly suggest you tweak this setting, whether or not you use
> CachingScheduler.
>
> 4) Tweak scheduler to only load empty nodes/hypervisors
>
> This is a hack [7] we made before finding out about the bug [5]
> described earlier. While investigating our performance issue, we
> enabled debug logging and saw that the periodic task was taking
> forever to complete (10-15m) with the CachingScheduler driver.
>
> We knew (strongly suspected) that the Nova scheduler was spending a
> huge amount of time loading nodes/hypervisors. We (unfortunately)
> didn't push our investigation further and jumped right away to the
> optimization phase.
>
> So we came up with the idea of only loading empty nodes/hypervisors.
> Remember, we are still in the context of Ironic, not cloud and virtual
> machines. So it made perfect sense for us to stop spending time
> loading nodes/hypervisors we would discard anyway.
>
> Thanks to everyone who helped us debug our scheduling performance
> issues; it is now crazy fast. =)
>
> [1] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L589-L592
> [2] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L65-L68
> [3] http://docs.openstack.org/developer/ironic/deploy/install-guide.html#configure-compute-to-use-the-bare-metal-service
> [4] https://github.com/openstack/nova/blob/282c257aff6b53a1b6bb4b4b034a670c450d19d8/nova/conf/scheduler.py#L166-L185
> [5] https://bugs.launchpad.net/nova/+bug/1479124
> [6] https://www.youtube.com/watch?v=BcHyiOdme2s
> [7] https://gist.github.com/mgagne/1fbeca4c0b60af73f019bc2e21eb4a80
>
> --
> Mathieu
>

Thanks for the write up, it's always nice to see a follow up from events 
back to the mailing list for people that didn't attend.

-- 

Thanks,

Matt Riedemann

------------------------------

Message: 6
Date: Tue, 30 Aug 2016 22:33:27 -0700
From: Joshua Harlow <harlowja at fastmail.com>
To: Mathieu Gagné <mgagne at calavera.ca>
Cc: openstack-operators at lists.openstack.org
Subject: Re: [Openstack-operators] Update on Nova scheduler poor
	performance with Ironic
Message-ID: <57C66C27.3080100 at fastmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Mathieu Gagné wrote:
> Hi,
>
> For those that attended the OpenStack Ops meetup, you probably heard
> me complaining about a serious performance issue we had with Nova
> scheduler (Kilo) with Ironic.

BTW, thanks for helping push this and complaining about it and ...

It's a tough and thankless job but it's needed IMHO :)

Without further ado,

>
> Thanks to Sean Dague and Matt Riedemann, we found the root cause.
>
> It was caused by this block of code [1] which is hitting the database
> for each node loaded by the scheduler. This block of code is called if
> no instance info is found in the scheduler cache.
>
> I found that this instance info is only populated if the
> scheduler_tracks_instance_changes config option [2] is enabled, which
> it is by default. But being a good operator (wink wink), I followed the
> Ironic install guide, which recommends disabling it [3], unknowingly
> getting myself into deep trouble.
>
> There isn't much information about the purpose of this config in the
> kilo branch. Fortunately, you can find more info in the master branch
> [4], thanks to the config documentation effort. This instance info
> cache is used by filters which rely on instance location to perform
> affinity/anti-affinity placement or anything that cares about the
> instances running on the destination node.
>
> Enabling this option makes the Nova scheduler load instance
> info asynchronously at startup. Depending on the number of
> hypervisors and instances, this can take several minutes (in our
> case, 10-15 minutes with 600+ Ironic nodes, or ~1s per node).

This feels like a classic thing that could just be made better by a 
scatter/gather (in threads or other?) to the database or other service. 
1s per node seems ummm, sorta bad and/or non-optimal (I wonder if this 
is low-hanging fruit). I can travel around the world 7.5
times in that amount of time (if I was a light beam, haha).
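A minimal scatter/gather sketch of that suggestion, using Python threads via `concurrent.futures`. `fetch_instances` is a hypothetical per-node lookup where `time.sleep` stands in for one database round trip; a real deployment would also need the database to tolerate the extra concurrency, and Nova's own concurrency model (eventlet) is not modeled here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_instances(node):
    """Hypothetical per-node lookup; the sleep stands in for one DB round trip."""
    time.sleep(0.02)
    return node, [f"{node}-instance"]


def gather_serial(nodes):
    """One lookup after another: total time grows linearly with node count."""
    return dict(fetch_instances(node) for node in nodes)


def gather_parallel(nodes, workers=10):
    """Scatter lookups across a thread pool, then gather results into a dict."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch_instances, nodes))


nodes = [f"node-{i}" for i in range(20)]

start = time.perf_counter()
serial = gather_serial(nodes)
serial_time = time.perf_counter() - start

start = time.perf_counter()
parallel = gather_parallel(nodes)
parallel_time = time.perf_counter() - start

print(serial == parallel, round(serial_time, 2), round(parallel_time, 2))
```

With 20 nodes and 10 workers, the parallel version should finish in roughly a tenth of the serial time, though exact numbers depend on the machine.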

>
> Jim Rollenhagen then jumped into the discussion on IRC and found a
> bug [5] he had opened and fixed in Liberty. The fix makes the Nova
> scheduler never populate the instance info cache if the Ironic host
> manager is loaded. For those running Nova with Ironic, you will agree
> that there is no known use case for affinity/anti-affinity (please
> reply if you know of one).
>
> To summarize, the poor Nova scheduler performance will only show up
> if you are running the Kilo version of Nova and have disabled
> scheduler_tracks_instance_changes, which might be the case if you are
> running Ironic too.
>
> For those curious about our Nova scheduler + Ironic setup, we have
> done the following to get nova scheduler to ludicrous speed:
>
> 1) Use CachingScheduler
>
> There was a great talk at the OpenStack Summit about why you would
> want to use it. [6]
>
> By default, the Nova scheduler loads ALL nodes (hypervisors) from the
> database into memory before each scheduling request. If you have A LOT
> of hypervisors, this process can take a while, and scheduling won't
> happen until this step is completed. It could also mean that
> scheduling always fails if you have a lot of hypervisors and don't
> tweak service_down_time (see 3 below).
>
> This driver loads nodes (hypervisors) into memory every ~60 seconds.
> Since the information is pre-cached, scheduling can happen right
> away, and it is super fast.
>
> There are a lot of side effects to using it, though. For example:
> - You can only run ONE nova-scheduler process, since cache state isn't
> shared between processes and you don't want instances to be
> scheduled twice to the same node/hypervisor.

Out of curiosity, do you have only one scheduler process active and 
passive scheduler process(es) idle waiting to become active if the other 
scheduler dies? (pretty simply done via something like 
https://kazoo.readthedocs.io/en/latest/api/recipe/election.html) Or do
you have some manual/other process that kicks off a new scheduler if the 
'main' one dies?
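The active/passive pattern can be sketched in-process, assuming a plain `threading.Lock` as a stand-in for a distributed election; across hosts, a recipe like the kazoo election linked above would play that role. `LocalElection` and `act_as_scheduler` are hypothetical names. Each contender blocks until it becomes leader, and when the leader stops (here, it simply returns), the next one takes over.

```python
import threading
import time


class LocalElection:
    """In-process stand-in for a distributed leader election."""

    def __init__(self):
        self._lock = threading.Lock()

    def run(self, func, *args):
        """Block until elected, run func as leader, then step down."""
        with self._lock:
            return func(*args)


election = LocalElection()
active = []   # schedulers currently acting as leader
history = []  # (name, number of active leaders) recorded at each takeover


def act_as_scheduler(name):
    active.append(name)
    history.append((name, len(active)))
    time.sleep(0.01)  # pretend to schedule for a while, then "die"
    active.remove(name)


threads = [
    threading.Thread(target=election.run, args=(act_as_scheduler, f"scheduler-{i}"))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(history)
```

The order of takeovers varies between runs, but at every takeover exactly one scheduler is active.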

> - It can take ~1m before new capacity (new or freed nodes) is recognized
> by the scheduler. The cache is refreshed every 60 seconds by a
> periodic task (this can be changed with scheduler_driver_task_period).
>
> In the context of Ironic, it is a compromise we are willing to accept.
> We don't add Ironic nodes that often, and nodes aren't
> created/deleted as often as virtual machines.
>
> 2) Run a single nova-compute service
>
> I strongly suggest you DO NOT run multiple nova-compute services. If
> you do, you will have duplicated hypervisors loaded by the scheduler
> and you could end up with conflicting scheduling. You will also have
> twice as many hypervisors to load in the scheduler.

This seems scary (whenever I hear 'run a single one of anything' in a *cloud* 
platform, it makes me shiver). It'd be nice if we at least recommended 
people run 
https://kazoo.readthedocs.io/en/latest/api/recipe/election.html or have 
some active/passive automatic election process to handle that single 
thing dying (which they usually do, at odd times of the night). Honestly 
I'd (personally) really like to get to the bottom of how we as a group 
of developers ever got to the place where software was released (and/or 
even recommended to be used) in a *cloud* platform that ever required 
only one of anything to be run (that's crazy bonkers, and yes there is 
history here, but damn, it just feels rotten as all hell, for lack of 
better words).

>
> Note: I have heard about multiple compute host support in Nova for
> Ironic using a hash ring, but I don't have many details about it. So
> this recommendation might not apply to you if you are using a recent
> version of Nova.
>
> 3) Increase service_down_time
>
> If you have a lot of nodes, you might have to increase this value,
> which is set to 60 seconds by default. This value is used by the
> ComputeFilter filter to exclude nodes it hasn't heard from. If it
> takes more than 60 seconds to load the list of nodes, you can guess
> what will happen: the scheduler will reject all of them, since node
> info is already outdated when it finally reaches the filtering step. I
> strongly suggest you tweak this setting, whether or not you use
> CachingScheduler.

The same feeling I had above also applies: something feels broken if 
such things have to be found by operators (I'm pretty sure Yahoo saw 
something similar when I was there) and not by the developers making the 
software. If I could (and I know I really can't, due to the community we 
work in), I'd very much like to hold the equivalent of a retrospective on 
how these kinds of solutions got built and how they ended up getting 
released to the wider public with such flaws...

>
> 4) Tweak scheduler to only load empty nodes/hypervisors
>
> So this is a hack [7] we did before finding out about the bug [5] we
> described and identified earlier. When investigating our performance
> issue, we enabled debug logging and saw that the periodic task was
> taking forever to complete (10-15 minutes) with the CachingScheduler driver.
>
> We knew (strongly suspected) the Nova scheduler was spending a huge
> amount of time loading nodes/hypervisors. We (unfortunately) didn't
> push our investigation further and jumped right away to the
> optimization phase.
>
> So we came up with the idea of only loading empty nodes/hypervisors.
> Remember, we are still in the context of Ironic, not cloud and virtual
> machines. So it made perfect sense for us to stop spending time
> loading nodes/hypervisors we would discard anyway.
>
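The gist behind point 4 is not reproduced here, but the idea can be sketched: an Ironic node that already hosts an instance can never accept another one, so loading its full host state only to discard it later is wasted work. The snippet below is a hypothetical illustration of such a pre-filter over plain dictionaries; the `running_vms` and `hypervisor_hostname` field names are assumptions modeled on hypervisor statistics, and this is not the code from the gist.

```python
from typing import Dict, Iterable, List


def empty_nodes(compute_nodes: Iterable[Dict]) -> List[Dict]:
    """Return only nodes with no instances on them (hypothetical pre-filter).

    With bare metal, one node hosts at most one instance, so any node with
    running_vms > 0 would be rejected by the filters anyway; skipping it
    early avoids the cost of loading its full host state.
    """
    return [node for node in compute_nodes if node["running_vms"] == 0]


if __name__ == "__main__":
    nodes = [
        {"hypervisor_hostname": "node-1", "running_vms": 1},
        {"hypervisor_hostname": "node-2", "running_vms": 0},
        {"hypervisor_hostname": "node-3", "running_vms": 0},
    ]
    print([n["hypervisor_hostname"] for n in empty_nodes(nodes)])  # ['node-2', 'node-3']
```

This shortcut only makes sense for bare metal; with virtual machines a partially used hypervisor is still a valid candidate, which is why it is a hack rather than a general fix.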
> Thanks to all who helped us debug our scheduling performance
> issues; it is now crazy fast. =)
>
> [1] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L589-L592
> [2] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L65-L68
> [3] http://docs.openstack.org/developer/ironic/deploy/install-guide.html#configure-compute-to-use-the-bare-metal-service
> [4] https://github.com/openstack/nova/blob/282c257aff6b53a1b6bb4b4b034a670c450d19d8/nova/conf/scheduler.py#L166-L185
> [5] https://bugs.launchpad.net/nova/+bug/1479124
> [6] https://www.youtube.com/watch?v=BcHyiOdme2s
> [7] https://gist.github.com/mgagne/1fbeca4c0b60af73f019bc2e21eb4a80
>
> --
> Mathieu

------------------------------

Message: 7
Date: Wed, 31 Aug 2016 13:56:10 +0200
From: Saverio Proto <zioproto at gmail.com>
To: OpenStack Operators <openstack-operators at lists.openstack.org>
Subject: [Openstack-operators] python and nice utf ? ? :)
Message-ID:
	<CAPmmg8sAh3_vrX03fg7=eMcgnK35rwfNUOEd=BBXmjBb7fSAAw at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hello ops,

This patch fixed my problem:

https://review.openstack.org/#/c/361308/

but it is an ugly hack according to:

http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script

Does anyone know how to make it better?

Saverio
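The usual alternative to calling sys.setdefaultencoding('utf-8') is to leave the interpreter's default alone and convert explicitly at the boundaries: decode bytes to text as soon as they come in, and encode text to bytes only when writing out. A minimal Python 3 sketch of that pattern follows; the helper names `to_text` and `to_bytes` are hypothetical (oslo.utils offers comparable helpers in `oslo_utils.encodeutils`), and it does not reproduce what the linked review changes.

```python
from typing import Union


def to_text(value: Union[bytes, str], encoding: str = "utf-8",
            errors: str = "strict") -> str:
    """Decode bytes at the input boundary; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value


def to_bytes(value: Union[bytes, str], encoding: str = "utf-8",
             errors: str = "strict") -> bytes:
    """Encode text at the output boundary; pass bytes through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding, errors)
    return value


if __name__ == "__main__":
    raw = "Z\u00fcrich".encode("utf-8")  # bytes as they might arrive from a socket or file
    name = to_text(raw)                  # work with str internally
    assert to_bytes(name) == raw         # encode explicitly when writing back out
    print(len(raw), len(name))           # 7 6
```

Because the conversion happens at named call sites instead of through a process-wide default, a bad byte sequence fails (or is replaced, with errors="replace") exactly where it enters, rather than somewhere deep inside unrelated code.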

------------------------------

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

End of OpenStack-operators Digest, Vol 70, Issue 36
***************************************************
