From sorrison at gmail.com Tue Jan 2 07:30:13 2018 From: sorrison at gmail.com (Sam Morrison) Date: Tue, 2 Jan 2018 18:30:13 +1100 Subject: [Openstack-operators] Mixed service version CI testing In-Reply-To: <1514481900-sup-4670@fewbar.com> References: <1514481900-sup-4670@fewbar.com> Message-ID: <173E60AF-BB42-428F-9695-0B9C10A89FDF@gmail.com> We usually upgrade nova last so would be helpful. Nectar has been running a mix of versions for a couple of years now and we treat each project as it’s own thing and upgrade everything separately. You can see what versions we run currently at https://trello.com/b/9fkuT1eU/nectar-openstack-versions Sam > On 29 Dec 2017, at 4:28 am, Clint Byrum wrote: > > Excerpts from Matt Riedemann's message of 2017-12-19 09:58:34 -0600: >> During discussion in the TC channel today [1], we got talking about how >> there is a perception that you must upgrade all of the services together >> for anything to work, at least the 'core' services like >> keystone/nova/cinder/neutron/glance - although maybe that's really just >> nova/cinder/neutron? >> >> Anyway, I posit that the services are not as tightly coupled as some >> people assume they are, at least not since kilo era when microversions >> started happening in nova. >> >> However, with the way we do CI testing, and release everything together, >> the perception is there that all things must go together to work. >> >> In our current upgrade job, we upgrade everything to N except the >> nova-compute service, that remains at N-1 to test rolling upgrades of >> your computes and to make sure guests are unaffected by the upgrade of >> the control plane. >> >> I asked if it would be valuable to our users (mostly ops for this >> right?) if we had an upgrade job where everything *except* nova were >> upgraded. If that's how the majority of people are doing upgrades anyway >> it seems we should make sure that works. >> >> I figure leaving nova at N-1 makes more sense because nova depends on >> the other services (keystone/glance/cinder/neutron) and is likely the >> harder / slower upgrade if you're going to do rolling upgrades of your >> compute nodes. >> >> This type of job would not run on nova changes on the master branch, >> since those changes would not be exercised in this type of environment. >> So we'd run this on master branch changes to >> keystone/cinder/glance/neutron/trove/designate/etc. >> >> Does that make sense? Would this be valuable at all? Or should the >> opposite be tested where we upgrade nova to N and leave all of the >> dependent services at N-1? >> > > It makes sense completely. What would really be awesome would be to test > the matrix of single upgrades: > > upgrade only keystone > upgrade only glance > upgrade only neutron > upgrade only cinder > upgrade only nova > > That would have a good chance at catching any co-dependencies that crop > up. > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From lebre.adrien at free.fr Tue Jan 2 23:22:37 2018 From: lebre.adrien at free.fr (lebre.adrien at free.fr) Date: Wed, 3 Jan 2018 00:22:37 +0100 (CET) Subject: [Openstack-operators] [FEMDC] Wed. 3 Jan - IRC Meeting Cancelled - Next meeting Wed.17. 
In-Reply-To: <690841999.263326520.1514935166633.JavaMail.root@zimbra29-e5> Message-ID: <837886476.263329540.1514935357427.JavaMail.root@zimbra29-e5> Dear all, Due to the Christmas/new year period, the meeting is cancelled. Next meeting is scheduled on Wed, the 17th. ad_ri3n_ From tobias at citynetwork.se Wed Jan 3 11:27:13 2018 From: tobias at citynetwork.se (Tobias Rydberg) Date: Wed, 3 Jan 2018 12:27:13 +0100 Subject: [Openstack-operators] [publiccloud-wg] Reminder for todays meeting Message-ID: <0502bfd2-2840-ac24-5c40-10ba5c076d99@citynetwork.se> Hi all, Time again for a meeting for the Public Cloud WG - today at 1400 UTC in #openstack-meeting-3 Agenda and etherpad at: https://etherpad.openstack.org/p/publiccloud-wg See you later! Tobias Rydberg -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3945 bytes Desc: S/MIME Cryptographic Signature URL: From mrhillsman at gmail.com Wed Jan 3 13:52:43 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Wed, 3 Jan 2018 07:52:43 -0600 Subject: [Openstack-operators] Ohayo! Q1 2018 Message-ID: https://etherpad.openstack.org/p/TYO-ops-meetup-2018 ​ Hey everyone, What do you think about the new logo! Just a friendly reminder that the Ops Meetup for Spring 2018 is approaching March 7-8, 2018 in Tokyo and we are looking for additional topics. Spring 2018 will have NFV+General on day one and Enterprise+General on day two. Add additional topics to the etherpad or +/- 1 those already proposed. Additionally if you are attending and would like to moderate a session, add your name to the moderator list near the bottom of the etherpad. -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: opsmeetuplogo.png Type: image/png Size: 38873 bytes Desc: not available URL: From tobias at citynetwork.se Fri Jan 5 14:01:41 2018 From: tobias at citynetwork.se (Tobias Rydberg) Date: Fri, 5 Jan 2018 15:01:41 +0100 Subject: [Openstack-operators] [publiccloud-wg] Missing features work session Message-ID: Hi everyone, During our last meeting we decided to get together at IRC for a work session dedicated to get the "Missing features list" up to date, and take the fist steps converting items into a more official list at launchpad - where we have a project [1]. Would be awesome to see as many of you as possible joining this. Where: #openstack-publiccloud When: Wednesday 10th January 1400 UTC Agenda: https://etherpad.openstack.org/p/publiccloud-wg This first effort of its kind is as you can see at the same time as bi-weekly meetings. Please send feedback of that, I'm happy to setup another session just like this - at a time that suites you better! Hope to see you there! Regards, Tobias Rydberg Chair Public Cloud WG [1] https://launchpad.net/openstack-publiccloud-wg -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 3945 bytes Desc: S/MIME Cryptographic Signature URL: From gael.therond at gmail.com Fri Jan 5 14:06:57 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Fri, 05 Jan 2018 14:06:57 +0000 Subject: [Openstack-operators] [publiccloud-wg] Missing features work session In-Reply-To: References: Message-ID: I'm thrilled to see improvement within this field of concerns and the way Openstack mature by listening from users, would them be architects, operators, end-users etc. I'm glade to see such initiative and for sure will be there! See you on Wednesday! Le ven. 5 janv. 2018 à 15:02, Tobias Rydberg a écrit : > Hi everyone, > > During our last meeting we decided to get together at IRC for a work > session dedicated to get the "Missing features list" up to date, and > take the fist steps converting items into a more official list at > launchpad - where we have a project [1]. Would be awesome to see as many > of you as possible joining this. > > Where: #openstack-publiccloud > When: Wednesday 10th January 1400 UTC > Agenda: https://etherpad.openstack.org/p/publiccloud-wg > > This first effort of its kind is as you can see at the same time as > bi-weekly meetings. Please send feedback of that, I'm happy to setup > another session just like this - at a time that suites you better! > > Hope to see you there! > > Regards, > Tobias Rydberg > Chair Public Cloud WG > > > [1] https://launchpad.net/openstack-publiccloud-wg > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lbragstad at gmail.com Fri Jan 5 16:53:52 2018 From: lbragstad at gmail.com (Lance Bragstad) Date: Fri, 5 Jan 2018 10:53:52 -0600 Subject: [Openstack-operators] [policy] [keystone] Analyzing other access-control systems Message-ID: <65925814-981d-142b-9d74-1dd0032c1aaf@gmail.com> Hey all, This note is a continuation of a thread we started last year on analyzing other policy systems [0]. Now that we're back from the holidays and having policy meetings on Wednesdays [1], it'd be good to pick up the conversation again. We had a few good sessions a couple months ago going through AWS IAM policy bits and contrasting it to RBAC in OpenStack. Before we wrapped up those sessions we thought about doing the same thing with GKE or a more technical deep dive of the IAM stuff. Do we want to pick this back up in the next few weeks? We can use this thread to generate discussion about what we'd like to see and jot down ideas. It might be nice timing to get a session or two scheduled before the PTG, where we can have face-to-face discussions. Thoughts? [0] http://lists.openstack.org/pipermail/openstack-dev/2017-October/123069.html [1] http://eavesdrop.openstack.org/#Keystone_Policy_Meeting -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: 

From mriedemos at gmail.com Fri Jan 5 23:09:15 2018
From: mriedemos at gmail.com (Matt Riedemann)
Date: Fri, 5 Jan 2018 17:09:15 -0600
Subject: [Openstack-operators] Fwd: [openstack-dev] [nova][neutron] Filtering Instances by IP address performance improvement test result
In-Reply-To: 
References: 
Message-ID: <193f61c7-aff3-b4cf-dace-9b50d592c3c7@gmail.com>

FYI, some performance test results for a series of patches between nova and neutron to try and improve the performance of listing instances and filtering by IP.

-------- Forwarded Message --------
Subject: [openstack-dev] [nova][neutron] Filtering Instances by IP address performance improvement test result
Date: Fri, 5 Jan 2018 10:53:56 +0800
From: Zhenyu Zheng
Reply-To: OpenStack Development Mailing List (not for usage questions)
To: OpenStack Development Mailing List (not for usage questions)

Hi All,

We are working on patches to improve the performance of filtering instances by IP address this cycle. As discussed in the previous ML thread [1], it contains patches for both Nova and Neutron [2][3][4][5][6].

As the POC is almost functional (the neutron extension part seems not to be working: it cannot be successfully listed in patchset 14 of [5], so I had to bypass the "if" condition that checks for the neutron "ip-substring-filtering" extension to make it work, but that seems easy to fix), I ran some tests to check what kind of improvement those patches bring.

In the tests, I wrote a simple script [7] (the script is silly, please don't laugh at me :) ) which generated 2000 VM records in the Nova DB with an IP address allocated (one IP for each VM), and also 2000 port records with corresponding IP addresses in my local devstack env.

Before adding those patches, querying instances with a specific IP filter took about 4000 ms; the test was run several times and I took the averaged result: [inline screenshot of the timing omitted]

After adding those patches (and the modifications mentioned above), the same query took only about 400 ms: [inline screenshot of the timing omitted]

So the design seems to be working well. I also tested sub-string filtering with the IP address 192.168.7.2, which matches 66 instances; that takes about 900 ms: [inline screenshot of the timing omitted]

That is higher, but it seems reasonable since it matches more instances, and it is still much better than the current implementation.

Please test it out in your own env if interested; the script might need some modification as I hardcoded the db connection, network_id and subnet_id.
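(For context on what is being measured: the query exercised here is the normal instance-list IP filter. A minimal way to reproduce a similar timing, assuming the python-openstackclient CLI and an address that actually exists in your cloud, is:

    time openstack server list --ip 192.168.7.2

which maps to GET /servers/detail?ip=192.168.7.2 in the compute API; comparing the wall-clock time before and after applying the patches should show the same kind of difference reported above.)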
And also, please help review the patches :) [1] http://lists.openstack.org/pipermail/openstack-operators/2017-October/014459.html [2] https://review.openstack.org/#/c/509326/ [3] https://review.openstack.org/#/c/525505/ [4] https://review.openstack.org/#/c/518865/ [5] https://review.openstack.org/#/c/521683/ [6] https://review.openstack.org/#/c/525284/ [7] https://github.com/zhengzhenyu/groceries/blob/master/Ip_filtering_performance_test.py BR, Kevin Zheng -------------- next part -------------- __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev From mrhillsman at gmail.com Mon Jan 8 17:26:13 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Mon, 08 Jan 2018 11:26:13 -0600 Subject: [Openstack-operators] OpenStack Individual BoD Elections Message-ID: <8D9AF85B-3235-45B4-B803-AFD6FC1D72A4@gmail.com> Hi everyone, Just a friendly reminder that the Individual BoD elections has started. Please take time to consider all the candidates and vote accordingly: https://www.openstack.org/election/2018-individual-director-election/CandidateList You should have your ballot via the email associated with your OpenStack Foundation profile. -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: +1 (832) 264-2646 irc: mrhillsman -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Mon Jan 8 18:33:17 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 8 Jan 2018 12:33:17 -0600 Subject: [Openstack-operators] [nova] Rocky PTG early planning Message-ID: As the Queens release winds to a close, I've started thinking about topics for Rocky that can be discussed at the PTG. I've created an etherpad [1] for just throwing various topics in there, completely free-form at this point; just remember to add your name next to any topic you add. [1] https://etherpad.openstack.org/p/nova-ptg-rocky -- Thanks, Matt From piotr.bielak at corp.ovh.com Tue Jan 9 14:18:28 2018 From: piotr.bielak at corp.ovh.com (Piotr Bielak) Date: Tue, 9 Jan 2018 14:18:28 +0000 Subject: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler Message-ID: Hi! I'm conducting some research about the nova scheduler logs verbosity. Did you ever encounter any situation, where you didn't feel satisfied with the amount or quality of the logs (at any log level). It is known that the nova-scheduler produces hardly any logs at INFO level. What are your experiences with the nova-scheduler in production environments? What would you like to see in the logs, what isn't printed at the moment (maybe some expected log format)? Thanks for any help and advice, Piotr Bielak From mihailmed at gmail.com Tue Jan 9 17:14:44 2018 From: mihailmed at gmail.com (Mikhail Medvedev) Date: Tue, 9 Jan 2018 11:14:44 -0600 Subject: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler In-Reply-To: References: Message-ID: On Tue, Jan 9, 2018 at 8:18 AM, Piotr Bielak wrote: > Hi! I'm conducting some research about the nova scheduler logs > verbosity. Did you ever encounter any situation, where you didn't feel > satisfied with the amount or quality of the logs (at any log level). > It is known that the nova-scheduler produces hardly any logs at INFO > level. What are your experiences with the nova-scheduler in production > environments? 
What would you like to see in the logs, what isn't printed > at the moment (maybe some expected log format)? I am supporting a couple of OpenStack dev clouds and I found in order to solve most operational problems faster I need DEBUG enabled everywhere. In case of scheduler, "No valid host was found" was untractable without the debug messages (as of Mitaka). I need to know what filter ate all the hosts and what values it used for calculations as a minimum. > > Thanks for any help and advice, > Piotr Bielak > --- Mikhail Medvedev (mmedvede) IBM -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Tue Jan 9 18:38:05 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 9 Jan 2018 12:38:05 -0600 Subject: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler In-Reply-To: References: Message-ID: On 1/9/2018 8:18 AM, Piotr Bielak wrote: > Hi! I'm conducting some research about the nova scheduler logs > verbosity. Did you ever encounter any situation, where you didn't feel > satisfied with the amount or quality of the logs (at any log level). > It is known that the nova-scheduler produces hardly any logs at INFO > level. What are your experiences with the nova-scheduler in production > environments? What would you like to see in the logs, what isn't printed > at the moment (maybe some expected log format)? You might be interested in this older spec: https://specs.openstack.org/openstack/nova-specs/specs/newton/approved/improve-sched-logging.html Also, there is a noticeable impact to performance when running the scheduler with debug logging enabled which is why it's not recommended to run with debug enabled in production. -- Thanks, Matt From stig.openstack at telfer.org Tue Jan 9 19:45:52 2018 From: stig.openstack at telfer.org (Stig Telfer) Date: Tue, 9 Jan 2018 19:45:52 +0000 Subject: [Openstack-operators] [scientific] IRC meeting today: SGX security and Ironic, RCUK cloud workshop Message-ID: <9D5DF709-B703-4D64-B0C7-A49C06A30566@telfer.org> Hello All - We have an IRC meeting in channel #openstack-meeting at 2100 UTC today (just over an hour’s time). All are welcome. Today’s agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_9th_2018 We have Andrey Brito from the Federal University of Campina Grande discussing some of his work using Intel SGX for strengthening the security of Ironic compute instances. We also have a roundup of yesterday’s RCUK Cloud Workshop in London. Plus, inevitably, a roundup of people’s experiences of the impact of the Spectre/Meltdown remediations. Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue Jan 9 20:33:39 2018 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 9 Jan 2018 20:33:39 +0000 Subject: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler In-Reply-To: References: Message-ID: <20180109203339.xqqbnhuu2grarspm@yuggoth.org> On 2018-01-09 12:38:05 -0600 (-0600), Matt Riedemann wrote: [...] > Also, there is a noticeable impact to performance when running the > scheduler with debug logging enabled which is why it's not > recommended to run with debug enabled in production. 
Further, OpenStack considers security risks exposed by DEBUG level logging to be only hardening opportunities, and as such these often linger unfixed or don't get backported to earlier releases (in other words, we consider running in production with DEBUG level logging to be a risky from an information security standpoint). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From ahothan at cisco.com Wed Jan 10 07:15:17 2018 From: ahothan at cisco.com (Alec Hothan (ahothan)) Date: Wed, 10 Jan 2018 07:15:17 +0000 Subject: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler In-Reply-To: References: Message-ID: <6F7B09F8-9905-4CB4-931D-107FBF797C66@cisco.com> Great to see some interest on trying to improve the log! +1 on the “no valid host found”, this one should be at the very top of the to-be-fixed list. Very difficult to troubleshoot filters in lab testing (let alone in production) as there can be many of them. This will get worst with more NFV optimization filters with so many combinations it gets really complex to debug when a VM cannot be launched with NFV optimized flavors. With the scheduler filtering engine, there should be a systematic way to log the reason for not finding a valid host - at the very least the error should display which filter failed as an ERROR (and not as DEBUG). We really need to avoid deploying with DEBUG log level but unfortunately this is the only way to troubleshoot. Too many debug-level logs are for development debug (meaning pretty much useless in any circumstances – developer forgot to remove before commit of the feature), many errors that should be logged as ERROR but have been logged as DEBUG only. Thanks Alec From: Mikhail Medvedev Date: Tuesday, January 9, 2018 at 9:16 AM To: Piotr Bielak Cc: "openstack-operators at lists.openstack.org" Subject: Re: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler On Tue, Jan 9, 2018 at 8:18 AM, Piotr Bielak > wrote: > Hi! I'm conducting some research about the nova scheduler logs > verbosity. Did you ever encounter any situation, where you didn't feel > satisfied with the amount or quality of the logs (at any log level). > It is known that the nova-scheduler produces hardly any logs at INFO > level. What are your experiences with the nova-scheduler in production > environments? What would you like to see in the logs, what isn't printed > at the moment (maybe some expected log format)? I am supporting a couple of OpenStack dev clouds and I found in order to solve most operational problems faster I need DEBUG enabled everywhere. In case of scheduler, "No valid host was found" was untractable without the debug messages (as of Mitaka). I need to know what filter ate all the hosts and what values it used for calculations as a minimum. > > Thanks for any help and advice, > Piotr Bielak > --- Mikhail Medvedev (mmedvede) IBM -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mriedemos at gmail.com Wed Jan 10 19:30:57 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 10 Jan 2018 13:30:57 -0600 Subject: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler In-Reply-To: <6F7B09F8-9905-4CB4-931D-107FBF797C66@cisco.com> References: <6F7B09F8-9905-4CB4-931D-107FBF797C66@cisco.com> Message-ID: On 1/10/2018 1:15 AM, Alec Hothan (ahothan) wrote: > +1 on the “no valid host found”, this one should be at the very top of > the to-be-fixed list. > > Very difficult to troubleshoot filters in lab testing (let alone in > production) as there can be many of them. This will get worst with more > NFV optimization filters with so many combinations it gets really > complex to debug when a VM cannot be launched with NFV optimized > flavors. With the scheduler filtering engine, there should be a > systematic way to log the reason for not finding a valid host - at the > very least the error should display which filter failed as an ERROR (and > not as DEBUG). > > We really need to avoid deploying with DEBUG log level but unfortunately > this is the only way to troubleshoot. Too many debug-level logs are for > development debug (meaning pretty much useless in any circumstances – > developer forgot to remove before commit of the feature), many errors > that should be logged as ERROR but have been logged as DEBUG only. > The scheduler logs do print which filter returned 0 hosts for a given request at INFO level. For example, I was debugging this NoValidHost failure in a CI run: http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/job-output.txt.gz#_2018-01-05_17_39_54_336999 And tracing the request ID to the scheduler logs, and filtering on just INFO level logging to simulate what you'd have for the default in production, I found the filter that kicked it out here: http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/controller/logs/screen-n-sch.txt.gz?level=INFO#_Jan_05_17_00_30_564582 And there is a summary like this: Jan 05 17:00:30.565996 ubuntu-xenial-infracloud-chocolate-0001705073 nova-scheduler[8932]: INFO nova.filters [None req-737984ae-3ae8-4506-a5f9-6655a4ebf206 tempest-ServersAdminTestJSON-787960229 tempest-ServersAdminTestJSON-787960229] Filtering removed all hosts for the request with instance ID '8ae8dc23-8f3b-4f0f-8775-2dcc2a5fc75b'. Filter results: ['RetryFilter: (start: 1, end: 1)', 'AvailabilityZoneFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 'ImagePropertiesFilter: (start: 1, end: 1)', 'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', 'ServerGroupAffinityFilter: (start: 1, end: 1)', 'SameHostFilter: (start: 1, end: 0)'] "end: 0" means that's the filter that rejected the request. Without digging into the actual code, the descriptions for the filters is here: https://docs.openstack.org/nova/latest/user/filter-scheduler.html Now just why this request failed requires a bit of understanding of why my environment looks like (this CI run is using the CachingScheduler), so there isn't a simple "ERROR: SameHostFilter rejected request because you're using the CachingScheduler which is racy by design and you created the instances in separate requests". You'd have a ton of false negative ERRORs in the logs because of valid reasons for rejected a request based on the current state of the system, which is going to make debugging real issues that much harder. 
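(A small, hedged illustration for anyone scripting against these INFO-level summaries rather than reading them by hand; it assumes the exact "Filter results:" format shown above and is only a starting point, not part of nova itself:

    # Sketch: report which filter eliminated the remaining hosts from a
    # nova-scheduler "Filter results:" summary line. Assumes the format
    # shown above; adjust the regex if your release formats it differently.
    import re

    def rejecting_filter(log_line):
        # Matches entries like "SameHostFilter: (start: 1, end: 0)"
        entries = re.findall(r"(\w+): \(start: (\d+), end: (\d+)\)", log_line)
        for name, start, end in entries:
            if int(end) == 0:
                return name, int(start)
        return None

    line = ("Filter results: ['RetryFilter: (start: 1, end: 1)', "
            "'ComputeFilter: (start: 1, end: 1)', "
            "'SameHostFilter: (start: 1, end: 0)']")
    print(rejecting_filter(line))  # -> ('SameHostFilter', 1)

Feeding it the real summary line quoted above would point at SameHostFilter as the filter that emptied the host list.)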
-- Thanks, Matt From ahothan at cisco.com Wed Jan 10 19:49:12 2018 From: ahothan at cisco.com (Alec Hothan (ahothan)) Date: Wed, 10 Jan 2018 19:49:12 +0000 Subject: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler In-Reply-To: References: <6F7B09F8-9905-4CB4-931D-107FBF797C66@cisco.com> Message-ID: Matt, Thanks for clarifying the logs, the older release I was using did not have much information in the scheduler log. I’ll double check on the Newton release to see how they look like. As you mention, a simple pass/fail result may not always explain why it fails but it is definitely good to know which filter failed. I still think that a VM failure to launch should be related to 1 ERROR log rather than 1 INFO log. In the example you provide, it is fine to have race conditions that result in a rejection and an ERROR log. The main problem is that the nova API does not return sufficient detail on the reason for a NoValidHostFound and perhaps that should be fixed at that level. Extending the API to return a reason field which is a json dict that is returned by the various filters (with more meaningful filter-specific info) will help tremendously (no more need to go through the log to find out why). Regards, Alec From: Matt Riedemann Date: Wednesday, January 10, 2018 at 11:33 AM To: "openstack-operators at lists.openstack.org" Subject: Re: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler On 1/10/2018 1:15 AM, Alec Hothan (ahothan) wrote: +1 on the “no valid host found”, this one should be at the very top of the to-be-fixed list. Very difficult to troubleshoot filters in lab testing (let alone in production) as there can be many of them. This will get worst with more NFV optimization filters with so many combinations it gets really complex to debug when a VM cannot be launched with NFV optimized flavors. With the scheduler filtering engine, there should be a systematic way to log the reason for not finding a valid host - at the very least the error should display which filter failed as an ERROR (and not as DEBUG). We really need to avoid deploying with DEBUG log level but unfortunately this is the only way to troubleshoot. Too many debug-level logs are for development debug (meaning pretty much useless in any circumstances – developer forgot to remove before commit of the feature), many errors that should be logged as ERROR but have been logged as DEBUG only. The scheduler logs do print which filter returned 0 hosts for a given request at INFO level. For example, I was debugging this NoValidHost failure in a CI run: http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/job-output.txt.gz#_2018-01-05_17_39_54_336999 And tracing the request ID to the scheduler logs, and filtering on just INFO level logging to simulate what you'd have for the default in production, I found the filter that kicked it out here: http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/controller/logs/screen-n-sch.txt.gz?level=INFO#_Jan_05_17_00_30_564582 And there is a summary like this: Jan 05 17:00:30.565996 ubuntu-xenial-infracloud-chocolate-0001705073 nova-scheduler[8932]: INFO nova.filters [None req-737984ae-3ae8-4506-a5f9-6655a4ebf206 tempest-ServersAdminTestJSON-787960229 tempest-ServersAdminTestJSON-787960229] Filtering removed all hosts for the request with instance ID '8ae8dc23-8f3b-4f0f-8775-2dcc2a5fc75b'. 
Filter results: ['RetryFilter: (start: 1, end: 1)', 'AvailabilityZoneFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 'ImagePropertiesFilter: (start: 1, end: 1)', 'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', 'ServerGroupAffinityFilter: (start: 1, end: 1)', 'SameHostFilter: (start: 1, end: 0)'] "end: 0" means that's the filter that rejected the request. Without digging into the actual code, the descriptions for the filters is here: https://docs.openstack.org/nova/latest/user/filter-scheduler.html Now just why this request failed requires a bit of understanding of why my environment looks like (this CI run is using the CachingScheduler), so there isn't a simple "ERROR: SameHostFilter rejected request because you're using the CachingScheduler which is racy by design and you created the instances in separate requests". You'd have a ton of false negative ERRORs in the logs because of valid reasons for rejected a request based on the current state of the system, which is going to make debugging real issues that much harder. -- Thanks, Matt _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Jan 10 20:40:57 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 10 Jan 2018 14:40:57 -0600 Subject: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler In-Reply-To: References: <6F7B09F8-9905-4CB4-931D-107FBF797C66@cisco.com> Message-ID: On 1/10/2018 1:49 PM, Alec Hothan (ahothan) wrote: > The main problem is that the nova API does not return sufficient detail > on the reason for a NoValidHostFound and perhaps that should be fixed at > that level. Extending the API to return a reason field which is a json > dict that is returned by the various filters  (with more meaningful > filter-specific info) will help tremendously (no more need to go through > the log to find out why). There are security implications to doing this, which is why the ultimate reason behind the NoValidHost hasn't been exposed to the end user. It could leak details about the size, topology and configuration of the cloud and open it up to attacks. A better alternative would be something like an audit log (or fault) that only the user with the admin role could see, like when they are investigating a support ticket. There might be other cases where we should do a better job of validation in the API before casting off to the scheduler. If we can detect common reasons for a scheduling (or build) failure up front in the API, we can return that information immediately back to the user who can act upon it. That, in turn, should also improve our API documentation (assuming it's a common failure or something that's just not clear usage-wise in the docs). -- Thanks, Matt From jonmills at gmail.com Wed Jan 10 21:39:25 2018 From: jonmills at gmail.com (Jonathan Mills) Date: Wed, 10 Jan 2018 16:39:25 -0500 Subject: [Openstack-operators] neutron and dns_domain Message-ID: Dear Operators, I have a mix of Mitaka and Pike clusters, all for private clouds, and with multiple tenants. I am very interested in having the ability to have per-network (really, per-tenant) dns_domain. 
You would think that this works, based on the documentation here: https://docs.openstack.org/ocata/networking-guide/config-dns-int.html And yes, I have read and re-read that document many times, and carefully followed its instructions. I have the 'dns' extension_driver enabled in ML2. I have set an alternate value from 'openstacklocal' in neutron.conf. I am using the neutron dnsmasq processes as my real DNS servers in my VMs for tenant internal name resolution. (Instance short-name resolution does work, it's just that the FQDN of every VM is wrong.) I have created per-network dns_domain entries in my neutron database. Nevertheless, it does not work. In every tenant, every VM has a dns suffix equal to whatever I have set for 'dns_domain' in neutron.conf (the global default). Scouring the web for clues, I've come across this, which seems to describe my problem: https://bugs.launchpad.net/neutron/+bug/1580588 Notice that the importance is 'wishlist'. Wishlist? I find it surprising that it is a mere wish to have DNS work as expected. I'm curious so I'm asking the community, is this really not working for anyone? And if it is not working for anyone else either, is it really not a big deal? It seems to me this would pose a rather large problem for any number of use cases. In my immediate situation, I am deploying VMs onto a provider network that has a pre-existing Puppet infrastructure, and all the FQDNs are wrong, which means the generation of Puppet SSL certificates on these VMs is problematic. Any feedback would be much appreciated! Cheers, Jonathan Mills NASA Center for Climate Simulation (NCCS) Goddard Space Flight Center, Greenbelt, MD -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.therond at gmail.com Wed Jan 10 21:57:54 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Wed, 10 Jan 2018 21:57:54 +0000 Subject: [Openstack-operators] neutron and dns_domain In-Reply-To: References: Message-ID: As you’re using a L2 network topology and until all of your project use a different network you can do: domain=domain1,10.10.10.0/24 domain=domain2,20.20.20.0/24 Within the dnsmasq-neutron.conf file. Of course, restart the neutron-server service once done. Le mer. 10 janv. 2018 à 22:40, Jonathan Mills a écrit : > Dear Operators, > > I have a mix of Mitaka and Pike clusters, all for private clouds, and with > multiple tenants. I am very interested in having the ability to have > per-network (really, per-tenant) dns_domain. You would think that this > works, based on the documentation here: > https://docs.openstack.org/ocata/networking-guide/config-dns-int.html > > And yes, I have read and re-read that document many times, and carefully > followed its instructions. I have the 'dns' extension_driver enabled in > ML2. I have set an alternate value from 'openstacklocal' in neutron.conf. > I am using the neutron dnsmasq processes as my real DNS servers in my VMs > for tenant internal name resolution. (Instance short-name resolution does > work, it's just that the FQDN of every VM is wrong.) I have created > per-network dns_domain entries in my neutron database. Nevertheless, it > does not work. In every tenant, every VM has a dns suffix equal to > whatever I have set for 'dns_domain' in neutron.conf (the global default). > Scouring the web for clues, I've come across this, which seems to describe > my problem: > > https://bugs.launchpad.net/neutron/+bug/1580588 > > Notice that the importance is 'wishlist'. Wishlist? 
I find it surprising > that it is a mere wish to have DNS work as expected. I'm curious so I'm > asking the community, is this really not working for anyone? And if it is > not working for anyone else either, is it really not a big deal? It seems > to me this would pose a rather large problem for any number of use cases. > In my immediate situation, I am deploying VMs onto a provider network that > has a pre-existing Puppet infrastructure, and all the FQDNs are wrong, > which means the generation of Puppet SSL certificates on these VMs is > problematic. > > Any feedback would be much appreciated! > > Cheers, > > Jonathan Mills > NASA Center for Climate Simulation (NCCS) > Goddard Space Flight Center, Greenbelt, MD > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonmills at gmail.com Wed Jan 10 22:00:41 2018 From: jonmills at gmail.com (Jonathan Mills) Date: Wed, 10 Jan 2018 17:00:41 -0500 Subject: [Openstack-operators] neutron and dns_domain In-Reply-To: References: Message-ID: Thanks, Flint WALRUS. I will certainly try that. On Wed, Jan 10, 2018 at 4:57 PM, Flint WALRUS wrote: > As you’re using a L2 network topology and until all of your project use a > different network you can do: > > domain=domain1,10.10.10.0/24 > domain=domain2,20.20.20.0/24 > > Within the dnsmasq-neutron.conf file. > Of course, restart the neutron-server service once done. > Le mer. 10 janv. 2018 à 22:40, Jonathan Mills a > écrit : > >> Dear Operators, >> >> I have a mix of Mitaka and Pike clusters, all for private clouds, and >> with multiple tenants. I am very interested in having the ability to have >> per-network (really, per-tenant) dns_domain. You would think that this >> works, based on the documentation here: https://docs.openstack.org/ >> ocata/networking-guide/config-dns-int.html >> >> And yes, I have read and re-read that document many times, and carefully >> followed its instructions. I have the 'dns' extension_driver enabled in >> ML2. I have set an alternate value from 'openstacklocal' in neutron.conf. >> I am using the neutron dnsmasq processes as my real DNS servers in my VMs >> for tenant internal name resolution. (Instance short-name resolution does >> work, it's just that the FQDN of every VM is wrong.) I have created >> per-network dns_domain entries in my neutron database. Nevertheless, it >> does not work. In every tenant, every VM has a dns suffix equal to >> whatever I have set for 'dns_domain' in neutron.conf (the global default). >> Scouring the web for clues, I've come across this, which seems to describe >> my problem: >> >> https://bugs.launchpad.net/neutron/+bug/1580588 >> >> Notice that the importance is 'wishlist'. Wishlist? I find it >> surprising that it is a mere wish to have DNS work as expected. I'm >> curious so I'm asking the community, is this really not working for >> anyone? And if it is not working for anyone else either, is it really not >> a big deal? It seems to me this would pose a rather large problem for any >> number of use cases. In my immediate situation, I am deploying VMs onto a >> provider network that has a pre-existing Puppet infrastructure, and all the >> FQDNs are wrong, which means the generation of Puppet SSL certificates on >> these VMs is problematic. >> >> Any feedback would be much appreciated! 
>> >> Cheers, >> >> Jonathan Mills >> NASA Center for Climate Simulation (NCCS) >> Goddard Space Flight Center, Greenbelt, MD >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonmills at gmail.com Wed Jan 10 22:33:11 2018 From: jonmills at gmail.com (Jonathan Mills) Date: Wed, 10 Jan 2018 17:33:11 -0500 Subject: [Openstack-operators] neutron and dns_domain In-Reply-To: References: Message-ID: That does, indeed, appear to be a solution to the problem. Albeit, not an ideal solution. I really hope that this will be resolved in Neutron eventually. On Wed, Jan 10, 2018 at 5:00 PM, Jonathan Mills wrote: > Thanks, Flint WALRUS. I will certainly try that. > > On Wed, Jan 10, 2018 at 4:57 PM, Flint WALRUS > wrote: > >> As you’re using a L2 network topology and until all of your project use a >> different network you can do: >> >> domain=domain1,10.10.10.0/24 >> domain=domain2,20.20.20.0/24 >> >> Within the dnsmasq-neutron.conf file. >> Of course, restart the neutron-server service once done. >> Le mer. 10 janv. 2018 à 22:40, Jonathan Mills a >> écrit : >> >>> Dear Operators, >>> >>> I have a mix of Mitaka and Pike clusters, all for private clouds, and >>> with multiple tenants. I am very interested in having the ability to have >>> per-network (really, per-tenant) dns_domain. You would think that this >>> works, based on the documentation here: https://docs.openstack.org/oca >>> ta/networking-guide/config-dns-int.html >>> >>> And yes, I have read and re-read that document many times, and carefully >>> followed its instructions. I have the 'dns' extension_driver enabled in >>> ML2. I have set an alternate value from 'openstacklocal' in neutron.conf. >>> I am using the neutron dnsmasq processes as my real DNS servers in my VMs >>> for tenant internal name resolution. (Instance short-name resolution does >>> work, it's just that the FQDN of every VM is wrong.) I have created >>> per-network dns_domain entries in my neutron database. Nevertheless, it >>> does not work. In every tenant, every VM has a dns suffix equal to >>> whatever I have set for 'dns_domain' in neutron.conf (the global default). >>> Scouring the web for clues, I've come across this, which seems to describe >>> my problem: >>> >>> https://bugs.launchpad.net/neutron/+bug/1580588 >>> >>> Notice that the importance is 'wishlist'. Wishlist? I find it >>> surprising that it is a mere wish to have DNS work as expected. I'm >>> curious so I'm asking the community, is this really not working for >>> anyone? And if it is not working for anyone else either, is it really not >>> a big deal? It seems to me this would pose a rather large problem for any >>> number of use cases. In my immediate situation, I am deploying VMs onto a >>> provider network that has a pre-existing Puppet infrastructure, and all the >>> FQDNs are wrong, which means the generation of Puppet SSL certificates on >>> these VMs is problematic. >>> >>> Any feedback would be much appreciated! 
>>>
>>> Cheers,
>>>
>>> Jonathan Mills
>>> NASA Center for Climate Simulation (NCCS)
>>> Goddard Space Flight Center, Greenbelt, MD
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mriedemos at gmail.com Thu Jan 11 00:13:33 2018
From: mriedemos at gmail.com (Matt Riedemann)
Date: Wed, 10 Jan 2018 18:13:33 -0600
Subject: [Openstack-operators] [nova][cinder] nova support for volume multiattach
Message-ID: <6b28b91d-5004-e37d-cfa9-04a5eff537dc@gmail.com>

Hi everyone,

I wanted to point out that the nova API patch for volume multiattach support is available for review:

https://review.openstack.org/#/c/271047/

It's actually a series of changes, but that is the last one that enables the feature in nova.

It relies on the 2.59 compute API microversion to be able to create a server from a multiattach volume or to attach a multiattach volume to a server. We do not allow attaching a multiattach volume to a shelved offloaded server, to be consistent with the 2.49 microversion for tagged attach.

When creating a server from a multiattach volume, the compute API will check to see that all nova-compute services in the deployment have been upgraded to the service version that supports the multiattach code in the libvirt driver. Similarly, when attaching a multiattach volume to an existing server instance, the compute API will check that the compute hosting the instance is new enough to support multiattach volumes (has been upgraded) and it's using a virt driver that supports the capability (currently only the libvirt driver). There are more details in the release note but I wanted to point out those restrictions.

There is also a set of tempest integration tests here:

https://review.openstack.org/#/c/266605/

Those will be tested in the nova-multiattach CI job:

https://review.openstack.org/#/c/532689/

Due to restrictions with libvirt, multiattach support is only available if qemu<2.10 or libvirt>=3.10. The test environment takes this into account for upstream testing.

Nova will rely on Cinder microversion >=3.44, which was added in Queens, for safe detach of a multiattach volume.

There is a design spec for Cinder which describes how volume multiattach will be supported in Cinder and how operators will be able to configure volume types and Cinder policy rules for multiattach support:

https://specs.openstack.org/openstack/cinder-specs/specs/queens/enable-multiattach.html

Several people from various companies have been pushing this hard in the Queens release and we're two weeks away from feature freeze. I'm on vacation next week also, but I have a feeling that this will get done finally in Queens.

--

Thanks,

Matt

From zhipengh512 at gmail.com Thu Jan 11 07:31:05 2018
From: zhipengh512 at gmail.com (Zhipeng Huang)
Date: Thu, 11 Jan 2018 15:31:05 +0800
Subject: [Openstack-operators] [publiccloud-wg]Public Cloud Feature List Hackathon Day 2
Message-ID: 

Hi Folks,

Today we are gonna continue to comb through the public cloud feature list [0] as we did yesterday. Please join the discussion at #openstack-publiccloud starting from UTC1400.

[0] https://docs.google.com/spreadsheets/d/1Mf8OAyTzZxCKzYHMgBl-QK_2-XSycSkOjqCyMTIedkA/edit?usp=sharing

--
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co,.
Ltd Email: huangzhipeng at huawei.com Office: Huawei Industrial Base, Longgang, Shenzhen (Previous) Research Assistant Mobile Ad-Hoc Network Lab, Calit2 University of California, Irvine Email: zhipengh at uci.edu Office: Calit2 Building Room 2402 OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado -------------- next part -------------- An HTML attachment was scrubbed... URL: From jp.methot at planethoster.info Thu Jan 11 07:33:58 2018 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Thu, 11 Jan 2018 16:33:58 +0900 Subject: [Openstack-operators] Converting existing instances from virtio-blk to virtio-scsi Message-ID: <95C92595-DDDD-4E47-B24B-55AF7420F4C5@planethoster.info> Hi, We currently have a private cloud running old instances using the virtio-blk driver and new instances using the virtio-scsi driver. We would like to convert all our existing instances to virtio-scsi but there doesn’t appear to be an official way to do this. Can I modify this in the openstack database? What parameters would I need to change? Is there an easier, less likely to break everything way? Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Tim.Bell at cern.ch Thu Jan 11 08:23:39 2018 From: Tim.Bell at cern.ch (Tim Bell) Date: Thu, 11 Jan 2018 08:23:39 +0000 Subject: [Openstack-operators] Converting existing instances from virtio-blk to virtio-scsi In-Reply-To: <95C92595-DDDD-4E47-B24B-55AF7420F4C5@planethoster.info> References: <95C92595-DDDD-4E47-B24B-55AF7420F4C5@planethoster.info> Message-ID: BTW, this is also an end user visible change as the VMs would see the disk move from /dev/vda to /dev/sda. Depending on how the VMs are configured, this may cause issues also for the end user. Tim From: Jean-Philippe Méthot Date: Thursday, 11 January 2018 at 08:37 To: openstack-operators Subject: [Openstack-operators] Converting existing instances from virtio-blk to virtio-scsi Hi, We currently have a private cloud running old instances using the virtio-blk driver and new instances using the virtio-scsi driver. We would like to convert all our existing instances to virtio-scsi but there doesn’t appear to be an official way to do this. Can I modify this in the openstack database? What parameters would I need to change? Is there an easier, less likely to break everything way? Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhipengh512 at gmail.com Thu Jan 11 08:38:34 2018 From: zhipengh512 at gmail.com (Zhipeng Huang) Date: Thu, 11 Jan 2018 16:38:34 +0800 Subject: [Openstack-operators] [publiccloud-wg]Rocky PTG Planning Etherpad Message-ID: Hi Team, I drafted an initial framework of the etherpad we could use for Rocky PTG in Dublin. You are more than welcomed to provide input: https://etherpad.openstack.org/p/publiccloud-wg-ptg-rocky -- Zhipeng (Howard) Huang Standard Engineer IT Standard & Patent/IT Product Line Huawei Technologies Co,. Ltd Email: huangzhipeng at huawei.com Office: Huawei Industrial Base, Longgang, Shenzhen (Previous) Research Assistant Mobile Ad-Hoc Network Lab, Calit2 University of California, Irvine Email: zhipengh at uci.edu Office: Calit2 Building Room 2402 OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.therond at gmail.com Thu Jan 11 08:51:48 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Thu, 11 Jan 2018 08:51:48 +0000 Subject: [Openstack-operators] [publiccloud-wg]Public Cloud Feature List Hackathon Day 2 In-Reply-To: References: Message-ID: Hi folks, I’ve just added an entry with the google doc regarding GraphQL API as it strike me yesterday, if you need further information feel free to contact me. Le jeu. 11 janv. 2018 à 08:32, Zhipeng Huang a écrit : > Hi Folks, > > Today we are gonna continue to comb through the public cloud feature > list[0] as we did yesterday. Please join the discussion at > #openstack-publiccloud starting from UTC1400. > > [0] > https://docs.google.com/spreadsheets/d/1Mf8OAyTzZxCKzYHMgBl-QK_2-XSycSkOjqCyMTIedkA/edit?usp=sharing > > -- > Zhipeng (Howard) Huang > > Standard Engineer > IT Standard & Patent/IT Product Line > Huawei Technologies Co,. Ltd > Email: huangzhipeng at huawei.com > Office: Huawei Industrial Base, Longgang, Shenzhen > > (Previous) > Research Assistant > Mobile Ad-Hoc Network Lab, Calit2 > University of California, Irvine > Email: zhipengh at uci.edu > Office: Calit2 Building Room 2402 > > OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zioproto at gmail.com Thu Jan 11 14:41:47 2018 From: zioproto at gmail.com (Saverio Proto) Date: Thu, 11 Jan 2018 15:41:47 +0100 Subject: [Openstack-operators] oslo.log JSON logs are missing the request ID Message-ID: Hello, probably someone here is using stuff like Kibana to look at Openstack logs. We are trying here to use the json logging, and we are surprised that the request-id is not printed in the json output. I wrote this email to the devs: http://lists.openstack.org/pipermail/openstack-dev/2018-January/126144.html Does anyone have this working, or the field is really missing in the code of the oslo.log json formatter ? thanks a lot Saverio From thingee at gmail.com Fri Jan 12 20:44:36 2018 From: thingee at gmail.com (Mike Perez) Date: Fri, 12 Jan 2018 12:44:36 -0800 Subject: [Openstack-operators] Developer Mailing List Digest January 5-12th Message-ID: <20180112204436.GA3640@gmail.com> Contribute to the Dev Digest by summarizing OpenStack Dev List thread: * https://etherpad.openstack.org/p/devdigest * http://lists.openstack.org/pipermail/openstack-dev/ * http://lists.openstack.org/pipermail/openstack-sigs HTML version: https://www.openstack.org/blog/2018/01/developer-mailing-list-digest-january-5-12th/ Success Bot Says ================ * e0ne on #openstack-horizon [0]: amotoki runs horizon with django 2.0 * tristianC on #rdo [1]: review.rdoproject.org is now running sf-2.7 * mriedem on #openstack-nova [2]: nova merged alternate hosts support for server build * mriedem on #openstack-nova [3]: After a week of problems, finally got a volume multiattach test run to actually attach a volume to two instances without melting the world. 
\o/ * zaneb [4]: 14% reduction in Heat memory use in the TripleO gate from fixing https://bugs.launchpad.net/heat/+bug/1731349 * Tell us yours in OpenStack IRC channels using the command "#success " * More: https://wiki.openstack.org/wiki/Successes [0] - http://eavesdrop.openstack.org/irclogs/%23openstack-horizon/%23openstack-horizon.2017-12-18.log.html [1] - http://eavesdrop.openstack.org/irclogs/%23rdo/%23rdo.2017-12-21.log.html [2] - http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2017-12-22.log.html [3] - http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-01-05.log.html [4] - http://eavesdrop.openstack.org/irclogs/%23tripleo/%23tripleo.2018-01-09.log.html Community Summaries =================== * Technical Committee Status update [0] * POST /api-sig/news [1] * Release countdown [2] * Nova placement resource provider update [3] * Keystone team update [4] * Nova Notification Update [5] * TC report [6] [0] - http://lists.openstack.org/pipermail/openstack-dev/2018-January/126178.html [1] - http://lists.openstack.org/pipermail/openstack-dev/2018-January/126147.html [2] - http://lists.openstack.org/pipermail/openstack-dev/2018-January/125996.html [3] - http://lists.openstack.org/pipermail/openstack-dev/2018-January/126179.html [4] - http://lists.openstack.org/pipermail/openstack-dev/2018-January/126188.html [5] - http://lists.openstack.org/pipermail/openstack-dev/2018-January/126025.html [6] - http://lists.openstack.org/pipermail/openstack-dev/2018-January/126082.html Community Goals for Rocky ========================= So far one goal has been proposed by Kendall Nelson for migrating to Storyboard. It was agreed to postpone the goal until the S cycle, as it could take longer than six months to achieve. There is a good backlog of goals [0], just no champions. It'll be bad for momentum if we have a cycle with no community wide goal. [0] - https://etherpad.openstack.org/p/community-goals Full thread: http://lists.openstack.org/pipermail/openstack-dev/2018-January/126090.html PTG Post-lunch Presentations ============================ Feedback received from past PTG session(s) was the lack of situational awareness and missed opportunity for "global" communication at the event. In Dublin we'd used the end of the lunch break to for communications that could be interesting to OpenStack upstream developers and project team members. The idea is not to find a presentation for everyday, but if we find content that is generally useful. Interesting topics include general guidance to make the most of the PTG weeks (good Monday content), development tricks, code review etiquette, new library features you should adopt, lightning talks (good Friday content). We'd like to keep the slot under 20 minutes. If you have ideas please fill out this etherpad [0] in a few weeks. [0] - https://etherpad.openstack.org/p/dublin-PTG-postlunch Full thread: http://lists.openstack.org/pipermail/openstack-dev/2018-January/126102.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From lauren at openstack.org Fri Jan 12 20:59:04 2018 From: lauren at openstack.org (Lauren Sell) Date: Fri, 12 Jan 2018 14:59:04 -0600 Subject: [Openstack-operators] =?utf-8?q?Vancouver_Summit_CFP_is_open_-_wh?= =?utf-8?q?at=E2=80=99s_new?= Message-ID: <4572415A-B17D-44A2-967D-61376515BD24@openstack.org> Hi everyone, Today, we opened the Call for Presentations for the Vancouver Summit , which will take place May 21-24. The deadline to submit your proposal is February 8th. What’s New? We’re focused on open infrastructure integration. The Summit has evolved over the years to cover more than just OpenStack, but we’re making an even bigger effort to attract speakers across the open infrastructure ecosystem. In addition to OpenStack-related sessions, we’ll be featuring the newest project at the Foundation -- Kata Containers -- as well as recruiting many others from projects like Ansible, Ceph, Kubernetes, ONAP and many more. We’ve also organized Tracks around specific problem domains. We encourage you to submit proposals covering OpenStack and the “open infrastructure” tools you’re using, as well as the integration work needed to address these problem domains. We also encourage you to invite peers from other open source communities to come speak and collaborate. The Tracks are: CI/CD Container Infrastructure Edge Computing HPC / GPU / AI Open Source Community Private & Hybrid Cloud Public Cloud Telecom & NFV Where previously we had Track Chairs, we now have Programming Committees for each Track, made up of both Members and a Chair (or co-chairs). We’re also recruiting members and chairs from many different open source communities working in open infrastructure, in addition to the many familiar faces in the OpenStack community who will lead the effort. If you’re interested in nominating yourself or someone else to be a member of the Summit Programming Committee for a specific Track, please fill out the nomination form . Nominations will close on January 26, 2018. Again, the deadline to submit proposals is February 8, 2018. Please note topic submissions for the OpenStack Forum (planning/working sessions with OpenStack devs and operators) will open at a later date. We can’t wait to see you in Vancouver! We’re working hard to make it the best Summit yet, and look forward to bringing together different open infrastructure communities to solve these hard problems together! Want to provide feedback on this process? Please focus discussion on the openstack-community mailing list, or contact me or the OpenStack Foundation Summit Team directly at summit at openstack.org. Thank you, Lauren -------------- next part -------------- An HTML attachment was scrubbed... URL: From mvanwink at rackspace.com Fri Jan 12 22:51:49 2018 From: mvanwink at rackspace.com (Matt Van Winkle) Date: Fri, 12 Jan 2018 22:51:49 +0000 Subject: [Openstack-operators] [User-committee] UC IRC Meeting on Monday 1/15 - Cancel In-Reply-To: References: Message-ID: https://wiki.openstack.org/wiki/Governance/Foundation/UserCommittee has been updated accordingly From: Edgar Magana Date: Friday, January 12, 2018 at 4:47 PM To: user-committee , openstack-operators Subject: [User-committee] UC IRC Meeting on Monday 1/15 - Cancel Monday Jan 15th is a holiday in USA. We will not have our weekly IRC meeting next week. Thanks, User Community -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at medberry.net Tue Jan 16 14:24:45 2018 From: openstack at medberry.net (David Medberry) Date: Tue, 16 Jan 2018 07:24:45 -0700 Subject: [Openstack-operators] Ops Mid Cycle in Tokyo Mar 7-8 2018 Message-ID: Hi all, Broad distribution to make sure folks are aware of the upcoming Ops Meetup in Tokyo. You can help "steer" this meetup by participating in the planning meetings or more practically by editing this page (respectfully): https://etherpad.openstack.org/p/TYO-ops-meetup-2018 Sign up for the meetup is here:https://goo.gl/HBJkPy We'll see you there! -dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From emccormick at cirrusseven.com Tue Jan 16 15:21:51 2018 From: emccormick at cirrusseven.com (Erik McCormick) Date: Tue, 16 Jan 2018 10:21:51 -0500 Subject: [Openstack-operators] Ops Meetups Team meeting minutes and next meeting Message-ID: Planning for the Spring Ops Meetup in Tokyo (March 6 and 7) continues to come together nicely. If you plan to join us, please go sign up at: https://goo.gl/HBJkPy Also, please help us to fill out the agenda by suggesting topics or adding a +1 to the ones you like at: https://etherpad.openstack.org/p/TYO-ops-meetup-2018 Meeting minutes are here: Meeting ended Tue Jan 16 14:34:38 2018 UTC. Minutes: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-01-16-14.08.html Minutes (text): http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-01-16-14.08.txt Log: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-01-16-14.08.log.html Next meeting will be Tuesday, January 23 at 14:00 UTC. Thanks, Erik From jon at csail.mit.edu Tue Jan 16 20:08:32 2018 From: jon at csail.mit.edu (Jonathan Proulx) Date: Tue, 16 Jan 2018 15:08:32 -0500 Subject: [Openstack-operators] Custom libvirt fragment for instance type? Message-ID: <20180116200832.GU9790@csail.mit.edu> Hi All, Looking for a way to inject: into the libvirt.xml for instances of a particular flavor. My needs could also be met by attatching it to the glance image or if needs be per hypervisor. My Googling is not turning up anything. Is there any way to set arbitray (or this particular) Libvirt/KVM freature? Thanks, -Jon -- From Tim.Bell at cern.ch Tue Jan 16 20:42:00 2018 From: Tim.Bell at cern.ch (Tim Bell) Date: Tue, 16 Jan 2018 20:42:00 +0000 Subject: [Openstack-operators] Custom libvirt fragment for instance type? In-Reply-To: <20180116200832.GU9790@csail.mit.edu> References: <20180116200832.GU9790@csail.mit.edu> Message-ID: <5C388222-5F25-4EAB-9BA0-6C328060A2D3@cern.ch> If you want to hide the VM signature, you can use the img_hide_hypervisor_id property (https://docs.openstack.org/python-glanceclient/latest/cli/property-keys.html) Tim -----Original Message----- From: jon Date: Tuesday, 16 January 2018 at 21:14 To: openstack-operators Subject: [Openstack-operators] Custom libvirt fragment for instance type? Hi All, Looking for a way to inject: into the libvirt.xml for instances of a particular flavor. My needs could also be met by attatching it to the glance image or if needs be per hypervisor. My Googling is not turning up anything. Is there any way to set arbitray (or this particular) Libvirt/KVM freature? 
Thanks, -Jon -- _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From jon at csail.mit.edu Tue Jan 16 20:49:25 2018 From: jon at csail.mit.edu (Jonathan Proulx) Date: Tue, 16 Jan 2018 15:49:25 -0500 Subject: [Openstack-operators] Custom libvirt fragment for instance type? In-Reply-To: <5C388222-5F25-4EAB-9BA0-6C328060A2D3@cern.ch> References: <20180116200832.GU9790@csail.mit.edu> <5C388222-5F25-4EAB-9BA0-6C328060A2D3@cern.ch> Message-ID: <20180116204925.GV9790@csail.mit.edu> On Tue, Jan 16, 2018 at 08:42:00PM +0000, Tim Bell wrote: :If you want to hide the VM signature, you can use the img_hide_hypervisor_id property (https://docs.openstack.org/python-glanceclient/latest/cli/property-keys.html) Thanks Tim, I believe that's the magic I was looking for. -Jon :Tim : :-----Original Message----- :From: jon :Date: Tuesday, 16 January 2018 at 21:14 :To: openstack-operators :Subject: [Openstack-operators] Custom libvirt fragment for instance type? : : Hi All, : : Looking for a way to inject: : : : : : : : : into the libvirt.xml for instances of a particular flavor. : : My needs could also be met by attatching it to the glance image or if : needs be per hypervisor. : : My Googling is not turning up anything. Is there any way to set : arbitray (or this particular) Libvirt/KVM freature? : : Thanks, : -Jon : : -- : : _______________________________________________ : OpenStack-operators mailing list : OpenStack-operators at lists.openstack.org : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators : : -- From melwittt at gmail.com Tue Jan 16 21:24:12 2018 From: melwittt at gmail.com (melanie witt) Date: Tue, 16 Jan 2018 13:24:12 -0800 Subject: [Openstack-operators] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata Message-ID: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> Hello Stackers, This is a heads up to any of you using the AggregateCoreFilter, AggregateRamFilter, and/or AggregateDiskFilter in the filter scheduler. These filters have effectively allowed operators to set overcommit ratios per aggregate rather than per compute node in <= Newton. Beginning in Ocata, there is a behavior change where aggregate-based overcommit ratios will no longer be honored during scheduling. Instead, overcommit values must be set on a per compute node basis in nova.conf. Details: as of Ocata, instead of considering all compute nodes at the start of scheduler filtering, an optimization has been added to query resource capacity from placement and prune the compute node list with the result *before* any filters are applied. Placement tracks resource capacity and usage and does *not* track aggregate metadata [1]. Because of this, placement cannot consider aggregate-based overcommit and will exclude compute nodes that do not have capacity based on per compute node overcommit. How to prepare: if you have been relying on per aggregate overcommit, during your upgrade to Ocata, you must change to using per compute node overcommit ratios in order for your scheduling behavior to stay consistent. Otherwise, you may notice increased NoValidHost scheduling failures as the aggregate-based overcommit is no longer being considered. You can safely remove the AggregateCoreFilter, AggregateRamFilter, and AggregateDiskFilter from your enabled_filters and you do not need to replace them with any other core/ram/disk filters. 
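For illustration only, the per-compute-node equivalent is to set the ratios directly in each compute node's nova.conf, roughly like this (the numbers below are just example values, not recommendations):

  [DEFAULT]
  cpu_allocation_ratio = 16.0
  ram_allocation_ratio = 1.5
  disk_allocation_ratio = 1.0
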
The placement query takes care of the core/ram/disk filtering instead, so CoreFilter, RamFilter, and DiskFilter are redundant. Thanks, -melanie [1] Placement has been a new slate for resource management and prior to placement, there were conflicts between the different methods for setting overcommit ratios that were never addressed, such as, "which value to take if a compute node has overcommit set AND the aggregate has it set? Which takes precedence?" And, "if a compute node is in more than one aggregate, which overcommit value should be taken?" So, the ambiguities were not something that was desirable to bring forward into placement. From jon at csail.mit.edu Tue Jan 16 21:48:01 2018 From: jon at csail.mit.edu (Jonathan Proulx) Date: Tue, 16 Jan 2018 16:48:01 -0500 Subject: [Openstack-operators] Custom libvirt fragment for instance type? In-Reply-To: <20180116204925.GV9790@csail.mit.edu> References: <20180116200832.GU9790@csail.mit.edu> <5C388222-5F25-4EAB-9BA0-6C328060A2D3@cern.ch> <20180116204925.GV9790@csail.mit.edu> Message-ID: <20180116214801.GW9790@csail.mit.edu> On Tue, Jan 16, 2018 at 03:49:25PM -0500, Jonathan Proulx wrote: :On Tue, Jan 16, 2018 at 08:42:00PM +0000, Tim Bell wrote: ::If you want to hide the VM signature, you can use the img_hide_hypervisor_id property (https://docs.openstack.org/python-glanceclient/latest/cli/property-keys.html) : :Thanks Tim, I believe that's the magic I was looking for. Unfortunately settign that doesn't appear to do anything way back here in Mitaka land :( Oh well reason 128478 that I need to take that series of leaps I suppose. From stig.openstack at telfer.org Tue Jan 16 23:06:22 2018 From: stig.openstack at telfer.org (Stig Telfer) Date: Tue, 16 Jan 2018 23:06:22 +0000 Subject: [Openstack-operators] [scientific] IRC meeting Wednesday 1100UTC: Bare metal Magnum Message-ID: Hi All - We have an IRC meeting on Wednesday at 1100 UTC in channel #openstack-meeting. Everyone is welcome. Agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_17th_2018 This week we have two main items on the agenda: Our guest is Spyros Trigazis from CERN, who will be discussing latest improvements in Magnum’s support for research computing use cases, and in particular bare metal use cases. We’d also like to kick off some discussion around PTG planning. Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From moreira.belmiro.email.lists at gmail.com Wed Jan 17 06:09:57 2018 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Wed, 17 Jan 2018 06:09:57 +0000 Subject: [Openstack-operators] Custom libvirt fragment for instance type? In-Reply-To: <20180116214801.GW9790@csail.mit.edu> References: <20180116200832.GU9790@csail.mit.edu> <5C388222-5F25-4EAB-9BA0-6C328060A2D3@cern.ch> <20180116204925.GV9790@csail.mit.edu> <20180116214801.GW9790@csail.mit.edu> Message-ID: Hi Jonathan, this was introduced in Pike. Belmiro On Tue, 16 Jan 2018 at 22:48, Jonathan Proulx wrote: > On Tue, Jan 16, 2018 at 03:49:25PM -0500, Jonathan Proulx wrote: > :On Tue, Jan 16, 2018 at 08:42:00PM +0000, Tim Bell wrote: > ::If you want to hide the VM signature, you can use the > img_hide_hypervisor_id property ( > https://docs.openstack.org/python-glanceclient/latest/cli/property-keys.html > ) > : > :Thanks Tim, I believe that's the magic I was looking for. 
> > Unfortunately settign that doesn't appear to do anything way back here > in Mitaka land :( > > Oh well reason 128478 that I need to take that series of leaps I > suppose. > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias at citynetwork.se Wed Jan 17 10:40:11 2018 From: tobias at citynetwork.se (Tobias Rydberg) Date: Wed, 17 Jan 2018 11:40:11 +0100 Subject: [Openstack-operators] [publiccloud-wg] Reminder for todays meeting Message-ID: Hi all, Time again for a meeting for the Public Cloud WG - today at 1400 UTC in #openstack-meeting-3 Agenda and etherpad at: https://etherpad.openstack.org/p/publiccloud-wg See you later! Tobias Rydberg -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3945 bytes Desc: S/MIME Cryptographic Signature URL: From jaypipes at gmail.com Wed Jan 17 13:24:07 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Wed, 17 Jan 2018 08:24:07 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata In-Reply-To: References: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> Message-ID: <01988279-c826-a002-d176-c4cd59176d70@gmail.com> On 01/16/2018 08:19 PM, Zhenyu Zheng wrote: > Thanks for the info, so it seems we are not going to implement aggregate > overcommit ratio in placement at least in the near future? As @edleafe alluded to, we will not be adding functionality to the placement service to associate an overcommit ratio with an aggregate. This was/is buggy functionality that we do not wish to bring forward into the placement modeling system. Reasons the current functionality is poorly architected and buggy (mentioned in @melwitt's footnote): 1) If a nova-compute service's CONF.cpu_allocation_ratio is different from the host aggregate's cpu_allocation_ratio metadata value, which value should be considered by the AggregateCoreFilter filter? 2) If a nova-compute service is associated with multiple host aggregates, and those aggregates contain different values for their cpu_allocation_ratio metadata value, which one should be used by the AggregateCoreFilter? The bottom line for me is that the AggregateCoreFilter has been used as a crutch to solve a **configuration management problem**. Instead of the configuration management system (Puppet, etc) setting nova-compute service CONF.cpu_allocation_ratio options *correctly*, having the admin set the HostAggregate metadata cpu_allocation_ratio value is error-prone for the reasons listed above. Incidentally, this same design flaw is the reason that availability zones are so poorly defined in Nova. There is actually no such thing as an availability zone in Nova. Instead, an AZ is merely a metadata tag (or a CONF option! ) that may or may not exist against a host aggregate. There's lots of spaghetti in Nova due to the decision to use host aggregate metadata for availability zone information, which should have always been the domain of a **configuration management system** to set. [*] In the Placement service, we have the concept of aggregates, too. However, in Placement, an aggregate (note: not "host aggregate") is merely a grouping mechanism for resource providers. 
Placement aggregates do not have any attributes themselves -- they merely represent the relationship between resource providers. Placement aggregates suffer from neither of the above listed design flaws because they are not buckets for metadata. ok . Best, -jay [*] Note the assumption on line 97 here: https://github.com/openstack/nova/blob/master/nova/availability_zones.py#L96-L100 > On Wed, Jan 17, 2018 at 5:24 AM, melanie witt > wrote: > > Hello Stackers, > > This is a heads up to any of you using the AggregateCoreFilter, > AggregateRamFilter, and/or AggregateDiskFilter in the filter > scheduler. These filters have effectively allowed operators to set > overcommit ratios per aggregate rather than per compute node in <= > Newton. > > Beginning in Ocata, there is a behavior change where aggregate-based > overcommit ratios will no longer be honored during scheduling. > Instead, overcommit values must be set on a per compute node basis > in nova.conf. > > Details: as of Ocata, instead of considering all compute nodes at > the start of scheduler filtering, an optimization has been added to > query resource capacity from placement and prune the compute node > list with the result *before* any filters are applied. Placement > tracks resource capacity and usage and does *not* track aggregate > metadata [1]. Because of this, placement cannot consider > aggregate-based overcommit and will exclude compute nodes that do > not have capacity based on per compute node overcommit. > > How to prepare: if you have been relying on per aggregate > overcommit, during your upgrade to Ocata, you must change to using > per compute node overcommit ratios in order for your scheduling > behavior to stay consistent. Otherwise, you may notice increased > NoValidHost scheduling failures as the aggregate-based overcommit is > no longer being considered. You can safely remove the > AggregateCoreFilter, AggregateRamFilter, and AggregateDiskFilter > from your enabled_filters and you do not need to replace them with > any other core/ram/disk filters. The placement query takes care of > the core/ram/disk filtering instead, so CoreFilter, RamFilter, and > DiskFilter are redundant. > > Thanks, > -melanie > > [1] Placement has been a new slate for resource management and prior > to placement, there were conflicts between the different methods for > setting overcommit ratios that were never addressed, such as, "which > value to take if a compute node has overcommit set AND the aggregate > has it set? Which takes precedence?" And, "if a compute node is in > more than one aggregate, which overcommit value should be taken?" > So, the ambiguities were not something that was desirable to bring > forward into placement. 
> > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: > OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > From tobias at citynetwork.se Wed Jan 17 15:05:10 2018 From: tobias at citynetwork.se (Tobias Rydberg) Date: Wed, 17 Jan 2018 16:05:10 +0100 Subject: [Openstack-operators] [publiccloud-wg] Missing features work session Message-ID: <707b7cc0-8393-6a2c-2539-cc6abd71f7dd@citynetwork.se> Hi everyone, We had a good session last week working the list we call "Missing features" - to get that up to date, finding contact persons and authors for each items. We now plan to have 2 more work sessions for that, listed below. This time we change time of day to 0800 UTC. Links: https://etherpad.openstack.org/p/publiccloud-wg https://launchpad.net/openstack-publiccloud-wg Where: #openstack-publiccloud When: Thursday 18th January 0800 UTC Where: #openstack-publiccloud When: Wednesday 24th January 0800 UTC Hope to see you there! Regards, Tobias Rydberg Chair Public Cloud WG -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3945 bytes Desc: S/MIME Cryptographic Signature URL: From sorrison at gmail.com Thu Jan 18 03:36:45 2018 From: sorrison at gmail.com (Sam Morrison) Date: Thu, 18 Jan 2018 14:36:45 +1100 Subject: [Openstack-operators] Ubuntu Kernel with Meltdown mitigation SSL issues Message-ID: Hi All, We updated our control infrastructure to the latest Ubuntu Xenial Kernel (4.4.0-109) which includes the meltdown fixes. We have found this kernel to have issues with SSL connections with python and have since downgraded. We get errors like: SSLError: SSL exception connecting to https://keystone.example.com:35357/v3/auth/tokens: ("bad handshake: Error([('', 'osrandom_rand_bytes', 'getrandom() initialization failed.')],)”,) Full trace: http://paste.openstack.org/show/646803/ This was affecting glance mainly but all API services were having issues. Our controllers are running inside KVM VMs and the guests see the CPU as "Intel Xeon E3-12xx v2 (Ivy Bridge)” This isn’t an openstack issue specifically but hopefully it helps others who may be seeing similar issues. Cheers, Sam From aheczko at mirantis.com Thu Jan 18 08:42:02 2018 From: aheczko at mirantis.com (Adam Heczko) Date: Thu, 18 Jan 2018 09:42:02 +0100 Subject: [Openstack-operators] Ubuntu Kernel with Meltdown mitigation SSL issues In-Reply-To: References: Message-ID: Hello Sam, thank you for sharing this information. Could you please provide more information related to your specific setup. How is Keystone API endpoint TLS terminated in your setup? AFAIK in our OpenStack labs we haven't observed anything like this although we terminate TLS on Nginx or HAProxy. On Thu, Jan 18, 2018 at 4:36 AM, Sam Morrison wrote: > Hi All, > > We updated our control infrastructure to the latest Ubuntu Xenial Kernel > (4.4.0-109) which includes the meltdown fixes. > > We have found this kernel to have issues with SSL connections with python > and have since downgraded. 
We get errors like: > > SSLError: SSL exception connecting to https://keystone.example.com: > 35357/v3/auth/tokens: ("bad handshake: Error([('', 'osrandom_rand_bytes', > 'getrandom() initialization failed.')],)”,) > > Full trace: http://paste.openstack.org/show/646803/ > > This was affecting glance mainly but all API services were having issues. > > Our controllers are running inside KVM VMs and the guests see the CPU as > "Intel Xeon E3-12xx v2 (Ivy Bridge)” > > This isn’t an openstack issue specifically but hopefully it helps others > who may be seeing similar issues. > > > Cheers, > Sam > > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -- Adam Heczko Security Engineer @ Mirantis Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From logan at protiumit.com Thu Jan 18 20:18:46 2018 From: logan at protiumit.com (Logan V.) Date: Thu, 18 Jan 2018 14:18:46 -0600 Subject: [Openstack-operators] Ubuntu Kernel with Meltdown mitigation SSL issues In-Reply-To: References: Message-ID: We upgraded our control plane to 4.4.0-109 + intel-microcode 3.20180108.0~ubuntu16.04.2 several days ago, and are about 1/2 of the way thru upgrading our compute hosts with these changes. We use Ocata for all services, and no issue like this has been observed yet on our env. Control hosts are E5-2600 V2's and the computes are a mix of E5-2600 v2/v3/v4's along with some Xeon D1541's. On Thu, Jan 18, 2018 at 2:42 AM, Adam Heczko wrote: > Hello Sam, thank you for sharing this information. > Could you please provide more information related to your specific setup. > How is Keystone API endpoint TLS terminated in your setup? > AFAIK in our OpenStack labs we haven't observed anything like this although > we terminate TLS on Nginx or HAProxy. > > > On Thu, Jan 18, 2018 at 4:36 AM, Sam Morrison wrote: >> >> Hi All, >> >> We updated our control infrastructure to the latest Ubuntu Xenial Kernel >> (4.4.0-109) which includes the meltdown fixes. >> >> We have found this kernel to have issues with SSL connections with python >> and have since downgraded. We get errors like: >> >> SSLError: SSL exception connecting to >> https://keystone.example.com:35357/v3/auth/tokens: ("bad handshake: >> Error([('', 'osrandom_rand_bytes', 'getrandom() initialization >> failed.')],)”,) >> >> Full trace: http://paste.openstack.org/show/646803/ >> >> This was affecting glance mainly but all API services were having issues. >> >> Our controllers are running inside KVM VMs and the guests see the CPU as >> "Intel Xeon E3-12xx v2 (Ivy Bridge)” >> >> This isn’t an openstack issue specifically but hopefully it helps others >> who may be seeing similar issues. >> >> >> Cheers, >> Sam >> >> >> >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > > -- > Adam Heczko > Security Engineer @ Mirantis Inc. 
> > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From jaypipes at gmail.com Thu Jan 18 20:49:09 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Thu, 18 Jan 2018 15:49:09 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata In-Reply-To: References: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> <43f5884e-39d2-23c8-7606-5940f33251bd@gmail.com> Message-ID: <3ad94edd-fbbd-1257-a88e-0a97cc4b588b@gmail.com> On 01/18/2018 03:06 PM, Logan V. wrote: > We have used aggregate based scheduler filters since deploying our > cloud in Kilo. This explains the unpredictable scheduling we have seen > since upgrading to Ocata. Before this post, was there some indication > I missed that these filters can no longer be used? Even now reading > the Ocata release notes[1] or checking the filter scheduler docs[2] I > cannot find any indication that AggregateCoreFilter, > AggregateRamFilter, and AggregateDiskFilter are useless in Ocata+. If > I missed something I'd like to know where it is so I can avoid that > mistake again! We failed to provide a release note about it. :( That's our fault and I apologize. > Just to make sure I understand correctly, given this list of filters > we used in Newton: > AggregateInstanceExtraSpecsFilter,AggregateNumInstancesFilter,AggregateCoreFilter,AggregateRamFilter,RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter > > I should remove AggregateCoreFilter, AggregateRamFilter, and RamFilter > from the list because they are no longer useful, and replace them with > the appropriate nova.conf settings instead, correct? Yes, correct. > What about AggregateInstanceExtraSpecsFilter and > AggregateNumInstancesFilter? Do these still work? Yes. Best, -jay > Thanks > Logan > > [1] https://docs.openstack.org/releasenotes/nova/ocata.html > [2] https://docs.openstack.org/ocata/config-reference/compute/schedulers.html > > On Wed, Jan 17, 2018 at 7:57 AM, Sylvain Bauza wrote: >> >> >> On Wed, Jan 17, 2018 at 2:22 PM, Jay Pipes wrote: >>> >>> On 01/16/2018 08:19 PM, Zhenyu Zheng wrote: >>>> >>>> Thanks for the info, so it seems we are not going to implement aggregate >>>> overcommit ratio in placement at least in the near future? >>> >>> >>> As @edleafe alluded to, we will not be adding functionality to the >>> placement service to associate an overcommit ratio with an aggregate. This >>> was/is buggy functionality that we do not wish to bring forward into the >>> placement modeling system. >>> >>> Reasons the current functionality is poorly architected and buggy >>> (mentioned in @melwitt's footnote): >>> >>> 1) If a nova-compute service's CONF.cpu_allocation_ratio is different from >>> the host aggregate's cpu_allocation_ratio metadata value, which value should >>> be considered by the AggregateCoreFilter filter? >>> >>> 2) If a nova-compute service is associated with multiple host aggregates, >>> and those aggregates contain different values for their cpu_allocation_ratio >>> metadata value, which one should be used by the AggregateCoreFilter? >>> >>> The bottom line for me is that the AggregateCoreFilter has been used as a >>> crutch to solve a **configuration management problem**. 
>>> >>> Instead of the configuration management system (Puppet, etc) setting >>> nova-compute service CONF.cpu_allocation_ratio options *correctly*, having >>> the admin set the HostAggregate metadata cpu_allocation_ratio value is >>> error-prone for the reasons listed above. >>> >> >> Well, the main cause why people started to use AggregateCoreFilter and >> others is because pre-Newton, it was litterally impossible to assign >> different allocation ratios in between computes except if you were grouping >> them in aggregates and using those filters. >> Now that ratios are per-compute, there is no need to keep those filters >> except if you don't touch computes nova.conf's so that it defaults to the >> scheduler ones. The crazy usecase would be like "I have 1000+ computes and I >> just want to apply specific ratios to only one or two" but then, I'd second >> Jay and say "Config management is the solution to your problem". >> >> >>> >>> Incidentally, this same design flaw is the reason that availability zones >>> are so poorly defined in Nova. There is actually no such thing as an >>> availability zone in Nova. Instead, an AZ is merely a metadata tag (or a >>> CONF option! :( ) that may or may not exist against a host aggregate. >>> There's lots of spaghetti in Nova due to the decision to use host aggregate >>> metadata for availability zone information, which should have always been >>> the domain of a **configuration management system** to set. [*] >>> >> >> IMHO, not exactly the root cause why we have spaghetti code for AZs. I >> rather like the idea to see an availability zone as just a user-visible >> aggregate, because it makes things simple to understand. >> What the spaghetti code is due to is because the transitive relationship >> between an aggregate, a compute and an instance is misunderstood and we >> introduced the notion of "instance AZ" which is a fool. Instances shouldn't >> have a field saying "here is my AZ", it should rather be a flag saying "what >> the user wanted as AZ ? (None being a choice) " >> >> >>> In the Placement service, we have the concept of aggregates, too. However, >>> in Placement, an aggregate (note: not "host aggregate") is merely a grouping >>> mechanism for resource providers. Placement aggregates do not have any >>> attributes themselves -- they merely represent the relationship between >>> resource providers. Placement aggregates suffer from neither of the above >>> listed design flaws because they are not buckets for metadata. >>> >>> ok . >>> >>> Best, >>> -jay >>> >>> [*] Note the assumption on line 97 here: >>> >>> >>> https://github.com/openstack/nova/blob/master/nova/availability_zones.py#L96-L100 >>> >>>> On Wed, Jan 17, 2018 at 5:24 AM, melanie witt >>> > wrote: >>>> >>>> Hello Stackers, >>>> >>>> This is a heads up to any of you using the AggregateCoreFilter, >>>> AggregateRamFilter, and/or AggregateDiskFilter in the filter >>>> scheduler. These filters have effectively allowed operators to set >>>> overcommit ratios per aggregate rather than per compute node in <= >>>> Newton. >>>> >>>> Beginning in Ocata, there is a behavior change where aggregate-based >>>> overcommit ratios will no longer be honored during scheduling. >>>> Instead, overcommit values must be set on a per compute node basis >>>> in nova.conf. 
>>>> >>>> Details: as of Ocata, instead of considering all compute nodes at >>>> the start of scheduler filtering, an optimization has been added to >>>> query resource capacity from placement and prune the compute node >>>> list with the result *before* any filters are applied. Placement >>>> tracks resource capacity and usage and does *not* track aggregate >>>> metadata [1]. Because of this, placement cannot consider >>>> aggregate-based overcommit and will exclude compute nodes that do >>>> not have capacity based on per compute node overcommit. >>>> >>>> How to prepare: if you have been relying on per aggregate >>>> overcommit, during your upgrade to Ocata, you must change to using >>>> per compute node overcommit ratios in order for your scheduling >>>> behavior to stay consistent. Otherwise, you may notice increased >>>> NoValidHost scheduling failures as the aggregate-based overcommit is >>>> no longer being considered. You can safely remove the >>>> AggregateCoreFilter, AggregateRamFilter, and AggregateDiskFilter >>>> from your enabled_filters and you do not need to replace them with >>>> any other core/ram/disk filters. The placement query takes care of >>>> the core/ram/disk filtering instead, so CoreFilter, RamFilter, and >>>> DiskFilter are redundant. >>>> >>>> Thanks, >>>> -melanie >>>> >>>> [1] Placement has been a new slate for resource management and prior >>>> to placement, there were conflicts between the different methods for >>>> setting overcommit ratios that were never addressed, such as, "which >>>> value to take if a compute node has overcommit set AND the aggregate >>>> has it set? Which takes precedence?" And, "if a compute node is in >>>> more than one aggregate, which overcommit value should be taken?" >>>> So, the ambiguities were not something that was desirable to bring >>>> forward into placement. 
>>>> >>>> >>>> __________________________________________________________________________ >>>> OpenStack Development Mailing List (not for usage questions) >>>> Unsubscribe: >>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe >>>> >>>> >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>>> >>>> >>>> >>>> >>>> >>>> >>>> __________________________________________________________________________ >>>> OpenStack Development Mailing List (not for usage questions) >>>> Unsubscribe: >>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>>> >>> >>> __________________________________________________________________________ >>> OpenStack Development Mailing List (not for usage questions) >>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> >> >> >> __________________________________________________________________________ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > From mgagne at calavera.ca Thu Jan 18 20:54:02 2018 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Thu, 18 Jan 2018 15:54:02 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata In-Reply-To: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> References: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> Message-ID: Hi, On Tue, Jan 16, 2018 at 4:24 PM, melanie witt wrote: > Hello Stackers, > > This is a heads up to any of you using the AggregateCoreFilter, > AggregateRamFilter, and/or AggregateDiskFilter in the filter scheduler. > These filters have effectively allowed operators to set overcommit ratios > per aggregate rather than per compute node in <= Newton. > > Beginning in Ocata, there is a behavior change where aggregate-based > overcommit ratios will no longer be honored during scheduling. Instead, > overcommit values must be set on a per compute node basis in nova.conf. > > Details: as of Ocata, instead of considering all compute nodes at the start > of scheduler filtering, an optimization has been added to query resource > capacity from placement and prune the compute node list with the result > *before* any filters are applied. Placement tracks resource capacity and > usage and does *not* track aggregate metadata [1]. Because of this, > placement cannot consider aggregate-based overcommit and will exclude > compute nodes that do not have capacity based on per compute node > overcommit. > > How to prepare: if you have been relying on per aggregate overcommit, during > your upgrade to Ocata, you must change to using per compute node overcommit > ratios in order for your scheduling behavior to stay consistent. Otherwise, > you may notice increased NoValidHost scheduling failures as the > aggregate-based overcommit is no longer being considered. 
You can safely > remove the AggregateCoreFilter, AggregateRamFilter, and AggregateDiskFilter > from your enabled_filters and you do not need to replace them with any other > core/ram/disk filters. The placement query takes care of the core/ram/disk > filtering instead, so CoreFilter, RamFilter, and DiskFilter are redundant. > > Thanks, > -melanie > > [1] Placement has been a new slate for resource management and prior to > placement, there were conflicts between the different methods for setting > overcommit ratios that were never addressed, such as, "which value to take > if a compute node has overcommit set AND the aggregate has it set? Which > takes precedence?" And, "if a compute node is in more than one aggregate, > which overcommit value should be taken?" So, the ambiguities were not > something that was desirable to bring forward into placement. So we are a user of this feature and I do have some questions/concerns. We use this feature to segregate capacity/hosts based on CPU allocation ratio using aggregates. This is because we have different offers/flavors based on those allocation ratios. This is part of our business model. A flavor extra_specs is use to schedule instances on appropriate hosts using AggregateInstanceExtraSpecsFilter. Our setup has a configuration management system and we use aggregates exclusively when it comes to allocation ratio. We do not rely on cpu_allocation_ratio config in nova-scheduler or nova-compute. One of the reasons is we do not wish to have to update/package/redeploy our configuration management system just to add one or multiple compute nodes to an aggregate/capacity pool. This means anyone (likely an operator or other provisioning technician) can perform this action without having to touch or even know about our configuration management system. We can also transfer capacity from one aggregate to another if there is a need, again, using aggregate memberships. (we do "evacuate" the node if there are instances on it) Our capacity monitoring is based on aggregate memberships and this offer an easy overview of the current capacity. Note that a host can be in one and only one aggregate in our setup. What's the migration path for us? My understanding is that we will now be forced to have people rely on our configuration management system (which they don't have access to) to perform simple task we used to be able to do through the API. I find this unfortunate and I would like to be offered an alternative solution as the current proposed solution is not acceptable for us. We are loosing "agility" in our operational tasks. -- Mathieu From sorrison at gmail.com Thu Jan 18 23:59:28 2018 From: sorrison at gmail.com (Sam Morrison) Date: Fri, 19 Jan 2018 10:59:28 +1100 Subject: [Openstack-operators] Ubuntu Kernel with Meltdown mitigation SSL issues In-Reply-To: References: Message-ID: We have an F5 doing all the SSL in front of our API servers. SSL-Session: Protocol : TLSv1.2 Cipher : ECDHE-RSA-AES256-GCM-SHA384 The majority of the requests that were failing was a glance request /v2/images?limit=20 (around 25% of requests which is around 1-2 a second) Glance is on Ocata. We also saw the same error on the heat and designate running pike and other services. We thought it was to do with low entropy on the control VMs as they were actually low, however we tweaked this and increased entropy to over 3000 and still had issues. The underlying hypervisor is also running Xenial and the 4.4.0-109 kernel but it hasn't got the intel-microcode package installed. 
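If anyone wants to check whether they are seeing the same thing, a rough sketch of where to look (standard paths and tool names, adjust for your environment):

  # entropy available to the guest kernel
  cat /proc/sys/kernel/random/entropy_avail

  # watch an API worker for failing getrandom() calls
  # (assumes your strace knows the getrandom syscall)
  strace -f -e trace=getrandom -p $(pgrep -f glance-api | head -1)
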
Let me know if anyone wants more details of our setup and I'll happily provide. Cheers, Sam On Fri, Jan 19, 2018 at 7:18 AM, Logan V. wrote: > We upgraded our control plane to 4.4.0-109 + intel-microcode > 3.20180108.0~ubuntu16.04.2 several days ago, and are about 1/2 of the > way thru upgrading our compute hosts with these changes. We use Ocata > for all services, and no issue like this has been observed yet on our > env. Control hosts are E5-2600 V2's and the computes are a mix of > E5-2600 v2/v3/v4's along with some Xeon D1541's. > > On Thu, Jan 18, 2018 at 2:42 AM, Adam Heczko wrote: > > Hello Sam, thank you for sharing this information. > > Could you please provide more information related to your specific setup. > > How is Keystone API endpoint TLS terminated in your setup? > > AFAIK in our OpenStack labs we haven't observed anything like this > although > > we terminate TLS on Nginx or HAProxy. > > > > > > On Thu, Jan 18, 2018 at 4:36 AM, Sam Morrison > wrote: > >> > >> Hi All, > >> > >> We updated our control infrastructure to the latest Ubuntu Xenial Kernel > >> (4.4.0-109) which includes the meltdown fixes. > >> > >> We have found this kernel to have issues with SSL connections with > python > >> and have since downgraded. We get errors like: > >> > >> SSLError: SSL exception connecting to > >> https://keystone.example.com:35357/v3/auth/tokens: ("bad handshake: > >> Error([('', 'osrandom_rand_bytes', 'getrandom() initialization > >> failed.')],)”,) > >> > >> Full trace: http://paste.openstack.org/show/646803/ > >> > >> This was affecting glance mainly but all API services were having > issues. > >> > >> Our controllers are running inside KVM VMs and the guests see the CPU as > >> "Intel Xeon E3-12xx v2 (Ivy Bridge)” > >> > >> This isn’t an openstack issue specifically but hopefully it helps others > >> who may be seeing similar issues. > >> > >> > >> Cheers, > >> Sam > >> > >> > >> > >> > >> _______________________________________________ > >> OpenStack-operators mailing list > >> OpenStack-operators at lists.openstack.org > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > > > > > > > -- > > Adam Heczko > > Security Engineer @ Mirantis Inc. > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgagne at calavera.ca Fri Jan 19 00:24:53 2018 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Thu, 18 Jan 2018 19:24:53 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata In-Reply-To: <57306d1a-529b-3907-7c5a-a9b46057b236@gmail.com> References: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> <57306d1a-529b-3907-7c5a-a9b46057b236@gmail.com> Message-ID: On Thu, Jan 18, 2018 at 5:19 PM, Jay Pipes wrote: > On 01/18/2018 03:54 PM, Mathieu Gagné wrote: >> >> Hi, >> >> On Tue, Jan 16, 2018 at 4:24 PM, melanie witt wrote: >>> >>> Hello Stackers, >>> >>> This is a heads up to any of you using the AggregateCoreFilter, >>> AggregateRamFilter, and/or AggregateDiskFilter in the filter scheduler. >>> These filters have effectively allowed operators to set overcommit ratios >>> per aggregate rather than per compute node in <= Newton. 
>>> >>> Beginning in Ocata, there is a behavior change where aggregate-based >>> overcommit ratios will no longer be honored during scheduling. Instead, >>> overcommit values must be set on a per compute node basis in nova.conf. >>> >>> Details: as of Ocata, instead of considering all compute nodes at the >>> start >>> of scheduler filtering, an optimization has been added to query resource >>> capacity from placement and prune the compute node list with the result >>> *before* any filters are applied. Placement tracks resource capacity and >>> usage and does *not* track aggregate metadata [1]. Because of this, >>> placement cannot consider aggregate-based overcommit and will exclude >>> compute nodes that do not have capacity based on per compute node >>> overcommit. >>> >>> How to prepare: if you have been relying on per aggregate overcommit, >>> during >>> your upgrade to Ocata, you must change to using per compute node >>> overcommit >>> ratios in order for your scheduling behavior to stay consistent. >>> Otherwise, >>> you may notice increased NoValidHost scheduling failures as the >>> aggregate-based overcommit is no longer being considered. You can safely >>> remove the AggregateCoreFilter, AggregateRamFilter, and >>> AggregateDiskFilter >>> from your enabled_filters and you do not need to replace them with any >>> other >>> core/ram/disk filters. The placement query takes care of the >>> core/ram/disk >>> filtering instead, so CoreFilter, RamFilter, and DiskFilter are >>> redundant. >>> >>> Thanks, >>> -melanie >>> >>> [1] Placement has been a new slate for resource management and prior to >>> placement, there were conflicts between the different methods for setting >>> overcommit ratios that were never addressed, such as, "which value to >>> take >>> if a compute node has overcommit set AND the aggregate has it set? Which >>> takes precedence?" And, "if a compute node is in more than one aggregate, >>> which overcommit value should be taken?" So, the ambiguities were not >>> something that was desirable to bring forward into placement. >> >> >> So we are a user of this feature and I do have some questions/concerns. >> >> We use this feature to segregate capacity/hosts based on CPU >> allocation ratio using aggregates. >> This is because we have different offers/flavors based on those >> allocation ratios. This is part of our business model. >> A flavor extra_specs is use to schedule instances on appropriate hosts >> using AggregateInstanceExtraSpecsFilter. > > > The AggregateInstanceExtraSpecsFilter will continue to work, but this filter > is run *after* the placement service would have already eliminated compute > node records due to placement considering the allocation ratio set for the > compute node provider's inventory records. Ok. Does it mean I will have to use something else to properly filter compute nodes based on flavor? Is there a way for a compute node to expose some arbitrary feature/spec instead and still use flavor extra_specs to filter? (I still have to read on placement API) I don't mind migrating out of aggregates but I need to find a way to make it "self service" through the API with granular control like aggregates used to offer. We won't be giving access to our configuration manager to our technicians and even less direct access to the database. I see that you are suggesting using the placement API below, see my comments below. >> Our setup has a configuration management system and we use aggregates >> exclusively when it comes to allocation ratio. 
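To make concrete the self-service workflow we are trying to keep, it is roughly the following today (the names and the ratio below are only an example):

  openstack aggregate create capacity-4x
  openstack aggregate set --property cpu_allocation_ratio=4.0 --property pool=capacity-4x capacity-4x
  openstack aggregate add host capacity-4x compute-042
  openstack flavor set --property aggregate_instance_extra_specs:pool=capacity-4x a1.large

No configuration change or redeploy is needed to move a host from one pool to another.
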
> > > Yes, that's going to be a problem. You will need to use your configuration > management system to write the nova.CONF.XXX_allocation_ratio configuration > option values appropriately for each compute node. Yes, that's my understanding and which is a concern for us. >> We do not rely on cpu_allocation_ratio config in nova-scheduler or >> nova-compute. >> One of the reasons is we do not wish to have to >> update/package/redeploy our configuration management system just to >> add one or multiple compute nodes to an aggregate/capacity pool. > > > Yes, I understand. > >> This means anyone (likely an operator or other provisioning >> technician) can perform this action without having to touch or even >> know about our configuration management system. >> We can also transfer capacity from one aggregate to another if there >> is a need, again, using aggregate memberships. > > > Aggregates don't have "capacity". Aggregates are not capacity pools. Only > compute nodes provide resources for guests to consume. Aggregates have been so far a very useful construct for us. You might not agree with our concept of "capacity pools" but so far, that's what we got and has been working very well for years. Our monitoring/operations are entirely based on this concept. You list the aggregate members, do some computing and cross calculation with hypervisor stats and you have a capacity monitoring system going. >> (we do "evacuate" the >> >> node if there are instances on it) >> Our capacity monitoring is based on aggregate memberships and this >> offer an easy overview of the current capacity. > > > By "based on aggregate membership", I believe you are referring to a system > where you have all compute nodes in a particular aggregate only schedule > instances with a particular flavor "A" and so you manage "capacity" by > saying things like "aggregate X can fit 10 more instances of flavor A in > it"? > > Do I understand you correctly? Yes, more or less. We do group compute nodes based on flavor "series". (we have A1 and B1 series) > >> Note that a host can >> >> be in one and only one aggregate in our setup. > > > In *your* setup. And that's the only reason this works for you. You'd get > totally unpredictable behaviour if your compute nodes were in multiple > aggregates. Yes. It worked very well for us so far. I do agree that it's not perfect and that you technically can end up with unpredictable behaviour if a host is part of multiple aggregates. That's why we avoid doing it. >> What's the migration path for us? >> >> My understanding is that we will now be forced to have people rely on >> our configuration management system (which they don't have access to) >> to perform simple task we used to be able to do through the API. >> I find this unfortunate and I would like to be offered an alternative >> solution as the current proposed solution is not acceptable for us. >> We are loosing "agility" in our operational tasks. > > > I see a possible path forward: > > We add a new CONF option called "disable_allocation_ratio_autoset". This new > CONF option would disable the behaviour of the nova-compute service in > automatically setting the allocation ratio of its inventory records for > VCPU, MEMORY_MB and DISK_GB resources. > > This would allow you to set compute node allocation ratios in batches. > > At first, it might be manual... 
executing something like this against the > API database: > > UPDATE inventories > INNER JOIN resource_provider > ON inventories.resource_provider_id = resource_provider.id > AND inventories.resource_class_id = $RESOURCE_CLASS_ID > INNER JOIN resource_provider_aggregates > ON resource_providers.id = > resource_provider_aggregates.resource_provider_id > INNER JOIN provider_aggregates > ON resource_provider_aggregates.aggregate_id = provider_aggregates.id > AND provider_aggregates.uuid = $AGGREGATE_UUID > SET inventories.allocation_ratio = $NEW_VALUE; > > We could follow up with a little CLI tool that would do the above for you on > the command line... something like this: > > nova-manage db set_aggregate_placement_allocation_ratio > --aggregate_uuid=$AGG_UUID --resource_class=VCPU --ratio 16.0 > > Of course, you could always call the Placement REST API to override the > allocation ratio for particular providers: > > DATA='{"resource_provider_generation": X, "allocation_ratio": $RATIO}' > curl -XPUT -H "Content-Type: application/json" -H$AUTH_TOKEN -d$DATA \ > https://$PLACEMENT/resource_providers/$RP_UUID/inventories/VCPU > > and you could loop through all the resource providers listed under a > particular aggregate, which you can find using something like this: > > curl https://$PLACEMENT/resource_providers?member_of:$AGG_UUID > > Anyway, there's multiple ways to set the allocation ratios in batches, as > you can tell. > > I think the key is somehow disabling the behaviour of the nova-compute > service of overriding the allocation ratio of compute nodes with the value > of the nova.cnf options. > > Thoughts? So far, a couple challenges/issues: We used to have fine grain control over the calls a user could make to the Nova API: * os_compute_api:os-aggregates:add_host * os_compute_api:os-aggregates:remove_host This means we could make it so our technicians could *ONLY* manage this aspect of our cloud. With placement API, it's all or nothing. (and found some weeks ago that it's hardcoded to the "admin" role) And you now have to craft your own curl calls and no more UI in Horizon. (let me know if I missed something regarding the ACL) I will read about placement API and see with my coworkers how we could adapt our systems/tools to use placement API instead. (assuming disable_allocation_ratio_autoset will be implemented) But ACL is a big concern for us if we go down that path. While I agree there are very technical/raw solutions to the issue (like the ones you suggested), please understand that from our side, this is still a major regression in the usability of OpenStack from an operator point of view. And it's unfortunate that I feel I now have to play catch up and explain my concerns about a "fait accompli" that wasn't well communicated to the operators and wasn't clearly mentioned in the release notes. I would have appreciated an email to the ops list explaining the proposed change and if anyone has concerns/comments about it. I don't often reply but I feel like I would have this time as this is a major change for us. 
Thanks for your time and suggestions, -- Mathieu From thierry at openstack.org Fri Jan 19 10:51:17 2018 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 19 Jan 2018 11:51:17 +0100 Subject: [Openstack-operators] Ubuntu Kernel with Meltdown mitigation SSL issues In-Reply-To: References: Message-ID: <7949700d-1c45-4861-98cb-0ca3d2d882c9@openstack.org> Sam Morrison wrote: > We updated our control infrastructure to the latest Ubuntu Xenial Kernel (4.4.0-109) which includes the meltdown fixes. > > We have found this kernel to have issues with SSL connections with python and have since downgraded. We get errors like: > > SSLError: SSL exception connecting to https://keystone.example.com:35357/v3/auth/tokens: ("bad handshake: Error([('', 'osrandom_rand_bytes', 'getrandom() initialization failed.')],)”,) > > Full trace: http://paste.openstack.org/show/646803/ > > This was affecting glance mainly but all API services were having issues. > > Our controllers are running inside KVM VMs and the guests see the CPU as "Intel Xeon E3-12xx v2 (Ivy Bridge)” > > This isn’t an openstack issue specifically but hopefully it helps others who may be seeing similar issues. Thanks Sam for sharing! If you can clearly narrow it down to a specific update (kernel or microcode), can you make sure the bug is reported back to Ubuntu ? Distros are struggling with the stability of the Meltdown/Spectre workarounds (especially the opaque CPU microcode updates) and can probably use any information we can provide to them. -- Thierry Carrez (ttx) From andrea.frittoli at gmail.com Fri Jan 19 17:17:22 2018 From: andrea.frittoli at gmail.com (Andrea Frittoli) Date: Fri, 19 Jan 2018 17:17:22 +0000 Subject: [Openstack-operators] [Openstack-sigs] [QA] Proposal for a QA SIG In-Reply-To: References: <0873dec8-624d-b32a-5608-74cc74c02005@openstack.org> Message-ID: Hello everyone, After a long holiday break I would like to resume work on bringing the QA SIG to life. I proposed a QA SIG session [0] for the next PTG, but I'm not sure the right audience will be in Dublin. Could you please reply if you are interested but won't be in Dublin or add your name to the etherpad if you plan to be there and attend? If we have enough attendance in Dublin we can kick off there - otherwise I will setup a meeting with all interested parties (IRC meeting probably, but other options are possible). Thank you! Andrea Frittoli (andreaf) [0] https://etherpad.openstack.org/p/qa-rocky-ptg On Mon, Nov 20, 2017 at 9:15 AM Thierry Carrez wrote: > Rochelle Grober wrote: > > Thierry Carrez wrote: > >> One question I have is whether we'd need to keep the "QA" project team > at > >> all. Personally I think it would create confusion to keep it around, > for no gain. > >> SIGs code contributors get voting rights for the TC anyway, and SIGs > are free > >> to ask for space at the PTG... so there is really no reason (imho) to > keep a > >> "QA" project team in parallel to the SIG ? > > > > Well, you can get rid of the "QA Project Team" but you would then need > to replace it with something like the Tempest Project, or perhaps the Test > Project. You still need a PTL and cores to write, review and merge tempest > fixes and upgrades, along with some of the tests. The Interop Guideline > tests are part of Tempest because being there provides oversight on the > style and quality of the code of those tests. We still need that. 
> > SIGs can totally produce some code (and have review teams), but I agree > that in this case this code is basically a part of "the product" (rather > than a tool produced by guild of practitioners) and therefore makes > sense to be kept in an upstream project team. Let's keep things the way > they are, while we work out other changes that may trigger other > organizational shuffles (like reusing our project infrastructure beyond > just OpenStack). > > I wonder if we should not call the SIG under a different name to reduce > the confusion between QA-the-project-team and QA-the-SIG. Collaborative > Testing SIG? > > -- > Thierry Carrez (ttx) > > _______________________________________________ > openstack-sigs mailing list > openstack-sigs at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-sigs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amotoki at gmail.com Mon Jan 22 05:40:49 2018 From: amotoki at gmail.com (Akihiro Motoki) Date: Mon, 22 Jan 2018 14:40:49 +0900 Subject: [Openstack-operators] [horizon][packaging] django-openstack-auth retirement Message-ID: Hi, packaging teams and operators This mail is the announcement of retirement of django-openstack-auth python package in the Queens release. Horizon team merged the code of django-openstack-auth into the horizon repo mainly from the maintenance reason. For more detail, see the blueprint https://blueprints.launchpad.net/horizon/+spec/merge-openstack-auth. [To packaging teams] Ensure not to install django-openstack-auth in Queens horizon package. "openstack_auth" python module is now provided by horizon instead of django_openstack_auth. [To operators] If you install horizon and django-openstack-auth by using pip (instead of distribution packages), please uninstall django-openstack-auth python package before upgrading horizon. Otherwise, "openstack_auth" module is maintained by both horizon and django-openstack-auth after upgrading horizon and it confuses the pip file management, while horizon works. If you have questions, feel to reach the horizon team. Thanks, Akihiro From khansaa.abdalla at outlook.com Mon Jan 22 08:51:15 2018 From: khansaa.abdalla at outlook.com (khansa A. 
Mohamed) Date: Mon, 22 Jan 2018 08:51:15 +0000 Subject: [Openstack-operators] Instance status error | pike release Message-ID: Hi all , I have fresh installed openstack pike , when I tried to create new instance it gives elow error : Remote error: NoMatchingPlugin The plugin my_password could not be found [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming\n res = self.dispatcher.dispatch(me Code 500 Details File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 1031, in schedule_and_build_instances instance_uuids) File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 626, in _schedule_instances request_spec, instance_uuids) File "/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 586, in wrapped return func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 52, in select_destinations instance_uuids) File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method return getattr(self.instance, __name)(*args, **kwargs) File "/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 33, in select_destinations instance_uuids) File "/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 137, in select_destinations return cctxt.call(ctxt, 'select_destinations', **msg_args) File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call retry=self.retry) File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 123, in _send timeout=timeout, retry=retry) File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 566, in send retry=retry) File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 557, in _send raise result Thanks in advance for your support -------------- next part -------------- An HTML attachment was scrubbed... URL: From maciej at kucia.net Mon Jan 22 16:36:17 2018 From: maciej at kucia.net (Maciej Kucia) Date: Mon, 22 Jan 2018 17:36:17 +0100 Subject: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV Message-ID: Hi! Is there any noticeable performance penalty when using multiple virtual functions? For simplicity I am enabling all available virtual functions in my NICs. Sometimes application is using only few of them. I am using Intel and Mellanox. I do not see any performance drop but I am getting feedback that this might not be the best approach. Any recommendations? Thanks, Maciej -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaypipes at gmail.com Mon Jan 22 17:38:15 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Mon, 22 Jan 2018 12:38:15 -0500 Subject: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV In-Reply-To: References: Message-ID: <68e78eaf-00bd-eb4b-3a6b-6e45a2cd7adc@gmail.com> On 01/22/2018 11:36 AM, Maciej Kucia wrote: > Hi! > > Is there any noticeable performance penalty when using multiple virtual > functions? > > For simplicity I am enabling all available virtual functions in my NICs. I presume by the above you are referring to setting your pci_passthrough_whitelist on your compute nodes to whitelist all VFs on a particular PF's PCI address domain/bus? > Sometimes application is using only few of them. I am using Intel and > Mellanox. 
> > I do not see any performance drop but I am getting feedback that this > might not be the best approach. Who is giving you this feedback? The only issue with enabling (potentially 254 or more) VFs for each PF is that each VF will end up as a record in the pci_devices table in the Nova cell database. Multiply 254 or more times the number of PFs times the number of compute nodes in your deployment and you can get a large number of records that need to be stored. That said, the pci_devices table is well indexed and even if you had 1M or more records in the table, the access of a few hundred of those records when the resource tracker does a PciDeviceList.get_by_compute_node() [1] will still be quite fast. Best, -jay [1] https://github.com/openstack/nova/blob/stable/pike/nova/compute/resource_tracker.py#L572 and then https://github.com/openstack/nova/blob/stable/pike/nova/pci/manager.py#L71 > Any recommendations? > > Thanks, > Maciej > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From maciej at kucia.net Mon Jan 22 23:47:34 2018 From: maciej at kucia.net (Maciej Kucia) Date: Tue, 23 Jan 2018 00:47:34 +0100 Subject: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV In-Reply-To: References: <68e78eaf-00bd-eb4b-3a6b-6e45a2cd7adc@gmail.com> Message-ID: Thank you for the reply. I am interested in SR-IOV and pci whitelisting is certainly involved. I suspect that OpenStack itself can handle those numbers of devices, especially in telco applications where not much scheduling is being done. The feedback I am getting is from sysadmins who work on network virtualization but I think this is just a rumor without any proof. The question is if performance penalty from SR-IOV drivers or PCI itself is negligible. Should cloud admin configure maximum number of VFs for flexibility or should it be manually managed and balanced depending on application? Regards, Maciej > > 2018-01-22 18:38 GMT+01:00 Jay Pipes : > >> On 01/22/2018 11:36 AM, Maciej Kucia wrote: >> >>> Hi! >>> >>> Is there any noticeable performance penalty when using multiple virtual >>> functions? >>> >>> For simplicity I am enabling all available virtual functions in my NICs. >>> >> >> I presume by the above you are referring to setting your >> pci_passthrough_whitelist on your compute nodes to whitelist all VFs on a >> particular PF's PCI address domain/bus? >> >> Sometimes application is using only few of them. I am using Intel and >>> Mellanox. >>> >>> I do not see any performance drop but I am getting feedback that this >>> might not be the best approach. >>> >> >> Who is giving you this feedback? >> >> The only issue with enabling (potentially 254 or more) VFs for each PF is >> that each VF will end up as a record in the pci_devices table in the Nova >> cell database. Multiply 254 or more times the number of PFs times the >> number of compute nodes in your deployment and you can get a large number >> of records that need to be stored. That said, the pci_devices table is well >> indexed and even if you had 1M or more records in the table, the access of >> a few hundred of those records when the resource tracker does a >> PciDeviceList.get_by_compute_node() [1] will still be quite fast. 
>> >> Best, >> -jay >> >> [1] https://github.com/openstack/nova/blob/stable/pike/nova/comp >> ute/resource_tracker.py#L572 and then >> https://github.com/openstack/nova/blob/stable/pike/nova/pci/ >> manager.py#L71 >> >> Any recommendations? >>> >>> Thanks, >>> Maciej >>> >>> >>> _______________________________________________ >>> OpenStack-operators mailing list >>> OpenStack-operators at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>> >>> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgsousa at gmail.com Tue Jan 23 00:44:45 2018 From: pgsousa at gmail.com (Pedro Sousa) Date: Tue, 23 Jan 2018 00:44:45 +0000 Subject: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV In-Reply-To: References: <68e78eaf-00bd-eb4b-3a6b-6e45a2cd7adc@gmail.com> Message-ID: Hi, I have sr-iov in production in some customers with maximum number of VFs and didn't notice any performance issues. My understanding is that of course you will have performance penalty if you consume all those vfs, because you're dividing the bandwidth across them, but other than if they're are there doing nothing you won't notice anything. But I'm just talking from my experience :) Regards, Pedro Sousa On Mon, Jan 22, 2018 at 11:47 PM, Maciej Kucia wrote: > Thank you for the reply. I am interested in SR-IOV and pci whitelisting is > certainly involved. > I suspect that OpenStack itself can handle those numbers of devices, > especially in telco applications where not much scheduling is being done. > The feedback I am getting is from sysadmins who work on network > virtualization but I think this is just a rumor without any proof. > > The question is if performance penalty from SR-IOV drivers or PCI itself > is negligible. Should cloud admin configure maximum number of VFs for > flexibility or should it be manually managed and balanced depending on > application? > > Regards, > Maciej > > >> >> 2018-01-22 18:38 GMT+01:00 Jay Pipes : >> >>> On 01/22/2018 11:36 AM, Maciej Kucia wrote: >>> >>>> Hi! >>>> >>>> Is there any noticeable performance penalty when using multiple virtual >>>> functions? >>>> >>>> For simplicity I am enabling all available virtual functions in my NICs. >>>> >>> >>> I presume by the above you are referring to setting your >>> pci_passthrough_whitelist on your compute nodes to whitelist all VFs on a >>> particular PF's PCI address domain/bus? >>> >>> Sometimes application is using only few of them. I am using Intel and >>>> Mellanox. >>>> >>>> I do not see any performance drop but I am getting feedback that this >>>> might not be the best approach. >>>> >>> >>> Who is giving you this feedback? >>> >>> The only issue with enabling (potentially 254 or more) VFs for each PF >>> is that each VF will end up as a record in the pci_devices table in the >>> Nova cell database. Multiply 254 or more times the number of PFs times the >>> number of compute nodes in your deployment and you can get a large number >>> of records that need to be stored. 
That said, the pci_devices table is well >>> indexed and even if you had 1M or more records in the table, the access of >>> a few hundred of those records when the resource tracker does a >>> PciDeviceList.get_by_compute_node() [1] will still be quite fast. >>> >>> Best, >>> -jay >>> >>> [1] https://github.com/openstack/nova/blob/stable/pike/nova/comp >>> ute/resource_tracker.py#L572 and then >>> https://github.com/openstack/nova/blob/stable/pike/nova/pci/ >>> manager.py#L71 >>> >>> Any recommendations? >>>> >>>> Thanks, >>>> Maciej >>>> >>>> >>>> _______________________________________________ >>>> OpenStack-operators mailing list >>>> OpenStack-operators at lists.openstack.org >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>>> >>>> >>> _______________________________________________ >>> OpenStack-operators mailing list >>> OpenStack-operators at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>> >> >> > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From blair.bethwaite at gmail.com Tue Jan 23 02:39:20 2018 From: blair.bethwaite at gmail.com (Blair Bethwaite) Date: Tue, 23 Jan 2018 13:39:20 +1100 Subject: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV In-Reply-To: References: <68e78eaf-00bd-eb4b-3a6b-6e45a2cd7adc@gmail.com> Message-ID: This is starting to veer into magic territory for my level of understanding so beware... but I believe there are (or could be depending on your exact hardware) PCI config space considerations. IIUC each SRIOV VF will have its own PCI BAR. Depending on the window size required (which may be determined by other hardware features such as flow-steering), you can potentially hit compatibility issues with your server BIOS not supporting mapping of addresses which surpass 4GB. This can then result in the device hanging on initialisation (at server boot) and effectively bricking the box until the device is removed. We have seen this first hand on a Dell R730 with Mellanox ConnectX-4 card (there are several other Dell 13G platforms with the same BIOS chipsets). We were explicitly increasing the PCI BAR size for the device (not upping the number of VFs) in relation to a memory exhaustion issue when running MPI collective communications on hosts with 28+ cores, we only had 16 (or maybe 32, I forget) VFs configured in the firmware. At the end of that support case (which resulted in a replacement NIC), the support engineer's summary included: """ -When a BIOS limits the BAR to be contained in the 4GB address space - it is a BIOS limitation. Unfortunately, there is no way to tell - Some BIOS implementations use proprietary heuristics to decide when to map a specific BAR below 4GB. -When SR-IOV is enabled, and num-vfs is high, the corresponding VF BAR can be huge. In this case, the BIOS may exhaust the ~2GB address space that it has available below 4GB. In this case, the BIOS may hang – and the server won’t boot. """ At the very least you should ask your hardware vendors some very specific questions before doing anything that might change your PCI BAR sizes. 
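For what it's worth, before touching the Nova whitelist you can check how many VFs the firmware claims to support versus how many are currently enabled through the standard sysfs knobs, and experiment with a smaller number there (the interface name below is only an example; most drivers also want sriov_numvfs reset to 0 before a new value is written):

cat /sys/class/net/ens1f0/device/sriov_totalvfs
cat /sys/class/net/ens1f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens1f0/device/sriov_numvfs
echo 8 > /sys/class/net/ens1f0/device/sriov_numvfs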
Cheers, On 23 January 2018 at 11:44, Pedro Sousa wrote: > Hi, > > I have sr-iov in production in some customers with maximum number of VFs and > didn't notice any performance issues. > > My understanding is that of course you will have performance penalty if you > consume all those vfs, because you're dividing the bandwidth across them, > but other than if they're are there doing nothing you won't notice anything. > > But I'm just talking from my experience :) > > Regards, > Pedro Sousa > > On Mon, Jan 22, 2018 at 11:47 PM, Maciej Kucia wrote: >> >> Thank you for the reply. I am interested in SR-IOV and pci whitelisting is >> certainly involved. >> I suspect that OpenStack itself can handle those numbers of devices, >> especially in telco applications where not much scheduling is being done. >> The feedback I am getting is from sysadmins who work on network >> virtualization but I think this is just a rumor without any proof. >> >> The question is if performance penalty from SR-IOV drivers or PCI itself >> is negligible. Should cloud admin configure maximum number of VFs for >> flexibility or should it be manually managed and balanced depending on >> application? >> >> Regards, >> Maciej >> >>> >>> >>> 2018-01-22 18:38 GMT+01:00 Jay Pipes : >>>> >>>> On 01/22/2018 11:36 AM, Maciej Kucia wrote: >>>>> >>>>> Hi! >>>>> >>>>> Is there any noticeable performance penalty when using multiple virtual >>>>> functions? >>>>> >>>>> For simplicity I am enabling all available virtual functions in my >>>>> NICs. >>>> >>>> >>>> I presume by the above you are referring to setting your >>>> pci_passthrough_whitelist on your compute nodes to whitelist all VFs on a >>>> particular PF's PCI address domain/bus? >>>> >>>>> Sometimes application is using only few of them. I am using Intel and >>>>> Mellanox. >>>>> >>>>> I do not see any performance drop but I am getting feedback that this >>>>> might not be the best approach. >>>> >>>> >>>> Who is giving you this feedback? >>>> >>>> The only issue with enabling (potentially 254 or more) VFs for each PF >>>> is that each VF will end up as a record in the pci_devices table in the Nova >>>> cell database. Multiply 254 or more times the number of PFs times the number >>>> of compute nodes in your deployment and you can get a large number of >>>> records that need to be stored. That said, the pci_devices table is well >>>> indexed and even if you had 1M or more records in the table, the access of a >>>> few hundred of those records when the resource tracker does a >>>> PciDeviceList.get_by_compute_node() [1] will still be quite fast. >>>> >>>> Best, >>>> -jay >>>> >>>> [1] >>>> https://github.com/openstack/nova/blob/stable/pike/nova/compute/resource_tracker.py#L572 >>>> and then >>>> >>>> https://github.com/openstack/nova/blob/stable/pike/nova/pci/manager.py#L71 >>>> >>>>> Any recommendations? 
>>>>> >>>>> Thanks, >>>>> Maciej >>>>> >>>>> >>>>> _______________________________________________ >>>>> OpenStack-operators mailing list >>>>> OpenStack-operators at lists.openstack.org >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>>>> >>>> >>>> _______________________________________________ >>>> OpenStack-operators mailing list >>>> OpenStack-operators at lists.openstack.org >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>> >>> >> >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -- Cheers, ~Blairo From hongbin.lu at huawei.com Wed Jan 24 23:02:15 2018 From: hongbin.lu at huawei.com (Hongbin Lu) Date: Wed, 24 Jan 2018 23:02:15 +0000 Subject: [Openstack-operators] [nova][neutron] Extend instance IP filter for floating IP Message-ID: <0957CD8F4B55C0418161614FEC580D6B281A8378@YYZEML702-CHM.china.huawei.com> Hi all, Nova currently allows us to filter instances by fixed IP address(es). This feature is known to be useful in an operational scenario that cloud administrators detect abnormal traffic in an IP address and want to trace down to the instance that this IP address belongs to. This feature works well except a limitation that it only supports fixed IP address(es). In the real operational scenarios, cloud administrators might find that the abused IP address is a floating IP and want to do the filtering in the same way as fixed IP. Right now, unfortunately, the experience is diverged between these two classes of IP address. Cloud administrators need to deploy the logic to (i) detect the class of IP address (fixed or floating), (ii) use nova's IP filter if the address is a fixed IP address, (iii) do manual filtering if the address is a floating IP address. I wonder if nova team is willing to accept an enhancement that makes the IP filter support both. Optimally, cloud administrators can simply pass the abused IP address to nova and nova will handle the heterogeneity. In term of implementation, I expect the change is small. After this patch [1], Nova will query Neutron to compile a list of ports' device_ids (device_id is equal to the uuid of the instance to which the port binds) and use the device_ids to query the instances. If Neutron returns an empty list, Nova can give a second try to query Neutron for floating IPs. There is a RFE [2] and POC [3] for proposing to add a device_id attribute to the floating IP API resource. Nova can leverage this attribute to compile a list of instances uuids and use it as filter on listing the instances. If this feature is implemented, will it benefit the general community? Finally, I also wonder how others are tackling a similar problem. Appreciate your feedback. [1] https://review.openstack.org/#/c/525505/ [2] https://bugs.launchpad.net/neutron/+bug/1723026 [3] https://review.openstack.org/#/c/534882/ Best regards, Hongbin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zioproto at gmail.com Thu Jan 25 12:04:55 2018 From: zioproto at gmail.com (Saverio Proto) Date: Thu, 25 Jan 2018 13:04:55 +0100 Subject: [Openstack-operators] Snapshot from Cinder to Glance doing a ceph rbd clone Message-ID: Hello ops, we have this working for Nova ephemeral images already, but Cinder did not implement this spec: https://specs.openstack.org/openstack/cinder-specs/specs/liberty/optimze-rbd-copy-volume-to-image.html Is anyone carrying an unmerged patch that implements this spec ? I could not believe my eyes this morning when I figured out this was working only for nova. thanks Saverio From jp.methot at planethoster.info Fri Jan 26 00:58:15 2018 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Fri, 26 Jan 2018 09:58:15 +0900 Subject: [Openstack-operators] Inverted drive letters on block devices that use virtio-scsi Message-ID: <126F17A2-D3B7-41EB-892F-51D7E05542D8@planethoster.info> Hi, Lately, we’ve been converting our VMs block devices (cinder block devices) to use the virtio-scsi driver instead of virtio-blk by modifying the database. This works great, however, we’ve run into an issue with an instance that has more than one drive. Essentially, the root device has address
bus='0' target='0' unit='0’/> . I believe this results in the root drive > getting called sdb in the vm while the second drive gets called sda. > > If my assumption is right, what exactly controls which drive gets address > unit=0 and which drive gets address unit=1 in the vm configuration? > > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From jp.methot at planethoster.info Fri Jan 26 04:21:44 2018 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Fri, 26 Jan 2018 13:21:44 +0900 Subject: [Openstack-operators] Inverted drive letters on block devices that use virtio-scsi In-Reply-To: References: <126F17A2-D3B7-41EB-892F-51D7E05542D8@planethoster.info> Message-ID: <5EE8EF59-9FF5-4197-9D63-FB56699F143E@planethoster.info> Thank you, it indeed seems to be the same issue. I will be following this bug report. A shame too, because we were waiting for the patch to allow us to setup 2 drives on virtio-scsi before starting to make the change. In the meantime, have you found a way to circumvent the issue? Could it be as easy as changing the drive order in the database? Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. > Le 26 janv. 2018 à 13:06, Logan V. a écrit : > > https://bugs.launchpad.net/nova/+bug/1729584 -------------- next part -------------- An HTML attachment was scrubbed... URL: From logan at protiumit.com Fri Jan 26 05:23:56 2018 From: logan at protiumit.com (Logan V.) Date: Thu, 25 Jan 2018 23:23:56 -0600 Subject: [Openstack-operators] Inverted drive letters on block devices that use virtio-scsi In-Reply-To: <5EE8EF59-9FF5-4197-9D63-FB56699F143E@planethoster.info> References: <126F17A2-D3B7-41EB-892F-51D7E05542D8@planethoster.info> <5EE8EF59-9FF5-4197-9D63-FB56699F143E@planethoster.info> Message-ID: There is a small patch in the bug which resolves the config drive ordering. Without that patch I don't know of any workaround. The config drive will always end up first in the boot order and the instance will always fail to boot in that situation. For the multi-volume instances where the boot volume is out of order, I don't know of any patch for that. One workaround is to detach any secondary data volumes from the instance, and then reattach them after booting from the one and only attached boot volume. Logan On Thu, Jan 25, 2018 at 10:21 PM, Jean-Philippe Méthot wrote: > Thank you, it indeed seems to be the same issue. I will be following this > bug report. A shame too, because we were waiting for the patch to allow us > to setup 2 drives on virtio-scsi before starting to make the change. In the > meantime, have you found a way to circumvent the issue? Could it be as easy > as changing the drive order in the database? > > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > > Le 26 janv. 2018 à 13:06, Logan V. 
a écrit : > > https://bugs.launchpad.net/nova/+bug/1729584 > > From jp.methot at planethoster.info Fri Jan 26 06:22:35 2018 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Fri, 26 Jan 2018 15:22:35 +0900 Subject: [Openstack-operators] Inverted drive letters on block devices that use virtio-scsi In-Reply-To: References: <126F17A2-D3B7-41EB-892F-51D7E05542D8@planethoster.info> <5EE8EF59-9FF5-4197-9D63-FB56699F143E@planethoster.info> Message-ID: Yea, the configdrive is a non-issue for us since we don’t use those. The multi-drive issue is the only one really affecting us. While removing the second drive and reattaching it after boot is probably a good solution, I think it’s likely the issue will come back after a hard reboot or migration. Probably better to wait before I start converting my multi-disk instances to virtio-scsi. If I am not mistaken, this should also be an issue in Pike and master, right? Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. > Le 26 janv. 2018 à 14:23, Logan V. a écrit : > > There is a small patch in the bug which resolves the config drive > ordering. Without that patch I don't know of any workaround. The > config drive will always end up first in the boot order and the > instance will always fail to boot in that situation. > > For the multi-volume instances where the boot volume is out of order, > I don't know of any patch for that. One workaround is to detach any > secondary data volumes from the instance, and then reattach them after > booting from the one and only attached boot volume. > > Logan > > On Thu, Jan 25, 2018 at 10:21 PM, Jean-Philippe Méthot > wrote: >> Thank you, it indeed seems to be the same issue. I will be following this >> bug report. A shame too, because we were waiting for the patch to allow us >> to setup 2 drives on virtio-scsi before starting to make the change. In the >> meantime, have you found a way to circumvent the issue? Could it be as easy >> as changing the drive order in the database? >> >> >> Jean-Philippe Méthot >> Openstack system administrator >> Administrateur système Openstack >> PlanetHoster inc. >> >> >> >> >> Le 26 janv. 2018 à 13:06, Logan V. a écrit : >> >> https://bugs.launchpad.net/nova/+bug/1729584 >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From Tim.Bell at cern.ch Fri Jan 26 07:13:56 2018 From: Tim.Bell at cern.ch (Tim Bell) Date: Fri, 26 Jan 2018 07:13:56 +0000 Subject: [Openstack-operators] Inverted drive letters on block devices that use virtio-scsi In-Reply-To: References: <126F17A2-D3B7-41EB-892F-51D7E05542D8@planethoster.info> <5EE8EF59-9FF5-4197-9D63-FB56699F143E@planethoster.info> Message-ID: Labels can be one approach where you mount by disk label rather than device Creating the volume with the label # mkfs -t ext4 -L testvol /dev/vdb /etc/fstab then contains LABEL=testvol /mnt ext4 noatime,nodiratime,user_xattr 0 0 You still need to be careful to not attach data disks at install time though but it addresses booting order problems. Tim From: Jean-Philippe Méthot Date: Friday, 26 January 2018 at 07:28 To: "Logan V." Cc: openstack-operators Subject: Re: [Openstack-operators] Inverted drive letters on block devices that use virtio-scsi Yea, the configdrive is a non-issue for us since we don’t use those. The multi-drive issue is the only one really affecting us. 
While removing the second drive and reattaching it after boot is probably a good solution, I think it’s likely the issue will come back after a hard reboot or migration. Probably better to wait before I start converting my multi-disk instances to virtio-scsi. If I am not mistaken, this should also be an issue in Pike and master, right? Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. Le 26 janv. 2018 à 14:23, Logan V. > a écrit : There is a small patch in the bug which resolves the config drive ordering. Without that patch I don't know of any workaround. The config drive will always end up first in the boot order and the instance will always fail to boot in that situation. For the multi-volume instances where the boot volume is out of order, I don't know of any patch for that. One workaround is to detach any secondary data volumes from the instance, and then reattach them after booting from the one and only attached boot volume. Logan On Thu, Jan 25, 2018 at 10:21 PM, Jean-Philippe Méthot > wrote: Thank you, it indeed seems to be the same issue. I will be following this bug report. A shame too, because we were waiting for the patch to allow us to set up 2 drives on virtio-scsi before starting to make the change. In the meantime, have you found a way to circumvent the issue? Could it be as easy as changing the drive order in the database? Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. Le 26 janv. 2018 à 13:06, Logan V. > a écrit : https://bugs.launchpad.net/nova/+bug/1729584 -------------- next part -------------- An HTML attachment was scrubbed... URL: From deb at glidr.info Fri Jan 26 11:44:15 2018 From: deb at glidr.info (Debabrata Das) Date: Fri, 26 Jan 2018 17:14:15 +0530 Subject: [Openstack-operators] Openstack AIO in production In-Reply-To: <16132434717.100658f5f1047.6169938931032397799@glidr.info> References: <16132434717.100658f5f1047.6169938931032397799@glidr.info> Message-ID: <1613248515d.b71ef89e1194.5419927070977063@glidr.info> Hi, We are a small shop and our customers host our solution in their data centers. We plan to use OpenStack to automate our delivery but are challenged by the minimum hardware required to have an HA system. Most of our customers have 2-4 servers, so a dedicated HA setup is not practical for us. Hence, we plan to deploy all-in-one with a shared database in production. Any suggestion would be of great help. Thanks! Deb From maciej at kucia.net Fri Jan 26 12:19:33 2018 From: maciej at kucia.net (Maciej Kucia) Date: Fri, 26 Jan 2018 13:19:33 +0100 Subject: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV In-Reply-To: References: <68e78eaf-00bd-eb4b-3a6b-6e45a2cd7adc@gmail.com> Message-ID: Appreciate the feedback. It seems the conclusion is that generally one can safely enable a large number of VFs, with the exception of some hardware configurations which might require reducing the number of VFs due to BIOS limitations. Thanks & Regards, Maciej 2018-01-23 3:39 GMT+01:00 Blair Bethwaite : > This is starting to veer into magic territory for my level of > understanding so beware... but I believe there are (or could be > depending on your exact hardware) PCI config space considerations. > IIUC each SRIOV VF will have its own PCI BAR. 
Depending on the window > size required (which may be determined by other hardware features such > as flow-steering), you can potentially hit compatibility issues with > your server BIOS not supporting mapping of addresses which surpass > 4GB. This can then result in the device hanging on initialisation (at > server boot) and effectively bricking the box until the device is > removed. > > We have seen this first hand on a Dell R730 with Mellanox ConnectX-4 > card (there are several other Dell 13G platforms with the same BIOS > chipsets). We were explicitly increasing the PCI BAR size for the > device (not upping the number of VFs) in relation to a memory > exhaustion issue when running MPI collective communications on hosts > with 28+ cores, we only had 16 (or maybe 32, I forget) VFs configured > in the firmware. > > At the end of that support case (which resulted in a replacement NIC), > the support engineer's summary included: > """ > -When a BIOS limits the BAR to be contained in the 4GB address space - > it is a BIOS limitation. > Unfortunately, there is no way to tell - Some BIOS implementations use > proprietary heuristics to decide when to map a specific BAR below 4GB. > > -When SR-IOV is enabled, and num-vfs is high, the corresponding VF BAR > can be huge. > In this case, the BIOS may exhaust the ~2GB address space that it has > available below 4GB. > In this case, the BIOS may hang – and the server won’t boot. > """ > > At the very least you should ask your hardware vendors some very > specific questions before doing anything that might change your PCI > BAR sizes. > > Cheers, > > On 23 January 2018 at 11:44, Pedro Sousa wrote: > > Hi, > > > > I have sr-iov in production in some customers with maximum number of VFs > and > > didn't notice any performance issues. > > > > My understanding is that of course you will have performance penalty if > you > > consume all those vfs, because you're dividing the bandwidth across them, > > but other than if they're are there doing nothing you won't notice > anything. > > > > But I'm just talking from my experience :) > > > > Regards, > > Pedro Sousa > > > > On Mon, Jan 22, 2018 at 11:47 PM, Maciej Kucia wrote: > >> > >> Thank you for the reply. I am interested in SR-IOV and pci whitelisting > is > >> certainly involved. > >> I suspect that OpenStack itself can handle those numbers of devices, > >> especially in telco applications where not much scheduling is being > done. > >> The feedback I am getting is from sysadmins who work on network > >> virtualization but I think this is just a rumor without any proof. > >> > >> The question is if performance penalty from SR-IOV drivers or PCI itself > >> is negligible. Should cloud admin configure maximum number of VFs for > >> flexibility or should it be manually managed and balanced depending on > >> application? > >> > >> Regards, > >> Maciej > >> > >>> > >>> > >>> 2018-01-22 18:38 GMT+01:00 Jay Pipes : > >>>> > >>>> On 01/22/2018 11:36 AM, Maciej Kucia wrote: > >>>>> > >>>>> Hi! > >>>>> > >>>>> Is there any noticeable performance penalty when using multiple > virtual > >>>>> functions? > >>>>> > >>>>> For simplicity I am enabling all available virtual functions in my > >>>>> NICs. > >>>> > >>>> > >>>> I presume by the above you are referring to setting your > >>>> pci_passthrough_whitelist on your compute nodes to whitelist all VFs > on a > >>>> particular PF's PCI address domain/bus? > >>>> > >>>>> Sometimes application is using only few of them. I am using Intel and > >>>>> Mellanox. 
> >>>>> > >>>>> I do not see any performance drop but I am getting feedback that this > >>>>> might not be the best approach. > >>>> > >>>> > >>>> Who is giving you this feedback? > >>>> > >>>> The only issue with enabling (potentially 254 or more) VFs for each PF > >>>> is that each VF will end up as a record in the pci_devices table in > the Nova > >>>> cell database. Multiply 254 or more times the number of PFs times the > number > >>>> of compute nodes in your deployment and you can get a large number of > >>>> records that need to be stored. That said, the pci_devices table is > well > >>>> indexed and even if you had 1M or more records in the table, the > access of a > >>>> few hundred of those records when the resource tracker does a > >>>> PciDeviceList.get_by_compute_node() [1] will still be quite fast. > >>>> > >>>> Best, > >>>> -jay > >>>> > >>>> [1] > >>>> https://github.com/openstack/nova/blob/stable/pike/nova/ > compute/resource_tracker.py#L572 > >>>> and then > >>>> > >>>> https://github.com/openstack/nova/blob/stable/pike/nova/ > pci/manager.py#L71 > >>>> > >>>>> Any recommendations? > >>>>> > >>>>> Thanks, > >>>>> Maciej > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> OpenStack-operators mailing list > >>>>> OpenStack-operators at lists.openstack.org > >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/ > openstack-operators > >>>>> > >>>> > >>>> _______________________________________________ > >>>> OpenStack-operators mailing list > >>>> OpenStack-operators at lists.openstack.org > >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/ > openstack-operators > >>> > >>> > >> > >> > >> _______________________________________________ > >> OpenStack-operators mailing list > >> OpenStack-operators at lists.openstack.org > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > >> > > > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > > > -- > Cheers, > ~Blairo > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shake.chen at gmail.com Fri Jan 26 12:20:21 2018 From: shake.chen at gmail.com (Shake Chen) Date: Fri, 26 Jan 2018 20:20:21 +0800 Subject: [Openstack-operators] Openstack AIO in production In-Reply-To: <1613248515d.b71ef89e1194.5419927070977063@glidr.info> References: <16132434717.100658f5f1047.6169938931032397799@glidr.info> <1613248515d.b71ef89e1194.5419927070977063@glidr.info> Message-ID: you can try kolla and kolla-ansible. first you deploy one node as master, future you can extend master to three of five, it is no problem. On Fri, Jan 26, 2018 at 7:44 PM, Debabrata Das wrote: > Hi, > > We are a small shop and have our customers in host our solution in their > data centers. We plan to use OpenStack to automate our delivery but are > challenged the minimum hardware required to have an HA system. Most of our > customers have a 2-4 servers in and a dedicated HA is practical for us. > Hence, we plan to deploy all-in-one with share database in production. Any > suggestion would be of great help. > > Thanks! 
> Deb > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -- Shake Chen -------------- next part -------------- An HTML attachment was scrubbed... URL: From zandor.z at gmail.com Fri Jan 26 13:29:59 2018 From: zandor.z at gmail.com (Sandor Zeestraten) Date: Fri, 26 Jan 2018 14:29:59 +0100 Subject: [Openstack-operators] [charms] Migrating HA control plane by scaling up and down Message-ID: Hey OpenStack Charmers, We have a Newton deployment on MAAS with 3 controller machines running all the usual OpenStack controller services in 3x HA with the hacluster charm in LXD containers. Now we'd like to migrate some of those OpenStack services to 3 larger controller machines. Downtime of the services during the migration is not an issue. My current plan is to test something like this: * Add the new controller machines to the model * Increase the cluster_count from 3 to 6 on the hacluster charm of the services in question * Add units of the service to LXD containers on the new controller machines * Wait for things to deploy and cluster * Decrease the cluster_count from 6 to 3 * Remove units on the old controllers Questions: 1. Is there a preferred way to migrate OpenStack services deployed by charms? 2. Does the plan above look somewhat sane? 3. If yes to the above, does the order of changing the cluster_count and adding/removing units matter? I've seen this bug for example: https://bugs.launchpad.net/charm-hacluster/+bug/1424048 4. Anything to keep in mind for scaling up and down the rabbitmq and percona clusters? Cheers -- Sandor Zeestraten -------------- next part -------------- An HTML attachment was scrubbed... URL: From molenkam at uwo.ca Fri Jan 26 15:59:28 2018 From: molenkam at uwo.ca (Gary Molenkamp) Date: Fri, 26 Jan 2018 10:59:28 -0500 Subject: [Openstack-operators] Passing additional parameters to KVM for a single instance Message-ID: I'm trying to import a Solaris10 image into Ocata that is working under libvirt/KVM on a Fedora workstation. However, in order for the kvm instance to work, it needs a few additional parameters to qemu that I use in the libvirt XML file: <cpu mode='custom'><model>Westmere</model></cpu> and <qemu:commandline><qemu:arg value='-no-kvm-irqchip'/></qemu:commandline>. For the first parameter, I know I could modify the /etc/nova/nova.conf of the entire hypervisor on the compute node to Westmere and limit instances to that hypervisor, but that limits additional instances on that compute node. Is there a way to instruct nova to use a Westmere cpu for a single instance? Likewise, how can I pass the -no-kvm-irqchip option for instances of this image? Any pointers would be appreciated. Thanks -- Gary Molenkamp Computer Science Systems Administrator University of Western Ontario molenkam at uwo.ca http://www.csd.uwo.ca (519) 661-2111 x86882 (519) 661-3566 From gael.therond at gmail.com Fri Jan 26 16:59:00 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Fri, 26 Jan 2018 16:59:00 +0000 Subject: [Openstack-operators] Passing additional parameters to KVM for a single instance In-Reply-To: References: Message-ID: I would rather suggest you deal with flavor/image metadata and host aggregates for such segregation of CPU capacity and versioning. If someone has another technique I'm pretty curious about it too. Le ven. 26 janv. 2018 à 17:00, Gary Molenkamp a écrit : > I'm trying to import a Solaris10 image into Ocata that is working under > libvirt/KVM on a Fedora workstation. 
However, in order for the kvm > instance to work, it needs a few additional parameters to qemu that I > use in the libvirt XML file: > > > Westmere > > > > > > > For the first parameter, I know I could modify the /etc/nova/nova.conf > of the entire hypervisor on the compute node to Westmere and limit > instances to that hypervisor, but that limits additional instances on > that compute node. Is there a way to instruct nova to use a westmere > cpu for a single instance? > > Likewise, how can I pass the -no-kvm-irqchip option for instances of > this image? > > Any pointers would be appreciated. > > Thanks > > > -- > Gary Molenkamp Computer Science > Systems Administrator University of Western Ontario > molenkam at uwo.ca http://www.csd.uwo.ca > (519) 661-2111 x86882 (519) 661-3566 > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From blake at platform9.com Fri Jan 26 17:16:46 2018 From: blake at platform9.com (Blake Covarrubias) Date: Fri, 26 Jan 2018 09:16:46 -0800 Subject: [Openstack-operators] Inverted drive letters on block devices that use virtio-scsi In-Reply-To: References: <126F17A2-D3B7-41EB-892F-51D7E05542D8@planethoster.info> <5EE8EF59-9FF5-4197-9D63-FB56699F143E@planethoster.info> Message-ID: The inconsistency in device naming is documented in https://docs.openstack.org/nova/pike/user/block-device-mapping.html#intermezzo-problem-with-device-names . Similar to Tim's suggested approach, you can also mount the device by its UUID. A while back I wrote a small, relatively untested, Python script to which modifies fstab & replaces the device names with UUID= (see attached). It depends on Augeas (python-augeas) to modify fstab. This script can be downloaded into the Instance using cloud-init, and then executed on initial boot with runcmd. — Blake Covarrubias Product Manager Platform9 Systems, Inc On Thu, Jan 25, 2018 at 11:13 PM, Tim Bell wrote: > Labels can be one approach where you mount by disk label rather than device > > > > Creating the volume with the label > > > > # mkfs -t ext4 -L testvol /dev/vdb > > > > /etc/fstab then contains > > > > LABEL=testvol /mnt ext4 noatime,nodiratime,user_xattr 0 0 > > > > You still need to be careful to not attach data disks at install time > though but it addresses booting order problems. > > > > Tim > > > > *From: *Jean-Philippe Méthot > *Date: *Friday, 26 January 2018 at 07:28 > *To: *"Logan V." > *Cc: *openstack-operators > *Subject: *Re: [Openstack-operators] Inverted drive letters on block > devices that use virtio-scsi > > > > Yea, the configdrive is a non-issue for us since we don’t use those. The > multi-drive issue is the only one really affecting us. While removing the > second drive and reattaching it after boot is probably a good solution, I > think it’s likely the issue will come back after a hard reboot or > migration. Probably better to wait before I start converting my multi-disk > instances to virtio-scsi. If I am not mistaken, this should also be an > issue in Pike and master, right? > > > > Jean-Philippe Méthot > > Openstack system administrator > > Administrateur système Openstack > > PlanetHoster inc. > > > > > > > > > > Le 26 janv. 2018 à 14:23, Logan V. a écrit : > > > > There is a small patch in the bug which resolves the config drive > ordering. 
Without that patch I don't know of any workaround. The > config drive will always end up first in the boot order and the > instance will always fail to boot in that situation. > > For the multi-volume instances where the boot volume is out of order, > I don't know of any patch for that. One workaround is to detach any > secondary data volumes from the instance, and then reattach them after > booting from the one and only attached boot volume. > > Logan > > On Thu, Jan 25, 2018 at 10:21 PM, Jean-Philippe Méthot > wrote: > > Thank you, it indeed seems to be the same issue. I will be following this > bug report. A shame too, because we were waiting for the patch to allow us > to setup 2 drives on virtio-scsi before starting to make the change. In the > meantime, have you found a way to circumvent the issue? Could it be as easy > as changing the drive order in the database? > > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > > Le 26 janv. 2018 à 13:06, Logan V. a écrit : > > https://bugs.launchpad.net/nova/+bug/1729584 > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fstab_dev_to_uuid.py Type: text/x-python-script Size: 1848 bytes Desc: not available URL: From jaypipes at gmail.com Fri Jan 26 17:25:51 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Fri, 26 Jan 2018 12:25:51 -0500 Subject: [Openstack-operators] Inverted drive letters on block devices that use virtio-scsi In-Reply-To: References: <126F17A2-D3B7-41EB-892F-51D7E05542D8@planethoster.info> <5EE8EF59-9FF5-4197-9D63-FB56699F143E@planethoster.info> Message-ID: The bug in question doesn't have anything to do with that. I've pushed a fix and a test case up here: https://review.openstack.org/538310 Best, -jay On 01/26/2018 12:16 PM, Blake Covarrubias wrote: > The inconsistency in device naming is documented in > https://docs.openstack.org/nova/pike/user/block-device-mapping.html#intermezzo-problem-with-device-names. > > Similar to Tim's suggested approach, you can also mount the device by > its UUID. A while back I wrote a small, relatively untested, Python > script to which modifies fstab & replaces the device names with > UUID= (see attached). It depends on Augeas (python-augeas) > to modify fstab. > > This script can be downloaded into the Instance using cloud-init, and > then executed on initial boot with runcmd. > > — > Blake Covarrubias > Product Manager > Platform9 Systems, Inc > > On Thu, Jan 25, 2018 at 11:13 PM, Tim Bell > wrote: > > Labels can be one approach where you mount by disk label rather than > device > > Creating the volume with the label > > |# mkfs -t ext4 -L testvol /dev/vdb| > > /etc/fstab then contains > > LABEL=testvol /mnt ext4 noatime,nodiratime,user_xattr    0       0 > > You still need to be careful to not attach data disks at install > time though but it addresses booting order problems. > > Tim > > *From: *Jean-Philippe Méthot > > *Date: *Friday, 26 January 2018 at 07:28 > *To: *"Logan V." > > *Cc: *openstack-operators > > *Subject: *Re: [Openstack-operators] Inverted drive letters on block > devices that use virtio-scsi > > Yea, the configdrive is a non-issue for us since we don’t use those. 
> The multi-drive issue is the only one really affecting us. While > removing the second drive and reattaching it after boot is probably > a good solution, I think it’s likely the issue will come back after > a hard reboot or migration. Probably better to wait before I start > converting my multi-disk instances to virtio-scsi. If I am not > mistaken, this should also be an issue in Pike and master, right? > > Jean-Philippe Méthot > > Openstack system administrator > > Administrateur système Openstack > > PlanetHoster inc. > > > > Le 26 janv. 2018 à 14:23, Logan V. > > a écrit : > > There is a small patch in the bug which resolves the config drive > ordering. Without that patch I don't know of any workaround. The > config drive will always end up first in the boot order and the > instance will always fail to boot in that situation. > > For the multi-volume instances where the boot volume is out of > order, > I don't know of any patch for that. One workaround is to detach any > secondary data volumes from the instance, and then reattach them > after > booting from the one and only attached boot volume. > > Logan > > On Thu, Jan 25, 2018 at 10:21 PM, Jean-Philippe Méthot > > wrote: > > Thank you, it indeed seems to be the same issue. I will be > following this > bug report. A shame too, because we were waiting for the > patch to allow us > to setup 2 drives on virtio-scsi before starting to make the > change. In the > meantime, have you found a way to circumvent the issue? > Could it be as easy > as changing the drive order in the database? > > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > > Le 26 janv. 2018 à 13:06, Logan V. > > a écrit : > > https://bugs.launchpad.net/nova/+bug/1729584 > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From melwittt at gmail.com Fri Jan 26 20:53:00 2018 From: melwittt at gmail.com (melanie witt) Date: Fri, 26 Jan 2018 12:53:00 -0800 Subject: [Openstack-operators] Inverted drive letters on block devices that use virtio-scsi In-Reply-To: References: <126F17A2-D3B7-41EB-892F-51D7E05542D8@planethoster.info> <5EE8EF59-9FF5-4197-9D63-FB56699F143E@planethoster.info> Message-ID: <02D85352-232B-4F82-AD3B-2F670C0C2BD1@gmail.com> > On Jan 25, 2018, at 21:23, Logan V. wrote: > > There is a small patch in the bug which resolves the config drive > ordering. Without that patch I don't know of any workaround. The > config drive will always end up first in the boot order and the > instance will always fail to boot in that situation. > > For the multi-volume instances where the boot volume is out of order, > I don't know of any patch for that. One workaround is to detach any > secondary data volumes from the instance, and then reattach them after > booting from the one and only attached boot volume. I’ve posted an addition to the small patch in the launchpad bug intended to handle the multi-volume problem. If there’s any way you could try that out and let us know if it fixes things for your case, it would help us out a lot. 
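In the meantime, the detach/boot/reattach workaround Logan described can be driven entirely through the API, roughly like this (server and volume names are placeholders):

openstack server remove volume web01 data01
openstack server reboot --hard web01
# once the guest is up from the lone boot volume, reattach the data volume
openstack server add volume web01 data01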
-melanie From dmsimard at redhat.com Mon Jan 29 13:30:12 2018 From: dmsimard at redhat.com (David Moreau Simard) Date: Mon, 29 Jan 2018 08:30:12 -0500 Subject: [Openstack-operators] [all][kolla][rdo] Collaboration with Kolla for the RDO test days Message-ID: Hi ! For those who might be unfamiliar with the RDO [1] community project: we hang out in #rdo, we don't bite and we build vanilla OpenStack packages. These packages are what allows you to leverage one of the deployment projects such as TripleO, PackStack or Kolla to deploy on CentOS or RHEL. The RDO community collaborates with these deployment projects by providing trunk and stable packages in order to let them develop and test against the latest and the greatest of OpenStack. RDO test days typically happen around a week after an upstream milestone has been reached [2]. The purpose is to get everyone together in #rdo: developers, users, operators, maintainers -- and test not just RDO but OpenStack itself as installed by the different deployment projects. We tried something new at our last test day [3] and it worked out great. Instead of encouraging participants to install their own cloud for testing things, we supplied a cloud of our own... a bit like a limited duration TryStack [4]. This lets users without the operational knowledge, time or hardware to install an OpenStack environment to see what's coming in the upcoming release of OpenStack and get the feedback loop going ahead of the release. We used Packstack for the last deployment and invited Packstack cores to deploy, operate and troubleshoot the installation for the duration of the test days. The idea is to rotate between the different deployment projects to give every interested project a chance to participate. Last week, we reached out to Kolla to see if they would be interested in participating in our next RDO test days [5] around February 8th. We supply the bare metal hardware and their core contributors get to deploy and operate a cloud with real users and developers poking around. All around, this is a great opportunity to get feedback for RDO, Kolla and OpenStack. We'll be advertising the event a bit more as the test days draw closer but until then, I thought it was worthwhile to share some context for this new thing we're doing. Let me know if you have any questions ! Thanks, [1]: https://www.rdoproject.org/ [2]: https://www.rdoproject.org/testday/ [3]: https://dmsimard.com/2017/11/29/come-try-a-real-openstack-queens-deployment/ [4]: http://trystack.org/ [5]: http://eavesdrop.openstack.org/meetings/kolla/2018/kolla.2018-01-24-16.00.log.html David Moreau Simard Senior Software Engineer | OpenStack RDO dmsimard = [irc, github, twitter] From jaypipes at gmail.com Mon Jan 29 13:47:46 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Mon, 29 Jan 2018 08:47:46 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata In-Reply-To: References: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> <57306d1a-529b-3907-7c5a-a9b46057b236@gmail.com> Message-ID: <37aa1d2c-8e80-8cf3-2a92-444916b3b80f@gmail.com> Greetings again, Mathieu, response inline... 
On 01/18/2018 07:24 PM, Mathieu Gagné wrote: > So far, a couple challenges/issues: > > We used to have fine grain control over the calls a user could make to > the Nova API: > * os_compute_api:os-aggregates:add_host > * os_compute_api:os-aggregates:remove_host > > This means we could make it so our technicians could *ONLY* manage > this aspect of our cloud. > With placement API, it's all or nothing. (and found some weeks ago > that it's hardcoded to the "admin" role) > And you now have to craft your own curl calls and no more UI in > Horizon. (let me know if I missed something regarding the ACL) > > I will read about placement API and see with my coworkers how we could > adapt our systems/tools to use placement API instead. (assuming > disable_allocation_ratio_autoset will be implemented) > But ACL is a big concern for us if we go down that path. OK, I think I may have stumbled upon a possible solution to this that would allow you to keep using the same host aggregate metadata APIs for setting allocation ratios. See below. > While I agree there are very technical/raw solutions to the issue > (like the ones you suggested), please understand that from our side, > this is still a major regression in the usability of OpenStack from an > operator point of view. Yes, understood. > And it's unfortunate that I feel I now have to play catch up and > explain my concerns about a "fait accompli" that wasn't well > communicated to the operators and wasn't clearly mentioned in the > release notes. > I would have appreciated an email to the ops list explaining the > proposed change and if anyone has concerns/comments about it. I don't > often reply but I feel like I would have this time as this is a major > change for us. Agree with you. Frankly, I did not realize this would be an issue. Had I known, of course we would have sent a note out about this and consulted with operators ahead of time. Anyway, on to a possible solution. For background, please see this bug: https://bugs.launchpad.net/nova/+bug/1742747 When looking at that bug and the associated patch, I couldn't help but think that perhaps we could just change the default behaviour of the resource tracker when it encounters a nova.conf CONF.cpu_allocation_ratio value of 0.0. The current behaviour of the nova-compute resource tracker is to follow the policy outlined in the CONF option's documentation: [1] "From Ocata (15.0.0) this is used to influence the hosts selected by the Placement API. Note that when Placement is used, the CoreFilter is redundant, because the Placement API will have already filtered out hosts that would have failed the CoreFilter. This configuration specifies ratio for CoreFilter which can be set per compute node. For AggregateCoreFilter, it will fall back to this configuration value if no per-aggregate setting is found. NOTE: This can be set per-compute, or if set to 0.0, the value set on the scheduler node(s) or compute node(s) will be used and defaulted to 16.0." [1] https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L407-L418 What I believe we can do is change the behaviour so that if a 0.0 value is found in the nova.conf file on the nova-compute worker, then instead of defaulting to 16.0, the resource tracker would first look to see if the compute node was associated with a host aggregate that had the "cpu_allocation_ratio" a metadata item. If one was found, then the host aggregate's cpu_allocation_ratio would be used. If not, then the 16.0 default would be used. What do you think? 
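To make the operator-facing side concrete: with cpu_allocation_ratio left at its 0.0 default in nova.conf on the computes, the ratio would keep being driven from the aggregate exactly as you do today, e.g. (aggregate name is just an example):

openstack aggregate set --property cpu_allocation_ratio=4.0 prod-compute

and the resource tracker would pick up 4.0 for hosts in that aggregate instead of falling back to 16.0.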
Best, -jay From chris.friesen at windriver.com Mon Jan 29 17:40:23 2018 From: chris.friesen at windriver.com (Chris Friesen) Date: Mon, 29 Jan 2018 11:40:23 -0600 Subject: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata In-Reply-To: <37aa1d2c-8e80-8cf3-2a92-444916b3b80f@gmail.com> References: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> <57306d1a-529b-3907-7c5a-a9b46057b236@gmail.com> <37aa1d2c-8e80-8cf3-2a92-444916b3b80f@gmail.com> Message-ID: <5A6F5C87.1020904@windriver.com> On 01/29/2018 07:47 AM, Jay Pipes wrote: > What I believe we can do is change the behaviour so that if a 0.0 value is found > in the nova.conf file on the nova-compute worker, then instead of defaulting to > 16.0, the resource tracker would first look to see if the compute node was > associated with a host aggregate that had the "cpu_allocation_ratio" a metadata > item. If one was found, then the host aggregate's cpu_allocation_ratio would be > used. If not, then the 16.0 default would be used. Presumably you'd need to handle the case where the host is in multiple host aggregates that have "cpu_allocation_ratio" as a metadata item. I think the AggregateCoreFilter uses the smallest ratio in this case. Chris From jaypipes at gmail.com Mon Jan 29 17:41:58 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Mon, 29 Jan 2018 12:41:58 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata In-Reply-To: <5A6F5C87.1020904@windriver.com> References: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> <57306d1a-529b-3907-7c5a-a9b46057b236@gmail.com> <37aa1d2c-8e80-8cf3-2a92-444916b3b80f@gmail.com> <5A6F5C87.1020904@windriver.com> Message-ID: On 01/29/2018 12:40 PM, Chris Friesen wrote: > On 01/29/2018 07:47 AM, Jay Pipes wrote: > >> What I believe we can do is change the behaviour so that if a 0.0 >> value is found >> in the nova.conf file on the nova-compute worker, then instead of >> defaulting to >> 16.0, the resource tracker would first look to see if the compute node >> was >> associated with a host aggregate that had the "cpu_allocation_ratio" a >> metadata >> item. If one was found, then the host aggregate's cpu_allocation_ratio >> would be >> used. If not, then the 16.0 default would be used. > > Presumably you'd need to handle the case where the host is in multiple > host aggregates that have "cpu_allocation_ratio" as a metadata item.  I > think the AggregateCoreFilter uses the smallest ratio in this case. Yes, this is one of the many issues with the host aggregate metadata implementation. -jay From itzshamail at gmail.com Mon Jan 29 17:50:10 2018 From: itzshamail at gmail.com (Shamail Tahir) Date: Mon, 29 Jan 2018 12:50:10 -0500 Subject: [Openstack-operators] [User-committee] Stepping aside announcement In-Reply-To: <1BDADB08-9333-4A24-BC5A-1058F14EADD6@workday.com> References: <1BDADB08-9333-4A24-BC5A-1058F14EADD6@workday.com> Message-ID: <4B0866E8-8D5D-4AD9-A3CA-6C2784706EFB@gmail.com> > On Jan 29, 2018, at 12:12 PM, Edgar Magana wrote: > > Dear Community, > > This is an overdue announcement but I was waiting for the right moment and today it is with the opening of the UC election. It has been almost seven years of full commitment to OpenStack and the entire ecosystem around it. During the last couple of years, I had the opportunity to serve as Chair of the User Committee. 
I have participated in this role with nothing more important but passion and dedication for the users and operators. OpenStack has been very important for me and it will be always the most enjoyable work I have ever done. > > It is time to move on. Our team is extending its focus to other cloud domains and I will be leading one of the those. Therefore, I would like to announce that I am stepping aside from my role as UC Chair. Per our UC election, there will be no just 2 seats available but three: https://governance.openstack.org/uc/reference/uc-election-feb2018.html Thank you for everything you’ve done for the community thus far Edgar! Your leadership has been instrumental in helping us evolve over the last 2-3 cycles. I hope you are still able to participate in the community even after you leave the User Committee. > > I want to encourage the whole AUC community to participate, be part of the User Committee is a very important and grateful activity. Please, go for it! > > Thank you all, > > Edgar Magana > Sr. Principal Architect > Workday, Inc. > > > > _______________________________________________ > User-committee mailing list > User-committee at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Mon Jan 29 17:59:55 2018 From: amy at demarco.com (Amy Marrich) Date: Mon, 29 Jan 2018 11:59:55 -0600 Subject: [Openstack-operators] [User-committee] Stepping aside announcement In-Reply-To: <1BDADB08-9333-4A24-BC5A-1058F14EADD6@workday.com> References: <1BDADB08-9333-4A24-BC5A-1058F14EADD6@workday.com> Message-ID: Edgar, Thank you for all your hard work and contributions! Amy (spotz) On Mon, Jan 29, 2018 at 11:12 AM, Edgar Magana wrote: > Dear Community, > > > > This is an overdue announcement but I was waiting for the right moment and > today it is with the opening of the UC election. It has been almost seven > years of full commitment to OpenStack and the entire ecosystem around it. > During the last couple of years, I had the opportunity to serve as Chair of > the User Committee. I have participated in this role with nothing more > important but passion and dedication for the users and operators. OpenStack > has been very important for me and it will be always the most enjoyable > work I have ever done. > > > > It is time to move on. Our team is extending its focus to other cloud > domains and I will be leading one of the those. Therefore, I would like to > announce that I am stepping aside from my role as UC Chair. Per our UC > election, there will be no just 2 seats available but three: > https://governance.openstack.org/uc/reference/uc-election-feb2018.html > > > > I want to encourage the whole AUC community to participate, be part of the > User Committee is a very important and grateful activity. Please, go for it! > > > > Thank you all, > > > > Edgar Magana > > Sr. Principal Architect > > Workday, Inc. > > > > > > > > _______________________________________________ > User-committee mailing list > User-committee at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mrhillsman at gmail.com Mon Jan 29 18:44:58 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Mon, 29 Jan 2018 12:44:58 -0600 Subject: [Openstack-operators] [User-committee] Stepping aside announcement In-Reply-To: References: <1BDADB08-9333-4A24-BC5A-1058F14EADD6@workday.com> Message-ID: Thanks for your service to the community Edgar! Hope to see you at an event soon and we can toast to your departure and continued success! On Mon, Jan 29, 2018 at 11:59 AM, Amy Marrich wrote: > Edgar, > > Thank you for all your hard work and contributions! > > Amy (spotz) > > On Mon, Jan 29, 2018 at 11:12 AM, Edgar Magana > wrote: > >> Dear Community, >> >> >> >> This is an overdue announcement but I was waiting for the right moment >> and today it is with the opening of the UC election. It has been almost >> seven years of full commitment to OpenStack and the entire ecosystem around >> it. During the last couple of years, I had the opportunity to serve as >> Chair of the User Committee. I have participated in this role with nothing >> more important but passion and dedication for the users and operators. >> OpenStack has been very important for me and it will be always the most >> enjoyable work I have ever done. >> >> >> >> It is time to move on. Our team is extending its focus to other cloud >> domains and I will be leading one of the those. Therefore, I would like to >> announce that I am stepping aside from my role as UC Chair. Per our UC >> election, there will be no just 2 seats available but three: >> https://governance.openstack.org/uc/reference/uc-election-feb2018.html >> >> >> >> I want to encourage the whole AUC community to participate, be part of >> the User Committee is a very important and grateful activity. Please, go >> for it! >> >> >> >> Thank you all, >> >> >> >> Edgar Magana >> >> Sr. Principal Architect >> >> Workday, Inc. >> >> >> >> >> >> >> >> _______________________________________________ >> User-committee mailing list >> User-committee at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee >> >> > > _______________________________________________ > User-committee mailing list > User-committee at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee > > -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Arkady.Kanevsky at dell.com Mon Jan 29 20:31:12 2018 From: Arkady.Kanevsky at dell.com (Arkady.Kanevsky at dell.com) Date: Mon, 29 Jan 2018 20:31:12 +0000 Subject: [Openstack-operators] [User-committee] Stepping aside announcement In-Reply-To: References: <1BDADB08-9333-4A24-BC5A-1058F14EADD6@workday.com> Message-ID: <7fc1f445dba54c3bb72c2815e44fde56@AUSX13MPS308.AMER.DELL.COM> Edgar, thank you for all your hard work and passion. From: Melvin Hillsman [mailto:mrhillsman at gmail.com] Sent: Monday, January 29, 2018 12:45 PM To: Amy Marrich Cc: openstack at lists.openstack.org; openstack-operators ; user-committee Subject: Re: [User-committee] Stepping aside announcement Thanks for your service to the community Edgar! Hope to see you at an event soon and we can toast to your departure and continued success! On Mon, Jan 29, 2018 at 11:59 AM, Amy Marrich > wrote: Edgar, Thank you for all your hard work and contributions! 
Amy (spotz) On Mon, Jan 29, 2018 at 11:12 AM, Edgar Magana > wrote: Dear Community, This is an overdue announcement but I was waiting for the right moment and today it is with the opening of the UC election. It has been almost seven years of full commitment to OpenStack and the entire ecosystem around it. During the last couple of years, I had the opportunity to serve as Chair of the User Committee. I have participated in this role with nothing more important but passion and dedication for the users and operators. OpenStack has been very important for me and it will be always the most enjoyable work I have ever done. It is time to move on. Our team is extending its focus to other cloud domains and I will be leading one of the those. Therefore, I would like to announce that I am stepping aside from my role as UC Chair. Per our UC election, there will be no just 2 seats available but three: https://governance.openstack.org/uc/reference/uc-election-feb2018.html I want to encourage the whole AUC community to participate, be part of the User Committee is a very important and grateful activity. Please, go for it! Thank you all, Edgar Magana Sr. Principal Architect Workday, Inc. _______________________________________________ User-committee mailing list User-committee at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee _______________________________________________ User-committee mailing list User-committee at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From yihleong at gmail.com Mon Jan 29 22:17:36 2018 From: yihleong at gmail.com (Yih Leong, Sun.) Date: Mon, 29 Jan 2018 14:17:36 -0800 Subject: [Openstack-operators] [User-committee] Stepping aside announcement In-Reply-To: <1BDADB08-9333-4A24-BC5A-1058F14EADD6@workday.com> References: <1BDADB08-9333-4A24-BC5A-1058F14EADD6@workday.com> Message-ID: Sad to hear but thanks for your contributions and I enjoyed the time working with you! On Monday, January 29, 2018, Edgar Magana wrote: > Dear Community, > > > > This is an overdue announcement but I was waiting for the right moment and > today it is with the opening of the UC election. It has been almost seven > years of full commitment to OpenStack and the entire ecosystem around it. > During the last couple of years, I had the opportunity to serve as Chair of > the User Committee. I have participated in this role with nothing more > important but passion and dedication for the users and operators. OpenStack > has been very important for me and it will be always the most enjoyable > work I have ever done. > > > > It is time to move on. Our team is extending its focus to other cloud > domains and I will be leading one of the those. Therefore, I would like to > announce that I am stepping aside from my role as UC Chair. Per our UC > election, there will be no just 2 seats available but three: > https://governance.openstack.org/uc/reference/uc-election-feb2018.html > > > > I want to encourage the whole AUC community to participate, be part of the > User Committee is a very important and grateful activity. Please, go for it! > > > > Thank you all, > > > > Edgar Magana > > Sr. Principal Architect > > Workday, Inc. > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mgagne at calavera.ca Mon Jan 29 23:30:52 2018 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Mon, 29 Jan 2018 18:30:52 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata In-Reply-To: <37aa1d2c-8e80-8cf3-2a92-444916b3b80f@gmail.com> References: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> <57306d1a-529b-3907-7c5a-a9b46057b236@gmail.com> <37aa1d2c-8e80-8cf3-2a92-444916b3b80f@gmail.com> Message-ID: Hi Jay, First, thank you very much for the followup. Response inline. On Mon, Jan 29, 2018 at 8:47 AM, Jay Pipes wrote: > Greetings again, Mathieu, response inline... > > On 01/18/2018 07:24 PM, Mathieu Gagné wrote: >> >> So far, a couple challenges/issues: >> >> We used to have fine grain control over the calls a user could make to >> the Nova API: >> * os_compute_api:os-aggregates:add_host >> * os_compute_api:os-aggregates:remove_host >> >> This means we could make it so our technicians could *ONLY* manage >> this aspect of our cloud. >> With placement API, it's all or nothing. (and found some weeks ago >> that it's hardcoded to the "admin" role) >> And you now have to craft your own curl calls and no more UI in >> Horizon. (let me know if I missed something regarding the ACL) >> >> I will read about placement API and see with my coworkers how we could >> adapt our systems/tools to use placement API instead. (assuming >> disable_allocation_ratio_autoset will be implemented) >> But ACL is a big concern for us if we go down that path. > > > OK, I think I may have stumbled upon a possible solution to this that would > allow you to keep using the same host aggregate metadata APIs for setting > allocation ratios. See below. > >> While I agree there are very technical/raw solutions to the issue >> (like the ones you suggested), please understand that from our side, >> this is still a major regression in the usability of OpenStack from an >> operator point of view. > > > Yes, understood. > >> And it's unfortunate that I feel I now have to play catch up and >> explain my concerns about a "fait accompli" that wasn't well >> communicated to the operators and wasn't clearly mentioned in the >> release notes. >> I would have appreciated an email to the ops list explaining the >> proposed change and if anyone has concerns/comments about it. I don't >> often reply but I feel like I would have this time as this is a major >> change for us. > > > Agree with you. Frankly, I did not realize this would be an issue. Had I > known, of course we would have sent a note out about this and consulted with > operators ahead of time. > > Anyway, on to a possible solution. > > For background, please see this bug: > > https://bugs.launchpad.net/nova/+bug/1742747 > > When looking at that bug and the associated patch, I couldn't help but think > that perhaps we could just change the default behaviour of the resource > tracker when it encounters a nova.conf CONF.cpu_allocation_ratio value of > 0.0. > > The current behaviour of the nova-compute resource tracker is to follow the > policy outlined in the CONF option's documentation: [1] > > "From Ocata (15.0.0) this is used to influence the hosts selected by > the Placement API. Note that when Placement is used, the CoreFilter > is redundant, because the Placement API will have already filtered > out hosts that would have failed the CoreFilter. > > This configuration specifies ratio for CoreFilter which can be set > per compute node. 
For AggregateCoreFilter, it will fall back to this > configuration value if no per-aggregate setting is found. > > NOTE: This can be set per-compute, or if set to 0.0, the value > set on the scheduler node(s) or compute node(s) will be used > and defaulted to 16.0." > > [1] > https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L407-L418 > > What I believe we can do is change the behaviour so that if a 0.0 value is > found in the nova.conf file on the nova-compute worker, then instead of > defaulting to 16.0, the resource tracker would first look to see if the > compute node was associated with a host aggregate that had the > "cpu_allocation_ratio" a metadata item. If one was found, then the host > aggregate's cpu_allocation_ratio would be used. If not, then the 16.0 > default would be used. > > What do you think? > The bug and related discussions to resource tracker are a bit too technical for me so I can't comment on that. (yet) I however get the feeling that an aggregate based solution would be "hackish" and Nova developers would prefer a placement centric solution. As mentioned, I would have to educate myself about placement API and see how I can adapt our flow to fit the new mechanism. So lets explore what would looks like a placement centric solution. (let me know if I get anything wrong) Here are our main concerns/challenges so far, which I will compare to our current flow: 1. Compute nodes should not be enabled by default When adding new compute node, we add them to a special aggregate which makes scheduling instances on it impossible. (with an impossible condition) We are aware that there is a timing issue where someone *could* schedule an instance on it if we aren't fast enough. So far, it has never been a major issue for us. If we want to move away from aggregates, the "enable_new_services" config (since Ocata) would be a great solution to our need. I don't think Placement needs to be involved in this case, unless you can show me a better alternative solution. 2. Ability to move compute nodes around through API (without configuration management system) We use aggregate to isolate our flavor series which are mainly based on cpu allocation ratio. This means distinct capacity pool of compute nodes for each flavor series. We can easily move around our compute nodes if one aggregate (or capacity pool) need more compute nodes through API. There is no need to deploy a new version of our configuration management system. Based on your previous comment, Nova developers could implement a way so configuration file is no longer used when pushing ratios to Placement. An op should be able to provide the ratio himself through the Placement API. Some questions: * What's the default ratio if none is provided? (none through config file or API) * How can I restrict flavors to matching hosts? Will Placement respect allocation ratios provided by a flavor and find corresponding compute nodes? I couldn't find details on that one in previous emails. Some challenges: * Find a way to easily visualise/monitor available capacity/hosts per capacity pool (previously aggregate) 3. Ability to delegate above operations to ops With aggregates, you can easily precisely delegate host memberships to ops or other people using the corresponding policies: * os_compute_api:os-aggregates:add_host * os_compute_api:os-aggregates:remove_host And those people can be autonomous with the API without having to redeploy anything. 
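For example, that delegation today is nothing more than two entries in nova's policy file, along these lines (the custom role name is only an illustration, not something nova ships by default):

{
    "os_compute_api:os-aggregates:add_host": "role:admin or role:aggregate_operator",
    "os_compute_api:os-aggregates:remove_host": "role:admin or role:aggregate_operator"
}

A technician holding only that role can manage aggregate membership and nothing else.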
Being able to manipulate/administer the hosts through an API is golden and therefore totally disagree with "Config management is the solution to your problem". With Placement API, there is no fine grain control over who/what can be done through the API. (it's even hardcoded to the "admin" role) So there is some work to be done here: * Remove hardcoded "admin" role from code. Already being work on by Matt Riedemann [1] * Fine grain control/policies for Placement API. The last point needs a bit more work. If we can implement control on resource providers, allocations, usages, traits, etc, I will be happy. In the end, that's one of the major "regression" I found with placement. I don't want a human to be able to do more than it should be able to do and break everything. So I'm not ready to cross that river yet, I'm still running Mitaka. But I need to make sure I'm not stuck when Ocata happens for us. [1] https://review.openstack.org/#/c/524425/ -- Mathieu From mgagne at calavera.ca Mon Jan 29 23:48:16 2018 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Mon, 29 Jan 2018 18:48:16 -0500 Subject: [Openstack-operators] [openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata In-Reply-To: <37aa1d2c-8e80-8cf3-2a92-444916b3b80f@gmail.com> References: <1253b153-9e7e-6dbe-1836-5a9a2f059bdb@gmail.com> <57306d1a-529b-3907-7c5a-a9b46057b236@gmail.com> <37aa1d2c-8e80-8cf3-2a92-444916b3b80f@gmail.com> Message-ID: On Mon, Jan 29, 2018 at 8:47 AM, Jay Pipes wrote: > > What I believe we can do is change the behaviour so that if a 0.0 value is > found in the nova.conf file on the nova-compute worker, then instead of > defaulting to 16.0, the resource tracker would first look to see if the > compute node was associated with a host aggregate that had the > "cpu_allocation_ratio" a metadata item. If one was found, then the host > aggregate's cpu_allocation_ratio would be used. If not, then the 16.0 > default would be used. > > What do you think? > > Best, > -jay > I don't mind this kind of fix. I just want to make sure developers will be able to support it in the future and not yank it out of frustration. =) If a proper fix can be found later, I don't mind. But I want to make sure I don't end up in a "broken" version where I would need to skip 1-2 releases to get a working fix. -- Mathieu From sagarun at gmail.com Tue Jan 30 00:45:46 2018 From: sagarun at gmail.com (Arun SAG) Date: Mon, 29 Jan 2018 16:45:46 -0800 Subject: [Openstack-operators] Race in FixedIP.associate_pool In-Reply-To: References: Message-ID: Hello, On Tue, Dec 12, 2017 at 12:22 PM, Arun SAG wrote: > Hello, > > We are running nova-network in ocata. We use mysql in a master-slave > configuration, The master is read/write, and all reads go to the slave > (slave_connection is set). When we tried to boot multiple VMs in > parallel (lets say 15), we see a race in allocate_for_instance's > FixedIP.associate_pool. We see FixedIP.associate_pool associates an > IP, but later in the code we try to read the allocated FixedIP using > objects.FixedIPList.get_by_instance_uuid and it throws > FixedIPNotFoundException. 
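For reference, the read/write split described above is just the standard oslo.db pair of options in nova.conf, roughly as sketched below; the hostnames and credentials here are placeholders, not real endpoints:

[database]
# Writes and transactional reads go to the master...
connection = mysql+pymysql://nova:PASSWORD@db-master/nova
# ...while reads may be routed to the asynchronously replicated slave.
slave_connection = mysql+pymysql://nova:PASSWORD@db-slave/nova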
We also checked the slave replication status > and Seconds_Behind_Master: 0 > [snip] > > This kind of how the logs look like > 2017-12-08 22:33:37,124 DEBUG > [yahoo.contrib.ocata_openstack_yahoo_plugins.nova.network.manager] > /opt/openstack/venv/nova/lib/python2.7/site-packages/yahoo/contrib/ocata_openstack_yahoo_plugins/nova/network/manager.py:get_instance_nw_info:894 > Fixed IP NOT found for instance > 2017-12-08 22:33:37,125 DEBUG > [yahoo.contrib.ocata_openstack_yahoo_plugins.nova.network.manager] > /opt/openstack/venv/nova/lib/python2.7/site-packages/yahoo/contrib/ocata_openstack_yahoo_plugins/nova/network/manager.py:get_instance_nw_info:965 > Built network info: |[]| > 2017-12-08 22:33:37,126 INFO [nova.network.manager] > /opt/openstack/venv/nova/lib/python2.7/site-packages/nova/network/manager.py:allocate_for_instance:428 > Allocated network: '[]' for instance > 2017-12-08 22:33:37,126 ERROR [oslo_messaging.rpc.server] > /opt/openstack/venv/nova/lib/python2.7/site-packages/oslo_messaging/rpc/server.py:_process_incoming:164 > Exception during message handling > Traceback (most recent call last): > File "/opt/openstack/venv/nova/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", > line 155, in _process_incoming > res = self.dispatcher.dispatch(message) > File "/opt/openstack/venv/nova/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 222, in dispatch > return self._do_dispatch(endpoint, method, ctxt, args) > File "/opt/openstack/venv/nova/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", > line 192, in _do_dispatch > result = func(ctxt, **new_args) > File "/opt/openstack/venv/nova/lib/python2.7/site-packages/yahoo/contrib/ocata_openstack_yahoo_plugins/nova/network/manager.py", > line 347, in allocate_for_instance > vif = nw_info[0] > IndexError: list index out of range > > > This problem goes way when we get rid of the slave_connection setting > and just use single master. Has any one else seen this? Any > recommendation to fix this issue? > > This issue is kind of similar to https://bugs.launchpad.net/nova/+bug/1249065 > If anyone is running into db race while running database in master-slave mode with async replication, The bug has been identified and getting fixed here https://bugs.launchpad.net/oslo.db/+bug/1746116 -- Arun S A G http://zer0c00l.in/ From inc007 at gmail.com Tue Jan 30 01:00:35 2018 From: inc007 at gmail.com (=?UTF-8?B?TWljaGHFgiBKYXN0cnrEmWJza2k=?=) Date: Mon, 29 Jan 2018 17:00:35 -0800 Subject: [Openstack-operators] [all][kolla][rdo] Collaboration with Kolla for the RDO test days In-Reply-To: References: Message-ID: Cool, thank you David, sign me up!:) On 29 January 2018 at 05:30, David Moreau Simard wrote: > Hi ! > > For those who might be unfamiliar with the RDO [1] community project: > we hang out in #rdo, we don't bite and we build vanilla OpenStack > packages. > > These packages are what allows you to leverage one of the deployment > projects such as TripleO, PackStack or Kolla to deploy on CentOS or > RHEL. > The RDO community collaborates with these deployment projects by > providing trunk and stable packages in order to let them develop and > test against the latest and the greatest of OpenStack. > > RDO test days typically happen around a week after an upstream > milestone has been reached [2]. > The purpose is to get everyone together in #rdo: developers, users, > operators, maintainers -- and test not just RDO but OpenStack itself > as installed by the different deployment projects. 
> > We tried something new at our last test day [3] and it worked out great. > Instead of encouraging participants to install their own cloud for > testing things, we supplied a cloud of our own... a bit like a limited > duration TryStack [4]. > This lets users without the operational knowledge, time or hardware to > install an OpenStack environment to see what's coming in the upcoming > release of OpenStack and get the feedback loop going ahead of the > release. > > We used Packstack for the last deployment and invited Packstack cores > to deploy, operate and troubleshoot the installation for the duration > of the test days. > The idea is to rotate between the different deployment projects to > give every interested project a chance to participate. > > Last week, we reached out to Kolla to see if they would be interested > in participating in our next RDO test days [5] around February 8th. > We supply the bare metal hardware and their core contributors get to > deploy and operate a cloud with real users and developers poking > around. > All around, this is a great opportunity to get feedback for RDO, Kolla > and OpenStack. > > We'll be advertising the event a bit more as the test days draw closer > but until then, I thought it was worthwhile to share some context for > this new thing we're doing. > > Let me know if you have any questions ! > > Thanks, > > [1]: https://www.rdoproject.org/ > [2]: https://www.rdoproject.org/testday/ > [3]: https://dmsimard.com/2017/11/29/come-try-a-real-openstack-queens-deployment/ > [4]: http://trystack.org/ > [5]: http://eavesdrop.openstack.org/meetings/kolla/2018/kolla.2018-01-24-16.00.log.html > > David Moreau Simard > Senior Software Engineer | OpenStack RDO > > dmsimard = [irc, github, twitter] > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From ignaziocassano at gmail.com Tue Jan 30 10:20:45 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 30 Jan 2018 11:20:45 +0100 Subject: [Openstack-operators] zabbix templates for openstack Message-ID: Hello guys, I am searching some zabbix templates for openstack for monitoring either openstack infrastructure or openstack instances. I read zcp ceilometer proxy could be useful but I'd like to know if it works with gnocchi because I am using ocata release. At this time zcp seems to work with ceilometer and mongo. Please, has anyone found any template or tool for the above purpose ? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Tue Jan 30 16:16:48 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 30 Jan 2018 11:16:48 -0500 Subject: [Openstack-operators] ops meetups team meeting 2018-1-30 Message-ID: Today's meeting was pushed one hour later and we had much better attendance. It was agreed to move the meeting to 10 EST permanently, which is 1300 UTC currently. This unfortunately means it will move around in regions which do not honor daylight savings time. We also agreed that the conversion of the OpenStack Operators Guide (formerly maintained with the rest of the openstack docs) to a wiki is a suitable starting point, please see https://wiki.openstack.org/wiki/OpsGuide and note that now the conversion has been agreed, we plan to update the content (mostly untouched, currently, since 2013). 
If you have comments but do not have privilege to edit the wiki directly, please post your comments to this mailing list (openstack operators). Meeting Minutes are here http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-01-30-15.01.html (text): http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-01-30-15.01.txt Log: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-01-30-15.01.log.html Regards, Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig.openstack at telfer.org Tue Jan 30 22:42:12 2018 From: stig.openstack at telfer.org (Stig Telfer) Date: Tue, 30 Jan 2018 22:42:12 +0000 Subject: [Openstack-operators] [scientific] IRC meeting: preemptible instances and upcoming events Message-ID: <304614B7-A3FC-45EC-9BCD-2EFAE1288B72@telfer.org> Hi All - We have a Scientific SIG IRC meeting on Wednesday at 1100 UTC in channel #openstack-meeting. Everyone is welcome. This week’s agenda includes an update on recent work towards preemptible “spot” instances. We also have a few events on the calendar to discuss and plan for. Full agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_31st_2018 Best wishes Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From melwittt at gmail.com Tue Jan 30 23:59:00 2018 From: melwittt at gmail.com (melanie witt) Date: Tue, 30 Jan 2018 15:59:00 -0800 Subject: [Openstack-operators] [openstack-dev] Race in FixedIP.associate_pool In-Reply-To: References: Message-ID: <72BDE5F0-112C-4C1F-87ED-69C508716C68@gmail.com> > On Jan 29, 2018, at 16:45, Arun SAG wrote: > > If anyone is running into db race while running database in > master-slave mode with async replication, The bug has been identified > and getting fixed here > https://bugs.launchpad.net/oslo.db/+bug/1746116 Thanks for your persistence in tracking down the problem and raising the bug. If you get a chance, do please test the proposed patch in your environment to help ensure there aren’t any loose ends left. Once the fix is merged, I think we should propose backports to stable/pike and stable/ocata, do .z releases for them, and bump oslo.db in upper-constraints.txt in the openstack/requirements repo for the stable/pike and stable/ocata branches. That way, operators running from stable can get the fix by upgrading their oslo.db packages to those .z releases. -melanie From shilla.saebi at gmail.com Wed Jan 31 00:57:42 2018 From: shilla.saebi at gmail.com (Shilla Saebi) Date: Tue, 30 Jan 2018 19:57:42 -0500 Subject: [Openstack-operators] User Committee Election coming soon! Message-ID: Hello Everyone, The OpenStack User Committee will be holding an election in February, per the User Committee (UC) bylaws and charter. The current UC will serve until the elections in February, and at that point, the current two UC members who still have 6 months to serve, get a 6-month seat, and an election is run to determine the other three members. Candidates ranking 1st, 2nd, and 3rd, will get a one-year seat. Voting for the 2018 UC members will be granted to the Active User Contributors (AUC). Open candidacy for the UC positions will be from January 29 - February 11, 05:59 UTC. Voting for the User Committee (UC) members will be open on February 12th and will remain open until February 18, 11:59 UTC.
As a reminder, please see the community code of conduct ( http://www.openstack.org/legal/community-code-of-conduct/) Please let me, or anyone from the UC know if you have any questions, comments or concerns. Thank you, Shilla Saebi -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias at citynetwork.se Wed Jan 31 11:37:20 2018 From: tobias at citynetwork.se (Tobias Rydberg) Date: Wed, 31 Jan 2018 12:37:20 +0100 Subject: [Openstack-operators] [publiccloud-wg] Reminder for todays meeting Message-ID: <506bf34d-12b6-8c2c-05f6-2ba0195e04ee@citynetwork.se> Hi all, Time again for a meeting for the Public Cloud WG - today at 1400 UTC in #openstack-meeting-3 Agenda and etherpad at: https://etherpad.openstack.org/p/publiccloud-wg See you later! Tobias Rydberg -- Tobias Rydberg Senior Developer Mobile: +46 733 312780 www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3945 bytes Desc: S/MIME Cryptographic Signature URL: From jean-daniel.bonnetot at corp.ovh.com Wed Jan 31 13:46:21 2018 From: jean-daniel.bonnetot at corp.ovh.com (Jean-Daniel Bonnetot) Date: Wed, 31 Jan 2018 13:46:21 +0000 Subject: [Openstack-operators] [User-committee] [publiccloud-wg] Reminder for todays meeting In-Reply-To: <506bf34d-12b6-8c2c-05f6-2ba0195e04ee@citynetwork.se> References: <506bf34d-12b6-8c2c-05f6-2ba0195e04ee@citynetwork.se> Message-ID: <74312438-3380-4728-8909-466CA8FFC8E5@corp.ovh.com> Hi, For me it's not the best time frame. A have a weekly meeting on that time. Is it possible to move our weekly meeting 30min later ? If it's not possible for this meeting, maybe it's doable for the next ones? Jean-Daniel Bonnetot ovh.com | @pilgrimstack On 31 Jan 2018, at 12:37, Tobias Rydberg > wrote: Hi all, Time again for a meeting for the Public Cloud WG - today at 1400 UTC in #openstack-meeting-3 Agenda and etherpad at: https://etherpad.openstack.org/p/publiccloud-wg See you later! Tobias Rydberg -- Tobias Rydberg Senior Developer Mobile: +46 733 312780 www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED _______________________________________________ User-committee mailing list User-committee at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias at citynetwork.se Wed Jan 31 13:55:37 2018 From: tobias at citynetwork.se (Tobias Rydberg) Date: Wed, 31 Jan 2018 14:55:37 +0100 Subject: [Openstack-operators] [User-committee] [publiccloud-wg] Reminder for todays meeting In-Reply-To: <74312438-3380-4728-8909-466CA8FFC8E5@corp.ovh.com> References: <506bf34d-12b6-8c2c-05f6-2ba0195e04ee@citynetwork.se> <74312438-3380-4728-8909-466CA8FFC8E5@corp.ovh.com> Message-ID: <75d5792f-8b3a-c988-888b-fbd33039c651@citynetwork.se> Hi, It's maybe time for a new doodle about bi-weekly meeting ... potentially also about going to weekly meeting? Personally I'm fine with morning times as well, if that is better for more people. Tobias On 2018-01-31 14:46, Jean-Daniel Bonnetot wrote: > Hi, > > For me it's not the best time frame. A have a weekly meeting on that time. > Is it possible to move our weekly meeting 30min later ? > If it's not possible for this meeting, maybe it's doable for the next > ones? 
> > Jean-Daniel Bonnetot > ovh.com | @pilgrimstack > > > > > >> On 31 Jan 2018, at 12:37, Tobias Rydberg > > wrote: >> >> Hi all, >> >> Time again for a meeting for the Public Cloud WG - today at 1400 UTC >> in #openstack-meeting-3 >> >> Agenda and etherpad at: >> https://etherpad.openstack.org/p/publiccloud-wg >> >> >> See you later! >> >> Tobias Rydberg >> >> -- >> Tobias Rydberg >> Senior Developer >> Mobile: +46 733 312780 >> >> www.citynetwork.eu | www.citycloud.com >> >> >> INNOVATION THROUGH OPEN IT INFRASTRUCTURE >> ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED >> >> >> _______________________________________________ >> User-committee mailing list >> User-committee at lists.openstack.org >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3945 bytes Desc: S/MIME Cryptographic Signature URL: From zhipengh512 at gmail.com Wed Jan 31 14:01:06 2018 From: zhipengh512 at gmail.com (Zhipeng Huang) Date: Wed, 31 Jan 2018 22:01:06 +0800 Subject: [Openstack-operators] [User-committee] [publiccloud-wg] Reminder for todays meeting In-Reply-To: <75d5792f-8b3a-c988-888b-fbd33039c651@citynetwork.se> References: <506bf34d-12b6-8c2c-05f6-2ba0195e04ee@citynetwork.se> <74312438-3380-4728-8909-466CA8FFC8E5@corp.ovh.com> <75d5792f-8b3a-c988-888b-fbd33039c651@citynetwork.se> Message-ID: shall we cancel the meeting for today then ? On Wed, Jan 31, 2018 at 9:55 PM, Tobias Rydberg wrote: > Hi, > > It's maybe time for a new doodle about bi-weekly meeting ... potentially > also about going to weekly meeting? > > Personally I'm fine with morning times as well, if that is better for more > people. > > Tobias > > On 2018-01-31 14:46, Jean-Daniel Bonnetot wrote: > > Hi, > > For me it's not the best time frame. A have a weekly meeting on that time. > Is it possible to move our weekly meeting 30min later ? > If it's not possible for this meeting, maybe it's doable for the next ones? > > Jean-Daniel Bonnetot > ovh.com | @pilgrimstack > > > > > > On 31 Jan 2018, at 12:37, Tobias Rydberg wrote: > > Hi all, > > Time again for a meeting for the Public Cloud WG - today at 1400 UTC in > #openstack-meeting-3 > > Agenda and etherpad at: https://etherpad.openstack.org/p/publiccloud-wg > > See you later! > > Tobias Rydberg > > -- > Tobias Rydberg > Senior Developer > Mobile: +46 733 312780 > > www.citynetwork.eu | www.citycloud.com > > INNOVATION THROUGH OPEN IT INFRASTRUCTURE > ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED > > > _______________________________________________ > User-committee mailing list > User-committee at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee > > > > > _______________________________________________ > User-committee mailing list > User-committee at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee > > -- Zhipeng (Howard) Huang Standard Engineer IT Standard & Patent/IT Product Line Huawei Technologies Co,. Ltd Email: huangzhipeng at huawei.com Office: Huawei Industrial Base, Longgang, Shenzhen (Previous) Research Assistant Mobile Ad-Hoc Network Lab, Calit2 University of California, Irvine Email: zhipengh at uci.edu Office: Calit2 Building Room 2402 OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From johnsomor at gmail.com Wed Jan 31 17:52:16 2018 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 31 Jan 2018 09:52:16 -0800 Subject: [Openstack-operators] [neutron][lbaas][neutron-lbaas][octavia] Announcing the deprecation of neutron-lbaas and neutron-lbaas-dashboard Message-ID: Today we are announcing the start of the deprecation cycle for neutron-lbaas and neutron-lbaas-dashboard. As part of the neutron stadium evolution [1], neutron-lbaas was identified as a project that should spin out of neutron and become its own project. The specification detailing this process was approved [2] during the newton OpenStack release cycle. OpenStack load balancing no longer requires deep access into the neutron code base and database. All of the required networking capabilities are now available via stable APIs. This change de-couples the load balancing release versioning from the rest of the OpenStack deployment. Since Octavia uses stable APIs when interacting with other OpenStack services, you can run a different version of Octavia in relation to your OpenStack cloud deployment. Per OpenStack deprecation policy, both projects will continue to receive support and bug fixes during the deprecation cycle, but no new features will be added to either project. All future feature enhancements will now occur on the Octavia project(s) [3]. We are not announcing the end of the deprecation cycle at this time, but it will follow OpenStack policy of at least two release cycles prior to retirement. This means that the first release that these projects could be retired would be the “T” OpenStack release cycle. We have created a Frequently Asked Questions (FAQ) wiki page to help answer additional questions you may have about this process: https://wiki.openstack.org/wiki/Neutron/LBaaS/Deprecation For more information or if you have additional questions, please see the following resources: The FAQ: https://wiki.openstack.org/wiki/Neutron/LBaaS/Deprecation The Octavia documentation: https://docs.openstack.org/octavia/latest/ Reach out to us via IRC on the Freenode IRC network, channel #openstack-lbaas Weekly Meeting: 20:00 UTC on Wednesdays in #openstack-lbaas on the Freenode IRC network. Sending email to the OpenStack developer mailing list: openstack-dev [at] lists [dot] openstack [dot] org. Please prefix the subject with '[openstack-dev][Octavia]' Thank you for your support and patience during this transition, Michael Johnson Octavia PTL [1] http://specs.openstack.org/openstack/neutron-specs/specs/newton/neutron-stadium.html [2] http://specs.openstack.org/openstack/neutron-specs/specs/newton/kill-neutron-lbaas.html [3] https://governance.openstack.org/tc/reference/projects/octavia.html From shilla.saebi at gmail.com Wed Jan 31 17:59:54 2018 From: shilla.saebi at gmail.com (Shilla Saebi) Date: Wed, 31 Jan 2018 12:59:54 -0500 Subject: [Openstack-operators] [Updated] User Committee Election coming soon! Message-ID: Forgot to mention that additional details and the process for nomination can be found here and we look forward to receiving your submissions. On Tue, Jan 30, 2018 at 7:57 PM, Shilla Saebi wrote: > Hello Everyone, > > > > The OpenStack User Committee will be holding an election in February, per > the (UC) bylaws and charter > . The current > UC will serve until the elections in February, and at that point, the > current two UC members who still have 6 months to serve, get a 6-month > seat, and an election is run to determine the other three members. 
> Candidates ranking 1st, 2nd, and 3rd, will get a one-year seat. Voting > for the 2018 UC members will be granted to the Active User Contributors > (AUC). > > > > Open candidacy for the UC positions will be from January 29 - February 11, > 05:59 UTC. Voting for the User Committee (UC) members will be open on > February 12th and will remain open until February 18, 11:59 UTC. > > > As a reminder, please see the community code of conduct ( > http://www.openstack.org/legal/community-code-of-conduct/) > > > > Please let me, or anyone from the UC know if you have any questions, > comments or concerns. > > > > Thank you, > > > Shilla Saebi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bitskrieg at bitskrieg.net Wed Jan 31 21:16:16 2018 From: bitskrieg at bitskrieg.net (Chris Apsey) Date: Wed, 31 Jan 2018 16:16:16 -0500 Subject: [Openstack-operators] [nova] nova-compute automatically disabling itself? Message-ID: <6dcf6baa4104c1923fc8e954dbf2737a@bitskrieg.net> All, Running into a strange issue I haven't seen before. Randomly, the nova-compute services on compute nodes are disabling themselves (as if someone ran openstack compute service set --disable hostX nova-compute). When this happens, the node continues to report itself as 'up' - the service is just disabled. As a result, if enough of these occur, we get scheduling errors due to lack of available resources (which makes sense). Re-enabling them works just fine and they continue on as if nothing happened. I looked through the logs and I can find the API calls where we re-enable the services (PUT /v2.1/os-services/enable), but I do not see any API calls where the services are getting disabled initially.
> > Is anyone aware of any cases where compute nodes will automatically > disable their nova-compute service on their own, or has anyone seen this > before and might know a root cause?  We have plenty of spare vcpus and > RAM on each node - like less than 25% utilization (both in absolute > terms and in terms of applied ratios). > > We're seeing follow-on errors regarding rmq messages getting lost and > vif-plug failures, but we think those are a symptom, not a cause. > > Currently running pike on Xenial. > > --- > v/r > > Chris Apsey > bitskrieg at bitskrieg.net > https://www.bitskrieg.net > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators This is actually a feature added in Pike: https://review.openstack.org/#/c/463597/ This came up in discussion with operators at the Forum in Boston. The vif-plug failures are likely the reason those computes are getting disabled. There is a config option "consecutive_build_service_disable_threshold" which you can set to disable the auto-disable behavior as some have experienced issues with it: https://bugs.launchpad.net/nova/+bug/1742102 -- Thanks, Matt From openstack at fried.cc Wed Jan 31 21:46:01 2018 From: openstack at fried.cc (Eric Fried) Date: Wed, 31 Jan 2018 15:46:01 -0600 Subject: [Openstack-operators] [nova] nova-compute automatically disabling itself? In-Reply-To: <6dcf6baa4104c1923fc8e954dbf2737a@bitskrieg.net> References: <6dcf6baa4104c1923fc8e954dbf2737a@bitskrieg.net> Message-ID: <2cf1c096-4d74-2a37-a398-7404f5aa8c88@fried.cc> There's [1], but I would have expected you to see error logs like [2] if that's what you're hitting. [1] https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L627-L645 [2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1714-L1716 efried On 01/31/2018 03:16 PM, Chris Apsey wrote: > All, > > Running in to a strange issue I haven't seen before. > > Randomly, the nova-compute services on compute nodes are disabling > themselves (as if someone ran openstack compute service set --disable > hostX nova-compute.  When this happens, the node continues to report > itself as 'up' - the service is just disabled.  As a result, if enough > of these occur, we get scheduling errors due to lack of available > resources (which makes sense).  Re-enabling them works just fine and > they continue on as if nothing happened.  I looked through the logs and > I can find the API calls where we re-enable the services (PUT > /v2.1/os-services/enable), but I do not see any API calls where the > services are getting disabled initially. > > Is anyone aware of any cases where compute nodes will automatically > disable their nova-compute service on their own, or has anyone seen this > before and might know a root cause?  We have plenty of spare vcpus and > RAM on each node - like less than 25% utilization (both in absolute > terms and in terms of applied ratios). > > We're seeing follow-on errors regarding rmq messages getting lost and > vif-plug failures, but we think those are a symptom, not a cause. > > Currently running pike on Xenial. 
> > --- > v/r > > Chris Apsey > bitskrieg at bitskrieg.net > https://www.bitskrieg.net > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From bitskrieg at bitskrieg.net Wed Jan 31 21:47:06 2018 From: bitskrieg at bitskrieg.net (Chris Apsey) Date: Wed, 31 Jan 2018 16:47:06 -0500 Subject: [Openstack-operators] [nova] nova-compute automatically disabling itself? In-Reply-To: <172e32f3-e23e-15ce-33fa-6cd2af93eb73@gmail.com> References: <6dcf6baa4104c1923fc8e954dbf2737a@bitskrieg.net> <172e32f3-e23e-15ce-33fa-6cd2af93eb73@gmail.com> Message-ID: That looks promising. I'll report back to confirm the solution. Thanks! --- v/r Chris Apsey bitskrieg at bitskrieg.net https://www.bitskrieg.net On 2018-01-31 04:40 PM, Matt Riedemann wrote: > On 1/31/2018 3:16 PM, Chris Apsey wrote: >> All, >> >> Running in to a strange issue I haven't seen before. >> >> Randomly, the nova-compute services on compute nodes are disabling >> themselves (as if someone ran openstack compute service set --disable >> hostX nova-compute.  When this happens, the node continues to report >> itself as 'up' - the service is just disabled.  As a result, if enough >> of these occur, we get scheduling errors due to lack of available >> resources (which makes sense).  Re-enabling them works just fine and >> they continue on as if nothing happened.  I looked through the logs >> and I can find the API calls where we re-enable the services (PUT >> /v2.1/os-services/enable), but I do not see any API calls where the >> services are getting disabled initially. >> >> Is anyone aware of any cases where compute nodes will automatically >> disable their nova-compute service on their own, or has anyone seen >> this before and might know a root cause?  We have plenty of spare >> vcpus and RAM on each node - like less than 25% utilization (both in >> absolute terms and in terms of applied ratios). >> >> We're seeing follow-on errors regarding rmq messages getting lost and >> vif-plug failures, but we think those are a symptom, not a cause. >> >> Currently running pike on Xenial. >> >> --- >> v/r >> >> Chris Apsey >> bitskrieg at bitskrieg.net >> https://www.bitskrieg.net >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > This is actually a feature added in Pike: > > https://review.openstack.org/#/c/463597/ > > This came up in discussion with operators at the Forum in Boston. > > The vif-plug failures are likely the reason those computes are getting > disabled. > > There is a config option "consecutive_build_service_disable_threshold" > which you can set to disable the auto-disable behavior as some have > experienced issues with it: > > https://bugs.launchpad.net/nova/+bug/1742102 From jm6819 at att.com Wed Jan 31 23:00:56 2018 From: jm6819 at att.com (MCCABE, JAMEY A) Date: Wed, 31 Jan 2018 23:00:56 +0000 Subject: [Openstack-operators] [LCOO][RBAC] Topical session 2/1 1200 UTC Message-ID: <6146864A3C560F4BBA6764F9A4188941548010DD@MOSTLS1MSGUSRFA.ITServices.sbc.com> A topical session on Multi-cloud Security will be at 1200 UTC on Thursday 2/1. This is a type of LCOO working group discussion forum we conduct on a monthly basis. 
Agenda and logistics to join can be found at https://openstack-lcoo.atlassian.net/wiki/spaces/LCOO/pages/65175564/ -------------- next part -------------- An HTML attachment was scrubbed... URL: