From colleen at gazlene.net Sat Jun 1 00:17:56 2019
From: colleen at gazlene.net (Colleen Murphy)
Date: Fri, 31 May 2019 17:17:56 -0700
Subject: [keystone] Keystone Team Update - Week of 27 May 2019
Message-ID: <462238d5-1fb1-4e71-a7fe-6073fe58e2c7@www.fastmail.com>

# Keystone Team Update - Week of 27 May 2019

## News

### Admin Endpoint in Keystonemiddleware

Currently, keystonemiddleware is hardcoded to use the admin endpoint to communicate with keystone. With the removal of the v2 API, having an admin endpoint shouldn't be necessary, so Jens is working on making this configurable[1]. There has been a fair amount of debate over how to do this transition and what the new default should be. Please respond on the patch with your thoughts.

[1] https://review.opendev.org/651790

### Unit Test Refactor

Lance is working on refactoring the protection unit tests to avoid calling setUp() repetitively. There was discussion about the best way to do this[2], given that we make a lot of use of instance methods in the unit tests, especially with regard to fixtures (a generic illustration of the pattern is sketched at the end of this update).

[2] http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2019-05-30.log.html#t2019-05-30T14:19:25

### M-1 Team Check-in

As discussed at the PTG, we'll be holding milestone-ly check-ins and retrospectives to try to keep up momentum throughout the cycle. The first one is scheduled for June 11, 15:00-17:00 UTC[3].

[3] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006783.html

## Open Specs

Train specs: https://bit.ly/2uZ2tRl

Ongoing specs: https://bit.ly/2OyDLTh

Spec proposals for Train are due next week! If you are planning a feature for Train, please propose the spec ASAP or it will not be accepted for Train.

## Recently Merged Changes

Search query: https://bit.ly/2pquOwT

We merged 7 changes this week.

## Changes that need Attention

Search query: https://bit.ly/2tymTje

There are 38 changes that are passing CI, not in merge conflict, have no negative reviews and aren't proposed by bots. This includes important changes like Train specs, the removal of PKI support from keystonemiddleware[4], an update to our vision reflection[5], and a change to make keystoneauth's error handling conform to the API-SIG's guidelines[6].

[4] https://review.opendev.org/613675
[5] https://review.opendev.org/662106
[6] https://review.opendev.org/662281

## Bugs

This week we opened 1 new bug and closed 5.

Bugs opened (1)
Bug #1831100 (keystone:Undecided) opened by Kris Watson https://bugs.launchpad.net/keystone/+bug/1831100

Bugs closed (1)
Bug #1807697 (keystone:Wishlist) https://bugs.launchpad.net/keystone/+bug/1807697

Bugs fixed (4)
Bug #1815771 (keystone:Medium) fixed by Jose Castro Leon https://bugs.launchpad.net/keystone/+bug/1815771
Bug #1804700 (keystone:Low) fixed by Gage Hugo https://bugs.launchpad.net/keystone/+bug/1804700
Bug #1801101 (keystoneauth:Undecided) fixed by Chinmay Naik https://bugs.launchpad.net/keystoneauth/+bug/1801101
Bug #1827008 (keystoneauth:Undecided) fixed by jacky06 https://bugs.launchpad.net/keystoneauth/+bug/1827008

## Milestone Outlook

https://releases.openstack.org/train/schedule.html

Next week is spec proposal freeze. Please ensure that the specs you are planning are proposed ASAP. Reviews of proposed specs are also welcome.
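## Aside: Unit Test Refactor Sketch

This sketch is illustrative only -- it is not the keystone protection test code and not necessarily the approach being discussed in the "Unit Test Refactor" item above. It just shows one generic pattern for avoiding a repeated setUp() body: push the shared fixture work into a base class and let subclasses override class attributes. All names and fixtures below are invented for the example.

    import unittest

    class ProtectionTestBase(unittest.TestCase):
        """Shared setup; subclasses only declare what differs."""

        role = 'reader'  # overridden by subclasses

        def setUp(self):
            super().setUp()
            # stand-ins for the real fixtures (database, policy, auth context)
            self.context = {'roles': [self.role]}
            self.headers = {'X-Auth-Token': 'token-for-%s' % self.role}

    class ReaderProtectionTests(ProtectionTestBase):
        role = 'reader'

        def test_reader_context(self):
            self.assertEqual(['reader'], self.context['roles'])

    class AdminProtectionTests(ProtectionTestBase):
        role = 'admin'

        def test_admin_context(self):
            self.assertIn('admin', self.context['roles'])

    if __name__ == '__main__':
        unittest.main()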
## Help with this newsletter Help contribute to this newsletter by editing the etherpad: https://etherpad.openstack.org/p/keystone-team-newsletter From gagehugo at gmail.com Sat Jun 1 00:34:17 2019 From: gagehugo at gmail.com (Gage Hugo) Date: Fri, 31 May 2019 19:34:17 -0500 Subject: [security] Security SIG Newsletter Message-ID: #Week of: 30 May 2019 - Security SIG Meeting Info: http://eavesdrop.openstack.org/#Security_SIG_meeting - Weekly on Thursday at 1500 UTC in #openstack-meeting - Agenda: https://etherpad.openstack.org/p/security-agenda - https://security.openstack.org/ - https://wiki.openstack.org/wiki/Security-SIG #Meeting Notes - Summary: http://eavesdrop.openstack.org/meetings/security/2019/security.2019-05-30-15.00.html - nickthetait offered to start helping with cleaning up & updating the security guide docs ## News - Interesting article: https://duo.com/decipher/docker-bug-allows-root-access-to-host-file-system # VMT Reports - A full list of publicly marked security issues can be found here: https://bugs.launchpad.net/ossa/ - No new public security bugs this week -------------- next part -------------- An HTML attachment was scrubbed... URL: From aj at suse.com Sat Jun 1 07:26:45 2019 From: aj at suse.com (Andreas Jaeger) Date: Sat, 1 Jun 2019 09:26:45 +0200 Subject: [tc][all] Github mirroring (or lack thereof) for unofficial projects In-Reply-To: References: <20190503190538.GB3377@localhost.localdomain> <20190515175110.26i2xuclkksgx744@arabian.linksys.moosehall> <8d81b9a7-b460-43e1-a774-9bd65ee42143@www.fastmail.com> <20190530180658.xgpcy35au72ccmzt@yuggoth.org> Message-ID: On 01/06/2019 01.50, Clark Boylan wrote: > Close, I think we can archive all repos in openstack-dev and openstack-infra. Part of the repo renames we did today were to get the repos that were left behind in those two orgs into their longer term homes. Then any project in https://github.com/openstack that is not in https://opendev.org/openstack can be archived in Github too. > Once https://review.opendev.org/661803 merged, we can archive openstack-infra. openstack-dev is already unused. We have then only retired repos in openstack-infra, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From mnaser at vexxhost.com Sat Jun 1 12:35:14 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sat, 1 Jun 2019 08:35:14 -0400 Subject: [qa][openstack-ansible] redefining devstack Message-ID: Hi everyone, This is something that I've discussed with a few people over time and I think I'd probably want to bring it up by now. I'd like to propose and ask if it makes sense to perhaps replace devstack entirely with openstack-ansible. I think I have quite a few compelling reasons to do this that I'd like to outline, as well as why I *feel* (and I could be biased here, so call me out!) that OSA is the best option in terms of a 'replacement' # Why not another deployment project? I actually thought about this part too and considered this mainly for ease of use for a *developer*. At this point, Puppet-OpenStack pretty much only deploys packages (which means that it has no build infrastructure, a developer can't just get $commit checked out and deployed). 
TripleO uses Kolla containers AFAIK and those have to be pre-built beforehand, also, I feel they are much harder to use as a developer because if you want to make quick edits and restart services, you have to enter a container and make the edit there and somehow restart the service without the container going back to its original state. Kolla-Ansible and the other combinations also suffer from the same "issue".

OpenStack Ansible is unique in the way that it pretty much just builds a virtualenv and installs packages inside of it. The services are deployed as systemd units. This is very much similar to the current state of devstack at the moment (minus the virtualenv part, afaik). It makes it pretty straightforward to go and edit code if you need/have to. We also have support for Debian, CentOS, Ubuntu and SUSE. This allows "devstack 2.0" to have far more coverage and makes it much easier to deploy on a wider variety of operating systems. It also has the ability to use commits checked out from Zuul so all the fancy Depends-On stuff we use works.

# Why do we care about this, I like my bash scripts!
As someone who's been around for a *really* long time in OpenStack, I've seen a whole lot of really weird issues surface from the usage of DevStack to do CI gating. For example, one of the recent things is the fact it relies on installing package-shipped noVNC, whereas the 'master' noVNC has actually changed behavior a few months back and it is completely incompatible at this point (it's just a ticking thing until we realize we're entirely broken).

To this day, I still see people who want to POC something up with OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter how many warnings we'll put up, they'll always try to do it. With this way, at least they'll have something that has the shape of an actual real deployment. In addition, it would be *good* in the overall scheme of things for a deployment system to test against, because this would make sure things don't break in both ways.

Also: we run Zuul for our CI which supports Ansible natively, this can remove one layer of indirection (Zuul to run Bash) and have Zuul run the playbooks directly from the executor.

# So how could we do this?
The OpenStack Ansible project is made of many roles that are all composable; therefore, you can think of it as a combination of both Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained the base modules (i.e. puppet-nova, etc) and TripleO was the integration of all of it in a distribution. OSA is currently both, but it also includes both Ansible roles and playbooks.

In order to make sure we maintain as much of backwards compatibility as possible, we can simply run a small script which does a mapping of devstack => OSA variables to make sure that the service is shipped with all the necessary features as per local.conf.

So the new process could be:

1) parse local.conf and generate Ansible variables files (a rough sketch of this step is included below)
2) install Ansible (if not running in gate)
3) run playbooks using variables generated in #1

The neat thing is after all of this, devstack just becomes a thin wrapper around Ansible roles. I also think it brings a lot of hands together, involving both the QA team and OSA team, and I believe that pooling our resources will greatly help in being able to get more done and avoiding duplicating our efforts.
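To make step 1 a bit more concrete, here is a minimal sketch of what such a mapping script could look like. This is an illustration only: the OSA variable names in VARIABLE_MAP are placeholders rather than the real role variables, and a real implementation would also need to handle enable_service/enable_plugin lines and the other local.conf meta-sections.

    #!/usr/bin/env python3
    # Sketch of "step 1": read the [[local|localrc]] block of a devstack
    # local.conf and emit an Ansible variables file (requires PyYAML).
    # The right-hand side names in VARIABLE_MAP are invented for the
    # example, not the actual OSA role variables.
    import sys
    import yaml

    VARIABLE_MAP = {
        "ADMIN_PASSWORD": "osa_admin_password",      # placeholder name
        "SERVICE_PASSWORD": "osa_service_password",  # placeholder name
        "HOST_IP": "osa_host_address",               # placeholder name
    }

    def parse_localrc(path):
        """Return the KEY=VALUE pairs found inside [[local|localrc]]."""
        values = {}
        in_localrc = False
        with open(path) as handle:
            for raw in handle:
                line = raw.strip()
                if line.startswith("[["):
                    # entering a new meta-section; only track localrc
                    in_localrc = (line == "[[local|localrc]]")
                    continue
                if not in_localrc or not line or line.startswith("#"):
                    continue
                if "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.strip()
        return values

    def main():
        conf = sys.argv[1] if len(sys.argv) > 1 else "local.conf"
        localrc = parse_localrc(conf)
        ansible_vars = {
            VARIABLE_MAP[key]: value
            for key, value in localrc.items()
            if key in VARIABLE_MAP
        }
        with open("user_variables.yml", "w") as out:
            yaml.safe_dump(ansible_vars, out, default_flow_style=False)
        print("wrote %d variables to user_variables.yml" % len(ansible_vars))

    if __name__ == "__main__":
        main()

Step 3 would then just be a matter of passing the generated file to the playbooks (e.g. with -e @user_variables.yml, or by dropping it into the usual OSA overrides location).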
# Conclusion This is a start of a very open ended discussion, I'm sure there is a lot of details involved here in the implementation that will surface, but I think it could be a good step overall in simplifying our CI and adding more coverage for real potential deployers. It will help two teams unite together and have more resources for something (that essentially is somewhat of duplicated effort at the moment). I will try to pick up sometime to POC a simple service being deployed by an OSA role instead of Bash, placement which seems like a very simple one and share that eventually. Thoughts? :) -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From skaplons at redhat.com Sat Jun 1 17:35:23 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Sat, 1 Jun 2019 19:35:23 +0200 Subject: [neutron][networking-ovn] Core team updates In-Reply-To: <2e3ac83e-63bd-2107-2d41-943d483b0687@redhat.com> References: <2e3ac83e-63bd-2107-2d41-943d483b0687@redhat.com> Message-ID: Congrats Kuba and good luck Miguel in Your new role :) > On 31 May 2019, at 10:53, Jakub Libosvar wrote: > > Thanks for your trust! I'll try to do my best! Looking forward to our > future collaboration. > > Jakub > > On 31/05/2019 10:38, Lucas Alvares Gomes wrote: >> Hi all, >> >> I'd like to welcome Jakub Libosvar to the networking-ovn core team. >> The team was in need for more reviewers with +2/+A power and Jakub's >> reviews have been super high quality [0][1]. He's also helping the >> project out in many other different efforts such as bringing in the >> full stack test suit and bug fixes. >> >> Also, Miguel Ajo has changed focus from OVN/networking-ovn and is been >> dropped from the core team. Of course, we will welcome him back when >> his activity picks back up again. >> >> Thank you Jakub and Miguel! >> >> [0] https://www.stackalytics.com/report/contribution/networking-ovn/30 >> [1] https://www.stackalytics.com/report/contribution/networking-ovn/90 >> >> Cheers, >> Lucas >> > > — Slawek Kaplonski Senior software engineer Red Hat From skaplons at redhat.com Sat Jun 1 17:46:11 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Sat, 1 Jun 2019 19:46:11 +0200 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: Hi, I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something similar possible with OSA or will it be needed to contribute always every change to OSA repository? Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in mind. > On 1 Jun 2019, at 14:35, Mohammed Naser wrote: > > Hi everyone, > > This is something that I've discussed with a few people over time and > I think I'd probably want to bring it up by now. I'd like to propose > and ask if it makes sense to perhaps replace devstack entirely with > openstack-ansible. I think I have quite a few compelling reasons to > do this that I'd like to outline, as well as why I *feel* (and I could > be biased here, so call me out!) 
that OSA is the best option in terms > of a 'replacement' > > # Why not another deployment project? > I actually thought about this part too and considered this mainly for > ease of use for a *developer*. > > At this point, Puppet-OpenStack pretty much only deploys packages > (which means that it has no build infrastructure, a developer can't > just get $commit checked out and deployed). > > TripleO uses Kolla containers AFAIK and those have to be pre-built > beforehand, also, I feel they are much harder to use as a developer > because if you want to make quick edits and restart services, you have > to enter a container and make the edit there and somehow restart the > service without the container going back to it's original state. > Kolla-Ansible and the other combinations also suffer from the same > "issue". > > OpenStack Ansible is unique in the way that it pretty much just builds > a virtualenv and installs packages inside of it. The services are > deployed as systemd units. This is very much similar to the current > state of devstack at the moment (minus the virtualenv part, afaik). > It makes it pretty straight forward to go and edit code if you > need/have to. We also have support for Debian, CentOS, Ubuntu and > SUSE. This allows "devstack 2.0" to have far more coverage and make > it much more easy to deploy on a wider variety of operating systems. > It also has the ability to use commits checked out from Zuul so all > the fancy Depends-On stuff we use works. > > # Why do we care about this, I like my bash scripts! > As someone who's been around for a *really* long time in OpenStack, > I've seen a whole lot of really weird issues surface from the usage of > DevStack to do CI gating. For example, one of the recent things is > the fact it relies on installing package-shipped noVNC, where as the > 'master' noVNC has actually changed behavior a few months back and it > is completely incompatible at this point (it's just a ticking thing > until we realize we're entirely broken). > > To this day, I still see people who want to POC something up with > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > how many warnings we'll put up, they'll always try to do it. With > this way, at least they'll have something that has the shape of an > actual real deployment. In addition, it would be *good* in the > overall scheme of things for a deployment system to test against, > because this would make sure things don't break in both ways. > > Also: we run Zuul for our CI which supports Ansible natively, this can > remove one layer of indirection (Zuul to run Bash) and have Zuul run > the playbooks directly from the executor. > > # So how could we do this? > The OpenStack Ansible project is made of many roles that are all > composable, therefore, you can think of it as a combination of both > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > the base modules (i.e. puppet-nova, etc) and TripleO was the > integration of all of it in a distribution. OSA is currently both, > but it also includes both Ansible roles and playbooks. > > In order to make sure we maintain as much of backwards compatibility > as possible, we can simply run a small script which does a mapping of > devstack => OSA variables to make sure that the service is shipped > with all the necessary features as per local.conf. 
> > So the new process could be: > > 1) parse local.conf and generate Ansible variables files > 2) install Ansible (if not running in gate) > 3) run playbooks using variable generated in #1 > > The neat thing is after all of this, devstack just becomes a thin > wrapper around Ansible roles. I also think it brings a lot of hands > together, involving both the QA team and OSA team together, which I > believe that pooling our resources will greatly help in being able to > get more done and avoiding duplicating our efforts. > > # Conclusion > This is a start of a very open ended discussion, I'm sure there is a > lot of details involved here in the implementation that will surface, > but I think it could be a good step overall in simplifying our CI and > adding more coverage for real potential deployers. It will help two > teams unite together and have more resources for something (that > essentially is somewhat of duplicated effort at the moment). > > I will try to pick up sometime to POC a simple service being deployed > by an OSA role instead of Bash, placement which seems like a very > simple one and share that eventually. > > Thoughts? :) > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > — Slawek Kaplonski Senior software engineer Red Hat From mnaser at vexxhost.com Sat Jun 1 18:49:10 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sat, 1 Jun 2019 14:49:10 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: > > Hi, > > I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something similar possible with OSA or will it be needed to contribute always every change to OSA repository? Not a dumb question at all. So, we do have this concept of 'roles' which you _could_ kinda technically identify similar to plugins. However, I think one of the things that would maybe come out of this is the inability for projects to maintain their own plugins (because now you can host neutron/devstack/plugins and you maintain that repo yourself), under this structure, you would indeed have to make those changes to the OpenStack Ansible Neutron role i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron However, I think from an OSA perspective, we would be more than happy to add project maintainers for specific projects to their appropriate roles. It would make sense that there is someone from the Neutron team that could be a core on os_neutron from example. > Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in mind. Indeed, with our current CI infrastructure with OSA, we have the ability to create these dynamic scenarios (which can actually be defined by a simple Zuul variable). https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 We do some really neat introspection of the project name being tested in order to run specific scenarios. 
Therefore, that is something that should be quite easy to accomplish simply by overriding a scenario name within Zuul. It also is worth mentioning we now support full metal deploys for a while now, so not having to worry about containers is something to keep in mind as well (with simplifying the developer experience again). > > On 1 Jun 2019, at 14:35, Mohammed Naser wrote: > > > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > > > # Why not another deployment project? > > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. 
> > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > > > -- > > Mohammed Naser — vexxhost > > ----------------------------------------------------- > > D. 514-316-8872 > > D. 800-910-1726 ext. 200 > > E. mnaser at vexxhost.com > > W. http://vexxhost.com > > > > — > Slawek Kaplonski > Senior software engineer > Red Hat > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From clemens.hardewig at crandale.de Sun Jun 2 19:09:23 2019 From: clemens.hardewig at crandale.de (Clemens Hardewig) Date: Sun, 2 Jun 2019 21:09:23 +0200 Subject: [nova] Bug #1755266: How to proceed with test failures Message-ID: Hi there, Since Pike I am struggling with instance migration/instance resize for flavors whose swap space is provided via an lvm volume; according to my understanding the default behavior if cinder uses lvm as a backend driver (at least I could not convince cinder to behave different …). I am somewhat surprised that I seem to be the only one who has some problems with that behavior - according to my understanding you are coming into this constellation automatically when simply following the manual installation procedure as being described in the official Openstack docs... Anyway I opened the bug above, however it did not find some interest and I tried then as a python newbie to get it fixed by my own. 
After a lengthy live test phase of my changes in the driver.py across Pike, Queens, Rocky, and now also Stein, I then made my first commit (yeee - got it done) to the master branch, had a good short conversation with Eric in Berlin on it, fixed some code format issues Zuul was rightfully complaining about, but then unfortunately failed some further tests in Zuul and other test areas (see https://review.opendev.org/#/c/618621/ ).

Related to my changes, tox gets me an error:

nova.tests.unit.virt.libvirt.test_driver.LibvirtDriverTestCase.test_finish_migration_power_on
---------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "nova/tests/unit/virt/libvirt/test_driver.py", line 18707, in test_finish_migration_power_on
        self._test_finish_migration()
      File "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
        return func(*args, **keywargs)
      File "nova/tests/unit/virt/libvirt/test_driver.py", line 18662, in _test_finish_migration
        mock_raw_to_qcow2.assert_has_calls(convert_calls, any_order=True)
      File "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/mock/mock.py", line 983, in assert_has_calls
        ), cause)
      File "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/six.py", line 737, in raise_from
        raise value
    AssertionError: (call('/tmp/tmpC60sPk/tmpOHikWL/8ea1de33-64d7-4d1d-af02-88e6f7ec91c1/disk.swap'),) not all found in call list

But so far I have not been able to work out what needs to be done to get this test failure fixed. I am missing context on how the logic of the test code works; therefore I would like to ask whether somebody could point me in the right direction on what needs to be done to get the failed unit/Zuul/other tests to pass. Are there any docs or other help on testing in Nova where I could learn what to do? Perhaps someone could also give me a hint as to whether there are some conceptually misleading ideas behind my fix proposal which need to be driven in another direction …

I am looking forward to your reply

Best regards
Clemens

Clemens Hardewig

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3898 bytes
Desc: not available
URL: 

From henry at thebonaths.com Mon Jun 3 03:45:35 2019
From: henry at thebonaths.com (Henry Bonath)
Date: Sun, 2 Jun 2019 23:45:35 -0400
Subject: [openstack-ansible] Installing Third-Party drivers into the Cinder-Volume container during playbook execution
In-Reply-To: References: Message-ID: 

I think the idea here, at least for me, would be to have it rolled into the deployment automatically - in a similar fashion to how horizon themes are deployed within Openstack-Ansible. Obviously having this specific driver in the tree would solve my specific issue, but I don't know how many more third party Cinder drivers which are not packaged into the tree people are deploying these days. My question for the community is simply finding out if this mechanism exists already. On Thu, May 30, 2019 at 11:11 AM Jean-Philippe Evrard wrote: > > > On Tue, May 28, 2019, at 04:10, Henry Bonath wrote: > > Hello, I asked this into IRC but I thought this might be a more > > appropriate place to ask considering the IRC channel usage over the > > weekend.
> > > > If I wanted to deploy a third party driver along with my Cinder-Volume > > container, is there a built-in mechanism for doing so? (I am > > specifically wanting to use: https://github.com/iXsystems/cinder) > > > > I am able to configure a cinder-backend in the > > "openstack_user_config.yml" file which works perfectly if I let it > > fail during the first run, then copy the driver into the containers > > and run "os-cinder-install.yml" a second time. > > > > I've found that you guys have built similar stuff into the system > > (e.g. Horizon custom Theme installation via .tgz) and was curious if > > there is a similar mechanism for Cinder Drivers that may be > > undocumented. > > > > http://paste.openstack.org/show/752132/ > > This is an example of my working config, which relies on the driver > > being copied into the > > /openstack/venvs/cinder-19.x.x.x/lib/python2.7/site-packages/cinder/volume/drivers/ixsystems/ > > folder. > > > > Thanks in advance! > > > > > > I suppose the community would be okay to have this in tree, so no need for a third party system here (and no need to maintain this on your own, separately). However... if it's just about copying the content of this repo, did you think of packaging this, and publish it to pypi ? This way you could just pip install the necessary package into your cinder venv... > > Regards, > Jean-Philippe Evrard (evrardjp) > From madhuri.kumari at intel.com Mon Jun 3 05:53:56 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Mon, 3 Jun 2019 05:53:56 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> Hi Ironic, Nova Developers, I am currently working on implementing Intel Speed Select(ISS) feature[1] in Ironic and I have a use case where I want to change ISS configuration in BIOS after a node is provisioned. Such use case of changing the configuration post deployment is common and not specific to ISS. A real-life example for such a required post-deploy configuration change is the change of BIOS settings to disable hyper-threading in order to address a security vulnerability. Currently there is no way of changing any BIOS configuration after a node is provisioned in Ironic. One solution for it is to allow manual deploy steps in Ironic[2](not implemented yet) which can be trigged by changing traits in Nova. For this purpose, we would need to change a trait of the server's flavor in Nova. This trait is mapped to a deploy step in Ironic which does some operation(change BIOS config and reboot in this use case). In Nova, the only API to change trait in flavor is resize whereas resize does migration and a reboot as well. In short, I am looking for a Nova API that only changes the traits, and trigger the ironic deploy steps but no reboot and migration. Please suggest. Thanks in advance. Regards, Madhuri [1] https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/support-intel-speed-select.html [2] https://storyboard.openstack.org/#!/story/2005129 -------------- next part -------------- An HTML attachment was scrubbed... URL: From manu.km at idrive.com Mon Jun 3 07:31:20 2019 From: manu.km at idrive.com (Manu K M) Date: Mon, 3 Jun 2019 13:01:20 +0530 Subject: [swift] How to track the rest api call count Message-ID: Hi there I have to keep track of the no of rest call made by a specific account/tenant to my swift cluster. Ceilometer provides only the number of incoming and outgoing bytes. 
-- Regards Manu K M -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Mon Jun 3 08:46:11 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 3 Jun 2019 09:46:11 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> Message-ID: On Mon, 3 Jun 2019 at 06:57, Kumari, Madhuri wrote: > Hi Ironic, Nova Developers, > > > > I am currently working on implementing Intel Speed Select(ISS) feature[1] > in Ironic and I have a use case where I want to change ISS configuration in > BIOS after a node is provisioned. > > Such use case of changing the configuration post deployment is common and > not specific to ISS. A real-life example for such a required post-deploy > configuration change is the change of BIOS settings to disable > hyper-threading in order to address a security vulnerability. > > Currently there is no way of changing any BIOS configuration after a node > is provisioned in Ironic. One solution for it is to allow manual deploy > steps in Ironic[2](not implemented yet) which can be trigged by changing > traits in Nova. > > For this purpose, we would need to change a trait of the server’s flavor > in Nova. This trait is mapped to a deploy step in Ironic which does some > operation(change BIOS config and reboot in this use case). > > In Nova, the only API to change trait in flavor is resize whereas resize > does migration and a reboot as well. > > In short, I am looking for a Nova API that only changes the traits, and > trigger the ironic deploy steps but no reboot and migration. Please suggest. > > > Hi, it is possible to modify a flavor (openstack flavor set --property <key>=<value>). However, changes to a flavor are not reflected in instances that were previously created from that flavor. Internally, nova stores an 'embedded flavor' in the instance state. I'm not aware of any API that would allow modifying the embedded flavor, nor any process that would synchronise those changes to ironic. > Thanks in advance. > > > > Regards, > > Madhuri > > [1] > https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/support-intel-speed-select.html > > [2] https://storyboard.openstack.org/#!/story/2005129 > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noonedeadpunk at ya.ru Mon Jun 3 11:04:37 2019 From: noonedeadpunk at ya.ru (Dmitriy Rabotyagov) Date: Mon, 03 Jun 2019 14:04:37 +0300 Subject: [openstack-ansible] Installing Third-Party drivers into the Cinder-Volume container during playbook execution In-Reply-To: References: Message-ID: <1112501559559877@sas2-0106f63be698.qloud-c.yandex.net> The quick answer - no, currently such option is not present for the cinder role. And by far the only way to install some custom things with cinder now is to provide a list of cinder_user_pip_packages[1], so that's why packaging driver into pypi might be an option for distribution of custom drivers. [1] https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/defaults/main.yml#L302 03.06.2019, 06:59, "Henry Bonath" : > I think the idea here, at least for me, would be to have it rolled > into the deployment automatically - in a similar fashion to how > horizon themes are deployed within Openstack-Ansible.
> Obviously having this specific driver in the tree would solve my > specific issue, but I don't know how many more third party Cinder > drivers which are not packaged into the tree people are deploying > these days. > > My question for the community is simply finding out if this mechanism > exists already. > > On Thu, May 30, 2019 at 11:11 AM Jean-Philippe Evrard > wrote: >>  On Tue, May 28, 2019, at 04:10, Henry Bonath wrote: >>  > Hello, I asked this into IRC but I thought this might be a more >>  > appropriate place to ask considering the IRC channel usage over the >>  > weekend. >>  > >>  > If I wanted to deploy a third party driver along with my Cinder-Volume >>  > container, is there a built-in mechanism for doing so? (I am >>  > specifically wanting to use: https://github.com/iXsystems/cinder) >>  > >>  > I am able to configure a cinder-backend in the >>  > "openstack_user_config.yml" file which works perfectly if I let it >>  > fail during the first run, then copy the driver into the containers >>  > and run "os-cinder-install.yml" a second time. >>  > >>  > I've found that you guys have built similar stuff into the system >>  > (e.g. Horizon custom Theme installation via .tgz) and was curious if >>  > there is a similar mechanism for Cinder Drivers that may be >>  > undocumented. >>  > >>  > http://paste.openstack.org/show/752132/ >>  > This is an example of my working config, which relies on the driver >>  > being copied into the >>  > /openstack/venvs/cinder-19.x.x.x/lib/python2.7/site-packages/cinder/volume/drivers/ixsystems/ >>  > folder. >>  > >>  > Thanks in advance! >>  > >>  > >> >>  I suppose the community would be okay to have this in tree, so no need for a third party system here (and no need to maintain this on your own, separately). However... if it's just about copying the content of this repo, did you think of packaging this, and publish it to pypi ? This way you could just pip install the necessary package into your cinder venv... >> >>  Regards, >>  Jean-Philippe Evrard (evrardjp) --  Kind Regards, Dmitriy Rabotyagov From cdent+os at anticdent.org Mon Jun 3 11:24:31 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 3 Jun 2019 12:24:31 +0100 (BST) Subject: [placement] Office Hours In-Reply-To: References: <923884A4-E427-439E-AD76-9DDBB45550D9@leafe.com> <1559227066.23481.3@smtp.office365.com> Message-ID: On Thu, 30 May 2019, Eric Fried wrote: > +1 for 1500 UTC Wednesdays. wfm, as well Note, that since we've declared this office hours, it means it's ad hoc and nobody is required to be there. It's merely a time that we've designated as a reasonable point for the placement to check in with each other and for other people to check in with the placement team. That is: let's make sure this doesn't turn into "we moved the meeting to wednesday". -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From dangtrinhnt at gmail.com Mon Jun 3 11:34:31 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Mon, 3 Jun 2019 20:34:31 +0900 Subject: [searchlight] Team meeting today cancelled Message-ID: Hi team, I'm in a middle of something right now and will not expect to finish at 13:30 UTC today so we have to cancel the team meeting today. Ping me on the #openstack-searchlight channel. Sorry for this late notice. dangtrinhnt -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jim at jimrollenhagen.com Mon Jun 3 11:55:05 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Mon, 3 Jun 2019 07:55:05 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: I don't think I have enough coffee in me to fully digest this, but wanted to point out a couple of things. FWIW, this is something I've thought we should do for a while now. On Sat, Jun 1, 2019 at 8:43 AM Mohammed Naser wrote: > Hi everyone, > > This is something that I've discussed with a few people over time and > I think I'd probably want to bring it up by now. I'd like to propose > and ask if it makes sense to perhaps replace devstack entirely with > openstack-ansible. I think I have quite a few compelling reasons to > do this that I'd like to outline, as well as why I *feel* (and I could > be biased here, so call me out!) that OSA is the best option in terms > of a 'replacement' > > # Why not another deployment project? > I actually thought about this part too and considered this mainly for > ease of use for a *developer*. > > At this point, Puppet-OpenStack pretty much only deploys packages > (which means that it has no build infrastructure, a developer can't > just get $commit checked out and deployed). > > TripleO uses Kolla containers AFAIK and those have to be pre-built > beforehand, also, I feel they are much harder to use as a developer > because if you want to make quick edits and restart services, you have > to enter a container and make the edit there and somehow restart the > service without the container going back to it's original state. > Kolla-Ansible and the other combinations also suffer from the same > "issue". > FWIW, kolla-ansible (and maybe tripleo?) has a "development" mode which mounts the code as a volume, so you can make edits and just run "docker restart $service". Though systemd does make that a bit nicer due to globs (e.g. systemctl restart nova-*). That said, I do agree moving to something where systemd is running the services would make for a smoother transition for developers. > > OpenStack Ansible is unique in the way that it pretty much just builds > a virtualenv and installs packages inside of it. The services are > deployed as systemd units. This is very much similar to the current > state of devstack at the moment (minus the virtualenv part, afaik). > It makes it pretty straight forward to go and edit code if you > need/have to. We also have support for Debian, CentOS, Ubuntu and > SUSE. This allows "devstack 2.0" to have far more coverage and make > it much more easy to deploy on a wider variety of operating systems. > It also has the ability to use commits checked out from Zuul so all > the fancy Depends-On stuff we use works. > > # Why do we care about this, I like my bash scripts! > As someone who's been around for a *really* long time in OpenStack, > I've seen a whole lot of really weird issues surface from the usage of > DevStack to do CI gating. For example, one of the recent things is > the fact it relies on installing package-shipped noVNC, where as the > 'master' noVNC has actually changed behavior a few months back and it > is completely incompatible at this point (it's just a ticking thing > until we realize we're entirely broken). > > To this day, I still see people who want to POC something up with > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > how many warnings we'll put up, they'll always try to do it. 
With > this way, at least they'll have something that has the shape of an > actual real deployment. In addition, it would be *good* in the > overall scheme of things for a deployment system to test against, > because this would make sure things don't break in both ways. > ++ > > Also: we run Zuul for our CI which supports Ansible natively, this can > remove one layer of indirection (Zuul to run Bash) and have Zuul run > the playbooks directly from the executor. > > # So how could we do this? > The OpenStack Ansible project is made of many roles that are all > composable, therefore, you can think of it as a combination of both > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > the base modules (i.e. puppet-nova, etc) and TripleO was the > integration of all of it in a distribution. OSA is currently both, > but it also includes both Ansible roles and playbooks. > > In order to make sure we maintain as much of backwards compatibility > as possible, we can simply run a small script which does a mapping of > devstack => OSA variables to make sure that the service is shipped > with all the necessary features as per local.conf. > ++ > > So the new process could be: > > 1) parse local.conf and generate Ansible variables files > 2) install Ansible (if not running in gate) > 3) run playbooks using variable generated in #1 > > The neat thing is after all of this, devstack just becomes a thin > wrapper around Ansible roles. I also think it brings a lot of hands > together, involving both the QA team and OSA team together, which I > believe that pooling our resources will greatly help in being able to > get more done and avoiding duplicating our efforts. > > # Conclusion > This is a start of a very open ended discussion, I'm sure there is a > lot of details involved here in the implementation that will surface, > but I think it could be a good step overall in simplifying our CI and > adding more coverage for real potential deployers. It will help two > teams unite together and have more resources for something (that > essentially is somewhat of duplicated effort at the moment). > > I will try to pick up sometime to POC a simple service being deployed > by an OSA role instead of Bash, placement which seems like a very > simple one and share that eventually. > > Thoughts? :) > The reason this hasn't been pushed on in the past is to avoid the perception that the TC or QA team is choosing a "winner" in the deployment space. I don't think that's a good reason not to do something like this (especially with the drop in contributors since I've had that discussion). However, we do need to message this carefully at a minimum. > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Mon Jun 3 12:01:06 2019 From: thierry at openstack.org (Thierry Carrez) Date: Mon, 3 Jun 2019 14:01:06 +0200 Subject: [uc][tc][ops] reviving osops- repos In-Reply-To: <20190531164102.5lwt2jyxk24u3vdz@yuggoth.org> References: <20190530205552.falsvxcegehtyuge@yuggoth.org> <20190531123501.tawgvqgsw6yle2nu@csail.mit.edu> <20190531164102.5lwt2jyxk24u3vdz@yuggoth.org> Message-ID: Jeremy Stanley wrote: > On 2019-05-31 10:24:36 -0400 (-0400), Erik McCormick wrote: > [...] >> there's a project [1]. 
>> >> So either: >> A) Make a SIG out of that and assign the repos to the sig, or >> B) Maybe add it under / rename the Ops Docs SIG [2] as it might bring >> more eyes to both things which serve the same folks. > [...] > > I'd also be perfectly fine with C) say that it's being vouched for > by the UC through its Osops project, stick these repos in a list > *somewhere* as a durable record of that, and let decisions about > project vs. SIG decision be independent of the repository naming > decision. +2 to keep it under the openstack/ namespace one way or another. As to what construct should "own" it, the closest thing we have that would match history would be a UC "team"[1] or "working group"[2], both of which have repositories defined in [3]. Alternatively, I feel like a SIG (be it the Ops Docs SIG or a new "Operational tooling" SIG) would totally be a good idea to revive this. In that case we'd define the repository in [4]. My personal preference would be for a new SIG, but whoever is signing up to work on this should definitely have the final say. [1] https://opendev.org/openstack/governance-uc/src/branch/master/reference/teams.yaml [2] https://opendev.org/openstack/governance-uc/src/branch/master/reference/working-groups.yaml [3] https://opendev.org/openstack/governance/src/branch/master/reference/user-committee-repos.yaml [4] https://opendev.org/openstack/governance/src/branch/master/reference/sigs-repos.yaml -- Thierry Carrez (ttx) From strigazi at gmail.com Mon Jun 3 12:02:32 2019 From: strigazi at gmail.com (Spyros Trigazis) Date: Mon, 3 Jun 2019 14:02:32 +0200 Subject: [magnum] Meeting at 2019-06-04 2100 UTC Message-ID: Hello all, I would like to discuss moving the drivers out-of-tree, as we briefly discussed it in the PTG. Can you all make it for the next meeting [1]? This is not super urgent, but it will accelerate development and bug fixes at the driver level. Cheers, Spyros [0] https://etherpad.openstack.org/p/magnum-train-ptg [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=magnum-meeting&iso=20190604T21 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Mon Jun 3 12:27:40 2019 From: bharat at stackhpc.com (Bharat Kunwar) Date: Mon, 3 Jun 2019 13:27:40 +0100 Subject: [magnum] Meeting at 2019-06-04 2100 UTC In-Reply-To: References: Message-ID: <2EF51CC9-4CF6-4C94-87AF-E93158842D45@stackhpc.com> Sounds good to me! > On 3 Jun 2019, at 13:02, Spyros Trigazis wrote: > > Hello all, > > I would like to discuss moving the drivers out-of-tree, as > we briefly discussed it in the PTG. Can you all make it for the > next meeting [1]? > > This is not super urgent, but it will accelerate development and bug > fixes at the driver level. > > Cheers, > Spyros > > [0] https://etherpad.openstack.org/p/magnum-train-ptg > [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=magnum-meeting&iso=20190604T21 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skaplons at redhat.com Mon Jun 3 12:27:53 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Mon, 3 Jun 2019 14:27:53 +0200 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Hi, > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: >> >> Hi, >> >> I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something similar possible with OSA or will it be needed to contribute always every change to OSA repository? > > Not a dumb question at all. So, we do have this concept of 'roles' > which you _could_ kinda technically identify similar to plugins. > However, I think one of the things that would maybe come out of this > is the inability for projects to maintain their own plugins (because > now you can host neutron/devstack/plugins and you maintain that repo > yourself), under this structure, you would indeed have to make those > changes to the OpenStack Ansible Neutron role > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron > > However, I think from an OSA perspective, we would be more than happy > to add project maintainers for specific projects to their appropriate > roles. It would make sense that there is someone from the Neutron > team that could be a core on os_neutron from example. Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and install it together with everything else by simply adding one line (usually) in local.conf file. I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or driver which isn’t official OpenStack project. > >> Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in mind. > > Indeed, with our current CI infrastructure with OSA, we have the > ability to create these dynamic scenarios (which can actually be > defined by a simple Zuul variable). > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 > > We do some really neat introspection of the project name being tested > in order to run specific scenarios. Therefore, that is something that > should be quite easy to accomplish simply by overriding a scenario > name within Zuul. It also is worth mentioning we now support full > metal deploys for a while now, so not having to worry about containers > is something to keep in mind as well (with simplifying the developer > experience again). > >>> On 1 Jun 2019, at 14:35, Mohammed Naser wrote: >>> >>> Hi everyone, >>> >>> This is something that I've discussed with a few people over time and >>> I think I'd probably want to bring it up by now. I'd like to propose >>> and ask if it makes sense to perhaps replace devstack entirely with >>> openstack-ansible. 
I think I have quite a few compelling reasons to >>> do this that I'd like to outline, as well as why I *feel* (and I could >>> be biased here, so call me out!) that OSA is the best option in terms >>> of a 'replacement' >>> >>> # Why not another deployment project? >>> I actually thought about this part too and considered this mainly for >>> ease of use for a *developer*. >>> >>> At this point, Puppet-OpenStack pretty much only deploys packages >>> (which means that it has no build infrastructure, a developer can't >>> just get $commit checked out and deployed). >>> >>> TripleO uses Kolla containers AFAIK and those have to be pre-built >>> beforehand, also, I feel they are much harder to use as a developer >>> because if you want to make quick edits and restart services, you have >>> to enter a container and make the edit there and somehow restart the >>> service without the container going back to it's original state. >>> Kolla-Ansible and the other combinations also suffer from the same >>> "issue". >>> >>> OpenStack Ansible is unique in the way that it pretty much just builds >>> a virtualenv and installs packages inside of it. The services are >>> deployed as systemd units. This is very much similar to the current >>> state of devstack at the moment (minus the virtualenv part, afaik). >>> It makes it pretty straight forward to go and edit code if you >>> need/have to. We also have support for Debian, CentOS, Ubuntu and >>> SUSE. This allows "devstack 2.0" to have far more coverage and make >>> it much more easy to deploy on a wider variety of operating systems. >>> It also has the ability to use commits checked out from Zuul so all >>> the fancy Depends-On stuff we use works. >>> >>> # Why do we care about this, I like my bash scripts! >>> As someone who's been around for a *really* long time in OpenStack, >>> I've seen a whole lot of really weird issues surface from the usage of >>> DevStack to do CI gating. For example, one of the recent things is >>> the fact it relies on installing package-shipped noVNC, where as the >>> 'master' noVNC has actually changed behavior a few months back and it >>> is completely incompatible at this point (it's just a ticking thing >>> until we realize we're entirely broken). >>> >>> To this day, I still see people who want to POC something up with >>> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter >>> how many warnings we'll put up, they'll always try to do it. With >>> this way, at least they'll have something that has the shape of an >>> actual real deployment. In addition, it would be *good* in the >>> overall scheme of things for a deployment system to test against, >>> because this would make sure things don't break in both ways. >>> >>> Also: we run Zuul for our CI which supports Ansible natively, this can >>> remove one layer of indirection (Zuul to run Bash) and have Zuul run >>> the playbooks directly from the executor. >>> >>> # So how could we do this? >>> The OpenStack Ansible project is made of many roles that are all >>> composable, therefore, you can think of it as a combination of both >>> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >>> the base modules (i.e. puppet-nova, etc) and TripleO was the >>> integration of all of it in a distribution. OSA is currently both, >>> but it also includes both Ansible roles and playbooks. 
>>> >>> In order to make sure we maintain as much of backwards compatibility >>> as possible, we can simply run a small script which does a mapping of >>> devstack => OSA variables to make sure that the service is shipped >>> with all the necessary features as per local.conf. >>> >>> So the new process could be: >>> >>> 1) parse local.conf and generate Ansible variables files >>> 2) install Ansible (if not running in gate) >>> 3) run playbooks using variable generated in #1 >>> >>> The neat thing is after all of this, devstack just becomes a thin >>> wrapper around Ansible roles. I also think it brings a lot of hands >>> together, involving both the QA team and OSA team together, which I >>> believe that pooling our resources will greatly help in being able to >>> get more done and avoiding duplicating our efforts. >>> >>> # Conclusion >>> This is a start of a very open ended discussion, I'm sure there is a >>> lot of details involved here in the implementation that will surface, >>> but I think it could be a good step overall in simplifying our CI and >>> adding more coverage for real potential deployers. It will help two >>> teams unite together and have more resources for something (that >>> essentially is somewhat of duplicated effort at the moment). >>> >>> I will try to pick up sometime to POC a simple service being deployed >>> by an OSA role instead of Bash, placement which seems like a very >>> simple one and share that eventually. >>> >>> Thoughts? :) >>> >>> -- >>> Mohammed Naser — vexxhost >>> ----------------------------------------------------- >>> D. 514-316-8872 >>> D. 800-910-1726 ext. 200 >>> E. mnaser at vexxhost.com >>> W. http://vexxhost.com >>> >> >> — >> Slawek Kaplonski >> Senior software engineer >> Red Hat >> > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com — Slawek Kaplonski Senior software engineer Red Hat From mnaser at vexxhost.com Mon Jun 3 12:37:54 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 08:37:54 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 8:02 AM Jim Rollenhagen wrote: > > I don't think I have enough coffee in me to fully digest this, but wanted to > point out a couple of things. FWIW, this is something I've thought we should do > for a while now. > > On Sat, Jun 1, 2019 at 8:43 AM Mohammed Naser wrote: >> >> Hi everyone, >> >> This is something that I've discussed with a few people over time and >> I think I'd probably want to bring it up by now. I'd like to propose >> and ask if it makes sense to perhaps replace devstack entirely with >> openstack-ansible. I think I have quite a few compelling reasons to >> do this that I'd like to outline, as well as why I *feel* (and I could >> be biased here, so call me out!) that OSA is the best option in terms >> of a 'replacement' >> >> # Why not another deployment project? >> I actually thought about this part too and considered this mainly for >> ease of use for a *developer*. >> >> At this point, Puppet-OpenStack pretty much only deploys packages >> (which means that it has no build infrastructure, a developer can't >> just get $commit checked out and deployed). 
>> >> TripleO uses Kolla containers AFAIK and those have to be pre-built >> beforehand, also, I feel they are much harder to use as a developer >> because if you want to make quick edits and restart services, you have >> to enter a container and make the edit there and somehow restart the >> service without the container going back to it's original state. >> Kolla-Ansible and the other combinations also suffer from the same >> "issue". > > > FWIW, kolla-ansible (and maybe tripleo?) has a "development" mode which mounts > the code as a volume, so you can make edits and just run "docker restart > $service". Though systemd does make that a bit nicer due to globs (e.g. > systemctl restart nova-*). > > That said, I do agree moving to something where systemd is running the services > would make for a smoother transition for developers. I didn't know about this (and this wasn't around for the time that I was trying and experimenting with Kolla). This does seem like a possible solution if we're okay with adding the Docker dependency into DevStack and the workflow changing from restarting services to restarting containers. >> >> >> OpenStack Ansible is unique in the way that it pretty much just builds >> a virtualenv and installs packages inside of it. The services are >> deployed as systemd units. This is very much similar to the current >> state of devstack at the moment (minus the virtualenv part, afaik). >> It makes it pretty straight forward to go and edit code if you >> need/have to. We also have support for Debian, CentOS, Ubuntu and >> SUSE. This allows "devstack 2.0" to have far more coverage and make >> it much more easy to deploy on a wider variety of operating systems. >> It also has the ability to use commits checked out from Zuul so all >> the fancy Depends-On stuff we use works. >> >> # Why do we care about this, I like my bash scripts! >> As someone who's been around for a *really* long time in OpenStack, >> I've seen a whole lot of really weird issues surface from the usage of >> DevStack to do CI gating. For example, one of the recent things is >> the fact it relies on installing package-shipped noVNC, where as the >> 'master' noVNC has actually changed behavior a few months back and it >> is completely incompatible at this point (it's just a ticking thing >> until we realize we're entirely broken). >> >> To this day, I still see people who want to POC something up with >> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter >> how many warnings we'll put up, they'll always try to do it. With >> this way, at least they'll have something that has the shape of an >> actual real deployment. In addition, it would be *good* in the >> overall scheme of things for a deployment system to test against, >> because this would make sure things don't break in both ways. > > > ++ > >> >> >> Also: we run Zuul for our CI which supports Ansible natively, this can >> remove one layer of indirection (Zuul to run Bash) and have Zuul run >> the playbooks directly from the executor. >> >> # So how could we do this? >> The OpenStack Ansible project is made of many roles that are all >> composable, therefore, you can think of it as a combination of both >> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >> the base modules (i.e. puppet-nova, etc) and TripleO was the >> integration of all of it in a distribution. OSA is currently both, >> but it also includes both Ansible roles and playbooks. 
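As a rough sketch of the "small script which does a mapping of devstack => OSA variables" from the quoted proposal below, something like this could be a starting point. The localrc keys and the OSA variable names in the table are placeholders for illustration only, not an agreed interface, and it assumes PyYAML is available:

import yaml

# Placeholder mapping of devstack localrc keys to OSA-style variables.
LOCALRC_TO_OSA = {
    'ADMIN_PASSWORD': 'keystone_auth_admin_password',
    'ENABLED_SERVICES': 'osa_enabled_services',
}


def parse_localrc(path):
    """Pull KEY=VALUE pairs out of the [[local|localrc]] block of local.conf."""
    values = {}
    in_localrc = False
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith('[['):
                in_localrc = (line == '[[local|localrc]]')
                continue
            if in_localrc and '=' in line and not line.startswith('#'):
                key, _, value = line.partition('=')
                values[key.strip()] = value.strip()
    return values


def to_osa_vars(localrc):
    return {LOCALRC_TO_OSA[k]: v for k, v in localrc.items() if k in LOCALRC_TO_OSA}


if __name__ == '__main__':
    with open('user_variables.yml', 'w') as out:
        yaml.safe_dump(to_osa_vars(parse_localrc('local.conf')), out, default_flow_style=False)

Anything this small obviously only covers trivial keys; the point is just that the translation layer can live in one place and leave both local.conf and the roles untouched.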
>> >> In order to make sure we maintain as much of backwards compatibility >> as possible, we can simply run a small script which does a mapping of >> devstack => OSA variables to make sure that the service is shipped >> with all the necessary features as per local.conf. > > > ++ > >> >> >> So the new process could be: >> >> 1) parse local.conf and generate Ansible variables files >> 2) install Ansible (if not running in gate) >> 3) run playbooks using variable generated in #1 >> >> The neat thing is after all of this, devstack just becomes a thin >> wrapper around Ansible roles. I also think it brings a lot of hands >> together, involving both the QA team and OSA team together, which I >> believe that pooling our resources will greatly help in being able to >> get more done and avoiding duplicating our efforts. >> >> # Conclusion >> This is a start of a very open ended discussion, I'm sure there is a >> lot of details involved here in the implementation that will surface, >> but I think it could be a good step overall in simplifying our CI and >> adding more coverage for real potential deployers. It will help two >> teams unite together and have more resources for something (that >> essentially is somewhat of duplicated effort at the moment). >> >> I will try to pick up sometime to POC a simple service being deployed >> by an OSA role instead of Bash, placement which seems like a very >> simple one and share that eventually. >> >> Thoughts? :) > > > The reason this hasn't been pushed on in the past is to avoid the perception > that the TC or QA team is choosing a "winner" in the deployment space. I don't > think that's a good reason not to do something like this (especially with the > drop in contributors since I've had that discussion). However, we do need to > message this carefully at a minimum. Right. I think that's because in OpenStack-Ansible world, we have two things - OSA roles: nothing but basic roles to deploy OpenStack services, with external consumers - Integrated: contains all the playbooks In a way, our roles is "Puppet OpenStack" and our integrated repo is "TripleO", back when TripleO deployed via Puppet anyways... I have to be honest, I wish that our roles lived under a different name so we can collaborate all on them (because an Ansible role to deploy something generically is needed regardless). We've actually done a lot of work with the TripleO team and they are consuming one of our roles (os_tempest) to do all their tempest testing, we gate TripleO and they gate us for the role. >> >> >> -- >> Mohammed Naser — vexxhost >> ----------------------------------------------------- >> D. 514-316-8872 >> D. 800-910-1726 ext. 200 >> E. mnaser at vexxhost.com >> W. http://vexxhost.com >> -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. 
http://vexxhost.com From mnaser at vexxhost.com Mon Jun 3 12:39:22 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 08:39:22 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: > > Hi, > > > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: > > > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: > >> > >> Hi, > >> > >> I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something similar possible with OSA or will it be needed to contribute always every change to OSA repository? > > > > Not a dumb question at all. So, we do have this concept of 'roles' > > which you _could_ kinda technically identify similar to plugins. > > However, I think one of the things that would maybe come out of this > > is the inability for projects to maintain their own plugins (because > > now you can host neutron/devstack/plugins and you maintain that repo > > yourself), under this structure, you would indeed have to make those > > changes to the OpenStack Ansible Neutron role > > > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron > > > > However, I think from an OSA perspective, we would be more than happy > > to add project maintainers for specific projects to their appropriate > > roles. It would make sense that there is someone from the Neutron > > team that could be a core on os_neutron from example. > > Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and install it together with everything else by simply adding one line (usually) in local.conf file. > I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or driver which isn’t official OpenStack project. You raise a really good concern. Indeed, we might have to change the workflow from "write a plugin" to "write an Ansible role" to be able to test your project with DevStack at that page (or maintain both a "legacy" solution) with a new one. > > > >> Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in mind. > > > > Indeed, with our current CI infrastructure with OSA, we have the > > ability to create these dynamic scenarios (which can actually be > > defined by a simple Zuul variable). > > > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 > > > > We do some really neat introspection of the project name being tested > > in order to run specific scenarios. Therefore, that is something that > > should be quite easy to accomplish simply by overriding a scenario > > name within Zuul. It also is worth mentioning we now support full > > metal deploys for a while now, so not having to worry about containers > > is something to keep in mind as well (with simplifying the developer > > experience again). 
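To make that concrete, the project-name introspection mentioned above really boils down to a lookup like the one below; the scenario names here are illustrative only, and the real logic lives in the linked playbook:

# Illustrative only: derive an OSA test scenario from the Zuul project name,
# unless the job explicitly overrides it.
SCENARIO_OVERRIDES = {
    'openstack/neutron': 'neutron',
    'openstack/cinder': 'cinder',
}


def pick_scenario(zuul_project, explicit_scenario=None):
    if explicit_scenario:
        return explicit_scenario
    return SCENARIO_OVERRIDES.get(zuul_project, 'aio')


print(pick_scenario('openstack/neutron'))        # -> neutron
print(pick_scenario('x/some-driver', 'custom'))  # -> custom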
> > > >>> On 1 Jun 2019, at 14:35, Mohammed Naser wrote: > >>> > >>> Hi everyone, > >>> > >>> This is something that I've discussed with a few people over time and > >>> I think I'd probably want to bring it up by now. I'd like to propose > >>> and ask if it makes sense to perhaps replace devstack entirely with > >>> openstack-ansible. I think I have quite a few compelling reasons to > >>> do this that I'd like to outline, as well as why I *feel* (and I could > >>> be biased here, so call me out!) that OSA is the best option in terms > >>> of a 'replacement' > >>> > >>> # Why not another deployment project? > >>> I actually thought about this part too and considered this mainly for > >>> ease of use for a *developer*. > >>> > >>> At this point, Puppet-OpenStack pretty much only deploys packages > >>> (which means that it has no build infrastructure, a developer can't > >>> just get $commit checked out and deployed). > >>> > >>> TripleO uses Kolla containers AFAIK and those have to be pre-built > >>> beforehand, also, I feel they are much harder to use as a developer > >>> because if you want to make quick edits and restart services, you have > >>> to enter a container and make the edit there and somehow restart the > >>> service without the container going back to it's original state. > >>> Kolla-Ansible and the other combinations also suffer from the same > >>> "issue". > >>> > >>> OpenStack Ansible is unique in the way that it pretty much just builds > >>> a virtualenv and installs packages inside of it. The services are > >>> deployed as systemd units. This is very much similar to the current > >>> state of devstack at the moment (minus the virtualenv part, afaik). > >>> It makes it pretty straight forward to go and edit code if you > >>> need/have to. We also have support for Debian, CentOS, Ubuntu and > >>> SUSE. This allows "devstack 2.0" to have far more coverage and make > >>> it much more easy to deploy on a wider variety of operating systems. > >>> It also has the ability to use commits checked out from Zuul so all > >>> the fancy Depends-On stuff we use works. > >>> > >>> # Why do we care about this, I like my bash scripts! > >>> As someone who's been around for a *really* long time in OpenStack, > >>> I've seen a whole lot of really weird issues surface from the usage of > >>> DevStack to do CI gating. For example, one of the recent things is > >>> the fact it relies on installing package-shipped noVNC, where as the > >>> 'master' noVNC has actually changed behavior a few months back and it > >>> is completely incompatible at this point (it's just a ticking thing > >>> until we realize we're entirely broken). > >>> > >>> To this day, I still see people who want to POC something up with > >>> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > >>> how many warnings we'll put up, they'll always try to do it. With > >>> this way, at least they'll have something that has the shape of an > >>> actual real deployment. In addition, it would be *good* in the > >>> overall scheme of things for a deployment system to test against, > >>> because this would make sure things don't break in both ways. > >>> > >>> Also: we run Zuul for our CI which supports Ansible natively, this can > >>> remove one layer of indirection (Zuul to run Bash) and have Zuul run > >>> the playbooks directly from the executor. > >>> > >>> # So how could we do this? 
> >>> The OpenStack Ansible project is made of many roles that are all > >>> composable, therefore, you can think of it as a combination of both > >>> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > >>> the base modules (i.e. puppet-nova, etc) and TripleO was the > >>> integration of all of it in a distribution. OSA is currently both, > >>> but it also includes both Ansible roles and playbooks. > >>> > >>> In order to make sure we maintain as much of backwards compatibility > >>> as possible, we can simply run a small script which does a mapping of > >>> devstack => OSA variables to make sure that the service is shipped > >>> with all the necessary features as per local.conf. > >>> > >>> So the new process could be: > >>> > >>> 1) parse local.conf and generate Ansible variables files > >>> 2) install Ansible (if not running in gate) > >>> 3) run playbooks using variable generated in #1 > >>> > >>> The neat thing is after all of this, devstack just becomes a thin > >>> wrapper around Ansible roles. I also think it brings a lot of hands > >>> together, involving both the QA team and OSA team together, which I > >>> believe that pooling our resources will greatly help in being able to > >>> get more done and avoiding duplicating our efforts. > >>> > >>> # Conclusion > >>> This is a start of a very open ended discussion, I'm sure there is a > >>> lot of details involved here in the implementation that will surface, > >>> but I think it could be a good step overall in simplifying our CI and > >>> adding more coverage for real potential deployers. It will help two > >>> teams unite together and have more resources for something (that > >>> essentially is somewhat of duplicated effort at the moment). > >>> > >>> I will try to pick up sometime to POC a simple service being deployed > >>> by an OSA role instead of Bash, placement which seems like a very > >>> simple one and share that eventually. > >>> > >>> Thoughts? :) > >>> > >>> -- > >>> Mohammed Naser — vexxhost > >>> ----------------------------------------------------- > >>> D. 514-316-8872 > >>> D. 800-910-1726 ext. 200 > >>> E. mnaser at vexxhost.com > >>> W. http://vexxhost.com > >>> > >> > >> — > >> Slawek Kaplonski > >> Senior software engineer > >> Red Hat > >> > > > > > > -- > > Mohammed Naser — vexxhost > > ----------------------------------------------------- > > D. 514-316-8872 > > D. 800-910-1726 ext. 200 > > E. mnaser at vexxhost.com > > W. http://vexxhost.com > > — > Slawek Kaplonski > Senior software engineer > Red Hat > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From ed at leafe.com Mon Jun 3 13:34:42 2019 From: ed at leafe.com (Ed Leafe) Date: Mon, 3 Jun 2019 08:34:42 -0500 Subject: [placement] Office Hours In-Reply-To: References: <923884A4-E427-439E-AD76-9DDBB45550D9@leafe.com> <1559227066.23481.3@smtp.office365.com> Message-ID: <2829F729-B385-4B06-9C2E-2E8A0A21F7BF@leafe.com> On Jun 3, 2019, at 6:24 AM, Chris Dent wrote: > > Note, that since we've declared this office hours, it means it's ad > hoc and nobody is required to be there. It's merely a time that > we've designated as a reasonable point for the placement to check in > with each other and for other people to check in with the placement > team. That is: let's make sure this doesn't turn into "we moved the > meeting to wednesday". Agreed, but it should also be emphasized that if you *can* make it, you should. 
It would be nice to know that if there is something to be discussed, that there is a good chance that the discussion might be fruitful. -- Ed Leafe From kennelson11 at gmail.com Mon Jun 3 13:46:53 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 3 Jun 2019 06:46:53 -0700 Subject: Elections for Airship In-Reply-To: <20190530223625.7ao2hmxlrrj3ny4b@yuggoth.org> References: <7C64A75C21BB8D43BD75BB18635E4D89709A2256@MOSTLS1MSGUSRFF.ITServices.sbc.com> <20190530223625.7ao2hmxlrrj3ny4b@yuggoth.org> Message-ID: Might also be helpful to look at our document that outlines the process we go through[1]. If you have any questions, let us know! -Kendall (diablo_rojo) [1] https://opendev.org/openstack/election/src/branch/master/README.rst On Thu, May 30, 2019 at 3:37 PM Jeremy Stanley wrote: > On 2019-05-30 19:04:56 +0000 (+0000), MCEUEN, MATT wrote: > > OpenStack Infra team, > > The OpenStack Infrastructure team hasn't been officially involved in > running technical elections for OpenStack for several years now > (subject tag removed accordingly). With the advent of Gerrit's REST > API, contributor data can be queried and assembled anonymously by > anyone. While I happen to be involved in these activities for longer > than that's been the case, I'll be answering while wearing my > OpenStack Technical Election Official hat throughout the remainder > of this reply. > > > As the Airship project works to finalize our governance and > > elected positions [1], we need to be ready to hold our first > > elections. I wanted to reach out and ask for any experience, > > guidance, materials, or tooling you can share that would help this > > run correctly and smoothly? This is an area where the Airship team > > doesn't have much experience so we may not know the right > > questions to ask. > > > > Aside from a member of the Airship community creating a poll in > > CIVS [2], is there anything else you would recommend? Is there any > > additional tooling in place in the OpenStack world? Any potential > > pitfalls, or other hard-won advice for us? > [...] > > As Sean mentioned in his reply, the OpenStack community has been > building and improving tooling in the openstack/election Git > repository on OpenDev over the past few years. The important bits > (in my opinion) center around querying Gerrit for a list of > contributors whose changes have merged to sets of official project > repositories within a qualifying date range. I've recently been > assisting StarlingX's election officials with a similar request, and > do have some recommendations. > > Probably the best place to start is adding an official structured > dataset with your team/project information following the same schema > used by OpenStack[0] and now StarlingX[1], then applying a couple of > feature patches[2][3] (if they haven't merged by the time you read > this) to the openstack/election master branch. After that, you ought > to be able to run something along the lines of: > > tox -e venv -- owners --after 2018-05-30 --before 2019-05-31 > --nonmember --outdir airship-electorate > --projects ../../airship/governance/projects.yaml > --ref master > > (Note that the --after and --before dates work like in Gerrit's > query language and carry with them an implied midnight UTC, so one > is the actual start date but the other is the day after the end > date; "on or after" and "before but not on" is how I refer to them > in prose.) > > You'll see the resulting airship-electorate directory includes a lot > of individual files. 
There are two basic types: .yaml files which > are structured data meant for human auditing as well as scripted > analysis, and .txt files which are a strict list of one Gerrit > preferred E-mail address per line for each voter (the format > expected by the https://civs.cs.cornell.edu/ voting service). It's > probably also obvious that there are sets of these named for each > team in your governance, as well as a set which start with > underscore (_). The former represent contributions to the > deliverable repositories of each team, while the latter are produced > from an aggregate of all deliverable repositories for all teams > (this is what you might use for electing an Airship-wide governing > body). > > There are a couple of extra underscore files... > _duplicate_owners.yaml includes information on deduplicated entries > for contributors where the script was able to detect more than one > Gerrit account for the same individual, while the _invites.csv file > isn't really election-related at all and is what the OSF normally > feeds into the automation which sends event discounts to > contributors. In case you're curious about the _invites.csv file, > the first column is the OSF member ID (if known) or 0 (if no > matching membership was found), the second column is the display > name from Gerrit, the third column is the preferred E-mail address > from Gerrit (this corresponds to the address used for the > _electorate.txt file), and any subsequent columns are the extra > non-preferred addresses configured in Gerrit for that account. > > Please don't hesitate to follow up with any additional questions you > might have! > > [0] > https://opendev.org/openstack/governance/src/branch/master/reference/projects.yaml > [1] > https://opendev.org/starlingx/governance/src/branch/master/reference/tsc/projects.yaml > [2] https://review.opendev.org/661647 > [3] https://review.opendev.org/661648 > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From henry at thebonaths.com Mon Jun 3 13:52:01 2019 From: henry at thebonaths.com (Henry Bonath) Date: Mon, 3 Jun 2019 09:52:01 -0400 Subject: [openstack-ansible] Installing Third-Party drivers into the Cinder-Volume container during playbook execution In-Reply-To: <1112501559559877@sas2-0106f63be698.qloud-c.yandex.net> References: <1112501559559877@sas2-0106f63be698.qloud-c.yandex.net> Message-ID: Dmitriy, Thank you for answering my question. That's good to know that we can deploy additional pip packages within the container, I'll look into what it takes to package the driver in pypi and start moving in this direction. On Mon, Jun 3, 2019 at 7:25 AM Dmitriy Rabotyagov wrote: > > The quick answer - no, currently such option is not present for the cinder role. And by far the only way to install some custom things with cinder now is to provide a list of cinder_user_pip_packages[1], so that's why packaging driver into pypi might be an option for distribution of custom drivers. > > [1] https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/defaults/main.yml#L302 > > 03.06.2019, 06:59, "Henry Bonath" : > > I think the idea here, at least for me, would be to have it rolled > > into the deployment automatically - in a similar fashion to how > > horizon themes are deployed within Openstack-Ansible. 
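For what it's worth, packaging the driver for PyPI as mentioned above is mostly boilerplate. A minimal, entirely hypothetical setup.py would be along these lines (the package and project names are made up, and a real release would also want a README, a setup.cfg and a PyPI upload step):

# setup.py -- minimal packaging sketch for a hypothetical out-of-tree driver.
from setuptools import find_packages, setup

setup(
    name='cinder-ixsystems-driver',   # hypothetical PyPI name
    version='0.1.0',
    description='Out-of-tree Cinder volume driver',
    packages=find_packages(),         # e.g. a cinder_ixsystems/ package directory
    install_requires=[],              # cinder itself is already in the target venv
)

Once something like that is published, it can simply be listed in cinder_user_pip_packages so that the os_cinder role installs it into the cinder venv.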
> > Obviously having this specific driver in the tree would solve my > > specific issue, but I don't know how many more third party Cinder > > drivers which are not packaged into the tree people are deploying > > these days. > > > > My question for the community is simply finding out if this mechanism > > exists already. > > > > On Thu, May 30, 2019 at 11:11 AM Jean-Philippe Evrard > > wrote: > >> On Tue, May 28, 2019, at 04:10, Henry Bonath wrote: > >> > Hello, I asked this into IRC but I thought this might be a more > >> > appropriate place to ask considering the IRC channel usage over the > >> > weekend. > >> > > >> > If I wanted to deploy a third party driver along with my Cinder-Volume > >> > container, is there a built-in mechanism for doing so? (I am > >> > specifically wanting to use: https://github.com/iXsystems/cinder) > >> > > >> > I am able to configure a cinder-backend in the > >> > "openstack_user_config.yml" file which works perfectly if I let it > >> > fail during the first run, then copy the driver into the containers > >> > and run "os-cinder-install.yml" a second time. > >> > > >> > I've found that you guys have built similar stuff into the system > >> > (e.g. Horizon custom Theme installation via .tgz) and was curious if > >> > there is a similar mechanism for Cinder Drivers that may be > >> > undocumented. > >> > > >> > http://paste.openstack.org/show/752132/ > >> > This is an example of my working config, which relies on the driver > >> > being copied into the > >> > /openstack/venvs/cinder-19.x.x.x/lib/python2.7/site-packages/cinder/volume/drivers/ixsystems/ > >> > folder. > >> > > >> > Thanks in advance! > >> > > >> > > >> > >> I suppose the community would be okay to have this in tree, so no need for a third party system here (and no need to maintain this on your own, separately). However... if it's just about copying the content of this repo, did you think of packaging this, and publish it to pypi ? This way you could just pip install the necessary package into your cinder venv... > >> > >> Regards, > >> Jean-Philippe Evrard (evrardjp) > > -- > Kind Regards, > Dmitriy Rabotyagov > From openstack at fried.cc Mon Jun 3 14:49:48 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 3 Jun 2019 09:49:48 -0500 Subject: [nova] Bug #1755266: How to proceed with test failures In-Reply-To: References: Message-ID: <598f1be2-2eb9-c288-077e-90c17267921d@fried.cc> Hi Clemens. First of all, thank you for digging into the code and working to fix the issue you're seeing. > I am missing context here how the logic of the test > code works; Note that it is normal (in fact almost always required) to change test code with any change to the production side. The purpose of unit tests (you can tell this is a unit test from '/unit/' in the file path) is to exercise a small chunk ("unit" :) of code and make sure branching and method calls are all as expected. You've changed some logic, so the test is (rightly) failing, and you'll need to change what it's expecting accordingly. These tests are making use of mock [0] to hide the guts of some of the methods being called by the unit in question, just to make sure that those methods are being invoked the correct number of times, with the correct arguments. In this case... 
> Related to my changes, tox gets me an error: > > nova.tests.unit.virt.libvirt.test_driver.LibvirtDriverTestCase.test_finish_migration_power_on > --------------------------------------------------------------------------------------------- > > Captured traceback: > ~~~~~~~~~~~~~~~~~~~ >     Traceback (most recent call last): >       File "nova/tests/unit/virt/libvirt/test_driver.py", line 18707, in > test_finish_migration_power_on >         self._test_finish_migration() >       File > "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/mock/mock.py", > line 1305, in patched >         return func(*args, **keywargs) >       File "nova/tests/unit/virt/libvirt/test_driver.py", line 18662, in > _test_finish_migration >         mock_raw_to_qcow2.assert_has_calls(convert_calls, any_order=True) >       File > "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/mock/mock.py", > line 983, in assert_has_calls >         ), cause) >       File > "/opt/stack/nova/.tox/py27/local/lib/python2.7/site-packages/six.py", > line 737, in raise_from >         raise value >     AssertionError: > (call('/tmp/tmpC60sPk/tmpOHikWL/8ea1de33-64d7-4d1d-af02-88e6f7ec91c1/disk.swap'),) > not all found in call list The stack trace is pointing you at [1] for all three failures. I can see from your code change that you're skipping the qcow conversion for disk.swap here [2]. So when I remove 'disk.swap' from the list on L19438, the three tests pass. That will get you green in zuul, but I should warn you that one of the first things reviewers will notice is that you *only* had to change tests for the piece of your change at [2]. That means there's missing/incomplete test coverage for the other code paths you've touched, and you'll have to add some (or justify why it's unnecessary). > Are there any docs or helps alongside > testing in Nova where to learn what to do? I'm not sure if this is targeted at the right level for you, but here's the nova contributor guide [3]. If you're more of an interactive learner, feel free to jump on IRC in #openstack-nova and I'll be happy to walk you through some basics. > Perhaps also someone could give me a hint whether there are some > conceptually misleading ideas behind my fix proposal which need to be > driven into another direction … Yup, that's what code review is for. Nova has a very high "open changes" to "reviewer bandwidth" ratio, so it's unfortunately pretty normal for changes to go unreviewed while they're still failing zuul testing. Getting those fixed up, and bringing attention to the issue here on the mailing list and/or IRC, should all get your change some better attention. Thanks again for diving in. efried [0] https://docs.python.org/3/library/unittest.mock.html [1] https://opendev.org/openstack/nova/src/branch/master/nova/tests/unit/virt/libvirt/test_driver.py#L19437-L19439 [2] https://review.opendev.org/#/c/618621/4/nova/virt/libvirt/driver.py at 8410 [3] https://docs.openstack.org/nova/latest/contributor/index.html From cboylan at sapwetik.org Mon Jun 3 14:56:58 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Mon, 03 Jun 2019 07:56:58 -0700 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > Hi everyone, > > This is something that I've discussed with a few people over time and > I think I'd probably want to bring it up by now. I'd like to propose > and ask if it makes sense to perhaps replace devstack entirely with > openstack-ansible. 
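If it helps, here is a tiny self-contained illustration of the assert_has_calls() pattern the failing test relies on; the module and function names below are invented for the example and are not nova code:

import unittest
from unittest import mock


def raw_to_qcow2(path):
    # Stand-in for the expensive conversion we never want to run in a unit test.
    raise RuntimeError('should be mocked out in tests')


def convert_disks(paths):
    # Pretend production code: converts every disk path it is given.
    for path in paths:
        raw_to_qcow2(path)


class ConvertDisksTestCase(unittest.TestCase):
    @mock.patch(__name__ + '.raw_to_qcow2')
    def test_convert_disks(self, mock_raw_to_qcow2):
        convert_disks(['disk', 'disk.local'])
        # If the production code stops converting one of these paths, this
        # assertion fails just like the test above did, and the expected call
        # list has to be updated to match the new behaviour.
        expected = [mock.call('disk'), mock.call('disk.local')]
        mock_raw_to_qcow2.assert_has_calls(expected, any_order=True)


if __name__ == '__main__':
    unittest.main()

The failing assertion in the traceback above is the same shape, just with a much longer expected call list.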
I think I have quite a few compelling reasons to > do this that I'd like to outline, as well as why I *feel* (and I could > be biased here, so call me out!) that OSA is the best option in terms > of a 'replacement' > > # Why not another deployment project? > I actually thought about this part too and considered this mainly for > ease of use for a *developer*. > > At this point, Puppet-OpenStack pretty much only deploys packages > (which means that it has no build infrastructure, a developer can't > just get $commit checked out and deployed). > > TripleO uses Kolla containers AFAIK and those have to be pre-built > beforehand, also, I feel they are much harder to use as a developer > because if you want to make quick edits and restart services, you have > to enter a container and make the edit there and somehow restart the > service without the container going back to it's original state. > Kolla-Ansible and the other combinations also suffer from the same > "issue". > > OpenStack Ansible is unique in the way that it pretty much just builds > a virtualenv and installs packages inside of it. The services are > deployed as systemd units. This is very much similar to the current > state of devstack at the moment (minus the virtualenv part, afaik). > It makes it pretty straight forward to go and edit code if you > need/have to. We also have support for Debian, CentOS, Ubuntu and > SUSE. This allows "devstack 2.0" to have far more coverage and make > it much more easy to deploy on a wider variety of operating systems. > It also has the ability to use commits checked out from Zuul so all > the fancy Depends-On stuff we use works. > > # Why do we care about this, I like my bash scripts! > As someone who's been around for a *really* long time in OpenStack, > I've seen a whole lot of really weird issues surface from the usage of > DevStack to do CI gating. For example, one of the recent things is > the fact it relies on installing package-shipped noVNC, where as the > 'master' noVNC has actually changed behavior a few months back and it > is completely incompatible at this point (it's just a ticking thing > until we realize we're entirely broken). I'm not sure this is a great example case. We consume prebuilt software for many of our dependencies. Everything from the kernel to the database to rabbitmq to ovs (and so on) are consumed as prebuilt packages from our distros. In many cases this is desirable to ensure that our software work with the other software out there in the wild that people will be deploying with. > > To this day, I still see people who want to POC something up with > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > how many warnings we'll put up, they'll always try to do it. With > this way, at least they'll have something that has the shape of an > actual real deployment. In addition, it would be *good* in the > overall scheme of things for a deployment system to test against, > because this would make sure things don't break in both ways. > > Also: we run Zuul for our CI which supports Ansible natively, this can > remove one layer of indirection (Zuul to run Bash) and have Zuul run > the playbooks directly from the executor. I think if you have developers running a small wrapper locally to deploy this new development stack you should run that same wrapper in CI. This ensure the wrapper doesn't break. > > # So how could we do this? 
> The OpenStack Ansible project is made of many roles that are all > composable, therefore, you can think of it as a combination of both > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > the base modules (i.e. puppet-nova, etc) and TripleO was the > integration of all of it in a distribution. OSA is currently both, > but it also includes both Ansible roles and playbooks. > > In order to make sure we maintain as much of backwards compatibility > as possible, we can simply run a small script which does a mapping of > devstack => OSA variables to make sure that the service is shipped > with all the necessary features as per local.conf. > > So the new process could be: > > 1) parse local.conf and generate Ansible variables files > 2) install Ansible (if not running in gate) > 3) run playbooks using variable generated in #1 > > The neat thing is after all of this, devstack just becomes a thin > wrapper around Ansible roles. I also think it brings a lot of hands > together, involving both the QA team and OSA team together, which I > believe that pooling our resources will greatly help in being able to > get more done and avoiding duplicating our efforts. > > # Conclusion > This is a start of a very open ended discussion, I'm sure there is a > lot of details involved here in the implementation that will surface, > but I think it could be a good step overall in simplifying our CI and > adding more coverage for real potential deployers. It will help two > teams unite together and have more resources for something (that > essentially is somewhat of duplicated effort at the moment). > > I will try to pick up sometime to POC a simple service being deployed > by an OSA role instead of Bash, placement which seems like a very > simple one and share that eventually. > > Thoughts? :) For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) speed. Any replacement should be at least as quick as the current tooling because the current tooling is slow enough already. The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss (because they tend to log only a tiny fraction of what devstack logs). Clark From jean-philippe at evrard.me Mon Jun 3 15:08:12 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Mon, 03 Jun 2019 17:08:12 +0200 Subject: =?UTF-8?Q?Re:_[openstack-ansible]_Installing_Third-Party_drivers_into_th?= =?UTF-8?Q?e_Cinder-Volume_container_during_playbook_execution?= In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019, at 05:45, Henry Bonath wrote: > I think the idea here, at least for me, would be to have it rolled > into the deployment automatically - in a similar fashion to how > horizon themes are deployed within Openstack-Ansible. > Obviously having this specific driver in the tree would solve my > specific issue, but I don't know how many more third party Cinder > drivers which are not packaged into the tree people are deploying > these days. > > My question for the community is simply finding out if this mechanism > exists already. As you might have seen, we have documentation in the cinder role that points to different third party cinder drivers [1]. That's why i think it would be fine to have your specific code integrated into the cinder role. There is a precedent there. 
This way you would have it part of the deployment automatically. On the technical aspect of the matter, I believe it would be better to package that code into a python package though, so you can install and use it directly. It will reduce the maintainance burden in the long run, and would be easier to test in CI: The OpenStack infrastructure have a cache (or mirror?) of PyPI, and we don't have a mirror of this code. Regards, Jean-Philippe Evrard (evrardjp) [1]: https://docs.openstack.org/openstack-ansible-os_cinder/latest/ From mnaser at vexxhost.com Mon Jun 3 15:15:15 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 11:15:15 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 11:05 AM Clark Boylan wrote: > > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > > > # Why not another deployment project? > > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > I'm not sure this is a great example case. We consume prebuilt software for many of our dependencies. 
Everything from the kernel to the database to rabbitmq to ovs (and so on) are consumed as prebuilt packages from our distros. In many cases this is desirable to ensure that our software work with the other software out there in the wild that people will be deploying with. Yeah. I guess that's fair, but there's still other things like lack of coverage for many other operating systems as well. > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > I think if you have developers running a small wrapper locally to deploy this new development stack you should run that same wrapper in CI. This ensure the wrapper doesn't break. That's fair enough, that's always been the odd thing of driving things directly via Zuul or with a small executor. > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) speed. 
Any replacement should be at least as quick as the current tooling because the current tooling is slow enough already. The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss (because they tend to log only a tiny fraction of what devstack logs). The idea is *not* to use OpenStack Ansible to deploy DevStack, it's to use the roles to deploy the specific services. Therefore, the log collection stuff should all still be the same, as long as it pulls down the correct systemd unit (which should be matching). The idea that it should be 100% transparent to the user at the end of the day, there should be no functional changes in how DevStack runs or what it logs in the gate. > Clark > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From jim at jimrollenhagen.com Mon Jun 3 15:18:25 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Mon, 3 Jun 2019 11:18:25 -0400 Subject: [tc][all] Github mirroring (or lack thereof) for unofficial projects In-Reply-To: References: <20190503190538.GB3377@localhost.localdomain> <20190515175110.26i2xuclkksgx744@arabian.linksys.moosehall> <8d81b9a7-b460-43e1-a774-9bd65ee42143@www.fastmail.com> <20190530180658.xgpcy35au72ccmzt@yuggoth.org> Message-ID: On Fri, May 31, 2019 at 7:51 PM Clark Boylan wrote: > On Fri, May 31, 2019, at 11:09 AM, Jim Rollenhagen wrote: > > On Thu, May 30, 2019 at 3:15 PM Jim Rollenhagen > wrote: > > > On Thu, May 30, 2019 at 2:18 PM Jeremy Stanley > wrote: > > >> On 2019-05-30 09:00:20 -0700 (-0700), Clark Boylan wrote: > > >> [...] > > >> > If you provide us with the canonical list of things to archive I > > >> > think we can probably script that up or do lots of clicking > > >> > depending on the size of the list I guess. > > >> [...] > > >> > > >> Alternatively, I's like to believe we're at the point where we can > > >> add other interested parties to the curating group for the openstack > > >> org on GH, at which point any of them could volunteer to do the > > >> archiving. > > > > > > Thanks Clark/Jeremy. I'll make a list tomorrow, as we'll > > > need that in either case. :) > > > > I think what we want is to archive all Github repos in the > > openstack, openstack-dev, and openstack-infra orgs, > > which don't have something with the same name on > > Gitea in the openstack namespace. Is that right? > > Close, I think we can archive all repos in openstack-dev and > openstack-infra. Part of the repo renames we did today were to get the > repos that were left behind in those two orgs into their longer term homes. > Then any project in https://github.com/openstack that is not in > https://opendev.org/openstack can be archived in Github too. > Cool, that made me realize I wasn't outputting the org, and now I don't need to. :) New list (gathered only from the openstack org): http://paste.openstack.org/show/752443/ And new code: http://paste.openstack.org/show/752444/ And yes, I realize I pasted my token there, it's no longer valid :) // jim -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cboylan at sapwetik.org Mon Jun 3 15:18:36 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Mon, 03 Jun 2019 08:18:36 -0700 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: <45ca1872-187b-46cf-b23a-a9a7be7acb85@www.fastmail.com> On Mon, Jun 3, 2019, at 8:15 AM, Mohammed Naser wrote: > On Mon, Jun 3, 2019 at 11:05 AM Clark Boylan wrote: > > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: snip > > > I will try to pick up sometime to POC a simple service being deployed > > > by an OSA role instead of Bash, placement which seems like a very > > > simple one and share that eventually. > > > > > > Thoughts? :) > > > > For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) speed. Any replacement should be at least as quick as the current tooling because the current tooling is slow enough already. The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss (because they tend to log only a tiny fraction of what devstack logs). > > The idea is *not* to use OpenStack Ansible to deploy DevStack, it's to > use the roles > to deploy the specific services. Therefore, the log collection stuff > should all still > be the same, as long as it pulls down the correct systemd unit (which should > be matching). I know. I'm saying the logging that these other systems produce is typically lacking compared to devstack. So any change needs to address that. > > The idea that it should be 100% transparent to the user at the end of > the day, there > should be no functional changes in how DevStack runs or what it logs > in the gate. If this is the plan then the logging concerns should be addressed as part of the "don't make it noticeable change" work. Clark From clemens.hardewig at crandale.de Mon Jun 3 15:25:02 2019 From: clemens.hardewig at crandale.de (Clemens Hardewig) Date: Mon, 3 Jun 2019 17:25:02 +0200 Subject: [nova] Bug #1755266: How to proceed with test failures In-Reply-To: <598f1be2-2eb9-c288-077e-90c17267921d@fried.cc> References: <598f1be2-2eb9-c288-077e-90c17267921d@fried.cc> Message-ID: <2FE9B647-2484-4ADF-A0AD-12C0DE197394@crandale.de> Hi Eric, thank you that you have taken the time leading me through the process and answering extensively. Very much appreciated and insightful. Having digged now into the code in /nova/nova/tests/unit/virt/libvirt/test_driver.py, it is obvious that my proposal is not universal but fixes only my specific config (and make then other configs fail). However, it seems to me that a config that running cinder on each compute node with lvm backend creating root volume as lvm volume creating swap not as ephermal or swap (raw) disk but as lvm volume (as lvm/qemu does automatically) is not a supported model in nova yet to deal with instance resizing/migrations. Thanks again for your guidance, will go through it ... Br Clemens -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 3898 bytes Desc: not available URL: From openstack at fried.cc Mon Jun 3 15:51:11 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 3 Jun 2019 10:51:11 -0500 Subject: [nova] Bug #1755266: How to proceed with test failures In-Reply-To: <2FE9B647-2484-4ADF-A0AD-12C0DE197394@crandale.de> References: <598f1be2-2eb9-c288-077e-90c17267921d@fried.cc> <2FE9B647-2484-4ADF-A0AD-12C0DE197394@crandale.de> Message-ID: Clemens- > However, it seems to me that a config that > > * running cinder on each compute node with lvm backend > * creating root volume as lvm volume > * creating swap not as ephermal or swap (raw) disk but as lvm volume > (as lvm/qemu does automatically) > > > is not a supported model in nova yet to deal with instance > resizing/migrations. I've asked a subject matter expert (which I certainly am not) to have a look at your change. Hopefully he can answer the above, which is pretty much Greek to me :) Thanks, efried . From liujinxin at xiangcloud.com.cn Mon Jun 3 11:08:25 2019 From: liujinxin at xiangcloud.com.cn (liujinxin at xiangcloud.com.cn) Date: Mon, 3 Jun 2019 19:08:25 +0800 Subject: ovn L3 TCP protocol has a large number of retransmissions Message-ID: <2019060319082531569215@xiangcloud.com.cn> Hi: I have the following two questions. What shall I do? problem1:When the cloud host accesses the external network through L3 router. TCP protocol has a large number of retransmissions, leading to TCP link failure, TCP data transmission error problem2:TCP links data packets, duplicates ACK and TCP data transmission disorderly when the instances communicate across hosts through geneve, but the quality impact of TCP is relatively acceptable. openstack queens with ovn environment OS: CentOS Linux release 7.3.1611 (Core) kernel: 3.10.0-514.el7.x86_64 openstack: kolla-ansible queens networking-ovn:python-networking-ovn-4.0.3 ovs and ovn: openvswitch-ovn-central-2.10.90 openvswitch-2.10.90 openvswitch-ovn-host-2.10.90 openvswitch-ovn-common-2.10.90 topology: openstack controller 10.200.105.19 openstack compute 10.200.105.16,10.200.105.17,10.200.105.18 openstack gateway 10.200.105.20 openstack controller gateway compute 10.200.105.19 10.200.105.20 10.200.105.[16-18] neutron_server ovn-northd ---------bond0------------|------------------------------------------------------------------| | | | ovn-controller ovn-controller ovn-controller | | | ovs ovs ovs | | | | | | | |----------------------------------|--|------bond0-------------------------------------------------|--| |-------------------------------------|--------bond1--------------------------------------------------| Packet forwarding: | compute1 | compute2 | gateway | | 10.200.105.16 | 10.200.105.17 | 10.200.105.20 | | vm1 | vm2 | | | | | | | | | br-int <-> br-ex | br-int <-> br-ex | br-int <-> br-ex | | |_____bond1_vlan___|___________|____________|________| |__________bond0_____________|_______________________| 1、L3 data flow 10.200.100.16 | 10.200.105.20 vm1<--->br-int<-->geneve <->bond0 <―-> bond0<-->geneve<--->br-ex<-->bond1<-->vlan<---->internet 2、vm1<->vm2 10.200.100.16 | 10.200.105.17 vm1<--->br-int<-->geneve <->bond0 <―-> bond0<-->geneve<--->br-int<--->vm2 Configure: Openstack Configure 1、neutron.conf ... service_plugins = networking_ovn.l3.l3_ovn.OVNL3RouterPlugin,qos ... 
2、cat /etc/kolla/neutron-server/ml2_conf.ini [ml2] type_drivers = flat,vlan,local,geneve tenant_network_types = geneve mechanism_drivers = ovn extension_drivers = port_security,qos overlay_ip_version = 4 [ml2_type_vlan] network_vlan_ranges = physnet1 [securitygroup] enable_security_group = true [ml2_type_geneve] vni_ranges = 1:65536 max_header_size = 38 [ovn] ovn_nb_connection = tcp:10.200.105.19:6641 ovn_sb_connection = tcp:10.200.105.19:6642 ovn_l3_mode = True ovn_l3_scheduler = leastloaded ovn_native_dhcp = True neutron_sync_mode = repair enable_distributed_floating_ip = True ovsdb_log_level = DEBUG [qos] notification_drivers = ovn-qos Ovn Configure 10.200.105.19 ovs-vsctl get open . external_ids {hostname="10-200-105-19", ovn-bridge-mappings="physnet1:br-ex", ovn-encap-ip="10.200.105.19", ovn-encap-type="geneve,vxlan", ovn-remote="tcp:10.200.105.19:6642", rundir="/var/run/openvswitch", system-id="160e569c-a12f-41a3-8d2a-37bd9af0c7ed"} 10.200.105.20 ovs-vsctl get open . external_ids {hostname="10-200-105-20", ovn-bridge-mappings="physnet1:br-ex", ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="10.200.105.20", ovn-encap-type="geneve,vxlan", ovn-remote="tcp:10.200.105.19:6642", rundir="/var/run/openvswitch", system-id="96e89c3c-5c85-498d-b42f-5aea559bdd42"} 10.200.105.[16-18] ovs-vsctl get open . external_ids {hostname="10-200-105-17", ovn-bridge-mappings="physnet1:br-ex", ovn-encap-ip="10.200.105.17", ovn-encap-type="geneve,vxlan", ovn-remote="tcp:10.200.105.19:6642", rundir="/var/run/openvswitch", system-id="a768ca6e-905d-4aac-aa1e-d18b38dedadf"} ovn-nbctl show 2019-06-03T10:51:46Z|00001|ovsdb_idl|WARN|NB_Global table in OVN_Northbound database lacks ipsec column (database needs upgrade?) 2019-06-03T10:51:46Z|00002|ovsdb_idl|WARN|NB_Global table in OVN_Northbound database lacks options column (database needs upgrade?) switch eddff890-b515-41d3-ad49-edcae9a3197b (neutron-7489be65-074f-49f0-9cf3-c520dcd3b08d) (aka v) port 066c4c72-a1f7-4311-8d40-ed7ca0f942b3 addresses: ["fa:16:3e:a8:9d:05 192.168.2.212"] port edc6e2a9-47db-4a8a-8857-d8afa63d900d type: router router-port: lrp-edc6e2a9-47db-4a8a-8857-d8afa63d900d port provnet-7489be65-074f-49f0-9cf3-c520dcd3b08d type: localnet addresses: ["unknown"] switch 23d3676d-9d95-403e-947c-bcd4b298bde0 (neutron-7dd91bd0-10dd-4022-868c-6d17be7380f7) (aka bb) port a764f462-7897-475f-9ef0-04b7c83e44db addresses: ["fa:16:3e:cd:23:b2 10.0.0.11"] port 71247f19-21bd-4eac-b3db-94e770abb50c type: router router-port: lrp-71247f19-21bd-4eac-b3db-94e770abb50c port 659f304c-266f-4b3f-946a-b3cf4ea988c5 addresses: ["fa:16:3e:f8:5f:1b 10.0.0.9"] router 3c5d2c44-e3c4-46e9-9f43-64c1cbc7e065 (neutron-f8611590-42a1-4c6a-b433-db9ade3194a2) (aka v) port lrp-edc6e2a9-47db-4a8a-8857-d8afa63d900d mac: "fa:16:3e:06:f4:ca" networks: ["192.168.2.205/16"] gateway chassis: [311c4582-71d1-4886-baf0-1aefa5f2ceab d61a09c2-87e2-4dff-91be-82e705ab85f4] port lrp-71247f19-21bd-4eac-b3db-94e770abb50c mac: "fa:16:3e:ef:06:c6" networks: ["10.0.0.1/24"] nat 4bc0e7cf-3bdb-4725-94e4-a29b62f7d8e0 external ip: "192.168.2.205" logical ip: "10.0.0.0/24" type: "snat" liujinxin at xiangcloud.com.cn -------------- next part -------------- An HTML attachment was scrubbed... 
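Since both reported symptoms follow the geneve overlay path described above, one generic first check (a sketch only, not a confirmed diagnosis: the interface name is taken from the topology above and the probe size assumes a 1500-byte underlay) is the MTU and a raw capture along that path:

  ip link show bond0 | grep -o 'mtu [0-9]*'                       # underlay MTU on a compute/gateway node
  tcpdump -ni bond0 'udp port 6081' -c 200 -w /tmp/geneve.pcap    # capture geneve-encapsulated traffic to inspect retransmissions/dup ACKs
  ping -M do -s 1372 8.8.8.8                                      # from inside a VM: probe the usable path MTU with the DF bit set

If the instance MTU has not been lowered to leave room for the geneve headers, large-packet loss and retransmissions of exactly this kind are a common result.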
URL: From jabach at blizzard.com Mon Jun 3 12:03:39 2019 From: jabach at blizzard.com (James Bach) Date: Mon, 3 Jun 2019 12:03:39 +0000 Subject: [magnum] Meeting at 2019-06-04 2100 UTC In-Reply-To: References: Message-ID: I’m OOO until next week but I’d be glad to meet anytime after that Jim ________________________________ From: Spyros Trigazis Sent: Monday, June 3, 2019 8:02:32 AM To: openstack-discuss at lists.openstack.org Cc: Fei Long Wang; James Bach; Erik Olof Gunnar Andersson; Bharat Kunwar Subject: [magnum] Meeting at 2019-06-04 2100 UTC Hello all, I would like to discuss moving the drivers out-of-tree, as we briefly discussed it in the PTG. Can you all make it for the next meeting [1]? This is not super urgent, but it will accelerate development and bug fixes at the driver level. Cheers, Spyros [0] https://etherpad.openstack.org/p/magnum-train-ptg [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=magnum-meeting&iso=20190604T21 -------------- next part -------------- An HTML attachment was scrubbed... URL:

From jayachander.it at gmail.com Mon Jun 3 15:42:13 2019 From: jayachander.it at gmail.com (Jay See) Date: Mon, 3 Jun 2019 17:42:13 +0200 Subject: [Floating IP][Networking issue] Not able to connect to VM using Floating IP Message-ID: Hi, I have followed the OpenStack installation guide for Queens [0][1]. In my setup I have 3 servers: 1 controller and 2 compute nodes, with Ubuntu 16.04, behind my firewall (OpenBSD).

*Issue 1:* All my servers have several NICs, and I wanted to use at least two of them, but I am able to connect to my servers through only one of the NICs. I could not figure out what is wrong with my settings.

root at h018:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

iface eth5 inet static
iface eth4 inet static

auto eth3
iface eth3 inet static
address 10.4.15.118
netmask 255.255.255.0
network 10.4.15.0
broadcast 10.4.15.255
gateway 10.4.15.1

auto eth2
iface eth2 inet static
address 10.3.15.118
netmask 255.255.255.0
network 10.3.15.0
broadcast 10.3.15.255
gateway 10.3.15.1

auto eth1
iface eth1 inet static
address 10.2.14.118
netmask 255.255.255.0
network 10.2.14.0
broadcast 10.2.14.255
gateway 10.2.14.1

# The primary network interface
auto eth0
iface eth0 inet static
address 10.1.14.118
netmask 255.255.255.0
network 10.1.14.0
broadcast 10.1.14.255
gateway 10.1.14.1
# dns-* options are implemented by the resolvconf package, if installed
dns-nameservers 10.1.14.1 8.8.8.8 8.8.4.4

*Issue 2:* I have completed my OpenStack installation by following [1]; after creating the VM and associating the floating IP, everything looks fine, but I am not able to ping or SSH to the VM. I have added ICMP and SSH to my security group rules. I configured my L2 bridge to use eth1, which is not reachable from the firewall, or this might be altogether a different problem, as my VM creation is successful without any errors.
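A side note on Issue 1 before the command output below: with ifupdown, only one of the four "gateway" lines can actually install the system default route (the ip route output further down shows exactly one default route, via eth0), so traffic sourced from the other addresses tries to leave via eth0's gateway. Whether or not that is the cause here, a rough sketch of per-interface routing for eth1 looks like this (addresses come from the config above; table 100 is an arbitrary choice):

  ip route show default                                   # see which of the four gateways actually won
  ip route add default via 10.2.14.1 dev eth1 table 100   # give eth1 its own routing table
  ip rule add from 10.2.14.118/32 table 100               # replies sourced from eth1's address use that table

The same pattern, persisted via post-up lines in /etc/network/interfaces, would apply to eth2/eth3 if they also need to be reachable from outside their own subnets.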
root at h018:~# openstack network create --share --external --provider-physical-network provider --provider-network-type flat provider-network +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ | admin_state_up | UP | | availability_zone_hints | | | availability_zones | | | created_at | 2019-06-03T09:45:20Z | | description | | | dns_domain | None | | id | 5e8f5ec9-9a65-4259-a246-1c7f95a2f33a | | ipv4_address_scope | None | | ipv6_address_scope | None | | is_default | False | | is_vlan_transparent | None | | mtu | 1500 | | name | provider-network | | port_security_enabled | True | | project_id | bb0f22d6efd64b31be6c37edc796d53e | | provider:network_type | flat | | provider:physical_network | provider | | provider:segmentation_id | None | | qos_policy_id | None | | revision_number | 5 | | router:external | External | | segments | None | | shared | True | | status | ACTIVE | | subnets | | | tags | | | updated_at | 2019-06-03T09:45:20Z | +---------------------------+--------------------------------------+ root at h018:~# root at h018:~# openstack subnet create --network provider-network \ > --allocation-pool start=XX.XX.169.101,end=XX.XX.169.250 \ > --dns-nameserver 8.8.4.4 --gateway XX.XX.169.1 \ > --subnet-range XX.XX.169.0/24 provider +-------------------+--------------------------------------+ | Field | Value | +-------------------+--------------------------------------+ | allocation_pools | XX.XX.169.101-XX.XX.169.250 | | cidr | XX.XX.169.0/24 | | created_at | 2019-06-03T09:49:45Z | | description | | | dns_nameservers | 8.8.4.4 | | enable_dhcp | True | | gateway_ip | XX.XX.169.1 | | host_routes | | | id | 51fb740f-1f06-4f6c-93c5-3690488e3980 | | ip_version | 4 | | ipv6_address_mode | None | | ipv6_ra_mode | None | | name | provider | | network_id | 5e8f5ec9-9a65-4259-a246-1c7f95a2f33a | | project_id | bb0f22d6efd64b31be6c37edc796d53e | | revision_number | 0 | | segment_id | None | | service_types | | | subnetpool_id | None | | tags | | | updated_at | 2019-06-03T09:49:45Z | +-------------------+--------------------------------------+ root at h018:~# neutron net-external-list neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead. 
+--------------------------------------+------------------+----------------------------------+------------------------------------------------------+ | id | name | tenant_id | subnets | +--------------------------------------+------------------+----------------------------------+------------------------------------------------------+ | 5e8f5ec9-9a65-4259-a246-1c7f95a2f33a | provider-network | bb0f22d6efd64b31be6c37edc796d53e | 51fb740f-1f06-4f6c-93c5-3690488e3980 XX.XX.169.0/24 | +--------------------------------------+------------------+----------------------------------+------------------------------------------------------+ root at h018:~# openstack network list +--------------------------------------+------------------+--------------------------------------+ | ID | Name | Subnets | +--------------------------------------+------------------+--------------------------------------+ | 3ee95928-012f-4a55-a0b3-e277c2d45080 | demo-network | 3427b6ac-3bc0-4529-9035-33e1ab05cb64 | | 5e8f5ec9-9a65-4259-a246-1c7f95a2f33a | provider-network | 51fb740f-1f06-4f6c-93c5-3690488e3980 | +--------------------------------------+------------------+--------------------------------------+ root at h018:~# nova list +--------------------------------------+--------+--------+------------+-------------+----------------------------------------+ | ID | Name | Status | Task State | Power State | Networks | +--------------------------------------+--------+--------+------------+-------------+----------------------------------------+ | 3f8ab4c2-9047-47c4-8634-0c93cf7d7460 | test15 | ACTIVE | - | Running | demo-network=10.1.0.12, XX.XX.169.108 | +--------------------------------------+--------+--------+------------+-------------+----------------------------------------+ root at h018:~# openstack port list +--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+ | ID | Name | MAC Address | Fixed IP Addresses | Status | +--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+ | 037d801d-5cae-4d88-ae2d-a4289a542057 | | fa:16:3e:a6:68:7b | ip_address='10.1.0.2', subnet_id='3427b6ac-3bc0-4529-9035-33e1ab05cb64' | ACTIVE | | 327fe5fe-4288-4d80-850c-fa7d7e29d3aa | | fa:16:3e:2f:0f:dd | ip_address='XX.XX.169.101', subnet_id='51fb740f-1f06-4f6c-93c5-3690488e3980' | ACTIVE | | 4208ac23-42bf-44ed-8b0d-af1e615b2542 | | fa:16:3e:c5:cb:94 | ip_address='XX.XX.169.108', subnet_id='51fb740f-1f06-4f6c-93c5-3690488e3980' | N/A | (VM) | 642729e6-f84c-4742-89b2-e5924d8e188e | | fa:16:3e:37:97:eb | ip_address='XX.XX.169.107', subnet_id='51fb740f-1f06-4f6c-93c5-3690488e3980' | ACTIVE | | bf5c3061-0c40-41da-bebf-95650e055ce2 | | fa:16:3e:03:bd:f8 | ip_address='10.1.0.1', subnet_id='3427b6ac-3bc0-4529-9035-33e1ab05cb64' | ACTIVE | | fdf976c0-99c6-49e4-b3db-9f26a09da7a9 | | fa:16:3e:c0:be:e9 | ip_address='10.1.0.12', subnet_id='3427b6ac-3bc0-4529-9035-33e1ab05cb64' | ACTIVE | +--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+ root at h018:~# ping -c4 XX.XX.169.101 PING XX.XX.169.101 (XX.XX.169.101) 56(84) bytes of data. --- XX.XX.169.101 ping statistics --- 4 packets transmitted, 0 received, 100% packet loss, time 3024ms root at h018:~# ping -c4 XX.XX.169.107 PING XX.XX.169.107 (XX.XX.169.107) 56(84) bytes of data. 
--- XX.XX.169.107 ping statistics --- 4 packets transmitted, 0 received, 100% packet loss, time 3023ms root at h018:~# ping -c4 XX.XX.169.108 PING XX.XX.169.108 (XX.XX.169.108) 56(84) bytes of data. --- XX.XX.169.108 ping statistics --- 4 packets transmitted, 0 received, 100% packet loss, time 3001ms root at h018:~# openstack server list +--------------------------------------+--------+--------+----------------------------------------+-------------+----------+ | ID | Name | Status | Networks | Image | Flavor | +--------------------------------------+--------+--------+----------------------------------------+-------------+----------+ | 3f8ab4c2-9047-47c4-8634-0c93cf7d7460 | test15 | ACTIVE | demo-network=10.1.0.12, XX.XX.169.108 | Ubuntu16.04 | m1.small | +--------------------------------------+--------+--------+----------------------------------------+-------------+----------+ root at h018:~# ip route default via 10.1.14.1 dev eth0 10.1.14.0/24 dev eth0 proto kernel scope link src 10.1.14.118 10.2.14.0/24 dev brq5e8f5ec9-9a proto kernel scope link src 10.2.14.118 10.3.15.0/24 dev eth2 proto kernel scope link src 10.3.15.118 10.4.15.0/24 dev eth3 proto kernel scope link src 10.4.15.118 root at h018:~# ifconfig brq3ee95928-01 Link encap:Ethernet HWaddr 72:77:4f:54:6a:93 inet6 addr: fe80::4459:b6ff:feb0:3352/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1 RX packets:34 errors:0 dropped:0 overruns:0 frame:0 TX packets:10 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3144 (3.1 KB) TX bytes:828 (828.0 B) brq5e8f5ec9-9a Link encap:Ethernet HWaddr 24:6e:96:84:25:1a inet addr:10.2.14.118 Bcast:10.2.14.255 Mask:255.255.255.0 inet6 addr: fe80::286d:e0ff:fefa:15a4/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:118004 errors:0 dropped:0 overruns:0 frame:0 TX packets:10175 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:5834402 (5.8 MB) TX bytes:1430189 (1.4 MB) eth0 Link encap:Ethernet HWaddr 24:6e:96:84:25:18 inet addr:10.1.14.118 Bcast:10.1.14.255 Mask:255.255.255.0 inet6 addr: fe80::266e:96ff:fe84:2518/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1977142 errors:0 dropped:0 overruns:0 frame:0 TX packets:2514801 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1013827869 (1.0 GB) TX bytes:1529933345 (1.5 GB) eth1 Link encap:Ethernet HWaddr 24:6e:96:84:25:1a inet6 addr: fe80::266e:96ff:fe84:251a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:2622581 errors:0 dropped:14027 overruns:0 frame:0 TX packets:327841 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:166482697 (166.4 MB) TX bytes:28701550 (28.7 MB) eth2 Link encap:Ethernet HWaddr b4:96:91:0f:cd:28 inet addr:10.3.15.118 Bcast:10.3.15.255 Mask:255.255.255.0 inet6 addr: fe80::b696:91ff:fe0f:cd28/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:272 errors:0 dropped:0 overruns:0 frame:0 TX packets:45 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:16452 (16.4 KB) TX bytes:2370 (2.3 KB) eth3 Link encap:Ethernet HWaddr b4:96:91:0f:cd:2a inet addr:10.4.15.118 Bcast:10.4.15.255 Mask:255.255.255.0 inet6 addr: fe80::b696:91ff:fe0f:cd2a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:7546483 errors:0 dropped:0 overruns:0 frame:0 TX packets:43 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:452789254 (452.7 MB) TX bytes:2118 (2.1 KB) lo 
Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:42373349 errors:0 dropped:0 overruns:0 frame:0 TX packets:42373349 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1 RX bytes:12244256693 (12.2 GB) TX bytes:12244256693 (12.2 GB) tap037d801d-5c Link encap:Ethernet HWaddr ba:7a:4c:72:fb:05 UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1 RX packets:9 errors:0 dropped:0 overruns:0 frame:0 TX packets:40 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1950 (1.9 KB) TX bytes:4088 (4.0 KB) tap327fe5fe-42 Link encap:Ethernet HWaddr 6e:a2:fd:08:dc:bb UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:7 errors:0 dropped:0 overruns:0 frame:0 TX packets:107768 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:618 (618.0 B) TX bytes:6253098 (6.2 MB) tap642729e6-f8 Link encap:Ethernet HWaddr 5a:11:77:05:54:e0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:11858 errors:0 dropped:0 overruns:0 frame:0 TX packets:94601 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:498656 (498.6 KB) TX bytes:5676060 (5.6 MB) tapbf5c3061-0c Link encap:Ethernet HWaddr 72:77:4f:54:6a:93 UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1 RX packets:9122 errors:0 dropped:0 overruns:0 frame:0 TX packets:9186 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:928979 (928.9 KB) TX bytes:711090 (711.0 KB) vxlan-8 Link encap:Ethernet HWaddr a6:77:6e:2b:f7:1f UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1 RX packets:9186 errors:0 dropped:0 overruns:0 frame:0 TX packets:9113 errors:0 dropped:19 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:582486 (582.4 KB) TX bytes:801919 (801.9 KB) root at h018:~# If any other information is required , please let me know. I will share the info. I have seen many posts with similar issues, steps which worked for them are not working in my setup. May be I have done something wrong, not able to figure out that on my own. Thanks and regards, Jayachander. [0] https://docs.openstack.org/install-guide/. [1] https://docs.openstack.org/install-guide/openstack-services.html#minimal-deployment-for-queens -- P *SAVE PAPER – Please do not print this e-mail unless absolutely necessary.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From kecarter at redhat.com Mon Jun 3 16:27:42 2019 From: kecarter at redhat.com (Kevin Carter) Date: Mon, 3 Jun 2019 11:27:42 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible Message-ID: Hello Stackers, I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html The effort to convert tripleo Puppet and heat templates with embedded Ansible to a more consumable set of playbooks and roles is in full effect. As we're working through this effort we believe co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository will assist us in streamlining and simplifying. Structurally, at this time, most of tripleo will remain the same. However, the inclusion of tripleo-Ansible will allow us to create more focused solutions which are independently testable, much easier understand, and simple to include into the current heat template deployment methodologies. 
While a straight port of the existing Ansible tasks will not be entirely possible, the goal of this ongoing effort will be zero impact on our existing workflow and solutions. To reigniting this effort, I've put up a review to create a new "transformation" squad[0] geared toward building the structure around tripleo-ansible[1] and converting our current solutions into roles/playbooks/libraries/plugins. Initially, we'll be focused on our existing code base; however, long term, I believe it makes sense for this squad to work across projects to breakdown deployment barriers for folks using similar technologies. We're excited to get this effort rolling again and would love to work with anyone and everyone throughout the community. If folks are interested in this effort, please let us know. [0] - https://review.opendev.org/662763 [1] - https://opendev.org/openstack/tripleo-ansible -- Kevin Carter IRC: cloudnull -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Mon Jun 3 17:07:23 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 13:07:23 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter wrote: > > Hello Stackers, > > I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. > > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html > > The effort to convert tripleo Puppet and heat templates with embedded Ansible to a more consumable set of playbooks and roles is in full effect. As we're working through this effort we believe co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository will assist us in streamlining and simplifying. Structurally, at this time, most of tripleo will remain the same. However, the inclusion of tripleo-Ansible will allow us to create more focused solutions which are independently testable, much easier understand, and simple to include into the current heat template deployment methodologies. While a straight port of the existing Ansible tasks will not be entirely possible, the goal of this ongoing effort will be zero impact on our existing workflow and solutions. > > To reigniting this effort, I've put up a review to create a new "transformation" squad[0] geared toward building the structure around tripleo-ansible[1] and converting our current solutions into roles/playbooks/libraries/plugins. Initially, we'll be focused on our existing code base; however, long term, I believe it makes sense for this squad to work across projects to breakdown deployment barriers for folks using similar technologies. +1 > We're excited to get this effort rolling again and would love to work with anyone and everyone throughout the community. If folks are interested in this effort, please let us know. Most definitely in. We've had great success working with the TripleO team on integrating the Tempest role and on the OSA side, we'd be more than happy to help try and converge our roles to maintain them together. If there's any meetings or anything that will be scheduled, I'd be happy to attend. > [0] - https://review.opendev.org/662763 > [1] - https://opendev.org/openstack/tripleo-ansible > -- > > Kevin Carter > IRC: cloudnull -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. 
mnaser at vexxhost.com W. http://vexxhost.com From kecarter at redhat.com Mon Jun 3 17:42:19 2019 From: kecarter at redhat.com (Kevin Carter) Date: Mon, 3 Jun 2019 12:42:19 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 12:08 PM Mohammed Naser wrote: > On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter wrote: > > > > Hello Stackers, > > > > I wanted to follow up on this post from last year, pick up from where it > left off, and bring together a squad to get things moving. > > > > > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html > > > > The effort to convert tripleo Puppet and heat templates with embedded > Ansible to a more consumable set of playbooks and roles is in full effect. > As we're working through this effort we believe co-locating all of the > Ansible tasks/roles/libraries/plugins throughout the code base into a > single purpose-built repository will assist us in streamlining and > simplifying. Structurally, at this time, most of tripleo will remain the > same. However, the inclusion of tripleo-Ansible will allow us to create > more focused solutions which are independently testable, much easier > understand, and simple to include into the current heat template deployment > methodologies. While a straight port of the existing Ansible tasks will not > be entirely possible, the goal of this ongoing effort will be zero impact > on our existing workflow and solutions. > > > > To reigniting this effort, I've put up a review to create a new > "transformation" squad[0] geared toward building the structure around > tripleo-ansible[1] and converting our current solutions into > roles/playbooks/libraries/plugins. Initially, we'll be focused on our > existing code base; however, long term, I believe it makes sense for this > squad to work across projects to breakdown deployment barriers for folks > using similar technologies. > > +1 > > > We're excited to get this effort rolling again and would love to work > with anyone and everyone throughout the community. If folks are interested > in this effort, please let us know. > > Most definitely in. We've had great success working with the TripleO team > on > integrating the Tempest role and on the OSA side, we'd be more than happy > to help try and converge our roles to maintain them together. > ++ > If there's any meetings or anything that will be scheduled, I'd be > happy to attend. > > its still very early but I expect to begin regular meetings (even if they're just impromptu IRC conversations to begin with) to work out what needs to be done and where we can begin collaborating with other folks. As soon as we have more I'll be sure to reach out here and on IRC. > > [0] - https://review.opendev.org/662763 > > [1] - https://opendev.org/openstack/tripleo-ansible > > -- > > > > Kevin Carter > > IRC: cloudnull > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ssbarnea at redhat.com Mon Jun 3 18:20:24 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Mon, 3 Jun 2019 19:20:24 +0100 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: <620D70C5-5EFD-42FF-A647-03164FA41A28@redhat.com> I am really happy to hear about that as this could be much more effective than having an uncontrollable number of roles scattered across lots of repositories which usually do not play very nice with each other. I hope that testing these roles using molecule (official ansible testing platform) is also part of this plan. Cheers Sorin > On 3 Jun 2019, at 18:42, Kevin Carter wrote: > > On Mon, Jun 3, 2019 at 12:08 PM Mohammed Naser > wrote: > On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter > wrote: > > > > Hello Stackers, > > > > I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. > > > > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html > > > > The effort to convert tripleo Puppet and heat templates with embedded Ansible to a more consumable set of playbooks and roles is in full effect. As we're working through this effort we believe co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository will assist us in streamlining and simplifying. Structurally, at this time, most of tripleo will remain the same. However, the inclusion of tripleo-Ansible will allow us to create more focused solutions which are independently testable, much easier understand, and simple to include into the current heat template deployment methodologies. While a straight port of the existing Ansible tasks will not be entirely possible, the goal of this ongoing effort will be zero impact on our existing workflow and solutions. > > > > To reigniting this effort, I've put up a review to create a new "transformation" squad[0] geared toward building the structure around tripleo-ansible[1] and converting our current solutions into roles/playbooks/libraries/plugins. Initially, we'll be focused on our existing code base; however, long term, I believe it makes sense for this squad to work across projects to breakdown deployment barriers for folks using similar technologies. > > +1 > > > We're excited to get this effort rolling again and would love to work with anyone and everyone throughout the community. If folks are interested in this effort, please let us know. > > Most definitely in. We've had great success working with the TripleO team on > integrating the Tempest role and on the OSA side, we'd be more than happy > to help try and converge our roles to maintain them together. > > ++ > > If there's any meetings or anything that will be scheduled, I'd be > happy to attend. > > > its still very early but I expect to begin regular meetings (even if they're just impromptu IRC conversations to begin with) to work out what needs to be done and where we can begin collaborating with other folks. As soon as we have more I'll be sure to reach out here and on IRC. > > > [0] - https://review.opendev.org/662763 > > [1] - https://opendev.org/openstack/tripleo-ansible > > -- > > > > Kevin Carter > > IRC: cloudnull > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com -------------- next part -------------- An HTML attachment was scrubbed... 
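For anyone who has not used Molecule before, adding a per-role scenario is fairly lightweight; the commands below are only a generic sketch with a made-up role name and Molecule 2.x syntax, not a description of how tripleo-ansible actually lays its tests out:

  cd roles/tripleo-example
  molecule init scenario -r tripleo-example -s default    # scaffolds molecule/default/ (molecule.yml, playbook.yml)
  molecule test -s default                                # create -> converge -> verify -> destroy

Each scenario is self-contained, which is what makes the roles independently testable as described above.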
URL: From mark at stackhpc.com Mon Jun 3 18:21:30 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 3 Jun 2019 19:21:30 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, 3 Jun 2019, 12:57 Jim Rollenhagen, wrote: > I don't think I have enough coffee in me to fully digest this, but wanted > to > point out a couple of things. FWIW, this is something I've thought we > should do > for a while now. > > On Sat, Jun 1, 2019 at 8:43 AM Mohammed Naser wrote: > >> Hi everyone, >> >> This is something that I've discussed with a few people over time and >> I think I'd probably want to bring it up by now. I'd like to propose >> and ask if it makes sense to perhaps replace devstack entirely with >> openstack-ansible. I think I have quite a few compelling reasons to >> do this that I'd like to outline, as well as why I *feel* (and I could >> be biased here, so call me out!) that OSA is the best option in terms >> of a 'replacement' >> >> # Why not another deployment project? >> I actually thought about this part too and considered this mainly for >> ease of use for a *developer*. >> >> At this point, Puppet-OpenStack pretty much only deploys packages >> (which means that it has no build infrastructure, a developer can't >> just get $commit checked out and deployed). >> >> TripleO uses Kolla containers AFAIK and those have to be pre-built >> beforehand, also, I feel they are much harder to use as a developer >> because if you want to make quick edits and restart services, you have >> to enter a container and make the edit there and somehow restart the >> service without the container going back to it's original state. >> Kolla-Ansible and the other combinations also suffer from the same >> "issue". >> > > FWIW, kolla-ansible (and maybe tripleo?) has a "development" mode which > mounts > the code as a volume, so you can make edits and just run "docker restart > $service". Though systemd does make that a bit nicer due to globs (e.g. > systemctl restart nova-*). > > That said, I do agree moving to something where systemd is running the > services > would make for a smoother transition for developers. > > >> >> OpenStack Ansible is unique in the way that it pretty much just builds >> a virtualenv and installs packages inside of it. The services are >> deployed as systemd units. This is very much similar to the current >> state of devstack at the moment (minus the virtualenv part, afaik). >> It makes it pretty straight forward to go and edit code if you >> need/have to. We also have support for Debian, CentOS, Ubuntu and >> SUSE. This allows "devstack 2.0" to have far more coverage and make >> it much more easy to deploy on a wider variety of operating systems. >> It also has the ability to use commits checked out from Zuul so all >> the fancy Depends-On stuff we use works. >> >> # Why do we care about this, I like my bash scripts! >> As someone who's been around for a *really* long time in OpenStack, >> I've seen a whole lot of really weird issues surface from the usage of >> DevStack to do CI gating. For example, one of the recent things is >> the fact it relies on installing package-shipped noVNC, where as the >> 'master' noVNC has actually changed behavior a few months back and it >> is completely incompatible at this point (it's just a ticking thing >> until we realize we're entirely broken). >> >> To this day, I still see people who want to POC something up with >> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. 
No matter >> how many warnings we'll put up, they'll always try to do it. With >> this way, at least they'll have something that has the shape of an >> actual real deployment. In addition, it would be *good* in the >> overall scheme of things for a deployment system to test against, >> because this would make sure things don't break in both ways. >> > > ++ > > >> >> Also: we run Zuul for our CI which supports Ansible natively, this can >> remove one layer of indirection (Zuul to run Bash) and have Zuul run >> the playbooks directly from the executor. >> >> # So how could we do this? >> The OpenStack Ansible project is made of many roles that are all >> composable, therefore, you can think of it as a combination of both >> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >> the base modules (i.e. puppet-nova, etc) and TripleO was the >> integration of all of it in a distribution. OSA is currently both, >> but it also includes both Ansible roles and playbooks. >> >> In order to make sure we maintain as much of backwards compatibility >> as possible, we can simply run a small script which does a mapping of >> devstack => OSA variables to make sure that the service is shipped >> with all the necessary features as per local.conf. >> > > ++ > This strikes me as being a considerable undertaking, that would never get full compatibility due to the lack of a defined API. It might get close with a bit of effort. I expect there are scripts and plugins that don't have an analogue in OSA (ironic, I'm looking at you). > > >> >> So the new process could be: >> >> 1) parse local.conf and generate Ansible variables files >> 2) install Ansible (if not running in gate) >> 3) run playbooks using variable generated in #1 >> >> The neat thing is after all of this, devstack just becomes a thin >> wrapper around Ansible roles. I also think it brings a lot of hands >> together, involving both the QA team and OSA team together, which I >> believe that pooling our resources will greatly help in being able to >> get more done and avoiding duplicating our efforts. >> >> # Conclusion >> This is a start of a very open ended discussion, I'm sure there is a >> lot of details involved here in the implementation that will surface, >> but I think it could be a good step overall in simplifying our CI and >> adding more coverage for real potential deployers. It will help two >> teams unite together and have more resources for something (that >> essentially is somewhat of duplicated effort at the moment). >> >> I will try to pick up sometime to POC a simple service being deployed >> by an OSA role instead of Bash, placement which seems like a very >> simple one and share that eventually. >> >> Thoughts? :) >> > > The reason this hasn't been pushed on in the past is to avoid the > perception > that the TC or QA team is choosing a "winner" in the deployment space. I > don't > think that's a good reason not to do something like this (especially with > the > drop in contributors since I've had that discussion). However, we do need > to > message this carefully at a minimum. > > With my Kolla hat on, this does concern me. If you're trying out OpenStack and spend enough quality time with OSA to become familiar with it, you're going to be less inclined to do your homework on deployment tools. It would be nice if the deployment space wasn't so fragmented, but we all have our reasons. > >> -- >> Mohammed Naser — vexxhost >> ----------------------------------------------------- >> D. 514-316-8872 >> D. 
800-910-1726 ext. 200 >> E. mnaser at vexxhost.com >> W. http://vexxhost.com >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Mon Jun 3 18:28:02 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 3 Jun 2019 19:28:02 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, 3 Jun 2019, 15:59 Clark Boylan, wrote: > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > > > # Why not another deployment project? > > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > I'm not sure this is a great example case. We consume prebuilt software > for many of our dependencies. Everything from the kernel to the database to > rabbitmq to ovs (and so on) are consumed as prebuilt packages from our > distros. In many cases this is desirable to ensure that our software work > with the other software out there in the wild that people will be deploying > with. > > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. 
No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > I think if you have developers running a small wrapper locally to deploy > this new development stack you should run that same wrapper in CI. This > ensure the wrapper doesn't break. > > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > For me there are two major items to consider that haven't been brought up > yet. The first is devstack's (lack of) speed. Any replacement should be at > least as quick as the current tooling because the current tooling is slow > enough already. This is important. We would need to see benchmark comparisons between a devstack install and an OSA install. Shell may be slow but Ansible is generally slower. That's fine in production when reliability is king, but we need fast iteration for development. I haven't looked under the covers of devstack for some time, but it previously installed all python deps in one place, whereas OSA has virtualenvs for each service which could take a while to build. Perhaps this is configurable. The other is logging. 
I spend a lot of time helping people to debug CI job > runs and devstack has grown a fairly effective set of logging that just > about any time I have to help debug another deployment tool's CI jobs I > miss (because they tend to log only a tiny fraction of what devstack logs). > > Clark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Mon Jun 3 18:37:25 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 3 Jun 2019 13:37:25 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> Message-ID: Hi Madhuri- > For this purpose, we would need to change a trait of the server’s > flavor in Nova. This trait is mapped to a deploy step in Ironic > which does some operation(change BIOS config and reboot in this use > case).____ If your trait was something that wasn't tracked in the flavor (or elsewhere in the instance's db record), you could just update it directly in placement. Then you'd have to figure out how to make ironic notice that and effect the change. (Or perhaps the other way around: tell ironic you want to make the change, and it updates the trait in placement as part of the process.) > In Nova, the only API to change trait in flavor is resize whereas > resize does migration and a reboot as well.____ > > In short, I am  looking for a Nova API that only changes the traits, > and trigger the ironic deploy steps but no reboot and migration. > Please suggest.____ It's inconvenient, but I'm afraid "resize" is the right way to get this done, because that's the only way to get the appropriate validation and changes effected in the general case. Now, there's a spec [1] we've been talking about for ~4.5 years that would let you do a resize without rebooting, when only a certain subset of properties are being changed. It is currently proposed for "upsizing" CPU, memory, and disk, and adding PCI devices, but clearly this ISS configuration would be a reasonable candidate to include. In fact, it's possible that leading the charge with something this unobtrusive would reduce some of the points of contention that have stalled the blueprint up to this point. Food for thought. Thanks, efried [1] https://review.opendev.org/#/c/141219/ From lshort at redhat.com Mon Jun 3 18:57:13 2019 From: lshort at redhat.com (Luke Short) Date: Mon, 3 Jun 2019 14:57:13 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: <620D70C5-5EFD-42FF-A647-03164FA41A28@redhat.com> References: <620D70C5-5EFD-42FF-A647-03164FA41A28@redhat.com> Message-ID: Hey Sorin, I'm glad to see you are excited as we are about this effort! Since you are one of the core developers of Molecule, I was hoping to get some of your insight on the work we have started in that regard. I have a few patches up for adding in Molecule tests to tripleo-common. At a later point in time, we can discuss the transition of moving all of the Ansible content into the tripleo-ansible repository. https://review.opendev.org/#/c/662803/ https://review.opendev.org/#/c/662577/ The first is to add the common files that will be used across most, if not all, of the Molecule tests in this repository. The second patch is where I actually implement Molecule tests and symlinks to those common files in the first patch. I wanted to get your thoughts on a few things. 1. How would we hook in the Molecule tests into tox so that it will be tested by CI? 
Do you have an example of this already being done? I believe from previous discussions you have already added a few Molecule tests to a TripleO repository before. Kevin also had a good ideal of creating an isolated Zuul job so that can be something we can investigate as well. 2. How should we handle the actual tests? In the second patch, I used the playbook.yaml to write the test in a playbook format (the actual test happens during the post_tasks phase). I have always done Molecule testing this way to keep things simple. However, would you recommend that we use the Python testinfra library instead to make sure that certain things exist? Thanks for any input you may have! Luke Short, RHCE Software Engineer, OpenStack Deployment Framework Red Hat, Inc. On Mon, Jun 3, 2019 at 2:29 PM Sorin Sbarnea wrote: > I am really happy to hear about that as this could be much more effective > than having an uncontrollable number of roles scattered across lots of > repositories which usually do not play very nice with each other. > > I hope that testing these roles using molecule (official ansible testing > platform) is also part of this plan. > > Cheers > Sorin > > On 3 Jun 2019, at 18:42, Kevin Carter wrote: > > On Mon, Jun 3, 2019 at 12:08 PM Mohammed Naser > wrote: > >> On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter wrote: >> > >> > Hello Stackers, >> > >> > I wanted to follow up on this post from last year, pick up from where >> it left off, and bring together a squad to get things moving. >> > >> > > >> http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html >> > >> > The effort to convert tripleo Puppet and heat templates with embedded >> Ansible to a more consumable set of playbooks and roles is in full effect. >> As we're working through this effort we believe co-locating all of the >> Ansible tasks/roles/libraries/plugins throughout the code base into a >> single purpose-built repository will assist us in streamlining and >> simplifying. Structurally, at this time, most of tripleo will remain the >> same. However, the inclusion of tripleo-Ansible will allow us to create >> more focused solutions which are independently testable, much easier >> understand, and simple to include into the current heat template deployment >> methodologies. While a straight port of the existing Ansible tasks will not >> be entirely possible, the goal of this ongoing effort will be zero impact >> on our existing workflow and solutions. >> > >> > To reigniting this effort, I've put up a review to create a new >> "transformation" squad[0] geared toward building the structure around >> tripleo-ansible[1] and converting our current solutions into >> roles/playbooks/libraries/plugins. Initially, we'll be focused on our >> existing code base; however, long term, I believe it makes sense for this >> squad to work across projects to breakdown deployment barriers for folks >> using similar technologies. >> >> +1 >> >> > We're excited to get this effort rolling again and would love to work >> with anyone and everyone throughout the community. If folks are interested >> in this effort, please let us know. >> >> Most definitely in. We've had great success working with the TripleO >> team on >> integrating the Tempest role and on the OSA side, we'd be more than happy >> to help try and converge our roles to maintain them together. >> > > ++ > > >> If there's any meetings or anything that will be scheduled, I'd be >> happy to attend. 
>> >> > its still very early but I expect to begin regular meetings (even if > they're just impromptu IRC conversations to begin with) to work out what > needs to be done and where we can begin collaborating with other folks. > As soon as we have more I'll be sure to reach out here and on IRC. > > >> > [0] - https://review.opendev.org/662763 >> > [1] - https://opendev.org/openstack/tripleo-ansible >> > -- >> > >> > Kevin Carter >> > IRC: cloudnull >> >> >> >> -- >> Mohammed Naser — vexxhost >> ----------------------------------------------------- >> D. 514-316-8872 >> D. 800-910-1726 ext. 200 >> E. mnaser at vexxhost.com >> W. http://vexxhost.com > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From MM9745 at att.com Mon Jun 3 19:32:02 2019 From: MM9745 at att.com (MCEUEN, MATT) Date: Mon, 3 Jun 2019 19:32:02 +0000 Subject: Elections for Airship In-Reply-To: References: <7C64A75C21BB8D43BD75BB18635E4D89709A2256@MOSTLS1MSGUSRFF.ITServices.sbc.com> <20190530223625.7ao2hmxlrrj3ny4b@yuggoth.org> Message-ID: <7C64A75C21BB8D43BD75BB18635E4D89709A8D00@MOSTLS1MSGUSRFF.ITServices.sbc.com> Thanks Kendall, Jeremy, Sean – this is very helpful! I think this gives us the tools we need to run a successful election; if we do have any questions I’ll let y’all know. Appreciate your help, Matt From: Kendall Nelson Sent: Monday, June 3, 2019 8:47 AM To: Jeremy Stanley Cc: OpenStack Discuss Subject: Re: Elections for Airship Might also be helpful to look at our document that outlines the process we go through[1]. If you have any questions, let us know! -Kendall (diablo_rojo) [1] https://opendev.org/openstack/election/src/branch/master/README.rst On Thu, May 30, 2019 at 3:37 PM Jeremy Stanley > wrote: On 2019-05-30 19:04:56 +0000 (+0000), MCEUEN, MATT wrote: > OpenStack Infra team, The OpenStack Infrastructure team hasn't been officially involved in running technical elections for OpenStack for several years now (subject tag removed accordingly). With the advent of Gerrit's REST API, contributor data can be queried and assembled anonymously by anyone. While I happen to be involved in these activities for longer than that's been the case, I'll be answering while wearing my OpenStack Technical Election Official hat throughout the remainder of this reply. > As the Airship project works to finalize our governance and > elected positions [1], we need to be ready to hold our first > elections. I wanted to reach out and ask for any experience, > guidance, materials, or tooling you can share that would help this > run correctly and smoothly? This is an area where the Airship team > doesn't have much experience so we may not know the right > questions to ask. > > Aside from a member of the Airship community creating a poll in > CIVS [2], is there anything else you would recommend? Is there any > additional tooling in place in the OpenStack world? Any potential > pitfalls, or other hard-won advice for us? [...] As Sean mentioned in his reply, the OpenStack community has been building and improving tooling in the openstack/election Git repository on OpenDev over the past few years. The important bits (in my opinion) center around querying Gerrit for a list of contributors whose changes have merged to sets of official project repositories within a qualifying date range. I've recently been assisting StarlingX's election officials with a similar request, and do have some recommendations. 
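On question 1 above, one pattern other Ansible projects used at the time is a dedicated tox environment that simply invokes Molecule, so Zuul (or the isolated job Kevin suggested) can run it like any other tox target. The stanza below is only a sketch of that pattern; the env name, dependencies and pins are assumptions, not what tripleo-common merged:

  [testenv:molecule]
  basepython = python3
  deps =
      ansible
      docker
      molecule>=2.20,<3
  commands =
      molecule test --all

On question 2, playbook-level asserts and testinfra are not mutually exclusive: the converge playbook can stay plain while the verifier runs testinfra tests.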
Probably the best place to start is adding an official structured dataset with your team/project information following the same schema used by OpenStack[0] and now StarlingX[1], then applying a couple of feature patches[2][3] (if they haven't merged by the time you read this) to the openstack/election master branch. After that, you ought to be able to run something along the lines of: tox -e venv -- owners --after 2018-05-30 --before 2019-05-31 --nonmember --outdir airship-electorate --projects ../../airship/governance/projects.yaml --ref master (Note that the --after and --before dates work like in Gerrit's query language and carry with them an implied midnight UTC, so one is the actual start date but the other is the day after the end date; "on or after" and "before but not on" is how I refer to them in prose.) You'll see the resulting airship-electorate directory includes a lot of individual files. There are two basic types: .yaml files which are structured data meant for human auditing as well as scripted analysis, and .txt files which are a strict list of one Gerrit preferred E-mail address per line for each voter (the format expected by the https://civs.cs.cornell.edu/ voting service). It's probably also obvious that there are sets of these named for each team in your governance, as well as a set which start with underscore (_). The former represent contributions to the deliverable repositories of each team, while the latter are produced from an aggregate of all deliverable repositories for all teams (this is what you might use for electing an Airship-wide governing body). There are a couple of extra underscore files... _duplicate_owners.yaml includes information on deduplicated entries for contributors where the script was able to detect more than one Gerrit account for the same individual, while the _invites.csv file isn't really election-related at all and is what the OSF normally feeds into the automation which sends event discounts to contributors. In case you're curious about the _invites.csv file, the first column is the OSF member ID (if known) or 0 (if no matching membership was found), the second column is the display name from Gerrit, the third column is the preferred E-mail address from Gerrit (this corresponds to the address used for the _electorate.txt file), and any subsequent columns are the extra non-preferred addresses configured in Gerrit for that account. Please don't hesitate to follow up with any additional questions you might have! [0] https://opendev.org/openstack/governance/src/branch/master/reference/projects.yaml [1] https://opendev.org/starlingx/governance/src/branch/master/reference/tsc/projects.yaml [2] https://review.opendev.org/661647 [3] https://review.opendev.org/661648 -- Jeremy Stanley -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Mon Jun 3 19:45:47 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Mon, 03 Jun 2019 12:45:47 -0700 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Sat, Jun 1, 2019, at 05:36, Mohammed Naser wrote: > Hi everyone, > > This is something that I've discussed with a few people over time and > I think I'd probably want to bring it up by now. I'd like to propose > and ask if it makes sense to perhaps replace devstack entirely with > openstack-ansible. I think I have quite a few compelling reasons to > do this that I'd like to outline, as well as why I *feel* (and I could > be biased here, so call me out!) 
that OSA is the best option in terms > of a 'replacement' You laid out three reasons below to switch, and to be frank, I don't find any of them compelling. This is tooling that hundreds of people and machines rely on and are familiar with, and to undertake a massive change like this deserves some *really* compelling, even *dire*, rationalization for it, and metrics showing it is better than the old thing. This thread reads as proposing change for the sake of change. Colleen > > # Why not another deployment project? > I actually thought about this part too and considered this mainly for > ease of use for a *developer*. > > At this point, Puppet-OpenStack pretty much only deploys packages > (which means that it has no build infrastructure, a developer can't > just get $commit checked out and deployed). > > TripleO uses Kolla containers AFAIK and those have to be pre-built > beforehand, also, I feel they are much harder to use as a developer > because if you want to make quick edits and restart services, you have > to enter a container and make the edit there and somehow restart the > service without the container going back to it's original state. > Kolla-Ansible and the other combinations also suffer from the same > "issue". > > OpenStack Ansible is unique in the way that it pretty much just builds > a virtualenv and installs packages inside of it. The services are > deployed as systemd units. This is very much similar to the current > state of devstack at the moment (minus the virtualenv part, afaik). > It makes it pretty straight forward to go and edit code if you > need/have to. We also have support for Debian, CentOS, Ubuntu and > SUSE. This allows "devstack 2.0" to have far more coverage and make > it much more easy to deploy on a wider variety of operating systems. > It also has the ability to use commits checked out from Zuul so all > the fancy Depends-On stuff we use works. > > # Why do we care about this, I like my bash scripts! > As someone who's been around for a *really* long time in OpenStack, > I've seen a whole lot of really weird issues surface from the usage of > DevStack to do CI gating. For example, one of the recent things is > the fact it relies on installing package-shipped noVNC, where as the > 'master' noVNC has actually changed behavior a few months back and it > is completely incompatible at this point (it's just a ticking thing > until we realize we're entirely broken). > > To this day, I still see people who want to POC something up with > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > how many warnings we'll put up, they'll always try to do it. With > this way, at least they'll have something that has the shape of an > actual real deployment. In addition, it would be *good* in the > overall scheme of things for a deployment system to test against, > because this would make sure things don't break in both ways. > > Also: we run Zuul for our CI which supports Ansible natively, this can > remove one layer of indirection (Zuul to run Bash) and have Zuul run > the playbooks directly from the executor. > > # So how could we do this? > The OpenStack Ansible project is made of many roles that are all > composable, therefore, you can think of it as a combination of both > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > the base modules (i.e. puppet-nova, etc) and TripleO was the > integration of all of it in a distribution. OSA is currently both, > but it also includes both Ansible roles and playbooks. 
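For readers unfamiliar with what a "composable" OSA role looks like in practice, here is a minimal sketch of a play consuming one such role; the role name, host group and variable names are illustrative assumptions, not OSA's actual defaults.

    # Minimal sketch, assuming a hypothetical os_placement role; the host
    # group and variable names are illustrative, not OSA's real defaults.
    - name: Deploy placement into a virtualenv from a Git checkout
      hosts: placement_all
      roles:
        - role: os_placement
      vars:
        placement_git_repo: https://opendev.org/openstack/placement
        placement_git_install_branch: master
        placement_developer_mode: true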
> > In order to make sure we maintain as much of backwards compatibility > as possible, we can simply run a small script which does a mapping of > devstack => OSA variables to make sure that the service is shipped > with all the necessary features as per local.conf. > > So the new process could be: > > 1) parse local.conf and generate Ansible variables files > 2) install Ansible (if not running in gate) > 3) run playbooks using variable generated in #1 > > The neat thing is after all of this, devstack just becomes a thin > wrapper around Ansible roles. I also think it brings a lot of hands > together, involving both the QA team and OSA team together, which I > believe that pooling our resources will greatly help in being able to > get more done and avoiding duplicating our efforts. > > # Conclusion > This is a start of a very open ended discussion, I'm sure there is a > lot of details involved here in the implementation that will surface, > but I think it could be a good step overall in simplifying our CI and > adding more coverage for real potential deployers. It will help two > teams unite together and have more resources for something (that > essentially is somewhat of duplicated effort at the moment). > > I will try to pick up sometime to POC a simple service being deployed > by an OSA role instead of Bash, placement which seems like a very > simple one and share that eventually. > > Thoughts? :) > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > > From feilong at catalyst.net.nz Mon Jun 3 20:01:07 2019 From: feilong at catalyst.net.nz (feilong) Date: Tue, 4 Jun 2019 08:01:07 +1200 Subject: [magnum] Meeting at 2019-06-04 2100 UTC In-Reply-To: References: Message-ID: <738dca30-1719-7034-ead1-b4c906681184@catalyst.net.nz> Thanks bringing this topic. Yes, we can discuss it on next weekly meeting. I have added it in our agenda https://wiki.openstack.org/wiki/Meetings/Containers On 4/06/19 12:02 AM, Spyros Trigazis wrote: > Hello all, > > I would like to discuss moving the drivers out-of-tree, as > we briefly discussed it in the PTG. Can you all make it for the > next meeting [1]? > > This is not super urgent, but it will accelerate development and bug > fixes at the driver level. > > Cheers, > Spyros > > [0] https://etherpad.openstack.org/p/magnum-train-ptg > [1] https://www.timeanddate.com/worldclock/fixedtime.html?msg=magnum-meeting&iso=20190604T21 -- Cheers & Best regards, Feilong Wang (王飞龙) ------------------------------------------------------ Senior Cloud Software Engineer Tel: +64-48032246 Email: flwang at catalyst.net.nz Catalyst IT Limited Level 6, Catalyst House, 150 Willis Street, Wellington ------------------------------------------------------ From mnaser at vexxhost.com Mon Jun 3 21:32:11 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Mon, 3 Jun 2019 17:32:11 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 3:51 PM Colleen Murphy wrote: > > On Sat, Jun 1, 2019, at 05:36, Mohammed Naser wrote: > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. 
I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > You laid out three reasons below to switch, and to be frank, I don't find any of them compelling. This is tooling that hundreds of people and machines rely on and are familiar with, and to undertake a massive change like this deserves some *really* compelling, even *dire*, rationalization for it, and metrics showing it is better than the old thing. This thread reads as proposing change for the sake of change. That's fair. My argument was that we have a QA team that is strapped for resources which is doing the same work as the OSA team as working on, so most probably deduplicating efforts can help us get more things done because work can split across more people now. I do totally get people might not want to do it. That's fine, it is after all a proposal and if the majority of the community feels like devstack is okay, and the amount of maintainers it has is fine, then I wouldn't want to change that either. > Colleen > > > > > # Why not another deployment project? > > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. 
In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > > > -- > > Mohammed Naser — vexxhost > > ----------------------------------------------------- > > D. 514-316-8872 > > D. 800-910-1726 ext. 200 > > E. mnaser at vexxhost.com > > W. http://vexxhost.com > > > > > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From egle.sigler at rackspace.com Mon Jun 3 21:34:03 2019 From: egle.sigler at rackspace.com (Egle Sigler) Date: Mon, 3 Jun 2019 21:34:03 +0000 Subject: [interop] [refstack] Interop WG Meeting this Wed. 11:00 AM CST/ 16:00 UTC Message-ID: Hello Everyone, We will be holding Interop WG meetings this Wednesday, at 1600 UTC in #openstack-meeting-3, everyone welcome. Etherpad for Wednesday’s meeting: https://etherpad.openstack.org/p/InteropWhistler.23 Please add items to the agenda. Web IRC link if you are not using IRC client: http://webchat.freenode.net/?channels=openstack-meeting-3 Meetbot quick reference guide: http://meetbot.debian.net/Manual.html#user-reference If you have general interop questions, please ask in #openstack-interopIRC channel. Thank you, Egle -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dtroyer at gmail.com Tue Jun 4 03:36:24 2019 From: dtroyer at gmail.com (Dean Troyer) Date: Mon, 3 Jun 2019 22:36:24 -0500 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: [I have been trying to decide where to jump in here, this seems as good a place as any.] On Mon, Jun 3, 2019 at 2:48 PM Colleen Murphy wrote: > You laid out three reasons below to switch, and to be frank, I don't find any of them compelling. This is tooling that hundreds of people and machines rely on and are familiar with, and to undertake a massive change like this deserves some *really* compelling, even *dire*, rationalization for it, and metrics showing it is better than the old thing. This thread reads as proposing change for the sake of change. Colleen makes a great point here about the required scope of this proposal to actually be a replacement for DevStack... A few of us have wanted to replace DevStack with something better probably since a year after we introduced it in Boston (the first time). The primary problems with replacing it are both technical and business/political. There have been two serious attempts, the first was what became harlowja's Anvil project, which actually had different goals than DevStack, and the second was discussed at the first PTG in Atlanta as an OSA-based orchestrator that could replace parts incrementally and was going to (at least partially) leverage Zuul v3. That died with the rest of OSIC (RIP). The second proposal was very similar to mnaser's current one To actually _replace_ DevStack you have to meet a major fraction of its use cases, which are more than anyone imagined back in the day. Both prior attempts to replace it did not address all of the use cases and (I believe) that limited the number of people willing or able to get involved. Anything short of complete replacement fails to meet the 'deduplication of work' goal... (see https://xkcd.com/927/). IMHO the biggest problem here is finding anyone who is willing to fund this work. It is a huge project that will only count toward a sponsor company's stats in an area they usually do not pay much attention toward. I am not trying to throw cold water on this, I will gladly support from a moderate distance any effort to rid us of DevStack. I believe that knowing where things have been attempted in the past will either inform how to approach it differently now or identify what in our community has changed to make trying again worthwhile. Go for it! dt -- Dean Troyer dtroyer at gmail.com From amotoki at gmail.com Tue Jun 4 04:00:03 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Tue, 4 Jun 2019 13:00:03 +0900 Subject: [neutron] bug deputy report (week of May 27) Message-ID: Hi neutrinos, I was a neutron bug deputy last week (which covered from May 26 to Jun 2). Here is the bug deputy report from me. Last week was relatively quiet, but we got a couple of bugs on performance. Two of them seems related to RBAC mechanism and they are worth got attentions from the team. --- Get external networks too slowly because it would join subnet and rbac https://bugs.launchpad.net/neutron/+bug/1830630 Medium, New, loadimpact Security groups RBAC cause a major performance degradation https://bugs.launchpad.net/neutron/+bug/1830679 High, New, loadimpact Needs attentions amotoki is looking into it but more eyes would be appreciated. 
Improper close connection to database leading to mysql/mariadb block connection https://bugs.launchpad.net/neutron/+bug/1831009 Undecided, New This looks like a generic issue related to oslo.db, but it is better to get attentions in neutron side too as it happens in neutron. Debug neutron-tempest-plugin-dvr-multinode-scenario failures https://bugs.launchpad.net/neutron/+bug/1830763 High, Confirmed, assigned to mlavalle Best Regards Akihiro Motoki (irc: amotoki) From gael.therond at gmail.com Tue Jun 4 07:43:46 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 4 Jun 2019 09:43:46 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia Message-ID: Hi guys, I’ve a weird situation here. I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. Cheers guys! -------------- next part -------------- An HTML attachment was scrubbed... URL: From madhuri.kumari at intel.com Tue Jun 4 07:47:25 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Tue, 4 Jun 2019 07:47:25 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC0142D@BGSMSX101.gar.corp.intel.com> Hi Mark, Replied inline. From: Mark Goddard [mailto:mark at stackhpc.com] Sent: Monday, June 3, 2019 2:16 PM To: Kumari, Madhuri Cc: openstack-discuss at lists.openstack.org Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning On Mon, 3 Jun 2019 at 06:57, Kumari, Madhuri > wrote: Hi Ironic, Nova Developers, I am currently working on implementing Intel Speed Select(ISS) feature[1] in Ironic and I have a use case where I want to change ISS configuration in BIOS after a node is provisioned. Such use case of changing the configuration post deployment is common and not specific to ISS. A real-life example for such a required post-deploy configuration change is the change of BIOS settings to disable hyper-threading in order to address a security vulnerability. Currently there is no way of changing any BIOS configuration after a node is provisioned in Ironic. 
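For contrast with the proposal that follows, the closest thing available today is manual cleaning, which only works once the node has been taken out of active service -- exactly the limitation being described. A rough sketch (the BIOS setting name is vendor-specific and purely illustrative, and the exact CLI syntax is worth double-checking against your Ironic release):

    # Rough sketch: changing a BIOS setting today via manual cleaning.
    # The node must first be undeployed and moved to "manageable", which
    # is the limitation discussed above. The setting name is illustrative.
    openstack baremetal node manage <node>
    openstack baremetal node clean <node> --clean-steps '[{"interface": "bios",
        "step": "apply_configuration",
        "args": {"settings": [{"name": "hyper_threading_enabled", "value": "False"}]}}]'
    openstack baremetal node provide <node>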
One solution for it is to allow manual deploy steps in Ironic[2](not implemented yet) which can be trigged by changing traits in Nova. For this purpose, we would need to change a trait of the server’s flavor in Nova. This trait is mapped to a deploy step in Ironic which does some operation(change BIOS config and reboot in this use case). In Nova, the only API to change trait in flavor is resize whereas resize does migration and a reboot as well. In short, I am looking for a Nova API that only changes the traits, and trigger the ironic deploy steps but no reboot and migration. Please suggest. Hi, it is possible to modify a flavor (openstack flavor set --property =). However, changes to a flavor are not reflected in instances that were previously created from that flavor. Internally, nova stores an 'embedded flavor' in the instance state. I'm not aware of any API that would allow modifying the embedded flavor, nor any process that would synchronise those changes to ironic. The resize API in Nova allows changing the flavor of an instance. It does migration and reboots. But the API is not implemented for IronicDriver. Though this doesn’t match our use case but seems to be the only available one that allows changing a flavor and ultimately a trait. Thanks in advance. Regards, Madhuri [1] https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/support-intel-speed-select.html [2] https://storyboard.openstack.org/#!/story/2005129 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ssbarnea at redhat.com Tue Jun 4 07:56:31 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Tue, 4 Jun 2019 08:56:31 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: Message-ID: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> I am in favour of ditching or at least refactoring devstack because during the last year I often found myself blocked from fixing some zuul/jobs issues because the buggy code was still required by legacy devstack jobs that nobody had time maintain or fix, so they were isolated and the default job configurations were forced to use dirty hack needed for keeping these working. One such example is that there is a task that does a "chmod -R 0777 -R" on the entire source tree, a total security threat. In order to make other jobs running correctly* I had to rely undoing the damage done by such chmod because I was not able to disable the historical hack. * ansible throws warning with unsafe file permissions * ssh refuses to load unsafe keys That is why I am in favor of dropping features that are slowing down the progress of others. I know that the reality is more complicated but I also think that sometimes less* is more. * deployment projects ;) > On 4 Jun 2019, at 04:36, Dean Troyer wrote: > > > > On Mon, 3 Jun 2019, 15:59 Clark Boylan, > wrote: > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > > Hi everyone, > > > > This is something that I've discussed with a few people over time and > > I think I'd probably want to bring it up by now. I'd like to propose > > and ask if it makes sense to perhaps replace devstack entirely with > > openstack-ansible. I think I have quite a few compelling reasons to > > do this that I'd like to outline, as well as why I *feel* (and I could > > be biased here, so call me out!) that OSA is the best option in terms > > of a 'replacement' > > > > # Why not another deployment project? 
> > I actually thought about this part too and considered this mainly for > > ease of use for a *developer*. > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > (which means that it has no build infrastructure, a developer can't > > just get $commit checked out and deployed). > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > beforehand, also, I feel they are much harder to use as a developer > > because if you want to make quick edits and restart services, you have > > to enter a container and make the edit there and somehow restart the > > service without the container going back to it's original state. > > Kolla-Ansible and the other combinations also suffer from the same > > "issue". > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > a virtualenv and installs packages inside of it. The services are > > deployed as systemd units. This is very much similar to the current > > state of devstack at the moment (minus the virtualenv part, afaik). > > It makes it pretty straight forward to go and edit code if you > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > it much more easy to deploy on a wider variety of operating systems. > > It also has the ability to use commits checked out from Zuul so all > > the fancy Depends-On stuff we use works. > > > > # Why do we care about this, I like my bash scripts! > > As someone who's been around for a *really* long time in OpenStack, > > I've seen a whole lot of really weird issues surface from the usage of > > DevStack to do CI gating. For example, one of the recent things is > > the fact it relies on installing package-shipped noVNC, where as the > > 'master' noVNC has actually changed behavior a few months back and it > > is completely incompatible at this point (it's just a ticking thing > > until we realize we're entirely broken). > > I'm not sure this is a great example case. We consume prebuilt software for many of our dependencies. Everything from the kernel to the database to rabbitmq to ovs (and so on) are consumed as prebuilt packages from our distros. In many cases this is desirable to ensure that our software work with the other software out there in the wild that people will be deploying with. > > > > > To this day, I still see people who want to POC something up with > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > how many warnings we'll put up, they'll always try to do it. With > > this way, at least they'll have something that has the shape of an > > actual real deployment. In addition, it would be *good* in the > > overall scheme of things for a deployment system to test against, > > because this would make sure things don't break in both ways. > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > the playbooks directly from the executor. > > I think if you have developers running a small wrapper locally to deploy this new development stack you should run that same wrapper in CI. This ensure the wrapper doesn't break. > > > > > # So how could we do this? > > The OpenStack Ansible project is made of many roles that are all > > composable, therefore, you can think of it as a combination of both > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > the base modules (i.e. 
puppet-nova, etc) and TripleO was the > > integration of all of it in a distribution. OSA is currently both, > > but it also includes both Ansible roles and playbooks. > > > > In order to make sure we maintain as much of backwards compatibility > > as possible, we can simply run a small script which does a mapping of > > devstack => OSA variables to make sure that the service is shipped > > with all the necessary features as per local.conf. > > > > So the new process could be: > > > > 1) parse local.conf and generate Ansible variables files > > 2) install Ansible (if not running in gate) > > 3) run playbooks using variable generated in #1 > > > > The neat thing is after all of this, devstack just becomes a thin > > wrapper around Ansible roles. I also think it brings a lot of hands > > together, involving both the QA team and OSA team together, which I > > believe that pooling our resources will greatly help in being able to > > get more done and avoiding duplicating our efforts. > > > > # Conclusion > > This is a start of a very open ended discussion, I'm sure there is a > > lot of details involved here in the implementation that will surface, > > but I think it could be a good step overall in simplifying our CI and > > adding more coverage for real potential deployers. It will help two > > teams unite together and have more resources for something (that > > essentially is somewhat of duplicated effort at the moment). > > > > I will try to pick up sometime to POC a simple service being deployed > > by an OSA role instead of Bash, placement which seems like a very > > simple one and share that eventually. > > > > Thoughts? :) > > For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) speed. Any replacement should be at least as quick as the current tooling because the current tooling is slow enough already. > > This is important. We would need to see benchmark comparisons between a devstack install and an OSA install. Shell may be slow but Ansible is generally slower. That's fine in production when reliability is king, but we need fast iteration for development. > > I haven't looked under the covers of devstack for some time, but it previously installed all python deps in one place, whereas OSA has virtualenvs for each service which could take a while to build. Perhaps this is configurable. > > The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss (because they tend to log only a tiny fraction of what devstack logs). > > Clark -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcin.juszkiewicz at linaro.org Tue Jun 4 08:30:04 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Tue, 4 Jun 2019 10:30:04 +0200 Subject: [kolla] Python 3 status update Message-ID: <70478b1f-4de6-7e25-b115-01b74e2cec57@linaro.org> How we stand with Python 3 move in Kolla(-ansible) project? Quite good I would say but there are some issues still. # Kolla ## Debian/Ubuntu source Patch for Debian/Ubuntu source images [1] got 24th revision and depends on "make sure that there is /var/run/apache2 dir' patch [2]. CI jobs run fine except 'kolla-ansible-ubuntu-source-ceph' one where 'openstack image create' step fails in 'Run deploy.sh script' [3]. 
**Help needed to find out why it fails there as I am out of ideas.** On x86-64 I was able to deploy all-in-one setup using ubuntu/source images. Debian/source images require us to first do Ansible upgrade as 'kolla-toolbox' image contains 2.2 version which fails to run with Python 3.7 present in Debian 'buster'. We agreed to go for Ansible 2.7/2.8 version. On AArch64 we have issue with RabbitMQ container failing to run (restarts all over again). Possible fix on a way. 1. https://review.opendev.org/#/c/642375 2. https://review.opendev.org/#/c/661713 3. http://logs.openstack.org/75/642375/24/check/kolla-ansible-ubuntu-source-ceph/7650efd/ara-report/result/3f8beadd-8f66-472f-ab4e-12e1357851ac/ ## CentOS 7 binary RDO team decided to not provide binary Train packages for CentOS 7 [4]. This target needs to be replaced with CentOS 8 once it will be fully build and packages provided by RDO. 4. https://review.rdoproject.org/etherpad/p/moving-rdo-to-centos8 ## CentOS 7 source This target will stay with Python 2.7 for now. Once CentOS 8 gets built we may move to it to get rid of py2. ## Debian binary Ignored for now. Would need to rebuild whole set of OpenStack packages from 'experimental' to 'buster'. ## Ubuntu binary Here we depend on UCA developers and will install whatever they use. # Kolla ansible Current version depends on Python 2. Typical "TypeError: cannot use a string pattern on a bytes-like object" issues need to be solved. From felix.huettner at mail.schwarz Tue Jun 4 08:38:36 2019 From: felix.huettner at mail.schwarz (=?utf-8?B?RmVsaXggSMO8dHRuZXI=?=) Date: Tue, 4 Jun 2019 08:38:36 +0000 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi Gael, we had a similar issue in the past. You could check the octiava healthmanager log (should be on the same node where the worker is running). This component monitors the status of the Amphorae and restarts them if they don’t trigger a callback after a specific time. This might also happen if there is some connection issue between the two components. But normally it should at least restart the LB with new Amphorae… Hope that helps Felix From: Gaël THEROND Sent: Tuesday, June 4, 2019 9:44 AM To: Openstack Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia Hi guys, I’ve a weird situation here. I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. 
I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. Cheers guys! Hinweise zum Datenschutz finden Sie hier. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.therond at gmail.com Tue Jun 4 09:06:41 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 4 Jun 2019 11:06:41 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi Felix, « Glad » you had the same issue before, and yes of course I looked at the HM logs which is were I actually found out that this event was triggered by octavia (Beside the DB data that validated that) here is my log trace related to this event, It doesn't really shows major issue IMHO. Here is the stacktrace that our octavia service archived for our both controllers servers, with the initial loadbalancer creation trace (Worker.log) and both controllers triggered task (Health-Manager.log). http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ I well may have miss something in it, but I don't see something strange on from my point of view. Feel free to tell me if you spot something weird. Le mar. 4 juin 2019 à 10:38, Felix Hüttner a écrit : > Hi Gael, > > > > we had a similar issue in the past. > > You could check the octiava healthmanager log (should be on the same node > where the worker is running). > > This component monitors the status of the Amphorae and restarts them if > they don’t trigger a callback after a specific time. This might also happen > if there is some connection issue between the two components. > > > > But normally it should at least restart the LB with new Amphorae… > > > > Hope that helps > > > > Felix > > > > *From:* Gaël THEROND > *Sent:* Tuesday, June 4, 2019 9:44 AM > *To:* Openstack > *Subject:* [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly > deleted by octavia > > > > Hi guys, > > > > I’ve a weird situation here. > > > > I smoothly operate a large scale multi-region Octavia service using the > default amphora driver which imply the use of nova instances as > loadbalancers. > > > > Everything is running really well and our customers (K8s and traditional > users) are really happy with the solution so far. > > > > However, yesterday one of those customers using the loadbalancer in front > of their ElasticSearch cluster poked me because this loadbalancer suddenly > passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer > available but yet the anchor/member/pool and listeners settings were still > existing. > > > > So I investigated and found out that the loadbalancer amphoras have been > destroyed by the octavia user. > > > > The weird part is, both the master and the backup instance have been > destroyed at the same moment by the octavia service user. > > > > Is there specific circumstances where the octavia service could decide to > delete the instances but not the anchor/members/pool ? > > > > It’s worrying me a bit as there is no clear way to trace why does Octavia > did take this action. > > > > I digged within the nova and Octavia DB in order to correlate the action > but except than validating my investigation it doesn’t really help as there > are no clue of why the octavia service did trigger the deletion. 
> > > > If someone have any clue or tips to give me I’ll be more than happy to > discuss this situation. > > > > Cheers guys! > Hinweise zum Datenschutz finden Sie hier . > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Tue Jun 4 10:23:57 2019 From: openstack at fried.cc (Eric Fried) Date: Tue, 4 Jun 2019 05:23:57 -0500 Subject: [nova] Spec review sprint Tuesday June 04 In-Reply-To: References: <52df6449-5d49-ee77-5309-90f2cd90283c@fried.cc> Message-ID: Reminder: This is happening. On 5/30/19 4:53 PM, Eric Fried wrote: > Here's a slightly tighter dashboard, filtering out specs with -W. 23 > total as of right now. > > https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.*+NOT+label:Workflow-1 > > On 5/30/19 10:47 AM, Eric Fried wrote: >> Hi all. We would like to do a nova-specs review push next Tuesday, June 4th. >> >> If you own one or more specs, please try to polish them and address any >> outstanding downvotes before Tuesday; and on Tuesday, please try to be >> available in #openstack-nova (or paying close attention to gerrit) to >> discuss them if needed. >> >> If you are a nova reviewer, contributor, or stakeholder, please try to >> spend a good chunk of your upstream time on Tuesday reviewing open Train >> specs [1]. >> >> Thanks, >> efried >> >> [1] Approximately: >> https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.* >> > From james.slagle at gmail.com Tue Jun 4 10:54:14 2019 From: james.slagle at gmail.com (James Slagle) Date: Tue, 4 Jun 2019 06:54:14 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 1:51 PM Kevin Carter wrote: > > On Mon, Jun 3, 2019 at 12:08 PM Mohammed Naser wrote: >> >> On Mon, Jun 3, 2019 at 12:55 PM Kevin Carter wrote: >> > >> > Hello Stackers, >> > >> > I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. >> > >> > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html >> > >> > The effort to convert tripleo Puppet and heat templates with embedded Ansible to a more consumable set of playbooks and roles is in full effect. As we're working through this effort we believe co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository will assist us in streamlining and simplifying. Structurally, at this time, most of tripleo will remain the same. However, the inclusion of tripleo-Ansible will allow us to create more focused solutions which are independently testable, much easier understand, and simple to include into the current heat template deployment methodologies. While a straight port of the existing Ansible tasks will not be entirely possible, the goal of this ongoing effort will be zero impact on our existing workflow and solutions. >> > >> > To reigniting this effort, I've put up a review to create a new "transformation" squad[0] geared toward building the structure around tripleo-ansible[1] and converting our current solutions into roles/playbooks/libraries/plugins. Initially, we'll be focused on our existing code base; however, long term, I believe it makes sense for this squad to work across projects to breakdown deployment barriers for folks using similar technologies. 
>> >> +1 >> >> > We're excited to get this effort rolling again and would love to work with anyone and everyone throughout the community. If folks are interested in this effort, please let us know. >> >> Most definitely in. We've had great success working with the TripleO team on >> integrating the Tempest role and on the OSA side, we'd be more than happy >> to help try and converge our roles to maintain them together. > > > ++ > >> >> If there's any meetings or anything that will be scheduled, I'd be >> happy to attend. >> > > its still very early but I expect to begin regular meetings (even if they're just impromptu IRC conversations to begin with) to work out what needs to be done and where we can begin collaborating with other folks. As soon as we have more I'll be sure to reach out here and on IRC. Organizing a squad and starting with IRC meetings sounds good to me, and I'll be participating in the work. Thanks for kicking off the conversation! -- -- James Slagle -- From smooney at redhat.com Tue Jun 4 10:59:40 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 04 Jun 2019 11:59:40 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: On Mon, 2019-06-03 at 08:39 -0400, Mohammed Naser wrote: > On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: > > > > Hi, > > > > > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: > > > > > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: > > > > > > > > Hi, > > > > > > > > I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in > > > > separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something > > > > similar possible with OSA or will it be needed to contribute always every change to OSA repository? > > > > > > Not a dumb question at all. So, we do have this concept of 'roles' > > > which you _could_ kinda technically identify similar to plugins. > > > However, I think one of the things that would maybe come out of this > > > is the inability for projects to maintain their own plugins (because > > > now you can host neutron/devstack/plugins and you maintain that repo > > > yourself), under this structure, you would indeed have to make those > > > changes to the OpenStack Ansible Neutron role > > > > > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron > > > > > > However, I think from an OSA perspective, we would be more than happy > > > to add project maintainers for specific projects to their appropriate > > > roles. It would make sense that there is someone from the Neutron > > > team that could be a core on os_neutron from example. > > > > Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in > > opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and > > install it together with everything else by simply adding one line (usually) in local.conf file. > > I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or > > driver which isn’t official OpenStack project. > > You raise a really good concern. Indeed, we might have to change the workflow > from "write a plugin" to "write an Ansible role" to be able to test > your project with > DevStack at that page (or maintain both a "legacy" solution) with a new one. 
the real probalem with that is who is going to port all of the existing plugins. kolla-ansible has also tried to be a devstack replacement in the past via the introduction of dev-mode which clones the git repo of the dev mode project locally and bind mounts them into the container. the problem is it still breaks peoles plugins and workflow. some devstack feature that osa would need to support in order to be a replacement for me are. 1 the ablity to install all openstack project form git if needed including gerrit reviews. abiltiy to eailly specify gerrit reiews or commits for each project # here i am declaring the os-vif should be installed from git not pypi LIBS_FROM_GIT=os-vif # and here i am specifying that gerrit should be used as the source and # i am provide a gerrit/git refs branch for a specific un merged patch OS_VIF_REPO=https://git.openstack.org/openstack/os-vif OS_VIF_BRANCH=refs/changes/25/629025/9 # *_REPO can obvioulsy take anythign that is valid in a git clone command so # i can use a local repo too NEUTRON_REPO=file:///opt/repos/neutron # and *_BRANCH as the name implices works with branches, tag commits* and gerrit ref brances. NEUTRON_BRANCH=bug/1788009 the next thing that would be needed is a way to simply override any config value like this [[post-config|/etc/nova/nova.conf]] #[compute] #live_migration_wait_for_vif_plug=True [libvirt] live_migration_uri = qemu+ssh://root@%s/system #cpu_mode = host-passthrough virt_type = kvm cpu_mode = custom cpu_model = kvm64 im sure that osa can do that but i really can just provide any path to any file if needed. so no need to update a role or plugin to set values in files created by plugins which is the next thing. we enable plugins with a single line like this enable_plugin networking-ovs-dpdk https://github.com/openstack/networking-ovs-dpdk master meaning there is no need to preinstall or clone the repo. in theory the plugin should install all its dependeices and devstack will clone and execute the plugins based on the single line above. plugins however can also read any varable defiend in the local.conf as it will be set in the environment which means i can easily share an exact configuration with someone by shareing a local.conf. im not against improving or replacing devstack but with the devstack ansible roles and the fact we use devstack for all our testing in the gate it is actually has become one of the best openstack installer out there. we do not recommend people run it in production but with the ansible automation of grenade and the move to systemd for services there are less mainatined installers out there that devstack is proably a better foundation for a cloud to build on. people should still not use it in production but i can see why some might. > > > > > > > > Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which > > > > uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I > > > > don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in > > > > mind. > > > > > > Indeed, with our current CI infrastructure with OSA, we have the > > > ability to create these dynamic scenarios (which can actually be > > > defined by a simple Zuul variable). 
> > > > > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 > > > > > > We do some really neat introspection of the project name being tested > > > in order to run specific scenarios. Therefore, that is something that > > > should be quite easy to accomplish simply by overriding a scenario > > > name within Zuul. It also is worth mentioning we now support full > > > metal deploys for a while now, so not having to worry about containers > > > is something to keep in mind as well (with simplifying the developer > > > experience again). > > > > > > > > On 1 Jun 2019, at 14:35, Mohammed Naser wrote: > > > > > > > > > > Hi everyone, > > > > > > > > > > This is something that I've discussed with a few people over time and > > > > > I think I'd probably want to bring it up by now. I'd like to propose > > > > > and ask if it makes sense to perhaps replace devstack entirely with > > > > > openstack-ansible. I think I have quite a few compelling reasons to > > > > > do this that I'd like to outline, as well as why I *feel* (and I could > > > > > be biased here, so call me out!) that OSA is the best option in terms > > > > > of a 'replacement' > > > > > > > > > > # Why not another deployment project? > > > > > I actually thought about this part too and considered this mainly for > > > > > ease of use for a *developer*. > > > > > > > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > > > > (which means that it has no build infrastructure, a developer can't > > > > > just get $commit checked out and deployed). > > > > > > > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > > > > beforehand, also, I feel they are much harder to use as a developer > > > > > because if you want to make quick edits and restart services, you have > > > > > to enter a container and make the edit there and somehow restart the > > > > > service without the container going back to it's original state. > > > > > Kolla-Ansible and the other combinations also suffer from the same > > > > > "issue". > > > > > > > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > > > > a virtualenv and installs packages inside of it. The services are > > > > > deployed as systemd units. This is very much similar to the current > > > > > state of devstack at the moment (minus the virtualenv part, afaik). > > > > > It makes it pretty straight forward to go and edit code if you > > > > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > > > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > > > > it much more easy to deploy on a wider variety of operating systems. > > > > > It also has the ability to use commits checked out from Zuul so all > > > > > the fancy Depends-On stuff we use works. > > > > > > > > > > # Why do we care about this, I like my bash scripts! > > > > > As someone who's been around for a *really* long time in OpenStack, > > > > > I've seen a whole lot of really weird issues surface from the usage of > > > > > DevStack to do CI gating. For example, one of the recent things is > > > > > the fact it relies on installing package-shipped noVNC, where as the > > > > > 'master' noVNC has actually changed behavior a few months back and it > > > > > is completely incompatible at this point (it's just a ticking thing > > > > > until we realize we're entirely broken). 
> > > > > > > > > > To this day, I still see people who want to POC something up with > > > > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > > > > how many warnings we'll put up, they'll always try to do it. With > > > > > this way, at least they'll have something that has the shape of an > > > > > actual real deployment. In addition, it would be *good* in the > > > > > overall scheme of things for a deployment system to test against, > > > > > because this would make sure things don't break in both ways. > > > > > > > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > > > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > > > > the playbooks directly from the executor. > > > > > > > > > > # So how could we do this? > > > > > The OpenStack Ansible project is made of many roles that are all > > > > > composable, therefore, you can think of it as a combination of both > > > > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > > > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > > > > integration of all of it in a distribution. OSA is currently both, > > > > > but it also includes both Ansible roles and playbooks. > > > > > > > > > > In order to make sure we maintain as much of backwards compatibility > > > > > as possible, we can simply run a small script which does a mapping of > > > > > devstack => OSA variables to make sure that the service is shipped > > > > > with all the necessary features as per local.conf. > > > > > > > > > > So the new process could be: > > > > > > > > > > 1) parse local.conf and generate Ansible variables files > > > > > 2) install Ansible (if not running in gate) > > > > > 3) run playbooks using variable generated in #1 > > > > > > > > > > The neat thing is after all of this, devstack just becomes a thin > > > > > wrapper around Ansible roles. I also think it brings a lot of hands > > > > > together, involving both the QA team and OSA team together, which I > > > > > believe that pooling our resources will greatly help in being able to > > > > > get more done and avoiding duplicating our efforts. > > > > > > > > > > # Conclusion > > > > > This is a start of a very open ended discussion, I'm sure there is a > > > > > lot of details involved here in the implementation that will surface, > > > > > but I think it could be a good step overall in simplifying our CI and > > > > > adding more coverage for real potential deployers. It will help two > > > > > teams unite together and have more resources for something (that > > > > > essentially is somewhat of duplicated effort at the moment). > > > > > > > > > > I will try to pick up sometime to POC a simple service being deployed > > > > > by an OSA role instead of Bash, placement which seems like a very > > > > > simple one and share that eventually. > > > > > > > > > > Thoughts? :) > > > > > > > > > > -- > > > > > Mohammed Naser — vexxhost > > > > > ----------------------------------------------------- > > > > > D. 514-316-8872 > > > > > D. 800-910-1726 ext. 200 > > > > > E. mnaser at vexxhost.com > > > > > W. http://vexxhost.com > > > > > > > > > > > > > — > > > > Slawek Kaplonski > > > > Senior software engineer > > > > Red Hat > > > > > > > > > > > > > -- > > > Mohammed Naser — vexxhost > > > ----------------------------------------------------- > > > D. 514-316-8872 > > > D. 800-910-1726 ext. 200 > > > E. mnaser at vexxhost.com > > > W. 
http://vexxhost.com > > > > — > > Slawek Kaplonski > > Senior software engineer > > Red Hat > > > > From smooney at redhat.com Tue Jun 4 11:13:31 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 04 Jun 2019 12:13:31 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> Message-ID: <85d1e5ef4070ad3b24a910adc12cf61308e85088.camel@redhat.com> On Tue, 2019-06-04 at 08:56 +0100, Sorin Sbarnea wrote: > I am in favour of ditching or at least refactoring devstack because during the last year I often found myself blocked > from fixing some zuul/jobs issues because the buggy code was still required by legacy devstack jobs that nobody had > time maintain or fix, so they were isolated and the default job configurations were forced to use dirty hack needed > for keeping these working. this sound like the issue is more realted to the fact that it is still useing a legacy job. why not move it over to the ansible native devstack jobs. > > One such example is that there is a task that does a "chmod -R 0777 -R" on the entire source tree, a total security > threat. in a ci env it is not. and in a development env if it was in devstack gate or in the ansible jobs it is not. i would not want this in a production system but it feels a little contived. > > In order to make other jobs running correctly* I had to rely undoing the damage done by such chmod because I was not > able to disable the historical hack. > > * ansible throws warning with unsafe file permissions > * ssh refuses to load unsafe keys > > That is why I am in favor of dropping features that are slowing down the progress of others. that is a self contracdicting statement. if i depend on a feature then droping it slows donw my progress. e.g. if you state that as a goal you will find you will almost always fail as to speed someone up you slow someone else down. what you want to aim for is a better solution that supports both usecase in a clean and defiend way. > > I know that the reality is more complicated but I also think that sometimes less* is more. > > > * deployment projects ;) > > > On 4 Jun 2019, at 04:36, Dean Troyer wrote: > > > > > > > > On Mon, 3 Jun 2019, 15:59 Clark Boylan, > wrote: > > On Sat, Jun 1, 2019, at 5:36 AM, Mohammed Naser wrote: > > > Hi everyone, > > > > > > This is something that I've discussed with a few people over time and > > > I think I'd probably want to bring it up by now. I'd like to propose > > > and ask if it makes sense to perhaps replace devstack entirely with > > > openstack-ansible. I think I have quite a few compelling reasons to > > > do this that I'd like to outline, as well as why I *feel* (and I could > > > be biased here, so call me out!) that OSA is the best option in terms > > > of a 'replacement' > > > > > > # Why not another deployment project? > > > I actually thought about this part too and considered this mainly for > > > ease of use for a *developer*. > > > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > > (which means that it has no build infrastructure, a developer can't > > > just get $commit checked out and deployed). 
> > > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > > beforehand, also, I feel they are much harder to use as a developer > > > because if you want to make quick edits and restart services, you have > > > to enter a container and make the edit there and somehow restart the > > > service without the container going back to it's original state. > > > Kolla-Ansible and the other combinations also suffer from the same > > > "issue". > > > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > > a virtualenv and installs packages inside of it. The services are > > > deployed as systemd units. This is very much similar to the current > > > state of devstack at the moment (minus the virtualenv part, afaik). > > > It makes it pretty straight forward to go and edit code if you > > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > > it much more easy to deploy on a wider variety of operating systems. > > > It also has the ability to use commits checked out from Zuul so all > > > the fancy Depends-On stuff we use works. > > > > > > # Why do we care about this, I like my bash scripts! > > > As someone who's been around for a *really* long time in OpenStack, > > > I've seen a whole lot of really weird issues surface from the usage of > > > DevStack to do CI gating. For example, one of the recent things is > > > the fact it relies on installing package-shipped noVNC, where as the > > > 'master' noVNC has actually changed behavior a few months back and it > > > is completely incompatible at this point (it's just a ticking thing > > > until we realize we're entirely broken). > > > > I'm not sure this is a great example case. We consume prebuilt software for many of our dependencies. Everything > > from the kernel to the database to rabbitmq to ovs (and so on) are consumed as prebuilt packages from our distros. > > In many cases this is desirable to ensure that our software work with the other software out there in the wild that > > people will be deploying with. > > > > > > > > To this day, I still see people who want to POC something up with > > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > > how many warnings we'll put up, they'll always try to do it. With > > > this way, at least they'll have something that has the shape of an > > > actual real deployment. In addition, it would be *good* in the > > > overall scheme of things for a deployment system to test against, > > > because this would make sure things don't break in both ways. > > > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > > the playbooks directly from the executor. > > > > I think if you have developers running a small wrapper locally to deploy this new development stack you should run > > that same wrapper in CI. This ensure the wrapper doesn't break. > > > > > > > > # So how could we do this? > > > The OpenStack Ansible project is made of many roles that are all > > > composable, therefore, you can think of it as a combination of both > > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > > integration of all of it in a distribution. OSA is currently both, > > > but it also includes both Ansible roles and playbooks. 
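To make that split concrete, the per-service roles and the integration repository live side by side on opendev.org; a rough sketch of what a developer would pull down (the repo names are real, the layout shown here is simplified):

    # integration repo: inventories, group_vars and the playbooks that tie the roles together
    git clone https://opendev.org/openstack/openstack-ansible
    # per-service roles, roughly analogous to puppet-nova / puppet-neutron
    git clone https://opendev.org/openstack/openstack-ansible-os_nova
    git clone https://opendev.org/openstack/openstack-ansible-os_neutron
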
> > > > > > In order to make sure we maintain as much of backwards compatibility > > > as possible, we can simply run a small script which does a mapping of > > > devstack => OSA variables to make sure that the service is shipped > > > with all the necessary features as per local.conf. > > > > > > So the new process could be: > > > > > > 1) parse local.conf and generate Ansible variables files > > > 2) install Ansible (if not running in gate) > > > 3) run playbooks using variable generated in #1 > > > > > > The neat thing is after all of this, devstack just becomes a thin > > > wrapper around Ansible roles. I also think it brings a lot of hands > > > together, involving both the QA team and OSA team together, which I > > > believe that pooling our resources will greatly help in being able to > > > get more done and avoiding duplicating our efforts. > > > > > > # Conclusion > > > This is a start of a very open ended discussion, I'm sure there is a > > > lot of details involved here in the implementation that will surface, > > > but I think it could be a good step overall in simplifying our CI and > > > adding more coverage for real potential deployers. It will help two > > > teams unite together and have more resources for something (that > > > essentially is somewhat of duplicated effort at the moment). > > > > > > I will try to pick up sometime to POC a simple service being deployed > > > by an OSA role instead of Bash, placement which seems like a very > > > simple one and share that eventually. > > > > > > Thoughts? :) > > > > For me there are two major items to consider that haven't been brought up yet. The first is devstack's (lack of) > > speed. Any replacement should be at least as quick as the current tooling because the current tooling is slow enough > > already. > > > > This is important. We would need to see benchmark comparisons between a devstack install and an OSA install. Shell > > may be slow but Ansible is generally slower. That's fine in production when reliability is king, but we need fast > > iteration for development. > > > > I haven't looked under the covers of devstack for some time, but it previously installed all python deps in one > > place, whereas OSA has virtualenvs for each service which could take a while to build. Perhaps this is configurable. > > > > The other is logging. I spend a lot of time helping people to debug CI job runs and devstack has grown a fairly > > effective set of logging that just about any time I have to help debug another deployment tool's CI jobs I miss > > (because they tend to log only a tiny fraction of what devstack logs). > > > > Clark > > From anlin.kong at gmail.com Tue Jun 4 11:38:27 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Tue, 4 Jun 2019 23:38:27 +1200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi Gaël, We also met with the issue before which happened during the failover process, but I'm not sure your situation is the same with us. I just paste my previous investigation here, hope that will help. "With the Octavia version we have deployed in the production, the amphora record in the `amphora_health` table is deleted at the beginning of the failover process in order to disable the amphora health monitoring, while the amphora record in `amphora` table is marked as DELETED. 
On the other hand, the octavia-housekeeper service will delete the amphora record in `amphora` table if it doesn’t find its related record in `amphora_health` table which is always true during the current failover process. As a result, if the failover process fails, there will be no amphora records relating to the load balancer in the database." This patch is here https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701, unfortunately, it has not been backported to Rocky. Best regards, Lingxian Kong Catalyst Cloud On Tue, Jun 4, 2019 at 9:13 PM Gaël THEROND wrote: > Hi Felix, > > « Glad » you had the same issue before, and yes of course I looked at the > HM logs which is were I actually found out that this event was triggered > by octavia (Beside the DB data that validated that) here is my log trace > related to this event, It doesn't really shows major issue IMHO. > > Here is the stacktrace that our octavia service archived for our both > controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > > http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > > I well may have miss something in it, but I don't see something strange on > from my point of view. > Feel free to tell me if you spot something weird. > > > Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >> Hi Gael, >> >> >> >> we had a similar issue in the past. >> >> You could check the octiava healthmanager log (should be on the same node >> where the worker is running). >> >> This component monitors the status of the Amphorae and restarts them if >> they don’t trigger a callback after a specific time. This might also happen >> if there is some connection issue between the two components. >> >> >> >> But normally it should at least restart the LB with new Amphorae… >> >> >> >> Hope that helps >> >> >> >> Felix >> >> >> >> *From:* Gaël THEROND >> *Sent:* Tuesday, June 4, 2019 9:44 AM >> *To:* Openstack >> *Subject:* [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly >> deleted by octavia >> >> >> >> Hi guys, >> >> >> >> I’ve a weird situation here. >> >> >> >> I smoothly operate a large scale multi-region Octavia service using the >> default amphora driver which imply the use of nova instances as >> loadbalancers. >> >> >> >> Everything is running really well and our customers (K8s and traditional >> users) are really happy with the solution so far. >> >> >> >> However, yesterday one of those customers using the loadbalancer in front >> of their ElasticSearch cluster poked me because this loadbalancer suddenly >> passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer >> available but yet the anchor/member/pool and listeners settings were still >> existing. >> >> >> >> So I investigated and found out that the loadbalancer amphoras have been >> destroyed by the octavia user. >> >> >> >> The weird part is, both the master and the backup instance have been >> destroyed at the same moment by the octavia service user. >> >> >> >> Is there specific circumstances where the octavia service could decide to >> delete the instances but not the anchor/members/pool ? >> >> >> >> It’s worrying me a bit as there is no clear way to trace why does Octavia >> did take this action. >> >> >> >> I digged within the nova and Octavia DB in order to correlate the action >> but except than validating my investigation it doesn’t really help as there >> are no clue of why the octavia service did trigger the deletion. 
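For cross-checking this kind of event without going straight to the database, the Octavia API also exposes the amphora records; a minimal sketch with python-octaviaclient (the IDs are placeholders, and option names may vary slightly between releases):

    # overall provisioning/operating status of the load balancer
    openstack loadbalancer show <lb-id>
    # amphorae Octavia still knows about; role, status and the nova compute_id
    # are part of each amphora record
    openstack loadbalancer amphora list
    openstack loadbalancer amphora show <amphora-id>
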
>> >> >> >> If someone have any clue or tips to give me I’ll be more than happy to >> discuss this situation. >> >> >> >> Cheers guys! >> Hinweise zum Datenschutz finden Sie hier >> . >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Tue Jun 4 11:43:38 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Tue, 4 Jun 2019 13:43:38 +0200 Subject: [neutron] QoS meeting 4.06.2019 cancelled Message-ID: <0F4ED17F-D081-4ED5-9AEE-4DC6B7EBD3E8@redhat.com> Hi, Both me and Rodolfo can’t chair today’s QoS meeting so lets cancel it. See You on next meeting in 2 weeks. — Slawek Kaplonski Senior software engineer Red Hat From edmondsw at us.ibm.com Tue Jun 4 12:00:35 2019 From: edmondsw at us.ibm.com (William M Edmonds - edmondsw@us.ibm.com) Date: Tue, 4 Jun 2019 12:00:35 +0000 Subject: [openstack-ansible][powervm] dropping support In-Reply-To: References: Message-ID: <2C11DAD5-1ED6-409B-9374-0CB86059E5E2@us.ibm.com> On 5/31/19, 6:46 PM, "Mohammed Naser" wrote: > > Hi everyone, > > I've pushed up a patch to propose dropping support for PowerVM support > inside OpenStack Ansible. There has been no work done on this for a > few years now, the configured compute driver is the incorrect one for > ~2 years now which indicates that no one has been able to use it for > that long. > > It would be nice to have this driver however given the infrastructure > we have upstream, there would be no way for us to effectively test it > and bring it back to functional state. I'm proposing that we remove > the code here: > https://review.opendev.org/#/c/662587 powervm: drop support > > If you're using this code and would like to contribute to fixing it > and (somehow) adding coverage, please reach out, otherwise, we'll drop > this code to clean things up. Sadly, I don't know of anyone using it or willing to maintain it at this time. From doug at doughellmann.com Tue Jun 4 12:39:44 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 04 Jun 2019 08:39:44 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: Sean Mooney writes: > On Mon, 2019-06-03 at 08:39 -0400, Mohammed Naser wrote: >> On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: >> > >> > Hi, >> > >> > > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: >> > > >> > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: >> > > > >> > > > Hi, >> > > > >> > > > I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in >> > > > separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something >> > > > similar possible with OSA or will it be needed to contribute always every change to OSA repository? >> > > >> > > Not a dumb question at all. So, we do have this concept of 'roles' >> > > which you _could_ kinda technically identify similar to plugins. 
>> > > However, I think one of the things that would maybe come out of this >> > > is the inability for projects to maintain their own plugins (because >> > > now you can host neutron/devstack/plugins and you maintain that repo >> > > yourself), under this structure, you would indeed have to make those >> > > changes to the OpenStack Ansible Neutron role >> > > >> > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron >> > > >> > > However, I think from an OSA perspective, we would be more than happy >> > > to add project maintainers for specific projects to their appropriate >> > > roles. It would make sense that there is someone from the Neutron >> > > team that could be a core on os_neutron from example. >> > >> > Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in >> > opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and >> > install it together with everything else by simply adding one line (usually) in local.conf file. >> > I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or >> > driver which isn’t official OpenStack project. >> >> You raise a really good concern. Indeed, we might have to change the workflow >> from "write a plugin" to "write an Ansible role" to be able to test >> your project with >> DevStack at that page (or maintain both a "legacy" solution) with a new one. > the real probalem with that is who is going to port all of the > existing plugins. Do all projects and all jobs have to be converted at once? Or ever? How much complexity do those plugins actually contain? Would they be fairly straightforward to convert? Could we build a "devstack plugin wrapper" for OSA? Could we run OSA and then run devstack with just the plugin(s) needed for a given job? Is there enough appeal in the idea of replacing devstack with something closer to what is used for production deployments to drive us to find an iterative approach that doesn't require changing everything at one time? Or are we stuck with devstack forever? > kolla-ansible has also tried to be a devstack replacement in the past via the introduction > of dev-mode which clones the git repo of the dev mode project locally and bind mounts them into the container. > the problem is it still breaks peoles plugins and workflow. > > > some devstack feature that osa would need to support in order to be a > replacement for me are. You've made a good start on a requirements list for a devstack replacement. Perhaps a first step would be for some of the folks who support this idea to compile a more complete list of those requirements, and then we could analyze OSA to see how it might need to be changed or whether it makes sense to use OSA as the basis for a new toolset that takes on some of the "dev" features we might not want in a "production" deployment tool. Here's another potential gap for whoever is going to make that list: devstack pre-populates the environment with some data for things like flavors and images. I don't imagine OSA does that or, if it does, that they are an exact match. How do we change those settings? That leads to a good second step: Do the rest of the analysis to understand what it would take to set up a base job like we have for devstack, that produces a similar setup. Not necessarily identical, but similar enough to be able to run tempest. It seems likely that already exists in some form for testing OSA itself. 
Could a developer run that on a local system (clearly being able to build the test environment locally is a requirement for replacing devstack)? After that, I would want to see answers to some of the questions about dealing with plugins that I posed above. And only then, I think, could I provide an answer to the question of whether we should make the change or not. > 1 the ablity to install all openstack project form git if needed including gerrit reviews. > > abiltiy to eailly specify gerrit reiews or commits for each project > > # here i am declaring the os-vif should be installed from git not pypi > LIBS_FROM_GIT=os-vif > > # and here i am specifying that gerrit should be used as the source and > # i am provide a gerrit/git refs branch for a specific un merged patch > OS_VIF_REPO=https://git.openstack.org/openstack/os-vif > OS_VIF_BRANCH=refs/changes/25/629025/9 > > # *_REPO can obvioulsy take anythign that is valid in a git clone command so > # i can use a local repo too > NEUTRON_REPO=file:///opt/repos/neutron > # and *_BRANCH as the name implices works with branches, tag commits* and gerrit ref brances. > NEUTRON_BRANCH=bug/1788009 > > > the next thing that would be needed is a way to simply override any config value like this > > [[post-config|/etc/nova/nova.conf]] > #[compute] > #live_migration_wait_for_vif_plug=True > [libvirt] > live_migration_uri = qemu+ssh://root@%s/system > #cpu_mode = host-passthrough > virt_type = kvm > cpu_mode = custom > cpu_model = kvm64 > > im sure that osa can do that but i really can just provide any path to any file if needed. > so no need to update a role or plugin to set values in files created > by plugins which is the next thing. Does OSA need to support *every* configuration value? Or could it deploy a stack, and then rely on a separate tool to modify config values and restart a service? Clearly some values need to be there when the cloud first starts, but do they all? > we enable plugins with a single line like this > > enable_plugin networking-ovs-dpdk https://github.com/openstack/networking-ovs-dpdk master > > meaning there is no need to preinstall or clone the repo. in theory the plugin should install all its dependeices > and devstack will clone and execute the plugins based on the single > line above. plugins however can also This makes me think it might be most appropriate to be considering a tool that replaces devstack by wrapping OSA, rather than *being* OSA. Maybe that's just an extra playbook that runs before OSA, or maybe it's a simpler bash script that does some setup before invoking OSA. > read any varable defiend in the local.conf as it will be set in the environment which means i can easily share > an exact configuration with someone by shareing a local.conf. > > > im not against improving or replacing devstack but with the devstack ansible roles and the fact we use devstack > for all our testing in the gate it is actually has become one of the best openstack installer out there. we do > not recommend people run it in production but with the ansible automation of grenade and the move to systemd for > services there are less mainatined installers out there that devstack is proably a better foundation for a cloud > to build on. people should still not use it in production but i can see why some might. > >> >> > > >> > > > Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which >> > > > uses only some parts of devstack. 
That kind of jobs will probably have to be rewritten after such change. I >> > > > don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in >> > > > mind. >> > > >> > > Indeed, with our current CI infrastructure with OSA, we have the >> > > ability to create these dynamic scenarios (which can actually be >> > > defined by a simple Zuul variable). >> > > >> > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 >> > > >> > > We do some really neat introspection of the project name being tested >> > > in order to run specific scenarios. Therefore, that is something that >> > > should be quite easy to accomplish simply by overriding a scenario >> > > name within Zuul. It also is worth mentioning we now support full >> > > metal deploys for a while now, so not having to worry about containers >> > > is something to keep in mind as well (with simplifying the developer >> > > experience again). >> > > >> > > > > On 1 Jun 2019, at 14:35, Mohammed Naser wrote: >> > > > > >> > > > > Hi everyone, >> > > > > >> > > > > This is something that I've discussed with a few people over time and >> > > > > I think I'd probably want to bring it up by now. I'd like to propose >> > > > > and ask if it makes sense to perhaps replace devstack entirely with >> > > > > openstack-ansible. I think I have quite a few compelling reasons to >> > > > > do this that I'd like to outline, as well as why I *feel* (and I could >> > > > > be biased here, so call me out!) that OSA is the best option in terms >> > > > > of a 'replacement' >> > > > > >> > > > > # Why not another deployment project? >> > > > > I actually thought about this part too and considered this mainly for >> > > > > ease of use for a *developer*. >> > > > > >> > > > > At this point, Puppet-OpenStack pretty much only deploys packages >> > > > > (which means that it has no build infrastructure, a developer can't >> > > > > just get $commit checked out and deployed). >> > > > > >> > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built >> > > > > beforehand, also, I feel they are much harder to use as a developer >> > > > > because if you want to make quick edits and restart services, you have >> > > > > to enter a container and make the edit there and somehow restart the >> > > > > service without the container going back to it's original state. >> > > > > Kolla-Ansible and the other combinations also suffer from the same >> > > > > "issue". >> > > > > >> > > > > OpenStack Ansible is unique in the way that it pretty much just builds >> > > > > a virtualenv and installs packages inside of it. The services are >> > > > > deployed as systemd units. This is very much similar to the current >> > > > > state of devstack at the moment (minus the virtualenv part, afaik). >> > > > > It makes it pretty straight forward to go and edit code if you >> > > > > need/have to. We also have support for Debian, CentOS, Ubuntu and >> > > > > SUSE. This allows "devstack 2.0" to have far more coverage and make >> > > > > it much more easy to deploy on a wider variety of operating systems. >> > > > > It also has the ability to use commits checked out from Zuul so all >> > > > > the fancy Depends-On stuff we use works. >> > > > > >> > > > > # Why do we care about this, I like my bash scripts! 
>> > > > > As someone who's been around for a *really* long time in OpenStack, >> > > > > I've seen a whole lot of really weird issues surface from the usage of >> > > > > DevStack to do CI gating. For example, one of the recent things is >> > > > > the fact it relies on installing package-shipped noVNC, where as the >> > > > > 'master' noVNC has actually changed behavior a few months back and it >> > > > > is completely incompatible at this point (it's just a ticking thing >> > > > > until we realize we're entirely broken). >> > > > > >> > > > > To this day, I still see people who want to POC something up with >> > > > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter >> > > > > how many warnings we'll put up, they'll always try to do it. With >> > > > > this way, at least they'll have something that has the shape of an >> > > > > actual real deployment. In addition, it would be *good* in the >> > > > > overall scheme of things for a deployment system to test against, >> > > > > because this would make sure things don't break in both ways. >> > > > > >> > > > > Also: we run Zuul for our CI which supports Ansible natively, this can >> > > > > remove one layer of indirection (Zuul to run Bash) and have Zuul run >> > > > > the playbooks directly from the executor. >> > > > > >> > > > > # So how could we do this? >> > > > > The OpenStack Ansible project is made of many roles that are all >> > > > > composable, therefore, you can think of it as a combination of both >> > > > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >> > > > > the base modules (i.e. puppet-nova, etc) and TripleO was the >> > > > > integration of all of it in a distribution. OSA is currently both, >> > > > > but it also includes both Ansible roles and playbooks. >> > > > > >> > > > > In order to make sure we maintain as much of backwards compatibility >> > > > > as possible, we can simply run a small script which does a mapping of >> > > > > devstack => OSA variables to make sure that the service is shipped >> > > > > with all the necessary features as per local.conf. >> > > > > >> > > > > So the new process could be: >> > > > > >> > > > > 1) parse local.conf and generate Ansible variables files >> > > > > 2) install Ansible (if not running in gate) >> > > > > 3) run playbooks using variable generated in #1 >> > > > > >> > > > > The neat thing is after all of this, devstack just becomes a thin >> > > > > wrapper around Ansible roles. I also think it brings a lot of hands >> > > > > together, involving both the QA team and OSA team together, which I >> > > > > believe that pooling our resources will greatly help in being able to >> > > > > get more done and avoiding duplicating our efforts. >> > > > > >> > > > > # Conclusion >> > > > > This is a start of a very open ended discussion, I'm sure there is a >> > > > > lot of details involved here in the implementation that will surface, >> > > > > but I think it could be a good step overall in simplifying our CI and >> > > > > adding more coverage for real potential deployers. It will help two >> > > > > teams unite together and have more resources for something (that >> > > > > essentially is somewhat of duplicated effort at the moment). >> > > > > >> > > > > I will try to pick up sometime to POC a simple service being deployed >> > > > > by an OSA role instead of Bash, placement which seems like a very >> > > > > simple one and share that eventually. >> > > > > >> > > > > Thoughts? 
:) >> > > > > >> > > > > -- >> > > > > Mohammed Naser — vexxhost >> > > > > ----------------------------------------------------- >> > > > > D. 514-316-8872 >> > > > > D. 800-910-1726 ext. 200 >> > > > > E. mnaser at vexxhost.com >> > > > > W. http://vexxhost.com >> > > > > >> > > > >> > > > — >> > > > Slawek Kaplonski >> > > > Senior software engineer >> > > > Red Hat >> > > > >> > > >> > > >> > > -- >> > > Mohammed Naser — vexxhost >> > > ----------------------------------------------------- >> > > D. 514-316-8872 >> > > D. 800-910-1726 ext. 200 >> > > E. mnaser at vexxhost.com >> > > W. http://vexxhost.com >> > >> > — >> > Slawek Kaplonski >> > Senior software engineer >> > Red Hat >> > >> >> > > -- Doug From dmendiza at redhat.com Tue Jun 4 12:45:36 2019 From: dmendiza at redhat.com (=?UTF-8?Q?Douglas_Mendiz=c3=a1bal?=) Date: Tue, 4 Jun 2019 07:45:36 -0500 Subject: [nova][cinder][glance][Barbican]Finding Timeslot for weekly Image Encryption IRC meeting In-Reply-To: <798dc164-1ed3-10f3-6de2-e902ae269869@secustack.com> References: <798dc164-1ed3-10f3-6de2-e902ae269869@secustack.com> Message-ID: <6dd8e9e3-6959-01c7-46ac-9f3df472b973@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hi Josephine, Thank you for organizing this. Do you still need more responses before scheduling the first meeting? It appears that Monday 1300 UTC is the best time slot for everyone so far. On a related note, we have the Secret Consumers spec up for review for Barbican: https://review.opendev.org/#/c/662013/ Regards, - - Douglas Mendizábal (redrobot) On 5/13/19 7:19 AM, Josephine Seifert wrote: > Just re-raising this :) > > Please vote, if you would like to participate: > https://doodle.com/poll/wtg9ha3e5dvym6yt > > Am 04.05.19 um 20:57 schrieb Josephine Seifert: >> Hello, >> >> as a result from the Summit and the PTG, I would like to hold a >> weekly IRC-meeting for the Image Encryption (soon to be a pop-up >> team). >> >> As I work in Europe I have made a doodle poll, with timeslots I >> can attend and hopefully many of you. 
If you would like to join >> in a weekly meeting, please fill out the poll and state your name >> and the project you are working in: >> https://doodle.com/poll/wtg9ha3e5dvym6yt >> >> Thank you Josephine (Luzi) >> >> >> > -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEwcapj5oGTj2zd3XogB6WFOq/OrcFAlz2Z7kACgkQgB6WFOq/ OreV5Q//Qf/kS4fBFJjzm2wK6NBeXPbdiLU+7RkKTE4RIePl6VmDnKnQbyIWSSMG oP7Ey9zIE52wdsSYNQwZ7eJfk2WQ3NojEtNoPkQspwMl66qlVWKCikBcJwyDFNWo qo5oD8fTQHm9EmUfGn0npYxyPaBRDiPAFJ4I9MakT6Vx5ChgXj9PStzdRZIOevfQ ezaT+j1ZziheTg6LClSxPE4jeOjTiTU4CupmDf70mqv6PRkq/1J82Nz9ZoLPgod0 lX8EJ15LXGnfUykP/GXZ56rVhkHxYSkK3TiQ26g/b90X3NBUVVAn2VdUhrEwnWXd i7U0lKFq7NMa6dlnU3g6VCQIT+oC7Hx173io+Bx6UjTrYPXur3cgApfLBufLM94S mvVWwcwXz7izf30fxZxa8E9cu1ZigILyp90UNGHLAPX0oNSdOrelnYmdhRoVv90+ IlfojnPG/GjCqAbimcMLL0wRRK946j8S/naa+32fTPTUrz/L/poCdi4x3gJzUQ9f x4Au96O1IoWEWChKsUID6su6kVfHfKH0U+6UuneDiYE3DBDdy+vUJlM6etoAes17 5fTgmk8tNPMbcmgZ9ajmh5iwZuooc+FSOgnE5cZt4U6UyY2k1Sr0n878sPgeiMH+ nHavQ8EEEGQ9jf+PDlyOc6yawrm28nyFGMyvH8LjLyN/7lZJwGI= =SWA4 -----END PGP SIGNATURE----- From gael.therond at gmail.com Tue Jun 4 13:03:20 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 4 Jun 2019 15:03:20 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi Lingxian Kong, That’s actually very interesting as I’ve come to the same conclusion this morning during my investigation and was starting to think about a fix, which it seems you already made! Is there a reason why it didn’t was backported to rocky? Very helpful, many many thanks to you you clearly spare me hours of works! I’ll get a review of your patch and test it on our lab. Le mar. 4 juin 2019 à 11:06, Gaël THEROND a écrit : > Hi Felix, > > « Glad » you had the same issue before, and yes of course I looked at the > HM logs which is were I actually found out that this event was triggered > by octavia (Beside the DB data that validated that) here is my log trace > related to this event, It doesn't really shows major issue IMHO. > > Here is the stacktrace that our octavia service archived for our both > controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > > http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > > I well may have miss something in it, but I don't see something strange on > from my point of view. > Feel free to tell me if you spot something weird. > > > Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >> Hi Gael, >> >> >> >> we had a similar issue in the past. >> >> You could check the octiava healthmanager log (should be on the same node >> where the worker is running). >> >> This component monitors the status of the Amphorae and restarts them if >> they don’t trigger a callback after a specific time. This might also happen >> if there is some connection issue between the two components. >> >> >> >> But normally it should at least restart the LB with new Amphorae… >> >> >> >> Hope that helps >> >> >> >> Felix >> >> >> >> *From:* Gaël THEROND >> *Sent:* Tuesday, June 4, 2019 9:44 AM >> *To:* Openstack >> *Subject:* [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly >> deleted by octavia >> >> >> >> Hi guys, >> >> >> >> I’ve a weird situation here. >> >> >> >> I smoothly operate a large scale multi-region Octavia service using the >> default amphora driver which imply the use of nova instances as >> loadbalancers. 
>> >> >> >> Everything is running really well and our customers (K8s and traditional >> users) are really happy with the solution so far. >> >> >> >> However, yesterday one of those customers using the loadbalancer in front >> of their ElasticSearch cluster poked me because this loadbalancer suddenly >> passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer >> available but yet the anchor/member/pool and listeners settings were still >> existing. >> >> >> >> So I investigated and found out that the loadbalancer amphoras have been >> destroyed by the octavia user. >> >> >> >> The weird part is, both the master and the backup instance have been >> destroyed at the same moment by the octavia service user. >> >> >> >> Is there specific circumstances where the octavia service could decide to >> delete the instances but not the anchor/members/pool ? >> >> >> >> It’s worrying me a bit as there is no clear way to trace why does Octavia >> did take this action. >> >> >> >> I digged within the nova and Octavia DB in order to correlate the action >> but except than validating my investigation it doesn’t really help as there >> are no clue of why the octavia service did trigger the deletion. >> >> >> >> If someone have any clue or tips to give me I’ll be more than happy to >> discuss this situation. >> >> >> >> Cheers guys! >> Hinweise zum Datenschutz finden Sie hier >> . >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Tue Jun 4 13:12:09 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 4 Jun 2019 09:12:09 -0400 Subject: [ops] no openstack ops meetups team meeting today Message-ID: We're skipping the meetups team meeting on IRC today - currently in good shape and some of us are busy with other things. Will try to have both a meeting and some progress to report next week. Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgoncalves at redhat.com Tue Jun 4 13:16:47 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Tue, 4 Jun 2019 15:16:47 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND wrote: > > Hi Lingxian Kong, > > That’s actually very interesting as I’ve come to the same conclusion this morning during my investigation and was starting to think about a fix, which it seems you already made! > > Is there a reason why it didn’t was backported to rocky? The patch was merged in master branch during Rocky development cycle, hence included in stable/rocky as well. > > Very helpful, many many thanks to you you clearly spare me hours of works! I’ll get a review of your patch and test it on our lab. > > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a écrit : >> >> Hi Felix, >> >> « Glad » you had the same issue before, and yes of course I looked at the HM logs which is were I actually found out that this event was triggered by octavia (Beside the DB data that validated that) here is my log trace related to this event, It doesn't really shows major issue IMHO. >> >> Here is the stacktrace that our octavia service archived for our both controllers servers, with the initial loadbalancer creation trace (Worker.log) and both controllers triggered task (Health-Manager.log). >> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ >> >> I well may have miss something in it, but I don't see something strange on from my point of view. 
>> Feel free to tell me if you spot something weird. >> >> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner a écrit : >>> >>> Hi Gael, >>> >>> >>> >>> we had a similar issue in the past. >>> >>> You could check the octiava healthmanager log (should be on the same node where the worker is running). >>> >>> This component monitors the status of the Amphorae and restarts them if they don’t trigger a callback after a specific time. This might also happen if there is some connection issue between the two components. >>> >>> >>> >>> But normally it should at least restart the LB with new Amphorae… >>> >>> >>> >>> Hope that helps >>> >>> >>> >>> Felix >>> >>> >>> >>> From: Gaël THEROND >>> Sent: Tuesday, June 4, 2019 9:44 AM >>> To: Openstack >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia >>> >>> >>> >>> Hi guys, >>> >>> >>> >>> I’ve a weird situation here. >>> >>> >>> >>> I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. >>> >>> >>> >>> Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. >>> >>> >>> >>> However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. >>> >>> >>> >>> So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. >>> >>> >>> >>> The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. >>> >>> >>> >>> Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? >>> >>> >>> >>> It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. >>> >>> >>> >>> I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. >>> >>> >>> >>> If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. >>> >>> >>> >>> Cheers guys! >>> >>> Hinweise zum Datenschutz finden Sie hier. From gael.therond at gmail.com Tue Jun 4 13:19:58 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 4 Jun 2019 15:19:58 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Oh, that's perfect so, I'll just update my image and my platform as we're using kolla-ansible and that's super easy. You guys rocks!! (Pun intended ;-)). Many many thanks to all of you, that will real back me a lot regarding the Octavia solidity and Kolla flexibility actually ^^. Le mar. 4 juin 2019 à 15:17, Carlos Goncalves a écrit : > On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND > wrote: > > > > Hi Lingxian Kong, > > > > That’s actually very interesting as I’ve come to the same conclusion > this morning during my investigation and was starting to think about a fix, > which it seems you already made! > > > > Is there a reason why it didn’t was backported to rocky? > > The patch was merged in master branch during Rocky development cycle, > hence included in stable/rocky as well. 
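For anyone wanting to verify that locally, the review URL above carries the Gerrit Change-Id, so a quick check against the octavia tree is enough (the commit SHA is whatever the log command returns):

    git clone https://opendev.org/openstack/octavia && cd octavia
    # find the commit by its Gerrit Change-Id
    git log --all --oneline --grep=Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701
    # list every branch that already contains it
    git branch -r --contains <sha-from-previous-command>
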
> > > > > Very helpful, many many thanks to you you clearly spare me hours of > works! I’ll get a review of your patch and test it on our lab. > > > > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a > écrit : > >> > >> Hi Felix, > >> > >> « Glad » you had the same issue before, and yes of course I looked at > the HM logs which is were I actually found out that this event was > triggered by octavia (Beside the DB data that validated that) here is my > log trace related to this event, It doesn't really shows major issue IMHO. > >> > >> Here is the stacktrace that our octavia service archived for our both > controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > >> > >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > >> > >> I well may have miss something in it, but I don't see something strange > on from my point of view. > >> Feel free to tell me if you spot something weird. > >> > >> > >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >>> > >>> Hi Gael, > >>> > >>> > >>> > >>> we had a similar issue in the past. > >>> > >>> You could check the octiava healthmanager log (should be on the same > node where the worker is running). > >>> > >>> This component monitors the status of the Amphorae and restarts them > if they don’t trigger a callback after a specific time. This might also > happen if there is some connection issue between the two components. > >>> > >>> > >>> > >>> But normally it should at least restart the LB with new Amphorae… > >>> > >>> > >>> > >>> Hope that helps > >>> > >>> > >>> > >>> Felix > >>> > >>> > >>> > >>> From: Gaël THEROND > >>> Sent: Tuesday, June 4, 2019 9:44 AM > >>> To: Openstack > >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly > deleted by octavia > >>> > >>> > >>> > >>> Hi guys, > >>> > >>> > >>> > >>> I’ve a weird situation here. > >>> > >>> > >>> > >>> I smoothly operate a large scale multi-region Octavia service using > the default amphora driver which imply the use of nova instances as > loadbalancers. > >>> > >>> > >>> > >>> Everything is running really well and our customers (K8s and > traditional users) are really happy with the solution so far. > >>> > >>> > >>> > >>> However, yesterday one of those customers using the loadbalancer in > front of their ElasticSearch cluster poked me because this loadbalancer > suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were > no longer available but yet the anchor/member/pool and listeners settings > were still existing. > >>> > >>> > >>> > >>> So I investigated and found out that the loadbalancer amphoras have > been destroyed by the octavia user. > >>> > >>> > >>> > >>> The weird part is, both the master and the backup instance have been > destroyed at the same moment by the octavia service user. > >>> > >>> > >>> > >>> Is there specific circumstances where the octavia service could decide > to delete the instances but not the anchor/members/pool ? > >>> > >>> > >>> > >>> It’s worrying me a bit as there is no clear way to trace why does > Octavia did take this action. > >>> > >>> > >>> > >>> I digged within the nova and Octavia DB in order to correlate the > action but except than validating my investigation it doesn’t really help > as there are no clue of why the octavia service did trigger the deletion. > >>> > >>> > >>> > >>> If someone have any clue or tips to give me I’ll be more than happy to > discuss this situation. > >>> > >>> > >>> > >>> Cheers guys! 
> >>> > >>> Hinweise zum Datenschutz finden Sie hier. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Jun 4 14:02:21 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 04 Jun 2019 15:02:21 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: <42a6e49abc54ab460c0a71529957e362f6d77eae.camel@redhat.com> On Tue, 2019-06-04 at 08:39 -0400, Doug Hellmann wrote: > Sean Mooney writes: > > > On Mon, 2019-06-03 at 08:39 -0400, Mohammed Naser wrote: > > > On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: > > > > > > > > Hi, > > > > > > > > > On 1 Jun 2019, at 20:49, Mohammed Naser wrote: > > > > > > > > > > On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it > > > > > > in > > > > > > separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something > > > > > > similar possible with OSA or will it be needed to contribute always every change to OSA repository? > > > > > > > > > > Not a dumb question at all. So, we do have this concept of 'roles' > > > > > which you _could_ kinda technically identify similar to plugins. > > > > > However, I think one of the things that would maybe come out of this > > > > > is the inability for projects to maintain their own plugins (because > > > > > now you can host neutron/devstack/plugins and you maintain that repo > > > > > yourself), under this structure, you would indeed have to make those > > > > > changes to the OpenStack Ansible Neutron role > > > > > > > > > > i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron > > > > > > > > > > However, I think from an OSA perspective, we would be more than happy > > > > > to add project maintainers for specific projects to their appropriate > > > > > roles. It would make sense that there is someone from the Neutron > > > > > team that could be a core on os_neutron from example. > > > > > > > > Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now > > > > in > > > > opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and > > > > install it together with everything else by simply adding one line (usually) in local.conf file. > > > > I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or > > > > driver which isn’t official OpenStack project. > > > > > > You raise a really good concern. Indeed, we might have to change the workflow > > > from "write a plugin" to "write an Ansible role" to be able to test > > > your project with > > > DevStack at that page (or maintain both a "legacy" solution) with a new one. > > > > the real probalem with that is who is going to port all of the > > existing plugins. > > Do all projects and all jobs have to be converted at once? Or ever? > > How much complexity do those plugins actually contain? Would they be > fairly straightforward to convert? that depends. some jsut add support for indivigual projects. others install infrastructure services like ceph or kubernetes which will be used by openstack services. others download and compiles c projects form source like networking-ovs-dpdk. 
the neutron devstack pluging also used to compiles ovs form source to work around some distro bugs and networking-ovn i belive also can? do the same. a devstack plugin allows all of the above to be done trivally. > > Could we build a "devstack plugin wrapper" for OSA? Could we run OSA and > then run devstack with just the plugin(s) needed for a given job? that would likely be possible. im sure we could generate local.conf form osa's inventories and and run the plugsins after osa runs. devstack always runs it in tree code in each phase and then runs the plugins in the order they are enabled in each phase https://docs.openstack.org/devstack/latest/plugins.html networking-ovs-dpdk for example replaces the _neutron_ovs_base_install_agent_packages function https://github.com/openstack/networking-ovs-dpdk/blob/master/devstack/libs/ovs-dpdk#L11-L16 with a noop and then in the install pahse we install ovs-dpdk form souce. _neutron_ovs_base_install_agent_packages just install kernel ovs but we replace it as our patches to make it condtional in devstack were rejected. its not nessiarily a patteren i encurage but if you have to you can replace any functionality that devstack provides via a plugin although most usecase relly dont requrie that. > > Is there enough appeal in the idea of replacing devstack with something > closer to what is used for production deployments to drive us to find an > iterative approach that doesn't require changing everything at one time? > Or are we stuck with devstack forever? > > > kolla-ansible has also tried to be a devstack replacement in the past via the introduction > > of dev-mode which clones the git repo of the dev mode project locally and bind mounts them into the container. > > the problem is it still breaks peoles plugins and workflow. > > > > > > some devstack feature that osa would need to support in order to be a > > replacement for me are. > > You've made a good start on a requirements list for a devstack > replacement. Perhaps a first step would be for some of the folks who > support this idea to compile a more complete list of those requirements, > and then we could analyze OSA to see how it might need to be changed or > whether it makes sense to use OSA as the basis for a new toolset that > takes on some of the "dev" features we might not want in a "production" > deployment tool. > > Here's another potential gap for whoever is going to make that list: > devstack pre-populates the environment with some data for things like > flavors and images. I don't imagine OSA does that or, if it does, that > they are an exact match. How do we change those settings? +1 yes this is somehting i forgot about > > That leads to a good second step: Do the rest of the analysis to > understand what it would take to set up a base job like we have for > devstack, that produces a similar setup. Not necessarily identical, but > similar enough to be able to run tempest. It seems likely that already > exists in some form for testing OSA itself. Could a developer run that > on a local system (clearly being able to build the test environment > locally is a requirement for replacing devstack)? > > After that, I would want to see answers to some of the questions about > dealing with plugins that I posed above. > > And only then, I think, could I provide an answer to the question of > whether we should make the change or not. > > > 1 the ablity to install all openstack project form git if needed including gerrit reviews. 
> > > > abiltiy to eailly specify gerrit reiews or commits for each project > > > > # here i am declaring the os-vif should be installed from git not pypi > > LIBS_FROM_GIT=os-vif > > > > # and here i am specifying that gerrit should be used as the source and > > # i am provide a gerrit/git refs branch for a specific un merged patch > > OS_VIF_REPO=https://git.openstack.org/openstack/os-vif > > OS_VIF_BRANCH=refs/changes/25/629025/9 > > > > # *_REPO can obvioulsy take anythign that is valid in a git clone command so > > # i can use a local repo too > > NEUTRON_REPO=file:///opt/repos/neutron > > # and *_BRANCH as the name implices works with branches, tag commits* and gerrit ref brances. > > NEUTRON_BRANCH=bug/1788009 > > > > > > the next thing that would be needed is a way to simply override any config value like this > > > > [[post-config|/etc/nova/nova.conf]] > > #[compute] > > #live_migration_wait_for_vif_plug=True > > [libvirt] > > live_migration_uri = qemu+ssh://root@%s/system > > #cpu_mode = host-passthrough > > virt_type = kvm > > cpu_mode = custom > > cpu_model = kvm64 > > > > im sure that osa can do that but i really can just provide any path to any file if needed. > > so no need to update a role or plugin to set values in files created > > by plugins which is the next thing. > > Does OSA need to support *every* configuration value? Or could it deploy > a stack, and then rely on a separate tool to modify config values and > restart a service? Clearly some values need to be there when the cloud > first starts, but do they all? i think to preserve the workflow yes we need to be able to override any config that is generated by OSA. kolla ansible supports a relly nice config override mechanism where you can supply overrieds are applied after it generates a template. even though i have used the generic functionality to change thing like libvirt configs in the past i generally have only used it for the openstack services and for development i think its very imporant to easibly configure different senarios without needing to lear the opinionated syntatic sugar provided by the install and just set the config values directly especially when developing a new feature that adds a new value. > > > we enable plugins with a single line like this > > > > enable_plugin networking-ovs-dpdk https://github.com/openstack/networking-ovs-dpdk master > > some bugs > > meaning there is no need to preinstall or clone the repo. in theory the plugin should install all its dependeices > > and devstack will clone and execute the plugins based on the single > > line above. plugins however can also > > This makes me think it might be most appropriate to be considering a > tool that replaces devstack by wrapping OSA, rather than *being* > OSA. Maybe that's just an extra playbook that runs before OSA, or maybe > it's a simpler bash script that does some setup before invoking OSA. on that point i had considerd porting networking-ovs-dpdk to an ansible role and invoking from the devstack plugin in the past but i have not had time to do that. part of what is nice about devstack plugin model is you can write you plugin in any language you like provided you have a plug.sh file as an entrypoint. i doublt we have devstack plugins today that just run ansibel or puppet but it is totally valid to do so. > > > read any varable defiend in the local.conf as it will be set in the environment which means i can easily share > > an exact configuration with someone by shareing a local.conf. 
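The "plugin in any language" point is easy to make concrete: the only contract devstack really imposes is a devstack/plugin.sh entry point that gets sourced with the current phase, so a plugin that simply hands everything off to Ansible is already possible with the existing interface. A minimal sketch, with the plugin name, playbook path and contents purely illustrative:

    # devstack/plugin.sh -- sourced by stack.sh, so devstack helpers and
    # variables such as DEST are already available here
    if [[ "$1" == "stack" && "$2" == "install" ]]; then
        pip_install ansible   # devstack helper; plain 'pip install ansible' works too
        ansible-playbook -i localhost, -c local \
            "$DEST/my-ansible-plugin/playbooks/install.yml"
    fi

Whether that playbook reuses an existing OSA role or something lighter is an open question, but it shows the two worlds can already meet inside the current plugin mechanism.
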
> > > > > > im not against improving or replacing devstack but with the devstack ansible roles and the fact we use devstack > > for all our testing in the gate it is actually has become one of the best openstack installer out there. we do > > not recommend people run it in production but with the ansible automation of grenade and the move to systemd for > > services there are less mainatined installers out there that devstack is proably a better foundation for a cloud > > to build on. people should still not use it in production but i can see why some might. > > > > > > > > > > > > > > > > Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which > > > > > > uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I > > > > > > don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep > > > > > > in > > > > > > mind. > > > > > > > > > > Indeed, with our current CI infrastructure with OSA, we have the > > > > > ability to create these dynamic scenarios (which can actually be > > > > > defined by a simple Zuul variable). > > > > > > > > > > https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 > > > > > > > > > > We do some really neat introspection of the project name being tested > > > > > in order to run specific scenarios. Therefore, that is something that > > > > > should be quite easy to accomplish simply by overriding a scenario > > > > > name within Zuul. It also is worth mentioning we now support full > > > > > metal deploys for a while now, so not having to worry about containers > > > > > is something to keep in mind as well (with simplifying the developer > > > > > experience again). > > > > > > > > > > > > On 1 Jun 2019, at 14:35, Mohammed Naser wrote: > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > This is something that I've discussed with a few people over time and > > > > > > > I think I'd probably want to bring it up by now. I'd like to propose > > > > > > > and ask if it makes sense to perhaps replace devstack entirely with > > > > > > > openstack-ansible. I think I have quite a few compelling reasons to > > > > > > > do this that I'd like to outline, as well as why I *feel* (and I could > > > > > > > be biased here, so call me out!) that OSA is the best option in terms > > > > > > > of a 'replacement' > > > > > > > > > > > > > > # Why not another deployment project? > > > > > > > I actually thought about this part too and considered this mainly for > > > > > > > ease of use for a *developer*. > > > > > > > > > > > > > > At this point, Puppet-OpenStack pretty much only deploys packages > > > > > > > (which means that it has no build infrastructure, a developer can't > > > > > > > just get $commit checked out and deployed). > > > > > > > > > > > > > > TripleO uses Kolla containers AFAIK and those have to be pre-built > > > > > > > beforehand, also, I feel they are much harder to use as a developer > > > > > > > because if you want to make quick edits and restart services, you have > > > > > > > to enter a container and make the edit there and somehow restart the > > > > > > > service without the container going back to it's original state. > > > > > > > Kolla-Ansible and the other combinations also suffer from the same > > > > > > > "issue". 
> > > > > > > > > > > > > > OpenStack Ansible is unique in the way that it pretty much just builds > > > > > > > a virtualenv and installs packages inside of it. The services are > > > > > > > deployed as systemd units. This is very much similar to the current > > > > > > > state of devstack at the moment (minus the virtualenv part, afaik). > > > > > > > It makes it pretty straight forward to go and edit code if you > > > > > > > need/have to. We also have support for Debian, CentOS, Ubuntu and > > > > > > > SUSE. This allows "devstack 2.0" to have far more coverage and make > > > > > > > it much more easy to deploy on a wider variety of operating systems. > > > > > > > It also has the ability to use commits checked out from Zuul so all > > > > > > > the fancy Depends-On stuff we use works. > > > > > > > > > > > > > > # Why do we care about this, I like my bash scripts! > > > > > > > As someone who's been around for a *really* long time in OpenStack, > > > > > > > I've seen a whole lot of really weird issues surface from the usage of > > > > > > > DevStack to do CI gating. For example, one of the recent things is > > > > > > > the fact it relies on installing package-shipped noVNC, where as the > > > > > > > 'master' noVNC has actually changed behavior a few months back and it > > > > > > > is completely incompatible at this point (it's just a ticking thing > > > > > > > until we realize we're entirely broken). > > > > > > > > > > > > > > To this day, I still see people who want to POC something up with > > > > > > > OpenStack or *ACTUALLY* try to run OpenStack with DevStack. No matter > > > > > > > how many warnings we'll put up, they'll always try to do it. With > > > > > > > this way, at least they'll have something that has the shape of an > > > > > > > actual real deployment. In addition, it would be *good* in the > > > > > > > overall scheme of things for a deployment system to test against, > > > > > > > because this would make sure things don't break in both ways. > > > > > > > > > > > > > > Also: we run Zuul for our CI which supports Ansible natively, this can > > > > > > > remove one layer of indirection (Zuul to run Bash) and have Zuul run > > > > > > > the playbooks directly from the executor. > > > > > > > > > > > > > > # So how could we do this? > > > > > > > The OpenStack Ansible project is made of many roles that are all > > > > > > > composable, therefore, you can think of it as a combination of both > > > > > > > Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained > > > > > > > the base modules (i.e. puppet-nova, etc) and TripleO was the > > > > > > > integration of all of it in a distribution. OSA is currently both, > > > > > > > but it also includes both Ansible roles and playbooks. > > > > > > > > > > > > > > In order to make sure we maintain as much of backwards compatibility > > > > > > > as possible, we can simply run a small script which does a mapping of > > > > > > > devstack => OSA variables to make sure that the service is shipped > > > > > > > with all the necessary features as per local.conf. > > > > > > > > > > > > > > So the new process could be: > > > > > > > > > > > > > > 1) parse local.conf and generate Ansible variables files > > > > > > > 2) install Ansible (if not running in gate) > > > > > > > 3) run playbooks using variable generated in #1 > > > > > > > > > > > > > > The neat thing is after all of this, devstack just becomes a thin > > > > > > > wrapper around Ansible roles. 
I also think it brings a lot of hands > > > > > > > together, involving both the QA team and OSA team together, which I > > > > > > > believe that pooling our resources will greatly help in being able to > > > > > > > get more done and avoiding duplicating our efforts. > > > > > > > > > > > > > > # Conclusion > > > > > > > This is a start of a very open ended discussion, I'm sure there is a > > > > > > > lot of details involved here in the implementation that will surface, > > > > > > > but I think it could be a good step overall in simplifying our CI and > > > > > > > adding more coverage for real potential deployers. It will help two > > > > > > > teams unite together and have more resources for something (that > > > > > > > essentially is somewhat of duplicated effort at the moment). > > > > > > > > > > > > > > I will try to pick up sometime to POC a simple service being deployed > > > > > > > by an OSA role instead of Bash, placement which seems like a very > > > > > > > simple one and share that eventually. > > > > > > > > > > > > > > Thoughts? :) > > > > > > > > > > > > > > -- > > > > > > > Mohammed Naser — vexxhost > > > > > > > ----------------------------------------------------- > > > > > > > D. 514-316-8872 > > > > > > > D. 800-910-1726 ext. 200 > > > > > > > E. mnaser at vexxhost.com > > > > > > > W. http://vexxhost.com > > > > > > > > > > > > > > > > > > > — > > > > > > Slawek Kaplonski > > > > > > Senior software engineer > > > > > > Red Hat > > > > > > > > > > > > > > > > > > > > > -- > > > > > Mohammed Naser — vexxhost > > > > > ----------------------------------------------------- > > > > > D. 514-316-8872 > > > > > D. 800-910-1726 ext. 200 > > > > > E. mnaser at vexxhost.com > > > > > W. http://vexxhost.com > > > > > > > > — > > > > Slawek Kaplonski > > > > Senior software engineer > > > > Red Hat > > > > > > > > > > > > > > > > From dpeacock at redhat.com Tue Jun 4 14:03:05 2019 From: dpeacock at redhat.com (David Peacock) Date: Tue, 4 Jun 2019 10:03:05 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Mon, Jun 3, 2019 at 12:52 PM Kevin Carter wrote: > Hello Stackers, > > I wanted to follow up on this post from last year, pick up from where it > left off, and bring together a squad to get things moving. > > Count me in. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jesse at odyssey4.me Tue Jun 4 14:17:20 2019 From: jesse at odyssey4.me (Jesse Pretorius) Date: Tue, 4 Jun 2019 14:17:20 +0000 Subject: [qa][openstack-ansible][tripleo-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: Hi everyone, I find myself wondering whether doing this in reverse would potentially be more useful and less disruptive. If devstack plugins in service repositories are converted from bash to ansible role(s), then there is potential for OSA to make use of that. This could potentially be a drop-in replacement for devstack by using a #!/bin/ansible (or whatever known path) shebang in a playbook file, or by changing the devstack entry point into a wrapper that runs ansible from a known path. Using this implementation process would allow a completely independent development process for the devstack conversion, and would allow OSA to retire its independent role repositories as and when the service’s ansible role is ready. 
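As a very rough sketch of what such an in-repo conversion could look like (every name here is invented
for illustration, not an agreed interface), a service's devstack directory might carry a small playbook
that simply wraps a role, with the existing entry point reduced to invoking it:

    # devstack/playbook.yml in a service repository - illustrative only
    - hosts: localhost
      connection: local
      vars:
        devstack_phase: "{{ phase | default('stack') }}"   # phase name passed in by the wrapper
      roles:
        - role: devstack_myservice
          myservice_install_from_git: true

The wrapper (whether that is stack.sh itself or a thin plugin.sh shim) would just call ansible-playbook
with the right phase, so existing jobs keep working while the real logic moves into a role that OSA and
the other deployment tools could reuse.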
Using this method would also allow devstack, OSA, triple-o and kolla-ansible to consume those ansible roles in whatever way they see fit using playbooks which are tailored to their own deployment philosophy. At the most recent PTG there was a discussion between OSA and kolla-ansible about something like this and the conversation for how that could be done would be to ensure that the roles have a clear set of inputs and outputs, with variables enabling the code paths to key outputs. My opinion is that the convergence of all Ansible-based deployment tools to use a common set of roles would be advantageous in many ways: 1. There will be more hands & eyeballs on the deployment code. 2. There will be more eyeballs on the reviews for service and deployment code. 3. There will be a convergence of developer and operator communities on the reviews. 4. The deployment code will co-exist with the service code, so changes can be done together. 5. Ansible is more pythonic than bash, and using it can likely result in the removal of a bunch of devstack bash libs. As Doug suggested, this starts with putting together some requirements - for the wrapping frameworks, as well as the component roles. It may be useful to get some sort of representative sample service to put together a PoC on to help figure out these requirements. I think that this may be useful for the tripleo-ansible team to have a view on, I’ve added the tag to the subject of this email. Best regards, Jesse IRC: odyssey4me From cboylan at sapwetik.org Tue Jun 4 14:30:11 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 04 Jun 2019 07:30:11 -0700 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> Message-ID: <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> On Tue, Jun 4, 2019, at 1:01 AM, Sorin Sbarnea wrote: > I am in favour of ditching or at least refactoring devstack because > during the last year I often found myself blocked from fixing some > zuul/jobs issues because the buggy code was still required by legacy > devstack jobs that nobody had time maintain or fix, so they were > isolated and the default job configurations were forced to use dirty > hack needed for keeping these working. > > One such example is that there is a task that does a "chmod -R 0777 -R" > on the entire source tree, a total security threat. This is needed by devstack-gate and *not* devstack. We have been trying now for almost two years to get people to stop using devstack-gate in favor of the zuul v3 jobs. Please don't conflate this with devstack itself, it is not related and not relevant to this discussion. > > In order to make other jobs running correctly* I had to rely undoing > the damage done by such chmod because I was not able to disable the > historical hack. In order to make other jobs run correctly we are asking you to stop using devstack-gate and use zuulv3 native jobs instead. > > * ansible throws warning with unsafe file permissions > * ssh refuses to load unsafe keys > > That is why I am in favor of dropping features that are slowing down > the progress of others. Again this has nothing to do with devstack. > > I know that the reality is more complicated but I also think that > sometimes less* is more. 
> > > * deployment projects ;) > From johfulto at redhat.com Tue Jun 4 14:37:53 2019 From: johfulto at redhat.com (John Fulton) Date: Tue, 4 Jun 2019 10:37:53 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Tue, Jun 4, 2019 at 10:08 AM David Peacock wrote: > > On Mon, Jun 3, 2019 at 12:52 PM Kevin Carter wrote: >> >> Hello Stackers, >> >> I wanted to follow up on this post from last year, pick up from where it left off, and bring together a squad to get things moving. >> > > Count me in. +1 From openstack at nemebean.com Tue Jun 4 14:44:48 2019 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 4 Jun 2019 09:44:48 -0500 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: <82388be7-88d3-ad4a-bba3-81fe86eb8034@nemebean.com> On 6/4/19 7:39 AM, Doug Hellmann wrote: > Sean Mooney writes: > >> On Mon, 2019-06-03 at 08:39 -0400, Mohammed Naser wrote: >>> On Mon, Jun 3, 2019 at 8:27 AM Slawomir Kaplonski wrote: >>>> >>>> Hi, >>>> >>>>> On 1 Jun 2019, at 20:49, Mohammed Naser wrote: >>>>> >>>>> On Sat, Jun 1, 2019 at 1:46 PM Slawomir Kaplonski wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I don’t know OSA at all so sorry if my question is dumb but in devstack we can easily write plugins, keep it in >>>>>> separate repo and such plugin can be easy used in devstack (e.g. in CI jobs it’s used a lot). Is something >>>>>> similar possible with OSA or will it be needed to contribute always every change to OSA repository? >>>>> >>>>> Not a dumb question at all. So, we do have this concept of 'roles' >>>>> which you _could_ kinda technically identify similar to plugins. >>>>> However, I think one of the things that would maybe come out of this >>>>> is the inability for projects to maintain their own plugins (because >>>>> now you can host neutron/devstack/plugins and you maintain that repo >>>>> yourself), under this structure, you would indeed have to make those >>>>> changes to the OpenStack Ansible Neutron role >>>>> >>>>> i.e.: https://opendev.org/openstack/openstack-ansible-os_neutron >>>>> >>>>> However, I think from an OSA perspective, we would be more than happy >>>>> to add project maintainers for specific projects to their appropriate >>>>> roles. It would make sense that there is someone from the Neutron >>>>> team that could be a core on os_neutron from example. >>>> >>>> Yes, that may work for official projects like Neutron. But what about everything else, like projects hosted now in >>>> opendev.org/x/ repositories? Devstack gives everyone easy way to integrate own plugin/driver/project with it and >>>> install it together with everything else by simply adding one line (usually) in local.conf file. >>>> I think that it may be a bit hard to OSA team to accept and review patches with new roles for every project or >>>> driver which isn’t official OpenStack project. >>> >>> You raise a really good concern. Indeed, we might have to change the workflow >>> from "write a plugin" to "write an Ansible role" to be able to test >>> your project with >>> DevStack at that page (or maintain both a "legacy" solution) with a new one. >> the real probalem with that is who is going to port all of the >> existing plugins. > > Do all projects and all jobs have to be converted at once? Or ever? 
Perhaps not all at once, but I would say they all need to be converted eventually or we end up in the situation Dean mentioned where we have to maintain two different deployment systems. I would argue that's much worse than just continuing with devstack as-is. On the other hand, practically speaking I don't think we can probably do them all at once, unless there are a lot fewer devstack plugins in the wild than I think there are (which is possible). Also, I suspect there may be downstream plugins running in third-party ci that need to be considered. That said, while I expect this would be _extremely_ painful in the short to medium term, I'm also a big proponent of making the thing developers care about the same as the thing users care about. However, if we go down this path I think we need sufficient buy in from a diverse enough group of contributors that losing one group (see OSIC) doesn't leave us with a half-finished migration. That would be a disaster IMHO. > > How much complexity do those plugins actually contain? Would they be > fairly straightforward to convert? > > Could we build a "devstack plugin wrapper" for OSA? Could we run OSA and > then run devstack with just the plugin(s) needed for a given job? > > Is there enough appeal in the idea of replacing devstack with something > closer to what is used for production deployments to drive us to find an > iterative approach that doesn't require changing everything at one time? > Or are we stuck with devstack forever? > >> kolla-ansible has also tried to be a devstack replacement in the past via the introduction >> of dev-mode which clones the git repo of the dev mode project locally and bind mounts them into the container. >> the problem is it still breaks peoles plugins and workflow. >> >> >> some devstack feature that osa would need to support in order to be a >> replacement for me are. > > You've made a good start on a requirements list for a devstack > replacement. Perhaps a first step would be for some of the folks who > support this idea to compile a more complete list of those requirements, > and then we could analyze OSA to see how it might need to be changed or > whether it makes sense to use OSA as the basis for a new toolset that > takes on some of the "dev" features we might not want in a "production" > deployment tool. > > Here's another potential gap for whoever is going to make that list: > devstack pre-populates the environment with some data for things like > flavors and images. I don't imagine OSA does that or, if it does, that > they are an exact match. How do we change those settings? > > That leads to a good second step: Do the rest of the analysis to > understand what it would take to set up a base job like we have for > devstack, that produces a similar setup. Not necessarily identical, but > similar enough to be able to run tempest. It seems likely that already > exists in some form for testing OSA itself. Could a developer run that > on a local system (clearly being able to build the test environment > locally is a requirement for replacing devstack)? > > After that, I would want to see answers to some of the questions about > dealing with plugins that I posed above. > > And only then, I think, could I provide an answer to the question of > whether we should make the change or not. > >> 1 the ablity to install all openstack project form git if needed including gerrit reviews. 
>> >> abiltiy to eailly specify gerrit reiews or commits for each project >> >> # here i am declaring the os-vif should be installed from git not pypi >> LIBS_FROM_GIT=os-vif >> >> # and here i am specifying that gerrit should be used as the source and >> # i am provide a gerrit/git refs branch for a specific un merged patch >> OS_VIF_REPO=https://git.openstack.org/openstack/os-vif >> OS_VIF_BRANCH=refs/changes/25/629025/9 >> >> # *_REPO can obvioulsy take anythign that is valid in a git clone command so >> # i can use a local repo too >> NEUTRON_REPO=file:///opt/repos/neutron >> # and *_BRANCH as the name implices works with branches, tag commits* and gerrit ref brances. >> NEUTRON_BRANCH=bug/1788009 >> >> >> the next thing that would be needed is a way to simply override any config value like this >> >> [[post-config|/etc/nova/nova.conf]] >> #[compute] >> #live_migration_wait_for_vif_plug=True >> [libvirt] >> live_migration_uri = qemu+ssh://root@%s/system >> #cpu_mode = host-passthrough >> virt_type = kvm >> cpu_mode = custom >> cpu_model = kvm64 >> >> im sure that osa can do that but i really can just provide any path to any file if needed. >> so no need to update a role or plugin to set values in files created >> by plugins which is the next thing. > > Does OSA need to support *every* configuration value? Or could it deploy > a stack, and then rely on a separate tool to modify config values and > restart a service? Clearly some values need to be there when the cloud > first starts, but do they all? > >> we enable plugins with a single line like this >> >> enable_plugin networking-ovs-dpdk https://github.com/openstack/networking-ovs-dpdk master >> >> meaning there is no need to preinstall or clone the repo. in theory the plugin should install all its dependeices >> and devstack will clone and execute the plugins based on the single >> line above. plugins however can also > > This makes me think it might be most appropriate to be considering a > tool that replaces devstack by wrapping OSA, rather than *being* > OSA. Maybe that's just an extra playbook that runs before OSA, or maybe > it's a simpler bash script that does some setup before invoking OSA. > >> read any varable defiend in the local.conf as it will be set in the environment which means i can easily share >> an exact configuration with someone by shareing a local.conf. >> >> >> im not against improving or replacing devstack but with the devstack ansible roles and the fact we use devstack >> for all our testing in the gate it is actually has become one of the best openstack installer out there. we do >> not recommend people run it in production but with the ansible automation of grenade and the move to systemd for >> services there are less mainatined installers out there that devstack is proably a better foundation for a cloud >> to build on. people should still not use it in production but i can see why some might. >> >>> >>>>> >>>>>> Speaking about CI, e.g. in neutron we currently have jobs like neutron-functional or neutron-fullstack which >>>>>> uses only some parts of devstack. That kind of jobs will probably have to be rewritten after such change. I >>>>>> don’t know if neutron jobs are only which can be affected in that way but IMHO it’s something worth to keep in >>>>>> mind. >>>>> >>>>> Indeed, with our current CI infrastructure with OSA, we have the >>>>> ability to create these dynamic scenarios (which can actually be >>>>> defined by a simple Zuul variable). 
>>>>> >>>>> https://github.com/openstack/openstack-ansible/blob/master/zuul.d/playbooks/pre-gate-scenario.yml#L41-L46 >>>>> >>>>> We do some really neat introspection of the project name being tested >>>>> in order to run specific scenarios. Therefore, that is something that >>>>> should be quite easy to accomplish simply by overriding a scenario >>>>> name within Zuul. It also is worth mentioning we now support full >>>>> metal deploys for a while now, so not having to worry about containers >>>>> is something to keep in mind as well (with simplifying the developer >>>>> experience again). >>>>> >>>>>>> On 1 Jun 2019, at 14:35, Mohammed Naser wrote: >>>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> This is something that I've discussed with a few people over time and >>>>>>> I think I'd probably want to bring it up by now. I'd like to propose >>>>>>> and ask if it makes sense to perhaps replace devstack entirely with >>>>>>> openstack-ansible. I think I have quite a few compelling reasons to >>>>>>> do this that I'd like to outline, as well as why I *feel* (and I could >>>>>>> be biased here, so call me out!) that OSA is the best option in terms >>>>>>> of a 'replacement' >>>>>>> >>>>>>> # Why not another deployment project? >>>>>>> I actually thought about this part too and considered this mainly for >>>>>>> ease of use for a *developer*. >>>>>>> >>>>>>> At this point, Puppet-OpenStack pretty much only deploys packages >>>>>>> (which means that it has no build infrastructure, a developer can't >>>>>>> just get $commit checked out and deployed). >>>>>>> >>>>>>> TripleO uses Kolla containers AFAIK and those have to be pre-built >>>>>>> beforehand, also, I feel they are much harder to use as a developer >>>>>>> because if you want to make quick edits and restart services, you have >>>>>>> to enter a container and make the edit there and somehow restart the >>>>>>> service without the container going back to it's original state. >>>>>>> Kolla-Ansible and the other combinations also suffer from the same >>>>>>> "issue". >>>>>>> >>>>>>> OpenStack Ansible is unique in the way that it pretty much just builds >>>>>>> a virtualenv and installs packages inside of it. The services are >>>>>>> deployed as systemd units. This is very much similar to the current >>>>>>> state of devstack at the moment (minus the virtualenv part, afaik). >>>>>>> It makes it pretty straight forward to go and edit code if you >>>>>>> need/have to. We also have support for Debian, CentOS, Ubuntu and >>>>>>> SUSE. This allows "devstack 2.0" to have far more coverage and make >>>>>>> it much more easy to deploy on a wider variety of operating systems. >>>>>>> It also has the ability to use commits checked out from Zuul so all >>>>>>> the fancy Depends-On stuff we use works. >>>>>>> >>>>>>> # Why do we care about this, I like my bash scripts! >>>>>>> As someone who's been around for a *really* long time in OpenStack, >>>>>>> I've seen a whole lot of really weird issues surface from the usage of >>>>>>> DevStack to do CI gating. For example, one of the recent things is >>>>>>> the fact it relies on installing package-shipped noVNC, where as the >>>>>>> 'master' noVNC has actually changed behavior a few months back and it >>>>>>> is completely incompatible at this point (it's just a ticking thing >>>>>>> until we realize we're entirely broken). >>>>>>> >>>>>>> To this day, I still see people who want to POC something up with >>>>>>> OpenStack or *ACTUALLY* try to run OpenStack with DevStack. 
No matter >>>>>>> how many warnings we'll put up, they'll always try to do it. With >>>>>>> this way, at least they'll have something that has the shape of an >>>>>>> actual real deployment. In addition, it would be *good* in the >>>>>>> overall scheme of things for a deployment system to test against, >>>>>>> because this would make sure things don't break in both ways. >>>>>>> >>>>>>> Also: we run Zuul for our CI which supports Ansible natively, this can >>>>>>> remove one layer of indirection (Zuul to run Bash) and have Zuul run >>>>>>> the playbooks directly from the executor. >>>>>>> >>>>>>> # So how could we do this? >>>>>>> The OpenStack Ansible project is made of many roles that are all >>>>>>> composable, therefore, you can think of it as a combination of both >>>>>>> Puppet-OpenStack and TripleO (back then). Puppet-OpenStack contained >>>>>>> the base modules (i.e. puppet-nova, etc) and TripleO was the >>>>>>> integration of all of it in a distribution. OSA is currently both, >>>>>>> but it also includes both Ansible roles and playbooks. >>>>>>> >>>>>>> In order to make sure we maintain as much of backwards compatibility >>>>>>> as possible, we can simply run a small script which does a mapping of >>>>>>> devstack => OSA variables to make sure that the service is shipped >>>>>>> with all the necessary features as per local.conf. >>>>>>> >>>>>>> So the new process could be: >>>>>>> >>>>>>> 1) parse local.conf and generate Ansible variables files >>>>>>> 2) install Ansible (if not running in gate) >>>>>>> 3) run playbooks using variable generated in #1 >>>>>>> >>>>>>> The neat thing is after all of this, devstack just becomes a thin >>>>>>> wrapper around Ansible roles. I also think it brings a lot of hands >>>>>>> together, involving both the QA team and OSA team together, which I >>>>>>> believe that pooling our resources will greatly help in being able to >>>>>>> get more done and avoiding duplicating our efforts. >>>>>>> >>>>>>> # Conclusion >>>>>>> This is a start of a very open ended discussion, I'm sure there is a >>>>>>> lot of details involved here in the implementation that will surface, >>>>>>> but I think it could be a good step overall in simplifying our CI and >>>>>>> adding more coverage for real potential deployers. It will help two >>>>>>> teams unite together and have more resources for something (that >>>>>>> essentially is somewhat of duplicated effort at the moment). >>>>>>> >>>>>>> I will try to pick up sometime to POC a simple service being deployed >>>>>>> by an OSA role instead of Bash, placement which seems like a very >>>>>>> simple one and share that eventually. >>>>>>> >>>>>>> Thoughts? :) >>>>>>> >>>>>>> -- >>>>>>> Mohammed Naser — vexxhost >>>>>>> ----------------------------------------------------- >>>>>>> D. 514-316-8872 >>>>>>> D. 800-910-1726 ext. 200 >>>>>>> E. mnaser at vexxhost.com >>>>>>> W. http://vexxhost.com >>>>>>> >>>>>> >>>>>> — >>>>>> Slawek Kaplonski >>>>>> Senior software engineer >>>>>> Red Hat >>>>>> >>>>> >>>>> >>>>> -- >>>>> Mohammed Naser — vexxhost >>>>> ----------------------------------------------------- >>>>> D. 514-316-8872 >>>>> D. 800-910-1726 ext. 200 >>>>> E. mnaser at vexxhost.com >>>>> W. 
http://vexxhost.com >>>> >>>> — >>>> Slawek Kaplonski >>>> Senior software engineer >>>> Red Hat >>>> >>> >>> >> >> > From fungi at yuggoth.org Tue Jun 4 15:47:26 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jun 2019 15:47:26 +0000 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> Message-ID: <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> On 2019-06-04 07:30:11 -0700 (-0700), Clark Boylan wrote: > On Tue, Jun 4, 2019, at 1:01 AM, Sorin Sbarnea wrote: > > I am in favour of ditching or at least refactoring devstack because > > during the last year I often found myself blocked from fixing some > > zuul/jobs issues because the buggy code was still required by legacy > > devstack jobs that nobody had time maintain or fix, so they were > > isolated and the default job configurations were forced to use dirty > > hack needed for keeping these working. > > > > One such example is that there is a task that does a "chmod -R 0777 -R" > > on the entire source tree, a total security threat. > > This is needed by devstack-gate and *not* devstack. We have been > trying now for almost two years to get people to stop using > devstack-gate in favor of the zuul v3 jobs. Please don't conflate > this with devstack itself, it is not related and not relevant to > this discussion. [...] Unfortunately this is not entirely the case. It's likely that the chmod workaround in question is only needed by legacy jobs using the deprecated devstack-gate wrappers, but it's actually being done by the fetch-zuul-cloner role[0] from zuul-jobs which is incorporated in our base job[1]. I agree that the solution is to stop using devstack-gate (and the old zuul-cloner v2 compatibility shim for that matter), but for it to have the effect of removing the problem permissions we also need to move the fetch-zuul-cloner role out of our base job. I fully expect this will be a widely-disruptive change due to newer or converted jobs, which are no longer inheriting from legacy-base or legacy-dsvm-base in openstack-zuul-jobs[2], retaining a dependency on this behavior. But the longer we wait, the worse that is going to get. [0] https://opendev.org/zuul/zuul-jobs/src/commit/2f2d6ce3f7a0687fc8f655abc168d7afbfaf11aa/roles/fetch-zuul-cloner/tasks/main.yaml#L19-L25 [1] https://opendev.org/opendev/base-jobs/src/commit/dbb56dda99e8e2346b22479b4dae97a8fc137217/playbooks/base/pre.yaml#L38 [2] https://opendev.org/openstack/openstack-zuul-jobs/src/commit/a7aa530a6059b464b32df69509e3001dc97e2aed/zuul.d/jobs.yaml#L951-L1097 -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From a.settle at outlook.com Tue Jun 4 16:18:49 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Tue, 4 Jun 2019 16:18:49 +0000 Subject: [tc] agenda for Technical Committee Meeting 6 June 2019 @ 1400 UTC Message-ID: TC Members, Our next meeting will be this Thursday, 6 June at 1400 UTC in #openstack-tc. Since there has not been a TC meeting for some time, the last suggested meeting agenda has been erased in favor of focusing on post-PTG and forum content. You will find the outlined agenda below. Any suggestions, please contact me before COB (relative to local time) on Wednesday 5 June. 
This email contains the agenda for the meeting, based on the content of the wiki [0].

If you will not be able to attend, please include your name in the "Apologies for Absence" section of the wiki page [0].

[0] https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee

* Review Denver PTG

All items are from the PTG etherpad at [1]

** Changes to the health check process. The suggestion was to remove the formality of "health checks" and focus on TC members being liaisons to projects in an effort to help manage a project's health (if required). We need to assign TC members to teams on the project team page. fungi offered to update the wiki, and asettle offered to update the ML on this change once the wiki was updated. See more action items on line 62 of the etherpad.

** Evolution of the help-most-needed list - this was discussed and decided to become an uncapped list, adding docs surrounding annual re-submission for items. We will go through the action items on this list (line 84 of the etherpad). Business cases requiring updates.

** Goal selection is changing. The way goals are being selected for the upcoming release will change. For Train, we will work on socialising the idea of proposing goals that are OpenStack-wide, but not tech-heavy. The new goal selection process splits the goals into "goal" and "implementation". Further details at the meeting. See line 101 of the etherpad for action items.

** Pop-up teams have been officially recognised and implemented into governance thanks to ttx. Please review his patch here [2]

** SIG governance is being defined (ricolin and asettle). See line 137 for action items. Will detail further at the meeting.

** Python 3 check-in. Finalising the migration (mugsie)

** Leaderless projects are becoming a concern - action items were on line 185 of the etherpad. Suggestions include reworking the documentation around the current role of the PTL, providing tips on how to "be a better PTL", and offering shadowing and mentoring for potential candidates. This all needs to be socialised further.

** Kickstarting innovation in OpenStack - Zane proposed a zany (har har har) suggestion regarding a new multi-tenant cloud with ironic/neutron/optionally cinder/keystone/octavia (the vision will be completed with k8s on top of OpenStack). The suggestion was for a new white paper written by zaneb and mnaser.

** Deleting all the things! [3] See line 234 for action items (mugsie).

[1] https://etherpad.openstack.org/p/tc-train-ptg
[2] https://review.opendev.org/#/c/661356/
[3] https://memegenerator.net/img/instances/14634311.jpg

* Review Denver Forum

** Forum session planning for the next summit, as this one was done rather hastily and we missed a few things (such as a goals session). See action items on line 243 of the PTG etherpad [1]

* Other

** Socialising successbot and thanksbot. Get to it, team!

Cheers,

Alex

p.s - I have a well-known habit of getting meeting agendas HORRIBLY wrong every time, so feel free to ping a gal and tell her what you know.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From gr at ham.ie Tue Jun 4 16:23:46 2019 From: gr at ham.ie (Graham Hayes) Date: Tue, 4 Jun 2019 17:23:46 +0100 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> Message-ID: <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> On 04/06/2019 16:47, Jeremy Stanley wrote: > On 2019-06-04 07:30:11 -0700 (-0700), Clark Boylan wrote: >> On Tue, Jun 4, 2019, at 1:01 AM, Sorin Sbarnea wrote: >>> I am in favour of ditching or at least refactoring devstack because >>> during the last year I often found myself blocked from fixing some >>> zuul/jobs issues because the buggy code was still required by legacy >>> devstack jobs that nobody had time maintain or fix, so they were >>> isolated and the default job configurations were forced to use dirty >>> hack needed for keeping these working. >>> >>> One such example is that there is a task that does a "chmod -R 0777 -R" >>> on the entire source tree, a total security threat. >> >> This is needed by devstack-gate and *not* devstack. We have been >> trying now for almost two years to get people to stop using >> devstack-gate in favor of the zuul v3 jobs. Please don't conflate >> this with devstack itself, it is not related and not relevant to >> this discussion. > [...] > > Unfortunately this is not entirely the case. It's likely that the > chmod workaround in question is only needed by legacy jobs using the > deprecated devstack-gate wrappers, but it's actually being done by > the fetch-zuul-cloner role[0] from zuul-jobs which is incorporated > in our base job[1]. I agree that the solution is to stop using > devstack-gate (and the old zuul-cloner v2 compatibility shim for > that matter), but for it to have the effect of removing the problem > permissions we also need to move the fetch-zuul-cloner role out of > our base job. I fully expect this will be a widely-disruptive change > due to newer or converted jobs, which are no longer inheriting from > legacy-base or legacy-dsvm-base in openstack-zuul-jobs[2], retaining > a dependency on this behavior. But the longer we wait, the worse > that is going to get. I have been trying to limit this behaviour for nearly 4 years [3] (it can actually add 10-15 mins sometimes depending on what source trees I have mounted via NFS into a devstack VM when doing dev) > [0] https://opendev.org/zuul/zuul-jobs/src/commit/2f2d6ce3f7a0687fc8f655abc168d7afbfaf11aa/roles/fetch-zuul-cloner/tasks/main.yaml#L19-L25 > [1] https://opendev.org/opendev/base-jobs/src/commit/dbb56dda99e8e2346b22479b4dae97a8fc137217/playbooks/base/pre.yaml#L38 > [2] https://opendev.org/openstack/openstack-zuul-jobs/src/commit/a7aa530a6059b464b32df69509e3001dc97e2aed/zuul.d/jobs.yaml#L951-L1097 > [3] - https://review.opendev.org/#/c/203698 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL:

From smooney at redhat.com Tue Jun 4 16:59:59 2019
From: smooney at redhat.com (Sean Mooney)
Date: Tue, 04 Jun 2019 17:59:59 +0100
Subject: [qa][openstack-ansible] redefining devstack
In-Reply-To: <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie>
References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie>
Message-ID: <1b0924d73d2317dc1e6a9c0128f3ee5e1a3152d4.camel@redhat.com>

On Tue, 2019-06-04 at 17:23 +0100, Graham Hayes wrote:
> On 04/06/2019 16:47, Jeremy Stanley wrote:
> > On 2019-06-04 07:30:11 -0700 (-0700), Clark Boylan wrote:
> > > On Tue, Jun 4, 2019, at 1:01 AM, Sorin Sbarnea wrote:
> > > > I am in favour of ditching or at least refactoring devstack because
> > > > during the last year I often found myself blocked from fixing some
> > > > zuul/jobs issues because the buggy code was still required by legacy
> > > > devstack jobs that nobody had time maintain or fix, so they were
> > > > isolated and the default job configurations were forced to use dirty
> > > > hack needed for keeping these working.
> > > >
> > > > One such example is that there is a task that does a "chmod -R 0777 -R"
> > > > on the entire source tree, a total security threat.
> > >
> > > This is needed by devstack-gate and *not* devstack. We have been
> > > trying now for almost two years to get people to stop using
> > > devstack-gate in favor of the zuul v3 jobs. Please don't conflate
> > > this with devstack itself, it is not related and not relevant to
> > > this discussion.
> >
> > [...]
> >
> > Unfortunately this is not entirely the case. It's likely that the
> > chmod workaround in question is only needed by legacy jobs using the
> > deprecated devstack-gate wrappers, but it's actually being done by
> > the fetch-zuul-cloner role[0] from zuul-jobs which is incorporated
> > in our base job[1]. I agree that the solution is to stop using
> > devstack-gate (and the old zuul-cloner v2 compatibility shim for
> > that matter), but for it to have the effect of removing the problem
> > permissions we also need to move the fetch-zuul-cloner role out of
> > our base job. I fully expect this will be a widely-disruptive change
> > due to newer or converted jobs, which are no longer inheriting from
> > legacy-base or legacy-dsvm-base in openstack-zuul-jobs[2], retaining
> > a dependency on this behavior. But the longer we wait, the worse
> > that is going to get.
>
> I have been trying to limit this behaviour for nearly 4 years [3]
> (it can actually add 10-15 mins sometimes depending on what source trees
> I have mounted via NFS into a devstack VM when doing dev)

Without looking into it, I assume it is doing this so that the stack user can read/execute scripts in the
different git repos, but chown -R stack:stack would be saner. In any case this is still a CI issue, not a
devstack one, as devstack does not do this itself. By default it clones the repos, if they don't exist, as
the current user, so you don't need to change permissions.
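If the ownership route were taken, the change would be small - something along these lines, where the
"stack" user and the src path are assumptions made purely for this example rather than what the role
actually does today:

    # sketch of an ownership-based alternative to the blanket chmod 0777
    - hosts: all
      become: true
      tasks:
        - name: Hand the prepared repos to the stack user instead of making them world-writable
          file:
            path: "{{ ansible_user_dir }}/src"
            state: directory
            owner: stack
            group: stack
            recurse: true

Whether that still satisfies the hardlinking the zuul-cloner shim relies on would need to be checked, but
it avoids opening the whole tree up to every user on the node.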
> > [0] > > https://opendev.org/zuul/zuul-jobs/src/commit/2f2d6ce3f7a0687fc8f655abc168d7afbfaf11aa/roles/fetch-zuul-cloner/tasks/main.yaml#L19-L25 > > [1] > > https://opendev.org/opendev/base-jobs/src/commit/dbb56dda99e8e2346b22479b4dae97a8fc137217/playbooks/base/pre.yaml#L38 > > [2] > > https://opendev.org/openstack/openstack-zuul-jobs/src/commit/a7aa530a6059b464b32df69509e3001dc97e2aed/zuul.d/jobs.yaml#L951-L1097 > > > > [3] - https://review.opendev.org/#/c/203698 > From dmsimard at redhat.com Tue Jun 4 17:08:16 2019 From: dmsimard at redhat.com (David Moreau Simard) Date: Tue, 4 Jun 2019 13:08:16 -0400 Subject: [all] Announcing the release of ARA Records Ansible 1.0 Message-ID: Hi openstack-discuss ! ARA 1.0 has been released and the announcement about it can be found here [1]. I wanted to personally thank the OpenStack community for believing in the project and the contributors who have helped get the project to where it is today. If you have any questions, feel free to reply to reach out ! [1]: https://ara.recordsansible.org/blog/2019/06/04/announcing-the-release-of-ara-records-ansible-1.0 David Moreau Simard dmsimard = [irc, github, twitter] From fungi at yuggoth.org Tue Jun 4 17:32:41 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jun 2019 17:32:41 +0000 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> Message-ID: <20190604173241.3r22gjulzwuvihbk@yuggoth.org> On 2019-06-04 17:23:46 +0100 (+0100), Graham Hayes wrote: [...] > I have been trying to limit this behaviour for nearly 4 years [3] > (it can actually add 10-15 mins sometimes depending on what source trees > I have mounted via NFS into a devstack VM when doing dev) > > [3] - https://review.opendev.org/#/c/203698 Similar I suppose, though the problem mentioned in this subthread is actually not about the mass permission change itself, rather about the resulting permissions. In particular the fetch-zuul-cloner role makes the entire set of provided repositories world-writeable because the zuul-cloner v2 compatibility shim performs clones from those file paths and Git wants to hardlink them if they're being cloned within the same filesystem. This is necessary to support occasions where the original copies aren't owned by the same user running the zuul-cloner shim, since you can't hardlink files for which your account lacks write access. I've done a bit of digging into the history of this now, so the following is probably boring to the majority of you. If you want to help figure out why it's still there at the moment and what's left to do, read on... Change https://review.openstack.org/512285 which added the chmod task includes a rather prescient comment from Paul about not adding it to the mirror-workspace-git-repos role because "we might not want to chmod 777 on no-legacy jobs." Unfortunately I think we failed to realize that it already would because we had added fetch-zuul-cloner to our base job a month earlier in https://review.openstack.org/501843 for reasons which are not recorded in the change (presumably a pragmatic compromise related to the scramble to convert our v2 jobs at the time, I did not resort to digging in IRC history just yet). 
Soon after, we added fetch-zuul-cloner to the main "legacy" pre playbook with https://review.opendev.org/513067 and prepared to test its removal from the base job with https://review.opendev.org/513079 but that was never completed and I can't seem to find the results of the testing (or even any indication it was ever actually performed). At this point, I feel like we probably just need to re-propose an equivalent of 513079 in our base-jobs repository, exercise it with some DNM changes running a mix of legacy imported v2 and modern v3 native jobs, announce a flag day for the cut over, and try to help address whatever fallout we're unable to predict ahead of time. This is somewhat complicated by the need to also do something similar in https://review.opendev.org/656195 with the bindep "fallback" packages list, so we're going to need to decide how those two efforts will be sequenced, or whether we want to combine them into a single (and likely doubly-painful) event. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From doug at doughellmann.com Tue Jun 4 18:07:21 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 04 Jun 2019 14:07:21 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <42a6e49abc54ab460c0a71529957e362f6d77eae.camel@redhat.com> References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> <42a6e49abc54ab460c0a71529957e362f6d77eae.camel@redhat.com> Message-ID: Sean Mooney writes: > On Tue, 2019-06-04 at 08:39 -0400, Doug Hellmann wrote: >> Sean Mooney writes: >> > >> > the real probalem with that is who is going to port all of the >> > existing plugins. >> >> Do all projects and all jobs have to be converted at once? Or ever? >> >> How much complexity do those plugins actually contain? Would they be >> fairly straightforward to convert? > that depends. some jsut add support for indivigual projects. > others install infrastructure services like ceph or kubernetes which will be used by openstack > services. others download and compiles c projects form source like networking-ovs-dpdk. > the neutron devstack pluging also used to compiles ovs form source to work around some distro bugs > and networking-ovn i belive also can? do the same. > a devstack plugin allows all of the above to be done trivally. It's possible to do all of that sort of thing through Ansible, too. I compile a couple of different tools as part of my developer setup playbooks. If the logic is complicated, the playbook can always call a script. >> Could we build a "devstack plugin wrapper" for OSA? Could we run OSA and >> then run devstack with just the plugin(s) needed for a given job? > that would likely be possible. im sure we could generate local.conf form osa's inventories > and and run the plugsins after osa runs. devstack always runs it in tree code in each phase and > then runs the plugins in the order they are enabled in each phase > > https://docs.openstack.org/devstack/latest/plugins.html > > > networking-ovs-dpdk for example replaces the _neutron_ovs_base_install_agent_packages function > https://github.com/openstack/networking-ovs-dpdk/blob/master/devstack/libs/ovs-dpdk#L11-L16 > with a noop and then in the install pahse we install ovs-dpdk form souce. > _neutron_ovs_base_install_agent_packages just install kernel ovs but we replace it as > our patches to make it condtional in devstack were rejected. 
What we end up with after this transition might work differently. Is there any reason it would have to maintain the "phase" approach? The ovs-dpdk example you give feels like it would be swapping one role for another in the playbook for the job that needs ovs-dpdk. > its not nessiarily a patteren i encurage but if you have to you can replace any functionality > that devstack provides via a plugin although most usecase relly dont > requrie that. Maybe we don't need to design around that if the requirement isn't common, then? That's another question for the analysis someone needs to do. >> Does OSA need to support *every* configuration value? Or could it deploy >> a stack, and then rely on a separate tool to modify config values and >> restart a service? Clearly some values need to be there when the cloud >> first starts, but do they all? > i think to preserve the workflow yes we need to be able to override > any config that is generated OK, so it sounds like that's an area to look at for gaps for OSA. I would imagine it would be possible to create a role to change arbitrary config settings based on inputs from the playbook or a vars file. -- Doug From doug at doughellmann.com Tue Jun 4 18:10:17 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 04 Jun 2019 14:10:17 -0400 Subject: [qa][openstack-ansible][tripleo-ansible] redefining devstack In-Reply-To: References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> Message-ID: Jesse Pretorius writes: > Hi everyone, > > I find myself wondering whether doing this in reverse would potentially be more useful and less disruptive. > > If devstack plugins in service repositories are converted from bash to ansible role(s), then there is potential for OSA to make use of that. This could potentially be a drop-in replacement for devstack by using a #!/bin/ansible (or whatever known path) shebang in a playbook file, or by changing the devstack entry point into a wrapper that runs ansible from a known path. > > Using this implementation process would allow a completely independent > development process for the devstack conversion, and would allow OSA > to retire its independent role repositories as and when the service’s > ansible role is ready. It depends on whether you want to delay the deprecation of devstack itself until enough services have done that, or if you want to make NewDevstack (someone should come up with a name for the OSA-based devstack replacement) consume those existing plugins in parallel with OSA. > Using this method would also allow devstack, OSA, triple-o and > kolla-ansible to consume those ansible roles in whatever way they see > fit using playbooks which are tailored to their own deployment > philosophy. That would be useful. > > At the most recent PTG there was a discussion between OSA and kolla-ansible about something like this and the conversation for how that could be done would be to ensure that the roles have a clear set of inputs and outputs, with variables enabling the code paths to key outputs. > > My opinion is that the convergence of all Ansible-based deployment tools to use a common set of roles would be advantageous in many ways: > > 1. There will be more hands & eyeballs on the deployment code. > 2. There will be more eyeballs on the reviews for service and deployment code. > 3. There will be a convergence of developer and operator communities > on the reviews. That might make all of this worth it, even if there is no other benefit. > 4. 
The deployment code will co-exist with the service code, so changes can be done together. > 5. Ansible is more pythonic than bash, and using it can likely result in the removal of a bunch of devstack bash libs. > > As Doug suggested, this starts with putting together some requirements - for the wrapping frameworks, as well as the component roles. It may be useful to get some sort of representative sample service to put together a PoC on to help figure out these requirements. > > I think that this may be useful for the tripleo-ansible team to have a view on, I’ve added the tag to the subject of this email. > > Best regards, > > Jesse > IRC: odyssey4me -- Doug From doug at doughellmann.com Tue Jun 4 18:15:09 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 04 Jun 2019 14:15:09 -0400 Subject: [qa][openstack-ansible] redefining devstack In-Reply-To: <82388be7-88d3-ad4a-bba3-81fe86eb8034@nemebean.com> References: <2B7DBB4B-103B-4453-AF00-5CB53D7C5081@redhat.com> <82388be7-88d3-ad4a-bba3-81fe86eb8034@nemebean.com> Message-ID: Ben Nemec writes: > On 6/4/19 7:39 AM, Doug Hellmann wrote: >> >> Do all projects and all jobs have to be converted at once? Or ever? > > Perhaps not all at once, but I would say they all need to be converted > eventually or we end up in the situation Dean mentioned where we have to > maintain two different deployment systems. I would argue that's much > worse than just continuing with devstack as-is. On the other hand, > practically speaking I don't think we can probably do them all at once, > unless there are a lot fewer devstack plugins in the wild than I think > there are (which is possible). Also, I suspect there may be downstream > plugins running in third-party ci that need to be considered. I think we can't do them all at once. We can never do anything all at once; we're too big. I don't think we should have a problem saying that devstack is frozen for new features but will continue to run as-is, and new things should use the replacement (when it is available). As soon as the new thing can provide a bridge with *some* level of support for plugins, we could start transitioning as teams have time and need. Jesse's proposal to rewrite devstack plugins as ansible roles may give us that bridge. > That said, while I expect this would be _extremely_ painful in the short > to medium term, I'm also a big proponent of making the thing developers > care about the same as the thing users care about. However, if we go > down this path I think we need sufficient buy in from a diverse enough > group of contributors that losing one group (see OSIC) doesn't leave us > with a half-finished migration. That would be a disaster IMHO. Oh, yes. We would need this not to be a project undertaken by a group of people from one funding source. It needs to be a shift in direction of the community as a whole to improve our developer and testing tools. 
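As a side note on what that bridge might emit: the LIBS_FROM_GIT / *_REPO / *_BRANCH fragment Sean posted
earlier in the thread maps fairly naturally onto a generated vars file. The variable names below are
guesses at OSA's *_git_repo / *_git_install_branch conventions rather than a checked interface, so treat
this purely as a shape:

    # hypothetical output of a local.conf -> Ansible vars shim
    libs_from_git:
      - os-vif
    os_vif_git_repo: https://git.openstack.org/openstack/os-vif
    os_vif_git_install_branch: refs/changes/25/629025/9
    neutron_git_repo: file:///opt/repos/neutron
    neutron_git_install_branch: bug/1788009

If something like that can be generated mechanically, most local.conf files would keep working unchanged
while the deployment machinery underneath moves to roles.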
-- Doug From fungi at yuggoth.org Tue Jun 4 20:50:02 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jun 2019 20:50:02 +0000 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <20190604173241.3r22gjulzwuvihbk@yuggoth.org> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> Message-ID: <20190604205001.v7jc3y3sepsdfjcv@yuggoth.org> On 2019-06-04 17:32:41 +0000 (+0000), Jeremy Stanley wrote: [...] > Change https://review.openstack.org/512285 which added the chmod > task includes a rather prescient comment from Paul about not adding > it to the mirror-workspace-git-repos role because "we might not want > to chmod 777 on no-legacy jobs." Unfortunately I think we failed to > realize that it already would because we had added fetch-zuul-cloner > to our base job a month earlier in > https://review.openstack.org/501843 for reasons which are not > recorded in the change (presumably a pragmatic compromise related to > the scramble to convert our v2 jobs at the time, I did not resort to > digging in IRC history just yet). David Shrewsbury reminded me that the reason was we didn't have a separate legacy-base job yet at the time fetch-zuul-cloner was added, so it initially went into the normal base job. > Soon after, we added fetch-zuul-cloner to the main "legacy" pre > playbook with https://review.opendev.org/513067 and prepared to > test its removal from the base job with > https://review.opendev.org/513079 but that was never completed and > I can't seem to find the results of the testing (or even any > indication it was ever actually performed). > > At this point, I feel like we probably just need to re-propose an > equivalent of 513079 in our base-jobs repository, Proposed as https://review.opendev.org/663135 and once that merges we should be able to... > exercise it with some DNM changes running a mix of legacy imported > v2 and modern v3 native jobs, announce a flag day for the cut > over, and try to help address whatever fallout we're unable to > predict ahead of time. This is somewhat complicated by the need to > also do something similar in https://review.opendev.org/656195 > with the bindep "fallback" packages list, so we're going to need > to decide how those two efforts will be sequenced, or whether we > want to combine them into a single (and likely doubly-painful) > event. During the weekly Infrastructure team meeting which just wrapped up, we decided go ahead and combine the two cleanups for maximum pain and suffering. ;) http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-06-04-19.01.log.html#l-207 Tentatively, we're scheduling the removal of the fetch-zuul-cloner role and the bindep fallback package list from non-legacy jobs for Monday June 24. The details of this plan will of course be more widely disseminated in the coming days, assuming we don't identify any early blockers. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From pabelanger at redhat.com Tue Jun 4 22:07:27 2019 From: pabelanger at redhat.com (Paul Belanger) Date: Tue, 4 Jun 2019 18:07:27 -0400 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <20190604173241.3r22gjulzwuvihbk@yuggoth.org> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> Message-ID: <20190604220727.GB32715@localhost.localdomain> On Tue, Jun 04, 2019 at 05:32:41PM +0000, Jeremy Stanley wrote: > On 2019-06-04 17:23:46 +0100 (+0100), Graham Hayes wrote: > [...] > > I have been trying to limit this behaviour for nearly 4 years [3] > > (it can actually add 10-15 mins sometimes depending on what source trees > > I have mounted via NFS into a devstack VM when doing dev) > > > > [3] - https://review.opendev.org/#/c/203698 > > Similar I suppose, though the problem mentioned in this subthread is > actually not about the mass permission change itself, rather about > the resulting permissions. In particular the fetch-zuul-cloner role > makes the entire set of provided repositories world-writeable > because the zuul-cloner v2 compatibility shim performs clones from > those file paths and Git wants to hardlink them if they're being > cloned within the same filesystem. This is necessary to support > occasions where the original copies aren't owned by the same user > running the zuul-cloner shim, since you can't hardlink files for > which your account lacks write access. > > I've done a bit of digging into the history of this now, so the > following is probably boring to the majority of you. If you want to > help figure out why it's still there at the moment and what's left > to do, read on... > > Change https://review.openstack.org/512285 which added the chmod > task includes a rather prescient comment from Paul about not adding > it to the mirror-workspace-git-repos role because "we might not want > to chmod 777 on no-legacy jobs." Unfortunately I think we failed to > realize that it already would because we had added fetch-zuul-cloner > to our base job a month earlier in > https://review.openstack.org/501843 for reasons which are not > recorded in the change (presumably a pragmatic compromise related to > the scramble to convert our v2 jobs at the time, I did not resort to > digging in IRC history just yet). Soon after, we added > fetch-zuul-cloner to the main "legacy" pre playbook with > https://review.opendev.org/513067 and prepared to test its removal > from the base job with https://review.opendev.org/513079 but that > was never completed and I can't seem to find the results of the > testing (or even any indication it was ever actually performed). > Testing was done, you can see that in https://review.opendev.org/513506/. However the issue was, at the time, projects that were using tools/tox_install.sh would break (I have no idea is that is still the case). For humans interested, https://etherpad.openstack.org/p/zuulv3-remove-zuul-cloner was the etherpad to capture this work. Eventually I ended up abandoning the patch, because I wasn't able to keep pushing on it. 
> At this point, I feel like we probably just need to re-propose an > equivalent of 513079 in our base-jobs repository, exercise it with > some DNM changes running a mix of legacy imported v2 and modern v3 > native jobs, announce a flag day for the cut over, and try to help > address whatever fallout we're unable to predict ahead of time. This > is somewhat complicated by the need to also do something similar > in https://review.opendev.org/656195 with the bindep "fallback" > packages list, so we're going to need to decide how those two > efforts will be sequenced, or whether we want to combine them into a > single (and likely doubly-painful) event. > -- > Jeremy Stanley From fungi at yuggoth.org Tue Jun 4 22:20:16 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 4 Jun 2019 22:20:16 +0000 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <20190604220727.GB32715@localhost.localdomain> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> <20190604220727.GB32715@localhost.localdomain> Message-ID: <20190604222016.2cyx6ghdlk6oamqp@yuggoth.org> On 2019-06-04 18:07:27 -0400 (-0400), Paul Belanger wrote: [...] > Testing was done, you can see that in > https://review.opendev.org/513506/. However the issue was, at the time, > projects that were using tools/tox_install.sh would break (I have no > idea is that is still the case). > > For humans interested, > https://etherpad.openstack.org/p/zuulv3-remove-zuul-cloner was the > etherpad to capture this work. Aha! I missed the breadcrumbs which led to those, though I'll admit to only having performed a cursory grep through the relevant repo histories. > Eventually I ended up abandoning the patch, because I wasn't able to > keep pushing on it. [...] Happy to start pushing that boulder uphill again, and thanks for paving the way the first time! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From cboylan at sapwetik.org Tue Jun 4 22:45:58 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 04 Jun 2019 15:45:58 -0700 Subject: [all] Long overdue cleanups of Zuulv2 compatibility base configs Message-ID: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> As part of our transition to Zuulv3 a year and a half ago, we carried over some compatibility tooling that we would now like to clean up. Specifically, we add a zuul-cloner (which went away in zuulv3) shim and set a global bindep fallback file value in all jobs. Zuulv3 native jobs are expected to use the repos zuul has precloned for you (no zuul-cloner required) as well as supply an in repo bindep.txt (or specify a bindep.txt path or install packages via some other method). This means that we should be able to remove both of these items from the non legacy base job in OpenDev's zuul. The legacy base job will continue to carry these for you so that you can write new native jobs over time. We have two changes [0][1] ready to go for this; however, due to the potential for disruption we would like to give everyone some time to test and prepare for this change. Fungi has a change to base-test [2] which will remove the zuul-cloner shim. 
Once this is in you can push "Do Not Merge" changes to your zuul config that reparent your tests from "base" to "base-test" and that will run the jobs without the zuul-cloner shim. Testing the bindep fallback removal is a bit more difficult as we set that in zuul's server config globally. What you can do is check your jobs' job-output.txt log files for usage of "bindep-fallback.txt". Our current plan is to merge these changes on June 24, 2019. We will be around to help debug any unexpected issues that come up. Jobs can be updated to use the "legacy-base" base job instead of the "base" base job if they need to be reverted to the old behavior quickly. Finally, Fungi did some excellent spelunking through history to understand how we got here. If you are curious you can find more details at: http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006881.html. [0] https://review.opendev.org/656195 [1] https://review.opendev.org/663151 [2] https://review.opendev.org/663135 Clark From kecarter at redhat.com Tue Jun 4 23:51:00 2019 From: kecarter at redhat.com (Kevin Carter) Date: Tue, 4 Jun 2019 18:51:00 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: In doing a brief audit of the `tripleo-ansible` landscape it seems like we have many repositories [0] with little to no activity (most being simple role shells) [1]. While these repositories seem well intentioned, I'm not really sure we need them. All of these roles fall under the `tripleo-ansible` acls and, in my opinion, are at odds with the initial stated goal: > ... co-locating all of the Ansible tasks/roles/libraries/plugins throughout the code base into a single purpose-built repository ... While I can see a place for some of these roles and could rationalize building independent, reusable repositories, I don't think we're anywhere near ready for that at this time. I also believe that when we're ready to begin building independent role repositories we should do so collaboratively; working with the likes of Infra, OpenStack-Ansible, Kolla, Airship, and anyone else who wants to contribute. So the questions at hand are: what, if anything, should we do with these repositories? Should we retire them or just ignore them? Is there anyone using any of the roles? [0] - https://opendev.org/openstack/project-config/src/commit/a12c6b531f58aaf9c838299cc0f2abc8c9ee9f40/gerrit/projects.yaml#L891-L1060= [1] - https://review.opendev.org/#/q/project:%255Eopenstack/ansible-role-tripleo.*+status:open Kevin Carter IRC: cloudnull On Mon, Jun 3, 2019 at 11:27 AM Kevin Carter wrote: > Hello Stackers, > > I wanted to follow up on this post from last year, pick up from where it > left off, and bring together a squad to get things moving. > > > > http://lists.openstack.org/pipermail/openstack-dev/2018-August/133801.html > > The effort to convert tripleo Puppet and heat templates with embedded > Ansible to a more consumable set of playbooks and roles is in full effect. > As we're working through this effort we believe co-locating all of the > Ansible tasks/roles/libraries/plugins throughout the code base into a > single purpose-built repository will assist us in streamlining and > simplifying. Structurally, at this time, most of tripleo will remain the > same. However, the inclusion of tripleo-Ansible will allow us to create > more focused solutions which are independently testable, much easier to > understand, and simple to include into the current heat template deployment > methodologies.
While a straight port of the existing Ansible tasks will not > be entirely possible, the goal of this ongoing effort will be zero impact > on our existing workflow and solutions. > > To reignite this effort, I've put up a review to create a new > "transformation" squad[0] geared toward building the structure around > tripleo-ansible[1] and converting our current solutions into > roles/playbooks/libraries/plugins. Initially, we'll be focused on our > existing code base; however, long term, I believe it makes sense for this > squad to work across projects to break down deployment barriers for folks > using similar technologies. > > We're excited to get this effort rolling again and would love to work with > anyone and everyone throughout the community. If folks are interested in > this effort, please let us know. > > [0] - https://review.opendev.org/662763 > [1] - https://opendev.org/openstack/tripleo-ansible > -- > > Kevin Carter > IRC: cloudnull > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Wed Jun 5 06:36:44 2019 From: gmann at ghanshyammann.com (gmann at ghanshyammann.com) Date: Wed, 05 Jun 2019 15:36:44 +0900 Subject: [tc][form][ptg] Summary for "help-most-needed" list & its Future Message-ID: <16b265aa91f.ef3c27c179554.6636309287366653079@ghanshyammann.com> Hello Everyone, We had many discussions on the help-most-needed list [1] & its future at the Denver conference (Joint leadership meeting, Forum and PTG). This time we were able to get volunteers from the Board of Directors, as well as new ideas to redefine the list and give it another chance, if we can get contributors to help from companies or universities. I am summarising all the discussions below; feel free to append/correct if I missed anything. * Joint Leadership Meeting: TC raised the topic of the help-most-needed list to the Board of Directors during the Joint leadership meeting and briefed them on the progress of adding business value to each item in the list (thanks lance for working on those). TC raised the point that we are not getting help on those items which have been on the list for many years. For example, many companies use designate and glance, but there is no help from those companies in terms of contributors. There are a few ideas to market and publish this list to different platforms and stakeholders. Alan gave a good idea to pick and publish one of the items in the foundation newsletter. For the first time, we had two volunteers from the Board of Directors, 1. Prakash Ramchandran 2. Allison Randal, who will broadcast this list to companies and universities to get some contributors. Big thanks for helping on this. * Forum: 'Planning & Defining Structure for ‘Help most needed’ list' [2] We hosted forum sessions dedicated to further discussion of this 'help most needed’ list's planning and future structure. We discussed the idea of defining a template for new entries and whether there can be any exit criteria. There were mixed suggestions on exit criteria and on how to make this list more effective. It is not easy to get help from companies, at least now, when many of the companies are reducing their upstream developers. Allison suggested that foundation staff and the BoD are good candidates to bridge between the list and companies. She also suggested reaching out to professors instead of students at OpenStack-interested universities, and volunteered to reach out to OSU for that. Below are the action items we collected from that session.
If you would like to volunteer for any of the unassigned AIs, feel free to reach out to the TC. Action Items: - (suggestion made by Alan Clark at joint leadership meeting) Pick a random item from the list and highlight it in the foundation newsletter (ttx) - (suggestion made by wendar) OSF Staff and BoD acting as matchmaker between list items and ecosystem companies who may be a good fit for pitching in (prakash, wendar) - make clear that we don't need full-time commitments, but we do need long-term commitments - gmann - pair up glance contribution needs with Edge WG participants who are interested in integrating/improving it to ensure it's also maintained - UC has a stronger connection to users/operators and could also help identify potential organizations who are interested in the long-term survival of these projects - (wendar) Identify professors at OpenStack-interested universities and give them project ideas. No point in advertising directly to students (post-doctorate programs in particular, not undergraduate programs) - adjust the business value sections to make it clear where these contributions impact the bottom line/profits arising from solving the problem statements (evrardjp asettle) - Reach user groups? - wendar will reach out to OSU * PTG: 'Evolution of help-most-needed list' [3] The PTG discussion collected good next steps and work items to redefine/replan this list. The suggestion is to remove the cap on the number of items in the help-needed list. It will be uncapped and will be renamed to something else. zaneb has already proposed the renaming / entry template idea [4]. The list will be revisited annually, and the new list can of course carry over items from the previous year's list. Leveraging the User Survey to gather information about project usage and which projects companies contribute to can provide good data for the TC to decide which companies to reach out to for the help-needed items. Below are the AIs from the PTG discussion - Action Items: - Update ML regarding forum session + discussion around uncapping list (gmann) - Suggest adding questions in user survey (ttx) - Rename "help most needed" to (zaneb, asettle) - Uncap the list & add governance docs surrounding annual resubmission for items (ttx, evrardjp) - Include a "completion/exit/etc criteria" - Include (or not?) SIGs in all of this (to be discussed in reviews) [1] https://governance.openstack.org/tc/reference/help-most-needed.html [2] https://etherpad.openstack.org/p/Den-forum-help-most-needed [3] https://etherpad.openstack.org/p/tc-train-ptg [4] https://review.opendev.org/#/c/657447/ - gmann From aj at suse.com Wed Jun 5 06:47:35 2019 From: aj at suse.com (Andreas Jaeger) Date: Wed, 5 Jun 2019 08:47:35 +0200 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: <20190604220727.GB32715@localhost.localdomain> References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> <20190604220727.GB32715@localhost.localdomain> Message-ID: On 05/06/2019 00.07, Paul Belanger wrote: > Testing was done, you can see that in > https://review.opendev.org/513506/. However the issue was, at the time, > projects that were using tools/tox_install.sh would break (I have no > idea is that is still the case).
I have a couple of changes open to remove the final tools/tox_install.sh files, see: https://review.opendev.org/#/q/status:open+++topic:tox-siblings There are a few more repos that didn't take my changes from last year which I abandoned in the meantime - and a few dead repos that I did not submit to when double checking today ;( Also, compute-hyperv and nova-blazar need https://review.opendev.org/663234 (requirements change) first. So, we should be pretty good if these changes get reviewed and merged, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From dangtrinhnt at gmail.com Wed Jun 5 07:19:53 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Wed, 5 Jun 2019 16:19:53 +0900 Subject: [telemetry] Meeting tomorrow cancelled Message-ID: Hi team, I'm leaving the office for vacation tomorrow so I will not be able to hold the meeting. The next meeting will be on June 20th. Meanwhile, if you have anything to discuss, please let me know or put it on the agenda [1]. Thanks, [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig.openstack at telfer.org Wed Jun 5 07:36:57 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Wed, 5 Jun 2019 08:36:57 +0100 Subject: [scientific-sig] IRC Meeting today: CERN OpenStack Day, SDN, Secure computing and more Message-ID: <9A2AA1FB-285A-4BEC-BA95-22A9A67627BE@telfer.org> Hi all - We have a Scientific SIG IRC meeting today at 1100 UTC in channel #openstack-meeting. Everyone is welcome. Today’s agenda is online here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_5th_2019 If you’d like anything added, please let us know. Today we’ll do a roundup of last week’s excellent CERN OpenStack day, and follow up on the issues with south-bound coherency between Neutron and SDN. We’d also like to restart some discussions around secure computing environments - please come along with your experiences. Finally, we are looking for EMEA-region contributors to the research computing private/hybrid cloud advocacy study discussed at the PTG. Plenty to cover! Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jun 5 09:10:39 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:10:39 +0100 Subject: [kolla] Moving weekly IRC meetings to #openstack-kolla Message-ID: Hi, In the recent virtual PTG we agreed to move the weekly IRC meetings to the #openstack-kolla channel. This will take effect from today's meeting at 1500 UTC. Cheers, Mark From mark at stackhpc.com Wed Jun 5 09:10:53 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:10:53 +0100 Subject: [kolla] Feedback request: removing ceph deployment Message-ID: Hi, We discussed during the kolla virtual PTG [1] the option of removing support for deploying Ceph, as a way to improve the long term sustainability of the project. Ceph support in kolla does require maintenance, and is not the core focus of the project.
There are other good tools for deploying Ceph (ceph-deploy, ceph-ansible), and Kolla Ansible supports integration with an external Ceph cluster deployed using these or other methods. To avoid leaving people in a difficult position, we would recommend a deployment tool (probably ceph-ansible), and provide an automated, tested and documented migration path to it. Here is a rough proposed schedule for removal: * Train: deprecate Ceph deployment, add CI tests for kolla-ansible with ceph-ansible * U: Obsolete Ceph deployment, provide migration path to ceph-ansible * V: Remove Ceph deployment Please provide feedback on this plan and how it will affect you. Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg From mark at stackhpc.com Wed Jun 5 09:10:56 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:10:56 +0100 Subject: [kolla] Feedback request: removing OracleLinux support Message-ID: Hi, We discussed during the kolla virtual PTG [1] the option of removing support for Oracle Linux, as a way to improve the long term sustainability of the project. Since (from afar) OracleLinux is very similar to CentOS, it does not require too much maintenance, however it is non-zero and does consume CI resources. Contributors from Oracle left the community some time ago, and we do not generally see Oracle Linux in bug reports, so must assume it is not well used. We propose dropping support for OracleLinux in the Train cycle. If this will affect you and you would like to help maintain it, please get in touch. Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg From mark at stackhpc.com Wed Jun 5 09:10:59 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:10:59 +0100 Subject: [kolla] Feedback request: removing kolla-cli Message-ID: Hi, We discussed during the kolla virtual PTG [1] the option of removing support for the kolla-cli deliverable [2], as a way to improve the long term sustainability of the project. kolla-cli was a project started by Oracle, and accepted as a kolla deliverable. While it looks interesting and potentially useful, it never gained much traction (as far as I'm aware) and the maintainers left the community. We have never released it and CI has been failing for some time. We propose dropping support for kolla-cli in the Train cycle. If this will affect you and you would like to help maintain it, please get in touch. Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg [2] https://github.com/openstack/kolla-cli From mark at stackhpc.com Wed Jun 5 09:11:03 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:03 +0100 Subject: [kolla][tripleo] Podman/buildah Message-ID: Hi, At the Denver forum during the kolla feedback session [1], the topic of support for podman [2] and buildah [3] was raised. RHEL/CentOS 8 removes support for Docker [4], although presumably it will still be possible to pull down an RPM from https://download.docker.com/linux/. Currently both kolla and kolla-ansible interact with docker via the docker python module [5], which interacts with the docker daemon API. Given that podman and buildah are not daemons, I would expect the usage model to be a bit different. There is a python API for podman [6] which we might be able to use. I understand that Tripleo uses buildah to build images already (please correct me if I'm wrong). How is this achieved with kolla? Perhaps using 'kolla-build --template-only' to generate Dockerfiles then invoking buildah separately? 
Are you planning to work on adding buildah support to kolla itself? For running services, Tripleo uses Paunch [7] to abstract away the container engine, and it appears Podman support was added here - building CLI argument strings rather than via a python API [8]. For anyone using kolla/kolla-ansible, please provide feedback on how useful/necessary this would be to you. Thanks, Mark [1] https://etherpad.openstack.org/p/DEN-train-kolla-feedback [2] https://podman.io/ [3] https://buildah.io/ [4] https://access.redhat.com/solutions/3696691 [5] https://github.com/docker/docker-py [6] https://github.com/containers/python-podman [7] https://docs.openstack.org/developer/paunch/readme.html [8] https://github.com/openstack/paunch/blob/ecc2047b2ec5eaf39cce119abe1678ac19139d79/paunch/builder/podman.py From mark at stackhpc.com Wed Jun 5 09:11:06 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:06 +0100 Subject: [kolla][tripleo] Python 3, CentOS/RHEL 8 Message-ID: Hi, At the recent kolla virtual PTG [1], we discussed the move to python 3 images in the Train cycle. hrw has started this effort for Ubuntu/Debian source images [2] and is making good progress. Next we will need to consider CentOS and RHEL. It seems that for Train RDO will provide only python 3 packages with support for CentOS 8 [3]. There may be some overlap in the trunk (master) packages where there is support for both CentOS 7 and 8. We will therefore need to combine the switch to python 3 with a switch to a CentOS/RHEL 8 base image. Some work was started during the Stein cycle to support RHEL 8 images with python 3 packages. There will no doubt be a few scripts that need updating to complete this work. We'll also need to test to ensure that both binary and source images work in this new world. Tripleo team - what are your plans for CentOS/RHEL 8 and python 3 this cycle? Are you planning to continue the work started in kolla during the Stein release? Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg [2] https://blueprints.launchpad.net/kolla/+spec/debian-ubuntu-python3 [3] https://review.rdoproject.org/etherpad/p/moving-rdo-to-centos8 [4] https://review.opendev.org/#/c/632156/ From mark at stackhpc.com Wed Jun 5 09:11:12 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:12 +0100 Subject: [nova][kolla][openstack-ansible][tripleo] Cells v2 upgrades Message-ID: Hi, At the recent kolla virtual PTG [1] we had a good discussion about adding support for multiple nova cells in kolla-ansible. We agreed a key requirement is to be able to perform operations on one or more cells without affecting the rest for damage limitation. This also seems like it would apply to upgrades. We're seeking input on ordering. Looking at the nova upgrade guide [2] I might propose something like this: 1. DB syncs 2. Upgrade API, super conductor For each cell: 3a. Upgrade cell conductor 3b. Upgrade cell computes 4. SIGHUP all services 5. Run online migrations At some point in here we also need to run the upgrade check. Presumably between steps 1 and 2? 
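To make this concrete, here is a rough sketch of the commands I have in mind behind those steps (command names come from the nova upgrade guide; how the per-cell database connections are selected is an assumption on my part):

  # step 1: with the new code installed, sync the schemas
  nova-manage api_db sync
  nova-manage db sync            # run against each cell database
  # the upgrade check presumably slots in around here, between steps 1 and 2
  nova-status upgrade check
  # step 2: restart nova-api and the super conductor on the new release
  # step 3a/3b: per cell, restart the cell conductor, then the computes
  # step 4: SIGHUP the remaining services so they pick up the changes
  # step 5: once everything is upgraded, finish the data migrations
  nova-manage db online_data_migrations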
It would be great to get feedback both from the nova team and anyone running cells Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg [2] https://docs.openstack.org/nova/latest/user/upgrade.html From mark at stackhpc.com Wed Jun 5 09:11:14 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:14 +0100 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status Message-ID: Hi, The Kayobe project [1] seeks to become an official OpenStack project during the Train cycle. Kayobe is a deployment tool that uses Kolla Ansible and Bifrost to deploy a containerised OpenStack control plane to bare metal. The project was started 2 years ago to fill in some gaps in Kolla Ansible, and has since been used for a number of production deployments. It's frequently deployed in Scientific Computing environments, but is not limited to this. We ran a packed workshop on Kayobe at the Denver summit and got some great feedback, with many people agreeing that it makes Kolla Ansible easier to adopt in environments with no existing provisioning system. We use OpenStack development workflows, including IRC, the mailing list, opendev, zuul, etc. We see two options for becoming an official OpenStack project: 1. become a deliverable of the Kolla project 2. become an official top level OpenStack project Given the affinity with the Kolla project I feel that option 1 seems natural. However, I do not want to use influence as PTL to force this approach. There is currently only one person (me) who is a member of both core teams, although all kayobe cores are active in the Kolla community. I would not expect core memberships to change, although we would probably end up combining IRC channels, meetings and design sessions. I would hope that combining these communities would be to the benefit of both. Please provide feedback on this matter - whether positive or negative. Thanks, Mark [1] http://kayobe.readthedocs.io From mark at stackhpc.com Wed Jun 5 09:11:19 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 10:11:19 +0100 Subject: [kolla] Priorities for the Train cycle Message-ID: Hi, Thanks to those who attended the recent kolla virtual PTG [1]. We had some good technical discussions but did not get to a topic that I was keen to cover - priorities for the Train cycle. As a community of volunteers it can be difficult to guide the project in a particular direction, but agreeing on some priorities can help us to focus our efforts - both with reviews and development. I have seen this work well in the ironic team. Based on our recent discussions and knowledge of community goals, I compiled a list of candidate work items [2]. If you are involved with the kolla community (operator, developer, core, etc.), please vote on these priorities to indicate what you think we should focus on in the Train cycle. I will leave this open for a week, then order the items. At this point we can try to assign an owner to each. 
Thanks, Mark [1] https://etherpad.openstack.org/p/kolla-train-ptg [2] https://etherpad.openstack.org/p/kolla-train-priorities From marcin.juszkiewicz at linaro.org Wed Jun 5 09:33:20 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Wed, 5 Jun 2019 11:33:20 +0200 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status In-Reply-To: References: Message-ID: <56d693a4-01f5-0865-3948-0974b14f1dab@linaro.org> W dniu 05.06.2019 o 11:11, Mark Goddard pisze: > The Kayobe project [1] seeks to become an official OpenStack project > during the Train cycle. > We see two options for becoming an official OpenStack project: > > 1. become a deliverable of the Kolla project > 2. become an official top level OpenStack project > Please provide feedback on this matter - whether positive or negative. As Kolla core I support both options. If Kayobe became kolla/kayobe then I will be fine. Similar with openstack/kayobe project. From marcin.juszkiewicz at linaro.org Wed Jun 5 09:42:27 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Wed, 5 Jun 2019 11:42:27 +0200 Subject: [kolla] Feedback request: removing OracleLinux support In-Reply-To: References: Message-ID: <81438050-7699-c7a7-b883-a707cc3f53db@linaro.org> W dniu 05.06.2019 o 11:10, Mark Goddard pisze: > We propose dropping support for OracleLinux in the Train cycle. If > this will affect you and you would like to help maintain it, please > get in touch. First we drop it from CI. Then (IMHO) it will be removed once we move to CentOS 8. From luka.peschke at objectif-libre.com Wed Jun 5 12:17:35 2019 From: luka.peschke at objectif-libre.com (Luka Peschke) Date: Wed, 05 Jun 2019 14:17:35 +0200 Subject: [cloudkitty] Core team updates Message-ID: Hi all, I'd like to propose some updates to the CloudKitty core team: * First of all I'd like to welcome Justin Ferrieu (jferrieu on IRC) to the core team. He's been around contributing (mostly on the Prometheus collector and fetcher) and reviewing a lot for the last two releases (https://www.stackalytics.com/report/contribution/cloudkitty/90). It would be great if he had +2/+A power. * Some cores have been inactive for a long time. For now, Pierre-Alexandre Bardina can be removed from the core team, and I've reached out to the other inactive cores. We'll wait a bit for a reply before we proceed. Of course, if these people want to contribute again in the future, we'd be glad to welcome them back in the core team. Thanks to all contributors! Cheers, -- Luka Peschke From emilien at redhat.com Wed Jun 5 12:28:18 2019 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 5 Jun 2019 08:28:18 -0400 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard wrote: [...] > I understand that Tripleo uses buildah to build images already (please > correct me if I'm wrong). How is this achieved with kolla? Perhaps > using 'kolla-build --template-only' to generate Dockerfiles then > invoking buildah separately? Are you planning to work on adding > buildah support to kolla itself? > That's what we did indeed, we use Kolla to generate Dockerfiles, then call Buildah from tripleoclient to build containers. We have not planned (yet) to port that workflow to Kolla, which would involve some refacto in the build code (last time I checked). 
I wrote a blog post about it a while ago: https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From christophe.sauthier at objectif-libre.com Wed Jun 5 12:29:32 2019 From: christophe.sauthier at objectif-libre.com (Christophe Sauthier) Date: Wed, 05 Jun 2019 14:29:32 +0200 Subject: [cloudkitty] Core team updates In-Reply-To: References: Message-ID: Hello It is a great +1 from me to welcome Justin! He's doing great work on the team, both committing and reviewing stuff. Thanks! Christophe ---- Christophe Sauthier Directeur Général Objectif Libre : Au service de votre Cloud +33 (0) 6 16 98 63 96 | christophe.sauthier at objectif-libre.com https://www.objectif-libre.com | @objectiflibre Recevez la Pause Cloud Et DevOps : https://olib.re/abo-pause On 2019-06-05 14:17, Luka Peschke wrote: > Hi all, > > I'd like to propose some updates to the CloudKitty core team: > > * First of all I'd like to welcome Justin Ferrieu (jferrieu on IRC) > to the core team. He's been around contributing (mostly on the > Prometheus collector and fetcher) and reviewing a lot for the last two > releases > (https://www.stackalytics.com/report/contribution/cloudkitty/90). It > would be great if he had +2/+A power. > > * Some cores have been inactive for a long time. For now, > Pierre-Alexandre Bardina can be removed from the core team, and I've > reached out to the other inactive cores. We'll wait a bit for a reply > before we proceed. Of course, if these people want to contribute again > in the future, we'd be glad to welcome them back in the core team. > > Thanks to all contributors! > > Cheers, From doka.ua at gmx.com Wed Jun 5 12:34:52 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Wed, 5 Jun 2019 15:34:52 +0300 Subject: [glance] zeroing image, preserving other parameters Message-ID: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> Dear colleagues, for some reasons, I need to shrink image size to zero (freeing storage as well), while keeping this record in the Glance database. The first thing which came to my mind was to delete the image and then create a new one with the same name/uuid/... and --file /dev/null, but this is impossible because Glance doesn't really delete records from the database, marking them as 'deleted' instead. The next try was to use glance image-upload from /dev/null, but this is also prohibited with the message "409 Conflict: Image status transition from [activated, deactivated] to saving is not allowed (HTTP 409)" I found https://docs.openstack.org/glance/rocky/contributor/database_architecture.html#glance-database-public-api's "image_destroy" but I have no clue how to access this API. Is it a kind of library or a kind of REST API, how do I access it, and is it safe to use in terms of longevity and compatibility between versions? Or, maybe, you can advise any other method to solve the problem of zeroing glance image data / freeing storage, while keeping just a record about this image in the database? Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison From dpeacock at redhat.com Wed Jun 5 12:45:42 2019 From: dpeacock at redhat.com (David Peacock) Date: Wed, 5 Jun 2019 08:45:42 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: > So the questions at hand are: what, if anything, should we do with these > repositories?
Should we retire them or just ignore them? Is there anyone > using any of the roles? > My initial reaction was to suggest we just ignore them, but on second thought I'm wondering if there is anything negative if we leave them lying around. Unless we're going to benefit from them in the future if we start actively working in these repos, they represent obfuscation and debt, so it might be best to retire / dispose of them. David > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jun 5 12:47:07 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 13:47:07 +0100 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 13:28, Emilien Macchi wrote: > > On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard wrote: > [...] >> >> I understand that Tripleo uses buildah to build images already (please >> correct me if I'm wrong). How is this achieved with kolla? Perhaps >> using 'kolla-build --template-only' to generate Dockerfiles then >> invoking buildah separately? Are you planning to work on adding >> buildah support to kolla itself? > > > That's what we did indeed, we use Kolla to generate Dockerfiles, then call Buildah from tripleoclient to build containers. > We have not planned (yet) to port that workflow to Kolla, which would involve some refacto in the build code (last time I checked). > > I wrote a blog post about it a while ago: > https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ Thanks for following up. It wouldn't be a trivial change to add buildah support in kolla, but it would have saved reimplementing the task parallelisation in Tripleo and would benefit others too. Never mind. > -- > Emilien Macchi From guoyongxhzhf at 163.com Wed Jun 5 12:52:28 2019 From: guoyongxhzhf at 163.com (=?GBK?B?ufnTwg==?=) Date: Wed, 5 Jun 2019 20:52:28 +0800 (CST) Subject: [airship] Is Ironic ready for Airship? Message-ID: <15fc408f.c51c.16b27b2a86d.Coremail.guoyongxhzhf@163.com> I know Airship chose Maas as its bare metal management tool. I want to know whether Maas is more suitable for Airship when it comes to the under-infrastructure. If Maas is more suitable, then what features should ironic develop? Thanks for your reply -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Wed Jun 5 14:31:56 2019 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 5 Jun 2019 10:31:56 -0400 Subject: [tripleo] Proposing Kamil Sambor core on TripleO Message-ID: Kamil has been working on TripleO for a while now and is providing really insightful reviews, especially (but not only) on Python best practices; he is one of the major contributors of the OVN integration, which was a ton of work. I believe he has the right knowledge to review any TripleO patch and provide excellent reviews in our project. We're lucky to have him with us in the team! I would like to propose him as core on TripleO; please raise any objections if needed. -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Wed Jun 5 14:37:33 2019 From: smooney at redhat.com (Sean Mooney) Date: Wed, 05 Jun 2019 15:37:33 +0100 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, 2019-06-05 at 13:47 +0100, Mark Goddard wrote: > On Wed, 5 Jun 2019 at 13:28, Emilien Macchi wrote: > > > > On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard wrote: > > [...]
> > > I understand that Tripleo uses buildah to build images already (please > > > correct me if I'm wrong). How is this achieved with kolla? Perhaps > > > using 'kolla-build --template-only' to generate Dockerfiles then > > > invoking buildah separately? Are you planning to work on adding > > > buildah support to kolla itself? > > > > > > That's what we did indeed, we use Kolla to generate Dockerfiles, then call Buildah from tripleoclient to build > > containers. > > We have not planned (yet) to port that workflow to Kolla, which would involve some refacto in the build code (last > > time I checked). > > > > I wrote a blog post about it a while ago: > > https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ > > Thanks for following up. It wouldn't be a trivial change to add > buildah support in kolla, but it would have saved reimplementing the > task parallelisation in Tripleo and would benefit others too. Never > mind. Actually I'm not sure about that; buildah should actually be pretty simple to add support for. It's been a while, but we looked at swapping out the building with a python script a few years ago https://review.opendev.org/#/c/503882/ and it really did not take that much to enable, so simply invoking buildah in a similar manner should be trivial. Podman support will be harder, but again we confined all interaction with docker in kolla-ansible to be via https://github.com/openstack/kolla-ansible/blob/master/ansible/library/kolla_docker.py so we would just need to write a similar module that would work with podman and then select the correct one to use. The interface is a little large, but it should be relatively mechanical to implement podman support. > > > -- > > Emilien Macchi > > From alexandre.arents at corp.ovh.com Wed Jun 5 14:38:12 2019 From: alexandre.arents at corp.ovh.com (Alexandre Arents) Date: Wed, 5 Jun 2019 16:38:12 +0200 Subject: [ops][nova][placement] NUMA topology vs non-NUMA workloads Message-ID: <20190605143812.4wqzswzr2xnbe6dp@corp.ovh.com> From OVH's point of view, we do not plan for now to mix NUMA-aware and NUMA-unaware workloads on the same compute node. So you can go ahead without the "can_split" feature if it helps. Alex >This message is primarily addressed at operators, and of those, >operators who are interested in effectively managing and mixing >workloads that care about NUMA with workloads that do not. There are >some questions within, after some background to explain the issue. > >At the PTG, Nova and Placement developers made a commitment to more >effectively manage NUMA topologies within Nova and Placement. On the >placement side this resulted in a spec which proposed several >features that would enable more expressive queries when requesting >allocation candidates (places for workloads to go), resulting in >fewer late scheduling failures. > >At first there was one spec that discussed all the features. This >morning it was split in two because one of the features is proving >hard to resolve. Those two specs can be found at: > >* https://review.opendev.org/658510 (has all the original discussion) >* https://review.opendev.org/662191 (the less contentious features split out) > >After much discussion, we would prefer to not do the feature >discussed in 658510.
Called 'can_split', it would allow specified >classes of resource (notably VCPU and memory) to be split across >multiple numa nodes when each node can only contribute a portion of >the required resources and where those resources are modelled as >inventory on the NUMA nodes, not the host at large. > >While this is a good idea in principle it turns out (see the spec) >to cause many issues that require changes throughout the ecosystem, >for example enforcing pinned cpus for workloads that would normally >float. It's possible to make the changes, but it would require >additional contributors to join the effort, both in terms of writing >the code and understanding the many issues. > >So the questions: > >* How important, in your cloud, is it to co-locate guests needing a > NUMA topology with guests that do not? A review of documentation > (upstream and vendor) shows differing levels of recommendation on > this, but in many cases the recommendation is to not do it. > >* If your answer to the above is "we must be able to do that": How > important is it that your cloud be able to pack workloads as tight > as possible? That is: If there are two NUMA nodes and each has 2 > VCPU free, should a 4 VCPU demanding non-NUMA workload be able to > land there? Or would you prefer that not happen? > >* If the answer to the first question is "we can get by without > that" is it satisfactory to be able to configure some hosts as NUMA > aware and others as not, as described in the "NUMA topology with > RPs" spec [1]? In this set up some non-NUMA workloads could end up > on a NUMA host (unless otherwise excluded by traits or aggregates), > but only when there was contiguous resource available. > >This latter question articulates the current plan unless responses >to this message indicate it simply can't work or legions of >assistance shows up. Note that even if we don't do can_split, we'll >still be enabling significant progress with the other features >described in the second spec [2]. > >Thanks for your help in moving us in the right direction. > >[1] https://review.opendev.org/552924 >[2] https://review.opendev.org/662191 >-- >Chris Dent ٩◔̯◔۶ https://anticdent.org/ >freenode: cdent -- Alexandre Arents From rosmaita.fossdev at gmail.com Wed Jun 5 14:38:53 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 5 Jun 2019 10:38:53 -0400 Subject: [glance] zeroing image, preserving other parameters In-Reply-To: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> References: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> Message-ID: <6faa9bd8-c273-ae19-1fa1-9ecaa5a7b94c@gmail.com> On 6/5/19 8:34 AM, Volodymyr Litovka wrote: > Dear colleagues, > > for some reasons, I need to shrink image size to zero (freeing storage > as well), while keeping this record in Glance database. > > First which come to my mind is to delete image and then create new one > with same name/uuid/... and --file /dev/null, but this is impossible > because Glance don't really delete records from database, marking them > as 'deleted' instead. The glance-manage utility program allows you to purge the database. The images table (where the image UUIDs are stored) is not purged by default because of OSSN-0075 [0]. See the glance docs [1] for details. [0] https://wiki.openstack.org/wiki/OSSN/OSSN-0075 [1] https://docs.openstack.org/glance/latest/admin/db.html#database-maintenance (That doesn't really help your issue, I just wanted to point out that there is a way to purge the database.) 
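For reference, the invocation is something like the sketch below (exact options can vary by release, so double-check `glance-manage --help` on your deployment):

  # purge rows that have been marked deleted for at least 30 days,
  # in batches, from everything except the images table
  glance-manage db purge --age_in_days 30 --max_rows 500

  # purge the images table separately (it is kept out of the default
  # purge because of OSSN-0075)
  glance-manage db purge_images_table --age_in_days 180 --max_rows 500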
> Next try was to use glance image-upload from /dev/null, but this is also > prohibited with message "409 Conflict: Image status transition from > [activated, deactivated] to saving is not allowed (HTTP 409)" That's correct, Glance will not allow you to replace the image data once an image has gone to 'active' status. > I found > https://docs.openstack.org/glance/rocky/contributor/database_architecture.html#glance-database-public-api's > > "image_destroy" but have no clues on how to access this API. Is it kind > of library or kind of REST API, how to access it and whether it's safe > to use it in terms of longevity and compatibility between versions? The title of that document is misleading. It describes the interface that Glance developers can use when they need to interact with the database. There's no tool that exposes those operations to operators. > Or, may be, you can advise any other methods to solve the problem of > zeroing glance image data / freeing storage, while keeping in database > just a record about this image? If you purged the database, you could do your proposal to recreate the image with a zero-size file -- but that would give you an image with status 'active' that an end user could try to boot an instance with. I don't think that's a good idea. Additionally, purging the images table of all UUIDs, not just the few you want to replace, exposes you to OSSN-0075. An alternative--and I'm not sure this is a good idea either--would be to deactivate the image [2]. This would preserve all the current metadata but not allow the image to be downloaded by a non-administrator. With the image not in 'active' status, nova or cinder won't try to use it to create instances or volumes. The image data would still exist, though, so you'd need to delete it manually from the backend to really clear out the space. Additionally, the image size would remain, which might be useful for record-keeping, although on the other hand, it will still count against the user_storage_quota. And the image locations will still exist even though they won't refer to any existing data any more. (Like I said, I'm not sure this is a good idea.) [2] https://developer.openstack.org/api-ref/image/v2/#deactivate-image > Thank you. Not sure I was much help. Let's see if other operators have a good workaround or a need for this kind of functionality. > > -- > Volodymyr Litovka >   "Vision without Execution is Hallucination." -- Thomas Edison > > From emilien at redhat.com Wed Jun 5 14:48:02 2019 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 5 Jun 2019 10:48:02 -0400 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 8:47 AM Mark Goddard wrote: > Thanks for following up. It wouldn't be a trivial change to add > buildah support in kolla, but it would have saved reimplementing the > task parallelisation in Tripleo and would benefit others too. Never > mind. > To be fair, at the time I wrote the code in python-tripleoclient the container tooling wasn't really stable and we weren't sure about the directions we would take yet; which is the main reason which drove us to not invest too much time into refactoring Kolla to support a tool that we weren't sure we would end up using in production for the container image building. It has been a few months now and so far it works ok for our needs; so if there is interest in supporting Buildah in Kolla then we might want to do the refactor and of course TripleO would use this new feature. 
-- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From aschultz at redhat.com Wed Jun 5 14:59:24 2019 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 5 Jun 2019 08:59:24 -0600 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 8:46 AM Sean Mooney wrote: > On Wed, 2019-06-05 at 13:47 +0100, Mark Goddard wrote: > > On Wed, 5 Jun 2019 at 13:28, Emilien Macchi wrote: > > > > > > On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard wrote: > > > [...] > > > > > > > > I understand that Tripleo uses buildah to build images already > (please > > > > correct me if I'm wrong). How is this achieved with kolla? Perhaps > > > > using 'kolla-build --template-only' to generate Dockerfiles then > > > > invoking buildah separately? Are you planning to work on adding > > > > buildah support to kolla itself? > > > > > > > > > That's what we did indeed, we use Kolla to generate Dockerfiles, then > call Buildah from tripleoclient to build > > > containers. > > > We have not planned (yet) to port that workflow to Kolla, which would > involve some refacto in the build code (last > > > time I checked). > > > > > > I wrote a blog post about it a while ago: > > > > https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ > > > > Thanks for following up. It wouldn't be a trivial change to add > > buildah support in kolla, but it would have saved reimplementing the > > task parallelisation in Tripleo and would benefit others too. Never > > mind. > actully im not sure about that buildah should actully be pretty simple to > add support for. > its been a while but we looksed at swaping out the building with a python > script a few years ago > https://review.opendev.org/#/c/503882/ > and it really did not take that much to enable so simply invoking buildah > in a simlar maner should be trivail. > > The issue was trying to build the appropriate parallelization logic based on the kolla container build order[0]. We're using the --list-dependencies to get the ordering for the build[1] and then run it through our builder[2]. You wouldn't want to do it serially because it's dramatically slower. Our buildah builder is only slightly slower than the docker one at this point. > podman support will be harder but again we confied all interaction with > docker in kolla-ansibel to be via > > https://github.com/openstack/kolla-ansible/blob/master/ansible/library/kolla_docker.py > so we sould jsut need to write a similar module that would work with > podman and then select the correct one to use. > the interface is a little large but it shoudld reactively mechanical to > implement podman supprot. > The podman support is a bit more complex because there is no daemon associated with it. We wrote systemd/podman support into paunch[3] for us to handle the management of the life cycles of the containers. We'd like to investigate switching our invocation of paunch from cli to an ansible plugin/module which might be beneficial for kolla-ansible as well. 
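For a flavour of what the current CLI invocation looks like, a minimal sketch (the file path and config id below are made up for illustration; see the paunch docs for the real options):

  # paunch reads a json/yaml description of the containers to run and,
  # with the podman backend, wires each container up to a systemd unit
  # so restarts/life cycle are handled without a long-running daemon
  paunch apply --file /var/lib/tripleo-config/step_1_containers.json \
      --config-id tripleo_step1 --managed-by tripleo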
Thanks, -Alex [0] https://opendev.org/openstack/tripleo-common/src/branch/master/tripleo_common/image/builder/buildah.py#L156 [1] https://opendev.org/openstack/tripleo-common/src/branch/master/tripleo_common/image/kolla_builder.py#L496 [2] https://opendev.org/openstack/python-tripleoclient/src/branch/master/tripleoclient/v1/container_image.py#L207-L228 [3] https://opendev.org/openstack/paunch > > > > -- > > > Emilien Macchi > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aj at suse.com Wed Jun 5 15:20:37 2019 From: aj at suse.com (Andreas Jaeger) Date: Wed, 5 Jun 2019 17:20:37 +0200 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> <20190604220727.GB32715@localhost.localdomain> Message-ID: On 05/06/2019 08.47, Andreas Jaeger wrote: > On 05/06/2019 00.07, Paul Belanger wrote: >> Testing was done, you can see that in >> https://review.opendev.org/513506/. However the issue was, at the time, >> projects that were using tools/tox_install.sh would break (I have no >> idea is that is still the case). > > I have a couple of changes open to remove the final tools/tox_install.sh > files, see: > > https://review.opendev.org/#/q/status:open+++topic:tox-siblings > > > There are a few more repos that didn't take my changes from last year > which I abandoned in the mean time - and a few dead repos that I did not > submit to when double checking today ;( > > Also, compute-hyperv and nova-blazar need > https://review.opendev.org/663234 (requirements change) first. That one has a -2 now. ;( I won't be able to work on alternative solutions, nor can I assess whether this blocks the changes. Anybody to take this over, please? > So, we should be pretty good if these changes get reviewed and merged, Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From openstack at nemebean.com Wed Jun 5 15:28:35 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 5 Jun 2019 10:28:35 -0500 Subject: [oslo] Bandit Strategy In-Reply-To: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> Message-ID: Since it seems we need to backport this to the stable branches, I've added stable branch columns to https://ethercalc.openstack.org/ml1qj9xrnyfg I know some backports have already been proposed, so if people can fill in the appropriate columns that would help avoid unnecessary work on projects that are already done. Hopefully these will be clean backports, but I know at least one included a change to requirements.txt too. We'll need to make sure we don't accidentally backport any of those or we won't be able to release the stable branches. As discussed in the meeting this week, we're only planning to backport to the active branches. The em branches can be updated if necessary, but we don't need to do a mass backport to them. I think that's it. Let me know if you have any comments or questions. Thanks. -Ben On 5/13/19 12:23 PM, Ben Nemec wrote: > Nefarious cap bandits are running amok in the OpenStack community!
Won't > someone take a stand against these villainous headwear thieves?! > > Oh, sorry, just pasted the elevator pitch for my new novel. ;-) > > Actually, this email is to summarize the plan we came up with in the > Oslo meeting this morning. Since we have a bunch of projects affected by > the Bandit breakage I wanted to make sure we had a common fix so we > don't have a bunch of slightly different approaches in each project. The > plan we agreed on in the meeting was to push a two patch series to each > repo - one to cap bandit <1.6.0 and one to uncap it with a !=1.6.0 > exclusion. The first should be merged immediately to unblock ci, and the > latter can be rechecked once bandit 1.6.1 releases to verify that it > fixes the problem for us. > > We chose this approach instead of just tweaking the exclusion in tox.ini > because it's not clear that the current behavior will continue once > Bandit fixes the bug. Assuming they restore the old behavior, this > should require the least churn in our repos and means we're still > compatible with older versions that people may already have installed. > > I started pushing patches under > https://review.opendev.org/#/q/topic:cap-bandit (which prompted the > digression to start this email ;-) to implement this plan. This is > mostly intended to be informational, but if you have any concerns with > the plan above please do let us know immediately. > > Thanks. > > -Ben > From Kevin.Fox at pnnl.gov Wed Jun 5 15:31:26 2019 From: Kevin.Fox at pnnl.gov (Fox, Kevin M) Date: Wed, 5 Jun 2019 15:31:26 +0000 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: , Message-ID: <1A3C52DFCD06494D8528644858247BF01C359A88@EX10MBOX03.pnnl.gov> Whats the plan for Kubernetes integration at this point? I keep seeing more and more talk/work on integrating podman and paunch and such but its a lot of work that doesn't apply when switching to Kubernetes? Thanks, Kevin ________________________________ From: Alex Schultz [aschultz at redhat.com] Sent: Wednesday, June 05, 2019 7:59 AM To: Sean Mooney Cc: Mark Goddard; Emilien Macchi; openstack-discuss Subject: Re: [kolla][tripleo] Podman/buildah On Wed, Jun 5, 2019 at 8:46 AM Sean Mooney > wrote: On Wed, 2019-06-05 at 13:47 +0100, Mark Goddard wrote: > On Wed, 5 Jun 2019 at 13:28, Emilien Macchi > wrote: > > > > On Wed, Jun 5, 2019 at 5:21 AM Mark Goddard > wrote: > > [...] > > > > > > I understand that Tripleo uses buildah to build images already (please > > > correct me if I'm wrong). How is this achieved with kolla? Perhaps > > > using 'kolla-build --template-only' to generate Dockerfiles then > > > invoking buildah separately? Are you planning to work on adding > > > buildah support to kolla itself? > > > > > > That's what we did indeed, we use Kolla to generate Dockerfiles, then call Buildah from tripleoclient to build > > containers. > > We have not planned (yet) to port that workflow to Kolla, which would involve some refacto in the build code (last > > time I checked). > > > > I wrote a blog post about it a while ago: > > https://my1.fr/blog/openstack-containerization-with-podman-part-5-image-build/ > > Thanks for following up. It wouldn't be a trivial change to add > buildah support in kolla, but it would have saved reimplementing the > task parallelisation in Tripleo and would benefit others too. Never > mind. actully im not sure about that buildah should actully be pretty simple to add support for. 
It's been a while, but we looked at swapping out the build with a python script a few years ago https://review.opendev.org/#/c/503882/ and it really did not take that much to enable, so simply invoking buildah in a similar manner should be trivial. The issue was trying to build the appropriate parallelization logic based on the kolla container build order[0]. We're using the --list-dependencies to get the ordering for the build[1] and then run it through our builder[2]. You wouldn't want to do it serially because it's dramatically slower. Our buildah builder is only slightly slower than the docker one at this point. Podman support will be harder, but again we confined all interaction with docker in kolla-ansible to be via https://github.com/openstack/kolla-ansible/blob/master/ansible/library/kolla_docker.py so we should just need to write a similar module that would work with podman and then select the correct one to use. The interface is a little large, but it should be relatively mechanical to implement podman support. The podman support is a bit more complex because there is no daemon associated with it. We wrote systemd/podman support into paunch[3] for us to handle the management of the life cycles of the containers. We'd like to investigate switching our invocation of paunch from cli to an ansible plugin/module which might be beneficial for kolla-ansible as well. Thanks, -Alex [0] https://opendev.org/openstack/tripleo-common/src/branch/master/tripleo_common/image/builder/buildah.py#L156 [1] https://opendev.org/openstack/tripleo-common/src/branch/master/tripleo_common/image/kolla_builder.py#L496 [2] https://opendev.org/openstack/python-tripleoclient/src/branch/master/tripleoclient/v1/container_image.py#L207-L228 [3] https://opendev.org/openstack/paunch > > > -- > > Emilien Macchi > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Jun 5 15:40:57 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jun 2019 15:40:57 +0000 Subject: [infra] fetch-zuul-cloner and permissions (was: redefining devstack) In-Reply-To: References: <352C466C-0DA8-41DF-BACB-8FF2A00E6A47@redhat.com> <394bfb9a-be5d-4360-ad02-3082aede9361@www.fastmail.com> <20190604154726.vp2gdtmh5mjrtqxc@yuggoth.org> <28c5d62e-2148-f1d0-4d93-278f2dac1d49@ham.ie> <20190604173241.3r22gjulzwuvihbk@yuggoth.org> <20190604220727.GB32715@localhost.localdomain> Message-ID: <20190605154056.qpcrwa3jppgifece@yuggoth.org> On 2019-06-05 17:20:37 +0200 (+0200), Andreas Jaeger wrote: > On 05/06/2019 08.47, Andreas Jaeger wrote: [...] > > There are a few more repos that didn't take my changes from last year > > which I abandoned in the mean time - and a few dead repos that I did not > > submit to when double checking today ;( > > > > Also, compute-hyperv and nova-blazar need > > https://review.opendev.org/663234 (requirements change) first. > > That one has a -2 now. ;( > > I won't be able to work on alternative solutions and neither can access > whether this blocks the changes. Anybody to take this over, please? [...] It should be the responsibility of the compute-hyperv and nova-blazar maintainers to solve this problem, though your attempts to help them with a possible solution have been admirable. Thanks for this, and for all the others which did get merged already! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From aschultz at redhat.com Wed Jun 5 15:41:37 2019 From: aschultz at redhat.com (Alex Schultz) Date: Wed, 5 Jun 2019 09:41:37 -0600 Subject: [kolla][tripleo] Python 3, CentOS/RHEL 8 In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 3:19 AM Mark Goddard wrote: > Hi, > > At the recent kolla virtual PTG [1], we discussed the move to python 3 > images in the Train cycle. hrw has started this effort for > Ubuntu/Debian source images [2] and is making good progress. > > Next we will need to consider CentOS and RHEL. It seems that for Train > RDO will provide only python 3 packages with support for CentOS 8 [3]. > There may be some overlap in the trunk (master) packages where there > is support for both CentOS 7 and 8. We will therefore need to combine > the switch to python 3 with a switch to a CentOS/RHEL 8 base image. > > Some work was started during the Stein cycle to support RHEL 8 images > with python 3 packages. There will no doubt be a few scripts that need > updating to complete this work. We'll also need to test to ensure that > both binary and source images work in this new world. > > When CentOS8 is available, we'll be working on that more with TripleO to ensure it's working and if there are issues we'll likely submit fixes as necessary. Currently https://review.opendev.org/#/c/632156/ should be the actual support for the python3 bits as currently required when using the RDO provided packages. We're not aware of any outstanding issues but if we run into them, then we will help as needed. We currently use kolla to generate the related Dockerfiles for building with RHEL8 and have posted the issues that we've run across so far. The related work for podman/buildah (if desired) is currently being discussed in a different thread. > Tripleo team - what are your plans for CentOS/RHEL 8 and python 3 this > cycle? Are you planning to continue the work started in kolla during > the Stein release? > As mentioned, we're not currently aware of any outstanding issues around this so as of Stein, the python3 related packages (when available) combined with an 8 based base image+repos should work. > > Thanks, > Mark > > [1] https://etherpad.openstack.org/p/kolla-train-ptg > [2] https://blueprints.launchpad.net/kolla/+spec/debian-ubuntu-python3 > [3] https://review.rdoproject.org/etherpad/p/moving-rdo-to-centos8 > [4] https://review.opendev.org/#/c/632156/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed Jun 5 15:49:27 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jun 2019 15:49:27 +0000 Subject: [oslo] Bandit Strategy In-Reply-To: References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> Message-ID: <20190605154927.nskpuosejx2of6rp@yuggoth.org> On 2019-06-05 10:28:35 -0500 (-0500), Ben Nemec wrote: > Since it seems we need to backport this to the stable branches [...] You've probably been following along, but a fix for https://github.com/PyCQA/bandit/issues/488 was merged upstream on May 26, so now we're just waiting for a new release to be tagged. It may make sense to spend some time lobbying them to accelerate their release process if it means less time spent backporting exclusions to a bazillion projects. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From gsteinmuller at vexxhost.com Wed Jun 5 15:57:45 2019 From: gsteinmuller at vexxhost.com (=?UTF-8?Q?Guilherme_Steinm=C3=BCller?=) Date: Wed, 5 Jun 2019 12:57:45 -0300 Subject: [horizon] dropping 2012.2 tag on pypi Message-ID: Hello, As we've discussed with nova tag recently [1], I'd suggest the same for horizon. When we search on pypi the version it shows is 2012.2 and when we click release history we can see that the most recent version is 15.1.0 [2] [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006780.html [2] https://pypi.org/project/horizon/#history Regards, Guilherme Steinmuller -------------- next part -------------- An HTML attachment was scrubbed... URL: From hberaud at redhat.com Wed Jun 5 16:16:25 2019 From: hberaud at redhat.com (Herve Beraud) Date: Wed, 5 Jun 2019 18:16:25 +0200 Subject: [oslo] Bandit Strategy In-Reply-To: <20190605154927.nskpuosejx2of6rp@yuggoth.org> References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> Message-ID: I think that waiting the bandit release is a good idea Le mer. 5 juin 2019 à 17:54, Jeremy Stanley a écrit : > On 2019-06-05 10:28:35 -0500 (-0500), Ben Nemec wrote: > > Since it seems we need to backport this to the stable branches > [...] > > You've probably been following along, but a fix for > https://github.com/PyCQA/bandit/issues/488 was merged upstream on > May 26, so now we're just waiting for a new release to be tagged. It > may make sense to spend some time lobbying them to accelerate their > release process if it means less time spent backporting exclusions > to a bazillion projects. > -- > Jeremy Stanley > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Wed Jun 5 16:27:09 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 5 Jun 2019 11:27:09 -0500 Subject: [oslo] Bandit Strategy In-Reply-To: References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> Message-ID: <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> Agreed. There's probably an argument that we should cap bandit on stable branches anyway, but it would save us a lot of tedious patches if we just hope bandit doesn't break us again. :-) On 6/5/19 11:16 AM, Herve Beraud wrote: > I think that waiting the bandit release is a good idea > > Le mer. 5 juin 2019 à 17:54, Jeremy Stanley > a écrit : > > On 2019-06-05 10:28:35 -0500 (-0500), Ben Nemec wrote: > > Since it seems we need to backport this to the stable branches > [...] 
> > You've probably been following along, but a fix for > https://github.com/PyCQA/bandit/issues/488 was merged upstream on > May 26, so now we're just waiting for a new release to be tagged. It > may make sense to spend some time lobbying them to accelerate their > release process if it means less time spent backporting exclusions > to a bazillion projects. > -- > Jeremy Stanley > > > > -- > Hervé Beraud > Senior Software Engineer > Red Hat - Openstack Oslo > irc: hberaud > -----BEGIN PGP SIGNATURE----- > > wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ > Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ > RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP > F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G > 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g > glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw > m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ > hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 > qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y > F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 > B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O > v6rDpkeNksZ9fFSyoY2o > =ECSj > -----END PGP SIGNATURE----- > From lshort at redhat.com Wed Jun 5 16:27:17 2019 From: lshort at redhat.com (Luke Short) Date: Wed, 5 Jun 2019 12:27:17 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: Hey everyone, For the upcoming work on focusing on more Ansible automation and testing, I have created a dedicated #tripleo-transformation channel for our new squad. Feel free to join if you are interested in joining and helping out! +1 to removing repositories we don't use, especially if they have no working code. I'd like to see the consolidation of TripleO specific things into the tripleo-ansible repository and then using upstream Ansible roles for all of the different services (nova, glance, cinder, etc.). Sincerely, Luke Short, RHCE Software Engineer, OpenStack Deployment Framework Red Hat, Inc. On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: > On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: > >> So the questions at hand are: what, if anything, should we do with these >> repositories? Should we retire them or just ignore them? Is there anyone >> using any of the roles? >> > > My initial reaction was to suggest we just ignore them, but on second > thought I'm wondering if there is anything negative if we leave them lying > around. Unless we're going to benefit from them in the future if we start > actively working in these repos, they represent obfuscation and debt, so it > might be best to retire / dispose of them. > > David > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jun 5 16:31:45 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 5 Jun 2019 17:31:45 +0100 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 15:48, Emilien Macchi wrote: > > On Wed, Jun 5, 2019 at 8:47 AM Mark Goddard wrote: >> >> Thanks for following up. It wouldn't be a trivial change to add >> buildah support in kolla, but it would have saved reimplementing the >> task parallelisation in Tripleo and would benefit others too. Never >> mind. 
> > > To be fair, at the time I wrote the code in python-tripleoclient the container tooling wasn't really stable and we weren't sure about the directions we would take yet; which is the main reason which drove us to not invest too much time into refactoring Kolla to support a tool that we weren't sure we would end up using in production for the container image building. > That's fair, sorry to grumble :) > It has been a few months now and so far it works ok for our needs; so if there is interest in supporting Buildah in Kolla then we might want to do the refactor and of course TripleO would use this new feature. > -- > Emilien Macchi From fungi at yuggoth.org Wed Jun 5 16:32:56 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jun 2019 16:32:56 +0000 Subject: [oslo] Bandit Strategy In-Reply-To: <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> Message-ID: <20190605163256.6muy2ooxwlmdissq@yuggoth.org> On 2019-06-05 11:27:09 -0500 (-0500), Ben Nemec wrote: > Agreed. There's probably an argument that we should cap bandit on > stable branches anyway, but it would save us a lot of tedious > patches if we just hope bandit doesn't break us again. :-) [...] Oh, yes, I think capping on stable is probably a fine idea regardless (we should be doing that anyway for all our static analyzers on principle). What I meant is that it would likely render those updates no longer urgent. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed Jun 5 16:35:52 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 5 Jun 2019 16:35:52 +0000 Subject: [horizon] dropping 2012.2 tag on pypi In-Reply-To: References: Message-ID: <20190605163552.bnmjqxtoncct6lxr@yuggoth.org> On 2019-06-05 12:57:45 -0300 (-0300), Guilherme Steinmüller wrote: > As we've discussed with nova tag recently [1], I'd suggest the same for > horizon. > > When we search on pypi the version it shows is 2012.2 and when we click > release history we can see that the most recent version is 15.1.0 [2] > > [1] > http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006780.html > [2] https://pypi.org/project/horizon/#history Thanks for pointing this out. Since we basically got blanket approval to do this for any official OpenStack project some years back, I've removed the 2012.2 from the horizon project on PyPI just now. If anybody spots others, please do mention them! -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From mthode at mthode.org Wed Jun 5 16:58:07 2019 From: mthode at mthode.org (Matthew Thode) Date: Wed, 5 Jun 2019 11:58:07 -0500 Subject: [requirements][kuryr][flame] openshift dificulties In-Reply-To: <20190530151739.nfzrqfstlb2sbrq5@mthode.org> References: <20190529205352.f2dxzckgvfavbvtv@mthode.org> <20190530151739.nfzrqfstlb2sbrq5@mthode.org> Message-ID: <20190605165807.jmhogmfyrxltx5b3@mthode.org> On 19-05-30 10:17:39, Matthew Thode wrote: > On 19-05-30 17:07:54, Michał Dulko wrote: > > On Wed, 2019-05-29 at 15:53 -0500, Matthew Thode wrote: > > > Openshift upstream is giving us difficulty as they are capping the > > > version of urllib3 and kubernetes we are using. > > > > > > -urllib3===1.25.3 > > > +urllib3===1.24.3 > > > -kubernetes===9.0.0 > > > +kubernetes===8.0.1 > > > > > > I've opened an issue with them but not had much luck there (and their > > > prefered solution just pushes the can down the road). > > > > > > https://github.com/openshift/openshift-restclient-python/issues/289 > > > > > > What I'd us to do is move off of openshift as our usage doesn't seem too > > > much. > > > > > > openstack/kuryr-tempest-plugin uses it for one import (and just one > > > function with that import). I'm not sure exactly what you are doing > > > with it but would it be too much to ask to move to something else? > > > > From Kuryr side it's not really much effort, we can switch to bare REST > > calls, but obviously we prefer the client. If there's much support for > > getting rid of it, we can do the switch. > > > > Right now Kyryr is only using it in that one place and it's blocking the > update of urllib3 and kubernetes for the rest of openstack. So if it's > not too much trouble it'd be nice to have happen. > > > > x/flame has it in it's constraints but I don't see any actual usage, so > > > perhaps it's a false flag. > > > > > > Please let me know what you think > > > > Any updates on this? I'd like to move forward on removing the dependency if possible. -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From jp.methot at planethoster.info Wed Jun 5 17:01:50 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Wed, 5 Jun 2019 13:01:50 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq Message-ID: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Hi, We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. 
The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : =ERROR REPORT==== 5-Jun-2019::18:50:08 === closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): missed heartbeats from client, timeout: 60s The neutron-server logs show this error: 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: The relevant service version numbers are as follow: rabbitmq-server-3.6.5-1.el7.noarch openstack-neutron-12.0.6-1.el7.noarch python2-oslo-messaging-5.35.4-1.el7.noarch Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. Best regards, Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From haleyb.dev at gmail.com Wed Jun 5 19:09:59 2019 From: haleyb.dev at gmail.com (Brian Haley) Date: Wed, 5 Jun 2019 15:09:59 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: <99c58ab2-0b01-16d0-859d-afbde7dab3fb@gmail.com> On 6/5/19 1:01 PM, Jean-Philippe Méthot wrote: > Hi, > > We had a Pike openstack setup that we updated to Queens earlier this > week. It’s a 30 compute nodes infrastructure with 2 controller nodes and > 2 network nodes, using openvswitch for networking. Since we upgraded to > queens, neutron-server on the controller nodes has been unable to > contact the openvswitch-agents through rabbitmq. The rabbitmq is > clustered on both controller nodes and has been giving us the following > error when neutron-server connections fail : > > =ERROR REPORT==== 5-Jun-2019::18:50:08 === > closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 > - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): > missed heartbeats from client, timeout: 60s > > The neutron-server logs show this error: > > 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit > [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] > [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 > is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 > seconds.: error: [Errno 104] Connection reset by peer > 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit > [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on > controller1:5672 is unreachable: error>. 
Trying again in 1 seconds.: RecoverableConnectionError: > Are there possibly any firewall rules getting in the way? Connection reset by peer usually means the other end has sent a TCP Reset, which wouldn't happen if the permissions were wrong. As a test, does this connect? $ telnet controller1 5672 Trying $IP... Connected to controller1. Escape character is '^]'. -Brian > The relevant service version numbers are as follow: > rabbitmq-server-3.6.5-1.el7.noarch > openstack-neutron-12.0.6-1.el7.noarch > python2-oslo-messaging-5.35.4-1.el7.noarch > > Rabbitmq does not show any alert. It also has plenty of memory and a > high enough file limit. The login user and credentials are fine as they > are used in other openstack services which can contact rabbitmq without > issues. > > I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing > timeouts in neutron services, etc, to no avail. I find myself at a loss > and would appreciate if anyone has any idea as to where to go from there. > > Best regards, > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > From jp.methot at planethoster.info Wed Jun 5 19:31:32 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Wed, 5 Jun 2019 15:31:32 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: <99c58ab2-0b01-16d0-859d-afbde7dab3fb@gmail.com> References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> <99c58ab2-0b01-16d0-859d-afbde7dab3fb@gmail.com> Message-ID: Hi, Thank you for your reply. There’s no firewall. However, we ended up figuring out that we were running out of tcp sockets. On a related note, we are still having issues but only with metadata fed through Neutron. Seems that it’s nova-api refusing the connection with http 500 error when the metadata-agent tries to connect to it. This is a completely different issue and may be more related to nova than neutron though, so it may very well not be the right mail thread to discuss it. Best regards, Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. > Le 5 juin 2019 à 15:09, Brian Haley a écrit : > > On 6/5/19 1:01 PM, Jean-Philippe Méthot wrote: >> Hi, >> We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : >> =ERROR REPORT==== 5-Jun-2019::18:50:08 === >> closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): >> missed heartbeats from client, timeout: 60s >> The neutron-server logs show this error: >> 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. 
Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer >> 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: > > Are there possibly any firewall rules getting in the way? Connection reset by peer usually means the other end has sent a TCP Reset, which wouldn't happen if the permissions were wrong. > > As a test, does this connect? > > $ telnet controller1 5672 > Trying $IP... > Connected to controller1. > Escape character is '^]'. > > -Brian > > >> The relevant service version numbers are as follow: >> rabbitmq-server-3.6.5-1.el7.noarch >> openstack-neutron-12.0.6-1.el7.noarch >> python2-oslo-messaging-5.35.4-1.el7.noarch >> Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. >> I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. >> Best regards, >> Jean-Philippe Méthot >> Openstack system administrator >> Administrateur système Openstack >> PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgagne at calavera.ca Wed Jun 5 19:31:43 2019 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Wed, 5 Jun 2019 15:31:43 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: Hi Jean-Philippe, On Wed, Jun 5, 2019 at 1:01 PM Jean-Philippe Méthot wrote: > > We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : > > =ERROR REPORT==== 5-Jun-2019::18:50:08 === > closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): > missed heartbeats from client, timeout: 60s > > The neutron-server logs show this error: > > 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer > 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: > > The relevant service version numbers are as follow: > rabbitmq-server-3.6.5-1.el7.noarch > openstack-neutron-12.0.6-1.el7.noarch > python2-oslo-messaging-5.35.4-1.el7.noarch > > Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. 
The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. > > I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. > We had a very similar issue after upgrading to Neutron Queens. In fact, all Neutron agents were "down" according to status API and messages weren't getting through. IIRC, this only happened in regions which had more load than the others. We applied a bunch of fixes which I suspect are only a bunch of bandaids. Here are the changes we made: * Split neutron-api from neutron-server. Create a whole new controller running neutron-api with mod_wsgi. * Increase [database]/max_overflow = 200 * Disable RabbitMQ heartbeat in oslo.messaging: [oslo_messaging_rabbit]/heartbeat_timeout_threshold = 0 * Increase [agent]/report_interval = 120 * Increase [DEFAULT]/agent_down_time = 600 We also have those sysctl configs due to firewall dropping sessions. But those have been on the server forever: net.ipv4.tcp_keepalive_time = 30 net.ipv4.tcp_keepalive_intvl = 1 net.ipv4.tcp_keepalive_probes = 5 We never figured out why a service that was working before the upgrade but no longer is. This is kind of frustrating as it caused us all short of intermittent issues and stress during our upgrade. Hope this helps. -- Mathieu From eandersson at blizzard.com Wed Jun 5 23:21:54 2019 From: eandersson at blizzard.com (Erik Olof Gunnar Andersson) Date: Wed, 5 Jun 2019 23:21:54 +0000 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: We have experienced similar issues when upgrading from Mitaka to Rocky. Distributing the RabbitMQ connections between the Rabbits helps a lot. At least with larger deployments. Since not all services re-connecting will be establishing it's connections against a single RabbitMQ server. > oslo_messaging_rabbit/kombu_failover_strategy = shuffle An alternative is to increase the SSL (and/or TCP) acceptors on RabbitMQ to allow it to process new connections faster. > num_tcp_acceptors / num_ssl_acceptors https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L35 https://groups.google.com/forum/#!topic/rabbitmq-users/0ApuN2ES0Ks > We had a very similar issue after upgrading to Neutron Queens. In fact, all Neutron agents were "down" according to status API and messages weren't getting through. IIRC, this only happened in regions which had more load than the others. We haven't quite figured this one out yet, but just after upgrade, Neutron handles about 1-2 of these per second. Restarting Neutron and it consumes messages super-fast for a few minutes and then slows down again. A few hours after the upgrade it consumes these without an issue. We ended up making similar tuning > report_interval 60 > agent_down_time 150 The most problematic for us so far has been the memory usage of Neutron. We see it peak at 8.2GB for neutron-server (rpc) instances. Which means we can only have ~10 neutron-rpc workers on a 128GB machine. 
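For reference, the knobs mentioned above live roughly here; the values are the ones quoted earlier in this thread and should be read as illustrative rather than prescriptive:

    # neutron.conf
    [DEFAULT]
    # read by neutron-server; keep it several times larger than report_interval
    agent_down_time = 150

    [agent]
    # read by the agents themselves
    report_interval = 60

    [oslo_messaging_rabbit]
    # spread reconnections across the RabbitMQ cluster members instead of
    # having every service reconnect to the same node
    kombu_failover_strategy = shuffle

Mathieu's heartbeat_timeout_threshold = 0 goes in the same [oslo_messaging_rabbit] section if you want to disable the oslo.messaging heartbeat entirely.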
Best Regards, Erik Olof Gunnar Andersson -----Original Message----- From: Mathieu Gagné Sent: Wednesday, June 5, 2019 12:32 PM To: Jean-Philippe Méthot Cc: openstack-discuss at lists.openstack.org Subject: Re: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq Hi Jean-Philippe, On Wed, Jun 5, 2019 at 1:01 PM Jean-Philippe Méthot wrote: > > We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : > > =ERROR REPORT==== 5-Jun-2019::18:50:08 === closing AMQP connection > <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): > missed heartbeats from client, timeout: 60s > > The neutron-server logs show this error: > > 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit > [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] > [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 > is unreachable: [Errno 104] Connection reset by peer. Trying again in > 1 seconds.: error: [Errno 104] Connection reset by peer > 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit > [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on > controller1:5672 is unreachable: error>. Trying again in 1 seconds.: RecoverableConnectionError: > > > The relevant service version numbers are as follow: > rabbitmq-server-3.6.5-1.el7.noarch > openstack-neutron-12.0.6-1.el7.noarch > python2-oslo-messaging-5.35.4-1.el7.noarch > > Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. > > I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. > We had a very similar issue after upgrading to Neutron Queens. In fact, all Neutron agents were "down" according to status API and messages weren't getting through. IIRC, this only happened in regions which had more load than the others. We applied a bunch of fixes which I suspect are only a bunch of bandaids. Here are the changes we made: * Split neutron-api from neutron-server. Create a whole new controller running neutron-api with mod_wsgi. * Increase [database]/max_overflow = 200 * Disable RabbitMQ heartbeat in oslo.messaging: [oslo_messaging_rabbit]/heartbeat_timeout_threshold = 0 * Increase [agent]/report_interval = 120 * Increase [DEFAULT]/agent_down_time = 600 We also have those sysctl configs due to firewall dropping sessions. But those have been on the server forever: net.ipv4.tcp_keepalive_time = 30 net.ipv4.tcp_keepalive_intvl = 1 net.ipv4.tcp_keepalive_probes = 5 We never figured out why a service that was working before the upgrade but no longer is. This is kind of frustrating as it caused us all short of intermittent issues and stress during our upgrade. Hope this helps. 
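(For completeness, the acceptor tuning mentioned earlier in this mail is a rabbitmq.config setting, something along these lines with whatever counts make sense for your cluster; raising them mainly matters when a lot of clients reconnect at the same time:

    [
      {rabbit, [
        {num_tcp_acceptors, 20},
        {num_ssl_acceptors, 20}
      ]}
    ].

)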
-- Mathieu From cjeanner at redhat.com Thu Jun 6 05:43:36 2019 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Thu, 6 Jun 2019 07:43:36 +0200 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: <379bd822-05dc-edd5-704c-8ae8ed37b32b@redhat.com> Even if I'm no core: huge +1 :) On 6/5/19 4:31 PM, Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing > really insightful reviews, specially on Python best practices but not > only; he is one of the major contributors of the OVN integration, which > was a ton of work. I believe he has the right knowledge to review any > TripleO patch and provide excellent reviews in our project. We're lucky > to have him with us in the team! > > I would like to propose him core on TripleO, please raise any objection > if needed. > -- > Emilien Macchi -- Cédric Jeanneret Software Engineer - OpenStack Platform Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From mjozefcz at redhat.com Thu Jun 6 06:33:18 2019 From: mjozefcz at redhat.com (Maciej Jozefczyk) Date: Thu, 6 Jun 2019 08:33:18 +0200 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: Congratulations! On Wed, Jun 5, 2019 at 4:45 PM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > -- > Emilien Macchi > -- Best regards, Maciej Józefczyk -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdulko at redhat.com Thu Jun 6 07:13:46 2019 From: mdulko at redhat.com (=?UTF-8?Q?Micha=C5=82?= Dulko) Date: Thu, 06 Jun 2019 09:13:46 +0200 Subject: [requirements][kuryr][flame] openshift dificulties In-Reply-To: <20190605165807.jmhogmfyrxltx5b3@mthode.org> References: <20190529205352.f2dxzckgvfavbvtv@mthode.org> <20190530151739.nfzrqfstlb2sbrq5@mthode.org> <20190605165807.jmhogmfyrxltx5b3@mthode.org> Message-ID: <06337c09a594e16e40086b6c64495a59c3e6cd84.camel@redhat.com> On Wed, 2019-06-05 at 11:58 -0500, Matthew Thode wrote: > On 19-05-30 10:17:39, Matthew Thode wrote: > > On 19-05-30 17:07:54, Michał Dulko wrote: > > > On Wed, 2019-05-29 at 15:53 -0500, Matthew Thode wrote: > > > > Openshift upstream is giving us difficulty as they are capping the > > > > version of urllib3 and kubernetes we are using. > > > > > > > > -urllib3===1.25.3 > > > > +urllib3===1.24.3 > > > > -kubernetes===9.0.0 > > > > +kubernetes===8.0.1 > > > > > > > > I've opened an issue with them but not had much luck there (and their > > > > prefered solution just pushes the can down the road). > > > > > > > > https://github.com/openshift/openshift-restclient-python/issues/289 > > > > > > > > What I'd us to do is move off of openshift as our usage doesn't seem too > > > > much. > > > > > > > > openstack/kuryr-tempest-plugin uses it for one import (and just one > > > > function with that import). 
I'm not sure exactly what you are doing > > > > with it but would it be too much to ask to move to something else? > > > > > > From Kuryr side it's not really much effort, we can switch to bare REST > > > calls, but obviously we prefer the client. If there's much support for > > > getting rid of it, we can do the switch. > > > > > > > Right now Kyryr is only using it in that one place and it's blocking the > > update of urllib3 and kubernetes for the rest of openstack. So if it's > > not too much trouble it'd be nice to have happen. > > > > > > x/flame has it in it's constraints but I don't see any actual usage, so > > > > perhaps it's a false flag. > > > > > > > > Please let me know what you think > > > > > > Any updates on this? I'd like to move forward on removing the > dependency if possible. > Sure, I'm waiting for some spare time to do this. Fastest it may happen will probably be next week. From amotoki at gmail.com Thu Jun 6 07:19:18 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Thu, 6 Jun 2019 16:19:18 +0900 Subject: [upgrade][neutron] supported release window on rolling upgrade Message-ID: The neutron team is discussing how many releases we should support in RPC messages [1] (to drop downgrade codes in OVO). This affects rolling upgrade scenarios. Controller nodes are upgrade in FFU way, but we cannot upgrade compute nodes at once. This means controller nodes with N+X release need to talk compute nodes with N release. As of now, the neutron team is thinking to support LTS to LTS upgrade scenarios for major distributions and N->N+4 looks like the longest window. Rolling upgrade scenarios affect not only neutron but also other projects like nova, so I am sending this mail for broader input. [1] https://review.opendev.org/#/c/661995/ Thanks, Akihiro Motoki (irc: amotoki) From mark at stackhpc.com Thu Jun 6 08:19:09 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 6 Jun 2019 09:19:09 +0100 Subject: [kolla][tripleo] Python 3, CentOS/RHEL 8 In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 16:42, Alex Schultz wrote: > > > > On Wed, Jun 5, 2019 at 3:19 AM Mark Goddard wrote: >> >> Hi, >> >> At the recent kolla virtual PTG [1], we discussed the move to python 3 >> images in the Train cycle. hrw has started this effort for >> Ubuntu/Debian source images [2] and is making good progress. >> >> Next we will need to consider CentOS and RHEL. It seems that for Train >> RDO will provide only python 3 packages with support for CentOS 8 [3]. >> There may be some overlap in the trunk (master) packages where there >> is support for both CentOS 7 and 8. We will therefore need to combine >> the switch to python 3 with a switch to a CentOS/RHEL 8 base image. >> >> Some work was started during the Stein cycle to support RHEL 8 images >> with python 3 packages. There will no doubt be a few scripts that need >> updating to complete this work. We'll also need to test to ensure that >> both binary and source images work in this new world. >> > > When CentOS8 is available, we'll be working on that more with TripleO to ensure it's working and if there are issues we'll likely submit fixes as necessary. Currently https://review.opendev.org/#/c/632156/ should be the actual support for the python3 bits as currently required when using the RDO provided packages. We're not aware of any outstanding issues but if we run into them, then we will help as needed. 
We currently use kolla to generate the related Dockerfiles for building with RHEL8 and have posted the issues that we've run across so far. The related work for podman/buildah (if desired) is currently being discussed in a different thread. Thanks for clarifying. I expect we'll have some kinks in the CentOS source images to iron out (e.g. install python3-devel) but hopefully the majority should be covered by https://review.opendev.org/#/c/632156/. There will also be the less glamorous cleanup tasks to remove python 2 support, but they won't block python 3 images. > >> >> Tripleo team - what are your plans for CentOS/RHEL 8 and python 3 this >> cycle? Are you planning to continue the work started in kolla during >> the Stein release? > > > As mentioned, we're not currently aware of any outstanding issues around this so as of Stein, the python3 related packages (when available) combined with an 8 based base image+repos should work. > >> >> >> Thanks, >> Mark >> >> [1] https://etherpad.openstack.org/p/kolla-train-ptg >> [2] https://blueprints.launchpad.net/kolla/+spec/debian-ubuntu-python3 >> [3] https://review.rdoproject.org/etherpad/p/moving-rdo-to-centos8 >> [4] https://review.opendev.org/#/c/632156/ >> From florian.engelmann at everyware.ch Thu Jun 6 08:20:53 2019 From: florian.engelmann at everyware.ch (Florian Engelmann) Date: Thu, 6 Jun 2019 10:20:53 +0200 Subject: [telemetry] volume_type_id stored instead of volume_type name Message-ID: <4081acb6-be89-3249-e535-67c192be3743@everyware.ch> Hi, some volumes are stored with the volume_type Id instead of the volume_type name: openstack metric resource history --details b5496a42-c766-4267-9248-6149aa9dd483 -c id -c revision_start -c revision_end -c instance_id -c volume_type +--------------------------------------+----------------------------------+----------------------------------+--------------------------------------+--------------------------------------+ | id | revision_start | revision_end | instance_id | volume_type | +--------------------------------------+----------------------------------+----------------------------------+--------------------------------------+--------------------------------------+ | b5496a42-c766-4267-9248-6149aa9dd483 | 2019-05-08T07:21:35.354474+00:00 | 2019-05-21T09:18:32.767426+00:00 | 662998da-c3d1-45c5-9120-2cff6240e3b6 | v-ssd-std | | b5496a42-c766-4267-9248-6149aa9dd483 | 2019-05-21T09:18:32.767426+00:00 | 2019-05-21T09:18:32.845700+00:00 | 662998da-c3d1-45c5-9120-2cff6240e3b6 | v-ssd-std | | b5496a42-c766-4267-9248-6149aa9dd483 | 2019-05-21T09:18:32.845700+00:00 | None | 662998da-c3d1-45c5-9120-2cff6240e3b6 | 8bd7e1b1-3396-49bf-802c-8c31a9444895 | +--------------------------------------+----------------------------------+----------------------------------+--------------------------------------+--------------------------------------+ I was not able to find anything fishy in ceilometer. So I guess it could be some event/notification with a wrong payload? Could anyone please verify this error is not uniq to our (rocky) environment by running: openstack metric resource list --type volume -c id -c volume_type All the best, Florian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5230 bytes Desc: not available URL: From thierry at openstack.org Thu Jun 6 08:22:53 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 6 Jun 2019 10:22:53 +0200 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status In-Reply-To: References: Message-ID: Mark Goddard wrote: > [...] > We see two options for becoming an official OpenStack project: > > 1. become a deliverable of the Kolla project > 2. become an official top level OpenStack project > > Given the affinity with the Kolla project I feel that option 1 seems > natural. However, I do not want to use influence as PTL to force this > approach. > [...] From a governance perspective, the two options are definitely possible. Kayobe can be seen as one of the Kolla-derived deployment tools, or it can be seen as a new deployment tool combining two existing projects (Kolla and Bifrost). Project teams are cheap: the best solution is the one that best aligns to the social reality. So I'd say the decision depends on how much independence Kayobe wants to have from Kolla. Having a separate project team will for example make it easier to have separate meetings, but harder to have common meetings. How much of a separate team is Kayobe from Kolla? How much do you want it to stay that way? -- Thierry Carrez (ttx) From thierry at openstack.org Thu Jun 6 08:25:14 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 6 Jun 2019 10:25:14 +0200 Subject: [airship] Is Ironic ready for Airship? In-Reply-To: <15fc408f.c51c.16b27b2a86d.Coremail.guoyongxhzhf@163.com> References: <15fc408f.c51c.16b27b2a86d.Coremail.guoyongxhzhf@163.com> Message-ID: <09029ebf-93c1-2f1f-7e02-ec55fef51f60@openstack.org> 郭勇 wrote: > I know Airship choose Maas as bare mental management tool. > > I want to know whether Maas is more suitable for Airship when it comes > to under-infrastructure? > > If Maas is more suitable, then what feature should ironic develop? Note that airship has its own discussion list: http://lists.airshipit.org/cgi-bin/mailman/listinfo/airship-discuss This is an openstack-specific discussion list. -- Thierry Carrez (ttx) From hberaud at redhat.com Thu Jun 6 08:39:18 2019 From: hberaud at redhat.com (Herve Beraud) Date: Thu, 6 Jun 2019 10:39:18 +0200 Subject: [oslo] Bandit Strategy In-Reply-To: <20190605163256.6muy2ooxwlmdissq@yuggoth.org> References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> <20190605163256.6muy2ooxwlmdissq@yuggoth.org> Message-ID: +1 Le mer. 5 juin 2019 à 18:38, Jeremy Stanley a écrit : > On 2019-06-05 11:27:09 -0500 (-0500), Ben Nemec wrote: > > Agreed. There's probably an argument that we should cap bandit on > > stable branches anyway, but it would save us a lot of tedious > > patches if we just hope bandit doesn't break us again. :-) > [...] > > Oh, yes, I think capping on stable is probably a fine idea > regardless (we should be doing that anyway for all our static > analyzers on principle). What I meant is that it would likely render > those updates no longer urgent. 
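(In practice that cap is just a pin, typically in each project's test-requirements.txt, roughly:

    # first patch, merged immediately to unblock CI on the affected branch
    bandit<1.6.0

    # follow-up once 1.6.1 is confirmed to fix things: only exclude the bad release
    bandit!=1.6.0

with any lower bounds left as whatever the branch already carries.)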
> -- > Jeremy Stanley > -- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Thu Jun 6 08:49:44 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 6 Jun 2019 09:49:44 +0100 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status In-Reply-To: References: Message-ID: On Thu, 6 Jun 2019 at 09:27, Thierry Carrez wrote: > > Mark Goddard wrote: > > [...] > > We see two options for becoming an official OpenStack project: > > > > 1. become a deliverable of the Kolla project > > 2. become an official top level OpenStack project > > > > Given the affinity with the Kolla project I feel that option 1 seems > > natural. However, I do not want to use influence as PTL to force this > > approach. > > [...] > > From a governance perspective, the two options are definitely possible. > > Kayobe can be seen as one of the Kolla-derived deployment tools, or it > can be seen as a new deployment tool combining two existing projects > (Kolla and Bifrost). Project teams are cheap: the best solution is the > one that best aligns to the social reality. > > So I'd say the decision depends on how much independence Kayobe wants to > have from Kolla. Having a separate project team will for example make it > easier to have separate meetings, but harder to have common meetings. > How much of a separate team is Kayobe from Kolla? How much do you want > it to stay that way? Right now the intersection of the core teams is only me. While all Kayobe contributors are familiar with Kolla projects, the reverse is not true. This is partly because Kolla and/or Kolla Ansible can be used without Kayobe, and partly because Kayobe is a newer project which typically gets adopted at the beginning of a cloud deployment. It certainly seems to make sense from the Kayobe community perspective to join these communities. I think the question the Kolla team needs to ask is whether the benefit of a more complete set of tooling is worth the overhead of adding a new deliverable that may not be used by all contributors or in all deployments. > > -- > Thierry Carrez (ttx) > From mark at stackhpc.com Thu Jun 6 08:51:26 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 6 Jun 2019 09:51:26 +0100 Subject: [kolla][tripleo] Podman/buildah In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 17:31, Mark Goddard wrote: > > On Wed, 5 Jun 2019 at 15:48, Emilien Macchi wrote: > > > > On Wed, Jun 5, 2019 at 8:47 AM Mark Goddard wrote: > >> > >> Thanks for following up. 
It wouldn't be a trivial change to add > >> buildah support in kolla, but it would have saved reimplementing the > >> task parallelisation in Tripleo and would benefit others too. Never > >> mind. > > > > > > To be fair, at the time I wrote the code in python-tripleoclient the container tooling wasn't really stable and we weren't sure about the directions we would take yet; which is the main reason which drove us to not invest too much time into refactoring Kolla to support a tool that we weren't sure we would end up using in production for the container image building. > > > That's fair, sorry to grumble :) > > It has been a few months now and so far it works ok for our needs; so if there is interest in supporting Buildah in Kolla then we might want to do the refactor and of course TripleO would use this new feature. If there are resources to do it, I'm sure the Kolla team would be receptive. > > -- > > Emilien Macchi From moguimar at redhat.com Thu Jun 6 08:52:32 2019 From: moguimar at redhat.com (Moises Guimaraes de Medeiros) Date: Thu, 6 Jun 2019 10:52:32 +0200 Subject: [oslo] Bandit Strategy In-Reply-To: References: <4638b722-fff4-8387-726e-b75800f59186@nemebean.com> <20190605154927.nskpuosejx2of6rp@yuggoth.org> <26d80bb6-9893-6e5d-5be6-abce99c2dd76@nemebean.com> <20190605163256.6muy2ooxwlmdissq@yuggoth.org> Message-ID: +1 Jeremy Em qui, 6 de jun de 2019 às 10:42, Herve Beraud escreveu: > +1 > > Le mer. 5 juin 2019 à 18:38, Jeremy Stanley a écrit : > >> On 2019-06-05 11:27:09 -0500 (-0500), Ben Nemec wrote: >> > Agreed. There's probably an argument that we should cap bandit on >> > stable branches anyway, but it would save us a lot of tedious >> > patches if we just hope bandit doesn't break us again. :-) >> [...] >> >> Oh, yes, I think capping on stable is probably a fine idea >> regardless (we should be doing that anyway for all our static >> analyzers on principle). What I meant is that it would likely render >> those updates no longer urgent. >> -- >> Jeremy Stanley >> > > > -- > Hervé Beraud > Senior Software Engineer > Red Hat - Openstack Oslo > irc: hberaud > -----BEGIN PGP SIGNATURE----- > > wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ > Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ > RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP > F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G > 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g > glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw > m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ > hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 > qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y > F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 > B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O > v6rDpkeNksZ9fFSyoY2o > =ECSj > -----END PGP SIGNATURE----- > > -- Moisés Guimarães Software Engineer Red Hat -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre-samuel.le-stang at corp.ovh.com Thu Jun 6 09:13:14 2019 From: pierre-samuel.le-stang at corp.ovh.com (Pierre-Samuel LE STANG) Date: Thu, 6 Jun 2019 11:13:14 +0200 Subject: [ops] database archiving tool In-Reply-To: <20190509151428.im2c6dbxpv6hwhyo@corp.ovh.com> References: <20190509151428.im2c6dbxpv6hwhyo@corp.ovh.com> Message-ID: <20190606091215.p7pms5bvwyo7qm6d@corp.ovh.com> Hi all, We finally opensourced the tool on our github repository. 
You may get it here: https://github.com/ovh/osarchiver/ Thanks for your feedbacks. -- PS Pierre-Samuel LE STANG wrote on jeu. [2019-mai-09 17:14:35 +0200]: > Hi all, > > At OVH we needed to write our own tool that archive data from OpenStack > databases to prevent some side effect related to huge tables (slower response > time, changing MariaDB query plan) and to answer to some legal aspects. > > So we started to write a python tool which is called OSArchiver that I briefly > presented at Denver few days ago in the "Optimizing OpenStack at large scale" > talk. We think that this tool could be helpful to other and are ready to open > source it, first we would like to get the opinion of the ops community about > that tool. > > To sum-up OSArchiver is written to work regardless of Openstack project. The > tool relies on the fact that soft deleted data are recognizable because of > their 'deleted' column which is set to 1 or uuid and 'deleted_at' column which > is set to the date of deletion. > > The points to have in mind about OSArchiver: > * There is no knowledge of business objects > * One table might be archived if it contains 'deleted' column > * Children rows are archived before parents rows > * A row can not be deleted if it fails to be archived > > Here are features already implemented: > * Archive data in an other database and/or file (actually SQL and CSV > formats are supported) to be easily imported > * Delete data from Openstack databases > * Customizable (retention, exclude DBs, exclude tables, bulk insert/delete) > * Multiple archiving configuration > * Dry-run mode > * Easily extensible, you can add your own destination module (other file > format, remote storage etc...) > * Archive and/or delete only mode > > It also means that by design you can run osarchiver not only on OpenStack > databases but also on archived OpenStack databases. > > Thanks in advance for your feedbacks. > > -- > Pierre-Samuel Le Stang -- Pierre-Samuel Le Stang From Vrushali.Kamde at nttdata.com Thu Jun 6 09:32:09 2019 From: Vrushali.Kamde at nttdata.com (Kamde, Vrushali) Date: Thu, 6 Jun 2019 09:32:09 +0000 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' Message-ID: Hi, Working on implementation of 'Support filtering of allocation_candidates by forbidden aggregates' spec. Need discussion particularly for point [1] where traits needs to be sync along with aggregates at placement. Master implementation for 'nova-manage placement sync_aggregates' command is to sync the nova host aggregates. Modifying this command to sync trait metadata of aggregate at placement. Below are the aggregate restful APIs which currently supports: 1. 'POST'-- /os-aggregates/{aggregate_id}/action(Add host) getting synced on the placement service 2. 'POST'-- /os-aggregates/{aggregate_id}/action(Remove host) getting synced on the placement service 3. 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) Doesn't get sync on the placement service. 4. 'POST'-- /os-aggregates/{aggregate_id}/action(unset metadata) Doesn't get sync on the placement service. I have added code to sync traits for below APIs and I don't see any issues there: 1. 'POST'-- /os-aggregates/{aggregate_id}/action(Add host) 2. 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) But there is an issue while removing traits for below APIs: 1. 'POST'-- /os-aggregates/{aggregate_id}/action(Remove host) 2. 
'POST'-- /os-aggregates/{aggregate_id}/action(unset metadata) Ideally, we should remove traits set in the aggregate metadata from the resource providers associated with the aggregate for above two APIs but it could cause a problem for below scenario:- For example: 1. Create two aggregates 'agg1' and 'agg2' by using: 'POST'-- /os-aggregates(Create aggregate) 2. Associate above aggregates to host 'RP1' by using: 'POST'-- /os-aggregates/{aggregate_id}/action(Add host) 3. Setting metadata (trait:STORAGE_DISK_SSD='required') on the aggregate agg1 by using: 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) 4. Setting metadata (trait:STORAGE_DISK_SSD='required', trait:HW_CPU_X86_SGX='required') on the aggregate agg2 by using: 'POST'-- /os-aggregates/{aggregate_id}/action(set metadata) Traits set to 'RP1' are: STORAGE_DISK_SSD HW_CPU_X86_SGX Note: Here trait 'STORAGE_DISK_SSD' is set on both agg1 and agg2. Now, If we remove host 'RP1' from 'agg1' then the trait 'STORAGE_DISK_SSD' set to `RP1` also needs to be removed but since 'RP1' is also assigned to 'agg2', removing 'STORAGE_DISK_SSD' trait from 'RP1' is not correct. I have discussed about syncing traits issues with Eric on IRC [2], he has suggested few approaches as below: - Leave all traits alone. If they need to be removed, it would have to be manually via a separate step. - Support a new option so the caller can dictate whether the operation should remove the traits. (This is all-or-none.) - Define a "namespace" - a trait substring - and remove only traits in that namespace. If I'm not wrong, for last two approaches, we would need to change RestFul APIs. Need your feedback whether traits should be deleted from resource provider or not for below two cases? 1. 'POST'-- /os-aggregates/{aggregate_id}/action(Remove host) 2. 'POST'-- /os-aggregates/{aggregate_id}/action(unset metadata) [1]: https://review.opendev.org/#/c/609960/8/specs/train/approved/placement-req-filter-forbidden-aggregates.rst at 203 [2]: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-05-30.log.html Thanks & Regards, Vrushali Kamde Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed... URL: From marios at redhat.com Thu Jun 6 09:43:43 2019 From: marios at redhat.com (Marios Andreou) Date: Thu, 6 Jun 2019 12:43:43 +0300 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: +1 haven't really worked with Kamil but *have* noticed him out and about in gerrit reviews. So I just did a quick code review review ;) and i see that he is there [1] - not by itself the most important thing but it demonstrates some dedication to TripleO for a while now! Looking at some recent random reviews agree Kamil would be a great addition thanks! [1] https://www.stackalytics.com/report/contribution/tripleo-group/360 On Wed, Jun 5, 2019 at 5:34 PM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. 
I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Thu Jun 6 09:55:17 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 6 Jun 2019 11:55:17 +0200 Subject: [Release-job-failures] Tag of openstack/ironic failed In-Reply-To: References: Message-ID: <48722ba6-f862-490b-926b-508625631aef@openstack.org> zuul at openstack.org wrote: > Build failed. > > - publish-openstack-releasenotes-python3 http://logs.openstack.org/c9/c9009f704afed7579c9d8dfcf7b774623966ef5b/tag/publish-openstack-releasenotes-python3/d6f47db/ : POST_FAILURE in 13m 56s This error occurred after tagging openstack/ironic 11.1.3, due to some transient network issue during release notes build rsync: http://zuul.openstack.org/build/d6f47db4f78b44599c4036a7039a1f5b It prevented release notes for 11.1.3 from being published. However the ironic release notes were regenerated after that, resulting in proper publication: http://zuul.openstack.org/build/e1b75f44857e44b1a94c2499e8b5f742 https://docs.openstack.org/releasenotes/ironic/rocky.html#relnotes-11-1-3-stable-rocky Note that the release pipeline jobs completed successfully, so the release itself is OK. Impact: Release notes for ironic 11.1.3 were unavailable for 30min. TODO: None -- Thierry Carrez (ttx) From tobias.rydberg at citynetwork.eu Thu Jun 6 10:04:52 2019 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Thu, 6 Jun 2019 12:04:52 +0200 Subject: [sigs][publiccloud][publiccloud-wg][publiccloud-sig][billing] Bi-weekly meeting today at 1400 UTC Message-ID: Hi all, This is a reminder for todays meeting for the Public Cloud SIG - 1400 UTC in #openstack-publiccloud. The main focus for the meeting will be continues discussions regarding the billing initiative. More information about that at https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal Agenda at: https://etherpad.openstack.org/p/publiccloud-wg See you all later today! Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From bdobreli at redhat.com Thu Jun 6 10:04:57 2019 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Thu, 6 Jun 2019 12:04:57 +0200 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: <7546e166-adc3-21fc-12c8-8ce1f75069b1@redhat.com> On 05.06.2019 16:31, Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing > really insightful reviews, specially on Python best practices but not > only; he is one of the major contributors of the OVN integration, which > was a ton of work. I believe he has the right knowledge to review any > TripleO patch and provide excellent reviews in our project. We're lucky +1 > to have him with us in the team! > > I would like to propose him core on TripleO, please raise any objection > if needed. 
> -- > Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando From hjensas at redhat.com Thu Jun 6 10:43:49 2019 From: hjensas at redhat.com (Harald =?ISO-8859-1?Q?Jens=E5s?=) Date: Thu, 06 Jun 2019 12:43:49 +0200 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: <21e6bd11ac136fffe325d6726d6c615574982f9e.camel@redhat.com> On Wed, 2019-06-05 at 10:31 -0400, Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing > really insightful reviews, specially on Python best practices but not > only; he is one of the major contributors of the OVN integration, > which was a ton of work. I believe he has the right knowledge to > review any TripleO patch and provide excellent reviews in our > project. We're lucky to have him with us in the team! > > I would like to propose him core on TripleO, please raise any > objection if needed. > -- > Emilien Macchi +1, I've seen Kamil around gerrit providing insightful reviews. Thanks Kamil! From doka.ua at gmx.com Thu Jun 6 10:49:37 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Thu, 6 Jun 2019 13:49:37 +0300 Subject: [glance] zeroing image, preserving other parameters In-Reply-To: <6faa9bd8-c273-ae19-1fa1-9ecaa5a7b94c@gmail.com> References: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> <6faa9bd8-c273-ae19-1fa1-9ecaa5a7b94c@gmail.com> Message-ID: <22a9d4d2-ebec-ce9d-97c3-cbc25bd9f859@gmx.com> Hi Brian, thanks for the response. I solved my issue from other (client) side - I'm using Heat and Heat don't look whether uuid of image changed, it just check for existense of image with specified name. So it's safe to delete image and then create another one with same name and parameters and zero size. But in fact, Glance has a bit contradictory approach: Documentation on db purge says: "Remember that image identifiers are used by other OpenStack services that require access to images. These services expect that when an image is requested by ID, they will receive the same data every time." but there are no ways to get list of images including 'deleted' or details of 'deleted' image, e.g. doka at lagavulin(admin at admin):~$ openstack image show b179ecee-775d-4ee4-81c0-d3ec3a769d35 Could not find resource b179ecee-775d-4ee4-81c0-d3ec3a769d35 so preserving image record in database makes no sense for 3rd party services, which talk to Glance over public API. On the other hand, having in DB API ready for use 'image_destroy' call, it's pretty easy (of course, for those who work with Glance code :-) ) to add public API call kind of images/{image_id}/actions/destroy , calling DB API's image_destroy. And, in that case, it makes sense to allow image uuid to be specified during image create (since client can purge specified record and recreate it using same characteristics), otherwise I don't see where, in general, specifying uuid (when creating image) can be useful. The good news is that I solved my problem. The bad news is that solution relies on relaxed requirements of 3rd party products but not on Glance's API itself :-) Thanks! On 6/5/19 5:38 PM, Brian Rosmaita wrote: > On 6/5/19 8:34 AM, Volodymyr Litovka wrote: >> Dear colleagues, >> >> for some reasons, I need to shrink image size to zero (freeing storage >> as well), while keeping this record in Glance database. >> >> First which come to my mind is to delete image and then create new one >> with same name/uuid/... 
and --file /dev/null, but this is impossible >> because Glance don't really delete records from database, marking them >> as 'deleted' instead. > The glance-manage utility program allows you to purge the database. The > images table (where the image UUIDs are stored) is not purged by default > because of OSSN-0075 [0]. See the glance docs [1] for details. > > [0] https://wiki.openstack.org/wiki/OSSN/OSSN-0075 > [1] > https://docs.openstack.org/glance/latest/admin/db.html#database-maintenance > > (That doesn't really help your issue, I just wanted to point out that > there is a way to purge the database.) > >> Next try was to use glance image-upload from /dev/null, but this is also >> prohibited with message "409 Conflict: Image status transition from >> [activated, deactivated] to saving is not allowed (HTTP 409)" > That's correct, Glance will not allow you to replace the image data once > an image has gone to 'active' status. > >> I found >> https://docs.openstack.org/glance/rocky/contributor/database_architecture.html#glance-database-public-api's >> >> "image_destroy" but have no clues on how to access this API. Is it kind >> of library or kind of REST API, how to access it and whether it's safe >> to use it in terms of longevity and compatibility between versions? > The title of that document is misleading. It describes the interface > that Glance developers can use when they need to interact with the > database. There's no tool that exposes those operations to operators. > >> Or, may be, you can advise any other methods to solve the problem of >> zeroing glance image data / freeing storage, while keeping in database >> just a record about this image? > If you purged the database, you could do your proposal to recreate the > image with a zero-size file -- but that would give you an image with > status 'active' that an end user could try to boot an instance with. I > don't think that's a good idea. Additionally, purging the images table > of all UUIDs, not just the few you want to replace, exposes you to > OSSN-0075. > > An alternative--and I'm not sure this is a good idea either--would be to > deactivate the image [2]. This would preserve all the current metadata > but not allow the image to be downloaded by a non-administrator. With > the image not in 'active' status, nova or cinder won't try to use it to > create instances or volumes. The image data would still exist, though, > so you'd need to delete it manually from the backend to really clear out > the space. Additionally, the image size would remain, which might be > useful for record-keeping, although on the other hand, it will still > count against the user_storage_quota. And the image locations will > still exist even though they won't refer to any existing data any more. > (Like I said, I'm not sure this is a good idea.) > > [2] https://developer.openstack.org/api-ref/image/v2/#deactivate-image > >> Thank you. > Not sure I was much help. Let's see if other operators have a good > workaround or a need for this kind of functionality. > >> -- >> Volodymyr Litovka >>   "Vision without Execution is Hallucination." -- Thomas Edison >> >> > -- Volodymyr Litovka "Vision without Execution is Hallucination." 
-- Thomas Edison From beagles at redhat.com Thu Jun 6 10:51:55 2019 From: beagles at redhat.com (Brent Eagles) Date: Thu, 6 Jun 2019 08:21:55 -0230 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 12:04 PM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > -- > Emilien Macchi > +1, indeed!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jun 6 11:35:23 2019 From: gmann at ghanshyammann.com (gmann at ghanshyammann.com) Date: Thu, 06 Jun 2019 20:35:23 +0900 Subject: [nova] API updates week 19-23 Message-ID: <16b2c92714e.10196b432132392.265414056494798985@ghanshyammann.com> Hi All, Please find the Nova API updates of this week. API Related BP : ============ Code Ready for Review: ------------------------------ 1. Support adding description while locking an instance: - Topic: https://review.opendev.org/#/q/topic:bp/add-locked-reason+(status:open+OR+status:merged) - Weekly Progress: OSC patch has been updated by tssurya. 2. Add host and hypervisor_hostname flag to create server - Topic: https://review.opendev.org/#/q/topic:bp/add-host-and-hypervisor-hostname-flag-to-create-server+(status:open+OR+status:merged) - Weekly Progress: patches have been updated with review comments. 3. Detach and attach boot volumes: - Topic: https://review.openstack.org/#/q/topic:bp/detach-boot-volume+(status:open+OR+status:merged) - Weekly Progress: No Progress Spec Ready for Review: ----------------------------- 1. Nova API policy improvement - Spec: https://review.openstack.org/#/c/547850/ - PoC: https://review.openstack.org/#/q/topic:bp/policy-default-refresh+(status:open+OR+status:merged) - Weekly Progress: Under review and updates. 2. Support for changing deleted_on_termination after boot -Spec: https://review.openstack.org/#/c/580336/ - Weekly Progress: No update this week. 3. Nova API cleanup - Spec: https://review.openstack.org/#/c/603969/ - Weekly Progress: Spec is merged. 4. Specifying az when restore shelved server - Spec: https://review.openstack.org/#/c/624689/ - Weekly Progress: Spec is updated for review comments. 5. Support delete_on_termination in volume attach api -Spec: https://review.openstack.org/#/c/612949/ - Weekly Progress: No updates this week. 7. Add API ref guideline for body text - ~8 api-ref are left to fix. Previously approved Spec needs to be re-proposed for Train: --------------------------------------------------------------------------- 1. Servers Ips non-unique network names : - https://blueprints.launchpad.net/nova/+spec/servers-ips-non-unique-network-names - https://review.openstack.org/#/q/topic:bp/servers-ips-non-unique-network-names+(status:open+OR+status:merged) 2. Volume multiattach enhancements: - https://blueprints.launchpad.net/nova/+spec/volume-multiattach-enhancements - https://review.openstack.org/#/q/topic:bp/volume-multiattach-enhancements+(status:open+OR+status:merged) Bugs: ==== No progress report in this week. 
NOTE- There might be some bug which is not tagged as 'api' or 'api-ref', those are not in the above list. Tag such bugs so that we can keep our eyes. -gmann From geguileo at redhat.com Thu Jun 6 12:00:06 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 6 Jun 2019 14:00:06 +0200 Subject: [k8s-sig][cinder] Ember-CSI goes Beta! Message-ID: <20190606120006.rhqoiomfgm4mvksa@localhost> Hi, The Ember-CSI team is happy to announce the release of version v0.9, marking the graduation of the project into Beta. Ember-CSI is where Kubernetes storage and OpenStack storage intersect. By leveraging existing Cinder drivers via cinderlib [1], Ember-CSI is able to provide block and mount storage to containers in any Kubernetes cluster supporting the Container Storage Interface (CSI) in a lightweight solution, as it doesn't need to deploy a DBMS or Message Broker systems, and doesn't need to deploy the usual Cinder API, Volume, or Scheduler services either. Key features of the project are: - Multi-driver support on single container - Support for mount filesystems - Support for block - Topology support - Snapshot support - Liveness probe - No need to deploy a DBMS in K8s (uses CRD for metadata) - Multi-CSI version support on single container (v0.2, v0.3, and v1.0) - Storage driver list tool - Support live debugging of running driver - Duplicated requests queuing support (for k8s) - Support of mocked probe (when using faulty sidecars) - Configurable default mount filesystem The Beta is available in Docker Hub [2] -under "stable" and "ember_0.9.0-stein" tags- as well as in PyPi [3]. After this milestone, where we have achieved feature parity with CSI v1.0, we will mostly focus on the areas we consider necessary for the transition into GA: upgrading mechanism, performance improvements, and documentation. If time permits, we will also work on features from newer CSI spec versions, such as volume expansion. For those interested in the project, the team can be reached on FreeNode's #ember-csi channel and at the Google group [4]. We also have a small site [5] with articles on how to try it on K8s with 2 backends (LVM and Ceph) and the github org [6]. Cheers, Gorka. [1]: https://opendev.org/openstack/cinderlib [2]: https://hub.docker.com/r/embercsi/ember-csi [3]: https://pypi.org/project/ember-csi/ [4]: https://groups.google.com/forum/#!forum/embercsi [5]: https://ember-csi.io [6]: https://github.com/embercsi/ From rosmaita.fossdev at gmail.com Thu Jun 6 12:10:17 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Thu, 6 Jun 2019 08:10:17 -0400 Subject: [glance] zeroing image, preserving other parameters In-Reply-To: <22a9d4d2-ebec-ce9d-97c3-cbc25bd9f859@gmx.com> References: <1e0e0c49-0488-f899-e866-18f438db82fb@gmx.com> <6faa9bd8-c273-ae19-1fa1-9ecaa5a7b94c@gmail.com> <22a9d4d2-ebec-ce9d-97c3-cbc25bd9f859@gmx.com> Message-ID: <158cc12f-ce3e-12ff-e76f-42393ce4088d@gmail.com> On 6/6/19 6:49 AM, Volodymyr Litovka wrote: [snip] > But in fact, Glance has a bit contradictory approach: > > Documentation on db purge says: "Remember that image identifiers are > used by other OpenStack services that require access to images. These > services expect that when an image is requested by ID, they will receive > the same data every time." but there are no ways to get list of images > including 'deleted' or details of 'deleted' image, e.g. 
> > doka at lagavulin(admin at admin):~$ openstack image show > b179ecee-775d-4ee4-81c0-d3ec3a769d35 > Could not find resource b179ecee-775d-4ee4-81c0-d3ec3a769d35 That's correct, Glance lets you know that an image doesn't exist anymore by returning a 404. > so preserving image record in database makes no sense for 3rd party > services, which talk to Glance over public API. That's right, Glance keeps this data around for its own internal bookkeeping purposes, namely, to ensure that an image_id isn't reused. (A design decision was made when the v2 API was introduced in Folsom not to expose deleted image records to end users. I think the idea was that "cloudy" behavior would entail lots of image creation and deletion, and there was no sense clogging up the database. This was before the discovery of OSSN-0075, however.) > On the other hand, having in DB API ready for use 'image_destroy' call, > it's pretty easy (of course, for those who work with Glance code :-) ) > to add public API call kind of images/{image_id}/actions/destroy , > calling DB API's image_destroy. And, in that case, it makes sense to > allow image uuid to be specified during image create (since client can > purge specified record and recreate it using same characteristics), > otherwise I don't see where, in general, specifying uuid (when creating > image) can be useful. The use case I've seen for specifying the uuid is where a provider has multiple independent clouds and wants to make it easy for an end user to find the "same" public image in each cloud. Unlike image_ids, there is no uniqueness requirement on image names. OSSN-0075 is the reason why we don't expose an destroy action through the API. A user could post a useful image with image_id 1, share it or make it a community image, then after a sufficient number of people are using it, replace it with a completely different image with some kind of malicious content, keeping all the other metadata (id, name, etc.) identical to the original image (except for the os_hash_value, which would definitely be different). > The good news is that I solved my problem. The bad news is that solution > relies on relaxed requirements of 3rd party products but not on Glance's > API itself :-) Glad you solved your problem! I think I don't quite grasp your use case, but I'm glad you got something working. > Thanks! > > On 6/5/19 5:38 PM, Brian Rosmaita wrote: >> On 6/5/19 8:34 AM, Volodymyr Litovka wrote: >>> Dear colleagues, >>> >>> for some reasons, I need to shrink image size to zero (freeing storage >>> as well), while keeping this record in Glance database. >>> >>> First which come to my mind is to delete image and then create new one >>> with same name/uuid/... and --file /dev/null, but this is impossible >>> because Glance don't really delete records from database, marking them >>> as 'deleted' instead. >> The glance-manage utility program allows you to purge the database.  The >> images table (where the image UUIDs are stored) is not purged by default >> because of OSSN-0075 [0].  See the glance docs [1] for details. >> >> [0] https://wiki.openstack.org/wiki/OSSN/OSSN-0075 >> [1] >> https://docs.openstack.org/glance/latest/admin/db.html#database-maintenance >> >> >> (That doesn't really help your issue, I just wanted to point out that >> there is a way to purge the database.) 
>> >>> Next try was to use glance image-upload from /dev/null, but this is also >>> prohibited with message "409 Conflict: Image status transition from >>> [activated, deactivated] to saving is not allowed (HTTP 409)" >> That's correct, Glance will not allow you to replace the image data once >> an image has gone to 'active' status. >> >>> I found >>> https://docs.openstack.org/glance/rocky/contributor/database_architecture.html#glance-database-public-api's >>> >>> >>> "image_destroy" but have no clues on how to access this API. Is it kind >>> of library or kind of REST API, how to access it and whether it's safe >>> to use it in terms of longevity and compatibility between versions? >> The title of that document is misleading.  It describes the interface >> that Glance developers can use when they need to interact with the >> database.  There's no tool that exposes those operations to operators. >> >>> Or, may be, you can advise any other methods to solve the problem of >>> zeroing glance image data / freeing storage, while keeping in database >>> just a record about this image? >> If you purged the database, you could do your proposal to recreate the >> image with a zero-size file -- but that would give you an image with >> status 'active' that an end user could try to boot an instance with.  I >> don't think that's a good idea.  Additionally, purging the images table >> of all UUIDs, not just the few you want to replace, exposes you to >> OSSN-0075. >> >> An alternative--and I'm not sure this is a good idea either--would be to >> deactivate the image [2].  This would preserve all the current metadata >> but not allow the image to be downloaded by a non-administrator.  With >> the image not in 'active' status, nova or cinder won't try to use it to >> create instances or volumes.  The image data would still exist, though, >> so you'd need to delete it manually from the backend to really clear out >> the space.  Additionally, the image size would remain, which might be >> useful for record-keeping, although on the other hand, it will still >> count against the user_storage_quota.  And the image locations will >> still exist even though they won't refer to any existing data any more. >>   (Like I said, I'm not sure this is a good idea.) >> >> [2] https://developer.openstack.org/api-ref/image/v2/#deactivate-image >> >>> Thank you. >> Not sure I was much help.  Let's see if other operators have a good >> workaround or a need for this kind of functionality. >> >>> -- >>> Volodymyr Litovka >>>    "Vision without Execution is Hallucination." -- Thomas Edison >>> >>> >> > > -- > Volodymyr Litovka >   "Vision without Execution is Hallucination." -- Thomas Edison > From gmann at ghanshyammann.com Thu Jun 6 12:53:27 2019 From: gmann at ghanshyammann.com (gmann at ghanshyammann.com) Date: Thu, 06 Jun 2019 21:53:27 +0900 Subject: [nova] Validation for requested host/node on server create In-Reply-To: References: <78fa937a-beb6-c63d-01a0-40e6519928be@gmail.com> Message-ID: <16b2cd9eb2e.119cf7a71136389.2518769832623783484@ghanshyammann.com> ---- On Fri, 24 May 2019 07:02:15 +0900 Matt Riedemann wrote ---- > On 5/22/2019 5:13 PM, Matt Riedemann wrote: > > 3. Validate both the host and node in the API. This can be broken down: > > > > a) If only host is specified, do #2 above. 
> > b) If only node is specified, iterate the cells looking for the node (or > > query a resource provider with that name in placement which would avoid > > down cell issues) > > c) If both host and node is specified, get the HostMapping and from that > > lookup the ComputeNode in the given cell (per the HostMapping) > > > > Pros: fail fast behavior in the API if either the host and/or node do > > not exist > > > > Cons: performance hit in the API to validate the host/node and > > redundancy with the scheduler to find the ComputeNode to get its uuid > > for the in_tree filtering on GET /allocation_candidates. > > > > Note that if we do find the ComputeNode in the API, we could also > > (later?) make a change to the Destination object to add a node_uuid > > field so we can pass that through on the RequestSpec from > > API->conductor->scheduler and that should remove the need for the > > duplicate query in the scheduler code for the in_tree logic. > > > > I'm personally in favor of option 3 since we know that users hate > > NoValidHost errors and we have ways to mitigate the performance overhead > > of that validation. > > > > Note that this isn't necessarily something that has to happen in the > > same change that introduces the host/hypervisor_hostname parameters to > > the API. If we do the validation in the API I'd probably split the > > validation logic into its own patch to make it easier to test and > > review on its own. > > > > [1] https://review.opendev.org/#/c/645520/ > > [2] > > https://github.com/openstack/nova/blob/2e85453879533af0b4d0e1178797d26f026a9423/nova/scheduler/utils.py#L528 > > > > [3] https://docs.openstack.org/nova/latest/admin/availability-zones.html > > Per the nova meeting today [1] it sounds like we're going to go with > option 3 and do the validation in the API - check hostmapping for the > host, check placement for the node, we can optimize the redundant > scheduler calculation for in_tree later. For review and test sanity I > ask that the API validation code comes in a separate patch in the series. +1 on option 3. For more optimization, can we skip b) and c) for the non-baremetal case, assuming that if there is a HostMapping then the node will also be valid. -gmann > > [1] > http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-05-23-21.00.log.html#l-104 > > -- > > Thanks, > > Matt > > From luka.peschke at objectif-libre.com Thu Jun 6 13:00:00 2019 From: luka.peschke at objectif-libre.com (Luka Peschke) Date: Thu, 06 Jun 2019 15:00:00 +0200 Subject: [cloudkitty] Shift IRC meeting of June 7th Message-ID: Hi all, Tomorrow's IRC meeting for CloudKitty will be held at 14h UTC (16 CEST) instead of 15h UTC as huats and I won't be available at 15h. Cheers, -- Luka Peschke From aschultz at redhat.com Thu Jun 6 13:04:00 2019 From: aschultz at redhat.com (Alex Schultz) Date: Thu, 6 Jun 2019 07:04:00 -0600 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: +1 On Wed, Jun 5, 2019 at 8:42 AM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work.
> -- > Emilien Macchi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thierry at openstack.org Thu Jun 6 13:04:39 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 6 Jun 2019 15:04:39 +0200 Subject: [tc][tripleo][charms][helm][kolla][ansible][puppet][chef] Deployment tools capabilities Message-ID: <569964f6-ead3-a983-76d2-ffa59c753dbe@openstack.org> Hi everyone, In the "software" section of the openstack.org website, the deployment tools page is not very helpful for users looking into picking a way to deploy OpenStack: https://www.openstack.org/software/project-navigator/deployment-tools Furthermore, each detailed page is a bit dry. We do not display deliverable tags as most are irrelevant: https://www.openstack.org/software/releases/rocky/components/openstack-helm This was discussed in a forum session in Denver[1], and the outcome was that we should develop a taxonomy of deployment tools capabilities and characteristics, that would help users understand the technologies the various tools are based on, their prerequisites, which services and versions they cover, etc. The web UI should allow users to search deployment tools based on those tags. [1] https://etherpad.openstack.org/p/DEN-deployment-tools-capabilities As a first step, volunteers from that session worked on a draft categorized list[2] of those tags. If you are interested, please review that list, add to it or comment: [2] https://etherpad.openstack.org/p/deployment-tools-tags The next steps are: - commit the detailed list of tags (action:ttx) - apply it to existing deployment tools (action:deploy tools teams) - implementation those tags and data in the openstack website (action:jimmymcarthur) - maybe expand to list 3rd-party installers in a separate tab (tbd) The first two next steps will be implemented as patches to the osf/openstack-map repository, which already contains the base YAML data used in the software pages. Thanks for your help, -- Thierry Carrez (ttx) From e0ne at e0ne.info Thu Jun 6 13:06:16 2019 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Thu, 6 Jun 2019 16:06:16 +0300 Subject: [horizon] dropping 2012.2 tag on pypi In-Reply-To: <20190605163552.bnmjqxtoncct6lxr@yuggoth.org> References: <20190605163552.bnmjqxtoncct6lxr@yuggoth.org> Message-ID: Thank you, Jeremy and Guilherme! Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ On Wed, Jun 5, 2019 at 7:36 PM Jeremy Stanley wrote: > On 2019-06-05 12:57:45 -0300 (-0300), Guilherme Steinmüller wrote: > > As we've discussed with nova tag recently [1], I'd suggest the same for > > horizon. > > > > When we search on pypi the version it shows is 2012.2 and when we click > > release history we can see that the most recent version is 15.1.0 [2] > > > > [1] > > > http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006780.html > > [2] https://pypi.org/project/horizon/#history > > Thanks for pointing this out. Since we basically got blanket > approval to do this for any official OpenStack project some years > back, I've removed the 2012.2 from the horizon project on PyPI just > now. > > If anybody spots others, please do mention them! > -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bfournie at redhat.com Thu Jun 6 13:27:12 2019 From: bfournie at redhat.com (Bob Fournier) Date: Thu, 6 Jun 2019 09:27:12 -0400 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: +1 On Thu, Jun 6, 2019 at 9:13 AM Alex Schultz wrote: > +1 > > On Wed, Jun 5, 2019 at 8:42 AM Emilien Macchi wrote: > >> Kamil has been working on TripleO for a while now and is providing really >> insightful reviews, specially on Python best practices but not only; he is >> one of the major contributors of the OVN integration, which was a ton of >> work. I believe he has the right knowledge to review any TripleO patch and >> provide excellent reviews in our project. We're lucky to have him with us >> in the team! >> >> I would like to propose him core on TripleO, please raise any objection >> if needed. >> -- >> Emilien Macchi >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jun 6 13:37:26 2019 From: gmann at ghanshyammann.com (gmann at ghanshyammann.com) Date: Thu, 06 Jun 2019 22:37:26 +0900 Subject: [tc][kolla][kayobe] Feedback request: kayobe seeking official status In-Reply-To: References: Message-ID: <16b2d022f31.116dfa0cc138586.1656207772173337725@ghanshyammann.com> ---- On Thu, 06 Jun 2019 17:49:44 +0900 Mark Goddard wrote ---- > On Thu, 6 Jun 2019 at 09:27, Thierry Carrez wrote: > > > > Mark Goddard wrote: > > > [...] > > > We see two options for becoming an official OpenStack project: > > > > > > 1. become a deliverable of the Kolla project > > > 2. become an official top level OpenStack project > > > > > > Given the affinity with the Kolla project I feel that option 1 seems > > > natural. However, I do not want to use influence as PTL to force this > > > approach. > > > [...] > > > > From a governance perspective, the two options are definitely possible. > > > > Kayobe can be seen as one of the Kolla-derived deployment tools, or it > > can be seen as a new deployment tool combining two existing projects > > (Kolla and Bifrost). Project teams are cheap: the best solution is the > > one that best aligns to the social reality. > > > > So I'd say the decision depends on how much independence Kayobe wants to > > have from Kolla. Having a separate project team will for example make it > > easier to have separate meetings, but harder to have common meetings. > > How much of a separate team is Kayobe from Kolla? How much do you want > > it to stay that way? > > Right now the intersection of the core teams is only me. While all > Kayobe contributors are familiar with Kolla projects, the reverse is > not true. This is partly because Kolla and/or Kolla Ansible can be > used without Kayobe, and partly because Kayobe is a newer project > which typically gets adopted at the beginning of a cloud deployment. > > It certainly seems to make sense from the Kayobe community perspective > to join these communities. I think the question the Kolla team needs > to ask is whether the benefit of a more complete set of tooling is > worth the overhead of adding a new deliverable that may not be used by > all contributors or in all deployments. With my quick read on technical relation between Kolla-ansible and Kayobe, options1 make much sense to me too. It can give more benefits of working more closely and handle the dependencies and future roadmap etc. 
And having a completely separate team (in Kayobe case you have some overlap too even only you but that can increase in future) for repo under the same project is not new. We have a lot of existing projects which maintain the separate team for their different repo/deliverables without overlap. There are more extra work you need to consider if you go with a separate Project. For example, PTL things and its responsibility. I would say we can avoid that in Kayobe case because of its technical mission/relation to Kolla-ansible. -gmann > > > > > -- > > Thierry Carrez (ttx) > > > > From mthode at mthode.org Thu Jun 6 14:17:47 2019 From: mthode at mthode.org (Matthew Thode) Date: Thu, 6 Jun 2019 09:17:47 -0500 Subject: [requirements][kuryr][flame] openshift dificulties In-Reply-To: <06337c09a594e16e40086b6c64495a59c3e6cd84.camel@redhat.com> References: <20190529205352.f2dxzckgvfavbvtv@mthode.org> <20190530151739.nfzrqfstlb2sbrq5@mthode.org> <20190605165807.jmhogmfyrxltx5b3@mthode.org> <06337c09a594e16e40086b6c64495a59c3e6cd84.camel@redhat.com> Message-ID: <20190606141747.gxoyrcels266rcgv@mthode.org> On 19-06-06 09:13:46, Michał Dulko wrote: > On Wed, 2019-06-05 at 11:58 -0500, Matthew Thode wrote: > > On 19-05-30 10:17:39, Matthew Thode wrote: > > > On 19-05-30 17:07:54, Michał Dulko wrote: > > > > On Wed, 2019-05-29 at 15:53 -0500, Matthew Thode wrote: > > > > > Openshift upstream is giving us difficulty as they are capping the > > > > > version of urllib3 and kubernetes we are using. > > > > > > > > > > -urllib3===1.25.3 > > > > > +urllib3===1.24.3 > > > > > -kubernetes===9.0.0 > > > > > +kubernetes===8.0.1 > > > > > > > > > > I've opened an issue with them but not had much luck there (and their > > > > > prefered solution just pushes the can down the road). > > > > > > > > > > https://github.com/openshift/openshift-restclient-python/issues/289 > > > > > > > > > > What I'd us to do is move off of openshift as our usage doesn't seem too > > > > > much. > > > > > > > > > > openstack/kuryr-tempest-plugin uses it for one import (and just one > > > > > function with that import). I'm not sure exactly what you are doing > > > > > with it but would it be too much to ask to move to something else? > > > > > > > > From Kuryr side it's not really much effort, we can switch to bare REST > > > > calls, but obviously we prefer the client. If there's much support for > > > > getting rid of it, we can do the switch. > > > > > > > > > > Right now Kyryr is only using it in that one place and it's blocking the > > > update of urllib3 and kubernetes for the rest of openstack. So if it's > > > not too much trouble it'd be nice to have happen. > > > > > > > > x/flame has it in it's constraints but I don't see any actual usage, so > > > > > perhaps it's a false flag. > > > > > > > > > > Please let me know what you think > > > > > > > > > Any updates on this? I'd like to move forward on removing the > > dependency if possible. > > > > Sure, I'm waiting for some spare time to do this. Fastest it may happen > will probably be next week. > Sounds good, thanks for working on it. -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From pierre at stackhpc.com Thu Jun 6 15:07:16 2019 From: pierre at stackhpc.com (Pierre Riteau) Date: Thu, 6 Jun 2019 16:07:16 +0100 Subject: [blazar] IRC meeting today Message-ID: Hello, We have our biweekly Blazar IRC meeting in less than one hour: https://wiki.openstack.org/wiki/Meetings/Blazar#Agenda_for_06_Jun_2019_.28Americas.29 We have the opportunity of discussing further how we can enforce policies for limiting reservation usage. I would also like to discuss downstream patches which could be contributed to the project. Everyone is welcome to join. Cheers, Pierre From madhuri.kumari at intel.com Thu Jun 6 15:19:48 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Thu, 6 Jun 2019 15:19:48 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> Hi Eric, Thank you for your response. Please see my response inline. Regards, Madhuri >-----Original Message----- >From: Eric Fried [mailto:openstack at fried.cc] >Sent: Tuesday, June 4, 2019 12:07 AM >To: openstack-discuss at lists.openstack.org >Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post >Provisioning > >Hi Madhuri- > >> For this purpose, we would need to change a trait of the server’s >> flavor in Nova. This trait is mapped to a deploy step in Ironic >> which does some operation(change BIOS config and reboot in this use >> case).____ > >If your trait was something that wasn't tracked in the flavor (or elsewhere in >the instance's db record), you could just update it directly in placement. >Then you'd have to figure out how to make ironic notice that and effect the >change. (Or perhaps the other way around: >tell ironic you want to make the change, and it updates the trait in >placement as part of the process.) In this case, the trait is stored with flavor so it is known to Nova. The new trait should be added in the database and the old one removed. For an ex: An instance with flavor bm_hyperthreading with trait:CUSTOM_HYPERTHREADING_ON=required is created in Nova. Now the user wants to turn off the hyperthreading, than they could update the flavor with trait:CUSTOM_HYPERTHREADING_OFF=required. This should remove the trait:CUSTOM_HYPERTHREADING_ON and add trait:CUSTOM_HYPERTHREADING_OFF associated with the new flavor. > >> In Nova, the only API to change trait in flavor is resize whereas >> resize does migration and a reboot as well.____ >> >> In short, I am  looking for a Nova API that only changes the traits, >> and trigger the ironic deploy steps but no reboot and migration. >> Please suggest.____ > >It's inconvenient, but I'm afraid "resize" is the right way to get this done, >because that's the only way to get the appropriate validation and changes >effected in the general case. Yes, resize seems to be the only valid one. > >Now, there's a spec [1] we've been talking about for ~4.5 years that would >let you do a resize without rebooting, when only a certain subset of >properties are being changed. It is currently proposed for "upsizing" >CPU, memory, and disk, and adding PCI devices, but clearly this ISS >configuration would be a reasonable candidate to include. > Looking at the specs, it seems it's mostly talking about changing VMs resources without rebooting. 
However that's not the actual intent of the Ironic use case I explained in the email. Yes, it requires a reboot to reflect the BIOS changes. This reboot can be either be done by Nova IronicDriver or Ironic deploy step can also do it. So I am not sure if the spec actually satisfies the use case. I hope to get more response from the team to get more clarity. >In fact, it's possible that leading the charge with something this unobtrusive >would reduce some of the points of contention that have stalled the >blueprint up to this point. > >Food for thought. > >Thanks, >efried > >[1] https://review.opendev.org/#/c/141219/ From jaypipes at gmail.com Thu Jun 6 15:36:59 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Thu, 6 Jun 2019 11:36:59 -0400 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> Message-ID: <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> On 6/6/19 11:19 AM, Kumari, Madhuri wrote: > In this case, the trait is stored with flavor so it is known to Nova. The new trait should be added in the database and the old one removed. For an ex: > > An instance with flavor bm_hyperthreading with trait:CUSTOM_HYPERTHREADING_ON=required is created in Nova. > Now the user wants to turn off the hyperthreading, than they could update the flavor with trait:CUSTOM_HYPERTHREADING_OFF=required. > This should remove the trait:CUSTOM_HYPERTHREADING_ON and add trait:CUSTOM_HYPERTHREADING_OFF associated with the new flavor. The absence of a trait on a provider should be represented by the provider not having a trait. Just have a single trait "CUSTOM_HYPERTHREADING" that you either place on the provider or do not place on a provider. The flavor should then either request that the trait be present on a provider that the instance is scheduled to (trait:CUSTOM_HYPERTHREADING=required) or that the trait should *not* be present on a provider that the instance is scheduled to (trait:CUSTOM_HYPERTHREADING=forbidden). Best, -jay From a.settle at outlook.com Thu Jun 6 15:50:45 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Thu, 6 Jun 2019 15:50:45 +0000 Subject: [tc] Recap for Technical Committee Meeting 6 June 2019 @ 1400 UTC Message-ID: Hello all, Thanks to those who joined the TC meeting today and running through it with me at the speed of light. Gif game was impeccably strong and that's primarily what I like about this community. For a recap of the meeting, please see the eavesdrop [0] for full detailed logs and action items. All items in the agenda [1] were covered and no major concerns raised. Next meeting will be on the 8th of July 2019. Cheers, Alex [0] http://eavesdrop.openstack.org/meetings/tc/2019/tc.2019-06-06-14.00.txt [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006877.html From dsneddon at redhat.com Thu Jun 6 16:33:13 2019 From: dsneddon at redhat.com (Dan Sneddon) Date: Thu, 6 Jun 2019 09:33:13 -0700 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: +1 On Wed, Jun 5, 2019 at 7:34 AM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. 
I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > > -- > Emilien Macchi > -- Dan Sneddon | Senior Principal Software Engineer dsneddon at redhat.com | redhat.com/cloud dsneddon:irc | @dxs:twitter -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Jun 6 17:01:57 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 6 Jun 2019 12:01:57 -0500 Subject: [Openstack-operators] [nova][glance] nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures? In-Reply-To: <8c92910f-a8a0-5f1c-41b0-784ff2c3d00a@gmail.com> References: <81d14679c181cdc1a252570529ca5c4b@bitskrieg.net> <25e24abf-aebc-2881-9981-7f9683ffc700@gmail.com> <06029fcf4648d3aa784783389e986a8d@bitskrieg.net> <26839d31-18b8-ba76-56cc-8bbe4b73fc37@gmail.com> <34763ede-45a3-2d22-37a1-c3fc75ea84d2@gmail.com> <4fbe5786f0765d97229147cc1137a6ce@bitskrieg.net> <20180809172447.GB19251@redhat.com> <16524beca00.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> <8c92910f-a8a0-5f1c-41b0-784ff2c3d00a@gmail.com> Message-ID: <23b6eca0-592b-7d32-4789-98629fe0438c@gmail.com> On 8/18/2018 10:09 PM, Matt Riedemann wrote: >> This sounds promising and there seems to be a feasible way to do this, >> but it also sounds like a decent amount of effort and would be a new >> feature in a future release rather than a bugfix - am I correct in >> that assessment? > > Yes I'd say it's a blueprint and not a bug fix - it's not something we'd > backport to stable branches upstream, for example. Just an update on this since it came up in IRC today (unrelated discussion which reminded me of this thread), Kashyap has created a nova blueprint: https://blueprints.launchpad.net/nova/+spec/pick-guest-arch-based-on-host-arch-in-libvirt-driver -- Thanks, Matt From openstack at fried.cc Thu Jun 6 17:51:28 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 6 Jun 2019 12:51:28 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> Message-ID: <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> > Looking at the specs, it seems it's mostly talking about changing VMs resources without rebooting. However that's not the actual intent of the Ironic use case I explained in the email. > Yes, it requires a reboot to reflect the BIOS changes. This reboot can be either be done by Nova IronicDriver or Ironic deploy step can also do it. > So I am not sure if the spec actually satisfies the use case. > I hope to get more response from the team to get more clarity. Waitwait. The VM needs to be rebooted for the BIOS change to take effect? So (non-live) resize would actually satisfy your use case just fine. But the problem is that the ironic driver doesn't support resize at all? Without digging too hard, that seems like it would be a fairly straightforward thing to add. It would be limited to only "same host" and initially you could only change this one attribute (anything else would have to fail). Nova people, thoughts? efried . 
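As a concrete illustration of the single-trait approach Jay describes in the Reset Configurations thread above (one CUSTOM_HYPERTHREADING trait that the flavor either requires or forbids, rather than separate ON/OFF traits), the scheduling side could look roughly like the sketch below. This is a hedged example only: the node, flavor and server names are invented, CUSTOM_BAREMETAL_GOLD stands in for whatever resource class the node actually reports, and the mapping from the trait to an Ironic BIOS deploy step is assumed to exist and is not shown. As efried notes, the Ironic virt driver does not support resize today, so the last two commands describe the proposed flow rather than something that currently works.

# Report the capability as a single trait on the node.
openstack baremetal node add trait bm-node-01 CUSTOM_HYPERTHREADING

# Two otherwise-identical flavors that differ only in the trait request.
# (A production Ironic flavor would also zero out the VCPU/MEMORY_MB/DISK_GB
# resources; omitted here for brevity.)
openstack flavor create --ram 1024 --disk 10 --vcpus 4 bm.ht-on
openstack flavor set bm.ht-on \
    --property resources:CUSTOM_BAREMETAL_GOLD=1 \
    --property trait:CUSTOM_HYPERTHREADING=required

openstack flavor create --ram 1024 --disk 10 --vcpus 4 bm.ht-off
openstack flavor set bm.ht-off \
    --property resources:CUSTOM_BAREMETAL_GOLD=1 \
    --property trait:CUSTOM_HYPERTHREADING=forbidden

# Proposed flow: flip the setting on an existing server by resizing to the
# other flavor, which implies a reboot to apply the BIOS change.
openstack server resize --flavor bm.ht-off my-bm-server
openstack server resize --confirm my-bm-server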
From colleen at gazlene.net Thu Jun 6 18:57:19 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Thu, 06 Jun 2019 11:57:19 -0700 Subject: [dev][keystone] M-1 check-in and retrospective meeting In-Reply-To: <627ae3a7-b998-4323-8981-2d1cd7bc3085@www.fastmail.com> References: <627ae3a7-b998-4323-8981-2d1cd7bc3085@www.fastmail.com> Message-ID: I've drafted an agenda[1] for the check-in/retrospective/review/planning meeting scheduled for next week (June 11, 1500 UTC: We'll be using jitsi.org[2] (hosted OSS video conferencing tool) for the call (I've only ever used it for one-on-one calls so we'll have to see how it performs with several people on the call). We'll keep the retrospective to no more than one hour (since we just had a retrospective that shouldn't be too hard). Still, it's an ambitious agenda for a two-hour meeting. You can help us get through it quickly by: * pre-filling out your thoughts on the retrospective etherpad[3] * reviewing the Train roadmap[4] and reflecting on the stories and tasks listed there, including updating task statuses or adding tasks where needed The agenda is a draft, so feel free to edit it or let me know if you have thoughts or questions on it. Colleen [1] https://etherpad.openstack.org/p/keystone-train-M-1-review-planning-meeting [2] https://meet.jit.si/keystone-train-m-1 [3] https://etherpad.openstack.org/p/keystone-train-m-1-retrospective [4] https://trello.com/b/ClKW9C8x/keystone-train-roadmap From corey.bryant at canonical.com Thu Jun 6 19:06:37 2019 From: corey.bryant at canonical.com (Corey Bryant) Date: Thu, 6 Jun 2019 15:06:37 -0400 Subject: [goal][python3] Train unit tests weekly update (goal-14) Message-ID: This is the goal-14 weekly update for the "Update Python 3 test runtimes for Train" goal [1]. There are 14 weeks remaining for completion of Train community goals [2]. == What's the Goal? == To ensure (in the Train cycle) that all official OpenStack repositories with Python 3 unit tests are exclusively using the 'openstack-python3-train-jobs' Zuul template or one of its variants (e.g. 'openstack-python3-train-jobs-neutron') to run unit tests, and that tests are passing. This will ensure that all official projects are running py36 and py37 unit tests in Train. That is the main goal. Other work items will consist of: * Dropping py35 and old py3 zuul templates (e.g. drop 'openstack-python35-jobs', 'openstack-python36-jobs', 'openstack-python37-jobs', etc) * Updating setup.cfg classifiers (e.g. drop 'Programming Language :: Python :: 3.5' and add 'Programming Language :: Python :: 3.7') * Updating the list of default tox.ini environment (e.g. drop py35 and add py37) For complete details please see [1]. == Role of Goal Champion == Ensure patches are proposed to all affected repositories, encourage teams to help land patches, and report weekly status. == Role of Project Teams == Fix failing tests so that the proposed patches can merge. Project teams should merge the change to the Zuul config before the end of the Train cycle. == Ongoing Work == I will be the goal champion for this goal. I'm just getting organized at this point. Over the next week I plan to get scripts working to automate patch generation for all supported projects and start submitting patches. Open patches needing reviews: https://review.openstack.org/#/q/topic:python3-train+is:open == Completed Work == Merged patches: https://review.openstack.org/#/q/topic:python3-train+is:merged (Wow, look at that! Thanks Zhong Shengping.) == How can you help? 
== Please take a look at the failing patches and help fix any failing unit tests for your project(s). Python 3.7 unit tests will be self-testing in zuul. If you're interested in helping submit patches, please let me know. Failing patches: https://review.openstack.org/#/q/topic:python3-train+status:open+(+label:Verified-1+OR+label:Verified-2+) == Reference Material == [1] Goal description: https://review.opendev.org/#/c/657908 [2] Train release schedule: https://releases.openstack.org/train/schedule.html (see R-5 for "Train Community Goals Completed") Storyboard: https://storyboard.openstack.org/#!/board/ Porting to Python 3.7: https://docs.python.org/3/whatsnew/3.7.html#porting-to-python-3-7 Python Update Process: https://opendev.org/openstack/governance/src/branch/master/resolutions/20181024-python-update-process.rst Train runtimes: https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/train.rst Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Jun 6 19:20:33 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 6 Jun 2019 14:20:33 -0500 Subject: [nova] Validation for requested host/node on server create In-Reply-To: <16b2cd9eb2e.119cf7a71136389.2518769832623783484@ghanshyammann.com> References: <78fa937a-beb6-c63d-01a0-40e6519928be@gmail.com> <16b2cd9eb2e.119cf7a71136389.2518769832623783484@ghanshyammann.com> Message-ID: <35cafbbf-6e97-e1e1-e3b6-a390aa1e03f5@gmail.com> On 6/6/2019 7:53 AM, gmann at ghanshyammann.com wrote: > +1 on option3. For more optimization, can we skip b) and c) for non-baremental case > assuming if there is Hostmapping then node also will be valid. You won't know it's a baremetal node until you get the ComputeNode object and check the hypervisor_type, and at that point you've already validated that it exists. -- Thanks, Matt From mriedemos at gmail.com Thu Jun 6 20:32:57 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 6 Jun 2019 15:32:57 -0500 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: <1CC272501B5BC543A05DB90AA509DED52757522F@fmsmsx122.amr.corp.intel.com> References: <1CC272501B5BC543A05DB90AA509DED52757522F@fmsmsx122.amr.corp.intel.com> Message-ID: <9f9ea648-34a7-9783-3372-40325a8ced27@gmail.com> On 5/23/2019 7:00 AM, Nadathur, Sundar wrote: > Hi, > >     The feedback in the Nova – Cyborg interaction spec [1] is to move > the call for creating/binding accelerator requests (ARQs) from the > conductor (just before the call to build_and_run_instance, [2]) to the > compute manager (just before spawn, without holding the build sempahore > [3]). The point where the results of the bind are needed is in the virt > driver [4] – that is not changing. The reason for the move is to enable > Cyborg to notify Nova [5] instead of Nova virt driver polling Cyborg, > thus making the interaction similar to other services like Neutron. > > The binding involves device preparation by Cyborg, which may take some > time (ballpark: milliseconds to few seconds to perhaps 10s of seconds – > of course devices vary a lot). We want to overlap as much of this as > possible with other tasks, by starting the binding as early as possible > and making it asynchronous, so that bulk VM creation rate etc. are not > affected. These considerations are probably specific to Cyborg, so > trying to make it uniform with other projects deserve a closer look > before we commit to it. > > Moving the binding from [2] to [3] reduces this overlap. 
I did some > measurements of the time window from [2] to [3]: it was consistently > between 20 and 50 milliseconds, whether I launched 1 VM at a time, 2 at > a time, etc. This seems acceptable. > > But this was just in a two-node deployment. Are there situations where > this window could get much larger (thus reducing the overlap)? Such as > in larger deployments, or issues with RabbitMQ messaging, etc. Are there > larger considerations of performance or scaling for this approach? > > Thanks in advance. > > [1] https://review.opendev.org/#/c/603955/ > > [2] > https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1501 > > [3] > https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1882 > > [4] > https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3215 > > > [5] https://wiki.openstack.org/wiki/Nova/ExternalEventAPI > > Regards, > > Sundar > I'm OK with binding in the compute since that's where we trigger the callback event and want to setup something to wait for it before proceeding, like we do with port binding. What I've talked about in detail in the spec is doing the ARQ *creation* in conductor rather than compute. I realize that doing the creation in the compute service means fewer (if any) RPC API changes to get phase 1 of this code going, but I can't imagine any RPC API changes for that would be very big (it's a new parameter to the compute service methods, or something we lump into the RequestSpec). The bigger concern I have is that we've long talked about moving port (and at times volume) creation from the compute service to conductor because it's less expensive to manage external resources there if something fails, e.g. going over-quota creating volumes. The problem with failing late in the compute is we have to cleanup other things (ports and volumes) and then reschedule, which may also fail on the next alternate host. Failing fast in conductor is more efficient and also helps take some of the guesswork out of which service is managing the resources (we've had countless bugs over the years about ports and volumes being leaked because we didn't clean them up properly on failure). Take a look at any of the error handling in the server create flow in the ComputeManager and you'll see what I'm talking about. Anyway, if we're voting I vote that ARQ creation happens in conductor and binding happens in compute. -- Thanks, Matt From guoyongxhzhf at h3c.com Wed Jun 5 08:59:36 2019 From: guoyongxhzhf at h3c.com (Guoyong) Date: Wed, 5 Jun 2019 08:59:36 +0000 Subject: [airship] Is Ironic ready for Airship? Message-ID: I know Airship choose Maas as bare mental management tool. I want to know whether Maas is more suitable for Airship when it comes to under- infrastructure? If Maas is more suitable, then what feature should ironic develop? Thanks for your reply ------------------------------------------------------------------------------------------------------------------------------------- 本邮件及其附件含有新华三集团的保密信息,仅限于发送给上面地址中列出 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本 邮件! This e-mail and its attachments contain confidential information from New H3C, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. 
If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Anirudh.Gupta at hsc.com Thu Jun 6 04:59:28 2019 From: Anirudh.Gupta at hsc.com (Anirudh Gupta) Date: Thu, 6 Jun 2019 04:59:28 +0000 Subject: Unable to run ssh/iperf on StarlingX Vm Message-ID: Hi Team, I have created All in one simplex setup using release 2018.10. I have spawned 2 VM’s on it. The ping is successful between the VM’s, but I am unable to ssh or run iperf on it. Can you please help me in resolving the issue. Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Thu Jun 6 21:52:54 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 6 Jun 2019 16:52:54 -0500 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: References: Message-ID: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> Let me TL;DR this: The forbidden aggregates filter spec [1] says when we put trait metadata onto a host aggregate, we should add the same trait to the compute node RPs for all the hosts in that aggregate, so that the feature actually works when we use it. But we never talked about what to do when we *remove* a trait from such an aggregate, or trash an aggregate with traits, or remove a host from such an aggregate. Here are the alternatives, as Vrushali laid them out (letters added by me): > (a) Leave all traits alone. If they need to be removed, it would have to > be manually via a separate step. > > (b) Support a new option so the caller can dictate whether the operation > should remove the traits. (This is all-or-none.) > > (c) Define a "namespace" - a trait substring - and remove only traits in > that namespace. I'm going to -1 (b). It's too big a hammer, at too big a cost (including API changes). > If I’m not wrong, for last two approaches, we would need to change > RestFul APIs. No, (c) does not. By "define a namespace" I mean we would establish a naming convention for traits to be used with this feature. For example: CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE And when we do any of the removal things, we always and only remove any trait containing the substring _AGGREGATE_ISOLATION_ (assuming it's not asserted by other aggregates in which the host is also a member, yatta yatta). IMO (a) and (c) both suck, but (c) is a slightly better experience for the user. efried [1] http://specs.openstack.org/openstack/nova-specs/specs/train/approved/placement-req-filter-forbidden-aggregates.html P.S. 
> Disclaimer: This email and any attachments are sent in strictest > confidence for the sole use of the addressee and may contain legally > privileged, confidential, and proprietary data. If you are not the > intended recipient, please advise the sender by replying promptly to > this email and then delete and destroy this email and any attachments > without any further use, copying or forwarding. Hear that, pipermail? From smooney at redhat.com Thu Jun 6 23:31:54 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 07 Jun 2019 00:31:54 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> Message-ID: <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> On Thu, 2019-06-06 at 16:52 -0500, Eric Fried wrote: > Let me TL;DR this: > > The forbidden aggregates filter spec [1] says when we put trait metadata > onto a host aggregate, we should add the same trait to the compute node > RPs for all the hosts in that aggregate, so that the feature actually > works when we use it. > > But we never talked about what to do when we *remove* a trait from such > an aggregate, or trash an aggregate with traits, or remove a host from > such an aggregate. > > Here are the alternatives, as Vrushali laid them out (letters added by me): > > > (a) Leave all traits alone. If they need to be removed, it would have to > > be manually via a separate step. > > > > (b) Support a new option so the caller can dictate whether the operation > > should remove the traits. (This is all-or-none.) > > > > (c) Define a "namespace" - a trait substring - and remove only traits in > > that namespace. > > I'm going to -1 (b). It's too big a hammer, at too big a cost (including > API changes). > > > If I’m not wrong, for last two approaches, we would need to change > > RestFul APIs. > > No, (c) does not. By "define a namespace" I mean we would establish a > naming convention for traits to be used with this feature. For example: > > CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE i personally dislike c as it means we cannot use any standard traits in host aggregates. there is also an option d: when you remove a trait from a host aggregate, for each host in the aggregate check if that trait exists on another aggregate the host is a member of, and remove it if not found on another aggregate. > > And when we do any of the removal things, we always and only remove any > trait containing the substring _AGGREGATE_ISOLATION_ (assuming it's not > asserted by other aggregates in which the host is also a member, yatta > yatta). > > IMO (a) and (c) both suck, but (c) is a slightly better experience for > the user. c is only a good option if we are talking about a specific set of new traits for this use case, e.g. CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE, but if we want to allow setting traits generically on hosts via host aggregates it's not really that useful. for example we talked about using a hyperthreading trait in the cpu pinning spec which will not be managed by the compute driver. host aggregates would be a convenient way to be able to manage that trait if this was a generic feature. for c you still have to deal with the fact that a host can be in multiple host aggregates too, by the way, so just because a trait is namespaced and it is removed from an aggregate does not mean it's correct to remove it from a host.
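To make the option (d) bookkeeping concrete, here is a minimal, self-contained sketch of the set math involved. It uses made-up in-memory data rather than the real aggregate or placement APIs: a trait removed from one aggregate may only be stripped from hosts that do not get the same trait from another aggregate they belong to.

# Minimal sketch of option (d), with made-up data instead of real API calls.

# aggregate name -> traits its metadata asserts
aggregate_traits = {
    'windows-pool': {'CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE'},
    'ht-hosts': {'HW_CPU_HYPERTHREADING'},
}
# aggregate name -> member hosts
aggregate_hosts = {
    'windows-pool': {'cn1', 'cn2'},
    'ht-hosts': {'cn2', 'cn3'},
}

def hosts_to_strip(agg, trait):
    """Hosts that should lose `trait` when it is removed from aggregate `agg`.

    A host keeps the trait if any *other* aggregate it belongs to still
    asserts the same trait.
    """
    victims = set()
    for host in aggregate_hosts.get(agg, set()):
        still_asserted = any(
            trait in aggregate_traits.get(other, set())
            for other, hosts in aggregate_hosts.items()
            if other != agg and host in hosts)
        if not still_asserted:
            victims.add(host)
    return victims

# Removing the isolation trait from 'windows-pool' strips it from both cn1
# and cn2, because no other aggregate asserts that trait:
print(hosts_to_strip('windows-pool',
                     'CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE'))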
From jrist at redhat.com Fri Jun 7 04:34:18 2019 From: jrist at redhat.com (Jason Rist) Date: Thu, 6 Jun 2019 22:34:18 -0600 Subject: Retiring TripleO-UI - no longer supported In-Reply-To: <3924F5DE-314C-4D41-8CEA-DCF7A2A2CDEA@redhat.com> References: <3924F5DE-314C-4D41-8CEA-DCF7A2A2CDEA@redhat.com> Message-ID: Follow-up - this work is now done. https://review.opendev.org/#/q/topic:retire_tripleo_ui+(status:open+OR+status:merged) -J Jason Rist Red Hat  jrist / knowncitizen > On May 23, 2019, at 2:35 PM, Jason Rist wrote: > > Hi everyone - I’m writing the list to announce that we are retiring TripleO-UI and it will no longer be supported. It’s already deprecated in Zuul and removed from requirements, so I’ve submitted a patch to remove all code. > > https://review.opendev.org/661113 > > Thanks, > Jason > > Jason Rist > Red Hat  > jrist / knowncitizen > ` -------------- next part -------------- An HTML attachment was scrubbed... URL: From sundar.nadathur at intel.com Fri Jun 7 05:17:30 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Fri, 7 Jun 2019 05:17:30 +0000 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: <9f9ea648-34a7-9783-3372-40325a8ced27@gmail.com> References: <1CC272501B5BC543A05DB90AA509DED52757522F@fmsmsx122.amr.corp.intel.com> <9f9ea648-34a7-9783-3372-40325a8ced27@gmail.com> Message-ID: <1CC272501B5BC543A05DB90AA509DED527591596@fmsmsx122.amr.corp.intel.com> > -----Original Message----- > From: Matt Riedemann > Sent: Thursday, June 6, 2019 1:33 PM > To: openstack-discuss at lists.openstack.org > Subject: Re: [nova] [cyborg] Impact of moving bind to compute > > On 5/23/2019 7:00 AM, Nadathur, Sundar wrote: > > [....] > > Moving the binding from [2] to [3] reduces this overlap. I did some > > measurements of the time window from [2] to [3]: it was consistently > > between 20 and 50 milliseconds, whether I launched 1 VM at a time, 2 > > at a time, etc. This seems acceptable. > > > > [2] https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1501 > > > > [3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1882 > > Regards, > > > > Sundar > I'm OK with binding in the compute since that's where we trigger the callback > event and want to setup something to wait for it before proceeding, like we > do with port binding. > > What I've talked about in detail in the spec is doing the ARQ *creation* in > conductor rather than compute. I realize that doing the creation in the > compute service means fewer (if any) RPC API changes to get phase 1 of this > code going, but I can't imagine any RPC API changes for that would be very big > (it's a new parameter to the compute service methods, or something we lump > into the RequestSpec). > The bigger concern I have is that we've long talked about moving port (and at > times volume) creation from the compute service to conductor because it's > less expensive to manage external resources there if something fails, e.g. > going over-quota creating volumes. The problem with failing late in the > compute is we have to cleanup other things (ports and volumes) and then > reschedule, which may also fail on the next alternate host. The ARQ creation could be done at [1], followed by the binding, before acquiring the semaphore or creating other resources. Why is that not a good option? 
[1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1898 > Failing fast in > conductor is more efficient and also helps take some of the guesswork out of > which service is managing the resources (we've had countless bugs over the > years about ports and volumes being leaked because we didn't clean them up > properly on failure). Take a look at any of the error handling in the server > create flow in the ComputeManager and you'll see what I'm talking about. > > Anyway, if we're voting I vote that ARQ creation happens in conductor and > binding happens in compute. > > -- > > Thanks, > > Matt Regards, Sundar From aj at suse.com Fri Jun 7 05:24:32 2019 From: aj at suse.com (Andreas Jaeger) Date: Fri, 7 Jun 2019 07:24:32 +0200 Subject: Retiring TripleO-UI - no longer supported In-Reply-To: References: <3924F5DE-314C-4D41-8CEA-DCF7A2A2CDEA@redhat.com> Message-ID: <0583152c-5a85-a34d-577e-e7789cac344b@suse.com> On 07/06/2019 06.34, Jason Rist wrote: > Follow-up - this work is now done. > > https://review.opendev.org/#/q/topic:retire_tripleo_ui+(status:open+OR+status:merged) > Not yet for ansible-role-tripleo-ui - please remove the repo from project-config and governance repo, step 4 and 5 of https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project are missing. Andreas -- Andreas Jaeger aj at suse.com Twitter: jaegerandi SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 From madhuri.kumari at intel.com Fri Jun 7 06:53:49 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Fri, 7 Jun 2019 06:53:49 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> Hi Jay, >-----Original Message----- >From: Jay Pipes [mailto:jaypipes at gmail.com] >The absence of a trait on a provider should be represented by the provider >not having a trait. Just have a single trait "CUSTOM_HYPERTHREADING" that >you either place on the provider or do not place on a provider. > >The flavor should then either request that the trait be present on a provider >that the instance is scheduled to >(trait:CUSTOM_HYPERTHREADING=required) or that the trait should *not* >be present on a provider that the instance is scheduled to >(trait:CUSTOM_HYPERTHREADING=forbidden). > I understand that these traits are used for scheduling while server create in Nova. Whereas these traits means more to Ironic. Ironic runs multiple deploy steps matching the name of traits in flavor[1]. The use case explained in the email is about changing some BIOS configuration post server create. By changing the trait in flavor from CUSTOM_HYPERTHREADING_ON to CUSTOM_HYPERTHREADING_OFF, Ironic should run the matching deploy step to disable hyperthreading in BIOS and do a reboot. But currently there isn't a way in Nova about telling Ironic about the trait has changed in flavor, so perform the corresponding deploy steps. 
[1] https://docs.openstack.org/ironic/stein/admin/node-deployment.html#matching-deploy-templates Regards, Madhuri From balazs.gibizer at est.tech Fri Jun 7 07:38:31 2019 From: balazs.gibizer at est.tech (=?utf-8?B?QmFsw6F6cyBHaWJpemVy?=) Date: Fri, 7 Jun 2019 07:38:31 +0000 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> Message-ID: <1559893108.11890.1@smtp.office365.com> On Fri, Jun 7, 2019 at 1:31 AM, Sean Mooney wrote: > On Thu, 2019-06-06 at 16:52 -0500, Eric Fried wrote: >> Let me TL;DR this: >> >> The forbidden aggregates filter spec [1] says when we put trait >> metadata >> onto a host aggregate, we should add the same trait to the compute >> node >> RPs for all the hosts in that aggregate, so that the feature >> actually >> works when we use it. >> >> But we never talked about what to do when we *remove* a trait from >> such >> an aggregate, or trash an aggregate with traits, or remove a host >> from >> such an aggregate. >> >> Here are the alternatives, as Vrushali laid them out (letters added >> by me): >> >> > (a) Leave all traits alone. If they need to be removed, it would >> have to >> > be manually via a separate step. >> > >> > (b) Support a new option so the caller can dictate whether the >> operation >> > should remove the traits. (This is all-or-none.) >> > >> > (c) Define a "namespace" - a trait substring - and remove only >> traits in >> > that namespace. >> >> I'm going to -1 (b). It's too big a hammer, at too big a cost >> (including >> API changes). >> >> > If Iʼm not wrong, for last two approaches, we would need to >> change >> > RestFul APIs. >> >> No, (c) does not. By "define a namespace" I mean we would establish >> a >> naming convention for traits to be used with this feature. For >> example: >> >> CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > i personaly dislike c as it means we cannot use any standard traits > in host > aggrates. > > there is also an option d. when you remove a trait form a host > aggregate for each host in > the aggregate check if that traits exists on another aggregate the > host is a member of and remove > it if not found on another aggregate. Besides possible performance impacts, I think this would be the logical behavior from nova to do. cheers, gibi From madhuri.kumari at intel.com Fri Jun 7 08:48:07 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Fri, 7 Jun 2019 08:48:07 +0000 Subject: '[Nova][Ironic] Reset Configurations in Baremetals Post Provisioning Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC17DEA@BGSMSX102.gar.corp.intel.com> Hi Eric, >-----Original Message----- >From: Eric Fried [mailto:openstack at fried.cc] >Waitwait. The VM needs to be rebooted for the BIOS change to take effect? >So (non-live) resize would actually satisfy your use case just fine. But the >problem is that the ironic driver doesn't support resize at all? > Yes you're right here. But the resize in Nova does migration which we don't want to do. Just apply the new traits and IronicDriver will run the matching deploy steps. >Without digging too hard, that seems like it would be a fairly straightforward >thing to add. It would be limited to only "same host" >and initially you could only change this one attribute (anything else would >have to fail). > >Nova people, thoughts? 
> >efried >. Regards, Madhuri From Chris.Winnicki at windriver.com Thu Jun 6 21:38:56 2019 From: Chris.Winnicki at windriver.com (Winnicki, Chris) Date: Thu, 6 Jun 2019 21:38:56 +0000 Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm In-Reply-To: References: Message-ID: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE3D9@ALA-MBD.corp.ad.wrs.com> Anirudh, can you provide some details with respect to: 1) How are you pinging from one VM to the other (is it over the graphical console ? or namespace ?) 2) What VM image are you using? - Is the VM image enabled for SSH with password ? (assuming sshd is running) 3) Network topology 4) Are you trying to ssh from one VM to the other or from a different network segment? Is there a virtual router in the picture, etc.. Chris Winnicki chris.winnicki at windriver.com 613-963-1329 ________________________________ From: Anirudh Gupta [Anirudh.Gupta at hsc.com] Sent: Thursday, June 06, 2019 12:59 AM To: openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Hi Team, I have created All in one simplex setup using release 2018.10. I have spawned 2 VM’s on it. The ping is successful between the VM’s, but I am unable to ssh or run iperf on it. Can you please help me in resolving the issue. Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Anirudh.Gupta at hsc.com Fri Jun 7 02:48:45 2019 From: Anirudh.Gupta at hsc.com (Anirudh Gupta) Date: Fri, 7 Jun 2019 02:48:45 +0000 Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm In-Reply-To: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE3D9@ALA-MBD.corp.ad.wrs.com> References: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE3D9@ALA-MBD.corp.ad.wrs.com> Message-ID: Hi Chris, I am pinging from one VM to another over graphical console, which is successful. But when I try to run ssh or iperf, then there is no success. I am using Ubuntu 16.04 Image, which is ssh enabled and yes the service sshd is running. I have created a flat network as well as vlan network and tried doing ssh/iperf on both, but with no success. There is no virtual router. I am suspecting the issue mentioned in the below bug https://bugs.launchpad.net/starlingx/+bug/1790514 But I have no understanding as why it is happening. 
Regards Anirudh Gupta (Senior Engineer) From: Winnicki, Chris Sent: 07 June 2019 03:09 To: Anirudh Gupta ; openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: RE: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Anirudh, can you provide some details with respect to: 1) How are you pinging from one VM to the other (is it over the graphical console ? or namespace ?) 2) What VM image are you using? - Is the VM image enabled for SSH with password ? (assuming sshd is running) 3) Network topology 4) Are you trying to ssh from one VM to the other or from a different network segment? Is there a virtual router in the picture, etc.. Chris Winnicki chris.winnicki at windriver.com 613-963-1329 ________________________________ From: Anirudh Gupta [Anirudh.Gupta at hsc.com] Sent: Thursday, June 06, 2019 12:59 AM To: openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Hi Team, I have created All in one simplex setup using release 2018.10. I have spawned 2 VM’s on it. The ping is successful between the VM’s, but I am unable to ssh or run iperf on it. Can you please help me in resolving the issue. Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaypipes at gmail.com Fri Jun 7 12:07:19 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Fri, 7 Jun 2019 08:07:19 -0400 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> Message-ID: <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> On 6/7/19 2:53 AM, Kumari, Madhuri wrote: > Hi Jay, > >> -----Original Message----- >> From: Jay Pipes [mailto:jaypipes at gmail.com] > > >> The absence of a trait on a provider should be represented by the provider >> not having a trait. Just have a single trait "CUSTOM_HYPERTHREADING" that >> you either place on the provider or do not place on a provider. >> >> The flavor should then either request that the trait be present on a provider >> that the instance is scheduled to >> (trait:CUSTOM_HYPERTHREADING=required) or that the trait should *not* >> be present on a provider that the instance is scheduled to >> (trait:CUSTOM_HYPERTHREADING=forbidden). >> > > I understand that these traits are used for scheduling while server create in Nova. Whereas these traits means more to Ironic. Ironic runs multiple deploy steps matching the name of traits in flavor[1]. > > The use case explained in the email is about changing some BIOS configuration post server create. By changing the trait in flavor from CUSTOM_HYPERTHREADING_ON to CUSTOM_HYPERTHREADING_OFF, Ironic should run the matching deploy step to disable hyperthreading in BIOS and do a reboot. > But currently there isn't a way in Nova about telling Ironic about the trait has changed in flavor, so perform the corresponding deploy steps. > > [1] https://docs.openstack.org/ironic/stein/admin/node-deployment.html#matching-deploy-templates Yes, I understand that theses aren't really traits but are actually configuration information. However, what I'm saying is that if you pass the flavor information during resize (as Eric has suggested), then you don't need *two* trait strings (one for CUSTOM_HYPERTHREADING_ON and one for CUSTOM_HYPERTHREADING_OFF). You only need the single CUSTOM_HYPERTHREADING trait and the driver should simply look for the absence of that trait (or, alternately, the flavor saying "=forbid" instead of "=required". Better still, add a standardized trait to os-traits for hyperthreading support, which is what I'd recommended in the original cpu-resource-tracking spec. Best, -jay From iurygregory at gmail.com Fri Jun 7 12:14:46 2019 From: iurygregory at gmail.com (Iury Gregory) Date: Fri, 7 Jun 2019 14:14:46 +0200 Subject: [ironic] Should we add ironic-prometheus-exporter under Ironic umbrella? Message-ID: Greetings Ironicers! I would like to have your input on the matter of moving the ironic-prometheus-exporter to Ironic umbrella. *What is the ironic-prometheus-exporter? * The ironic-prometheus-exporter[1] provides a way to export hardware sensor data from Ironic project in OpenStack to Prometheus [2]. It's implemented as an oslo-messaging notification driver to get the sensor data and a Flask Application to export the metrics to Prometheus. It can not only be used in metal3-io but also in any OpenStack deployment which includes Ironic service. 
*How to ensure the sensor data will follow the Prometheus format?* We are using the prometheus client_python [3] to generate the file with the metrics that come trough the oslo notifier plugin. *How it will be tested on the gate?* Virtualbmc can't provide sensor data that the actual plugin supports. We would collect sample metrics from the hardware and use it in the unit tests. Maybe we should discuss this in the next ironic weekly meeting (10th June)? [1] https://github.com/metal3-io/ironic-prometheus-exporter [2] https://prometheus.io/ [3] https://github.com/prometheus/client_python -- *Att[]'sIury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Part of the puppet-manager-core team in OpenStack* *Software Engineer at Red Hat Czech* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Fri Jun 7 12:17:47 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 7 Jun 2019 13:17:47 +0100 (BST) Subject: [placement] update 19-22 Message-ID: HTML: https://anticdent.org/placement-update-19-22.html Welcome to placement update 19-22. # Most Important We are continuing to work through issues associated with the [spec for nested magic](https://review.opendev.org/662191). Unsurprisingly, there are edge cases where we need to be sure we're doing the right thing, both in terms of satisfying the use cases as well as making sure we don't violate the general model of how things are supposed to work. # What's Changed * We've had a few responses on [the thread to determine the fate of can_split](http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006726.html). The consensus at this point is to not worry about workloads that mix NUMA-aware guests with non-NUMA-aware on the same host. * Support forbidden traits (microversion 1.22) has been added to osc-placement. * Office hours will be 1500 UTC on Wednesdays. * os-traits 0.13.0 and 0.14.0 were released. * Code to optionaly run a [wsgi profiler](https://docs.openstack.org/placement/latest/contributor/testing.html#profiling) in placement has merged. * The request group mapping in allocation candidates spec has merged, more on that in themes, below. # Specs/Features * Support Consumer Types. This has some open questions that need to be addressed, but we're still go on the general idea. * Spec for nested magic 1. The easier parts of nested magic: same_subtree, resource request groups, verbose suffixes (already merged as 1.33). Recently some new discussion here. These and other features being considered can be found on the [feature worklist](https://storyboard.openstack.org/#!/worklist/594). Some non-placement specs are listed in the Other section below. # Stories/Bugs (Numbers in () are the change since the last pupdate.) There are 20 (1) stories in [the placement group](https://storyboard.openstack.org/#!/project_group/placement). 0 (0) are [untagged](https://storyboard.openstack.org/#!/worklist/580). 3 (1) are [bugs](https://storyboard.openstack.org/#!/worklist/574). 4 (0) are [cleanups](https://storyboard.openstack.org/#!/worklist/575). 11 (0) are [rfes](https://storyboard.openstack.org/#!/worklist/594). 2 (0) are [docs](https://storyboard.openstack.org/#!/worklist/637). If you're interested in helping out with placement, those stories are good places to look. * Placement related nova [bugs not yet in progress](https://goo.gl/TgiPXb) on launchpad: 15 (-1). 
* Placement related nova [in progress bugs](https://goo.gl/vzGGDQ) on launchpad: 7 (0). # osc-placement osc-placement is currently behind by 11 microversions. Pending Changes: * Add 'resource provider inventory update' command (that helps with aggregate allocation ratios). * Provide a useful message in the case of 500-error # Main Themes ## Nested Magic The overview of the features encapsulated by the term "nested magic" are in a [story](https://storyboard.openstack.org/#!/story/2005575). There is some in progress code, some of it WIPs to expose issues: * PoC: resourceless request, including some code from WIP: Allow RequestGroups without resources * Add NUMANetworkFixture for gabbits * Prepare objects for allocation request mappings. This work exposed a [bug in hash handling](https://storyboard.openstack.org/#!/story/2005822) that is [being fixed](https://review.opendev.org/663137). ## Consumer Types Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting. A [spec](https://review.opendev.org/654799) has started. There are some questions about request and response details that need to be resolved, but the overall concept is sound. ## Cleanup We continue to do cleanup work to lay in reasonable foundations for the nested work above. As a nice bonus, we keep eking out additional performance gains too. * Add olso.middleware.cors to conf generator * Modernize CORS config and setup. Ed Leafe's ongoing work with using a graph database probably needs some kind of report or update. # Other Placement Miscellaneous changes can be found in [the usual place](https://review.opendev.org/#/q/project:openstack/placement+status:open). There are several [os-traits changes](https://review.opendev.org/#/q/project:openstack/os-traits+status:open) being discussed. # Other Service Users New discoveries are added to the end. Merged stuff is removed. Anything that has had no activity in 4 weeks has been removed. * Nova: spec: support virtual persistent memory * Nova: nova-manage: heal port allocations * nova-spec: Allow compute nodes to use DISK_GB from shared storage RP * Cyborg: Placement report * Nova: Spec to pre-filter disabled computes with placement * rpm-packaging: placement service * Delete resource providers for all nodes when deleting compute service * nova fix for: Drop source node allocations if finish_resize fails * nova: WIP: Hey let's support routed networks y'all! * starlingx: Add placement chart patch to openstack-helm * helm: WIP: add placement chart * kolla-ansible: Add a explanatory note for "placement_api_port" * neutron-spec: L3 agent capacity and scheduling * Nova: Use OpenStack SDK for placement * puppet: Implement generic placement::config::placement_config * puppet: Add parameter for `scheduler/query_placement_for_image_type_support` * Nova: Spec: Provider config YAML file * Nova: single pass instance info fetch in host manager * Watcher: Add Placement helper * docs: Add Placement service to Minimal deployment for Stein * devstack: Add setting of placement microversion on tempest conf * libvirt: report pmem namespaces resources by provider tree * Nova: Defaults missing group_policy to 'none' * Nova: Remove PlacementAPIConnectFailure handling from AggregateAPI # End Making good headway. 
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From smooney at redhat.com Fri Jun 7 12:23:31 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 07 Jun 2019 13:23:31 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <1559893108.11890.1@smtp.office365.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <1559893108.11890.1@smtp.office365.com> Message-ID: <249c58bce0376638f713d2165782b6f094e319b3.camel@redhat.com> On Fri, 2019-06-07 at 07:38 +0000, Balázs Gibizer wrote: > > On Fri, Jun 7, 2019 at 1:31 AM, Sean Mooney wrote: > > On Thu, 2019-06-06 at 16:52 -0500, Eric Fried wrote: > > > Let me TL;DR this: > > > > > > The forbidden aggregates filter spec [1] says when we put trait > > > metadata > > > onto a host aggregate, we should add the same trait to the compute > > > node > > > RPs for all the hosts in that aggregate, so that the feature > > > actually > > > works when we use it. > > > > > > But we never talked about what to do when we *remove* a trait from > > > such > > > an aggregate, or trash an aggregate with traits, or remove a host > > > from > > > such an aggregate. > > > > > > Here are the alternatives, as Vrushali laid them out (letters added > > > by me): > > > > > > > (a) Leave all traits alone. If they need to be removed, it would > > > have to > > > > be manually via a separate step. > > > > > > > > (b) Support a new option so the caller can dictate whether the > > > operation > > > > should remove the traits. (This is all-or-none.) > > > > > > > > (c) Define a "namespace" - a trait substring - and remove only > > > traits in > > > > that namespace. > > > > > > I'm going to -1 (b). It's too big a hammer, at too big a cost > > > (including > > > API changes). > > > > > > > If Iʼm not wrong, for last two approaches, we would need to > > > change > > > > RestFul APIs. > > > > > > No, (c) does not. By "define a namespace" I mean we would establish > > > a > > > naming convention for traits to be used with this feature. For > > > example: > > > > > > CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > > > > i personaly dislike c as it means we cannot use any standard traits > > in host > > aggrates. > > > > there is also an option d. when you remove a trait form a host > > aggregate for each host in > > the aggregate check if that traits exists on another aggregate the > > host is a member of and remove > > it if not found on another aggregate. > > Besides possible performance impacts, I think this would be the logical > behavior from nova to do. option d is worstcase aproxmatly O(NlogN) but is technicall between O(n) and O(NM) where N is the number of instance and M is the maxium number of aggrates a host is a memeber of. so it grows non linearly but the plroblem is not quadratic and is much closer to O(N) or O(NlogN) then it is to O(N^2) so long if we are smart about how we look up the data form the db its probably ok. we are basically asking for all the host in this aggrate, give me the hosts that dont have another aggrate with the trait i am about to remove form this aggragte. those host are the set of host we need to update in placemnt and sql is good at anserwing that type of question. if we do this in python it will make me sad. 
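For illustration, the kind of query being described could look roughly like the sketch below. The nova_api table and column names (aggregate_hosts, aggregate_metadata) and the "trait:<NAME>" metadata key format are assumptions on my part and have not been tested against a real deployment.

# Rough, untested sketch of "hosts in this aggregate that do not get the
# trait from any other aggregate". Table/column names and the metadata key
# format are assumptions.
from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://nova:secret@dbhost/nova_api')

stmt = text("""
    SELECT ah.host
      FROM aggregate_hosts ah
     WHERE ah.aggregate_id = :agg_id
       AND NOT EXISTS (
           SELECT 1
             FROM aggregate_hosts ah2
             JOIN aggregate_metadata am
               ON am.aggregate_id = ah2.aggregate_id
            WHERE ah2.host = ah.host
              AND ah2.aggregate_id != :agg_id
              AND am.`key` = :trait_key)
""")

with engine.connect() as conn:
    rows = conn.execute(stmt, {
        'agg_id': 42,  # the aggregate the trait is being removed from
        'trait_key': 'trait:CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE'})
    hosts_to_update = [row[0] for row in rows]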
> > cheers, > gibi > From zoltan.langi at namecheap.com Fri Jun 7 12:53:56 2019 From: zoltan.langi at namecheap.com (Zoltan Langi) Date: Fri, 7 Jun 2019 14:53:56 +0200 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan Message-ID: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now. I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release. When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s. (I initially followed this ASAP2 guide, works well: https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2) Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox: https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf (page15) So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs. The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm. The problem is only one direction of the traffic is offloaded when LAG is being used. I opened a mellanox case and they recommended to install the latest ovs version which I did: https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. The speed I am getting is way far away from the values that I get when only a single port is used. Does anyone has any experience or any idea what should I look our for or check? Thank you very much, anything is appreciated! Zoltan From openstack at fried.cc Fri Jun 7 14:05:58 2019 From: openstack at fried.cc (Eric Fried) Date: Fri, 7 Jun 2019 09:05:58 -0500 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> Message-ID: <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> >>> (a) Leave all traits alone. If they need to be removed, it would have to >>> be manually via a separate step. >>> >>> (b) Support a new option so the caller can dictate whether the operation >>> should remove the traits. (This is all-or-none.) >>> >>> (c) Define a "namespace" - a trait substring - and remove only traits in >>> that namespace. >> >> I'm going to -1 (b). It's too big a hammer, at too big a cost (including >> API changes). >> >>> If I’m not wrong, for last two approaches, we would need to change >>> RestFul APIs. >> >> No, (c) does not. By "define a namespace" I mean we would establish a >> naming convention for traits to be used with this feature. For example: >> >> CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > i personaly dislike c as it means we cannot use any standard traits in host > aggrates. Actually, it means we *can*, whereas (b) and (d) mean we *can't*. That's the whole point. 
If we want to isolate our hyperthreading hosts, we put them in an aggregate with HW_CPU_HYPERTHREADING on it. The sync here should be a no-op because those hosts should already have HW_CPU_HYPERTHREADING on them. And then if we decide to remove such a host, or destroy the aggregate, or whatever, we *don't want* HW_CPU_HYPERTHREADING to be removed from the providers, because they can still do that. (Unless you mean we can't make a standard trait that we can use for isolation that gets (conditionally) removed in these scenarios? There's nothing preventing us from creating a standard trait called COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE, which would work just the same.) > there is also an option d. when you remove a trait form a host aggregate for each host in > the aggregate check if that traits exists on another aggregate the host is a member of and remove > it if not found on another aggregate. Sorry I wasn't clear, (b) also does this ^ but with the condition that also checks for the _AGGREGATE_ISOLATION_ infix. > for c you still have to deal with the fact a host can be in multiple host aggrates too by > the way so jsut because a thread is namespace d and it is removed from an aggrate does not > mean its correcct to remove it from a host. Right - in reality, there should be one algorithm, idempotent, to sync host RP traits when anything happens to aggregates. It always goes out and does the appropriate {set} math to decide which traits should exist on which hosts and effects any necessary changes. And yes, the performance will suck in a large deployment, because we have to get all the compute RPs in all the aggregates (even the ones with no trait metadata) to do that calculation. But aggregate operations are fairly rare, aren't they? Perhaps this is where we provide a nova-manage tool to do (b)'s sync manually (which we'll surely have to do anyway as a "heal" or for upgrades). So if you're not using the feature, you don't suffer the penalty. > for example we talked about useing a hyperthreading trait in the cpu pinning spec which > will not be managed by the compute driver. host aggages would be a convient way > to be able to manage that trait if this was a generic feature. Oh, okay, yeah, I don't accept this as a use case for this feature. It will work, but we shouldn't recommend it precisely because it's asymmetrical (you can't remove the trait by removing it from the aggregate). There are other ways to add a random trait to all hosts in an aggregate (for host in `get providers in aggregate`; do openstack resource provider trait add ...; done). But for the sake of discussion, what about: (e) Fully manual. Aggregate operations never touch (add or remove) traits on host RPs. You always have to do that manually. As noted above, it's easy to do - and we could make it easier with a tiny wrapper that takes an aggregate, a list of traits, and an --add/--remove command. So initially, setting up aggregate isolation is a two-step process, and in the future we can consider making new API/CLI affordance that combines the steps. efried . 
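To make option (e)'s two-step flow a bit more concrete, here is a rough sketch of the "tiny wrapper" idea as a small script that shells out to the openstack CLI. The osc-placement syntax used below ("resource provider trait list/set") is from memory and should be double-checked against your osc-placement version; note that "trait set" replaces the whole trait list, so the existing traits are re-passed. The aggregate metadata side (e.g. openstack aggregate set --property trait:<TRAIT>=required <aggregate>) would still be a separate step.

#!/usr/bin/env python3
# Rough sketch of the "tiny wrapper": add or remove one trait on each
# resource provider passed on the command line, preserving other traits.
# CLI syntax is from memory -- verify before relying on it.
import argparse
import subprocess

def current_traits(rp_uuid):
    out = subprocess.check_output(
        ['openstack', 'resource', 'provider', 'trait', 'list',
         '-f', 'value', rp_uuid], universal_newlines=True)
    return set(out.split())

def set_traits(rp_uuid, traits):
    cmd = ['openstack', 'resource', 'provider', 'trait', 'set', rp_uuid]
    for trait in sorted(traits):
        cmd += ['--trait', trait]
    subprocess.check_call(cmd)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--trait', required=True)
    parser.add_argument('--remove', action='store_true')
    parser.add_argument('rp_uuids', nargs='+')
    args = parser.parse_args()

    for rp in args.rp_uuids:
        traits = current_traits(rp)
        if args.remove:
            traits.discard(args.trait)
        else:
            traits.add(args.trait)
        set_traits(rp, traits)

if __name__ == '__main__':
    main()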
From openstack at fried.cc Fri Jun 7 15:23:48 2019 From: openstack at fried.cc (Eric Fried) Date: Fri, 7 Jun 2019 10:23:48 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> Message-ID: <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> > Better still, add a standardized trait to os-traits for hyperthreading > support, which is what I'd recommended in the original > cpu-resource-tracking spec. HW_CPU_HYPERTHREADING was added via [1] and has been in os-traits since 0.8.0. efried [1] https://review.opendev.org/#/c/576030/ From mriedemos at gmail.com Fri Jun 7 16:01:36 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 7 Jun 2019 11:01:36 -0500 Subject: [all] Long overdue cleanups of Zuulv2 compatibility base configs In-Reply-To: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> References: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> Message-ID: <6e5aad37-75ab-a53f-39fb-7b1de68dbeb2@gmail.com> On 6/4/2019 5:45 PM, Clark Boylan wrote: > Once this is in you can push "Do Not Merge" changes to your zuul config that reparent your tests from "base" to "base-test" and that will run the jobs without the zuul-cloner shim. The nova dsvm jobs inherit from legacy-dsvm-base which doesn't have a parent (it's abstract). Given that, how would I go about testing this change on the nova legacy jobs? -- Thanks, Matt From fungi at yuggoth.org Fri Jun 7 16:42:45 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Fri, 7 Jun 2019 16:42:45 +0000 Subject: [all] Long overdue cleanups of Zuulv2 compatibility base configs In-Reply-To: <6e5aad37-75ab-a53f-39fb-7b1de68dbeb2@gmail.com> References: <5980f30f-d4c7-4e0d-8d3a-984e2dcd707a@www.fastmail.com> <6e5aad37-75ab-a53f-39fb-7b1de68dbeb2@gmail.com> Message-ID: <20190607164245.ilaxh25xsopl3n22@yuggoth.org> On 2019-06-07 11:01:36 -0500 (-0500), Matt Riedemann wrote: [...] > The nova dsvm jobs inherit from legacy-dsvm-base which doesn't > have a parent (it's abstract). Given that, how would I go about > testing this change on the nova legacy jobs? Sorry, I meant to get shim changes up yesterday for making this easier to test. Now you can add: Depends-On: https://review.opendev.org/663996 -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jaypipes at gmail.com Fri Jun 7 16:59:39 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Fri, 7 Jun 2019 12:59:39 -0400 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> Message-ID: <0681bc52-ef1c-8dc5-be22-68fcabd9dbd2@gmail.com> On 6/7/19 11:23 AM, Eric Fried wrote: >> Better still, add a standardized trait to os-traits for hyperthreading >> support, which is what I'd recommended in the original >> cpu-resource-tracking spec. > > HW_CPU_HYPERTHREADING was added via [1] and has been in os-traits since > 0.8.0. Excellent, I had a faint recollection of that... -jay From mnaser at vexxhost.com Fri Jun 7 17:42:50 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 7 Jun 2019 13:42:50 -0400 Subject: [ironic] Should we add ironic-prometheus-exporter under Ironic umbrella? In-Reply-To: References: Message-ID: Hi Iury, This seems pretty awesome. I threw in some comments On Fri, Jun 7, 2019 at 11:08 AM Iury Gregory wrote: > > Greetings Ironicers! > > I would like to have your input on the matter of moving the ironic-prometheus-exporter to Ironic umbrella. > > What is the ironic-prometheus-exporter? > The ironic-prometheus-exporter[1] provides a way to export hardware sensor data from > Ironic project in OpenStack to Prometheus [2]. It's implemented as an oslo-messaging notification driver to get the sensor data and a Flask Application to export the metrics to Prometheus. It can not only be used in metal3-io but also in any OpenStack deployment which includes Ironic service. This seems really neat. From my perspective, it seems like it waits for notifications, and then writes it out to a file. The flask server seems to do nothing but pretty much serve the contents at /metrics. I think we should be doing more of this inside OpenStack to be honest and this can be really useful in the perspective of operators. I don't want to complicate this more however, but I would love for this to be a pattern/framework that other projects can adopt. > How to ensure the sensor data will follow the Prometheus format? > We are using the prometheus client_python [3] to generate the file with the metrics that come trough the oslo notifier plugin. > > How it will be tested on the gate? > Virtualbmc can't provide sensor data that the actual plugin supports. We would collect sample metrics from the hardware and use it in the unit tests. > > Maybe we should discuss this in the next ironic weekly meeting (10th June)? > > [1] https://github.com/metal3-io/ironic-prometheus-exporter > [2] https://prometheus.io/ > [3] https://github.com/prometheus/client_python > > -- > Att[]'s > Iury Gregory Melo Ferreira > MSc in Computer Science at UFCG > Part of the puppet-manager-core team in OpenStack > Software Engineer at Red Hat Czech > Social: https://www.linkedin.com/in/iurygregory > E-mail: iurygregory at gmail.com -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. 
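To make that pattern concrete, here is a minimal, generic sketch of the two halves: something that turns a (made-up) sensor notification payload into a Prometheus textfile with prometheus_client, and a tiny Flask app that serves the file at /metrics. This is not the actual ironic-prometheus-exporter code; the metric and label names, payload shape, file path and port are all invented for illustration.

# Generic sketch of the "write a textfile, serve it at /metrics" pattern.
# Not the real exporter's code -- names, payload shape and paths are made up.
from flask import Flask, Response
from prometheus_client import CollectorRegistry, Gauge, write_to_textfile

METRICS_FILE = '/tmp/ironic-metrics.prom'

def write_metrics(payload):
    """Turn one (assumed) sensor notification payload into a textfile."""
    registry = CollectorRegistry()
    temp = Gauge('baremetal_temperature_celsius',
                 'Temperature reported by a node sensor',
                 ['node_uuid', 'sensor_id'], registry=registry)
    for sensor in payload.get('sensors', []):
        temp.labels(node_uuid=payload['node_uuid'],
                    sensor_id=sensor['id']).set(sensor['value'])
    write_to_textfile(METRICS_FILE, registry)

app = Flask(__name__)

@app.route('/metrics')
def metrics():
    with open(METRICS_FILE) as f:
        return Response(f.read(), mimetype='text/plain')

if __name__ == '__main__':
    # Write a fake payload once, then serve it for Prometheus to scrape.
    write_metrics({'node_uuid': 'abc-123',
                   'sensors': [{'id': 'CPU Temp', 'value': 42.0}]})
    app.run(host='0.0.0.0', port=9608)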
http://vexxhost.com From mnaser at vexxhost.com Fri Jun 7 17:52:03 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 7 Jun 2019 13:52:03 -0400 Subject: [nova][kolla][openstack-ansible][tripleo] Cells v2 upgrades In-Reply-To: References: Message-ID: On Wed, Jun 5, 2019 at 5:16 AM Mark Goddard wrote: > > Hi, > > At the recent kolla virtual PTG [1] we had a good discussion about > adding support for multiple nova cells in kolla-ansible. We agreed a > key requirement is to be able to perform operations on one or more > cells without affecting the rest for damage limitation. This also > seems like it would apply to upgrades. We're seeking input on > ordering. Looking at the nova upgrade guide [2] I might propose > something like this: > > 1. DB syncs > 2. Upgrade API, super conductor > > For each cell: > 3a. Upgrade cell conductor > 3b. Upgrade cell computes > > 4. SIGHUP all services Unfortunately, this is a problem right now: https://review.opendev.org/#/c/641907/ I sat down at the PTG to settle this down, I was going to finish this patch up but I didn't get around to it. That might be an action item to be able to do this successfully. > 5. Run online migrations > > At some point in here we also need to run the upgrade check. > Presumably between steps 1 and 2? > > It would be great to get feedback both from the nova team and anyone > running cells > Thanks, > Mark > > [1] https://etherpad.openstack.org/p/kolla-train-ptg > [2] https://docs.openstack.org/nova/latest/user/upgrade.html > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From jm at artfiles.de Fri Jun 7 13:15:19 2019 From: jm at artfiles.de (Jan Marquardt) Date: Fri, 7 Jun 2019 15:15:19 +0200 Subject: Neutron with LBA and BGP-EVPN over IP fabric Message-ID: Hi, we are currently trying to build an Openstack Cloud with an IP fabric and FRR directly running on each host. Therefore each host is supposed to advertise its VNIs to the fabric. For this purpose I’d need VXLAN interfaces with the following config: 18: vx-50: mtu 1500 qdisc noqueue master br-test state UNKNOWN mode DEFAULT group default qlen 1000 link/ether 7e:d2:e6:3c:5a:65 brd ff:ff:ff:ff:ff:ff promiscuity 1 vxlan id 50 local 10.0.0.101 srcport 0 0 dstport 8472 nolearning ttl inherit ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning off flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.7e:d2:e6:3c:5a:65 designated_root 8000.7e:d2:e6:3c:5a:65 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off group_fwd_mask 0x0 group_fwd_mask_str 0x0 vlan_tunnel off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 It seems that Neutron/lba is not capable of creating VXLAN interfaces with such a config. By default lba creates them with mode multicast, but I’d need unicast. The only way to activate unicast mode seems to be setting l2pop, but then lba does not set local address. Furthermore, I don't think we really need l2pop, because this part is supposed to be done by BGP-EVPN. Is there any way to achieve such config with Neutron/lba? 
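For concreteness, a rough sketch of the device the agent would need to create - unicast, no multicast group, learning disabled, explicit local VTEP address - using pyroute2, the same netlink library neutron's ip_lib wraps. This is only an illustration, not something the linuxbridge agent supports today (that is the question); the vxlan_* keyword names are assumed from pyroute2's IFLA_VXLAN attribute mapping, and the device name, VNI and addresses are placeholders taken from the output above:

    from pyroute2 import IPRoute

    ipr = IPRoute()
    # Create a BGP-EVPN style VXLAN device: no group/remote (unicast), learning
    # off (the EVPN control plane populates the FDB), local tunnel IP set.
    ipr.link("add",
             ifname="vx-50",             # placeholder device name
             kind="vxlan",
             vxlan_id=50,                # VNI
             vxlan_local="10.0.0.101",   # local VTEP address - the bit lba drops
             vxlan_port=8472,            # dstport as in the output above
             vxlan_learning=False)       # "nolearning"
    idx = ipr.link_lookup(ifname="vx-50")[0]
    ipr.link("set", index=idx, state="up")

Getting the agent itself to emit a device like this, rather than a multicast or l2pop one, is exactly what is missing here.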
Best Regards Jan -- Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 E-Mail: support at artfiles.de | Web: http://www.artfiles.de Geschäftsführer: Harald Oltmanns | Tim Evers Eingetragen im Handelsregister Hamburg - HRB 81478 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From Chris.Winnicki at windriver.com Fri Jun 7 14:03:29 2019 From: Chris.Winnicki at windriver.com (Winnicki, Chris) Date: Fri, 7 Jun 2019 14:03:29 +0000 Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm In-Reply-To: References: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE3D9@ALA-MBD.corp.ad.wrs.com>, Message-ID: <7E4792BA14B1DE4BAB354DF77FE0233ABC8CE67D@ALA-MBD.corp.ad.wrs.com> Anirudh: Have you tried the workaround mentioned in comment #7 in the bug report ? Are you able to ssh to localhost (ssh to VM itself from within the VM) Chris Winnicki chris.winnicki at windriver.com 613-963-1329 ________________________________ From: Anirudh Gupta [Anirudh.Gupta at hsc.com] Sent: Thursday, June 06, 2019 10:48 PM To: Winnicki, Chris; openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: RE: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Hi Chris, I am pinging from one VM to another over graphical console, which is successful. But when I try to run ssh or iperf, then there is no success. I am using Ubuntu 16.04 Image, which is ssh enabled and yes the service sshd is running. I have created a flat network as well as vlan network and tried doing ssh/iperf on both, but with no success. There is no virtual router. I am suspecting the issue mentioned in the below bug https://bugs.launchpad.net/starlingx/+bug/1790514 But I have no understanding as why it is happening. Regards Anirudh Gupta (Senior Engineer) From: Winnicki, Chris Sent: 07 June 2019 03:09 To: Anirudh Gupta ; openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: RE: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Anirudh, can you provide some details with respect to: 1) How are you pinging from one VM to the other (is it over the graphical console ? or namespace ?) 2) What VM image are you using? - Is the VM image enabled for SSH with password ? (assuming sshd is running) 3) Network topology 4) Are you trying to ssh from one VM to the other or from a different network segment? Is there a virtual router in the picture, etc.. Chris Winnicki chris.winnicki at windriver.com 613-963-1329 ________________________________ From: Anirudh Gupta [Anirudh.Gupta at hsc.com] Sent: Thursday, June 06, 2019 12:59 AM To: openstack at lists.openstack.org; openstack-dev at lists.openstack.org; starlingx-announce at lists.starlingx.io; starlingx-discuss at lists.starlingx.io Subject: [Starlingx-discuss] Unable to run ssh/iperf on StarlingX Vm Hi Team, I have created All in one simplex setup using release 2018.10. I have spawned 2 VM’s on it. The ping is successful between the VM’s, but I am unable to ssh or run iperf on it. Can you please help me in resolving the issue. 
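As a quick, generic check (not specific to StarlingX): when ping works but ssh/iperf do not, it helps to know whether a bare TCP connection to port 22 can be opened at all. A failed connect points at security groups or the sshd service; a connect that succeeds while the session then stalls points more towards an MTU or offload problem. A minimal probe, with the peer address as a placeholder:

    import socket

    def tcp_connects(host, port=22, timeout=5):
        """Return True if a plain TCP connection to host:port can be opened."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError as exc:
            print(f"connect to {host}:{port} failed: {exc}")
            return False

    print(tcp_connects("192.168.1.10"))  # placeholder: the other VM's address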
Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Fri Jun 7 17:54:44 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Fri, 7 Jun 2019 13:54:44 -0400 Subject: [openstack-ansible][powervm] dropping support In-Reply-To: <2C11DAD5-1ED6-409B-9374-0CB86059E5E2@us.ibm.com> References: <2C11DAD5-1ED6-409B-9374-0CB86059E5E2@us.ibm.com> Message-ID: On Tue, Jun 4, 2019 at 8:00 AM William M Edmonds - edmondsw at us.ibm.com wrote: > > On 5/31/19, 6:46 PM, "Mohammed Naser" wrote: > > > > Hi everyone, > > > > I've pushed up a patch to propose dropping support for PowerVM support > > inside OpenStack Ansible. There has been no work done on this for a > > few years now, the configured compute driver is the incorrect one for > > ~2 years now which indicates that no one has been able to use it for > > that long. > > > > It would be nice to have this driver however given the infrastructure > > we have upstream, there would be no way for us to effectively test it > > and bring it back to functional state. I'm proposing that we remove > > the code here: > > https://review.opendev.org/#/c/662587 powervm: drop support > > > > If you're using this code and would like to contribute to fixing it > > and (somehow) adding coverage, please reach out, otherwise, we'll drop > > this code to clean things up. > > Sadly, I don't know of anyone using it or willing to maintain it at this time. > :( We've merged that commit and we've now dropped PowerVM support for now (alongside the +1 Eric has provided too). We'd be happy to support it again if/when someone steps up :) -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. 
http://vexxhost.com From smooney at redhat.com Fri Jun 7 18:04:44 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 07 Jun 2019 19:04:44 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> Message-ID: On Fri, 2019-06-07 at 09:05 -0500, Eric Fried wrote: > > > > (a) Leave all traits alone. If they need to be removed, it would have to > > > > be manually via a separate step. > > > > > > > > (b) Support a new option so the caller can dictate whether the operation > > > > should remove the traits. (This is all-or-none.) > > > > > > > > (c) Define a "namespace" - a trait substring - and remove only traits in > > > > that namespace. > > > > > > I'm going to -1 (b). It's too big a hammer, at too big a cost (including > > > API changes). > > > > > > > If I’m not wrong, for last two approaches, we would need to change > > > > RestFul APIs. > > > > > > No, (c) does not. By "define a namespace" I mean we would establish a > > > naming convention for traits to be used with this feature. For example: > > > > > > CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > > > > i personaly dislike c as it means we cannot use any standard traits in host > > aggrates. > > Actually, it means we *can*, whereas (b) and (d) mean we *can't*. That's > the whole point. If we want to isolate our hyperthreading hosts, we put > them in an aggregate with HW_CPU_HYPERTHREADING on it. The sync here > should be a no-op because those hosts should already have > HW_CPU_HYPERTHREADING on them. And then if we decide to remove such a > host, or destroy the aggregate, or whatever, we *don't want* > HW_CPU_HYPERTHREADING to be removed from the providers, because they can > still do that. in the cpu pinning spec we said HW_CPU_HYPERTHREADING was not going to be managed by the virt driver so it wont be reported unless the admin manulaly adds it. https://github.com/openstack/nova-specs/blob/master/specs/train/approved/cpu-resources.rst#add-hw_cpu_hyperthreading-trait "The HW_CPU_HYPERTHREADING trait will need to be among the traits that the virt driver cannot always override, since the operator may want to indicate that a single NUMA node on a multi-NUMA-node host is meant for guests that tolerate hyperthread siblings as dedicated CPUs." so i was suggesting this was a way to enable that the operator to manage whic host report that trait although as the spec suggest we may want to report this differently per numa node which would still require you to use osc-placment or some other way to set it manually. > > (Unless you mean we can't make a standard trait that we can use for > isolation that gets (conditionally) removed in these scenarios? There's > nothing preventing us from creating a standard trait called > COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE, which would work just the > same.) im suggesting it woudl be nice to be able to use host aggates to manage statdard or custom traits on hosts that are not managed by the driver thwer ethat is a COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE trait or something else. so i was hoping to make this feature more reusable for other usecase in the future. 
for example it would be nice to be able to say this set of host has the CUSTOM_DPDK_NETWORKING trait by putting them in a host aggrage and then adding a forbindent trait to my non hugepage backed guests. > > > there is also an option d. when you remove a trait form a host aggregate for each host in > > the aggregate check if that traits exists on another aggregate the host is a member of and remove > > it if not found on another aggregate. > > Sorry I wasn't clear, (b) also does this ^ but with the condition that > also checks for the _AGGREGATE_ISOLATION_ infix. > > > for c you still have to deal with the fact a host can be in multiple > > host aggrates too by > > the way so jsut because a thread is namespace d and it is removed from > > an aggrate does not > > mean its correcct to remove it from a host. > > Right - in reality, there should be one algorithm, idempotent, to sync > host RP traits when anything happens to aggregates. It always goes out > and does the appropriate {set} math to decide which traits should exist > on which hosts and effects any necessary changes. > > And yes, the performance will suck in a large deployment, because we > have to get all the compute RPs in all the aggregates (even the ones > with no trait metadata) to do that calculation. But aggregate operations > are fairly rare, aren't they? > > Perhaps this is where we provide a nova-manage tool to do (b)'s sync > manually (which we'll surely have to do anyway as a "heal" or for > upgrades). So if you're not using the feature, you don't suffer the penalty. > > > for example we talked about useing a hyperthreading trait in the cpu pinning spec which > > will not be managed by the compute driver. host aggages would be a convient way > > to be able to manage that trait if this was a generic feature. > > Oh, okay, yeah, I don't accept this as a use case for this feature. It > will work, but we shouldn't recommend it precisely because it's > asymmetrical (you can't remove the trait by removing it from the > aggregate). why not we do not expect the virt driver to report the hypertreading trait since we said it can be extrenally managed. even if we allowed the virt drvier to conditionally report it only when it frst creates a RP it is not allowed to readd if it is remvoed by someone else. > There are other ways to add a random trait to all hosts in > an aggregate (for host in `get providers in aggregate`; do openstack > resource provider trait add ...; done). > > But for the sake of discussion, what about: > > (e) Fully manual. Aggregate operations never touch (add or remove) > traits on host RPs. You always have to do that manually. As noted above, > it's easy to do - and we could make it easier with a tiny wrapper that > takes an aggregate, a list of traits, and an --add/--remove command. So > initially, setting up aggregate isolation is a two-step process, and in > the future we can consider making new API/CLI affordance that combines > the steps. > > efried > . 
> From smooney at redhat.com Fri Jun 7 18:15:45 2019 From: smooney at redhat.com (Sean Mooney) Date: Fri, 07 Jun 2019 19:15:45 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> Message-ID: <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> On Fri, 2019-06-07 at 19:04 +0100, Sean Mooney wrote: > On Fri, 2019-06-07 at 09:05 -0500, Eric Fried wrote: > > > > > (a) Leave all traits alone. If they need to be removed, it would have to > > > > > be manually via a separate step. > > > > > > > > > > (b) Support a new option so the caller can dictate whether the operation > > > > > should remove the traits. (This is all-or-none.) > > > > > > > > > > (c) Define a "namespace" - a trait substring - and remove only traits in > > > > > that namespace. > > > > > > > > I'm going to -1 (b). It's too big a hammer, at too big a cost (including > > > > API changes). > > > > > > > > > If I’m not wrong, for last two approaches, we would need to change > > > > > RestFul APIs. > > > > > > > > No, (c) does not. By "define a namespace" I mean we would establish a > > > > naming convention for traits to be used with this feature. For example: > > > > > > > > CUSTOM_AGGREGATE_ISOLATION_WINDOWS_LICENSE > > > > > > i personaly dislike c as it means we cannot use any standard traits in host > > > aggrates. > > > > Actually, it means we *can*, whereas (b) and (d) mean we *can't*. That's > > the whole point. If we want to isolate our hyperthreading hosts, we put > > them in an aggregate with HW_CPU_HYPERTHREADING on it. The sync here > > should be a no-op because those hosts should already have > > HW_CPU_HYPERTHREADING on them. And then if we decide to remove such a > > host, or destroy the aggregate, or whatever, we *don't want* > > HW_CPU_HYPERTHREADING to be removed from the providers, because they can > > still do that. > > in the cpu pinning spec we said HW_CPU_HYPERTHREADING was not going to be managed > by the virt driver so it wont be reported unless the admin manulaly adds it. > > https://github.com/openstack/nova-specs/blob/master/specs/train/approved/cpu-resources.rst#add-hw_cpu_hyperthreading-trait > "The HW_CPU_HYPERTHREADING trait will need to be among the traits that the virt driver cannot always override, since > the > operator may want to indicate that a single NUMA node on a multi-NUMA-node host is meant for guests that tolerate > hyperthread siblings as dedicated CPUs." > > so i was suggesting this was a way to enable that the operator to manage whic host report that trait > although as the spec suggest we may want to report this differently per numa node which would still > require you to use osc-placment or some other way to set it manually. > > > > (Unless you mean we can't make a standard trait that we can use for > > isolation that gets (conditionally) removed in these scenarios? There's > > nothing preventing us from creating a standard trait called > > COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE, which would work just the > > same.) > > im suggesting it woudl be nice to be able to use host aggates to manage statdard or custom traits on hosts that are > not managed by the driver thwer ethat is a COMPUTE_AGGREGATE_ISOLATION_WINDOWS_LICENSE trait or something > else. 
so i was hoping to make this feature more reusable for other usecase in the future. for example it would be nice > to be able to say this set of host has the CUSTOM_DPDK_NETWORKING trait by putting them in a host aggrage and then > adding a forbindent trait to my non hugepage backed guests. > > > > > there is also an option d. when you remove a trait form a host aggregate for each host in > > > the aggregate check if that traits exists on another aggregate the host is a member of and remove > > > it if not found on another aggregate. > > > > Sorry I wasn't clear, (b) also does this ^ but with the condition that > > also checks for the _AGGREGATE_ISOLATION_ infix. > > > > > for c you still have to deal with the fact a host can be in multiple > > > > host aggrates too by > > > the way so jsut because a thread is namespace d and it is removed from > > > > an aggrate does not > > > mean its correcct to remove it from a host. > > > > Right - in reality, there should be one algorithm, idempotent, to sync > > host RP traits when anything happens to aggregates. It always goes out > > and does the appropriate {set} math to decide which traits should exist > > on which hosts and effects any necessary changes. > > > > And yes, the performance will suck in a large deployment, because we > > have to get all the compute RPs in all the aggregates (even the ones > > with no trait metadata) to do that calculation. But aggregate operations > > are fairly rare, aren't they? > > > > Perhaps this is where we provide a nova-manage tool to do (b)'s sync > > manually (which we'll surely have to do anyway as a "heal" or for > > upgrades). So if you're not using the feature, you don't suffer the penalty. > > > > > for example we talked about useing a hyperthreading trait in the cpu pinning spec which > > > will not be managed by the compute driver. host aggages would be a convient way > > > to be able to manage that trait if this was a generic feature. > > > > Oh, okay, yeah, I don't accept this as a use case for this feature. It > > will work, but we shouldn't recommend it precisely because it's > > asymmetrical (you can't remove the trait by removing it from the > > aggregate). > > why not we do not expect the virt driver to report the hypertreading trait > since we said it can be extrenally managed. even if we allowed the virt drvier to > conditionally report it only when it frst creates a RP it is not allowed to readd > if it is remvoed by someone else. > > > There are other ways to add a random trait to all hosts in > > an aggregate (for host in `get providers in aggregate`; do openstack > > resource provider trait add ...; done). > > > > But for the sake of discussion, what about: > > > > (e) Fully manual. Aggregate operations never touch (add or remove) > > traits on host RPs. You always have to do that manually. As noted above, > > it's easy to do - and we could make it easier with a tiny wrapper that > > takes an aggregate, a list of traits, and an --add/--remove command. So > > initially, setting up aggregate isolation is a two-step process, and in > > the future we can consider making new API/CLI affordance that combines > > the steps. ya e could work too. melanie added a similar functionality to osc placment for managing the alloction ratios of specific resource classes per aggregate a few months ago https://review.opendev.org/#/c/640898/ we could proably provide somthing similar for managing traits but determining what RP to add the trait too would be a littel tricker. 
we would have to be able to filter to RP with either a specific inventory or with a specific trait or in a speicic subtree. you could have a --root or somthing to jsut say add or remove the tratit from the root RPs in an aggregate. but yes you could certely automate this in a simile cli extention. > > > > efried > > . > > > > From ssbarnea at redhat.com Fri Jun 7 18:38:57 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Fri, 7 Jun 2019 19:38:57 +0100 Subject: [tripleo][molecule] feedback on testing ansible roles with molecule Message-ID: Hi! While we do now have a POC job running molecule with tox for testing one tripleo-common role, I would like to ask for some feedback from running the same test locally, on your dev box. The report generated but openstack-tox-mol job looks like http://logs.openstack.org/36/663336/14/check/openstack-tox-mol/aa7345d/tox/reports.html https://review.opendev.org/#/c/663336/ Just download it and run: tox -e mol You will either need docker or at least to define DOCKER_HOST=ssh://somehost as an alternative. Please send the feedback back to me or directly on the the review. Over the last days I fixed few minor issues related to differences between user environments and I want to make I improve it as much as possible. Thanks Sorin Sbarnea Tripleo CI -------------- next part -------------- An HTML attachment was scrubbed... URL: From moshele at mellanox.com Fri Jun 7 19:05:46 2019 From: moshele at mellanox.com (Moshe Levi) Date: Fri, 7 Jun 2019 19:05:46 +0000 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan In-Reply-To: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> References: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> Message-ID: Hi Zoltan, What OS and kernel are you using? -----Original Message----- From: Zoltan Langi Sent: Friday, June 7, 2019 3:54 PM To: openstack-discuss at lists.openstack.org Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now. I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release. When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s. (I initially followed this ASAP2 guide, works well: https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2) Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox: https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf (page15) So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs. The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm. The problem is only one direction of the traffic is offloaded when LAG is being used. I opened a mellanox case and they recommended to install the latest ovs version which I did: https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. 
The speed I am getting is way far away from the values that I get when only a single port is used. Does anyone has any experience or any idea what should I look our for or check? Thank you very much, anything is appreciated! Zoltan From zoltan.langi at namecheap.com Fri Jun 7 19:44:42 2019 From: zoltan.langi at namecheap.com (Zoltan Langi) Date: Fri, 7 Jun 2019 21:44:42 +0200 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan In-Reply-To: References: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> Message-ID: Hello Moshe, OS is Ubuntu 18.04.2 LTS, Kernel is: 4.18.0-21-generic According to Mellanox this os is definitely supported. Zoltan On 07.06.19 21:05, Moshe Levi wrote: > Hi Zoltan, > > What OS and kernel are you using? > > -----Original Message----- > From: Zoltan Langi > Sent: Friday, June 7, 2019 3:54 PM > To: openstack-discuss at lists.openstack.org > Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan > > Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now. > > I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox > ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release. > > When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s. > > (I initially followed this ASAP2 guide, works well: > https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2) > > Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox: > > https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf > (page15) > > So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs. > > The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm. > > The problem is only one direction of the traffic is offloaded when LAG is being used. > > I opened a mellanox case and they recommended to install the latest ovs version which I did: > > https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem > > After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. The speed I am getting is way far away from the values that I get when only a single port is used. > > Does anyone has any experience or any idea what should I look our for or check? > > > Thank you very much, anything is appreciated! > > Zoltan > > > > > From gagehugo at gmail.com Fri Jun 7 19:51:06 2019 From: gagehugo at gmail.com (Gage Hugo) Date: Fri, 7 Jun 2019 14:51:06 -0500 Subject: [Security SIG] Weekly Newsletter - June 06th 2019 Message-ID: #Week of: 06 June 2019 - Security SIG Meeting Info: http://eavesdrop.openstack.org/#Security_SIG_meeting - Weekly on Thursday at 1500 UTC in #openstack-meeting - Agenda: https://etherpad.openstack.org/p/security-agenda - https://security.openstack.org/ - https://wiki.openstack.org/wiki/Security-SIG #Meeting Notes - Summary: http://eavesdrop.openstack.org/meetings/security/2019/security.2019-06-06-15.01.html - This week we discussed the [openstack-security] mailing list usage. Currently it's only being used for launchpad to send automated notifications on security bugs. 
Due to this, we are considering designating the [openstack-security] mailing list to only be used for automated notifications and rewording the description for the mailing list to state this for clarity. If anyone is wanting to ask questions about security related questions, we will suggest using the -discussion mailing list. - We will discuss more on this next week and hammer out the final details. ## News - the scientific sig meeting this week featured a discussion on secure computing environments, if anyone here is interested in the transcript or wants to reach out to the participants about anything: - http://eavesdrop.openstack.org/meetings/scientific_sig/2019/scientific_sig.2019-06-05-11.00.log.html#l-93 - Image Encryption pop-up team's proposal: - https://review.opendev.org/#/c/661983/ - Storyboard work is nearly in place for a mechanism to auto-assign security teams to private stories marked "security" - https://review.opendev.org/#/q/topic:security-teams # VMT Reports - A full list of publicly marked security issues can be found here: https://bugs.launchpad.net/ossa/ - No new public security bugs this week -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Fri Jun 7 20:16:29 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Fri, 7 Jun 2019 15:16:29 -0500 Subject: [nova] [cyborg] Impact of moving bind to compute In-Reply-To: <1CC272501B5BC543A05DB90AA509DED527591596@fmsmsx122.amr.corp.intel.com> References: <1CC272501B5BC543A05DB90AA509DED52757522F@fmsmsx122.amr.corp.intel.com> <9f9ea648-34a7-9783-3372-40325a8ced27@gmail.com> <1CC272501B5BC543A05DB90AA509DED527591596@fmsmsx122.amr.corp.intel.com> Message-ID: <1533bd72-9f26-2873-4976-2bf25620acaa@gmail.com> On 6/7/2019 12:17 AM, Nadathur, Sundar wrote: > The ARQ creation could be done at [1], followed by the binding, before acquiring the semaphore or creating other resources. Why is that not a good option? > > [1]https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1898 If we created the ARQs in compute I think we'd do it in the ComputeManager._build_resources method to be consistent with where we create volumes and ports. My bigger point is if the ARQ creation fails in compute for whatever reason, then we have to rollback any other resources we create (ports and volumes) which gets messy. Doing the ARQ creation before _build_resources in ComputeManager (what you're suggesting) would side-step that bit but then we've got inconsistencies in where the server create flow creates external resources within the compute service, which I don't love. So I think if we're going to do the ARQ creation early then we should do it in the conductor so we can fail fast and avoid a reschedule from the compute. -- Thanks, Matt From emilien at redhat.com Fri Jun 7 20:16:58 2019 From: emilien at redhat.com (Emilien Macchi) Date: Fri, 7 Jun 2019 16:16:58 -0400 Subject: [tripleo][molecule] feedback on testing ansible roles with molecule In-Reply-To: References: Message-ID: On Fri, Jun 7, 2019 at 3:50 PM Sorin Sbarnea wrote: > Hi! > > While we do now have a POC job running molecule with tox for testing one > tripleo-common role, I would like to ask for some feedback from running the > same test locally, on your dev box. 
> The report generated but openstack-tox-mol job looks like > http://logs.openstack.org/36/663336/14/check/openstack-tox-mol/aa7345d/tox/reports.html > > https://review.opendev.org/#/c/663336/ > Just download it and run: > tox -e mol > > You will either need docker or at least to define DOCKER_HOST= > ssh://somehost as an alternative. > is there a driver for podman? if yes, prefer it over docker on fedora. Otherwise, cool! Thanks for this work. It'll be useful with the forthcoming work in tripleo-ansible. -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Fri Jun 7 22:58:00 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Fri, 07 Jun 2019 15:58:00 -0700 Subject: [keystone] Keystone Team Update - Week of 3 June 2019 Message-ID: <40595d60-a8da-4656-aa60-c55a85e4c509@www.fastmail.com> # Keystone Team Update - Week of 3 June 2019 ## News ### Milestone 1 Check-in We scheduled our Milestone 1 meeting[1] and I proposed a draft agenda[2]. Looking forward to a productive meeting! [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006783.html [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006976.html ### Sphinx update and the gate This week the release of Sphinx 2.1.0 broke our documentation builds. The issue[3] is that module constants are now automatically included as members in the document, and we were using constants as shorthand for external, imported modules that had broken docstrings. We've come up with a workaround[4], but as Elod pointed out, we're now documenting constants like CONF, LOG, and others which is an unexpected change in behavior. [3] https://github.com/sphinx-doc/sphinx/issues/6447 [4] https://review.opendev.org/663373 ### Expiring users We discussed[5] the work needed[6] to allow federated users to create and use application credentials, which formerly was making application credentials refreshable and is now being reworked to move the refresh layer upwards, but we had differring recollections on whether that layer was with the user or the grant. Kristi will update the spec to mention both options and the implementation and user experience implications for each option. [5] http://eavesdrop.openstack.org/meetings/keystone/2019/keystone.2019-06-04-16.00.log.html#l-37 [6] https://review.opendev.org/604201 ## Open Specs Train specs: https://bit.ly/2uZ2tRl Ongoing specs: https://bit.ly/2OyDLTh All specs for Train should now be proposed, new specs will not be accepted for Train after this point. Please provide and respond to feedback on open specs so that we can merge them in a timely manner. ## Recently Merged Changes Search query: https://bit.ly/2pquOwT We merged 3 changes this week. ## Changes that need Attention Search query: https://bit.ly/2tymTje There are 42 changes that are passing CI, not in merge conflict, have no negative reviews and aren't proposed by bots. ## Bugs This week we opened 5 new bugs and closed 2. 
Bugs opened (5)  Bug #1831918 (keystone:Medium) opened by Nathan Oyler https://bugs.launchpad.net/keystone/+bug/1831918  Bug #1831400 (keystone:Undecided) opened by Brin Zhang https://bugs.launchpad.net/keystone/+bug/1831400  Bug #1832005 (keystone:Undecided) opened by Maciej Kucia https://bugs.launchpad.net/keystone/+bug/1832005  Bug #1831791 (keystonemiddleware:Undecided) opened by Nathan Oyler https://bugs.launchpad.net/keystonemiddleware/+bug/1831791  Bug #1831406 (oslo.limit:Undecided) opened by jacky06 https://bugs.launchpad.net/oslo.limit/+bug/1831406  Bugs closed (2)  Bug #1831400 (keystone:Undecided) https://bugs.launchpad.net/keystone/+bug/1831400  Bug #1831791 (keystonemiddleware:Undecided) https://bugs.launchpad.net/keystonemiddleware/+bug/1831791 ## Milestone Outlook https://releases.openstack.org/train/schedule.html Today is the last day to submit spec proposals for Train. The spec freeze is on the Train-2 milestone next month. Focus now should be on reviewing and updating specs. It's also not too early to get started on feature implementations. ## Help with this newsletter Help contribute to this newsletter by editing the etherpad: https://etherpad.openstack.org/p/keystone-team-newsletter From michjo at viviotech.net Sat Jun 8 00:11:07 2019 From: michjo at viviotech.net (Jordan Michaels) Date: Fri, 7 Jun 2019 17:11:07 -0700 (PDT) Subject: [Glance] Can Glance be installed on a server other than the controller? Message-ID: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> Hi Folks, First time posting here so apologies if this question is inappropriate for this list. Just a quick question to see if Glance can be installed on a server other than the controller? By following the installation docs for Rocky I can get Glance installed just fine on the controller (works great!), but following that same documentation on a separate server I cannot get it to authenticate. It's probably just something I'm doing, but I've run out of ideas on what to check next (both the controller and the separate server use the same auth and config), and I just want to make sure it's possible. It's also possible I'm losing my mind, so, there's that. =P Posted about it in detail here: https://ask.openstack.org/en/question/122501/glance-unauthorized-http-401-on-block1-but-not-controller/ Appreciate any advice! Kind regards, Jordan From mnaser at vexxhost.com Sat Jun 8 05:37:12 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sat, 8 Jun 2019 01:37:12 -0400 Subject: [Glance] Can Glance be installed on a server other than the controller? In-Reply-To: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> References: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> Message-ID: On Fri., Jun. 7, 2019, 8:15 p.m. Jordan Michaels, wrote: > Hi Folks, > > First time posting here so apologies if this question is inappropriate for > this list. > > Just a quick question to see if Glance can be installed on a server other > than the controller? By following the installation docs for Rocky I can get > Glance installed just fine on the controller (works great!), but following > that same documentation on a separate server I cannot get it to > authenticate. It's probably just something I'm doing, but I've run out of > ideas on what to check next (both the controller and the separate server > use the same auth and config), and I just want to make sure it's possible. > It's also possible I'm losing my mind, so, there's that. 
=P > > Posted about it in detail here: > > https://ask.openstack.org/en/question/122501/glance-unauthorized-http-401-on-block1-but-not-controller/ > > Appreciate any advice! > It's possible :) Kind regards, > Jordan > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Cory at Hawkless.id.au Sat Jun 8 07:03:51 2019 From: Cory at Hawkless.id.au (Cory Hawkless) Date: Sat, 8 Jun 2019 07:03:51 +0000 Subject: Neutron with LBA and BGP-EVPN over IP fabric In-Reply-To: References: Message-ID: <18C7C076CE65A443BC1DEC057949DEFE01273C86B3@CorysCloudVPS.Oblivion.local> I have come across this exact same issue while building out our Rocky deployment My solution was to make modifications to the neutron/agent/linux/ip_lib.py and neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py files then commit them to my own fork. Checkout this commit for the information https://github.com/CoryHawkless/neutron/commit/8f337b47068ad8e69aea138c43eaeb218df90dfc I'd love to see this implemented as an option as opposed to a brute force hack like ive done here. Has anyone else found another way around this problem? -----Original Message----- From: Jan Marquardt [mailto:jm at artfiles.de] Sent: Friday, 7 June 2019 10:45 PM To: openstack-discuss at lists.openstack.org Subject: Neutron with LBA and BGP-EVPN over IP fabric Hi, we are currently trying to build an Openstack Cloud with an IP fabric and FRR directly running on each host. Therefore each host is supposed to advertise its VNIs to the fabric. For this purpose I’d need VXLAN interfaces with the following config: 18: vx-50: mtu 1500 qdisc noqueue master br-test state UNKNOWN mode DEFAULT group default qlen 1000 link/ether 7e:d2:e6:3c:5a:65 brd ff:ff:ff:ff:ff:ff promiscuity 1 vxlan id 50 local 10.0.0.101 srcport 0 0 dstport 8472 nolearning ttl inherit ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning off flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.7e:d2:e6:3c:5a:65 designated_root 8000.7e:d2:e6:3c:5a:65 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off group_fwd_mask 0x0 group_fwd_mask_str 0x0 vlan_tunnel off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 It seems that Neutron/lba is not capable of creating VXLAN interfaces with such a config. By default lba creates them with mode multicast, but I’d need unicast. The only way to activate unicast mode seems to be setting l2pop, but then lba does not set local address. Furthermore, I don't think we really need l2pop, because this part is supposed to be done by BGP-EVPN. Is there any way to achieve such config with Neutron/lba? 
Best Regards Jan -- Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 E-Mail: support at artfiles.de | Web: http://www.artfiles.de Geschäftsführer: Harald Oltmanns | Tim Evers Eingetragen im Handelsregister Hamburg - HRB 81478 From Cory at Hawkless.id.au Sat Jun 8 07:05:36 2019 From: Cory at Hawkless.id.au (Cory Hawkless) Date: Sat, 8 Jun 2019 07:05:36 +0000 Subject: Neutron with LBA and BGP-EVPN over IP fabric References: Message-ID: <18C7C076CE65A443BC1DEC057949DEFE01273C86D4@CorysCloudVPS.Oblivion.local> Sorry, also meant to say that I then use Docker to build containers based on this modified source. We run everything in our own custom built containers including the L3Agent, DHCP agents, nova, cinder, neutron,.. the lot. -----Original Message----- From: Cory Hawkless Sent: Saturday, 8 June 2019 4:34 PM To: 'Jan Marquardt' ; openstack-discuss at lists.openstack.org Subject: RE: Neutron with LBA and BGP-EVPN over IP fabric I have come across this exact same issue while building out our Rocky deployment My solution was to make modifications to the neutron/agent/linux/ip_lib.py and neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py files then commit them to my own fork. Checkout this commit for the information https://github.com/CoryHawkless/neutron/commit/8f337b47068ad8e69aea138c43eaeb218df90dfc I'd love to see this implemented as an option as opposed to a brute force hack like ive done here. Has anyone else found another way around this problem? -----Original Message----- From: Jan Marquardt [mailto:jm at artfiles.de] Sent: Friday, 7 June 2019 10:45 PM To: openstack-discuss at lists.openstack.org Subject: Neutron with LBA and BGP-EVPN over IP fabric Hi, we are currently trying to build an Openstack Cloud with an IP fabric and FRR directly running on each host. Therefore each host is supposed to advertise its VNIs to the fabric. For this purpose I’d need VXLAN interfaces with the following config: 18: vx-50: mtu 1500 qdisc noqueue master br-test state UNKNOWN mode DEFAULT group default qlen 1000 link/ether 7e:d2:e6:3c:5a:65 brd ff:ff:ff:ff:ff:ff promiscuity 1 vxlan id 50 local 10.0.0.101 srcport 0 0 dstport 8472 nolearning ttl inherit ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning off flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.7e:d2:e6:3c:5a:65 designated_root 8000.7e:d2:e6:3c:5a:65 hold_timer 0.00 message_age_timer 0.00 forward_delay_timer 0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off group_fwd_mask 0x0 group_fwd_mask_str 0x0 vlan_tunnel off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 It seems that Neutron/lba is not capable of creating VXLAN interfaces with such a config. By default lba creates them with mode multicast, but I’d need unicast. The only way to activate unicast mode seems to be setting l2pop, but then lba does not set local address. Furthermore, I don't think we really need l2pop, because this part is supposed to be done by BGP-EVPN. Is there any way to achieve such config with Neutron/lba? 
Best Regards Jan -- Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95 E-Mail: support at artfiles.de | Web: http://www.artfiles.de Geschäftsführer: Harald Oltmanns | Tim Evers Eingetragen im Handelsregister Hamburg - HRB 81478 From Cory at Hawkless.id.au Sat Jun 8 07:10:40 2019 From: Cory at Hawkless.id.au (Cory Hawkless) Date: Sat, 8 Jun 2019 07:10:40 +0000 Subject: Cinder Ceph backup concurrency Message-ID: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> I'm using Rocky and Cinders built in Ceph backup driver which is working ok but I'd like to limit each instance of the backup agent to X number of concurrent backups. For example, if I(Or a tenant) trigger a backup to run on 20 volumes, the cinder-0backuip agent promptly starts the process of backup up all 20 volumes simultaneously and while this works ok it has the downside of over saturating links, causing high IO on the disks etc. Ideally I'd like to have each cinder-backup agent limited to running X(Perhaps 5) backups jobs at any one time and the remaining jobs will be 'queued' until an agent has less than X jobs remaining. Is this possible at all? Based on my understanding the Cinder scheduler services handles the allocation and distribution of the backup tasks, is that correct? Thanks in advance Cory -------------- next part -------------- An HTML attachment was scrubbed... URL: From ssbarnea at redhat.com Sat Jun 8 07:20:51 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Sat, 8 Jun 2019 08:20:51 +0100 Subject: [tripleo][molecule] feedback on testing ansible roles with molecule In-Reply-To: References: Message-ID: There is no podman driver (provider) yet, but it will be. Mainly we are waiting for Ansible modules and one done it will be easy to add one. My goal is to find a way to use both, probably based on detection and fallback. This could provide a better user experience as it would allow use of whatever you have available on your environment. -- sorin On 7 Jun 2019, 21:17 +0100, Emilien Macchi , wrote: > > > > On Fri, Jun 7, 2019 at 3:50 PM Sorin Sbarnea wrote: > > > Hi! > > > > > > While we do now have a POC job running molecule with tox for testing one tripleo-common role, I would like to ask for some feedback from running the same test locally, on your dev box. > > > The report generated but openstack-tox-mol job looks like http://logs.openstack.org/36/663336/14/check/openstack-tox-mol/aa7345d/tox/reports.html > > > > > > https://review.opendev.org/#/c/663336/ > > > Just download it and run: > > > tox -e mol > > > > > > You will either need docker or at least to define DOCKER_HOST=ssh://somehost as an alternative. > > > > is there a driver for podman? if yes, prefer it over docker on fedora. > > > Otherwise, cool! Thanks for this work. It'll be useful with the forthcoming work in tripleo-ansible. > -- > Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From pfb29 at cam.ac.uk Sat Jun 8 14:23:39 2019 From: pfb29 at cam.ac.uk (Paul Browne) Date: Sat, 8 Jun 2019 15:23:39 +0100 Subject: Cinder Ceph backup concurrency In-Reply-To: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> References: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> Message-ID: Hello all, I'd also be very interested in anyone's knowledge or experience of this topic, extending also to cinder-volume operation concurrency. 
We see similar behaviour in that many cinder-volume conversion operations started simultaneously can impact the cloud overall. Thanks, Paul On Sat, 8 Jun 2019, 08:15 Cory Hawkless, wrote: > I’m using Rocky and Cinders built in Ceph backup driver which is working > ok but I’d like to limit each instance of the backup agent to X number of > concurrent backups. > > For example, if I(Or a tenant) trigger a backup to run on 20 volumes, the > cinder-0backuip agent promptly starts the process of backup up all 20 > volumes simultaneously and while this works ok it has the downside of over > saturating links, causing high IO on the disks etc. > > > > Ideally I’d like to have each cinder-backup agent limited to running > X(Perhaps 5) backups jobs at any one time and the remaining jobs will be > ‘queued’ until an agent has less than X jobs remaining. > > > > Is this possible at all? > > Based on my understanding the Cinder scheduler services handles the > allocation and distribution of the backup tasks, is that correct? > > > > Thanks in advance > > Cory > -------------- next part -------------- An HTML attachment was scrubbed... URL: From moshele at mellanox.com Sat Jun 8 17:14:18 2019 From: moshele at mellanox.com (Moshe Levi) Date: Sat, 8 Jun 2019 17:14:18 +0000 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan In-Reply-To: References: <19f2868b-419f-cee3-f137-dcf277719e42@namecheap.com> Message-ID: If you are using OFED please use OFED 4.6 If you are using inbox driver I know for sure that it works with Kernel 5.0. Thanks, Moshe -----Original Message----- From: Zoltan Langi Sent: Friday, June 7, 2019 10:45 PM To: Moshe Levi ; openstack-discuss at lists.openstack.org Subject: Re: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan Hello Moshe, OS is Ubuntu 18.04.2 LTS, Kernel is: 4.18.0-21-generic According to Mellanox this os is definitely supported. Zoltan On 07.06.19 21:05, Moshe Levi wrote: > Hi Zoltan, > > What OS and kernel are you using? > > -----Original Message----- > From: Zoltan Langi > Sent: Friday, June 7, 2019 3:54 PM > To: openstack-discuss at lists.openstack.org > Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan > > Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now. > > I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox > ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release. > > When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s. > > (I initially followed this ASAP2 guide, works well: > https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2) > > Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox: > > https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf > (page15) > > So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs. > > The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm. > > The problem is only one direction of the traffic is offloaded when LAG is being used. 
> > I opened a mellanox case and they recommended to install the latest ovs version which I did: > > https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem > > After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. The speed I am getting is way far away from the values that I get when only a single port is used. > > Does anyone has any experience or any idea what should I look our for or check? > > > Thank you very much, anything is appreciated! > > Zoltan > > > > > From mnaser at vexxhost.com Sat Jun 8 17:31:41 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Sat, 8 Jun 2019 13:31:41 -0400 Subject: [openstack-ansible] suse support for stable/queens Message-ID: Hi everyone, The most recent set of automatic proposal patches are all failing for opensuse-42 due to the fact that it seems the operating system is now shipping with LXC 3 instead of LXC 2. The patch that added support to LXC 3 for our newer branches was mainly done there to add support for Bionic, which means that we can't really backport it all the way to Queens. We have two options right now: 1. Someone can volunteer to implement LXC 3 support in stable/queens in order to get opensuse-42 working again 2. We move the opensuse-42 jobs to non-voting for 1/2 weeks and if no one fixes them, we drop them (because they're a waste of CI resources). I'd like to hear what the community has to say about this to be able to move forward. Thanks, Mohammed -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From zoltan.langi at namecheap.com Sat Jun 8 17:55:57 2019 From: zoltan.langi at namecheap.com (zoltan.langi) Date: Sat, 08 Jun 2019 19:55:57 +0200 Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan In-Reply-To: Message-ID: Hi Moshe,Yes I am using the latest OFED driver from mellanox.Any ideas where am I missing something?Thank you!ZoltanVon meinem Samsung Galaxy Smartphone gesendet. -------- Ursprüngliche Nachricht --------Von: Moshe Levi Datum: 08.06.19 19:14 (GMT+01:00) An: Zoltan Langi , openstack-discuss at lists.openstack.org Betreff: RE: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan If you are using OFED please use OFED 4.6If you are using inbox driver I know for sure that it works with  Kernel 5.0. 
Thanks,Moshe-----Original Message-----From: Zoltan Langi Sent: Friday, June 7, 2019 10:45 PMTo: Moshe Levi ; openstack-discuss at lists.openstack.orgSubject: Re: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlanHello Moshe,OS is Ubuntu 18.04.2 LTS, Kernel is: 4.18.0-21-genericAccording to Mellanox this os is definitely supported.ZoltanOn 07.06.19 21:05, Moshe Levi wrote:> Hi Zoltan,>> What OS and kernel are you using?>> -----Original Message-----> From: Zoltan Langi > Sent: Friday, June 7, 2019 3:54 PM> To: openstack-discuss at lists.openstack.org> Subject: [neutron] Mellanox Connectx5 ASAP2+LAG over VF+vxlan>> Hello everyone, I hope someone more experienced can help me with this problem I've been struggling for a while now.>> I'm trying to set up ASAP2 ovs vxlan offload on a dual port Mellanox> ConnectX5 card between 2 hosts using LACP link aggregation and Rocky release.>> When the LAG is not there, only one pf is being used, the offload works just fine, getting the line speed out of the vf-s.>> (I initially followed this ASAP2 guide, works well:> https://community.mellanox.com/s/article/getting-started-with-mellanox-asap-2)>> Decided, to provide HA for the vf-s, may as well use LACP as it's supported for ASAP2 according to Mellanox:>> https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf> (page15)>> So what I've done I've created a systemd script that puts the eswitch into switchdev mode on both ports before the networking starts at boot time, the bond0 comes up after the mode was changed just like in the docs.>> The bond0 interface wasn't added to the ovs as the doc recommends as it keeps the vxlan tunnel, only the vf is there after openstack creates the vm.>> The problem is only one direction of the traffic is offloaded when LAG is being used.>> I opened a mellanox case and they recommended to install the latest ovs version which I did:>> https://community.mellanox.com/s/question/0D51T00006kXkRzSAK/connectx5-asap2-vxlan-offload-bond-openstack-problem>> After using the latest OVS from master, the problem still exist, the offload simply doesn't work properly. The speed I am getting is way far away from the values that I get when only a single port is used.>> Does anyone has any experience or any idea what should I look our for or check?>>> Thank you very much, anything is appreciated!>> Zoltan>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Sun Jun 9 09:53:15 2019 From: mark at stackhpc.com (Mark Goddard) Date: Sun, 9 Jun 2019 10:53:15 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0681bc52-ef1c-8dc5-be22-68fcabd9dbd2@gmail.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> <0681bc52-ef1c-8dc5-be22-68fcabd9dbd2@gmail.com> Message-ID: On Fri, 7 Jun 2019, 18:02 Jay Pipes, wrote: > On 6/7/19 11:23 AM, Eric Fried wrote: > >> Better still, add a standardized trait to os-traits for hyperthreading > >> support, which is what I'd recommended in the original > >> cpu-resource-tracking spec. > > > > HW_CPU_HYPERTHREADING was added via [1] and has been in os-traits since > > 0.8.0. 
> I think we need a tri-state here. There are three options: 1. Give me a node with hyperthreading enabled 2. Give me a node with hyperthreading disabled 3. I don't care For me, the lack of a trait is 3 - I wouldn't want existing flavours without this trait to cause hyperthreading to be disabled. The ironic deploy templates feature wasn't designed to support forbidden traits - I don't think they were implemented at the time. The example use cases so far have involved encoding values into a trait name, e.g. CUSTOM_HYPERTHREADING_ON. Forbidden traits could be made to work in this case, but it doesn't really extend to non Boolean things such as RAID levels. I'm not trying to shoot down new ideas, just explaining how we got here. > Excellent, I had a faint recollection of that... > > -jay > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cheng1.li at intel.com Sun Jun 9 05:58:36 2019 From: cheng1.li at intel.com (Li, Cheng1) Date: Sun, 9 Jun 2019 05:58:36 +0000 Subject: [Airship-Seaworthy] Deployment of Airship-Seaworthy on Virtual Environment In-Reply-To: References: Message-ID: Finally, I have been able to deploy airsloop on virtual env. I created two VMs(libvirt/kvm driven), one for genesis and the other for compute node. These two VMs were on the same host. As the compute node VM is supposed to be provisioned by maas via ipmi/pxe. So I used virtualbmc to simulate the ipmi. I authored the site by following these two guides[1][2]. It’s the mix of guide[1] and guide[2]. The commands I used are all these ones[3]. After fixing several issue, I have deployed the virtual airsloop env. I list here some issues I met: 1. Node identify failed. At the beginning of step ‘prepare_and_deploy_nodes’, the drydock power on the compute node VM via ipmi. Once the compute VM starts up via pxe boot, it runs script to detect local network interfaces and sends the info back to drycok. So the drydock can identify the node based on the received info. But the compute VM doesn’t have real ILO interface, so the drydock can’t identify it. What I did to workaround this was to manually fill the ipmi info on maas web page. 2. My host doesn’t have enough CPU cores, neither the VMs. So I had to increase --pods-per-core in kubelet.yaml. 3. The disk name in compute VM is vda, instead of sda. Drydock can’t map the alias device name to vda, so I had to used the fixed alias name ‘vda’ which is the same as it’s real device name.(it was ‘bootdisk’) 4. My host doesn’t have enough resource(CPU, memory), so I removed some resource consuming components(logging, monitoring). Besides, I disabled the neutron rally test. As it failed with timeout error because of the resource limits. I also paste my site changes[4] for reference. 
[1] https://airship-treasuremap.readthedocs.io/en/latest/authoring_and_deployment.html [2] https://airship-treasuremap.readthedocs.io/en/latest/airsloop.html [3] https://airship-treasuremap.readthedocs.io/en/latest/airsloop.html#getting-started [4] https://github.com/cheng1li/treasuremap/commit/7a8287720dacc6dc1921948aaddec96b8cf2645e Thanks, Cheng From: Anirudh Gupta [mailto:Anirudh.Gupta at hsc.com] Sent: Thursday, May 30, 2019 7:29 PM To: Li, Cheng1 ; airship-discuss at lists.airshipit.org; airship-announce at lists.airshipit.org; openstack-dev at lists.openstack.org; openstack at lists.openstack.org Subject: RE: [Airship-Seaworthy] Deployment of Airship-Seaworthy on Virtual Environment Hi Team, I am trying to create Airship-Seaworthy from the link https://airship-treasuremap.readthedocs.io/en/latest/seaworthy.html It requires 6 DELL R720xd bare-metal servers: 3 control, and 3 compute nodes to be configured, but there is no documentation of how to install and getting started with Airship-Seaworthy. Do we need to follow the “Getting Started” section mentioned in Airsloop or will there be any difference in case of Seaworthy. https://airship-treasuremap.readthedocs.io/en/latest/airsloop.html#getting-started Also what all configurations need to be run from the 3 controller nodes and what needs to be run from 3 computes? Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) From: Li, Cheng1 > Sent: 30 May 2019 08:29 To: Anirudh Gupta >; airship-discuss at lists.airshipit.org; airship-announce at lists.airshipit.org; openstack-dev at lists.openstack.org; openstack at lists.openstack.org Subject: RE: [Airship-Seaworthy] Deployment of Airship-Seaworthy on Virtual Environment I have the same question. I haven’t seen any docs which guides how to deploy airsloop/air-seaworthy in virtual env. I am trying to deploy airsloop on libvirt/kvm driven virtual env. Two VMs, one for genesis, the other for compute. Virtualbmc for ipmi simulation. The genesis.sh scripts has been run on genesis node without error. But deploy_site fails at prepare_and_deploy_nodes task(action ‘set_node_boot’ timeout). I am still investigating this issue. It will be great if we have official document for this scenario. Thanks, Cheng From: Anirudh Gupta [mailto:Anirudh.Gupta at hsc.com] Sent: Wednesday, May 29, 2019 3:31 PM To: airship-discuss at lists.airshipit.org; airship-announce at lists.airshipit.org; openstack-dev at lists.openstack.org; openstack at lists.openstack.org Subject: [Airship-Seaworthy] Deployment of Airship-Seaworthy on Virtual Environment Hi Team, We want to test Production Ready Airship-Seaworthy in our virtual environment The link followed is https://airship-treasuremap.readthedocs.io/en/latest/seaworthy.html As per the document we need 6 DELL R720xd bare-metal servers: 3 control, and 3 compute nodes. But we need to deploy our setup on Virtual Environment. Does Airship-Seaworthy support Installation on Virtual Environment? We have 2 Rack Servers with Dual-CPU Intel® Xeon® E5 26xx with 16 cores each and 128 GB RAM. Is it possible that we can create Virtual Machines on them and set up the complete environment. In that case, what possible infrastructure do we require for setting up the complete setup. Looking forward for your response. Regards अनिरुद्ध गुप्ता (वरिष्ठ अभियंता) Hughes Systique Corporation D-23,24 Infocity II, Sector 33, Gurugram, Haryana 122001 DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. 
The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. DISCLAIMER: This electronic message and all of its contents, contains information which is privileged, confidential or otherwise protected from disclosure. The information contained in this electronic mail transmission is intended for use only by the individual or entity to which it is addressed. If you are not the intended recipient or may have received this electronic mail transmission in error, please notify the sender immediately and delete / destroy all copies of this electronic mail transmission without disclosing, copying, distributing, forwarding, printing or retaining any part of it. Hughes Systique accepts no responsibility for loss or damage arising from the use of the information transmitted by this email including damage from virus. -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Sun Jun 9 20:06:06 2019 From: zigo at debian.org (Thomas Goirand) Date: Sun, 9 Jun 2019 22:06:06 +0200 Subject: [Glance] Can Glance be installed on a server other than the controller? In-Reply-To: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> References: <1436864438.79118.1559952667149.JavaMail.zimbra@viviotech.net> Message-ID: On 6/8/19 2:11 AM, Jordan Michaels wrote: > Hi Folks, > > First time posting here so apologies if this question is inappropriate for this list. > > Just a quick question to see if Glance can be installed on a server other than the controller? By following the installation docs for Rocky I can get Glance installed just fine on the controller (works great!), but following that same documentation on a separate server I cannot get it to authenticate. It's probably just something I'm doing, but I've run out of ideas on what to check next (both the controller and the separate server use the same auth and config), and I just want to make sure it's possible. It's also possible I'm losing my mind, so, there's that. =P > > Posted about it in detail here: > https://ask.openstack.org/en/question/122501/glance-unauthorized-http-401-on-block1-but-not-controller/ > > Appreciate any advice! > > Kind regards, > Jordan Jordan, There's no such thing in the OpenStack code as a "controller". This thing only lives in the docs and in how people decide to deploy things. Users are free to install any component anywhere. Indeed, you must be doing something wrong. Cheers, Thomas Goirand (zigo) From anlin.kong at gmail.com Mon Jun 10 04:19:47 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Mon, 10 Jun 2019 16:19:47 +1200 Subject: [requirements] SQLAlchemy 1.3.4 backward compatible? Message-ID: Trove Jenkins jobs failed because of the SQLAlchemy upgrade from 1.2.19 to 1.3.4 in https://github.com/openstack/requirements/commit/4f3252cbd7c63fd1c60e9bd09748e39dc2d9f8fa#diff-0bdd949ed8a7fdd4f95240bd951779c8 yesterday. 
A lot of error messages like the following: sqlalchemy.exc.ArgumentError: Textual SQL expression 'visible=0 or auto_apply=1...' should be explicitly declared as text('visible=0 or auto_apply=1...') sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) duplicate column name: priority_apply I'm wondering who else is also affected? Any hints for the workaround? Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Mon Jun 10 04:21:46 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Mon, 10 Jun 2019 16:21:46 +1200 Subject: [requirements] SQLAlchemy 1.3.4 backward compatible? In-Reply-To: References: Message-ID: BTW, the error message came from Trove db migration script, and Trove is using sqlalchemy-migrate lib rather than alembic. Best regards, Lingxian Kong Catalyst Cloud On Mon, Jun 10, 2019 at 4:19 PM Lingxian Kong wrote: > Trove Jenkins jobs failed because of the SQLAlchemy upgrade from 1.2.19 to > 1.3.4 in > https://github.com/openstack/requirements/commit/4f3252cbd7c63fd1c60e9bd09748e39dc2d9f8fa#diff-0bdd949ed8a7fdd4f95240bd951779c8 > yesterday. > > A lot of error messages like the following: > > sqlalchemy.exc.ArgumentError: Textual SQL expression 'visible=0 or > auto_apply=1...' should be explicitly declared as text('visible=0 or > auto_apply=1...') > > sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) duplicate > column name: priority_apply > > I'm wondering who else is also affected? Any hints for the workaround? > > Best regards, > Lingxian Kong > Catalyst Cloud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From soulxu at gmail.com Mon Jun 10 05:17:44 2019 From: soulxu at gmail.com (Alex Xu) Date: Mon, 10 Jun 2019 13:17:44 +0800 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: Eric Fried 于2019年6月7日周五 上午1:59写道: > > Looking at the specs, it seems it's mostly talking about changing VMs > resources without rebooting. However that's not the actual intent of the > Ironic use case I explained in the email. > > Yes, it requires a reboot to reflect the BIOS changes. This reboot can > be either be done by Nova IronicDriver or Ironic deploy step can also do it. > > So I am not sure if the spec actually satisfies the use case. > > I hope to get more response from the team to get more clarity. > > Waitwait. The VM needs to be rebooted for the BIOS change to take > effect? So (non-live) resize would actually satisfy your use case just > fine. But the problem is that the ironic driver doesn't support resize > at all? > > Without digging too hard, that seems like it would be a fairly > straightforward thing to add. It would be limited to only "same host" > and initially you could only change this one attribute (anything else > would have to fail). > > Nova people, thoughts? > > Contribute another idea. So just as Jay said in this thread. Those CUSTOM_HYPERTHREADING_ON and CUSTOM_HYPERTHREADING_OFF are configuration. Those configuration isn't used for scheduling. Actually, Traits is designed for scheduling. So yes, there should be only one trait. CUSTOM_HYPERTHREADING, this trait is used for indicating the host support HT. 
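As an aside on the SQLAlchemy errors reported above in the Trove thread: the ArgumentError is SQLAlchemy 1.3 refusing textual SQL fragments that are not explicitly wrapped in text(). A minimal sketch of the kind of change involved, using a throwaway in-memory table rather than the actual Trove migration code:

    from sqlalchemy import Column, Integer, MetaData, Table, create_engine, select, text

    engine = create_engine("sqlite://")  # throwaway in-memory DB, for illustration only
    metadata = MetaData()
    modules = Table("modules", metadata,
                    Column("visible", Integer), Column("auto_apply", Integer))
    metadata.create_all(engine)

    # Under SQLAlchemy 1.2 a bare string fragment such as
    #     select([modules]).where("visible=0 or auto_apply=1")
    # only emitted a deprecation warning; under 1.3 it raises the ArgumentError
    # quoted above. Wrapping the fragment in text() works on both versions:
    query = select([modules]).where(text("visible=0 or auto_apply=1"))
    rows = engine.execute(query).fetchall()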
Whether to enable it in the instance is configuration info. It is also a pain to change the configuration in the flavor. The flavor is the spec of the instance's virtual resources, not the configuration. So another way is to store the configuration in another place, like the server's metadata. So for the HT case, we only fill the CUSTOM_HYPERTHREADING trait in the flavor, and set 'hyperthreading_config=on' in the server metadata. Nova will find a BM node that supports HT. And ironic, based on the server metadata 'hyperthreading_config=on', will enable HT. To change the HT configuration to off, the user can update the server's metadata. Currently, nova will send an RPC call to the compute node and call a virt driver interface when the server metadata is updated. In the ironic virt driver, it can trigger a hyper-threading configuration deploy step to turn HT off, and do a reboot of the instance. (The reboot is a step inside the deploy-step, not part of the ironic virt driver flow.) But yes, this changes some of the design of the original deploy-steps and deploy-templates. And we put something into the server's metadata, which I'm not sure nova people will like. Anyway, just putting my idea out here. efried > . > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iurygregory at gmail.com Mon Jun 10 07:44:20 2019 From: iurygregory at gmail.com (Iury Gregory) Date: Mon, 10 Jun 2019 09:44:20 +0200 Subject: [ironic] Should we add ironic-prometheus-exporter under Ironic umbrella? In-Reply-To: References: Message-ID: Hi Mohammed, Thanks for your feedback =). On Fri, Jun 7, 2019 at 19:43, Mohammed Naser wrote: > Hi Iury, > > This seems pretty awesome. I threw in some comments > > On Fri, Jun 7, 2019 at 11:08 AM Iury Gregory > wrote: > > > > Greetings Ironicers! > > > > I would like to have your input on the matter of moving the > ironic-prometheus-exporter to Ironic umbrella. > > > > What is the ironic-prometheus-exporter? > > The ironic-prometheus-exporter[1] provides a way to export hardware > sensor data from > > Ironic project in OpenStack to Prometheus [2]. It's implemented as an > oslo-messaging notification driver to get the sensor data and a Flask > Application to export the metrics to Prometheus. It can not only be used in > metal3-io but also in any OpenStack deployment which includes Ironic > service. > > This seems really neat. From my perspective, it seems like it waits > for notifications, and then writes it out to a file. The flask server > seems to do nothing but pretty much serve the contents at /metrics. I > think we should be doing more of this inside OpenStack to be honest > and this can be really useful in the perspective of operators. > The notifications are the sensor data of each baremetal node; each node will have a file with the sensor data as metrics in the Prometheus format. Since Prometheus is pull-based, the Flask application will merge the content of all files and provide them to Prometheus when necessary. > I don't want to complicate this more however, but I would love for > this to be a pattern/framework that other projects can adopt. > Agreed, maybe we should talk on IRC about how the pattern/framework would look; this can be done before moving the project, or through reviews after the project is moved. > > How to ensure the sensor data will follow the Prometheus format?
> > We are using the prometheus client_python [3] to generate the file with > the metrics that come trough the oslo notifier plugin. > > > > How it will be tested on the gate? > > Virtualbmc can't provide sensor data that the actual plugin supports. We > would collect sample metrics from the hardware and use it in the unit tests. > > > > Maybe we should discuss this in the next ironic weekly meeting (10th > June)? > > > > [1] https://github.com/metal3-io/ironic-prometheus-exporter > > [2] https://prometheus.io/ > > [3] https://github.com/prometheus/client_python > > > > -- > > Att[]'s > > Iury Gregory Melo Ferreira > > MSc in Computer Science at UFCG > > Part of the puppet-manager-core team in OpenStack > > Software Engineer at Red Hat Czech > > Social: https://www.linkedin.com/in/iurygregory > > E-mail: iurygregory at gmail.com > > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > -- *Att[]'sIury Gregory Melo Ferreira * *MSc in Computer Science at UFCG* *Part of the puppet-manager-core team in OpenStack* *Software Engineer at Red Hat Czech* *Social*: https://www.linkedin.com/in/iurygregory *E-mail: iurygregory at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Mon Jun 10 07:56:30 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 10 Jun 2019 08:56:30 +0100 Subject: [kolla] Feedback request: removing OracleLinux support In-Reply-To: <81438050-7699-c7a7-b883-a707cc3f53db@linaro.org> References: <81438050-7699-c7a7-b883-a707cc3f53db@linaro.org> Message-ID: Received a reluctant go-ahead from Oracle. Proposed patches to disable CI jobs: https://review.opendev.org/664217 https://review.opendev.org/664216 Mark On Wed, 5 Jun 2019 at 10:43, Marcin Juszkiewicz wrote: > > W dniu 05.06.2019 o 11:10, Mark Goddard pisze: > > > We propose dropping support for OracleLinux in the Train cycle. If > > this will affect you and you would like to help maintain it, please > > get in touch. > > First we drop it from CI. > > Then (IMHO) it will be removed once we move to CentOS 8. > From mark at stackhpc.com Mon Jun 10 07:57:54 2019 From: mark at stackhpc.com (Mark Goddard) Date: Mon, 10 Jun 2019 08:57:54 +0100 Subject: [kolla] Feedback request: removing kolla-cli In-Reply-To: References: Message-ID: On Wed, 5 Jun 2019 at 10:10, Mark Goddard wrote: > > Hi, > > We discussed during the kolla virtual PTG [1] the option of removing > support for the kolla-cli deliverable [2], as a way to improve the > long term sustainability of the project. kolla-cli was a project > started by Oracle, and accepted as a kolla deliverable. While it looks > interesting and potentially useful, it never gained much traction (as > far as I'm aware) and the maintainers left the community. We have > never released it and CI has been failing for some time. > > We propose dropping support for kolla-cli in the Train cycle. If this > will affect you and you would like to help maintain it, please get in > touch. Received a reluctant go-ahead from the kolla-cli contributors. I'll work through the process of retiring the project. 
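Stepping back to the ironic-prometheus-exporter discussion above: a minimal, hypothetical sketch of the described pattern, i.e. a directory of per-node metric files served by a small Flask app at /metrics. The file layout, paths and port are invented for illustration; this is not the actual exporter code:

    import glob
    from flask import Flask, Response

    app = Flask(__name__)
    METRICS_DIR = "/var/lib/ironic-metrics"  # hypothetical location of the per-node files

    @app.route("/metrics")
    def metrics():
        # One file of Prometheus-formatted sensor metrics per baremetal node;
        # Prometheus pulls, so we simply concatenate the files on every scrape.
        chunks = []
        for path in sorted(glob.glob(METRICS_DIR + "/*.prom")):
            with open(path) as f:
                chunks.append(f.read())
        return Response("\n".join(chunks), mimetype="text/plain")

    if __name__ == "__main__":
        app.run(port=9608)  # arbitrary port, for the sketch only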
> > Thanks, > Mark > > [1] https://etherpad.openstack.org/p/kolla-train-ptg > [2] https://github.com/openstack/kolla-cli From geguileo at redhat.com Mon Jun 10 10:39:09 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Mon, 10 Jun 2019 12:39:09 +0200 Subject: Cinder Ceph backup concurrency In-Reply-To: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> References: <18C7C076CE65A443BC1DEC057949DEFE01273C8720@CorysCloudVPS.Oblivion.local> Message-ID: <20190610103909.iz72uejuuksb4yhx@localhost> On 08/06, Cory Hawkless wrote: > I'm using Rocky and Cinders built in Ceph backup driver which is working ok but I'd like to limit each instance of the backup agent to X number of concurrent backups. > For example, if I(Or a tenant) trigger a backup to run on 20 volumes, the cinder-0backuip agent promptly starts the process of backup up all 20 volumes simultaneously and while this works ok it has the downside of over saturating links, causing high IO on the disks etc. > > Ideally I'd like to have each cinder-backup agent limited to running X(Perhaps 5) backups jobs at any one time and the remaining jobs will be 'queued' until an agent has less than X jobs remaining. > > Is this possible at all? > Based on my understanding the Cinder scheduler services handles the allocation and distribution of the backup tasks, is that correct? > > Thanks in advance > Cory Hi Cory, Cinder doesn't have any kind of throttling mechanism specific for "heavy" operations. This also includes the cinder-backup service that doesn't make use of the cinder-scheduler service. I think there may be ways to do throttling for the case you describe, though I haven't tried them: Defining "executor_thread_pool_size" (defaults to 64) to reduce the number of concurrent operations that will be executed on the cinder-backup service (backup listings and such will not be affected, as they are executed by cinder-api). Some of the remaining requests will remain on the oslo messaging queue, and the rest in RabbitMQ message queue. For the RBD backend you could also limit the size of the native threads with "backup_native_threads_pool_size", which will limit the number of concurrent RBD calls (since they use native threads instead of green threads). Also, don't forget to ensure that "backup_workers" is set to 1, otherwise you will be running multiple processes, each with the previously defined limitations, resulting in N times what you wanted to have. I hope this helps. Cheers, Gorka. From gael.therond at gmail.com Mon Jun 10 13:14:06 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Mon, 10 Jun 2019 15:14:06 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Hi guys, Just a quick question regarding this bug, someone told me that it have been patched within stable/rocky, BUT, were you talking about the openstack/octavia repositoy or the openstack/kolla repository? Many Thanks! Le mar. 4 juin 2019 à 15:19, Gaël THEROND a écrit : > Oh, that's perfect so, I'll just update my image and my platform as we're > using kolla-ansible and that's super easy. > > You guys rocks!! (Pun intended ;-)). > > Many many thanks to all of you, that will real back me a lot regarding the > Octavia solidity and Kolla flexibility actually ^^. > > Le mar. 
4 juin 2019 à 15:17, Carlos Goncalves a > écrit : > >> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND >> wrote: >> > >> > Hi Lingxian Kong, >> > >> > That’s actually very interesting as I’ve come to the same conclusion >> this morning during my investigation and was starting to think about a fix, >> which it seems you already made! >> > >> > Is there a reason why it didn’t was backported to rocky? >> >> The patch was merged in master branch during Rocky development cycle, >> hence included in stable/rocky as well. >> >> > >> > Very helpful, many many thanks to you you clearly spare me hours of >> works! I’ll get a review of your patch and test it on our lab. >> > >> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a >> écrit : >> >> >> >> Hi Felix, >> >> >> >> « Glad » you had the same issue before, and yes of course I looked at >> the HM logs which is were I actually found out that this event was >> triggered by octavia (Beside the DB data that validated that) here is my >> log trace related to this event, It doesn't really shows major issue IMHO. >> >> >> >> Here is the stacktrace that our octavia service archived for our both >> controllers servers, with the initial loadbalancer creation trace >> (Worker.log) and both controllers triggered task (Health-Manager.log). >> >> >> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ >> >> >> >> I well may have miss something in it, but I don't see something >> strange on from my point of view. >> >> Feel free to tell me if you spot something weird. >> >> >> >> >> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner >> a écrit : >> >>> >> >>> Hi Gael, >> >>> >> >>> >> >>> >> >>> we had a similar issue in the past. >> >>> >> >>> You could check the octiava healthmanager log (should be on the same >> node where the worker is running). >> >>> >> >>> This component monitors the status of the Amphorae and restarts them >> if they don’t trigger a callback after a specific time. This might also >> happen if there is some connection issue between the two components. >> >>> >> >>> >> >>> >> >>> But normally it should at least restart the LB with new Amphorae… >> >>> >> >>> >> >>> >> >>> Hope that helps >> >>> >> >>> >> >>> >> >>> Felix >> >>> >> >>> >> >>> >> >>> From: Gaël THEROND >> >>> Sent: Tuesday, June 4, 2019 9:44 AM >> >>> To: Openstack >> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly >> deleted by octavia >> >>> >> >>> >> >>> >> >>> Hi guys, >> >>> >> >>> >> >>> >> >>> I’ve a weird situation here. >> >>> >> >>> >> >>> >> >>> I smoothly operate a large scale multi-region Octavia service using >> the default amphora driver which imply the use of nova instances as >> loadbalancers. >> >>> >> >>> >> >>> >> >>> Everything is running really well and our customers (K8s and >> traditional users) are really happy with the solution so far. >> >>> >> >>> >> >>> >> >>> However, yesterday one of those customers using the loadbalancer in >> front of their ElasticSearch cluster poked me because this loadbalancer >> suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were >> no longer available but yet the anchor/member/pool and listeners settings >> were still existing. >> >>> >> >>> >> >>> >> >>> So I investigated and found out that the loadbalancer amphoras have >> been destroyed by the octavia user. >> >>> >> >>> >> >>> >> >>> The weird part is, both the master and the backup instance have been >> destroyed at the same moment by the octavia service user. 
>> >>> >> >>> >> >>> >> >>> Is there specific circumstances where the octavia service could >> decide to delete the instances but not the anchor/members/pool ? >> >>> >> >>> >> >>> >> >>> It’s worrying me a bit as there is no clear way to trace why does >> Octavia did take this action. >> >>> >> >>> >> >>> >> >>> I digged within the nova and Octavia DB in order to correlate the >> action but except than validating my investigation it doesn’t really help >> as there are no clue of why the octavia service did trigger the deletion. >> >>> >> >>> >> >>> >> >>> If someone have any clue or tips to give me I’ll be more than happy >> to discuss this situation. >> >>> >> >>> >> >>> >> >>> Cheers guys! >> >>> >> >>> Hinweise zum Datenschutz finden Sie hier. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ed at leafe.com Mon Jun 10 13:46:09 2019 From: ed at leafe.com (Ed Leafe) Date: Mon, 10 Jun 2019 08:46:09 -0500 Subject: [placement] update 19-22 In-Reply-To: References: Message-ID: On Jun 7, 2019, at 7:17 AM, Chris Dent wrote: > > Ed Leafe's ongoing work with using a graph database probably needs > some kind of report or update. Sure, be happy to. A few weeks ago I completed the code changes to remove sqlalchemy from the objects, and replace it with code to talk to the graph DB (Neo4j). One big issue is that there isn’t a 1:1 relationship between what is needed in the two database approaches. For example, there is no need to use the two-ID design that (IMO) overly complicates placement data. There is also no need to store the IDs of parents and root providers, but so much of the code depends on these values, I left them in there for now. One other twist is that an Allocation cannot exist without a Consumer, so all the code to handle the early microversions that support that was removed. I then moved on to getting the functional tests passing. Some early runs revealed holes in my understanding of what the code was supposed to be doing, so I fixed those. Most of the failures were in the tests/functional/db directory. I mentioned that to Chris in a side conversation, and he agreed that those tests would not be relevant, as the system had a completely different database, so I removed those. I tried to integrate Neo4j’s transaction model into the transaction framework of oslo_db and sqla, and while it works for the most part, it fails when running tox. I get “Invalid transaction” messages, which you would expect when one process closes another process’s transaction. Since the Python adapter I’m using (py2neo) creates a pool for connections, I suspect that the way tox runs is causing py2neo to reuse live connections. I haven’t had time to dig into this yet, but it is my current focus. When I run the tests individually, they pass without a problem. I am also planning on doing some performance measurement, so I guess I’ll look into the perfload stuff to see if it can work for this. One thing that is clear to me from all this work is that the way Placement is coded is very much a result of its relational DB roots. There were so many places I encountered when converting the code where I needed to do what felt like unnecessary steps to make the objects continue to work the way that are currently designed. Had this been a greenfield effort, the code for implementing Placement with a graph DB would have been much more direct and understandable. But the converted objects are working, and working well. 
-- Ed Leafe From juliaashleykreger at gmail.com Mon Jun 10 14:01:28 2019 From: juliaashleykreger at gmail.com (Julia Kreger) Date: Mon, 10 Jun 2019 07:01:28 -0700 Subject: [ironic] To not have meetings? Message-ID: Last week the discussion came up of splitting the ironic meeting to alternate time zones as we have increasing numbers of contributors in the Asia/Pacific areas of the world[0]. With that discussion, an additional interesting question came up posing the question of shifting to the mailing list instead of our present IRC meeting[1]? It is definitely an interesting idea, one that I'm personally keen on because of time zones and daylight savings time. I think before we do this, we should collect thoughts and also try to determine how we would pull this off so we don't forget the weekly checkpoint that the meeting serves. I think we need to do something, so I guess now is a good time to provide input into what everyone thinks would be best for the project and facilitating the weekly check-in. What I think might work: By EOD UTC Monday: * Listed primary effort participants will be expected to update the whiteboard[2] weekly before EOD Monday UTC * Contributors propose patches to the whiteboard that they believe would be important for reviewers to examine this coming week. * PTL or designee sends weekly email to the mailing list to start an update thread shortly after EOD Monday UTC or early Tuesday UTC. ** Additional updates, questions, and topical discussion (new features, RFEs) would ideally be wrapped up by EOD UTC Tuesday. With that, I think we would also need to go ahead and begin having "office hours" as during the week we generally know some ironic contributors will be in IRC and able to respond to questions. I think this would initially consist of our meeting time and perhaps the other time that seems to be most friendly to the contributors int he Asia/Pacific area[3]. Thoughts/ideas/suggestions welcome! -Julia [0]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:31:33 [1]: http://eavesdrop.openstack.org/irclogs/%23openstack-ironic/%23openstack-ironic.2019-06-03.log.html#t2019-06-03T15:43:16 [2]: https://etherpad.openstack.org/p/IronicWhiteBoard [3]: https://doodle.com/poll/bv9a4qyqy44wiq92 From lbragstad at gmail.com Mon Jun 10 14:36:22 2019 From: lbragstad at gmail.com (Lance Bragstad) Date: Mon, 10 Jun 2019 09:36:22 -0500 Subject: [tc][all] Train Community Goals Message-ID: Hi all, The goals for the Train development cycle have merged. Both are available on governance.openstack.org [0][1]. Please have a look if you haven't already. Goal champions are asettle and gmann. As always, if you have any comments, questions, or concerns, please don't hesitate to reach out on the mailing list or in #openstack-tc. Thanks, Lance [0] https://governance.openstack.org/tc/goals/train/pdf-doc-generation.html [1] https://governance.openstack.org/tc/goals/train/ipv6-support-and-testing.html -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From haleyb.dev at gmail.com Mon Jun 10 14:36:30 2019 From: haleyb.dev at gmail.com (Brian Haley) Date: Mon, 10 Jun 2019 10:36:30 -0400 Subject: [neutron] Bug deputy report for week of June 3rd Message-ID: Hi, I was Neutron bug deputy last week. 
Below is a short summary about reported bugs. -Brian Critical bugs ------------- * https://bugs.launchpad.net/neutron/+bug/1832225 - Neutron-vpnaas unit tests broken - Broken by https://review.opendev.org/#/c/653903/ - https://review.opendev.org/#/c/664257/ High bugs --------- * https://bugs.launchpad.net/neutron/+bug/1831534 - [l3][dvr] with openflow security group east-west traffic between different vlan networks is broken - https://review.opendev.org/#/c/662925/ - https://review.opendev.org/#/c/663008/ * https://bugs.launchpad.net/neutron/+bug/1831575 - br-tun gets a wrong arp drop rule when dvr is connected to a network but not used as gateway - https://review.opendev.org/#/c/662999/ - https://review.opendev.org/#/c/663000/ * https://bugs.launchpad.net/neutron/+bug/1831647 - Creation of existing resource takes too much time or fails - https://review.opendev.org/#/c/663749/ * https://bugs.launchpad.net/neutron/+bug/1831919 - Impossible to change a list of static routes defined for subnet because of InvalidRequestError with Cisco ACI integration - https://review.opendev.org/#/c/663714/ - https://review.opendev.org/#/c/663713/ - https://review.opendev.org/#/c/663712/ Medium bugs ----------- * https://bugs.launchpad.net/neutron/+bug/1831404 - rarp packet will be dropped in flows cause vm connectivity broken after live-migration - Yang Li took ownership * https://bugs.launchpad.net/neutron/+bug/1831613 - SRIOV: agent may not register VFs - https://review.opendev.org/#/c/663031/ * https://bugs.launchpad.net/neutron/+bug/1831706 - [DVR] Modify `in_port` field of packets which from remote qr-* port - Possibly related to https://review.opendev.org/#/c/639009 and https://bugs.launchpad.net/neutron/+bug/1732067 * https://bugs.launchpad.net/neutron/+bug/1831811 - Unable to filter using same cidr value as used for subnet create - https://review.opendev.org/#/c/663464/ * https://bugs.launchpad.net/ubuntu/+source/neutron-fwaas/+bug/1832210 - incorrect decode of log prefix under python 3 - https://review.opendev.org/#/c/664234/ Low bugs -------- * https://bugs.launchpad.net/neutron/+bug/1831916 - BGP dynamic routing in neutron - https://review.opendev.org/#/c/663711/ Wishlist bugs ------------- None Invalid bugs ------------ * https://bugs.launchpad.net/bugs/1831613 - moved to lbaas storyboard Further triage required ----------------------- * https://bugs.launchpad.net/neutron/+bug/1831726 - neutron-cli port-update ipv6 fixed_ips Covering previous - Fixed IP getting replaced when using neutronclient, as API ref says will happen. - Looks like the openstackclient is doing things properly by appending the new fixed IP to the existing. - Might just be a documentation issue. * https://bugs.launchpad.net/neutron/+bug/1832021 - Checksum drop of metadata traffic on isolated provider networks - Related to recent revert of TCP checksum-fill iptables rule, https://review.opendev.org/#/c/654645/ - but since that was an invalid rule there is probably another issue here. From lbragstad at gmail.com Mon Jun 10 16:26:34 2019 From: lbragstad at gmail.com (Lance Bragstad) Date: Mon, 10 Jun 2019 11:26:34 -0500 Subject: [tc][all] Train Community Goals In-Reply-To: References: Message-ID: <15b7fbcb-7731-9598-c23e-fb3f6fb48487@gmail.com> I apologize, I missed a goal. coreycb is championing an effort to implement python runtimes for Train, which is being tracked in a community goal [0]. Updates are available on the mailing list if you haven't seen them already [1]. 
[0] https://governance.openstack.org/tc/goals/train/python3-updates.html [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006977.html On 6/10/19 9:36 AM, Lance Bragstad wrote: > Hi all, > > The goals for the Train development cycle have merged. Both are > available on governance.openstack.org [0][1]. Please have a look if > you haven't already. Goal champions are asettle and gmann. > > As always, if you have any comments, questions, or concerns, please > don't hesitate to reach out on the mailing list or in #openstack-tc. > > Thanks, > > Lance > > [0] > https://governance.openstack.org/tc/goals/train/pdf-doc-generation.html > [1] > https://governance.openstack.org/tc/goals/train/ipv6-support-and-testing.html -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From kennelson11 at gmail.com Mon Jun 10 16:35:12 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Mon, 10 Jun 2019 09:35:12 -0700 Subject: [all] [TC] Call for Election Officials Message-ID: Hello, The upcoming round for OpenStack's technical elections (both PTL and TC) is set to happen in September this year[1]. We would like to encourage anyone interested to volunteer to help administer the elections, especially as some of our long-service volunteers periodically become ineligible by way of being nominated for election themselves. While elections have historically been handled by a small number of volunteers, the intention was for this to NOT be a closed process. The election process is detailed in this document [2], with tooling managed in gerrit [3], and uses StoryBoard [4] to keep track of various election activities. We are happy to mentor individuals and share knowledge about the elections process in an effort to get more of the community involved. Involvement can be on an ongoing basis or simply volunteering to help with some small part of one election just to learn how the process works. The election officials team is explicitly delegated by the Technical Committee, who have historically consistently expressed interest in more volunteers to assist. Please let us know if you would like to volunteer! -Kendall Nelson & The Election Officials [1] https://review.opendev.org/#/c/661673/2 [2] https://governance.openstack.org/election/process.html [3] https://opendev.org/openstack/election [4] https://storyboard.openstack.org/#!/project/openstack/election -------------- next part -------------- An HTML attachment was scrubbed... URL: From jp.methot at planethoster.info Mon Jun 10 16:52:56 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Mon, 10 Jun 2019 12:52:56 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: Hi, Can you give me an idea of how you split the API from the server part? I’m guessing it has to do with pointing the API endpoint to a specific server, but keeping the neutron info in config files pointing to the controller? Contrary to what I said on this thread last week, we’ve been plagued with this issue every 24 hours or so, needing to restart the controller nodes to restore stability. 
We did implement several of the tweaks that were suggested in this thread’s previous emails, but we are only now considering splitting the API from the main servers, as you did. Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. > Le 5 juin 2019 à 15:31, Mathieu Gagné a écrit : > > Hi Jean-Philippe, > > On Wed, Jun 5, 2019 at 1:01 PM Jean-Philippe Méthot > wrote: >> >> We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : >> >> =ERROR REPORT==== 5-Jun-2019::18:50:08 === >> closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): >> missed heartbeats from client, timeout: 60s >> >> The neutron-server logs show this error: >> >> 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer >> 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: >> >> The relevant service version numbers are as follow: >> rabbitmq-server-3.6.5-1.el7.noarch >> openstack-neutron-12.0.6-1.el7.noarch >> python2-oslo-messaging-5.35.4-1.el7.noarch >> >> Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. >> >> I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. >> > > We had a very similar issue after upgrading to Neutron Queens. In > fact, all Neutron agents were "down" according to status API and > messages weren't getting through. IIRC, this only happened in regions > which had more load than the others. > > We applied a bunch of fixes which I suspect are only a bunch of bandaids. > > Here are the changes we made: > * Split neutron-api from neutron-server. Create a whole new controller > running neutron-api with mod_wsgi. > * Increase [database]/max_overflow = 200 > * Disable RabbitMQ heartbeat in oslo.messaging: > [oslo_messaging_rabbit]/heartbeat_timeout_threshold = 0 > * Increase [agent]/report_interval = 120 > * Increase [DEFAULT]/agent_down_time = 600 > > We also have those sysctl configs due to firewall dropping sessions. > But those have been on the server forever: > net.ipv4.tcp_keepalive_time = 30 > net.ipv4.tcp_keepalive_intvl = 1 > net.ipv4.tcp_keepalive_probes = 5 > > We never figured out why a service that was working before the upgrade > but no longer is. > This is kind of frustrating as it caused us all short of intermittent > issues and stress during our upgrade. > > Hope this helps. 
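For anyone wanting to try the quoted tweaks, a consolidated sketch of where those options live in neutron.conf; the section names are inferred from the option names quoted above and should be double-checked against your release:

    [DEFAULT]
    agent_down_time = 600

    [agent]
    report_interval = 120

    [database]
    max_overflow = 200

    [oslo_messaging_rabbit]
    heartbeat_timeout_threshold = 0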
> > -- > Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgagne at calavera.ca Mon Jun 10 17:25:43 2019 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Mon, 10 Jun 2019 13:25:43 -0400 Subject: [ops][neutron]After an upgrade to openstack queens, Neutron is unable to communicate properly with rabbitmq In-Reply-To: References: <2E8D97FD-B8B8-4A4C-88B8-A536E56A9C00@planethoster.info> Message-ID: Hi, On Mon, Jun 10, 2019 at 12:53 PM Jean-Philippe Méthot wrote: > > Hi, > > Can you give me an idea of how you split the API from the server part? I’m guessing it has to do with pointing the API endpoint to a specific server, but keeping the neutron info in config files pointing to the controller? > > Contrary to what I said on this thread last week, we’ve been plagued with this issue every 24 hours or so, needing to restart the controller nodes to restore stability. We did implement several of the tweaks that were suggested in this thread’s previous emails, but we are only now considering splitting the API from the main servers, as you did. > I followed this procedure to use mod_wsgi and updated DNS to point to the new machine/IP: https://docs.openstack.org/neutron/rocky/admin/config-wsgi.html#neutron-api-behind-mod-wsgi You can run neutron-rpc-server if you want to remove the API part from neutron-server. Mathieu > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > > Le 5 juin 2019 à 15:31, Mathieu Gagné a écrit : > > Hi Jean-Philippe, > > On Wed, Jun 5, 2019 at 1:01 PM Jean-Philippe Méthot > wrote: > > > We had a Pike openstack setup that we updated to Queens earlier this week. It’s a 30 compute nodes infrastructure with 2 controller nodes and 2 network nodes, using openvswitch for networking. Since we upgraded to queens, neutron-server on the controller nodes has been unable to contact the openvswitch-agents through rabbitmq. The rabbitmq is clustered on both controller nodes and has been giving us the following error when neutron-server connections fail : > > =ERROR REPORT==== 5-Jun-2019::18:50:08 === > closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d): > missed heartbeats from client, timeout: 60s > > The neutron-server logs show this error: > > 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer > 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: . Trying again in 1 seconds.: RecoverableConnectionError: > > The relevant service version numbers are as follow: > rabbitmq-server-3.6.5-1.el7.noarch > openstack-neutron-12.0.6-1.el7.noarch > python2-oslo-messaging-5.35.4-1.el7.noarch > > Rabbitmq does not show any alert. It also has plenty of memory and a high enough file limit. The login user and credentials are fine as they are used in other openstack services which can contact rabbitmq without issues. > > I’ve tried optimizing rabbitmq, upgrading, downgrading, increasing timeouts in neutron services, etc, to no avail. 
I find myself at a loss and would appreciate if anyone has any idea as to where to go from there. > > > We had a very similar issue after upgrading to Neutron Queens. In > fact, all Neutron agents were "down" according to status API and > messages weren't getting through. IIRC, this only happened in regions > which had more load than the others. > > We applied a bunch of fixes which I suspect are only a bunch of bandaids. > > Here are the changes we made: > * Split neutron-api from neutron-server. Create a whole new controller > running neutron-api with mod_wsgi. > * Increase [database]/max_overflow = 200 > * Disable RabbitMQ heartbeat in oslo.messaging: > [oslo_messaging_rabbit]/heartbeat_timeout_threshold = 0 > * Increase [agent]/report_interval = 120 > * Increase [DEFAULT]/agent_down_time = 600 > > We also have those sysctl configs due to firewall dropping sessions. > But those have been on the server forever: > net.ipv4.tcp_keepalive_time = 30 > net.ipv4.tcp_keepalive_intvl = 1 > net.ipv4.tcp_keepalive_probes = 5 > > We never figured out why a service that was working before the upgrade > but no longer is. > This is kind of frustrating as it caused us all short of intermittent > issues and stress during our upgrade. > > Hope this helps. > > -- > Mathieu > > From rfolco at redhat.com Mon Jun 10 20:33:17 2019 From: rfolco at redhat.com (Rafael Folco) Date: Mon, 10 Jun 2019 17:33:17 -0300 Subject: [tripleo] TripleO CI Summary: Sprint 31 Message-ID: Greetings, The TripleO CI team has just completed Sprint 31 / Unified Sprint 10 (May 16 thru Jun 05). The following is a summary of completed work during this sprint cycle: - Created image and container build jobs for RDO on RHEL 7 in the internal instance of Software Factory. - Completed the bootstrapping of OSP 15 standalone job on RHEL8 running in the internal Software Factory. - Promotion status: green on all branches at most of the sprint. The planned work for the next sprint [1] are: - Complete RDO on RHEL7 work by having an independent pipeline running container and image build, standalone and ovb featureset001 jobs. This includes fixing ovb job and start consuming rhel containers from the standalone jobs. - Replicate RHEL7 jobs created in the last sprint for RHEL8 running in the internal Software Factory. Expected outcome is to have a preliminary job producing logs with successes or failures at the end of the sprint. - Create a design document for a staging environment to test changes in the promoter server. This will benefit CI team with less breakages in the promoter server and also prepare the grounds for the multi-arch builds. The Ruck and Rover for this sprint are Sorin Sbarnea (zbr) and Ronelle Landy (rlandy). Please direct questions or queries to them regarding CI status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on their nick. Ruck/rover notes are being tracked in etherpad [2]. Thanks, rfolco [1] https://tree.taiga.io/project/tripleo-ci-board/taskboard/unified-sprint-11 [2] https://etherpad.openstack.org/p/ruckroversprint11 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kecarter at redhat.com Mon Jun 10 21:01:00 2019 From: kecarter at redhat.com (Kevin Carter) Date: Mon, 10 Jun 2019 16:01:00 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: With the now merged structural changes it is time to organize an official meeting to get things moving. 
So without further ado: * When should we schedule our meetings (day, hour, frequency)? * Should the meeting take place in the main #tripleo channel or in one of the dedicated meeting rooms (openstack-meeting-{1,2,3,4}, etc)? * How long should our meetings last? * Any volunteers to chair meetings? To capture some of our thoughts, questions, hopes, dreams, and aspirations I've created an etherpad which I'd like interested folks to throw ideas at: [ https://etherpad.openstack.org/p/tripleo-ansible-agenda ]. I'd like to see if we can get a confirmed list of folks who want to meet and, potentially, a generally good timezone. I'd also like to see if we can nail down some ideas for a plan of attack. While I have ideas and would be happy to talk at length about them (I wrote a few things down in the etherpad), I don't want to be the only voice given I'm new to the TripleO community (I could be, and likely I am, missing a lot of context). Assuming we can get something flowing, I'd like to shoot for an official meeting sometime next week (the week of 17 June, 2019). In the meantime, I'll look forward to chatting with folks in the #tripleo channel. -- Kevin Carter IRC: cloudnull On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: > Hey everyone, > > For the upcoming work on focusing on more Ansible automation and testing, > I have created a dedicated #tripleo-transformation channel for our new > squad. Feel free to join if you are interested in joining and helping out! > > +1 to removing repositories we don't use, especially if they have no > working code. I'd like to see the consolidation of TripleO specific things > into the tripleo-ansible repository and then using upstream Ansible roles > for all of the different services (nova, glance, cinder, etc.). > > Sincerely, > > Luke Short, RHCE > Software Engineer, OpenStack Deployment Framework > Red Hat, Inc. > > > On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: > >> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: >> >>> So the questions at hand are: what, if anything, should we do with >>> these repositories? Should we retire them or just ignore them? Is there >>> anyone using any of the roles? >>> >> >> My initial reaction was to suggest we just ignore them, but on second >> thought I'm wondering if there is anything negative if we leave them lying >> around. Unless we're going to benefit from them in the future if we start >> actively working in these repos, they represent obfuscation and debt, so it >> might be best to retire / dispose of them. >> >> David >> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Mon Jun 10 22:38:36 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 10 Jun 2019 17:38:36 -0500 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> Message-ID: >>> (e) Fully manual. Aggregate operations never touch (add or remove) >>> traits on host RPs. You always have to do that manually. I'm going to come down in favor of this option. It's the shortest path to getting something viable working, in a way that is simple to understand, despite lacking magical DWIM-ness. 
>>> As noted above, >>> it's easy to do - and we could make it easier with a tiny wrapper that >>> takes an aggregate, a list of traits, and an --add/--remove command. So >>> initially, setting up aggregate isolation is a two-step process, and in >>> the future we can consider making new API/CLI affordance that combines >>> the steps. > ya e could work too. > melanie added a similar functionality to osc placment for managing the alloction ratios > of specific resource classes per aggregate a few months ago > https://review.opendev.org/#/c/640898/ > > we could proably provide somthing similar for managing traits but determining what RP to > add the trait too would be a littel tricker. we would have to be able to filter to RP with either a > specific inventory or with a specific trait or in a speicic subtree. We (Placement team) are still trying to figure out how to manage concepts like "resourceless request groups" and "traits/aggregates flow down". But for now, Nova is still always modeling VCPU/MEMORY_MB and traits on the root provider, so let's simply hit the providers in the aggregate (i.e. the root compute host RPs). I'm putting this on the agenda for Thursday's nova meeting [1] to hopefully get some more Nova opinions on it. efried [1] https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting From smooney at redhat.com Tue Jun 11 00:13:51 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 11 Jun 2019 01:13:51 +0100 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> Message-ID: <91efe32e80b7c24b0bfe5875ecd053513b7fd443.camel@redhat.com> On Mon, 2019-06-10 at 17:38 -0500, Eric Fried wrote: > > > > (e) Fully manual. Aggregate operations never touch (add or remove) > > > > traits on host RPs. You always have to do that manually. > > I'm going to come down in favor of this option. It's the shortest path > to getting something viable working, in a way that is simple to > understand, despite lacking magical DWIM-ness. > > > > > As noted above, > > > > it's easy to do - and we could make it easier with a tiny wrapper that > > > > takes an aggregate, a list of traits, and an --add/--remove command. So > > > > initially, setting up aggregate isolation is a two-step process, and in > > > > the future we can consider making new API/CLI affordance that combines > > > > the steps. > > > > ya e could work too. > > melanie added a similar functionality to osc placment for managing the alloction ratios > > of specific resource classes per aggregate a few months ago > > https://review.opendev.org/#/c/640898/ > > > > we could proably provide somthing similar for managing traits but determining what RP to > > add the trait too would be a littel tricker. we would have to be able to filter to RP with either a > > specific inventory or with a specific trait or in a speicic subtree. > > We (Placement team) are still trying to figure out how to manage > concepts like "resourceless request groups" and "traits/aggregates flow > down". But for now, Nova is still always modeling VCPU/MEMORY_MB and > traits on the root provider, so let's simply hit the providers in the > aggregate (i.e. the root compute host RPs). 
> > I'm putting this on the agenda for Thursday's nova meeting [1] to > hopefully get some more Nova opinions on it. for what it's worth, for the host-aggregate case the ability to add or remove a trait from all root providers is likely enough, so that would make a cli much simpler to create. for the generic case of being able to add/remove a trait on an rp that could be anywhere in a nested tree, for all trees in an aggregate, that is a much harder problem, but we also do not need it to solve the use case we have today, so we can defer that until we actually need it, and if we never need it we can defer it forever. so +1 for keeping it simple and just updating the root RPs. > > efried > > [1] https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting >
From missile0407 at gmail.com Tue Jun 11 03:25:49 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Tue, 11 Jun 2019 11:25:49 +0800 Subject: [Kolla] Stuck at bootstrap_gnocchi during deployment using Ubuntu binary on Rocky release. Message-ID: Hi Our env needs Gnocchi because Ceilometer, and we using Kolla to deploy Ceph as the env storage backend. But I found that it always stuck at bootstrap_gnocchi. The error log shows below when check with docker logs: 2019-06-11 10:59:17,707 [19] ERROR gnocchi.utils: Unable to initialize storage driver Traceback (most recent call last): File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 333, in call result = fn(*args, **kwargs) File "/usr/lib/python3/dist-packages/gnocchi/storage/__init__.py", line 102, in get_driver conf.storage) File "/usr/lib/python3/dist-packages/gnocchi/storage/ceph.py", line 52, in __init__ self.rados, self.ioctx = ceph.create_rados_connection(conf) File "/usr/lib/python3/dist-packages/gnocchi/common/ceph.py", line 51, in create_rados_connection raise ImportError("No module named 'rados' nor 'cradox'") ImportError: No module named 'rados' nor 'cradox' This error occurred not only the image from Docker Hub, but also build by Kolla-build in Ubuntu binary based. Strange is, no error occur if turn into use Ubuntu source based images. And I guess it only happen when enabled Ceph. Does anyone have idea about this? Many thanks, Eddie. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From radoslaw.piliszek at gmail.com Tue Jun 11 06:12:11 2019 From: radoslaw.piliszek at gmail.com (=?UTF-8?Q?Rados=C5=82aw_Piliszek?=) Date: Tue, 11 Jun 2019 08:12:11 +0200 Subject: [Kolla] Stuck at bootstrap_gnocchi during deployment using Ubuntu binary on Rocky release. In-Reply-To: References: Message-ID: Hello Eddie, this classifies as a bug. Please file one at: https://bugs.launchpad.net/kolla with details on the used settings. Thank you. Kind regards, Radosław Piliszek wt., 11 cze 2019 o 05:29 Eddie Yen napisał(a): > Hi > > Our env needs Gnocchi because Ceilometer, and we using Kolla to deploy > Ceph as the env storage backend. > > But I found that it always stuck at bootstrap_gnocchi.
The error log shows > below when check with docker logs: > > 2019-06-11 10:59:17,707 [19] ERROR gnocchi.utils: Unable to initialize > storage driver > Traceback (most recent call last): > File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 333, in > call > result = fn(*args, **kwargs) > File "/usr/lib/python3/dist-packages/gnocchi/storage/__init__.py", line > 102, in get_driver > conf.storage) > File "/usr/lib/python3/dist-packages/gnocchi/storage/ceph.py", line 52, > in __init__ > self.rados, self.ioctx = ceph.create_rados_connection(conf) > File "/usr/lib/python3/dist-packages/gnocchi/common/ceph.py", line 51, > in create_rados_connection > raise ImportError("No module named 'rados' nor 'cradox'") > ImportError: No module named 'rados' nor 'cradox' > > This error occurred not only the image from Docker Hub, but also build by > Kolla-build in Ubuntu binary based. > Strange is, no error occur if turn into use Ubuntu source based images. > And I guess it only happen when enabled Ceph. > > Does anyone have idea about this? > > > Many thanks, > Eddie. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From missile0407 at gmail.com Tue Jun 11 06:53:51 2019 From: missile0407 at gmail.com (Eddie Yen) Date: Tue, 11 Jun 2019 14:53:51 +0800 Subject: [Kolla] Stuck at bootstrap_gnocchi during deployment using Ubuntu binary on Rocky release. In-Reply-To: References: Message-ID: Hi This issue just reported on the Launchpad. Many thanks, Eddie. Radosław Piliszek 於 2019年6月11日 週二 下午2:12寫道: > Hello Eddie, > > this classifies as a bug. > Please file one at: https://bugs.launchpad.net/kolla > with details on the used settings. > > Thank you. > > Kind regards, > Radosław Piliszek > > wt., 11 cze 2019 o 05:29 Eddie Yen napisał(a): > >> Hi >> >> Our env needs Gnocchi because Ceilometer, and we using Kolla to deploy >> Ceph as the env storage backend. >> >> But I found that it always stuck at bootstrap_gnocchi. The error log >> shows below when check with docker logs: >> >> 2019-06-11 10:59:17,707 [19] ERROR gnocchi.utils: Unable to initialize >> storage driver >> Traceback (most recent call last): >> File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 333, >> in call >> result = fn(*args, **kwargs) >> File "/usr/lib/python3/dist-packages/gnocchi/storage/__init__.py", line >> 102, in get_driver >> conf.storage) >> File "/usr/lib/python3/dist-packages/gnocchi/storage/ceph.py", line 52, >> in __init__ >> self.rados, self.ioctx = ceph.create_rados_connection(conf) >> File "/usr/lib/python3/dist-packages/gnocchi/common/ceph.py", line 51, >> in create_rados_connection >> raise ImportError("No module named 'rados' nor 'cradox'") >> ImportError: No module named 'rados' nor 'cradox' >> >> This error occurred not only the image from Docker Hub, but also build by >> Kolla-build in Ubuntu binary based. >> Strange is, no error occur if turn into use Ubuntu source based images. >> And I guess it only happen when enabled Ceph. >> >> Does anyone have idea about this? >> >> >> Many thanks, >> Eddie. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dirk at dmllr.de Tue Jun 11 08:29:17 2019 From: dirk at dmllr.de (=?UTF-8?B?RGlyayBNw7xsbGVy?=) Date: Tue, 11 Jun 2019 10:29:17 +0200 Subject: [openstack-ansible] suse support for stable/queens In-Reply-To: References: Message-ID: Hi Mohammed, Am Sa., 8. Juni 2019 um 19:32 Uhr schrieb Mohammed Naser : > 1. 
Someone can volunteer to implement LXC 3 support in stable/queens > in order to get opensuse-42 working again > 2. We move the opensuse-42 jobs to non-voting for 1/2 weeks and if no > one fixes them, we drop them (because they're a waste of CI > resources). I suggest to stop caring about opensuse 42.x on stable/queens and older as we'd like to deprecate 42.x (it is going to be end of life and falling out of security support in the next few days) and focus on leap 15.x only. Greetings, Dirk From jean-philippe at evrard.me Tue Jun 11 08:56:06 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Tue, 11 Jun 2019 10:56:06 +0200 Subject: [openstack-ansible] suse support for stable/queens In-Reply-To: References: Message-ID: <0f7755d6-1b85-406c-a8db-bed40f07c195@www.fastmail.com> > I suggest to stop caring about opensuse 42.x on stable/queens and > older as we'd like to deprecate > 42.x (it is going to be end of life and falling out of security > support in the next few days) and focus on leap 15.x only. Agreed. Maybe I should clarify the whole story too: 1. Focus on bare metal deploys for ALL roles. See also [1]. 2. Focus on deploys using distro packages for ALL roles. See also [1], column F. 3. Making sure efforts 1 and 2 apply to lower branches. [1]: https://docs.google.com/spreadsheets/d/1coiPHGqaIKNgCGYsNhEgzswqwp4wedm2XoBCN9WMosY/edit#gid=752070695 Regards, JP From cgoncalves at redhat.com Tue Jun 11 10:59:33 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Tue, 11 Jun 2019 12:59:33 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: On Mon, Jun 10, 2019 at 3:14 PM Gaël THEROND wrote: > > Hi guys, > > Just a quick question regarding this bug, someone told me that it have been patched within stable/rocky, BUT, were you talking about the openstack/octavia repositoy or the openstack/kolla repository? Octavia. https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701 > > Many Thanks! > > Le mar. 4 juin 2019 à 15:19, Gaël THEROND a écrit : >> >> Oh, that's perfect so, I'll just update my image and my platform as we're using kolla-ansible and that's super easy. >> >> You guys rocks!! (Pun intended ;-)). >> >> Many many thanks to all of you, that will real back me a lot regarding the Octavia solidity and Kolla flexibility actually ^^. >> >> Le mar. 4 juin 2019 à 15:17, Carlos Goncalves a écrit : >>> >>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND wrote: >>> > >>> > Hi Lingxian Kong, >>> > >>> > That’s actually very interesting as I’ve come to the same conclusion this morning during my investigation and was starting to think about a fix, which it seems you already made! >>> > >>> > Is there a reason why it didn’t was backported to rocky? >>> >>> The patch was merged in master branch during Rocky development cycle, >>> hence included in stable/rocky as well. >>> >>> > >>> > Very helpful, many many thanks to you you clearly spare me hours of works! I’ll get a review of your patch and test it on our lab. >>> > >>> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a écrit : >>> >> >>> >> Hi Felix, >>> >> >>> >> « Glad » you had the same issue before, and yes of course I looked at the HM logs which is were I actually found out that this event was triggered by octavia (Beside the DB data that validated that) here is my log trace related to this event, It doesn't really shows major issue IMHO. 
>>> >> >>> >> Here is the stacktrace that our octavia service archived for our both controllers servers, with the initial loadbalancer creation trace (Worker.log) and both controllers triggered task (Health-Manager.log). >>> >> >>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ >>> >> >>> >> I well may have miss something in it, but I don't see something strange on from my point of view. >>> >> Feel free to tell me if you spot something weird. >>> >> >>> >> >>> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner a écrit : >>> >>> >>> >>> Hi Gael, >>> >>> >>> >>> >>> >>> >>> >>> we had a similar issue in the past. >>> >>> >>> >>> You could check the octiava healthmanager log (should be on the same node where the worker is running). >>> >>> >>> >>> This component monitors the status of the Amphorae and restarts them if they don’t trigger a callback after a specific time. This might also happen if there is some connection issue between the two components. >>> >>> >>> >>> >>> >>> >>> >>> But normally it should at least restart the LB with new Amphorae… >>> >>> >>> >>> >>> >>> >>> >>> Hope that helps >>> >>> >>> >>> >>> >>> >>> >>> Felix >>> >>> >>> >>> >>> >>> >>> >>> From: Gaël THEROND >>> >>> Sent: Tuesday, June 4, 2019 9:44 AM >>> >>> To: Openstack >>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia >>> >>> >>> >>> >>> >>> >>> >>> Hi guys, >>> >>> >>> >>> >>> >>> >>> >>> I’ve a weird situation here. >>> >>> >>> >>> >>> >>> >>> >>> I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. >>> >>> >>> >>> >>> >>> >>> >>> Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. >>> >>> >>> >>> >>> >>> >>> >>> However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. >>> >>> >>> >>> >>> >>> >>> >>> So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. >>> >>> >>> >>> >>> >>> >>> >>> The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. >>> >>> >>> >>> >>> >>> >>> >>> Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? >>> >>> >>> >>> >>> >>> >>> >>> It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. >>> >>> >>> >>> >>> >>> >>> >>> I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. >>> >>> >>> >>> >>> >>> >>> >>> If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. >>> >>> >>> >>> >>> >>> >>> >>> Cheers guys! >>> >>> >>> >>> Hinweise zum Datenschutz finden Sie hier. 
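For anyone else following this thread, the "callback after a specific time" behaviour Felix described is driven by a handful of options in the [health_manager] section of octavia.conf. The values below are only the defaults as I remember them, so please double-check them against your own release before relying on this:

  [health_manager]
  # how often each amphora sends a heartbeat to the health manager
  heartbeat_interval = 10
  # how long the health manager waits without hearing a heartbeat before
  # it declares the amphora failed and starts a failover
  heartbeat_timeout = 60
  # how often the health manager scans for stale amphorae
  health_check_interval = 3

If heartbeats stop arriving (for example because of trouble on the lb-mgmt network), both the MASTER and BACKUP amphorae can cross heartbeat_timeout at around the same time, which would line up with the "destroyed at the same moment" symptom described earlier in the thread.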
From madhuri.kumari at intel.com Tue Jun 11 11:22:13 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Tue, 11 Jun 2019 11:22:13 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <14abb6c1-2665-c529-37ae-fd0b1691a333@gmail.com> <0512CBBECA36994BAA14C7FEDE986CA60FC17D2B@BGSMSX102.gar.corp.intel.com> <06c5a684-4825-5f86-ae46-1c3e89389b2e@gmail.com> <85189fc6-0394-8330-9b6e-ad5abdd8438b@fried.cc> <0681bc52-ef1c-8dc5-be22-68fcabd9dbd2@gmail.com> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC1BB6C@BGSMSX102.gar.corp.intel.com> Hi All, Thank you for your responses. I agree with Mark here, the example stated here fits the Boolean case(feature enable/disable). However many other BIOS feature doesn’t fits the case. For example enabling Intel Speed Select also needs 3 configuration or traits: CUSTOM_ISS_CONFIG_BASE – 00 CUSTOM_ISS_CONFIG_1 – 01 CUSTOM_ISS_CONFIG_2 - 02 Each configuration/trait here represents different profiles to be set on the baremetal server. Does resize help with such use case? Regards, Madhuri From: Mark Goddard [mailto:mark at stackhpc.com] Sent: Sunday, June 9, 2019 3:23 PM To: Jay Pipes Cc: openstack-discuss Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning On Fri, 7 Jun 2019, 18:02 Jay Pipes, > wrote: On 6/7/19 11:23 AM, Eric Fried wrote: >> Better still, add a standardized trait to os-traits for hyperthreading >> support, which is what I'd recommended in the original >> cpu-resource-tracking spec. > > HW_CPU_HYPERTHREADING was added via [1] and has been in os-traits since > 0.8.0. I think we need a tri-state here. There are three options: 1. Give me a node with hyperthreading enabled 2. Give me a node with hyperthreading disabled 3. I don't care For me, the lack of a trait is 3 - I wouldn't want existing flavours without this trait to cause hyperthreading to be disabled. The ironic deploy templates feature wasn't designed to support forbidden traits - I don't think they were implemented at the time. The example use cases so far have involved encoding values into a trait name, e.g. CUSTOM_HYPERTHREADING_ON. Forbidden traits could be made to work in this case, but it doesn't really extend to non Boolean things such as RAID levels. I'm not trying to shoot down new ideas, just explaining how we got here. Excellent, I had a faint recollection of that... -jay -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-philippe at evrard.me Tue Jun 11 11:55:37 2019 From: jean-philippe at evrard.me (Jean-Philippe Evrard) Date: Tue, 11 Jun 2019 13:55:37 +0200 Subject: [uc][tc][ops] reviving osops- repos In-Reply-To: References: <20190530205552.falsvxcegehtyuge@yuggoth.org> <20190531123501.tawgvqgsw6yle2nu@csail.mit.edu> <20190531164102.5lwt2jyxk24u3vdz@yuggoth.org> Message-ID: > Alternatively, I feel like a SIG (be it the Ops Docs SIG or a new > "Operational tooling" SIG) would totally be a good idea to revive this. > In that case we'd define the repository in [4]. > > My personal preference would be for a new SIG, but whoever is signing up > to work on this should definitely have the final say. Agreed on having it inside OpenStack namespace, and code handled by a team/SIG/WG (with my preference being a SIG -- existing or not). When this team/SIG/WG retires, the repo would with it. 
It provides clean ownership, and clear cleanup when disbanding. Regards, Jean-Philippe Evrard (evrardjp) From gael.therond at gmail.com Tue Jun 11 12:09:35 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 11 Jun 2019 14:09:35 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Ok nice, do you have the commit hash? I would look at it and validate that it have been committed to Stein too so I could bump my service to stein using Kolla. Thanks! Le mar. 11 juin 2019 à 12:59, Carlos Goncalves a écrit : > On Mon, Jun 10, 2019 at 3:14 PM Gaël THEROND > wrote: > > > > Hi guys, > > > > Just a quick question regarding this bug, someone told me that it have > been patched within stable/rocky, BUT, were you talking about the > openstack/octavia repositoy or the openstack/kolla repository? > > Octavia. > > https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701 > > > > > Many Thanks! > > > > Le mar. 4 juin 2019 à 15:19, Gaël THEROND a > écrit : > >> > >> Oh, that's perfect so, I'll just update my image and my platform as > we're using kolla-ansible and that's super easy. > >> > >> You guys rocks!! (Pun intended ;-)). > >> > >> Many many thanks to all of you, that will real back me a lot regarding > the Octavia solidity and Kolla flexibility actually ^^. > >> > >> Le mar. 4 juin 2019 à 15:17, Carlos Goncalves > a écrit : > >>> > >>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND > wrote: > >>> > > >>> > Hi Lingxian Kong, > >>> > > >>> > That’s actually very interesting as I’ve come to the same conclusion > this morning during my investigation and was starting to think about a fix, > which it seems you already made! > >>> > > >>> > Is there a reason why it didn’t was backported to rocky? > >>> > >>> The patch was merged in master branch during Rocky development cycle, > >>> hence included in stable/rocky as well. > >>> > >>> > > >>> > Very helpful, many many thanks to you you clearly spare me hours of > works! I’ll get a review of your patch and test it on our lab. > >>> > > >>> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND > a écrit : > >>> >> > >>> >> Hi Felix, > >>> >> > >>> >> « Glad » you had the same issue before, and yes of course I looked > at the HM logs which is were I actually found out that this event was > triggered by octavia (Beside the DB data that validated that) here is my > log trace related to this event, It doesn't really shows major issue IMHO. > >>> >> > >>> >> Here is the stacktrace that our octavia service archived for our > both controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > >>> >> > >>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > >>> >> > >>> >> I well may have miss something in it, but I don't see something > strange on from my point of view. > >>> >> Feel free to tell me if you spot something weird. > >>> >> > >>> >> > >>> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >>> >>> > >>> >>> Hi Gael, > >>> >>> > >>> >>> > >>> >>> > >>> >>> we had a similar issue in the past. > >>> >>> > >>> >>> You could check the octiava healthmanager log (should be on the > same node where the worker is running). > >>> >>> > >>> >>> This component monitors the status of the Amphorae and restarts > them if they don’t trigger a callback after a specific time. This might > also happen if there is some connection issue between the two components. 
> >>> >>> > >>> >>> > >>> >>> > >>> >>> But normally it should at least restart the LB with new Amphorae… > >>> >>> > >>> >>> > >>> >>> > >>> >>> Hope that helps > >>> >>> > >>> >>> > >>> >>> > >>> >>> Felix > >>> >>> > >>> >>> > >>> >>> > >>> >>> From: Gaël THEROND > >>> >>> Sent: Tuesday, June 4, 2019 9:44 AM > >>> >>> To: Openstack > >>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly > deleted by octavia > >>> >>> > >>> >>> > >>> >>> > >>> >>> Hi guys, > >>> >>> > >>> >>> > >>> >>> > >>> >>> I’ve a weird situation here. > >>> >>> > >>> >>> > >>> >>> > >>> >>> I smoothly operate a large scale multi-region Octavia service > using the default amphora driver which imply the use of nova instances as > loadbalancers. > >>> >>> > >>> >>> > >>> >>> > >>> >>> Everything is running really well and our customers (K8s and > traditional users) are really happy with the solution so far. > >>> >>> > >>> >>> > >>> >>> > >>> >>> However, yesterday one of those customers using the loadbalancer > in front of their ElasticSearch cluster poked me because this loadbalancer > suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were > no longer available but yet the anchor/member/pool and listeners settings > were still existing. > >>> >>> > >>> >>> > >>> >>> > >>> >>> So I investigated and found out that the loadbalancer amphoras > have been destroyed by the octavia user. > >>> >>> > >>> >>> > >>> >>> > >>> >>> The weird part is, both the master and the backup instance have > been destroyed at the same moment by the octavia service user. > >>> >>> > >>> >>> > >>> >>> > >>> >>> Is there specific circumstances where the octavia service could > decide to delete the instances but not the anchor/members/pool ? > >>> >>> > >>> >>> > >>> >>> > >>> >>> It’s worrying me a bit as there is no clear way to trace why does > Octavia did take this action. > >>> >>> > >>> >>> > >>> >>> > >>> >>> I digged within the nova and Octavia DB in order to correlate the > action but except than validating my investigation it doesn’t really help > as there are no clue of why the octavia service did trigger the deletion. > >>> >>> > >>> >>> > >>> >>> > >>> >>> If someone have any clue or tips to give me I’ll be more than > happy to discuss this situation. > >>> >>> > >>> >>> > >>> >>> > >>> >>> Cheers guys! > >>> >>> > >>> >>> Hinweise zum Datenschutz finden Sie hier. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgoncalves at redhat.com Tue Jun 11 12:13:28 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Tue, 11 Jun 2019 14:13:28 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: You can find the commit hash from the link I provided. The patch is available from Queens so it is also available in Stein. On Tue, Jun 11, 2019 at 2:10 PM Gaël THEROND wrote: > > Ok nice, do you have the commit hash? I would look at it and validate that it have been committed to Stein too so I could bump my service to stein using Kolla. > > Thanks! > > Le mar. 11 juin 2019 à 12:59, Carlos Goncalves a écrit : >> >> On Mon, Jun 10, 2019 at 3:14 PM Gaël THEROND wrote: >> > >> > Hi guys, >> > >> > Just a quick question regarding this bug, someone told me that it have been patched within stable/rocky, BUT, were you talking about the openstack/octavia repositoy or the openstack/kolla repository? >> >> Octavia. 
>> >> https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701 >> >> > >> > Many Thanks! >> > >> > Le mar. 4 juin 2019 à 15:19, Gaël THEROND a écrit : >> >> >> >> Oh, that's perfect so, I'll just update my image and my platform as we're using kolla-ansible and that's super easy. >> >> >> >> You guys rocks!! (Pun intended ;-)). >> >> >> >> Many many thanks to all of you, that will real back me a lot regarding the Octavia solidity and Kolla flexibility actually ^^. >> >> >> >> Le mar. 4 juin 2019 à 15:17, Carlos Goncalves a écrit : >> >>> >> >>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND wrote: >> >>> > >> >>> > Hi Lingxian Kong, >> >>> > >> >>> > That’s actually very interesting as I’ve come to the same conclusion this morning during my investigation and was starting to think about a fix, which it seems you already made! >> >>> > >> >>> > Is there a reason why it didn’t was backported to rocky? >> >>> >> >>> The patch was merged in master branch during Rocky development cycle, >> >>> hence included in stable/rocky as well. >> >>> >> >>> > >> >>> > Very helpful, many many thanks to you you clearly spare me hours of works! I’ll get a review of your patch and test it on our lab. >> >>> > >> >>> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND a écrit : >> >>> >> >> >>> >> Hi Felix, >> >>> >> >> >>> >> « Glad » you had the same issue before, and yes of course I looked at the HM logs which is were I actually found out that this event was triggered by octavia (Beside the DB data that validated that) here is my log trace related to this event, It doesn't really shows major issue IMHO. >> >>> >> >> >>> >> Here is the stacktrace that our octavia service archived for our both controllers servers, with the initial loadbalancer creation trace (Worker.log) and both controllers triggered task (Health-Manager.log). >> >>> >> >> >>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ >> >>> >> >> >>> >> I well may have miss something in it, but I don't see something strange on from my point of view. >> >>> >> Feel free to tell me if you spot something weird. >> >>> >> >> >>> >> >> >>> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner a écrit : >> >>> >>> >> >>> >>> Hi Gael, >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> we had a similar issue in the past. >> >>> >>> >> >>> >>> You could check the octiava healthmanager log (should be on the same node where the worker is running). >> >>> >>> >> >>> >>> This component monitors the status of the Amphorae and restarts them if they don’t trigger a callback after a specific time. This might also happen if there is some connection issue between the two components. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> But normally it should at least restart the LB with new Amphorae… >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Hope that helps >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Felix >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> From: Gaël THEROND >> >>> >>> Sent: Tuesday, June 4, 2019 9:44 AM >> >>> >>> To: Openstack >> >>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Hi guys, >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> I’ve a weird situation here. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> I smoothly operate a large scale multi-region Octavia service using the default amphora driver which imply the use of nova instances as loadbalancers. 
>> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Everything is running really well and our customers (K8s and traditional users) are really happy with the solution so far. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> However, yesterday one of those customers using the loadbalancer in front of their ElasticSearch cluster poked me because this loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the amphoras were no longer available but yet the anchor/member/pool and listeners settings were still existing. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> So I investigated and found out that the loadbalancer amphoras have been destroyed by the octavia user. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> The weird part is, both the master and the backup instance have been destroyed at the same moment by the octavia service user. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Is there specific circumstances where the octavia service could decide to delete the instances but not the anchor/members/pool ? >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> It’s worrying me a bit as there is no clear way to trace why does Octavia did take this action. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> I digged within the nova and Octavia DB in order to correlate the action but except than validating my investigation it doesn’t really help as there are no clue of why the octavia service did trigger the deletion. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> If someone have any clue or tips to give me I’ll be more than happy to discuss this situation. >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> Cheers guys! >> >>> >>> >> >>> >>> Hinweise zum Datenschutz finden Sie hier. From gael.therond at gmail.com Tue Jun 11 12:15:46 2019 From: gael.therond at gmail.com (=?UTF-8?Q?Ga=C3=ABl_THEROND?=) Date: Tue, 11 Jun 2019 14:15:46 +0200 Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia In-Reply-To: References: Message-ID: Oh, really sorry, I was looking at your answer from my mobile mailing app and it didn't shows, sorry ^^ Many thanks for your help! Le mar. 11 juin 2019 à 14:13, Carlos Goncalves a écrit : > You can find the commit hash from the link I provided. The patch is > available from Queens so it is also available in Stein. > > On Tue, Jun 11, 2019 at 2:10 PM Gaël THEROND > wrote: > > > > Ok nice, do you have the commit hash? I would look at it and validate > that it have been committed to Stein too so I could bump my service to > stein using Kolla. > > > > Thanks! > > > > Le mar. 11 juin 2019 à 12:59, Carlos Goncalves > a écrit : > >> > >> On Mon, Jun 10, 2019 at 3:14 PM Gaël THEROND > wrote: > >> > > >> > Hi guys, > >> > > >> > Just a quick question regarding this bug, someone told me that it > have been patched within stable/rocky, BUT, were you talking about the > openstack/octavia repositoy or the openstack/kolla repository? > >> > >> Octavia. > >> > >> > https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701 > >> > >> > > >> > Many Thanks! > >> > > >> > Le mar. 4 juin 2019 à 15:19, Gaël THEROND a > écrit : > >> >> > >> >> Oh, that's perfect so, I'll just update my image and my platform as > we're using kolla-ansible and that's super easy. > >> >> > >> >> You guys rocks!! (Pun intended ;-)). > >> >> > >> >> Many many thanks to all of you, that will real back me a lot > regarding the Octavia solidity and Kolla flexibility actually ^^. > >> >> > >> >> Le mar. 
4 juin 2019 à 15:17, Carlos Goncalves > a écrit : > >> >>> > >> >>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND > wrote: > >> >>> > > >> >>> > Hi Lingxian Kong, > >> >>> > > >> >>> > That’s actually very interesting as I’ve come to the same > conclusion this morning during my investigation and was starting to think > about a fix, which it seems you already made! > >> >>> > > >> >>> > Is there a reason why it didn’t was backported to rocky? > >> >>> > >> >>> The patch was merged in master branch during Rocky development > cycle, > >> >>> hence included in stable/rocky as well. > >> >>> > >> >>> > > >> >>> > Very helpful, many many thanks to you you clearly spare me hours > of works! I’ll get a review of your patch and test it on our lab. > >> >>> > > >> >>> > Le mar. 4 juin 2019 à 11:06, Gaël THEROND > a écrit : > >> >>> >> > >> >>> >> Hi Felix, > >> >>> >> > >> >>> >> « Glad » you had the same issue before, and yes of course I > looked at the HM logs which is were I actually found out that this event > was triggered by octavia (Beside the DB data that validated that) here is > my log trace related to this event, It doesn't really shows major issue > IMHO. > >> >>> >> > >> >>> >> Here is the stacktrace that our octavia service archived for our > both controllers servers, with the initial loadbalancer creation trace > (Worker.log) and both controllers triggered task (Health-Manager.log). > >> >>> >> > >> >>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/ > >> >>> >> > >> >>> >> I well may have miss something in it, but I don't see something > strange on from my point of view. > >> >>> >> Feel free to tell me if you spot something weird. > >> >>> >> > >> >>> >> > >> >>> >> Le mar. 4 juin 2019 à 10:38, Felix Hüttner > a écrit : > >> >>> >>> > >> >>> >>> Hi Gael, > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> we had a similar issue in the past. > >> >>> >>> > >> >>> >>> You could check the octiava healthmanager log (should be on the > same node where the worker is running). > >> >>> >>> > >> >>> >>> This component monitors the status of the Amphorae and restarts > them if they don’t trigger a callback after a specific time. This might > also happen if there is some connection issue between the two components. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> But normally it should at least restart the LB with new > Amphorae… > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Hope that helps > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Felix > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> From: Gaël THEROND > >> >>> >>> Sent: Tuesday, June 4, 2019 9:44 AM > >> >>> >>> To: Openstack > >> >>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances > unexpectedly deleted by octavia > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Hi guys, > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> I’ve a weird situation here. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> I smoothly operate a large scale multi-region Octavia service > using the default amphora driver which imply the use of nova instances as > loadbalancers. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Everything is running really well and our customers (K8s and > traditional users) are really happy with the solution so far. 
> >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> However, yesterday one of those customers using the > loadbalancer in front of their ElasticSearch cluster poked me because this > loadbalancer suddenly passed from ONLINE/OK to ONLINE/ERROR, meaning the > amphoras were no longer available but yet the anchor/member/pool and > listeners settings were still existing. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> So I investigated and found out that the loadbalancer amphoras > have been destroyed by the octavia user. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> The weird part is, both the master and the backup instance have > been destroyed at the same moment by the octavia service user. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Is there specific circumstances where the octavia service could > decide to delete the instances but not the anchor/members/pool ? > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> It’s worrying me a bit as there is no clear way to trace why > does Octavia did take this action. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> I digged within the nova and Octavia DB in order to correlate > the action but except than validating my investigation it doesn’t really > help as there are no clue of why the octavia service did trigger the > deletion. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> If someone have any clue or tips to give me I’ll be more than > happy to discuss this situation. > >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> Cheers guys! > >> >>> >>> > >> >>> >>> Hinweise zum Datenschutz finden Sie hier. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Tue Jun 11 13:54:15 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 11 Jun 2019 09:54:15 -0400 Subject: [openstack-ansible] suse support for stable/queens In-Reply-To: References: Message-ID: On Tue, Jun 11, 2019 at 4:29 AM Dirk Müller wrote: > > Hi Mohammed, > > Am Sa., 8. Juni 2019 um 19:32 Uhr schrieb Mohammed Naser : > > > 1. Someone can volunteer to implement LXC 3 support in stable/queens > > in order to get opensuse-42 working again > > 2. We move the opensuse-42 jobs to non-voting for 1/2 weeks and if no > > one fixes them, we drop them (because they're a waste of CI > > resources). > > I suggest to stop caring about opensuse 42.x on stable/queens and > older as we'd like to deprecate > 42.x (it is going to be end of life and falling out of security > support in the next few days) and focus on leap 15.x only. https://review.opendev.org/#/c/664599/ done > Greetings, > Dirk -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From pierre at stackhpc.com Tue Jun 11 14:12:20 2019 From: pierre at stackhpc.com (Pierre Riteau) Date: Tue, 11 Jun 2019 15:12:20 +0100 Subject: [requirements] paramiko 2.5.0 causing ImportError: cannot import name py31compat Message-ID: Hello, paramiko 2.5.0 was released yesterday [1]. It appears to trigger failures in the Kayobe molecule job with the following error [2]: ImportError: cannot import name py31compat It's not clear yet why this is happening, since py31compat lives in setuptools. paramiko 2.5.0 includes changes to paramiko/py3compat.py which could be related. For now, we're capping paramiko [3] as it is blocking our gate. I thought I would share with the list, in case other projects experience similar errors. 
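For anyone who wants to copy the workaround, the cap in [3] is just an ordinary upper bound in requirements.txt, along the lines of:

  # Temporary: paramiko 2.5.0 breaks the molecule job with
  # "ImportError: cannot import name py31compat"; drop the cap once the
  # root cause is understood.
  paramiko<2.5.0

(The exact file and comment wording will differ per project; treat this as a sketch of the approach rather than a copy of the actual Kayobe change.)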
Cheers, Pierre [1] https://pypi.org/project/paramiko/#history [2] http://logs.openstack.org/17/664417/1/check/kayobe-tox-molecule/0370fdd/job-output.txt.gz [3] https://review.opendev.org/#/c/664533/ From aschultz at redhat.com Tue Jun 11 14:37:15 2019 From: aschultz at redhat.com (Alex Schultz) Date: Tue, 11 Jun 2019 08:37:15 -0600 Subject: [tripleo] Outstanding specs & blueprints for the Train cycle Message-ID: Hey folks, I wanted to send a note about a last call for specs for the train cycle. In a previous mail back in May[0], I had mentioned that the plan was to try and have all the blueprints and specs finalized by Train milestone 1. Since milestone 1 was last week, this is your final call for specs & blueprints. Please let me know if there are any outstanding items by next week's IRC meeting on June 18, 2019. I will be applying a -2 to any outstanding specs that have not merged or been spoken for. Thanks, -Alex [0] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006223.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasonanderson at uchicago.edu Tue Jun 11 15:33:49 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Tue, 11 Jun 2019 15:33:49 +0000 Subject: [ironic][neutron] Security groups on bare metal instances Message-ID: Hi all, We've been scratching our heads for a while, trying to figure out how security groups for bare metal instances are supposed to work. The configuration guide for Networking[1] implies that using the 'iptables_hybrid' firewall driver should work. We are using Neutron tenant networks[2] with Ironic. My understanding is that the iptables_hybrid driver creates a new OVS port (with prefix qvo), logically connects that to the integration bridge, and then creates a veth pair inside a new network namespace, and that veth device then gets some iptables rules to handle the security group rules. It is not clear to me how or when that qvo "hybrid" port is even created; I've combed through the Neutron code base for a while looking for clues. We had tried using the "pure" OVS firewall solution, where security group rules are expessed using OpenFlow flows. However, this doesn't work, as there is not OVS port for a bare metal instance (at least, not in our setup.) We are using networking-generic-switch[3], which provisions ports on a physical switch with a VLAN tag on the provider network. From OVS' perspective, the traffic exits OVS with that VLAN tag and that's that; OVS in this situation is only responsible for handling routing between provider networks and performing NAT for egress and ingress via Floating IP assignments. So, I'm wondering if others have had success getting security groups to work in a bare metal environment, and have any clues we could follow to get this working nicely. I'm beginning to suspect our problems have to do with the fact that we're doing VLAN isolation predominately via configuring physical switches, and as such there isn't a clear point where security groups can be inserted. The problem we are trying to solve is limiting ingress traffic on a Floating IP, so we only allow SSH from a given host, or only allow ports X and Y to be open externally, etc. Thanks in advance, as usual, for any insights! 
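(To make the goal concrete before stating it: the kind of rule we want to end up with is just an ordinary Neutron security group rule, for example, with a made-up group name and address:

  openstack security group rule create --ingress --protocol tcp \
      --dst-port 22 --remote-ip 203.0.113.10/32 baremetal-secgroup

The open question is what, if anything, actually enforces such a rule for a port bound to a bare metal node.)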
/Jason [1]: https://docs.openstack.org/ironic/latest/install/configure-networking.html [2]: https://docs.openstack.org/ironic/latest/install/configure-tenant-networks.html [3]: https://docs.openstack.org/networking-generic-switch/latest/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasonanderson at uchicago.edu Tue Jun 11 15:43:02 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Tue, 11 Jun 2019 15:43:02 +0000 Subject: [nova][ironic] Lock-related performance issue with update_resources periodic job References: Message-ID: Hi Surya, On 5/13/19 3:15 PM, Surya Seetharaman wrote: We faced the same problem at CERN when we upgraded to rocky (we have ~2300 nodes on a single compute) like Eric said, and we set the [compute]resource_provider_association_refresh to a large value (this definitely helps by stopping the syncing of traits/aggregates and provider tree cache info stuff in terms of chattiness with placement) and inspite of that it doesn't scale that well for us. We still find the periodic task taking too much of time which causes the locking to hold up the claim for instances in BUILD state (the exact same problem you described). While one way to tackle this like you said is to set the "update_resources_interval" to a higher value - we were not sure how much out of sync things would get with placement, so it will be interesting to see how this spans out for you - another way out would be to use multiple computes and spread the nodes around (though this is also a pain to maintain IMHO) which is what we are looking into presently. I wanted to let you know that we've been running this way in production for a few weeks now and it's had a noticeable improvement: instances are no longer sticking in the "Build" stage, pre-networking, for ages. We were able to track the improvement by comparing the Nova conductor logs ("Took {seconds} to build the instance" vs "Took {seconds} to spawn the instance on the hypervisor"; the delta should be as small as possible and in our case went from ~30 minutes to ~1 minute.) There have been a few cases where a resource provider claim got "stuck", but in practice it has been so infrequent that it potentially has other causes. As such, I can recommend increasing the interval time significantly. Currently we have it set to 6 hours. I have not yet looked in to bringing in the other Nova patches used at CERN (and available in Stein). I did take a look at updating the locking mechanism, but do not have work to show for this yet. Cheers, /Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: From smooney at redhat.com Tue Jun 11 16:00:38 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 11 Jun 2019 17:00:38 +0100 Subject: [ironic][neutron] Security groups on bare metal instances In-Reply-To: References: Message-ID: On Tue, 2019-06-11 at 15:33 +0000, Jason Anderson wrote: > Hi all, > > We've been scratching our heads for a while, trying to figure out how security groups for bare metal instances are > supposed to work. The configuration guide for Networking< > https://docs.openstack.org/ironic/latest/install/configure-networking.html>[1] implies that using the > 'iptables_hybrid' firewall driver should work. We are using Neutron tenant networks< > https://docs.openstack.org/ironic/latest/install/configure-tenant-networks.html>[2] with Ironic. 
My understanding is > that the iptables_hybrid driver creates a new OVS port (with prefix qvo), logically connects that to the integration > bridge, and then creates a veth pair inside a new network namespace, and that veth device then gets some iptables > rules to handle the security group rules. It is not clear to me how or when that qvo "hybrid" port is even created; > I've combed through the Neutron code base for a while looking for clues. > > We had tried using the "pure" OVS firewall solution, where security group rules are expessed using OpenFlow flows. > However, this doesn't work, as there is not OVS port for a bare metal instance (at least, not in our setup.) We are > using networking-generic-switch[3], which provisions > ports on a physical switch with a VLAN tag on the provider network. From OVS' perspective, the traffic exits OVS with > that VLAN tag and that's that; OVS in this situation is only responsible for handling routing between provider > networks and performing NAT for egress and ingress via Floating IP assignments. > > So, I'm wondering if others have had success getting security groups to work in a bare metal environment, and have any > clues we could follow to get this working nicely. in a baremetal enviornment the only way to implement security groups for the baremetal instance is to rely on an ml2 driver that supports implementing security groups at the top of rack switch. the iptables and and openvswtich firewall dirvers can only be used in a vm deployment. > I'm beginning to suspect our problems have to do with the fact that we're doing VLAN isolation predominately via > configuring physical switches, and as such there isn't a clear point where security groups can be inserted. some switch vendors can implement security gorups directly in the TOR i belive either arrista or cisco support this in there top of rack swtich driver. e.g. https://github.com/openstack/networking-arista/blob/master/networking_arista/ml2/security_groups/arista_security_groups.py > The problem we are trying to solve is limiting ingress traffic on a Floating IP, so we only allow SSH from a given > host, or only allow ports X and Y to be open externally, etc. as an alternitive you migth be able to use the firewall as a service api to implemtn traffic filtering in the neutorn routers rather than at the port level. > > Thanks in advance, as usual, for any insights! > > /Jason > > [1]: https://docs.openstack.org/ironic/latest/install/configure-networking.html > [2]: https://docs.openstack.org/ironic/latest/install/configure-tenant-networks.html > [3]: https://docs.openstack.org/networking-generic-switch/latest/ From ekultails at gmail.com Tue Jun 11 16:05:48 2019 From: ekultails at gmail.com (Luke Short) Date: Tue, 11 Jun 2019 12:05:48 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: Hey Kevin/all, I propose we have the meeting at 14:00 UTC weekly on Thursdays for about 30 minutes. This is similar to the normal #tripleo meeting except instead of Tuesday it is on Thursday. I believe this time will accommodate the most amount of people. Let's keep it on #tripleo instead of the OpenStack meeting rooms to avoid the concerns others have had about missing information from the TripleO community. Everyone will be kept in the loop and the IRC logs will be easy to find since it's consolidated on TripleO. I would be happy to help lead the meetings and I have also added some thoughts to the Etherpad. 
How does everyone feel about having our first meeting on June 20th? Sincerely, Luke Short On Mon, Jun 10, 2019 at 5:02 PM Kevin Carter wrote: > With the now merged structural changes it is time to organize an official > meeting to get things moving. > > So without further ado: > * When should we schedule our meetings (day, hour, frequency)? > * Should the meeting take place in the main #tripleo channel or in one of > the dedicated meeting rooms (openstack-meeting-{1,2,3,4}, etc)? > * How long should our meetings last? > * Any volunteers to chair meetings? > > To capture some of our thoughts, questions, hopes, dreams, and aspirations > I've created an etherpad which I'd like interested folks to throw ideas at: > [ https://etherpad.openstack.org/p/tripleo-ansible-agenda ]. I'd like to > see if we can get a confirmed list of folks who want to meet and, > potentially, a generally good timezone. I'd also like to see if we can nail > down some ideas for a plan of attack. While I have ideas and would be happy > to talk at length about them (I wrote a few things down in the etherpad), I > don't want to be the only voice given I'm new to the TripleO community (I > could be, and likely I am, missing a lot of context). > > Assuming we can get something flowing, I'd like to shoot for an official > meeting sometime next week (the week of 17 June, 2019). In the meantime, > I'll look forward to chatting with folks in the #tripleo channel. > > -- > > Kevin Carter > IRC: cloudnull > > > On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: > >> Hey everyone, >> >> For the upcoming work on focusing on more Ansible automation and testing, >> I have created a dedicated #tripleo-transformation channel for our new >> squad. Feel free to join if you are interested in joining and helping out! >> >> +1 to removing repositories we don't use, especially if they have no >> working code. I'd like to see the consolidation of TripleO specific things >> into the tripleo-ansible repository and then using upstream Ansible roles >> for all of the different services (nova, glance, cinder, etc.). >> >> Sincerely, >> >> Luke Short, RHCE >> Software Engineer, OpenStack Deployment Framework >> Red Hat, Inc. >> >> >> On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: >> >>> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: >>> >>>> So the questions at hand are: what, if anything, should we do with >>>> these repositories? Should we retire them or just ignore them? Is there >>>> anyone using any of the roles? >>>> >>> >>> My initial reaction was to suggest we just ignore them, but on second >>> thought I'm wondering if there is anything negative if we leave them lying >>> around. Unless we're going to benefit from them in the future if we start >>> actively working in these repos, they represent obfuscation and debt, so it >>> might be best to retire / dispose of them. >>> >>> David >>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at stackhpc.com Tue Jun 11 17:39:31 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 11 Jun 2019 18:39:31 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: On Mon, 10 Jun 2019 at 06:18, Alex Xu wrote: > > > > Eric Fried 于2019年6月7日周五 上午1:59写道: >> >> > Looking at the specs, it seems it's mostly talking about changing VMs resources without rebooting. However that's not the actual intent of the Ironic use case I explained in the email. >> > Yes, it requires a reboot to reflect the BIOS changes. This reboot can be either be done by Nova IronicDriver or Ironic deploy step can also do it. >> > So I am not sure if the spec actually satisfies the use case. >> > I hope to get more response from the team to get more clarity. >> >> Waitwait. The VM needs to be rebooted for the BIOS change to take >> effect? So (non-live) resize would actually satisfy your use case just >> fine. But the problem is that the ironic driver doesn't support resize >> at all? >> >> Without digging too hard, that seems like it would be a fairly >> straightforward thing to add. It would be limited to only "same host" >> and initially you could only change this one attribute (anything else >> would have to fail). >> >> Nova people, thoughts? >> > > Contribute another idea. > > So just as Jay said in this thread. Those CUSTOM_HYPERTHREADING_ON and CUSTOM_HYPERTHREADING_OFF are configuration. Those > configuration isn't used for scheduling. Actually, Traits is designed for scheduling. > > So yes, there should be only one trait. CUSTOM_HYPERTHREADING, this trait is used for indicating the host support HT. About whether enable it in the instance is configuration info. > > That is also pain for change the configuration in the flavor. The flavor is the spec of instance's virtual resource, not the configuration. > > So another way is we should store the configuration into another place. Like the server's metadata. > > So for the HT case. We only fill the CUSTOM_HYPERTHREADING trait in the flavor, and fill a server metadata 'hyperthreading_config=on' in server metadata. The nova will find out a BM node support HT. And ironic based on the server metadata 'hyperthreading_config=on' to enable the HT. > > When change the configuration of HT to off, the user can update the server's metadata. Currently, the nova will send a rpc call to the compute node and calling a virt driver interface when the server metadata is updated. In the ironic virt driver, it can trigger a hyper-threading configuration deploy step to turn the HT off, and do a reboot of the instance. (The reboot is a step inside deploy-step, not part of ironic virt driver flow) > > But yes, this changes some design to the original deploy-steps and deploy-templates. And we fill something into the server's metadata which I'm not sure nova people like it. > > Anyway, just put my idea at here. We did consider using metadata. The problem is that it is user-defined, so there is no way for an operator to restrict what can be done by a user. Flavors are operator-defined and so allow for selection from a 'menu' of types and configurations. 
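To make the current approach concrete before getting to the alternative below, the trait-name-encoding pattern looks roughly like this. Treat it as a sketch: the BIOS setting name ("LogicalProc") is vendor-specific and only an assumed example, and the flavor name and step priority are likewise made up:

  # an ironic deploy template whose name doubles as the scheduling trait
  openstack baremetal deploy template create CUSTOM_HYPERTHREADING_ON \
      --steps '[{"interface": "bios", "step": "apply_configuration",
                 "args": {"settings": [{"name": "LogicalProc", "value": "Enabled"}]},
                 "priority": 150}]'

  # the trait goes on the nodes that can honour it...
  openstack baremetal node add trait <node> CUSTOM_HYPERTHREADING_ON

  # ...and an operator-defined flavor requires it
  openstack flavor set --property trait:CUSTOM_HYPERTHREADING_ON=required bm-hyperthreaded

The 'menu' is then one flavor per differently-named trait/template pair, which is exactly what becomes awkward for non-boolean settings such as the ISS profiles or RAID levels mentioned earlier.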
What might be nice is if we could use a flavor extra spec like this: deploy-config:hyperthreading=enabled The nova ironic virt driver could pass this to ironic, like it does with traits. Then in the ironic deploy template, have fields like this: name: Hyperthreading enabled config-type: hyperthreading config-value: enabled steps: Ironic would then match on the config-type and config-value to find a suitable deploy template. As an extension, the deploy template could define a trait (or list of traits) that must be supported by a node in order for the template to be applied. Perhaps this would even be a standard relationship between config-type and traits? Haven't thought this through completely, I'm sure it has holes. > >> efried >> . >> From saikrishna.ura at cloudseals.com Tue Jun 11 15:17:55 2019 From: saikrishna.ura at cloudseals.com (Saikrishna Ura) Date: Tue, 11 Jun 2019 15:17:55 +0000 Subject: getting issues while configuring the Trove Message-ID: Hi, I installed Openstack in Ubuntu 18.04 by cloning the devstack repository with this url "git clone https://git.openstack.org/openstack-dev/devstack", but i can't able create or access with the trove, I'm getting issues with the installation. Can anyone help on this issue please. If any reference document or any guidance much appreciated. Thanks, Saikrishna U. -------------- next part -------------- An HTML attachment was scrubbed... URL: From grayudu at opentext.com Tue Jun 11 17:23:31 2019 From: grayudu at opentext.com (Garaga Rayudu) Date: Tue, 11 Jun 2019 17:23:31 +0000 Subject: Barbican support for Window Message-ID: Hi Team, Is it supported for window OS. If Yes, please let me know more details about installation. Also let me know should I integrate with our product freely to support key management. Since it look like open source product. Thanks, Rayudu -------------- next part -------------- An HTML attachment was scrubbed... URL: From johfulto at redhat.com Tue Jun 11 17:55:48 2019 From: johfulto at redhat.com (John Fulton) Date: Tue, 11 Jun 2019 13:55:48 -0400 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: On Tue, Jun 11, 2019 at 12:12 PM Luke Short wrote: > > Hey Kevin/all, > > I propose we have the meeting at 14:00 UTC weekly on Thursdays for about 30 minutes. This is similar to the normal #tripleo meeting except instead of Tuesday it is on Thursday. I believe this time will accommodate the most amount of people. Let's keep it on #tripleo instead of the OpenStack meeting rooms to avoid the concerns others have had about missing information from the TripleO community. Everyone will be kept in the loop and the IRC logs will be easy to find since it's consolidated on TripleO. I would be happy to help lead the meetings and I have also added some thoughts to the Etherpad. > > How does everyone feel about having our first meeting on June 20th? I've updated line 12 Etherpad with possible days/times including the one you suggested (FWIW I have a recurring conflict at that time). Maybe people who are interested can update the etherpad and we announce the winning date at the end of the week? https://etherpad.openstack.org/p/tripleo-ansible-agenda John > > Sincerely, > Luke Short > > On Mon, Jun 10, 2019 at 5:02 PM Kevin Carter wrote: >> >> With the now merged structural changes it is time to organize an official meeting to get things moving. >> >> So without further ado: >> * When should we schedule our meetings (day, hour, frequency)? 
>> * Should the meeting take place in the main #tripleo channel or in one of the dedicated meeting rooms (openstack-meeting-{1,2,3,4}, etc)? >> * How long should our meetings last? >> * Any volunteers to chair meetings? >> >> To capture some of our thoughts, questions, hopes, dreams, and aspirations I've created an etherpad which I'd like interested folks to throw ideas at: [ https://etherpad.openstack.org/p/tripleo-ansible-agenda ]. I'd like to see if we can get a confirmed list of folks who want to meet and, potentially, a generally good timezone. I'd also like to see if we can nail down some ideas for a plan of attack. While I have ideas and would be happy to talk at length about them (I wrote a few things down in the etherpad), I don't want to be the only voice given I'm new to the TripleO community (I could be, and likely I am, missing a lot of context). >> >> Assuming we can get something flowing, I'd like to shoot for an official meeting sometime next week (the week of 17 June, 2019). In the meantime, I'll look forward to chatting with folks in the #tripleo channel. >> >> -- >> >> Kevin Carter >> IRC: cloudnull >> >> >> On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: >>> >>> Hey everyone, >>> >>> For the upcoming work on focusing on more Ansible automation and testing, I have created a dedicated #tripleo-transformation channel for our new squad. Feel free to join if you are interested in joining and helping out! >>> >>> +1 to removing repositories we don't use, especially if they have no working code. I'd like to see the consolidation of TripleO specific things into the tripleo-ansible repository and then using upstream Ansible roles for all of the different services (nova, glance, cinder, etc.). >>> >>> Sincerely, >>> >>> Luke Short, RHCE >>> Software Engineer, OpenStack Deployment Framework >>> Red Hat, Inc. >>> >>> >>> On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: >>>> >>>> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: >>>>> >>>>> So the questions at hand are: what, if anything, should we do with these repositories? Should we retire them or just ignore them? Is there anyone using any of the roles? >>>> >>>> >>>> My initial reaction was to suggest we just ignore them, but on second thought I'm wondering if there is anything negative if we leave them lying around. Unless we're going to benefit from them in the future if we start actively working in these repos, they represent obfuscation and debt, so it might be best to retire / dispose of them. >>>> >>>> David From cboylan at sapwetik.org Tue Jun 11 18:02:14 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 11 Jun 2019 11:02:14 -0700 Subject: getting issues while configuring the Trove In-Reply-To: References: Message-ID: On Tue, Jun 11, 2019, at 10:50 AM, Saikrishna Ura wrote: > Hi, > > I installed Openstack in Ubuntu 18.04 by cloning the devstack > repository with this url "git clone > https://git.openstack.org/openstack-dev/devstack", but i can't able > create or access with the trove, I'm getting issues with the > installation. > > Can anyone help on this issue please. If any reference document or any > guidance much appreciated. Here are Trove's docs on using the Trove devstack plugin: https://opendev.org/openstack/trove/src/branch/master/devstack/README.rst If you haven't seen those yet I would start there. 
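As a minimal sketch (based on the standard devstack plugin mechanism; treat the README above as authoritative), enabling Trove in devstack usually comes down to adding the plugin to local.conf and re-running stack.sh:

  [[local|localrc]]
  enable_plugin trove https://opendev.org/openstack/trove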
Clark From ildiko.vancsa at gmail.com Tue Jun 11 18:43:52 2019 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Tue, 11 Jun 2019 20:43:52 +0200 Subject: [edge][ironic][neutron][starlingx] Open Infrastructure Summit and PTG Edge overview and next steps Message-ID: Hi, There were a lot of interesting discussions about edge computing at the Open Infrastructure Summit[1] and PTG in Denver. Hereby I would like to use the opportunity to share overviews and some progress and next steps the community has taken since. You can find a summary of the Forum discussions here: https://superuser.openstack.org/articles/edge-and-5g-not-just-the-future-but-the-present/ Check the following blog post for a recap on the PTG sessions: https://superuser.openstack.org/articles/edge-computing-takeaways-from-the-project-teams-gathering/ The Edge Computing Group is working towards testing the minimal reference architectures for which we are putting together hardware requirements. You can catch up and chime in on the discussion on this mail thread: http://lists.openstack.org/pipermail/edge-computing/2019-June/000597.html For Ironic related conversations since the event check these threads: * http://lists.openstack.org/pipermail/edge-computing/2019-May/000582.html * http://lists.openstack.org/pipermail/edge-computing/2019-May/000588.html We are also in progress to write up an RFE for Neutron to improve segment range management for edge use cases: http://lists.openstack.org/pipermail/edge-computing/2019-May/000589.html If you have any questions or comments to any of the above topics you can respond to this thread, chime in on the mail above threads, reach out on the edge-computing mailing[2] list or join the weekly edge group calls[3]. If you would like to get involved with StarlingX you can find pointers on the website[4]. Thanks, Ildikó (IRC: ildikov on Freenode) [1] https://www.openstack.org/videos/summits/denver-2019 [2] http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing [3] https://wiki.openstack.org/wiki/Edge_Computing_Group#Meetings [4] https://www.starlingx.io/community/ From emilien at redhat.com Tue Jun 11 19:23:04 2019 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 11 Jun 2019 15:23:04 -0400 Subject: [tripleo] Proposing Kamil Sambor core on TripleO In-Reply-To: References: Message-ID: Kamil, you're now core. Thanks again for your work! On Wed, Jun 5, 2019 at 10:31 AM Emilien Macchi wrote: > Kamil has been working on TripleO for a while now and is providing really > insightful reviews, specially on Python best practices but not only; he is > one of the major contributors of the OVN integration, which was a ton of > work. I believe he has the right knowledge to review any TripleO patch and > provide excellent reviews in our project. We're lucky to have him with us > in the team! > > I would like to propose him core on TripleO, please raise any objection if > needed. > -- > Emilien Macchi > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue Jun 11 19:40:44 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 11 Jun 2019 20:40:44 +0100 Subject: [kolla] meeting tomorrow Message-ID: Hi, I'm unable to chair the IRC meeting tomorrow. If someone else can stand in that would be great, otherwise we'll cancel. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at fried.cc Tue Jun 11 19:46:10 2019 From: openstack at fried.cc (Eric Fried) Date: Tue, 11 Jun 2019 14:46:10 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> > What might be nice is if we could use a flavor extra spec like this: > > deploy-config:hyperthreading=enabled > > The nova ironic virt driver could pass this to ironic, like it does with traits. > > Then in the ironic deploy template, have fields like this: > > name: Hyperthreading enabled > config-type: hyperthreading > config-value: enabled > steps: > > Ironic would then match on the config-type and config-value to find a > suitable deploy template. > > As an extension, the deploy template could define a trait (or list of > traits) that must be supported by a node in order for the template to > be applied. Perhaps this would even be a standard relationship between > config-type and traits? This. As rubber has hit road for traits-related-to-config, the pattern that has emerged as (IMO) most sensible has looked a lot like the above. To get a bit more specific: - HW_CPU_HYPERTHREADING is a trait indicating that a node is *capable* of switching hyperthreading on. There is no trait, ever, anywhere, that indicates that is is on or off on a particular node. - The ironic virt driver tags the node RP with the trait when it detects that the node is capable. - The flavor (or image) indicates a desire to enable hyperthreading as Mark says: via a (non-Placement-ese) property that conveys information in a way that ironic can understand. - A request filter [1] interprets the non-Placement-ese property and adds HW_CPU_HYPERTHREADING as a required trait to the request if it's `enabled`, so the scheduler will ensure we land on a node that can handle it. - During spawn, the ironic virt driver communicates whatever/however to ironic based on the (non-Placement-ese) property in the flavor/image. Getting back to the original issue of this thread, this still means we need to implement some limited subset of `resize` for ironic to allow us to turn this thing on or off on an established instance. That resize should still go through the scheduler so that, for example, the above process will punt if you try to switch on hyperthreading on a node that isn't capable (doesn't have the HW_CPU_HYPERTHREADING trait). efried [1] https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/request_filter.py From mriedemos at gmail.com Tue Jun 11 20:07:20 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 11 Jun 2019 15:07:20 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> Message-ID: <8ed93d57-9e61-8927-449f-6bab082df88b@gmail.com> On 6/11/2019 2:46 PM, Eric Fried wrote: >> What might be nice is if we could use a flavor extra spec like this: >> >> deploy-config:hyperthreading=enabled >> >> The nova ironic virt driver could pass this to ironic, like it does with traits. 
>> >> Then in the ironic deploy template, have fields like this: >> >> name: Hyperthreading enabled >> config-type: hyperthreading >> config-value: enabled >> steps: >> >> Ironic would then match on the config-type and config-value to find a >> suitable deploy template. >> >> As an extension, the deploy template could define a trait (or list of >> traits) that must be supported by a node in order for the template to >> be applied. Perhaps this would even be a standard relationship between >> config-type and traits? > This. > > As rubber has hit road for traits-related-to-config, the pattern that > has emerged as (IMO) most sensible has looked a lot like the above. > > To get a bit more specific: > - HW_CPU_HYPERTHREADING is a trait indicating that a node is*capable* > of switching hyperthreading on. There is no trait, ever, anywhere, that > indicates that is is on or off on a particular node. > - The ironic virt driver tags the node RP with the trait when it detects > that the node is capable. > - The flavor (or image) indicates a desire to enable hyperthreading as > Mark says: via a (non-Placement-ese) property that conveys information > in a way that ironic can understand. > - A request filter [1] interprets the non-Placement-ese property and > adds HW_CPU_HYPERTHREADING as a required trait to the request if it's > `enabled`, so the scheduler will ensure we land on a node that can > handle it. > - During spawn, the ironic virt driver communicates whatever/however to > ironic based on the (non-Placement-ese) property in the flavor/image. > > Getting back to the original issue of this thread, this still means we > need to implement some limited subset of `resize` for ironic to allow us > to turn this thing on or off on an established instance. That resize > should still go through the scheduler so that, for example, the above > process will punt if you try to switch on hyperthreading on a node that > isn't capable (doesn't have the HW_CPU_HYPERTHREADING trait). This sounds similar to the ARQ device profile stuff from the nova/cyborg spec [1] - is it? Also, I'm reminded of the glare/artifactory discussion for baremetal node config we talked about at the PTG in Dublin [2] - how does this compare/contrast? [1] https://review.opendev.org/#/c/603955/ [2] https://etherpad.openstack.org/p/nova-ptg-rocky (~L250) -- Thanks, Matt From smooney at redhat.com Tue Jun 11 20:09:22 2019 From: smooney at redhat.com (Sean Mooney) Date: Tue, 11 Jun 2019 21:09:22 +0100 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <7c19bc1d-f543-a9ed-d1cd-4363c8499389@fried.cc> Message-ID: <558bddb0926cc8c211d9b699c9850bf523c2f22a.camel@redhat.com> On Tue, 2019-06-11 at 14:46 -0500, Eric Fried wrote: > > What might be nice is if we could use a flavor extra spec like this: > > > > deploy-config:hyperthreading=enabled > > > > The nova ironic virt driver could pass this to ironic, like it does with traits. > > > > Then in the ironic deploy template, have fields like this: > > > > name: Hyperthreading enabled > > config-type: hyperthreading > > config-value: enabled > > steps: > > > > Ironic would then match on the config-type and config-value to find a > > suitable deploy template. 
> > > > As an extension, the deploy template could define a trait (or list of > > traits) that must be supported by a node in order for the template to > > be applied. Perhaps this would even be a standard relationship between > > config-type and traits? > > This. > > As rubber has hit road for traits-related-to-config, the pattern that > has emerged as (IMO) most sensible has looked a lot like the above. > > To get a bit more specific: > - HW_CPU_HYPERTHREADING is a trait indicating that a node is *capable* > of switching hyperthreading on. There is no trait, ever, anywhere, that > indicates that is is on or off on a particular node. > - The ironic virt driver tags the node RP with the trait when it detects > that the node is capable. > - The flavor (or image) indicates a desire to enable hyperthreading as > Mark says: via a (non-Placement-ese) property that conveys information > in a way that ironic can understand. > - A request filter [1] interprets the non-Placement-ese property and > adds HW_CPU_HYPERTHREADING as a required trait to the request if it's > `enabled`, so the scheduler will ensure we land on a node that can > handle it. just an fyi we are adding a request filter to do ^ as part of the pcpu in placment spec. if you set hw:cpu_thread_polciy=require or hw:cpu_thread_policy=isolate that will be converteded to a required or forbiden trait. in the libvirt driver already uses this to influcne how we pin vms to host cores requing that they land on hyperthreads or requiing the vm uses dedicated cores. ironic could add support for this existing extaspec and the corresponding image property to enable or disabel hyperthreading or SMT to use the generic term. > - During spawn, the ironic virt driver communicates whatever/however to > ironic based on the (non-Placement-ese) property in the flavor/image. > > Getting back to the original issue of this thread, this still means we > need to implement some limited subset of `resize` or rebuild in the image metadata case > for ironic to allow us > to turn this thing on or off on an established instance. That resize > should still go through the scheduler so that, for example, the above > process will punt if you try to switch on hyperthreading on a node that > isn't capable (doesn't have the HW_CPU_HYPERTHREADING trait). > > efried > > [1] > https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/request_filter.py > From colleen at gazlene.net Tue Jun 11 20:10:11 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Tue, 11 Jun 2019 13:10:11 -0700 Subject: [dev][keystone] M-1 check-in and retrospective meeting In-Reply-To: References: <627ae3a7-b998-4323-8981-2d1cd7bc3085@www.fastmail.com> Message-ID: <19164694-e3ed-4ac0-82d4-813abb0ecc59@www.fastmail.com> Thanks to everyone who attended this check-in today. I hope it felt worthwhile and helps us accomplish our goal of keeping up momentum through the cycle. The recording is available here: https://www.dropbox.com/s/7yx596ei2uazpib/keystone-train-m-1%20on%202019-06-11%2017%3A04.mp4 We also recorded some notes in the agenda etherpad: https://etherpad.openstack.org/p/keystone-train-M-1-review-planning-meeting This meeting didn't cover any in-depth technical discussion, rather we mainly focused on revisiting our past decisions: https://trello.com/b/VCCcnCGd/keystone-stein-retrospective and realigning and refining our roadmap: https://trello.com/b/ClKW9C8x/keystone-train-roadmap If you have any feedback about the format of this meeting, please let me know. 
Colleen From kennelson11 at gmail.com Tue Jun 11 22:56:35 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 11 Jun 2019 15:56:35 -0700 Subject: [ptl] Shanghai PTG Changes Message-ID: Hello All, After Denver we were able to take time to reflect on the improvements we can make now that the PTG will occur immediately following the summit for the near future. While Shanghai will have its own set of variables, it's still good to reevaluate how we allocate time for groups and how we structure the week overall. tldr; - Onboarding is moving into the PTG for this round (updates stay a part of the Summit) - You can still do regular PTG stuff (or both onboarding and regular PTG stuff) - PTG slots can be as short as 1/4 of a day - More shared space at the Shanghai venue, less dedicated space - New breakdown: 1.5 days of Forum and 3.5 days of PTG - Survey will be out in a few weeks for requesting PTG space We'll have our traditional project team meetings at the PTG in Shanghai as the default format, that won't change. However, we know many of you don't expect to have all your regulars attend the PTG in Shanghai. To combat this and still help project teams make use of the PTG in the most effective way possible we are encouraging teams that want to meet but might not have all the people they need to have technical discussions to meet anyway and instead focus on a more thorough onboarding of our Chinese contributors. Project teams could also do a combination of the two, spend an hour and a half on onboarding (or however much time you see fit) and then have your regular technical discussions after. Project Updates will still be a part of the Summit like normal, its just the onboardings that will be compacted into the PTG for Shanghai. We are making PTG days more granular as well and will have the option to request 1/4 day slots in an effort to leave less empty space in the schedule. So if you are only doing onboarding, you probably only need 1/4 to 1/2 of a day. If you are doing just your regular technical discussions and still need three days, thats fine too. The venue itself (similar to Denver) will have a few large rooms for bigger teams to meet, however, most teams will meet in shared space. For those teams meeting to have only technical discussions and for teams that have larger groups, we will try to prioritize giving them their own dedicated space. For the shared spaces, we will add to the PTGbot more clearly defined locations within the shared space so its easier to find teams meeting there. I regret to inform you that, again, projection will be a very limited commodity. Yeah.. please don't shoot the messenger. Due to using mainly shared space, projection is just something we are not able to offer. The other change I haven't already mentioned is that we are going to have the PTG start a half day early. Instead of only being 3 days like in Denver, we are going to add more time to the PTG and subtract a half day from the Forum. Basically the breakdown will be 1.5 Forum and 3.5 PTG with the Summit overlapping the first two days. I will be sending the PTG survey out to PTLs/Project Leads in a couple weeks with a few changes. -Kendall Nelson (diablo_rojo) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kennelson11 at gmail.com Tue Jun 11 22:56:37 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Tue, 11 Jun 2019 15:56:37 -0700 Subject: [SIG][WG] Shanghai PTG Changes Message-ID: Hello All, After Denver we were able to take time to reflect on the improvements we can make now that the PTG will occur immediately following the summit for the near future. While Shanghai will have its own set of variables, it's still good to reevaluate how we allocate time for groups and how we structure the week overall. tldr; - No BoFs, try to focus more specific discussions into topics for the Forum and follow up with PTG slot for more general conversations - PTG slots can be as short as 1/4 of a day - More shared space at the Shanghai venue, less dedicated space - New breakdown: 1.5 days of Forum and 3.5 days of PTG - Survey will be out in a few weeks for requesting PTG space For many of you (and myself as FC SIG Chair) there were a lot of different ways to get time to talk about topics. There was the forum, there were BoF sessions you could request, and there was also the option of having PTG sessions. Using the FC SIG as an example, we had two forum sessions (I think?), a BoF, and a half day at the PTG. This was WAY too much time for us. We didn't realize it when we were asking for space all the different ways, but we ended up with a lot of redundant discussions and time in which we didn't do much but just chat (which was great, but not the best use of the time/space since we could have done that in a hallway and not a dedicated room). To account for this duplication, we are going to get rid of the BoF mechanism for asking for space since largely the topics discussed there could be more cleanly divided into feedback focused Forum sessions and PTG team discussion time. The tentative plan is to try to condense as many of the SIG/WGs PTG slots towards the start of the PTG as we can so that they will more or less immediately follow the forum so that you can focus on making action items out of the conversations had and the feedback received at the Forum. We will also offer a smaller granularity of time that you can request at the PTG. Previously, a half day slot was as small as you could request; this time we will be offering 1/4 day slots (we found with more than one SIG/WG that even at a half day they were done in an hour and a half with all that they needed to talk about). The venue itself (similar to Denver) will have a few large rooms for bigger teams to meet, however, most teams will meet in shared space. That being said, we will add to the PTGbot more clearly defined locations in the shared space so its easier to find groups in shared spaces. I regret to inform you that, again, projection will be a very limited commodity. Yeah.. please don't shoot the messenger. Due to using mainly shared space, projection is just something we are not able to offer. The other change I haven't already mentioned is that we are going to have the PTG start a half day early. Instead of only being 3 days like in Denver, we are going to add more time to the PTG and subtract a half day from the Forum. Basically the breakdown will be 1.5 Forum and 3.5 PTG with the Summit overlapping the first two days. I will be sending the PTG survey out to SIG Chairs/ WG Leads in a couple weeks with a few changes. -Kendall Nelson (diablo_rojo) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From anlin.kong at gmail.com Tue Jun 11 22:57:30 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Wed, 12 Jun 2019 10:57:30 +1200 Subject: getting issues while configuring the Trove In-Reply-To: References: Message-ID: Hi Saikrishna, Here is a local.conf file for Trove installation in DevStack i usually use http://dpaste.com/14DW815.txt Best regards, Lingxian Kong Catalyst Cloud On Wed, Jun 12, 2019 at 5:58 AM Saikrishna Ura < saikrishna.ura at cloudseals.com> wrote: > Hi, > > I installed Openstack in Ubuntu 18.04 by cloning the devstack repository > with this url "git clone https://git.openstack.org/openstack-dev/devstack", > but i can't able create or access with the trove, I'm getting issues with > the installation. > > Can anyone help on this issue please. If any reference document or any > guidance much appreciated. > > Thanks, > > Saikrishna U. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dtantsur at redhat.com Wed Jun 12 07:47:59 2019 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Wed, 12 Jun 2019 09:47:59 +0200 Subject: [SIG][WG] Shanghai PTG Changes In-Reply-To: References: Message-ID: Hi, Thank you for the update! Could you please clarify how many days the whole event will take in the end, 5 or 6? Dmitry On 6/12/19 12:56 AM, Kendall Nelson wrote: > Hello All, > > After Denver we were able to take time to reflect on the improvements we can > make now that the PTG will occur immediately following the summit for the near > future.While Shanghai will have its own set of variables, it's still good to > reevaluate how we allocate time for groups and how we structure the week overall. > > tldr; > > - No BoFs, try to focus more specific discussions into topics for the Forum and > follow up with PTG slot for more general conversations > - PTG slots can be as short as 1/4 of a day > - More shared space at the Shanghai venue, less dedicated space > - New breakdown: 1.5 days of Forum and 3.5 days of PTG > - Survey will be out in a few weeks for requesting PTG space > > For many of you (and myself as FC SIG Chair) there were a lot of different ways > to get time to talk about topics. There was the forum, there were BoF sessions > you could request, and there was also the option of having PTG sessions. Using > the FC SIG as an example, we had two forum sessions (I think?), a BoF, and a > half day at the PTG. This was WAY too much time for us. We didn't realize it > when we were asking for space all the different ways, but we ended up with a lot > of redundant discussions and time in which we didn't do much but just chat > (which was great, but not the best use of the time/space since we could have > done thatin a hallway and not a dedicated room). > > To account for thisduplication, we are going to get rid of the BoF mechanism for > asking for space since largely the topics discussed there could be more cleanly > divided into feedback focused Forum sessions and PTG team discussion time. The > tentative plan is to try to condense as many of the SIG/WGs PTG slots towards > the start of the PTG as we can so that theywill more or less immediately follow > the forum so that you can focus on making action items out of the conversations > had and the feedback received at the Forum. > > We will also offer a smaller granularity of time that you can request at the > PTG. 
Previously, a half day slot was as small as you could request; this time we > will be offering 1/4 day slots (we found with more than one SIG/WG that even at > a half day they were done in an hour and a half with all that they needed to > talk about). > > The venue itself (similar to Denver) will have a few large rooms for bigger > teams to meet, however, most teams will meet in shared space. That being said, > we willadd to the PTGbot more clearly defined locations in the shared space so > its easier to find groups in shared spaces. > > I regret to inform you that, again, projection will be a very limited commodity. > Yeah.. please don't shoot the messenger. Due to using mainly shared space, > projectionis just something we are not able to offer. > > The other change I haven't already mentioned is that we are going to have the > PTG start a half day early. Instead of only being 3 days like in Denver, we are > going to add more time to the PTG and subtract a half day from the Forum. > Basically the breakdown will be 1.5 Forum and 3.5 PTG with the Summit > overlapping the first two days. > > I will be sending the PTG survey out to SIG Chairs/ WG Leads in a couple weeks > with a few changes. > > -Kendall Nelson (diablo_rojo) > > From ssbarnea at redhat.com Wed Jun 12 07:59:03 2019 From: ssbarnea at redhat.com (Sorin Sbarnea) Date: Wed, 12 Jun 2019 08:59:03 +0100 Subject: [requirements] paramiko 2.5.0 causing ImportError: cannot import name py31compat In-Reply-To: References: Message-ID: <1b33bf6b-5472-44bb-9fc0-53bc564daaa6@Spark> I used the new paramiko succesfully with ansible-molecule, so if you spot a bug in it, please include a link to that bug, so we can follow it. Paramiko lacked a new release for a very long time and someone even ended up with a fork paramiko-ng due to that. Hopefully this is about to change and new releases will be more often... the. cryptography deprecation warnings were very annoying. -- sorin On 11 Jun 2019, 15:17 +0100, Pierre Riteau , wrote: > Hello, > > paramiko 2.5.0 was released yesterday [1]. It appears to trigger > failures in the Kayobe molecule job with the following error [2]: > > ImportError: cannot import name py31compat > > It's not clear yet why this is happening, since py31compat lives in > setuptools. paramiko 2.5.0 includes changes to paramiko/py3compat.py > which could be related. > For now, we're capping paramiko [3] as it is blocking our gate. > > I thought I would share with the list, in case other projects > experience similar errors. > > Cheers, > Pierre > > [1] https://pypi.org/project/paramiko/#history > [2] http://logs.openstack.org/17/664417/1/check/kayobe-tox-molecule/0370fdd/job-output.txt.gz > [3] https://review.opendev.org/#/c/664533/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bhagyashri.Shewale at nttdata.com Wed Jun 12 09:10:04 2019 From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri) Date: Wed, 12 Jun 2019 09:10:04 +0000 Subject: [nova] Spec: Standardize CPU resource tracking Message-ID: Hi All, Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. 
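As a concrete illustration of that aliasing (the flavor here is hypothetical), a 4-vCPU flavor carrying hw:cpu_policy=dedicated would stop being a VCPU request and become a PCPU request:

  # before the pre-filter: resources=VCPU:4, with hw:cpu_policy=dedicated in the flavor extra specs
  # after the pre-filter:  resources=PCPU:4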
So when a user creates a new instance, or executes an instance action like shelve, unshelve, resize, evacuate or migration post upgrade, the request will go through the scheduler pre-filter, which will set the alias for ``hw:cpu_policy`` in the request_spec flavor ``extra specs`` and image metadata properties. In the particular case below, this won't work.

For example, I have two compute nodes, say A and B:

On Stein:

Compute node A configuration:
vcpu_pin_set=0-3 (used as dedicated CPUs; this host is added to an aggregate which has "pinned" metadata)

Compute node B configuration:
vcpu_pin_set=0-3 (used as dedicated CPUs; this host is added to an aggregate which has "pinned" metadata)

On Train, two possible scenarios (consider that the new CPU pinning implementation is merged into Train):

Compute node A configuration:
vcpu_pin_set=0-3 (keep the same settings as in Stein)

Compute node B configuration:
cpu_dedicated_set=0-3 (change to the new config option)

1. Consider that one instance, say `test`, is created using a flavor with the old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") in the Stein release, and Nova is then upgraded to Train with the above configuration.
2. Now, when the user performs an instance action, say shelve/unshelve, the scheduler pre-filter will change the request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:${flavor.vcpus}``, which will ultimately return only compute node B from the placement service. Here, we expect it to have returned both compute node A and compute node B.
3. If the user creates a new instance using the old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") on the Train release with the above configuration, then it will return only compute node B from the placement service, whereas it should have returned both compute nodes A and B.

Problem: As compute node A is still configured to be used to boot instances with dedicated CPUs, the same behavior as in Stein, it will not be returned by the placement service due to the changes in the scheduler pre-filter logic.

Proposed changes: Earlier in the spec [2], an online data migration was proposed to change the flavor extra specs and image metadata properties of the request_spec and instance objects. Based on the instance host, we can get the NumaTopology of the host, which will contain the new configuration options set on the compute host. Based on the NumaTopology of the host, we can change the instance and request_spec flavor extra specs:

1. Remove cpu_policy from the extra specs
2. Add "resources:PCPU=<flavor vcpus>" in the extra specs

We can also change the flavor extra specs and image metadata properties of the instance and request_spec objects using the reshape functionality.

Please give us your feedback on the proposed solution so that we can update the spec accordingly.

[1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451
[2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst

Thanks and Regards,
-Bhagyashri Shewale-

Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From a.settle at outlook.com Wed Jun 12 13:14:17 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Wed, 12 Jun 2019 13:14:17 +0000 Subject: [tc][ptls][all] Change to the health check process Message-ID: Hi all, The TC have made the decision to stop health tracking team projects. During discussions at the Train cycle PTG, the OpenStack TC concluded that its experiments in formally tracking problems within individual project teams was not providing enough value for the investment of effort it required. The wiki has subsequently been closed down [0]. The component of it which is still deemed valuable is having specific TC members officially assigned as liaisons to each project team, so in future that will continue but will be documented in the openstack/governance Git repository's project metadata and on the project pages linked from the OpenStack Project Teams page [1]. This was discussed at the most recent TC meeting [2] and it was agreed upon that SIGs are to be included in the liaison roster for team health checks. Please keep your eyes out for changes coming up in the governance repo and PTLS - also please keep your inboxes open for TC members to reach out and introduce themselves as your liaison. In the mean time - if you have any concerns, please do not hesitate to reach out to any one of the TC members. Cheers, Alex [0] https://wiki.openstack.org/wiki/OpenStack_health_tracker [1] https://governance.openstack.org/tc/reference/projects/ [2] http://eavesdrop.openstack.org/meetings/tc/2019/tc.2019-06-06-14.00.log.html#l-23 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kecarter at redhat.com Wed Jun 12 13:41:10 2019 From: kecarter at redhat.com (Kevin Carter) Date: Wed, 12 Jun 2019 08:41:10 -0500 Subject: [tripleo][tripleo-ansible] Reigniting tripleo-ansible In-Reply-To: References: Message-ID: I've submitted reviews under the topic "retire-role" to truncate all of the ansible-role-tripleo-* repos, that set can be seen here [0]. When folks get a chance, I'd greatly appreciate folks have a look at these reviews. [0] - https://review.opendev.org/#/q/topic:retire-role+status:open -- Kevin Carter IRC: kecarter On Wed, Jun 5, 2019 at 11:27 AM Luke Short wrote: > Hey everyone, > > For the upcoming work on focusing on more Ansible automation and testing, > I have created a dedicated #tripleo-transformation channel for our new > squad. Feel free to join if you are interested in joining and helping out! > > +1 to removing repositories we don't use, especially if they have no > working code. I'd like to see the consolidation of TripleO specific things > into the tripleo-ansible repository and then using upstream Ansible roles > for all of the different services (nova, glance, cinder, etc.). > > Sincerely, > > Luke Short, RHCE > Software Engineer, OpenStack Deployment Framework > Red Hat, Inc. > > > On Wed, Jun 5, 2019 at 8:53 AM David Peacock wrote: > >> On Tue, Jun 4, 2019 at 7:53 PM Kevin Carter wrote: >> >>> So the questions at hand are: what, if anything, should we do with >>> these repositories? Should we retire them or just ignore them? Is there >>> anyone using any of the roles? >>> >> >> My initial reaction was to suggest we just ignore them, but on second >> thought I'm wondering if there is anything negative if we leave them lying >> around. Unless we're going to benefit from them in the future if we start >> actively working in these repos, they represent obfuscation and debt, so it >> might be best to retire / dispose of them. 
>> >> David >> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kashwinkumar10 at gmail.com Wed Jun 12 10:37:35 2019 From: kashwinkumar10 at gmail.com (Ashwinkumar Kandasami) Date: Wed, 12 Jun 2019 16:07:35 +0530 Subject: Openstack Octavia Configuration Message-ID: <5d00d5ee.1c69fb81.34121.41bd@mx.google.com> Hi, I am a graduate student trying to deploy openstack in my own environment. I waana configure openstack octavia with my existing openstack cloud. I done the openstack deployment using RDO project. I tried configure openstack with ovn neutron l2 agent with octavia but i getting alert like we can’t able to use octavia for ovn type neutron agent, then how can i use it?? Sent from Mail for Windows 10 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kashwinkumar10 at gmail.com Wed Jun 12 10:41:04 2019 From: kashwinkumar10 at gmail.com (Ashwinkumar Kandasamy) Date: Wed, 12 Jun 2019 16:11:04 +0530 Subject: Openstack - Octavia Message-ID: Hi, I am a private cloud engineer in india. I deployed openstack in my own environment. I done that through openstack RDO project with openstack network type 1 (provider network). I want to configure openstack LBaaS (Octavia) in an existing openstack. How can i do that? please help me for that. -- Thank You, *Ashwinkumar K* *Software Associate,* *ZippyOPS Consulting Services LLP,* *Chennai.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Wed Jun 12 15:01:23 2019 From: mark at stackhpc.com (Mark Goddard) Date: Wed, 12 Jun 2019 16:01:23 +0100 Subject: [kolla] meeting tomorrow In-Reply-To: References: Message-ID: On Tue, 11 Jun 2019 at 20:40, Mark Goddard wrote: > > Hi, > > I'm unable to chair the IRC meeting tomorrow. If someone else can stand in that would be great, otherwise we'll cancel. Since no one was available to chair, this week's meeting is cancelled. We'll meet again next week. > > Thanks, > Mark From sean.mcginnis at gmail.com Wed Jun 12 15:50:43 2019 From: sean.mcginnis at gmail.com (Sean McGinnis) Date: Wed, 12 Jun 2019 10:50:43 -0500 Subject: [Release-job-failures] Pre-release of openstack/horizon failed In-Reply-To: References: Message-ID: This appears to have been a network issue that prevented the installation of one of the requirements. Fungi was able to reenqueue the job and it passed the second time through. Everything looks good now, but if anything unusual is noticed later, please let us know in the #openstack-release channel. Sean On Wed, Jun 12, 2019 at 9:48 AM wrote: > Build failed. > > - release-openstack-python > http://logs.openstack.org/e3/e30d8258f5993736dc8982e280ae43fe1ed22395/pre-release/release-openstack-python/4f35eb5/ > : FAILURE in 3m 06s > - announce-release announce-release : SKIPPED > - propose-update-constraints propose-update-constraints : SKIPPED > > _______________________________________________ > Release-job-failures mailing list > Release-job-failures at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/release-job-failures > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e0ne at e0ne.info Wed Jun 12 16:05:05 2019 From: e0ne at e0ne.info (Ivan Kolodyazhny) Date: Wed, 12 Jun 2019 19:05:05 +0300 Subject: [Release-job-failures] Pre-release of openstack/horizon failed In-Reply-To: References: Message-ID: Thanks for the notice, Sean! 
Regards, Ivan Kolodyazhny, http://blog.e0ne.info/ On Wed, Jun 12, 2019 at 6:52 PM Sean McGinnis wrote: > This appears to have been a network issue that prevented the installation > of one of the requirements. > > Fungi was able to reenqueue the job and it passed the second time through. > Everything looks good now, > but if anything unusual is noticed later, please let us know in the > #openstack-release channel. > > Sean > > On Wed, Jun 12, 2019 at 9:48 AM wrote: > >> Build failed. >> >> - release-openstack-python >> http://logs.openstack.org/e3/e30d8258f5993736dc8982e280ae43fe1ed22395/pre-release/release-openstack-python/4f35eb5/ >> : FAILURE in 3m 06s >> - announce-release announce-release : SKIPPED >> - propose-update-constraints propose-update-constraints : SKIPPED >> >> _______________________________________________ >> Release-job-failures mailing list >> Release-job-failures at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/release-job-failures >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michjo at viviotech.net Wed Jun 12 17:23:57 2019 From: michjo at viviotech.net (Jordan Michaels) Date: Wed, 12 Jun 2019 10:23:57 -0700 (PDT) Subject: [Glance] Can Glance be installed on a server other than the controller? Message-ID: <1277790298.106982.1560360237286.JavaMail.zimbra@viviotech.net> For anyone who's interested, this issue turned out to be caused by the system times being different on the separate server. I had set up Chrony according to the docs but never verified it was actually working. While reviewing the logs I noticed the time stamps were different on each server and that is what pointed me in the right direction. Just wanted to post the solution for posterity. Hopefully this helps someone in the future. -Jordan From johnsomor at gmail.com Wed Jun 12 17:53:11 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Wed, 12 Jun 2019 10:53:11 -0700 Subject: Openstack Octavia Configuration In-Reply-To: <5d00d5ee.1c69fb81.34121.41bd@mx.google.com> References: <5d00d5ee.1c69fb81.34121.41bd@mx.google.com> Message-ID: Hi Ashwinkumar, Welcome to using OpenStack and Octavia. I can talk to Octavia, but I do not yet have much experience with RDO deployments. RDO has a page for LBaaS (though it is using the old neutron-lbaas with Octavia) here: https://www.rdoproject.org/networking/lbaas/ They also have a users mailing list that might provide more help for deploying with RDO: http://rdoproject.org/contribute/mailing-lists/ RDO also has an IRC channel on Freenode called #rdo. As for Octavia, Octavia integrates with neutron for networking. Any of the supported ML2 drivers for neutron should work fine with Octavia. If you would like to chat about Octavia, the team has a channel on Freenode IRC called #openstack-lbaas. We would be happy to help you get started. Michael On Wed, Jun 12, 2019 at 7:54 AM Ashwinkumar Kandasami wrote: > > Hi, > > I am a graduate student trying to deploy openstack in my own environment. I waana configure openstack octavia with my existing openstack cloud. I done the openstack deployment using RDO project. I tried configure openstack with ovn neutron l2 agent with octavia but i getting alert like we can’t able to use octavia for ovn type neutron agent, then how can i use it?? 
> > > > Sent from Mail for Windows 10 > > From kennelson11 at gmail.com Wed Jun 12 18:03:55 2019 From: kennelson11 at gmail.com (Kendall Nelson) Date: Wed, 12 Jun 2019 11:03:55 -0700 Subject: [SIG][WG] Shanghai PTG Changes In-Reply-To: References: Message-ID: The whole event will be 5 days. Monday to Friday. -Kendall (diablo_rojo) On Wed, Jun 12, 2019 at 12:50 AM Dmitry Tantsur wrote: > Hi, > > Thank you for the update! > > Could you please clarify how many days the whole event will take in the > end, 5 or 6? > > Dmitry > > On 6/12/19 12:56 AM, Kendall Nelson wrote: > > Hello All, > > > > After Denver we were able to take time to reflect on the improvements we > can > > make now that the PTG will occur immediately following the summit for > the near > > future.While Shanghai will have its own set of variables, it's still > good to > > reevaluate how we allocate time for groups and how we structure the week > overall. > > > > tldr; > > > > - No BoFs, try to focus more specific discussions into topics for the > Forum and > > follow up with PTG slot for more general conversations > > - PTG slots can be as short as 1/4 of a day > > - More shared space at the Shanghai venue, less dedicated space > > - New breakdown: 1.5 days of Forum and 3.5 days of PTG > > - Survey will be out in a few weeks for requesting PTG space > > > > For many of you (and myself as FC SIG Chair) there were a lot of > different ways > > to get time to talk about topics. There was the forum, there were BoF > sessions > > you could request, and there was also the option of having PTG sessions. > Using > > the FC SIG as an example, we had two forum sessions (I think?), a BoF, > and a > > half day at the PTG. This was WAY too much time for us. We didn't > realize it > > when we were asking for space all the different ways, but we ended up > with a lot > > of redundant discussions and time in which we didn't do much but just > chat > > (which was great, but not the best use of the time/space since we could > have > > done thatin a hallway and not a dedicated room). > > > > To account for thisduplication, we are going to get rid of the BoF > mechanism for > > asking for space since largely the topics discussed there could be more > cleanly > > divided into feedback focused Forum sessions and PTG team discussion > time. The > > tentative plan is to try to condense as many of the SIG/WGs PTG slots > towards > > the start of the PTG as we can so that theywill more or less immediately > follow > > the forum so that you can focus on making action items out of the > conversations > > had and the feedback received at the Forum. > > > > We will also offer a smaller granularity of time that you can request at > the > > PTG. Previously, a half day slot was as small as you could request; this > time we > > will be offering 1/4 day slots (we found with more than one SIG/WG that > even at > > a half day they were done in an hour and a half with all that they > needed to > > talk about). > > > > The venue itself (similar to Denver) will have a few large rooms for > bigger > > teams to meet, however, most teams will meet in shared space. That being > said, > > we willadd to the PTGbot more clearly defined locations in the shared > space so > > its easier to find groups in shared spaces. > > > > I regret to inform you that, again, projection will be a very limited > commodity. > > Yeah.. please don't shoot the messenger. Due to using mainly shared > space, > > projectionis just something we are not able to offer. 
> > > > The other change I haven't already mentioned is that we are going to > have the > > PTG start a half day early. Instead of only being 3 days like in Denver, > we are > > going to add more time to the PTG and subtract a half day from the > Forum. > > Basically the breakdown will be 1.5 Forum and 3.5 PTG with the Summit > > overlapping the first two days. > > > > I will be sending the PTG survey out to SIG Chairs/ WG Leads in a couple > weeks > > with a few changes. > > > > -Kendall Nelson (diablo_rojo) > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed Jun 12 18:49:24 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 12 Jun 2019 20:49:24 +0200 Subject: [Kolla] ansible compatibility Message-ID: Hello All, I'd like to know if there is a Matrix for kolla-ansible and ansible version ...in other words, which ansible version must be used for a kolla-ansible version. For example ocata used kolla-ansible 4.0.5 but I do not know which version of ansible must be used. Installing kolla-ansible with pip it does not install ansible. Reverse Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed Jun 12 18:52:24 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 12 Jun 2019 20:52:24 +0200 Subject: [Kolla] ansible compatibility In-Reply-To: References: Message-ID: ---------- Forwarded message --------- Da: Ignazio Cassano Date: Mer 12 Giu 2019 20:49 Subject: [Kolla] ansible compatibility To: Hello All, I'd like to know if there is a Matrix for kolla-ansible and ansible version ...in other words, which ansible version must be used for a kolla-ansible version. For example ocata used kolla-ansible 4.0.5 but I do not know which version of ansible must be used. Installing kolla-ansible with pip it does not install ansible. Reverse Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed Jun 12 20:38:30 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 12 Jun 2019 15:38:30 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? Message-ID: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Before [1] when deleting a compute service in the API we did not check to see if the compute service was hosting any instances and just blindly deleted the service and related compute_node(s) records which orphaned the resource provider(s) for those nodes. With [2] we built on that and would cleanup the (first [3]) compute node resource provider by first deleting any allocations for instances still on that host - which because of the check in [1] should be none - and then deleted the resource provider itself. [2] forgot about ironic where a single compute service can be managing multiple (hundreds or even thousands) of baremetal compute nodes so I wrote [3] to delete *all* resource providers for compute nodes tied to the service - again barring there being any instances running on the service because of the check added in [1]. What we've failed to realize until recently is that there are cases where deleting the resource provider can still fail because there are allocations we haven't cleaned up, namely: 1. Residual allocations for evacuated instances from a source host. 2. Allocations held by a migration record for an unconfirmed (or not yet complete) migration. 
Because the delete_resource_provider method isn't checking for those, we can get ResourceProviderInUse errors which are then ignored [4]. Since that error is ignored, we continue on to delete the compute service record [5], effectively orphaning the providers (which is what [2] was meant to fix). I have recreated the evacuate scenario in a functional test here [6]. The question is what should we do about the fix? I'm getting lost thinking about this in a vacuum so trying to get some others to help think about it. Clearly with [1] we said you shouldn't be able to delete a compute service that has instances on it because that corrupts our resource tracking system. If we extend that to any allocations held against providers for that compute service, then the fix might be as simple as not ignoring the ResourceProviderInUse error and fail if we can't delete the provider(s). The question I'm struggling with is what does an operator do for the two cases mentioned above, not-yet-complete migrations and evacuated instances? For migrations, that seems pretty simple - wait for the migration to complete and confirm it (reverting a cold migration or resize would put the instance back on the compute service host you're trying to delete). The nastier thing is the allocations tied to an evacuated instance since those don't get cleaned up until the compute service is restarted [7]. If the operator never intends on restarting that compute service and just wants to clear the data, then they have to manually delete the allocations for the resource providers associated with that host before they can delete the compute service, which kind of sucks. What are our options? 1. Don't delete the compute service if we can't cleanup all resource providers - make sure to not orphan any providers. Manual cleanup may be necessary by the operator. 2. Change delete_resource_provider cascade=True logic to remove all allocations for the provider before deleting it, i.e. for not-yet-complete migrations and evacuated instances. For the evacuated instance allocations this is likely OK since restarting the source compute service is going to do that cleanup anyway. Also, if you delete the source compute service during a migration, confirming or reverting the resize later will likely fail since we'd be casting to something that is gone (and we'd orphan those allocations). Maybe we need a functional recreate test for the unconfirmed migration scenario before deciding on this? 3. Other things I'm not thinking of? Should we add a force parameter to the API to allow the operator to forcefully delete (#2 above) if #1 fails? Force parameters are hacky and usually seem to cause more problems than they solve, but it does put the control in the operators hands. If we did remove allocations for an instance when deleting it's compute service host, the operator should be able to get them back by running the "nova-manage placement heal_allocations" CLI - assuming they restart the compute service on that host. This would have to be tested of course. Help me Obi-Wan Kenobi. You're my only hope. 
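For reference, the manual cleanup path described above might look roughly like this (the consumer and service IDs are placeholders; the allocation command comes from osc-placement, and heal_allocations is the nova-manage command mentioned above):

  # remove the stale allocations held by evacuated instances / unconfirmed migrations
  openstack resource provider allocation delete <consumer_uuid>
  # then delete the compute service
  openstack compute service delete <service_id>
  # later, if allocations need to be recreated after restarting the compute service
  nova-manage placement heal_allocations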
[1] https://review.opendev.org/#/q/I0bd63b655ad3d3d39af8d15c781ce0a45efc8e3a [2] https://review.opendev.org/#/q/I7b8622b178d5043ed1556d7bdceaf60f47e5ac80 [3] https://review.opendev.org/#/c/657016/ [4] https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/scheduler/client/report.py#L2180 [5] https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/api/openstack/compute/services.py#L279 [6] https://review.opendev.org/#/c/663737/ [7] https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/compute/manager.py#L706 -- Thanks, Matt From robson.rbarreto at gmail.com Wed Jun 12 20:40:04 2019 From: robson.rbarreto at gmail.com (Robson Ramos Barreto) Date: Wed, 12 Jun 2019 17:40:04 -0300 Subject: [openstack-helm] custom container images for helm Message-ID: Hi all I saw in the docker hub that there is just until rocky ubuntu xenial version. I'd like to know how can I create my own images centos-based from new versions like Stein to be used with the helm charts, if is there any specific customization to works with helm or, for example, if can I use the kolla images. Thank you Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilkers.steve at gmail.com Wed Jun 12 21:13:49 2019 From: wilkers.steve at gmail.com (Steve Wilkerson) Date: Wed, 12 Jun 2019 16:13:49 -0500 Subject: [openstack-helm] custom container images for helm In-Reply-To: References: Message-ID: Hey Robson, We’ve recently started building images out of the openstack-helm-images repository. Currently, we use LOCI to build ubuntu based images for releases Ocata through Rocky and leap15 images for the Rocky release. We’ve recently started work on the multi-distro support spec which also added overrides and jobs required for the leap15 based images for Rocky. We’d love to see support added for centos images added to both openstack-helm-images and the openstack-helm charts themselves (and for releases beyond Rocky), but we just haven’t gotten there yet. If you’re interested in contributing and getting your hands dirty, we’d love to help provide guidance and help here. In regards to the Kolla images, it’s been awhile since I’ve used them myself so I can’t speak much there. Cheers, Steve On Wed, Jun 12, 2019 at 3:45 PM Robson Ramos Barreto < robson.rbarreto at gmail.com> wrote: > Hi all > > I saw in the docker hub that there is just until rocky ubuntu xenial > version. > > I'd like to know how can I create my own images centos-based from new > versions like Stein to be used with the helm charts, if is there any > specific customization to works with helm or, for example, if can I use > the kolla images. > > Thank you > > Regards > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasonanderson at uchicago.edu Wed Jun 12 21:19:58 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Wed, 12 Jun 2019 21:19:58 +0000 Subject: [ironic][neutron] Security groups on bare metal instances References: Message-ID: Hi Sean, thanks for the reply. On 6/11/19 11:00 AM, Sean Mooney wrote: as an alternitive you migth be able to use the firewall as a service api to implemtn traffic filtering in the neutorn routers rather than at the port level. This was a good idea! I found that it actually worked to solve our use-case. I set up FWaaS and configured a firewall group with the rules I wanted. Then I added my subnets's router_interface port to the firewall. Thank you! 
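For anyone wanting to reproduce that, a rough FWaaS v2 sketch (rule, policy and port values are placeholders, and option spellings should be double-checked against the neutron FWaaS documentation):

  openstack firewall group rule create --protocol tcp --destination-port 22 --action allow
  openstack firewall group policy create --firewall-rule <rule-id> bm-ingress-policy
  openstack firewall group create --ingress-firewall-policy bm-ingress-policy --port <router_interface_port_id>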
Re: the general issue of doing security groups in Ironic, I was wondering if this is something that others envision eventually being the job of networking-baremetal[1]. I looked and the storyboard[2] for the project doesn't show any planned work for this, but I saw it mentioned in this presentation[3] from 2017. Cheers, /Jason [1]: https://docs.openstack.org/networking-baremetal/latest/ [2]: https://storyboard.openstack.org/#!/project/955 [3]: https://www.slideshare.net/nyechiel/openstack-networking-the-road-ahead -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsneddon at redhat.com Wed Jun 12 22:03:16 2019 From: dsneddon at redhat.com (Dan Sneddon) Date: Wed, 12 Jun 2019 15:03:16 -0700 Subject: [ironic][neutron] Security groups on bare metal instances In-Reply-To: References: Message-ID: I helped to design the python-networking-ansible driver for ML2 + bare metal networking automation [1]. The idea behind it is a more production-grade alternative to networking-generic-switch that works with multiple makes/models of switches in the same environment. Behind the scenes, Ansible Networking is used to provide a vendor-neutral interface. I have tried to architect security groups for bare metal, but it’s a difficult challenge. I’d appreciate if anyone has suggestions. The main question is where to apply the security groups? Ideally, security groups would be applied at the port-level where the baremetal node is attached (we already configure VLAN assignment at the port level). Unfortunately, port security implementations vary wildly between vendors, and implementations may support only L2 filters, or very basic L3 filters only. The next logical place to apply the security group is at the VLAN router interface. That wouldn’t prevent hosts on the same network from talking to one another (access would be wide open between hosts on the same VLAN), but it would allow firewalling of hosts between networks. The challenge with this is that the plugin would have to know not only the switch and port where the baremetal node is attached, but also the switch/router where the VLAN router interface is located (or switches/routers in an HA environment). The baremetal port info is collected via Ironic Inspector, or it may be specified by the operator. How would we obtain the switch info and interface name for the VLAN L3 interface? What if there are multiple switch routers running with HA? Would the switch/interface have to be passed to Neutron when the network is created? I would love to discuss some ideas about how this could be implemented. [1] - https://pypi.org/project/networking-ansible/ On Wed, Jun 12, 2019 at 2:21 PM Jason Anderson wrote: > Hi Sean, thanks for the reply. > > On 6/11/19 11:00 AM, Sean Mooney wrote: > > as an alternitive you migth be able to use the firewall as a service api to implemtn traffic filtering in the neutorn > routers rather than at the port level. > > This was a good idea! I found that it actually worked to solve our > use-case. I set up FWaaS and configured a firewall group with the rules I > wanted. Then I added my subnets's router_interface port to the firewall. > Thank you! > > Re: the general issue of doing security groups in Ironic, I was wondering > if this is something that others envision eventually being the job of > networking-baremetal[1]. I looked and the storyboard[2] for the project > doesn't show any planned work for this, but I saw it mentioned in this > presentation[3] from 2017. 
> > Cheers, > /Jason > > [1]: https://docs.openstack.org/networking-baremetal/latest/ > [2]: https://storyboard.openstack.org/#!/project/955 > [3]: > https://www.slideshare.net/nyechiel/openstack-networking-the-road-ahead > -- Dan Sneddon | Senior Principal Software Engineer dsneddon at redhat.com | redhat.com/cloud dsneddon:irc | @dxs:twitter -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Wed Jun 12 22:16:13 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Wed, 12 Jun 2019 18:16:13 -0400 Subject: [doc][release][requirements] new warning and impending incompatible change in sphinxcontrib-datatemplates Message-ID: The new 0.4.0 release of sphinxcontrib-datatemplates will emit a deprecation warning message when the "datatemplate" directive is used. This may break jobs that run sphinx-build with the -W option enabled. That package includes support for the new form of the directive, which includes a different variation depending on the type of the data source, allowing different options to be used for each directive. See https://doughellmann.com/blog/2019/06/12/sphinxcontrib-datatemplates-0-4-0/ for details about the release and https://sphinxcontribdatatemplates.readthedocs.io/en/latest/index.html for details about the new syntax. The 1.0.0 release (not yet scheduled) will drop the legacy form of the directive entirely. -- Doug From openstack at fried.cc Wed Jun 12 22:36:20 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 12 Jun 2019 17:36:20 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: <98a6fb1f-7fd2-20da-4a5d-53821422b015@fried.cc> > 2. Change delete_resource_provider cascade=True logic to remove all > allocations for the provider before deleting it, i.e. for > not-yet-complete migrations and evacuated instances. For the evacuated > instance allocations this is likely OK since restarting the source > compute service is going to do that cleanup anyway. Also, if you delete > the source compute service during a migration, confirming or reverting > the resize later will likely fail since we'd be casting to something > that is gone (and we'd orphan those allocations). Maybe we need a > functional recreate test for the unconfirmed migration scenario before > deciding on this? This seems like a win to me. If we can distinguish between the migratey ones and the evacuatey ones, maybe we fail on the former (forcing them to wait for completion) and automatically delete the latter (which is almost always okay for the reasons you state; and recoverable via heal if it's not okay for some reason). efried . From zigo at debian.org Wed Jun 12 22:50:16 2019 From: zigo at debian.org (Thomas Goirand) Date: Thu, 13 Jun 2019 00:50:16 +0200 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> Hi Matt, Hoping I can bring an operator's perspective. On 6/12/19 10:38 PM, Matt Riedemann wrote: > 1. Don't delete the compute service if we can't cleanup all resource > providers - make sure to not orphan any providers. Manual cleanup may be > necessary by the operator. 
I'd say that this option is ok-ish *IF* the operators are given good enough directives saying what to do. It would really suck if we just get an error and don't know what resource cleanup is needed. But if the error is:

Cannot delete nova-compute on host mycloud-compute-5.
Instances still running:
623051e7-4e0d-4b06-b977-1d9a73e6e6e1
f8483448-39b5-4981-a731-5f4eeb28592c
Currently live-migrating:
49a12659-9dc6-4b07-b38b-e0bf2a69820a
Not confirmed migration/resize:
cc3d4311-e252-4922-bf04-dedc31b3a425

then that's fine, we know what to do. And better: the operator will know better than nova what to do. Maybe live-migrate the instances? Or maybe just destroy them? Nova shouldn't attempt to second-guess what the operator has in mind.

> 2. Change delete_resource_provider cascade=True logic to remove all > allocations for the provider before deleting it, i.e. for > not-yet-complete migrations and evacuated instances. For the evacuated > instance allocations this is likely OK since restarting the source > compute service is going to do that cleanup anyway. Also, if you delete > the source compute service during a migration, confirming or reverting > the resize later will likely fail since we'd be casting to something > that is gone (and we'd orphan those allocations). Maybe we need a > functional recreate test for the unconfirmed migration scenario before > deciding on this?

I don't see how this is going to help more than an evacuate command. Or is the intent to do the evacuate and then, right after it, the deletion of the resource provider?

> 3. Other things I'm not thinking of? Should we add a force parameter to > the API to allow the operator to forcefully delete (#2 above) if #1 > fails? Force parameters are hacky and usually seem to cause more > problems than they solve, but it does put the control in the operators > hands.

If the --force is just doing the resize --confirm for the operator, or doing an evacuate, then that's fine (and in fact a good idea, automation is great...). If it's going to create a mess in the DB, then it's IMO a terrible idea.

However, I see a case that may happen: imagine a compute node is completely broken (think: broken motherboard...). Then we probably do want to remove everything that's on it, and we want to handle the case where nova-compute doesn't even respond. This very much is a real-life scenario. If your --force is to address this case, then why not! Though again, of course, we don't want a mess in the DB... :P

I hope this helps,
Cheers,

Thomas Goirand (zigo)

From mnaser at vexxhost.com Wed Jun 12 23:26:06 2019
From: mnaser at vexxhost.com (Mohammed Naser)
Date: Wed, 12 Jun 2019 19:26:06 -0400
Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations?
In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com>
References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com>
Message-ID: 

On Wed, Jun 12, 2019 at 4:44 PM Matt Riedemann wrote: > > Before [1] when deleting a compute service in the API we did not check > to see if the compute service was hosting any instances and just blindly > deleted the service and related compute_node(s) records which orphaned > the resource provider(s) for those nodes. > > With [2] we built on that and would cleanup the (first [3]) compute node > resource provider by first deleting any allocations for instances still > on that host - which because of the check in [1] should be none - and > then deleted the resource provider itself.
> > [2] forgot about ironic where a single compute service can be managing > multiple (hundreds or even thousands) of baremetal compute nodes so I > wrote [3] to delete *all* resource providers for compute nodes tied to > the service - again barring there being any instances running on the > service because of the check added in [1]. > > What we've failed to realize until recently is that there are cases > where deleting the resource provider can still fail because there are > allocations we haven't cleaned up, namely: > > 1. Residual allocations for evacuated instances from a source host. > > 2. Allocations held by a migration record for an unconfirmed (or not yet > complete) migration. > > Because the delete_resource_provider method isn't checking for those, we > can get ResourceProviderInUse errors which are then ignored [4]. Since > that error is ignored, we continue on to delete the compute service > record [5], effectively orphaning the providers (which is what [2] was > meant to fix). I have recreated the evacuate scenario in a functional > test here [6]. > > The question is what should we do about the fix? I'm getting lost > thinking about this in a vacuum so trying to get some others to help > think about it. > > Clearly with [1] we said you shouldn't be able to delete a compute > service that has instances on it because that corrupts our resource > tracking system. If we extend that to any allocations held against > providers for that compute service, then the fix might be as simple as > not ignoring the ResourceProviderInUse error and fail if we can't delete > the provider(s). > > The question I'm struggling with is what does an operator do for the two > cases mentioned above, not-yet-complete migrations and evacuated > instances? For migrations, that seems pretty simple - wait for the > migration to complete and confirm it (reverting a cold migration or > resize would put the instance back on the compute service host you're > trying to delete). > > The nastier thing is the allocations tied to an evacuated instance since > those don't get cleaned up until the compute service is restarted [7]. > If the operator never intends on restarting that compute service and > just wants to clear the data, then they have to manually delete the > allocations for the resource providers associated with that host before > they can delete the compute service, which kind of sucks. > > What are our options? > > 1. Don't delete the compute service if we can't cleanup all resource > providers - make sure to not orphan any providers. Manual cleanup may be > necessary by the operator. I'm personally in favor of this. I think that currently a lot of operators don't really think of the placement service much (or perhaps don't really know what it's doing). There's a lack of transparency in the data that exists in that service, a lot of users will actually rely on the information fed by *nova* and not *placement*. Because of this, I've seen a lot of deployments with stale placement records or issues with clouds where the hypervisors are not efficiently used because of a bunch of stale resource allocations that haven't been cleaned up (and counting on deployers watching logs for warnings.. eh) I would be more in favor of failing a delete if it will cause the cloud to reach an inconsistent state than brute-force a delete leaving you in a messy state where you need to login to the database to unkludge things. > 2. 
Change delete_resource_provider cascade=True logic to remove all > allocations for the provider before deleting it, i.e. for > not-yet-complete migrations and evacuated instances. For the evacuated > instance allocations this is likely OK since restarting the source > compute service is going to do that cleanup anyway. Also, if you delete > the source compute service during a migration, confirming or reverting > the resize later will likely fail since we'd be casting to something > that is gone (and we'd orphan those allocations). Maybe we need a > functional recreate test for the unconfirmed migration scenario before > deciding on this? > > 3. Other things I'm not thinking of? Should we add a force parameter to > the API to allow the operator to forcefully delete (#2 above) if #1 > fails? Force parameters are hacky and usually seem to cause more > problems than they solve, but it does put the control in the operators > hands. > > If we did remove allocations for an instance when deleting it's compute > service host, the operator should be able to get them back by running > the "nova-manage placement heal_allocations" CLI - assuming they restart > the compute service on that host. This would have to be tested of course. > > Help me Obi-Wan Kenobi. You're my only hope. > > [1] https://review.opendev.org/#/q/I0bd63b655ad3d3d39af8d15c781ce0a45efc8e3a > [2] https://review.opendev.org/#/q/I7b8622b178d5043ed1556d7bdceaf60f47e5ac80 > [3] https://review.opendev.org/#/c/657016/ > [4] > https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/scheduler/client/report.py#L2180 > [5] > https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/api/openstack/compute/services.py#L279 > [6] https://review.opendev.org/#/c/663737/ > [7] > https://github.com/openstack/nova/blob/cb0cfc90e1e03e82c42187ec60f46fb8fd590a06/nova/compute/manager.py#L706 > > -- > > Thanks, > > Matt > -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From smooney at redhat.com Thu Jun 13 00:05:31 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Jun 2019 01:05:31 +0100 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <98a6fb1f-7fd2-20da-4a5d-53821422b015@fried.cc> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <98a6fb1f-7fd2-20da-4a5d-53821422b015@fried.cc> Message-ID: On Wed, 2019-06-12 at 17:36 -0500, Eric Fried wrote: > > 2. Change delete_resource_provider cascade=True logic to remove all > > allocations for the provider before deleting it, i.e. for > > not-yet-complete migrations and evacuated instances. For the evacuated > > instance allocations this is likely OK since restarting the source > > compute service is going to do that cleanup anyway. Also, if you delete > > the source compute service during a migration, confirming or reverting > > the resize later will likely fail since we'd be casting to something > > that is gone (and we'd orphan those allocations). Maybe we need a > > functional recreate test for the unconfirmed migration scenario before > > deciding on this? > > This seems like a win to me. 
> > If we can distinguish between the migratey ones and the evacuatey ones, > maybe we fail on the former (forcing them to wait for completion) and > automatically delete the latter (which is almost always okay for the > reasons you state; and recoverable via heal if it's not okay for some > reason).

For a cold migration the allocation will be associated with a migration object. For an evacuate, which is basically a rebuild to a different host, we do not have a migration object, so the consumer uuid for the allocation is still the instance uuid, not a migration uuid. So technically yes, we can tell them apart, but only if we pull back the allocations from placement and then iterate over them, checking whether we have a migration object or an instance with the same uuid. In the evacuate case we should also be able to tell that it's an evacuation, since the uuid will match an instance but the instance host will not match the resource provider name the allocation is associated with. So we can figure this out on the nova side by looking at either the instances table or the migrations table, or, in the future when we have consumer types in placement, that will also make this simpler to do since the info will be in the allocation itself.

Personally I like option 2, but yes, we could selectively force for evacuations only if we wanted.

> > efried > . > >

From corey.bryant at canonical.com Thu Jun 13 03:04:37 2019
From: corey.bryant at canonical.com (Corey Bryant)
Date: Wed, 12 Jun 2019 23:04:37 -0400
Subject: [goal][python3] Train unit tests weekly update (goal-13)
Message-ID: 

This is the goal-13 weekly update for the "Update Python 3 test runtimes for Train" goal [1]. There are 13 weeks remaining for completion of Train community goals [2].

== What's the Goal? ==

To ensure (in the Train cycle) that all official OpenStack repositories with Python 3 unit tests are exclusively using the 'openstack-python3-train-jobs' Zuul template or one of its variants (e.g. 'openstack-python3-train-jobs-neutron') to run unit tests, and that tests are passing. This will ensure that all official projects are running py36 and py37 unit tests in Train. For complete details please see [1].

== Ongoing Work ==

I have initial scripts working to automate patch generation for all supported projects. I plan to get them cleaned up and submitted for review next week, and I plan to start submitting patches next week. For reference my goal-tools scripts are located at: https://github.com/coreycb/goal-tools/commit/6eaf2535af02d5c48ebd9762e280c73859427268. I'll be off Thurs/Fri this week.

Open patches needing reviews: https://review.openstack.org/#/q/topic:python3-train+is:open
Failing patches: https://review.openstack.org/#/q/topic:python3-train+status:open+(+label:Verified-1+OR+label:Verified-2+)

== Completed Work ==

Merged patches: https://review.openstack.org/#/q/topic:python3-train+is:merged

== How can you help? ==

Please take a look at the failing patches and help fix any failing unit tests for your project(s). Python 3.7 unit tests will be self-testing in zuul. If you're interested in helping submit patches, please let me know.
== Reference Material == [1] Goal description: https://governance.openstack.org/tc/goals/train/python3-updates.html [2] Train release schedule: https://releases.openstack.org/train/schedule.html (see R-5 for "Train Community Goals Completed") Storyboard: https://storyboard.openstack.org/#!/board/ Porting to Python 3.7: https://docs.python.org/3/whatsnew/3.7.html#porting-to-python-3-7 Python Update Process: https://opendev.org/openstack/governance/src/branch/master/resolutions/20181024-python-update-process.rst Train runtimes: https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/train.rst Thanks, Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Thu Jun 13 03:32:21 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Thu, 13 Jun 2019 15:32:21 +1200 Subject: [nova] Admin user cannot create vm with user's port? Message-ID: Hi Nova team, In Nova, even the admin user cannot specify user's port to create a vm, is that designed intentionally or sounds like a bug? Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bhagyashri.Shewale at nttdata.com Thu Jun 13 04:42:28 2019 From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri) Date: Thu, 13 Jun 2019 04:42:28 +0000 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: Message-ID: Hi All, After revisiting the spec [1] again and again, I got to know few points please check and let me know about my understanding: Understanding: If the ``vcpu_pin_set`` is set on compute node A in the Stein release then we can say that this node is used to host the dedicated instance on it and if user upgrades from Stein to Train and if operator doesn’t define ``[compute] cpu_dedicated_set`` set then simply fallback to ``vcpu_pin_set`` and report it as PCPU inventory. Considering multiple combinations of various configuration options, I think we will need to implement below business rules so that the issue highlighted in the previous email about the scheduler pre-filter can be solved. Rule 1: If operator sets ``[compute] cpu_shared_set`` in Train. 1.If pinned instances are found then we can simply say that this compute node is used as dedicated in the previous release so raise an error that says to set ``[compute] cpu_dedicated_set`` config option otherwise report it as VCPU inventory. Rule 2: If operator sets ``[compute] cpu_dedicated_set`` in Train. 1. Report inventory as PCPU 2. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, that means this compute node is used as dedicated in the previous release and if empty, then raise an error that this compute node is used as shared compute node in previous release. Rule 3: If operator sets None of the options (``[compute] cpu_dedicated_set``, ``[compute] cpu_shared_set``, ``vcpu_pin_set``) in Train. 1. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, then raise an error that this compute node is used as dedicated compute node in previous release so set ``[compute] cpu_dedicated_set``, otherwise report inventory as VCPU. 2. If no instances, report inventory as VCPU. Rule 4: If operator sets ``vcpu_pin_set`` config option in Train. 1. 
If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is empty, that means this compute node is used for non-pinned instances in the previous release, so raise an error otherwise report it as PCPU inventory. 2. If no instances, report inventory as PCPU. Rule 5: If operator sets ``vcpu_pin_set`` and ``[compute] cpu_dedicated_set`` or ``[compute] cpu_shared_set`` config options in Train 1. Simply raise an error Above business rules 3 and 4 are very important in order to solve the scheduler pre-filter issue highlighted in my previous email. As of today, in either case, `vcpu_pin_set`` is set or not set on the compute node, it can used for both pinned or non-pinned instances depending on whether this host belongs to an aggregate with “pinned” metadata. But as per business rule #3 , if ``vcpu_pin_set`` is not set, we are considering it to be used for non-pinned instances only. Do you think this could cause an issue in providing backward compatibility? Please provide your suggestions on the above business rules. [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 409 Thanks and Regards, -Bhagyashri Shewale- ________________________________ From: Shewale, Bhagyashri Sent: Wednesday, June 12, 2019 6:10:04 PM To: openstack-discuss at lists.openstack.org; openstack at fried.cc; smooney at redhat.com; sfinucan at redhat.com; jaypipes at gmail.com Subject: [nova] Spec: Standardize CPU resource tracking Hi All, Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. So when user will create a new instance or execute instance actions like shelve, unshelve, resize, evacuate and migration post upgrade it will go through scheduler pre-filter which will set alias for `hw:cpu_policy` in request_spec flavor ``extra specs`` and image metadata properties. In below particular case, it won’t work:- For example: I have two compute nodes say A and B: On Stein: Compute node A configurations: vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) Compute node B Configuration: vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) On Train, two possible scenarios: Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train) vcpu_pin_set=0-3 (Keep same settings as in Stein) Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train) cpu_dedicated_set=0-3 (change to the new config option) 1. Consider that one instance say `test ` is created using flavor having old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") in Stein release and now upgraded Nova to Train with the above configuration. 2. Now when user will perform instance action say shelve/unshelve scheduler pre-filter will change the request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$`` which ultimately will return only compute node B from placement service. Here, we expect it should have retuned both Compute A and Compute B. 3. 
If user creates a new instance using old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") on Train release with the above configuration then it will return only compute node B from placement service where as it should have returned both compute Node A and B. Problem: As Compute node A is still configured to be used to boot instances with dedicated CPUs same behavior as Stein, it will not be returned by placement service due to the changes in the scheduler pre-filter logic. Propose changes: Earlier in the spec [2]: The online data migration was proposed to change flavor extra specs and image metadata properties of request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host which will contain the new configuration options set on the compute host. Based on the NumaTopology of host, we can change instance and request_spec flavor extra specs. 1. Remove cpu_policy from extra specs 2. Add “resources:PCPU=” in extra specs We can also change the flavor extra specs and image metadata properties of instance and request_spec object using the reshape functionality. Please give us your feedback on the proposed solution so that we can update specs accordingly. [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst Thanks and Regards, -Bhagyashri Shewale- Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Thu Jun 13 04:55:42 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Thu, 13 Jun 2019 13:55:42 +0900 Subject: [nova] Admin user cannot create vm with user's port? In-Reply-To: References: Message-ID: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com> ---- On Thu, 13 Jun 2019 12:32:21 +0900 Lingxian Kong wrote ---- > Hi Nova team, > In Nova, even the admin user cannot specify user's port to create a vm, is that designed intentionally or sounds like a bug? You can specify that in networks object( networks.port field) [1]. This takes port_id of the existing port. 
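For example, with openstacksdk that looks something like the snippet below (UUIDs are placeholders; the networks entry is passed straight through to the compute API):

    import openstack

    conn = openstack.connect()  # credentials from clouds.yaml / env vars (assumption)
    server = conn.compute.create_server(
        name='vm-on-existing-port',
        image_id='IMAGE_UUID',    # placeholder
        flavor_id='FLAVOR_UUID',  # placeholder
        # maps to "networks": [{"port": ...}] in the create-server request body
        networks=[{'port': 'PORT_UUID'}])
    conn.compute.wait_for_server(server)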
[1] https://developer.openstack.org/api-ref/compute/?expanded=create-server-detail - https://opendev.org/openstack/nova/src/commit/52d8d3d7f65bed99c25f39e7e38f566346586009/nova/api/openstack/compute/schemas/servers.py -gmann > > Best regards, > Lingxian KongCatalyst Cloud From soulxu at gmail.com Thu Jun 13 05:54:52 2019 From: soulxu at gmail.com (Alex Xu) Date: Thu, 13 Jun 2019 13:54:52 +0800 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: Mark Goddard 于2019年6月12日周三 下午2:45写道: > > > On Wed, 12 Jun 2019, 06:23 Alex Xu, wrote: > >> >> >> Mark Goddard 于2019年6月12日周三 上午1:39写道: >> >>> On Mon, 10 Jun 2019 at 06:18, Alex Xu wrote: >>> > >>> > >>> > >>> > Eric Fried 于2019年6月7日周五 上午1:59写道: >>> >> >>> >> > Looking at the specs, it seems it's mostly talking about changing >>> VMs resources without rebooting. However that's not the actual intent of >>> the Ironic use case I explained in the email. >>> >> > Yes, it requires a reboot to reflect the BIOS changes. This reboot >>> can be either be done by Nova IronicDriver or Ironic deploy step can also >>> do it. >>> >> > So I am not sure if the spec actually satisfies the use case. >>> >> > I hope to get more response from the team to get more clarity. >>> >> >>> >> Waitwait. The VM needs to be rebooted for the BIOS change to take >>> >> effect? So (non-live) resize would actually satisfy your use case just >>> >> fine. But the problem is that the ironic driver doesn't support resize >>> >> at all? >>> >> >>> >> Without digging too hard, that seems like it would be a fairly >>> >> straightforward thing to add. It would be limited to only "same host" >>> >> and initially you could only change this one attribute (anything else >>> >> would have to fail). >>> >> >>> >> Nova people, thoughts? >>> >> >>> > >>> > Contribute another idea. >>> > >>> > So just as Jay said in this thread. Those CUSTOM_HYPERTHREADING_ON and >>> CUSTOM_HYPERTHREADING_OFF are configuration. Those >>> > configuration isn't used for scheduling. Actually, Traits is designed >>> for scheduling. >>> > >>> > So yes, there should be only one trait. CUSTOM_HYPERTHREADING, this >>> trait is used for indicating the host support HT. About whether enable it >>> in the instance is configuration info. >>> > >>> > That is also pain for change the configuration in the flavor. The >>> flavor is the spec of instance's virtual resource, not the configuration. >>> > >>> > So another way is we should store the configuration into another >>> place. Like the server's metadata. >>> > >>> > So for the HT case. We only fill the CUSTOM_HYPERTHREADING trait in >>> the flavor, and fill a server metadata 'hyperthreading_config=on' in server >>> metadata. The nova will find out a BM node support HT. And ironic based on >>> the server metadata 'hyperthreading_config=on' to enable the HT. >>> > >>> > When change the configuration of HT to off, the user can update the >>> server's metadata. Currently, the nova will send a rpc call to the compute >>> node and calling a virt driver interface when the server metadata is >>> updated. In the ironic virt driver, it can trigger a hyper-threading >>> configuration deploy step to turn the HT off, and do a reboot of the >>> instance. 
(The reboot is a step inside deploy-step, not part of ironic virt >>> driver flow) >>> > >>> > But yes, this changes some design to the original deploy-steps and >>> deploy-templates. And we fill something into the server's metadata which >>> I'm not sure nova people like it. >>> > >>> > Anyway, just put my idea at here. >>> >>> We did consider using metadata. The problem is that it is >>> user-defined, so there is no way for an operator to restrict what can >>> be done by a user. Flavors are operator-defined and so allow for >>> selection from a 'menu' of types and configurations. >>> >> >> The end user can change the BIOS config by the ipmi inside the guest OS, >> and do a reboot. It is already out of control for the operator. >> (Correct me if ironic doesn't allow the end user change the config inside >> the guest OS) >> > > It depends. Normally you can't configure BIOS via IPMI, but need to use a > vendor interface such as racadm or on hardware that supports it, Redfish. > Access to the management controller can and should be locked down though. > It's also usually possible to reconfigure via serial console, if this is > exposed to users. > It sounds that breaking the operator control partially. (Sorry for drop the mallist thread again...I will paste a note to the wall "click the "Reply All"...") > >> So Flavor should be thing to strict the resource( or resource's capable) >> which can be requested by the end user. For example, flavor will say I need >> a BM node has hyper-thread capable. But enable or disable can be controlled >> by the end user. >> >> >>> >>> What might be nice is if we could use a flavor extra spec like this: >>> >>> deploy-config:hyperthreading=enabled >>> >>> The nova ironic virt driver could pass this to ironic, like it does with >>> traits. >>> >>> Then in the ironic deploy template, have fields like this: >>> >>> name: Hyperthreading enabled >>> config-type: hyperthreading >>> config-value: enabled >>> steps: >>> >>> Ironic would then match on the config-type and config-value to find a >>> suitable deploy template. >>> >>> As an extension, the deploy template could define a trait (or list of >>> traits) that must be supported by a node in order for the template to >>> be applied. Perhaps this would even be a standard relationship between >>> config-type and traits? >>> >>> Haven't thought this through completely, I'm sure it has holes. >>> >>> > >>> >> efried >>> >> . >>> >> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.rydberg at citynetwork.eu Thu Jun 13 06:51:25 2019 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Thu, 13 Jun 2019 08:51:25 +0200 Subject: [sigs][publiccloud][publiccloud-wg][publiccloud-sig][billing] Meeting today at 1400 UTC regarding billing initiative Message-ID: <506b17fb-12c4-8f90-1ac5-a2a332d0b0c3@citynetwork.eu> Hi all, This is a reminder for todays meeting for the Public Cloud SIG - 1400 UTC in #openstack-publiccloud. The topic of the day will be continued discussions regarding the billing initiative. More information about that at https://etherpad.openstack.org/p/publiccloud-sig-billing-implementation-proposal See you all later today! 
Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From madhuri.kumari at intel.com Thu Jun 13 07:13:24 2019 From: madhuri.kumari at intel.com (Kumari, Madhuri) Date: Thu, 13 Jun 2019 07:13:24 +0000 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> Hi All, Thank you everyone for your responses. We have created an etherpad[1] with suggested solution and concerns. I request Nova and Ironic developers to provide their input on the etherpad. [1] https://etherpad.openstack.org/p/ironic-nova-reset-configuration Regards, Madhuri From: Alex Xu [mailto:soulxu at gmail.com] Sent: Thursday, June 13, 2019 11:25 AM To: Mark Goddard ; openstack-discuss Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning Mark Goddard > 于2019年6月12日周三 下午2:45写道: On Wed, 12 Jun 2019, 06:23 Alex Xu, > wrote: Mark Goddard > 于2019年6月12日周三 上午1:39写道: On Mon, 10 Jun 2019 at 06:18, Alex Xu > wrote: > > > > Eric Fried > 于2019年6月7日周五 上午1:59写道: >> >> > Looking at the specs, it seems it's mostly talking about changing VMs resources without rebooting. However that's not the actual intent of the Ironic use case I explained in the email. >> > Yes, it requires a reboot to reflect the BIOS changes. This reboot can be either be done by Nova IronicDriver or Ironic deploy step can also do it. >> > So I am not sure if the spec actually satisfies the use case. >> > I hope to get more response from the team to get more clarity. >> >> Waitwait. The VM needs to be rebooted for the BIOS change to take >> effect? So (non-live) resize would actually satisfy your use case just >> fine. But the problem is that the ironic driver doesn't support resize >> at all? >> >> Without digging too hard, that seems like it would be a fairly >> straightforward thing to add. It would be limited to only "same host" >> and initially you could only change this one attribute (anything else >> would have to fail). >> >> Nova people, thoughts? >> > > Contribute another idea. > > So just as Jay said in this thread. Those CUSTOM_HYPERTHREADING_ON and CUSTOM_HYPERTHREADING_OFF are configuration. Those > configuration isn't used for scheduling. Actually, Traits is designed for scheduling. > > So yes, there should be only one trait. CUSTOM_HYPERTHREADING, this trait is used for indicating the host support HT. About whether enable it in the instance is configuration info. > > That is also pain for change the configuration in the flavor. The flavor is the spec of instance's virtual resource, not the configuration. > > So another way is we should store the configuration into another place. Like the server's metadata. > > So for the HT case. We only fill the CUSTOM_HYPERTHREADING trait in the flavor, and fill a server metadata 'hyperthreading_config=on' in server metadata. The nova will find out a BM node support HT. And ironic based on the server metadata 'hyperthreading_config=on' to enable the HT. > > When change the configuration of HT to off, the user can update the server's metadata. 
Currently, the nova will send a rpc call to the compute node and calling a virt driver interface when the server metadata is updated. In the ironic virt driver, it can trigger a hyper-threading configuration deploy step to turn the HT off, and do a reboot of the instance. (The reboot is a step inside deploy-step, not part of ironic virt driver flow) > > But yes, this changes some design to the original deploy-steps and deploy-templates. And we fill something into the server's metadata which I'm not sure nova people like it. > > Anyway, just put my idea at here. We did consider using metadata. The problem is that it is user-defined, so there is no way for an operator to restrict what can be done by a user. Flavors are operator-defined and so allow for selection from a 'menu' of types and configurations. The end user can change the BIOS config by the ipmi inside the guest OS, and do a reboot. It is already out of control for the operator. (Correct me if ironic doesn't allow the end user change the config inside the guest OS) It depends. Normally you can't configure BIOS via IPMI, but need to use a vendor interface such as racadm or on hardware that supports it, Redfish. Access to the management controller can and should be locked down though. It's also usually possible to reconfigure via serial console, if this is exposed to users. It sounds that breaking the operator control partially. (Sorry for drop the mallist thread again...I will paste a note to the wall "click the "Reply All"...") So Flavor should be thing to strict the resource( or resource's capable) which can be requested by the end user. For example, flavor will say I need a BM node has hyper-thread capable. But enable or disable can be controlled by the end user. What might be nice is if we could use a flavor extra spec like this: deploy-config:hyperthreading=enabled The nova ironic virt driver could pass this to ironic, like it does with traits. Then in the ironic deploy template, have fields like this: name: Hyperthreading enabled config-type: hyperthreading config-value: enabled steps: Ironic would then match on the config-type and config-value to find a suitable deploy template. As an extension, the deploy template could define a trait (or list of traits) that must be supported by a node in order for the template to be applied. Perhaps this would even be a standard relationship between config-type and traits? Haven't thought this through completely, I'm sure it has holes. > >> efried >> . >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Thu Jun 13 09:04:22 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 13 Jun 2019 10:04:22 +0100 (BST) Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: On Wed, 12 Jun 2019, Matt Riedemann wrote: > 2. Change delete_resource_provider cascade=True logic to remove all > allocations for the provider before deleting it, i.e. for not-yet-complete > migrations and evacuated instances. For the evacuated instance allocations > this is likely OK since restarting the source compute service is going to do > that cleanup anyway. 
Also, if you delete the source compute service during a > migration, confirming or reverting the resize later will likely fail since > we'd be casting to something that is gone (and we'd orphan those > allocations). Maybe we need a functional recreate test for the unconfirmed > migration scenario before deciding on this? I think this is likely the right choice. If the service is being deleted (not disabled) it shouldn't have a resource provider and to not have a resource provider it needs to not have allocations, and of those left over allocations that it does have are either bogus now, or will be soon enough, may as well get them gone in a consistent and predictable way. That said, we shouldn't make a habit of a removing allocations just so we can remove a resource provider whenever we want, only in special cases like this. If/when we're modelling shared disk as a shared resource provider does this get any more complicated? Does the part of an allocation that is DISK_GB need special handling. > 3. Other things I'm not thinking of? Should we add a force parameter to the > API to allow the operator to forcefully delete (#2 above) if #1 fails? Force > parameters are hacky and usually seem to cause more problems than they solve, > but it does put the control in the operators hands. I'm sort of maybe on this. A #1, with an option to inspect and then #2 seems friendly and potentially useful but how often is someone going to want to inspect versus just "whatevs, #2"? I don't know. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From cdent+os at anticdent.org Thu Jun 13 09:12:32 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 13 Jun 2019 10:12:32 +0100 (BST) Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: On Wed, 12 Jun 2019, Mohammed Naser wrote: > On Wed, Jun 12, 2019 at 4:44 PM Matt Riedemann wrote: >> 1. Don't delete the compute service if we can't cleanup all resource >> providers - make sure to not orphan any providers. Manual cleanup may be >> necessary by the operator. > > I'm personally in favor of this. I think that currently a lot of > operators don't > really think of the placement service much (or perhaps don't really know what > it's doing). > > There's a lack of transparency in the data that exists in that service, a lot of > users will actually rely on the information fed by *nova* and not *placement*. I agree, and this is part of why I prefer #2 over #1. For someone dealing with a deleted nova compute service, placement shouldn't be something they need to be all that concerned with. Nova should be mediating the interactions with placement to correct the model of reality that it is storing there. That's what option 2 is doing: fixing the model, from nova. (Obviously this is an idealisation that we've not achieved, which is I why I used that horrible word "should", but I do think it is something we should be striving towards.) Please: https://en.wikipedia.org/wiki/Posting_style#Trimming_and_reformatting /me scurries back to Usenet -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From anlin.kong at gmail.com Thu Jun 13 09:22:16 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Thu, 13 Jun 2019 21:22:16 +1200 Subject: [nova] Admin user cannot create vm with user's port? 
In-Reply-To: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com>
References: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com>
Message-ID: 

Yeah, the API allows you to specify a port. What I mean is, the VM creation will fail for an admin user if the port belongs to a non-admin user. An exception is raised from nova-compute.

On Thursday, 13 June 2019, Ghanshyam Mann wrote:

> ---- On Thu, 13 Jun 2019 12:32:21 +0900 Lingxian Kong < > anlin.kong at gmail.com> wrote ---- > > Hi Nova team, > > In Nova, even the admin user cannot specify user's port to create a vm, > is that designed intentionally or sounds like a bug? > > You can specify that in networks object( networks.port field) [1]. This > takes port_id of the existing port.
> > > > [1] https://developer.openstack.org/api-ref/compute/?expanded= > > create-server-detail > > - https://opendev.org/openstack/nova/src/commit/ > > 52d8d3d7f65bed99c25f39e7e38f566346586009/nova/api/openstack/ > > compute/schemas/servers.py > > > > -gmann > > > > > > > > Best regards, > > > Lingxian KongCatalyst Cloud > > > > > > From smooney at redhat.com Thu Jun 13 11:21:09 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Jun 2019 12:21:09 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: Message-ID: <3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> On Thu, 2019-06-13 at 04:42 +0000, Shewale, Bhagyashri wrote: > Hi All, > > > After revisiting the spec [1] again and again, I got to know few points please check and let me know about my > understanding: > > > Understanding: If the ``vcpu_pin_set`` is set on compute node A in the Stein release then we can say that this node > is used to host the dedicated instance on it and if user upgrades from Stein to Train and if operator doesn’t define > ``[compute] cpu_dedicated_set`` set then simply fallback to ``vcpu_pin_set`` and report it as PCPU inventory. that is incorrect if the vcpu_pin_set is defiend it may be used for instance with hw:cpu_policy=dedicated or not. in train if vcpu_pin_set is defiend and cpu_dedicated_set is not defiend then we use vcpu_pin_set to define the inventory of both PCPUs and VCPUs > > > Considering multiple combinations of various configuration options, I think we will need to implement below business > rules so that the issue highlighted in the previous email about the scheduler pre-filter can be solved. > > > Rule 1: > > If operator sets ``[compute] cpu_shared_set`` in Train. > > 1.If pinned instances are found then we can simply say that this compute node is used as dedicated in the previous > release so raise an error that says to set ``[compute] cpu_dedicated_set`` config option otherwise report it as VCPU > inventory. cpu_share_set in stien was used for vm emulator thread and required the instnace to be pinned for it to take effect. i.e. the hw:emulator_thread_policy extra spcec currently only works if you had hw_cpu_policy=dedicated. so we should not error if vcpu_pin_set and cpu_shared_set are defined, it was valid. what we can do is ignore teh cpu_shared_set for schduling and not report 0 VCPUs for this host and use vcpu_pinned_set as PCPUs > > > Rule 2: > > If operator sets ``[compute] cpu_dedicated_set`` in Train. > > 1. Report inventory as PCPU yes if cpu_dedicated_set is set we will report its value as PCPUs > > 2. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, that means this > compute node is used as dedicated in the previous release and if empty, then raise an error that this compute node is > used as shared compute node in previous release. this was not part of the spec. we could do this but i think its not needed and operators should check this themselves. if we decide to do this check on startup it should only happen if vcpu_pin_set is defined. addtionally we can log an error but we should not prevent the compute node form working and contuing to spawn vms. > > > Rule 3: > > If operator sets None of the options (``[compute] cpu_dedicated_set``, ``[compute] cpu_shared_set``, > ``vcpu_pin_set``) in Train. > > 1. 
If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, then raise an error > that this compute node is used as dedicated compute node in previous release so set ``[compute] cpu_dedicated_set``, > otherwise report inventory as VCPU. again this is not in the spec and i dont think we should do this. if none of the values are set we should report all cpus as both VCPUs and PCPUs the vcpu_pin_set option was never intended to signal a host was used for cpu pinning it was intoduced for cpu pinning and numa affinity but it was orignally ment to apply to floaing instance and currently contople the number of VCPU reported to the resouce tracker which is used to set the capastiy of the VCPU inventory. you should read https://that.guru/blog/cpu-resources/ for a walk through of this. > > 2. If no instances, report inventory as VCPU. we could do this but i think it will be confusing as to what will happen after we spawn an instnace on the host in train. i dont think this logic should be condtional on the presence of vms. > > > Rule 4: > > If operator sets ``vcpu_pin_set`` config option in Train. > > 1. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is empty, that means this compute > node is used for non-pinned instances in the previous release, so raise an error otherwise report it as PCPU > inventory. agin this is not in the spec. what the spec says for if vcpu_pin_set is defiend is we will report inventorys of both VCPU and PCPUs for all cpus in the vcpu_pin_set > > 2. If no instances, report inventory as PCPU. again this should not be condtional on the presence of vms. > > > Rule 5: > > If operator sets ``vcpu_pin_set`` and ``[compute] cpu_dedicated_set`` or ``[compute] cpu_shared_set`` config options > in Train > > 1. Simply raise an error this is the only case were we "rasise" and error and refuse to start the compute node. > > > Above business rules 3 and 4 are very important in order to solve the scheduler pre-filter issue highlighted in my > previous email. we explctly do not want to have the behavior in 3 and 4 specificly the logic of checking the instances. > > > As of today, in either case, `vcpu_pin_set`` is set or not set on the compute node, it can used for both pinned or > non-pinned instances depending on whether this host belongs to an aggregate with “pinned” metadata. But as per > business rule #3 , if ``vcpu_pin_set`` is not set, we are considering it to be used for non-pinned instances > only. Do you think this could cause an issue in providing backward compatibility? yes the rule you have listed above will cause issue for upgrades and we rejected similar rules in the spec. i have not read your previous email which ill look at next but we spent a long time debating how this should work in the spec design and i would prefer to stick to what the spec currently states. > > > Please provide your suggestions on the above business rules. > > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 409 > > > > Thanks and Regards, > > -Bhagyashri Shewale- > > ________________________________ > From: Shewale, Bhagyashri > Sent: Wednesday, June 12, 2019 6:10:04 PM > To: openstack-discuss at lists.openstack.org; openstack at fried.cc; smooney at redhat.com; sfinucan at redhat.com; > jaypipes at gmail.com > Subject: [nova] Spec: Standardize CPU resource tracking > > > Hi All, > > > Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. 
> > > While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: > > > Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and > request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > So when user will create a new instance or execute instance actions like shelve, unshelve, resize, evacuate and > migration post upgrade it will go through scheduler pre-filter which will set alias for `hw:cpu_policy` in > request_spec flavor ``extra specs`` and image metadata properties. In below particular case, it won’t work:- > > > For example: > > > I have two compute nodes say A and B: > > > On Stein: > > > Compute node A configurations: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > Compute node B Configuration: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > On Train, two possible scenarios: > > Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train) > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train) > > cpu_dedicated_set=0-3 (change to the new config option) > > 1. Consider that one instance say `test ` is created using flavor having old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") in Stein release and now upgraded Nova to Train with the above > configuration. > 2. Now when user will perform instance action say shelve/unshelve scheduler pre-filter will change the > request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$`` which ultimately will > return only compute node B from placement service. Here, we expect it should have retuned both Compute A and Compute > B. > 3. If user creates a new instance using old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") on Train release with the above configuration then it will return > only compute node B from placement service where as it should have returned both compute Node A and B. > > Problem: As Compute node A is still configured to be used to boot instances with dedicated CPUs same behavior as > Stein, it will not be returned by placement service due to the changes in the scheduler pre-filter logic. > > > Propose changes: > > > Earlier in the spec [2]: The online data migration was proposed to change flavor extra specs and image metadata > properties of request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host > which will contain the new configuration options set on the compute host. Based on the NumaTopology of host, we can > change instance and request_spec flavor extra specs. > > 1. Remove cpu_policy from extra specs > 2. Add “resources:PCPU=” in extra specs > > > We can also change the flavor extra specs and image metadata properties of instance and request_spec object using the > reshape functionality. > > > Please give us your feedback on the proposed solution so that we can update specs accordingly. 
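(To make the translation being discussed here concrete, a minimal sketch -- the flavor name, CPU count and CLI calls are illustrative only; the extra spec and config option names are the ones used in the spec and in this thread:)

  # Stein-style flavor, relying on host aggregate metadata for isolation
  openstack flavor set pinned.large \
      --property hw:cpu_policy=dedicated \
      --property aggregate_instance_extra_specs:pinned=true

  # What the proposed pre-filter would effectively request instead
  openstack flavor set pinned.large \
      --property resources:PCPU=4

  # Host side (nova.conf): legacy option vs. the new Train options
  [DEFAULT]
  vcpu_pin_set = 0-3        # legacy: reported as both VCPU and PCPU in Train
  [compute]
  cpu_dedicated_set = 0-3   # new: reported as PCPU inventory only
  # cpu_shared_set = 4-7    # new: reported as VCPU inventory only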
> > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 > > [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst > > > Thanks and Regards, > > -Bhagyashri Shewale- > > Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may > contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise > the sender by replying promptly to this email and then delete and destroy this email and any attachments without any > further use, copying or forwarding. From smooney at redhat.com Thu Jun 13 11:32:02 2019 From: smooney at redhat.com (Sean Mooney) Date: Thu, 13 Jun 2019 12:32:02 +0100 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: References: Message-ID: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com> On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote: > Hi All, > > > Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. > > > While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: > > > Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and > request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > So when user will create a new instance or execute instance actions like shelve, unshelve, resize, evacuate and > migration post upgrade it will go through scheduler pre-filter which will set alias for `hw:cpu_policy` in > request_spec flavor ``extra specs`` and image metadata properties. In below particular case, it won’t work:- > > > For example: > > > I have two compute nodes say A and B: > > > On Stein: > > > Compute node A configurations: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) vcpu_pin_set does not mean that the host was used for pinned instances https://that.guru/blog/cpu-resources/ > > > Compute node B Configuration: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > On Train, two possible scenarios: > > Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train) > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train) > > cpu_dedicated_set=0-3 (change to the new config option) > > 1. Consider that one instance say `test ` is created using flavor having old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") in Stein release and now upgraded Nova to Train with the above > configuration. > 2. Now when user will perform instance action say shelve/unshelve scheduler pre-filter will change the > request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$`` it wont remove hw:cpu_policy it will just change the resouces=VCPU:$ -> resources=PCPU:$ > which ultimately will return only compute node B from placement service. that is incorrect both a and by will be returned. the spec states that for host A we report an inventory of 4 VCPUs and an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so both host will be returned assuming $ <=4 > Here, we expect it should have retuned both Compute A and Compute B. it will > 3. 
If user creates a new instance using old extra specs (hw:cpu_policy=dedicated, > "aggregate_instance_extra_specs:pinned": "true") on Train release with the above configuration then it will return > only compute node B from placement service where as it should have returned both compute Node A and B. that is what would have happend in the stien version of the spec and we changed the spec specifically to ensure that that wont happen. in the train version of the spec you will get both host as candates to prevent this upgrade impact. > > Problem: As Compute node A is still configured to be used to boot instances with dedicated CPUs same behavior as > Stein, it will not be returned by placement service due to the changes in the scheduler pre-filter logic. > > > Propose changes: > > > Earlier in the spec [2]: The online data migration was proposed to change flavor extra specs and image metadata > properties of request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host > which will contain the new configuration options set on the compute host. Based on the NumaTopology of host, we can > change instance and request_spec flavor extra specs. > > 1. Remove cpu_policy from extra specs > 2. Add “resources:PCPU=” in extra specs > > > We can also change the flavor extra specs and image metadata properties of instance and request_spec object using the > reshape functionality. > > > Please give us your feedback on the proposed solution so that we can update specs accordingly. i am fairly stongly opposed to useing an online data migration to modify the request spec to reflect the host they landed on. this speficic problem is why the spec was changed in the train cycle to report dual inventoryis of VCPU and PCPU if vcpu_pin_set is the only option set or of no options are set. > > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451 > > [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst > > > Thanks and Regards, > > -Bhagyashri Shewale- > > Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may > contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise > the sender by replying promptly to this email and then delete and destroy this email and any attachments without any > further use, copying or forwarding. From ignaziocassano at gmail.com Thu Jun 13 12:22:53 2019 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 13 Jun 2019 14:22:53 +0200 Subject: [cinder] nfs in kolla Message-ID: Hello, I' just deployed ocata with kolla and my cinder backend is nfs. Volumes are created successfully but live migration does not work. While cinder_volume container mounts the cinder nfs backend, the cinder api not and during live migration the cinder api logs reports errors accessing volumes : Stderr: u"qemu-img: Could not open '/var/lib/cinder/mnt/451bacc11bd88b51ce7bdf31aa97cf39/volume-4889a547-0a0d-440e-8b50-413285b5979c' Any help, please ? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu Jun 13 13:45:11 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 08:45:11 -0500 Subject: Are most grenade plugin projects doing upgrades wrong? 
Message-ID: While fixing how the watcher grenade plugin is cloning the watcher repo for the base (old) side [1] to hard-code from stable/rocky to stable/stien, I noticed that several other projects with grenade plugins aren't actually setting a stable branch when cloning the plugin for the old side [2]. So I tried removing the stable/stein branch from the base plugin clone in watcher [3] and grenade cloned from master for the base (old) side [4]. Taking designate as an example, I see the same thing happening for it's grenade run [5]. Is there something I'm missing here or are most of the openstack projects running grenade via plugin actually just upgrading from master to master rather than n-1 to master? [1] https://review.opendev.org/#/c/664610/1/devstack/upgrade/settings [2] http://codesearch.openstack.org/?q=base%20enable_plugin&i=nope&files=&repos= [3] https://review.opendev.org/#/c/664610/2/devstack/upgrade/settings [4] http://logs.openstack.org/10/664610/2/check/watcher-grenade/ad2e068/logs/grenade.sh.txt.gz#_2019-06-12_19_19_36_874 [5] http://logs.openstack.org/47/662647/6/check/designate-grenade-pdns4/0b8968f/logs/grenade.sh.txt.gz#_2019-06-09_23_10_03_034 -- Thanks, Matt From ildiko.vancsa at gmail.com Thu Jun 13 14:13:49 2019 From: ildiko.vancsa at gmail.com (Ildiko Vancsa) Date: Thu, 13 Jun 2019 16:13:49 +0200 Subject: [edge] China Mobile Edge platform evaluation presentation next Tuesday on the Edge WG call Message-ID: Hi, I attended a presentation today from Qihui Zhao about China Mobile’s experience on evaluation different edge deployment models with various software components. As many of the evaluated components are part of OpenStack and/or StarlingX I invited her for next week’s Edge Computing Group call (Tuesday, June 18) to share their findings with the working group and everyone who is interested. For agenda and call details please visit this wiki: https://wiki.openstack.org/wiki/Edge_Computing_Group#Meetings Please let me know if you have any questions. Thanks and Best Regards, Ildikó From robson.rbarreto at gmail.com Thu Jun 13 14:28:49 2019 From: robson.rbarreto at gmail.com (Robson Ramos Barreto) Date: Thu, 13 Jun 2019 11:28:49 -0300 Subject: [openstack-helm] custom container images for helm In-Reply-To: References: Message-ID: Hi Steve Ok Thank you. I'll have a look into the openstack-images repository. Yes sure. For now I'm evaluating if helm attend our needs so if it is I can contributing. Thank you Regards On Wed, Jun 12, 2019 at 6:14 PM Steve Wilkerson wrote: > Hey Robson, > > We’ve recently started building images out of the openstack-helm-images > repository. Currently, we use LOCI to build ubuntu based images for > releases Ocata through Rocky and leap15 images for the Rocky release. > > We’ve recently started work on the multi-distro support spec which also > added overrides and jobs required for the leap15 based images for Rocky. > We’d love to see support added for centos images added to both > openstack-helm-images and the openstack-helm charts themselves (and for > releases beyond Rocky), but we just haven’t gotten there yet. If you’re > interested in contributing and getting your hands dirty, we’d love to help > provide guidance and help here. > > In regards to the Kolla images, it’s been awhile since I’ve used them > myself so I can’t speak much there. 
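(For anyone wondering what building such an image with LOCI roughly looks like for the CentOS/Stein case asked about in this thread -- only a sketch; the build-argument names are taken from the LOCI README as I recall it and should be double-checked against the repo, and the build is repeated once per OpenStack service/chart:)

  docker build https://opendev.org/openstack/loci.git \
      --build-arg PROJECT=nova \
      --build-arg PROJECT_REF=stable/stein \
      --build-arg FROM=centos:7 \
      --tag example-registry/loci-nova:stein-centos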
> > Cheers, > Steve > > On Wed, Jun 12, 2019 at 3:45 PM Robson Ramos Barreto < > robson.rbarreto at gmail.com> wrote: > >> Hi all >> >> I saw in the docker hub that there is just until rocky ubuntu xenial >> version. >> >> I'd like to know how can I create my own images centos-based from new >> versions like Stein to be used with the helm charts, if is there any >> specific customization to works with helm or, for example, if can I use >> the kolla images. >> >> Thank you >> >> Regards >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilkers.steve at gmail.com Thu Jun 13 14:31:43 2019 From: wilkers.steve at gmail.com (Steve Wilkerson) Date: Thu, 13 Jun 2019 09:31:43 -0500 Subject: [openstack-helm] custom container images for helm In-Reply-To: References: Message-ID: Hey Robson, That sounds great. Please don’t hesitate to reach out to us on #openstack-helm if you’ve got any questions or concerns we can address as you look to see if openstack-helm can fit your use case(s) Steve On Thu, Jun 13, 2019 at 9:29 AM Robson Ramos Barreto < robson.rbarreto at gmail.com> wrote: > Hi Steve > > Ok Thank you. I'll have a look into the openstack-images repository. > > Yes sure. For now I'm evaluating if helm attend our needs so if it is I > can contributing. > > Thank you > > Regards > > > > On Wed, Jun 12, 2019 at 6:14 PM Steve Wilkerson > wrote: > >> Hey Robson, >> >> We’ve recently started building images out of the openstack-helm-images >> repository. Currently, we use LOCI to build ubuntu based images for >> releases Ocata through Rocky and leap15 images for the Rocky release. >> >> We’ve recently started work on the multi-distro support spec which also >> added overrides and jobs required for the leap15 based images for Rocky. >> We’d love to see support added for centos images added to both >> openstack-helm-images and the openstack-helm charts themselves (and for >> releases beyond Rocky), but we just haven’t gotten there yet. If you’re >> interested in contributing and getting your hands dirty, we’d love to help >> provide guidance and help here. >> >> In regards to the Kolla images, it’s been awhile since I’ve used them >> myself so I can’t speak much there. >> >> Cheers, >> Steve >> >> On Wed, Jun 12, 2019 at 3:45 PM Robson Ramos Barreto < >> robson.rbarreto at gmail.com> wrote: >> >>> Hi all >>> >>> I saw in the docker hub that there is just until rocky ubuntu xenial >>> version. >>> >>> I'd like to know how can I create my own images centos-based from new >>> versions like Stein to be used with the helm charts, if is there any >>> specific customization to works with helm or, for example, if can I use >>> the kolla images. >>> >>> Thank you >>> >>> Regards >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.settle at outlook.com Thu Jun 13 16:01:32 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Thu, 13 Jun 2019 16:01:32 +0000 Subject: [tc] Recap for Technical Committee Meeting 6 June 2019 @ 1400 UTC In-Reply-To: <02f401d52174$9c8b1060$d5a13120$@openstack.org> References: <02f401d52174$9c8b1060$d5a13120$@openstack.org> Message-ID: Hi Alan, Thanks for your response :) I hope you don't mind, I'm replying back to you and the openstack-discuss list so the other TC members also working on the Help Most Wanted text can review your thoughts. In the etherpad, you can just start writing and it will give you an individual colour assigned to your user. 
If you feel like we won't see/know who you are, just put your initials next to your comments :) Thanks so much for offering to help. I've transferred over your thoughts below into the etherpad for now. Thanks, Alex On 13/06/2019 00:14, Alan Clark wrote: > Hey Alexandra, > > As I mentioned during the TC meeting I would love to help with the "Help Most Wanted" text. > > I took a look at the etherpad for the documentation role [1] > > I would like to offer a couple suggested changes. I wasn't sure where to post them, so am pinging you directly. > > My first comment is around the audience. Who is most likely to fit and fill this role. I think the audience we are after for this posting are those that take the OpenStack documentation to develop for re-use and distribution. Those are the most likely persons to convince to take a higher contribution and leadership role. Which is what this posting targets. > > The opening description section conveys the documentation teams pain and struggles. I think a more effective opening would be to convey the business and personal benefits the audience gains from contributing. They have to sell this to their boss. Boss wants to solve their pain not the documentation teams. If we agree that the most likely audience to contribute in this posted role, then their benefits are more complete documentation with less effort. Being able to leverage and re-use the community contributed text means much less text that the audience person has to write. Helping the community effort helps steer the contributed text to fill the needs and gaps that you find of most need. The first paragraph starts in the right direction but I suggest replacing the second paragraph with these ideas. I’m sure you can elaborate these ideas better than me. > > My second thought is that this posting should convey that it’s easy to do and get started. In fact you could turn it into an FAQ style. The First timers page is full of great material and a good page to point to: https://docs.openstack.org/doc-contrib-guide/quickstart/first-timers.html > Address the question of how easy it is to get started, that they use the tools they commonly use and here’s where to get their questions and concerns answered. > > Thanks, > AlanClark > > [1] https://etherpad.openstack.org/p/2019-upstream-investment-opportunities-refactor > > > > > >> -----Original Message----- >> From: Alexandra Settle [mailto:a.settle at outlook.com] >> Sent: Thursday, June 06, 2019 9:51 AM >> To: openstack-discuss at lists.openstack.org >> Subject: [tc] Recap for Technical Committee Meeting 6 June 2019 @ 1400 UTC >> >> Hello all, >> >> Thanks to those who joined the TC meeting today and running through it with me >> at the speed of light. Gif game was impeccably strong and that's primarily what I >> like about this community. >> >> For a recap of the meeting, please see the eavesdrop [0] for full detailed logs and >> action items. All items in the agenda [1] were covered and no major concerns >> raised. >> >> Next meeting will be on the 8th of July 2019. >> >> Cheers, >> >> Alex >> >> [0] http://eavesdrop.openstack.org/meetings/tc/2019/tc.2019-06-06-14.00.txt >> >> [1] >> http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006877.html >> >> > From elod.illes at ericsson.com Thu Jun 13 16:13:33 2019 From: elod.illes at ericsson.com (=?utf-8?B?RWzDtWQgSWxsw6lz?=) Date: Thu, 13 Jun 2019 16:13:33 +0000 Subject: Are most grenade plugin projects doing upgrades wrong? 
In-Reply-To: References: Message-ID: Actually, if you check [4] and [5], they seemingly check out master, but just below the linked line like 20 lines, there's a 'git show --oneline' and 'head -1' which shows that the branch is clearly stable/stein. So it seems there's no need for explicit branch settings in [1]... but I haven't found the code lines which cause this behavior, yet... BR, Előd On 2019. 06. 13. 15:45, Matt Riedemann wrote: > While fixing how the watcher grenade plugin is cloning the watcher > repo for the base (old) side [1] to hard-code from stable/rocky to > stable/stien, I noticed that several other projects with grenade > plugins aren't actually setting a stable branch when cloning the > plugin for the old side [2]. So I tried removing the stable/stein > branch from the base plugin clone in watcher [3] and grenade cloned > from master for the base (old) side [4]. Taking designate as an > example, I see the same thing happening for it's grenade run [5]. > > Is there something I'm missing here or are most of the openstack > projects running grenade via plugin actually just upgrading from > master to master rather than n-1 to master? > > [1] https://review.opendev.org/#/c/664610/1/devstack/upgrade/settings > [2] > http://codesearch.openstack.org/?q=base%20enable_plugin&i=nope&files=&repos= > [3] https://review.opendev.org/#/c/664610/2/devstack/upgrade/settings > [4] > http://logs.openstack.org/10/664610/2/check/watcher-grenade/ad2e068/logs/grenade.sh.txt.gz#_2019-06-12_19_19_36_874 > [5] > http://logs.openstack.org/47/662647/6/check/designate-grenade-pdns4/0b8968f/logs/grenade.sh.txt.gz#_2019-06-09_23_10_03_034 > -- > > Thanks, > > Matt From mriedemos at gmail.com Thu Jun 13 17:37:40 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 12:37:40 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: <8cd10d55-9656-7b50-cf92-1116eed755bc@gmail.com> On 6/13/2019 4:04 AM, Chris Dent wrote: > If/when we're modelling shared disk as a shared resource provider > does this get any more complicated? Does the part of an allocation > that is DISK_GB need special handling. Nova doesn't create nor manage shared resource providers today, so deleting the compute service and its related compute node(s) and their related resource provider(s) shouldn't have anything to do with a shared resource provider. -- Thanks, Matt From mriedemos at gmail.com Thu Jun 13 17:40:18 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 12:40:18 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> Message-ID: <2288a623-358e-9c3e-92d6-461d4b43b4af@gmail.com> On 6/12/2019 5:50 PM, Thomas Goirand wrote: >> 1. Don't delete the compute service if we can't cleanup all resource >> providers - make sure to not orphan any providers. Manual cleanup may be >> necessary by the operator. > I'd say that this option is ok-ish*IF* the operators are given good > enough directives saying what to do. It would really suck if we just get > an error, and don't know what resource cleanup is needed. But if the > error is: > > Cannot delete nova-compute on host mycloud-compute-5. 
> Instances still running: > 623051e7-4e0d-4b06-b977-1d9a73e6e6e1 > f8483448-39b5-4981-a731-5f4eeb28592c > Currently live-migrating: > 49a12659-9dc6-4b07-b38b-e0bf2a69820a > Not confirmed migration/resize: > cc3d4311-e252-4922-bf04-dedc31b3a425 I don't think we'll realistically generate a report like this for an error response in the API. While we could figure this out, for the baremetal case we could have hundreds of instances still managed by that compute service host which is a lot of data to generate for an error response. I guess it could be a warning dumped into the API logs but it could still be a lot of data to crunch and log. -- Thanks, Matt From mriedemos at gmail.com Thu Jun 13 17:44:52 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 12:44:52 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> Message-ID: On 6/12/2019 5:50 PM, Thomas Goirand wrote: >> 3. Other things I'm not thinking of? Should we add a force parameter to >> the API to allow the operator to forcefully delete (#2 above) if #1 >> fails? Force parameters are hacky and usually seem to cause more >> problems than they solve, but it does put the control in the operators >> hands. > Let's say the --force is just doing the resize --confirm for the > operator, or do an evacuate, then that's fine (and in fact, a good idea, > automations are great...). If it's going to create a mess in the DB, > then it's IMO a terrible idea. I really don't think we're going to change the delete compute service API into an orchestrator that auto-confirms/evacuates the node(s) for you. This is something an external agent / script / service could determine, perform whatever actions, and retry, based on existing APIs (like the migrations API). The one catch is the evacuated instance allocations - there is not much you can do about those from the compute API, you would have to cleanup the allocations for those via the placement API directly. > > However, I see a case that may happen: image a compute node is > completely broken (think: broken motherboard...), then probably we do > want to remove everything that's in there, and want to handle the case > where nova-compute doesn't even respond. This very much is a real life > scenario. If your --force is to address this case, then why not! Though > again and of course, we don't want a mess in the db... :P Well, that's where a force parameter would be available to the admin to decide what they want to happen depending on the situation rather than just have nova guess and hope it's what you wanted. We could check if the service is "up" using the service group API and make some determinations that way, i.e. if there are still allocations on the thing and it's down, assume you're deleting it because it's dead and you want it gone so we just cleanup the allocations for you. 
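(A rough sketch, for operators following along, of the kind of manual checking and cleanup being described above before deleting a compute service -- the host name, service id and UUIDs are placeholders, and the resource provider commands assume the osc-placement plugin is installed:)

  # What does nova still think is tied to the host?
  openstack compute service list --host compute-5
  openstack server list --all-projects --host compute-5
  nova migration-list --host compute-5

  # What does placement still have allocated against its provider(s)?
  openstack resource provider list --name compute-5
  openstack resource provider show <rp_uuid> --allocations

  # Once migrations are confirmed/reverted and evacuation leftovers are
  # cleaned up, the service can be deleted; heal_allocations (as noted
  # above) can recreate allocations for instances that still exist.
  openstack compute service delete <service_id>
  nova-manage placement heal_allocations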
-- Thanks, Matt From openstack at fried.cc Thu Jun 13 17:45:24 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 13 Jun 2019 12:45:24 -0500 Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning In-Reply-To: <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com> <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com> <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc> <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com> Message-ID: <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc> We discussed this today in the nova meeting [1] with a little bit of followup in the main channel after the meeting closed [2]. There seems to be general support (or at least not objection) for implementing "resize" for ironic, limited to: - same host [3] - just this feature (i.e. "hyperthreading") or possibly "anything deploy template" And the consensus was that it's time to put this into a spec. There was a rocky spec [4] that has some overlap and could be repurposed; or a new one could be introduced. efried [1] http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-13-14.00.log.html#l-309 [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-06-13.log.html#t2019-06-13T15:02:10 (interleaved) [3] an acknowledged wrinkle here was that we need to be able to detect at the API level that we're dealing with an Ironic instance, and ignore the allow_resize_to_same_host option (because always forcing same host) [4] https://review.opendev.org/#/c/449155/ From mriedemos at gmail.com Thu Jun 13 17:47:58 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 12:47:58 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: On 6/12/2019 6:26 PM, Mohammed Naser wrote: > I would be more in favor of failing a delete if it will cause the cloud to reach > an inconsistent state than brute-force a delete leaving you in a messy state > where you need to login to the database to unkludge things. I'm not sure that the cascading delete (option #2) case would leave things in a messy state since we'd delete the stuff that we're actually orphaning today. If we don't cascade delete for you and just let the request fail if there are still allocations (option #1), then like I said in a reply to zigo, there are APIs available to figure out what's still being used on the host and then clean those up - but that's the manual part I'm talking about since nova wouldn't be doing it for you. -- Thanks, Matt From cdent+os at anticdent.org Thu Jun 13 17:49:14 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Thu, 13 Jun 2019 18:49:14 +0100 (BST) Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <8cd10d55-9656-7b50-cf92-1116eed755bc@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <8cd10d55-9656-7b50-cf92-1116eed755bc@gmail.com> Message-ID: On Thu, 13 Jun 2019, Matt Riedemann wrote: > On 6/13/2019 4:04 AM, Chris Dent wrote: >> If/when we're modelling shared disk as a shared resource provider >> does this get any more complicated? Does the part of an allocation >> that is DISK_GB need special handling. 
> > Nova doesn't create nor manage shared resource providers today, so deleting > the compute service and its related compute node(s) and their related > resource provider(s) shouldn't have anything to do with a shared resource > provider. Yeah, "today". That's why I said "If/when". If we do start doing that, does that make things more complicated in a way we may wish to think about _now_ while we're designing today's solution? I'd like to think that we can just ignore it for now and adapt as things change in the future, but we're all familiar with the way that everything is way more connected and twisted up in a scary hairy ball in nova than we'd all like. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent From mriedemos at gmail.com Thu Jun 13 18:00:39 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 13:00:39 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <98a6fb1f-7fd2-20da-4a5d-53821422b015@fried.cc> Message-ID: <00d1892c-809e-2faf-e58b-5f95f510504f@gmail.com> On 6/12/2019 7:05 PM, Sean Mooney wrote: >> If we can distinguish between the migratey ones and the evacuatey ones, >> maybe we fail on the former (forcing them to wait for completion) and >> automatically delete the latter (which is almost always okay for the >> reasons you state; and recoverable via heal if it's not okay for some >> reason). > for a cold migration the allcoation will be associated with a migration object > for evacuate which is basically a rebuild to a different host we do not have a > migration object so the consumer uuid for the allcotion are still associated with > the instace uuid not a migration uuid. so technically we can tell yes > but only if we pull back the allcoation form placmenet and then iterate over > them and check if we have a migration object or an instance that has the same > uuid. Evacuate operations do have a migration record but you're right that we don't move the source node allocations from the instance to the migration prior to scheduling (like we do for cold and live migration). So after the evacuation, the instance consumer has allocations on both the source and dest node. If we did what Eric is suggesting, which is kind of a mix between option 1 and option 2, then I'd do the same query as we have on restart of the compute service [1] to find migration records for evacuations concerning the host we're being asked to delete within a certain status and clean those up, then (re?)try the resource provider delete - and if that fails, then we punt and fail the request to delete the compute service because we couldn't safely delete the resource provider (and we don't want to orphan it for the reasons mnaser pointed out). 
[1] https://github.com/openstack/nova/blob/61558f274842b149044a14bbe7537b9f278035fd/nova/compute/manager.py#L651 -- Thanks, Matt From openstack at fried.cc Thu Jun 13 18:36:04 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 13 Jun 2019 13:36:04 -0500 Subject: [nova] Strict isolation of group of hosts for image and flavor, modifying command 'nova-manage placement sync_aggregates' In-Reply-To: <91efe32e80b7c24b0bfe5875ecd053513b7fd443.camel@redhat.com> References: <74c9da14-a0db-881c-4971-415d92d6957d@fried.cc> <29596396ea723a24cefe9635ff78d958f36be6b9.camel@redhat.com> <087b0d2a-4cd5-394f-2b2e-83335bdacf23@fried.cc> <9876798ff1a3700fcc0f7202bbfe12da1314a23e.camel@redhat.com> <91efe32e80b7c24b0bfe5875ecd053513b7fd443.camel@redhat.com> Message-ID: <6c41c425-71b5-5095-7acc-2198f7ad1d92@fried.cc> We discussed this in the nova meeting today [1] with a little spillover in the -nova channel afterward [2]. The consensus was: Don't muck with resource provider traits at all during aggregate operations. The operator must do that bit manually. As a stretch goal, we can write a simple utility to help with this. This was discussed as option (e) earlier in this thread. The spec needs to get updated accordingly. Thanks, efried [1] http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-13-14.00.log.html#l-267 [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-06-13.log.html#t2019-06-13T15:02:06-2 (interleaved) From mriedemos at gmail.com Thu Jun 13 18:45:31 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 13:45:31 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> Message-ID: On 6/12/2019 3:38 PM, Matt Riedemann wrote: > What are our options? > > 1. Don't delete the compute service if we can't cleanup all resource > providers - make sure to not orphan any providers. Manual cleanup may be > necessary by the operator. > > 2. Change delete_resource_provider cascade=True logic to remove all > allocations for the provider before deleting it, i.e. for > not-yet-complete migrations and evacuated instances. For the evacuated > instance allocations this is likely OK since restarting the source > compute service is going to do that cleanup anyway. Also, if you delete > the source compute service during a migration, confirming or reverting > the resize later will likely fail since we'd be casting to something > that is gone (and we'd orphan those allocations). Maybe we need a > functional recreate test for the unconfirmed migration scenario before > deciding on this? > > 3. Other things I'm not thinking of? Should we add a force parameter to > the API to allow the operator to forcefully delete (#2 above) if #1 > fails? Force parameters are hacky and usually seem to cause more > problems than they solve, but it does put the control in the operators > hands. > > If we did remove allocations for an instance when deleting it's compute > service host, the operator should be able to get them back by running > the "nova-manage placement heal_allocations" CLI - assuming they restart > the compute service on that host. This would have to be tested of course. After talking a bit about this in IRC today, I'm thinking about a phased approach to this problem with these changes in order: 1. 
Land [1] so we're at least trying to cleanup all providers for a given compute service (the ironic case). 2. Implement option #1 above where we fail to delete the compute service if any of the resource providers cannot be deleted. We'd have stuff in the logs about completing migrations and trying again, and failing that cleanup allocations for old evacuations. Rather than dump all of that info into the logs, it would probably be better to just write up a troubleshooting doc [2] for it and link to that from the logs, then the doc can reference APIs and CLIs to use for the cleanup scenarios. 3. Implement option #2 above where we cleanup allocations but only for evacuations - like the compute service would do when it's restarted anyway. This would leave the case that we don't delete the compute service for allocations related to other types of migrations - in-progress or unconfirmed (or failed and leaked) migrations that would require operator investigation. We could build on that in the future if we wanted to toy with the idea of checking the service group API for whether or not the service is up or if we wanted to add a force option to just tell nova to fully cascade delete everything, but I don't really want to get hung up on those edge cases right now. How do people feel about this plan? [1] https://review.opendev.org/#/c/657016/ [2] https://docs.openstack.org/nova/latest/admin/support-compute.html -- Thanks, Matt From openstack at fried.cc Thu Jun 13 19:21:58 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 13 Jun 2019 14:21:58 -0500 Subject: [nova] Help wanted with bug triage! Message-ID: <5c38071e-a19e-123a-1c35-01ad9baeeed9@fried.cc> Folks- Nova's queue of untriaged bugs [1] has been creeping slowly upward lately. We could really use some focused effort to get this back under control. We're not even (necessarily) talking about *fixing* bugs - though that's great too. We're talking about triaging [2]. If every nova contributor (you don't need to be a core) triaged just one bug a day, it wouldn't take long for things to be back in manageable territory. Thanks in advance. efried [1] https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New [2] https://wiki.openstack.org/wiki/Nova/BugTriage From zigo at debian.org Thu Jun 13 21:03:58 2019 From: zigo at debian.org (Thomas Goirand) Date: Thu, 13 Jun 2019 23:03:58 +0200 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: <2288a623-358e-9c3e-92d6-461d4b43b4af@gmail.com> References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> <2288a623-358e-9c3e-92d6-461d4b43b4af@gmail.com> Message-ID: On 6/13/19 7:40 PM, Matt Riedemann wrote: > On 6/12/2019 5:50 PM, Thomas Goirand wrote: >>> 1. Don't delete the compute service if we can't cleanup all resource >>> providers - make sure to not orphan any providers. Manual cleanup may be >>> necessary by the operator. >> I'd say that this option is ok-ish*IF*  the operators are given good >> enough directives saying what to do. It would really suck if we just get >> an error, and don't know what resource cleanup is needed. But if the >> error is: >> >> Cannot delete nova-compute on host mycloud-compute-5. 
>> Instances still running: >> 623051e7-4e0d-4b06-b977-1d9a73e6e6e1 >> f8483448-39b5-4981-a731-5f4eeb28592c >> Currently live-migrating: >> 49a12659-9dc6-4b07-b38b-e0bf2a69820a >> Not confirmed migration/resize: >> cc3d4311-e252-4922-bf04-dedc31b3a425 > > I don't think we'll realistically generate a report like this for an > error response in the API. While we could figure this out, for the > baremetal case we could have hundreds of instances still managed by that > compute service host which is a lot of data to generate for an error > response. > > I guess it could be a warning dumped into the API logs but it could > still be a lot of data to crunch and log. In such case, in the error message, just suggest what to do to fix the issue. I once worked in a company that made me change every error message so that each of them contained hints on what to do to fix the problem. Since, I often suggest it. Cheers, Thomas Goirand (zigo) From mriedemos at gmail.com Thu Jun 13 21:18:01 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 13 Jun 2019 16:18:01 -0500 Subject: [nova][ops] What should the compute service delete behavior be wrt resource providers with allocations? In-Reply-To: References: <48dfdbec-d662-a184-3c63-ec8f284f1702@gmail.com> <1cdb1bf8-2fea-79e2-4eb9-041eae4c6be7@debian.org> <2288a623-358e-9c3e-92d6-461d4b43b4af@gmail.com> Message-ID: <2e846728-8e46-4382-2d1a-55f7a6324a33@gmail.com> On 6/13/2019 4:03 PM, Thomas Goirand wrote: > I once worked in a company that made me change every error message so > that each of them contained hints on what to do to fix the problem. > Since, I often suggest it. Heh, same and while it was grueling for the developers it left an impression on me and I tend to try and nack people's changes for crappy error messages as a result. -- Thanks, Matt From anlin.kong at gmail.com Thu Jun 13 22:55:45 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Fri, 14 Jun 2019 10:55:45 +1200 Subject: [nova] Admin user cannot create vm with user's port? In-Reply-To: <0956984a0d34141997110fa28091cdd37ac24c50.camel@redhat.com> References: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com> <0956984a0d34141997110fa28091cdd37ac24c50.camel@redhat.com> Message-ID: On Thu, Jun 13, 2019 at 10:48 PM Sean Mooney wrote: > On Thu, 2019-06-13 at 21:22 +1200, Lingxian Kong wrote: > > Yeah, the api allows to specify port. What i mean is, the vm creation > will > > fail for admin user if port belongs to a non-admin user. An exception is > > raised from nova-compute. > > i believe this is intentional. > > we do not currently allow you to trasfer ownerwhip of a vm form one user > or proejct to another. > but i also believe we currently do not allow a vm to be create from > resouces with different owners > That's not true. As the admin user, you are allowed to create a vm using non-admin's network, security group, image, volume, etc but just not port. There is use case for admin user to create vms but using non-admin's resources for debugging or other purposes. What's more, the exception is raised in nova-compute not nova-api, which i assume it should be supported if it's allowed in the api layer. Best regards, Lingxian Kong Catalyst Cloud -------------- next part -------------- An HTML attachment was scrubbed... URL: From anlin.kong at gmail.com Thu Jun 13 22:57:10 2019 From: anlin.kong at gmail.com (Lingxian Kong) Date: Fri, 14 Jun 2019 10:57:10 +1200 Subject: [nova] Admin user cannot create vm with user's port? 
In-Reply-To: References: <16b4f310b40.b15ee317318066.1276719462324180582@ghanshyammann.com> <0956984a0d34141997110fa28091cdd37ac24c50.camel@redhat.com> Message-ID: Another use case is coming from the services (e.g. Trove) which will create vms in the service tenant but using the resources (e.g. network or port) given by the non-admin user. Best regards, Lingxian Kong Catalyst Cloud On Fri, Jun 14, 2019 at 10:55 AM Lingxian Kong wrote: > On Thu, Jun 13, 2019 at 10:48 PM Sean Mooney wrote: > >> On Thu, 2019-06-13 at 21:22 +1200, Lingxian Kong wrote: >> > Yeah, the api allows to specify port. What i mean is, the vm creation >> will >> > fail for admin user if port belongs to a non-admin user. An exception is >> > raised from nova-compute. >> >> i believe this is intentional. >> >> we do not currently allow you to trasfer ownerwhip of a vm form one user >> or proejct to another. >> but i also believe we currently do not allow a vm to be create from >> resouces with different owners >> > > That's not true. As the admin user, you are allowed to create a vm using > non-admin's network, security group, image, volume, etc but just not port. > > There is use case for admin user to create vms but using non-admin's > resources for debugging or other purposes. > > What's more, the exception is raised in nova-compute not nova-api, which i > assume it should be supported if it's allowed in the api layer. > > Best regards, > Lingxian Kong > Catalyst Cloud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrist at redhat.com Thu Jun 13 23:11:08 2019 From: jrist at redhat.com (Jason Rist) Date: Thu, 13 Jun 2019 17:11:08 -0600 Subject: Retiring TripleO-UI - no longer supported In-Reply-To: <0583152c-5a85-a34d-577e-e7789cac344b@suse.com> References: <3924F5DE-314C-4D41-8CEA-DCF7A2A2CDEA@redhat.com> <0583152c-5a85-a34d-577e-e7789cac344b@suse.com> Message-ID: <9C3778CC-9936-4735-9E61-88F5720CC61A@redhat.com> Thanks for pointing this out. These are now up. Jason Rist Red Hat  jrist / knowncitizen > On Jun 6, 2019, at 11:24 PM, Andreas Jaeger wrote: > > On 07/06/2019 06.34, Jason Rist wrote: >> Follow-up - this work is now done. >> >> https://review.opendev.org/#/q/topic:retire_tripleo_ui+(status:open+OR+status:merged) >> > > Not yet for ansible-role-tripleo-ui - please remove the repo from > project-config and governance repo, step 4 and 5 of > https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project > are missing. > > Andreas > -- > Andreas Jaeger aj at suse.com Twitter: jaegerandi > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah > HRB 21284 (AG Nürnberg) > GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 -------------- next part -------------- An HTML attachment was scrubbed... URL: From frode.nordahl at canonical.com Fri Jun 14 08:05:48 2019 From: frode.nordahl at canonical.com (Frode Nordahl) Date: Fri, 14 Jun 2019 10:05:48 +0200 Subject: [charms] Proposing Sahid Orentino Ferdjaoui to the Charms core team In-Reply-To: References: <17abd9ed-e76d-52b3-29b1-6d6ae75161bf@canonical.com> Message-ID: +1 On Tue, May 28, 2019 at 10:37 PM Corey Bryant wrote: > On Fri, May 24, 2019 at 6:35 AM Chris MacNaughton < > chris.macnaughton at canonical.com> wrote: > >> Hello all, >> >> I would like to propose Sahid Orentino Ferdjaoui as a member of the >> Charms core team. 
>> > > +1 Sahid is a solid contributor and I'm confident he'll use caution, ask > questions, and pull the right people in if needed. > > Corey > >> Chris MacNaughton >> > -- Frode Nordahl Senior Engineer Canonical -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bhagyashri.Shewale at nttdata.com Fri Jun 14 08:35:21 2019 From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri) Date: Fri, 14 Jun 2019 08:35:21 +0000 Subject: [nova] Spec: Standardize CPU resource tracking In-Reply-To: <3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> References: , <3b7955951183eff75b643b0cfc91c88bbf737e62.camel@redhat.com> Message-ID: >> cpu_share_set in stien was used for vm emulator thread and required the instnace to be pinned for it to take effect. >> i.e. the hw:emulator_thread_policy extra spcec currently only works if you had hw_cpu_policy=dedicated. >> so we should not error if vcpu_pin_set and cpu_shared_set are defined, it was valid. what we can do is >> ignore teh cpu_shared_set for schduling and not report 0 VCPUs for this host and use vcpu_pinned_set as PCPUs Thinking of backward compatibility, I agree both of these configuration options ``cpu_shared_set``, ``vcpu_pinned_set`` should be allowed in Train release as well. Few possible combinations in train: A) What if only ``cpu_shared_set`` is set on a new compute node? Report VCPU inventory. B) what if ``cpu_shared_set`` and ``cpu_dedicated_set`` are set on a new compute node? Report VCPU and PCPU inventory. In fact, we want to support both these options so that instance can request both VCPU and PCPU at the same time. If flavor requests VCPU or hw:emulator_thread_policy=share, in both the cases, it will float on CPUs set in ``cpu_shared_set`` config option. C) What if ``cpu_shared_set`` and ``vcpu_pin_set`` are set on a new compute node? Ignore cpu_shared_set and report vcpu_pinned_set as VCPU or PCPU? D) What if ``cpu_shared_set`` and ``vcpu_pin_set`` are set on a upgraded compute node? As you have mentioned, ignore cpu_shared_set and report vcpu_pinned_set as PCPUs provided ``NumaTopology`` ,``pinned_cpus`` attribute is not empty otherwise VCPU. >> we explctly do not want to have the behavior in 3 and 4 specificly the logic of checking the instances. Here we are checking Host ``NumaTopology`` ,``pinned_cpus`` attribute and not directly instances ( if that attribute is not empty that means some instance are running) and this logic will be needed to address above #D case. Regards, -Bhagyashri Shewale- ________________________________ From: Sean Mooney Sent: Thursday, June 13, 2019 8:21:09 PM To: Shewale, Bhagyashri; openstack-discuss at lists.openstack.org; openstack at fried.cc; sfinucan at redhat.com; jaypipes at gmail.com Subject: Re: [nova] Spec: Standardize CPU resource tracking On Thu, 2019-06-13 at 04:42 +0000, Shewale, Bhagyashri wrote: > Hi All, > > > After revisiting the spec [1] again and again, I got to know few points please check and let me know about my > understanding: > > > Understanding: If the ``vcpu_pin_set`` is set on compute node A in the Stein release then we can say that this node > is used to host the dedicated instance on it and if user upgrades from Stein to Train and if operator doesn’t define > ``[compute] cpu_dedicated_set`` set then simply fallback to ``vcpu_pin_set`` and report it as PCPU inventory. that is incorrect if the vcpu_pin_set is defiend it may be used for instance with hw:cpu_policy=dedicated or not. 
in train if vcpu_pin_set is defiend and cpu_dedicated_set is not defiend then we use vcpu_pin_set to define the inventory of both PCPUs and VCPUs > > > Considering multiple combinations of various configuration options, I think we will need to implement below business > rules so that the issue highlighted in the previous email about the scheduler pre-filter can be solved. > > > Rule 1: > > If operator sets ``[compute] cpu_shared_set`` in Train. > > 1.If pinned instances are found then we can simply say that this compute node is used as dedicated in the previous > release so raise an error that says to set ``[compute] cpu_dedicated_set`` config option otherwise report it as VCPU > inventory. cpu_share_set in stien was used for vm emulator thread and required the instnace to be pinned for it to take effect. i.e. the hw:emulator_thread_policy extra spcec currently only works if you had hw_cpu_policy=dedicated. so we should not error if vcpu_pin_set and cpu_shared_set are defined, it was valid. what we can do is ignore teh cpu_shared_set for schduling and not report 0 VCPUs for this host and use vcpu_pinned_set as PCPUs > > > Rule 2: > > If operator sets ``[compute] cpu_dedicated_set`` in Train. > > 1. Report inventory as PCPU yes if cpu_dedicated_set is set we will report its value as PCPUs > > 2. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, that means this > compute node is used as dedicated in the previous release and if empty, then raise an error that this compute node is > used as shared compute node in previous release. this was not part of the spec. we could do this but i think its not needed and operators should check this themselves. if we decide to do this check on startup it should only happen if vcpu_pin_set is defined. addtionally we can log an error but we should not prevent the compute node form working and contuing to spawn vms. > > > Rule 3: > > If operator sets None of the options (``[compute] cpu_dedicated_set``, ``[compute] cpu_shared_set``, > ``vcpu_pin_set``) in Train. > > 1. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is not empty, then raise an error > that this compute node is used as dedicated compute node in previous release so set ``[compute] cpu_dedicated_set``, > otherwise report inventory as VCPU. again this is not in the spec and i dont think we should do this. if none of the values are set we should report all cpus as both VCPUs and PCPUs the vcpu_pin_set option was never intended to signal a host was used for cpu pinning it was intoduced for cpu pinning and numa affinity but it was orignally ment to apply to floaing instance and currently contople the number of VCPU reported to the resouce tracker which is used to set the capastiy of the VCPU inventory. you should read https://that.guru/blog/cpu-resources/ for a walk through of this. > > 2. If no instances, report inventory as VCPU. we could do this but i think it will be confusing as to what will happen after we spawn an instnace on the host in train. i dont think this logic should be condtional on the presence of vms. > > > Rule 4: > > If operator sets ``vcpu_pin_set`` config option in Train. > > 1. If instances are found, check for host numa topology pinned_cpus, if pinned_cpus is empty, that means this compute > node is used for non-pinned instances in the previous release, so raise an error otherwise report it as PCPU > inventory. agin this is not in the spec. 
what the spec says for if vcpu_pin_set is defiend is we will report inventorys of both VCPU and PCPUs for all cpus in the vcpu_pin_set > > 2. If no instances, report inventory as PCPU. again this should not be condtional on the presence of vms. > > > Rule 5: > > If operator sets ``vcpu_pin_set`` and ``[compute] cpu_dedicated_set`` or ``[compute] cpu_shared_set`` config options > in Train > > 1. Simply raise an error this is the only case were we "rasise" and error and refuse to start the compute node. > > > Above business rules 3 and 4 are very important in order to solve the scheduler pre-filter issue highlighted in my > previous email. we explctly do not want to have the behavior in 3 and 4 specificly the logic of checking the instances. > > > As of today, in either case, `vcpu_pin_set`` is set or not set on the compute node, it can used for both pinned or > non-pinned instances depending on whether this host belongs to an aggregate with “pinned” metadata. But as per > business rule #3 , if ``vcpu_pin_set`` is not set, we are considering it to be used for non-pinned instances > only. Do you think this could cause an issue in providing backward compatibility? yes the rule you have listed above will cause issue for upgrades and we rejected similar rules in the spec. i have not read your previous email which ill look at next but we spent a long time debating how this should work in the spec design and i would prefer to stick to what the spec currently states. > > > Please provide your suggestions on the above business rules. > > > [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 409 > > > > Thanks and Regards, > > -Bhagyashri Shewale- > > ________________________________ > From: Shewale, Bhagyashri > Sent: Wednesday, June 12, 2019 6:10:04 PM > To: openstack-discuss at lists.openstack.org; openstack at fried.cc; smooney at redhat.com; sfinucan at redhat.com; > jaypipes at gmail.com > Subject: [nova] Spec: Standardize CPU resource tracking > > > Hi All, > > > Currently I am working on implementation of cpu pinning upgrade part as mentioned in the spec [1]. > > > While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue: > > > Proposed change in spec: In scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and > request_spec.image.properties form ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances. > > > So when user will create a new instance or execute instance actions like shelve, unshelve, resize, evacuate and > migration post upgrade it will go through scheduler pre-filter which will set alias for `hw:cpu_policy` in > request_spec flavor ``extra specs`` and image metadata properties. In below particular case, it won’t work:- > > > For example: > > > I have two compute nodes say A and B: > > > On Stein: > > > Compute node A configurations: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > Compute node B Configuration: > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata) > > > On Train, two possible scenarios: > > Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train) > > vcpu_pin_set=0-3 (Keep same settings as in Stein) > > > Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train) > > cpu_dedicated_set=0-3 (change to the new config option) > > 1. 
> 1. Consider that one instance, say `test`, is created using a flavor having the old extra specs (hw:cpu_policy=dedicated,
> "aggregate_instance_extra_specs:pinned": "true") in the Stein release, and Nova is now upgraded to Train with the above
> configuration.
> 2. Now when the user performs an instance action, say shelve/unshelve, the scheduler pre-filter will change the
> request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$`` which ultimately will
> return only compute node B from the placement service. Here, we expect it should have returned both compute A and
> compute B.
> 3. If the user creates a new instance using the old extra specs (hw:cpu_policy=dedicated,
> "aggregate_instance_extra_specs:pinned": "true") on the Train release with the above configuration then it will return
> only compute node B from the placement service, whereas it should have returned both compute node A and B.
>
> Problem: As compute node A is still configured to be used to boot instances with dedicated CPUs, the same behavior as
> Stein, it will not be returned by the placement service due to the changes in the scheduler pre-filter logic.
>
> Proposed changes:
>
> Earlier in the spec [2]: The online data migration was proposed to change the flavor extra specs and image metadata
> properties of the request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host
> which will contain the new configuration options set on the compute host. Based on the NumaTopology of the host, we can
> change the instance and request_spec flavor extra specs:
>
> 1. Remove cpu_policy from extra specs
> 2. Add "resources:PCPU=" in extra specs
>
> We can also change the flavor extra specs and image metadata properties of the instance and request_spec object using the
> reshape functionality.
>
> Please give us your feedback on the proposed solution so that we can update the spec accordingly.
>
> [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451
>
> [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst
>
> Thanks and Regards,
>
> -Bhagyashri Shewale-
>
> Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may
> contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise
> the sender by replying promptly to this email and then delete and destroy this email and any attachments without any
> further use, copying or forwarding.

Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
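To make the upgrade argument above more concrete, the following is a self-contained, illustrative stand-in for
placement's allocation-candidate matching. It is not real placement code; the host names and inventory numbers are
taken from the compute node A / compute node B example discussed in this thread, assuming the Train-spec behaviour of
reporting both VCPU and PCPU for a host that still uses vcpu_pin_set.

    # Hypothetical inventories after the upgrade, per the Train spec:
    inventories = {
        'compute-a': {'VCPU': 4, 'PCPU': 4},  # still configured with vcpu_pin_set=0-3
        'compute-b': {'PCPU': 4},             # reconfigured with cpu_dedicated_set=0-3
    }

    def candidates(requested):
        # requested is e.g. {'PCPU': 4} for a 4-vCPU pinned flavor
        return [host for host, inv in inventories.items()
                if all(inv.get(rc, 0) >= amount
                       for rc, amount in requested.items())]

    print(candidates({'PCPU': 4}))  # ['compute-a', 'compute-b'] -- both hosts stay schedulable
    print(candidates({'VCPU': 4}))  # ['compute-a'] -- only the not-yet-reconfigured host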
From Bhagyashri.Shewale at nttdata.com Fri Jun 14 08:37:58 2019
From: Bhagyashri.Shewale at nttdata.com (Shewale, Bhagyashri)
Date: Fri, 14 Jun 2019 08:37:58 +0000
Subject: [nova] Spec: Standardize CPU resource tracking
In-Reply-To: <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com>
References: , <57be9f4f426d3a8ac07c79e3e0d1f4737b09b6af.camel@redhat.com>
Message-ID:

>> that is incorrect, both A and B will be returned. the spec states that for host A we report an inventory of 4 VCPUs and
>> an inventory of 4 PCPUs and host B will have an inventory of 4 PCPUs so both hosts will be returned assuming
>> $ <=4

This means that if ``vcpu_pin_set`` is set in the previous release, then we report both VCPU and PCPU as inventory
(in Train), but this seems contradictory. For example:

On Stein:

Configuration on compute node A:

vcpu_pin_set=0-3 (This will report 4 VCPUs inventory in the placement database)

On Train:

vcpu_pin_set=0-3

The inventory will be reported as 4 VCPUs and 4 PCPUs in the placement db.

Now say the user wants to create instances as below:

1. Flavor having extra specs (resources:PCPU=1), instance A
2. Flavor having extra specs (resources:VCPU=1), instance B

For both instance requests, placement will return compute node A.

Instance A: will be pinned to, say, CPU 0
Instance B: will float on 0-3

To resolve the above issue, I think it’s possible to detect whether the compute node was configured to be used for
pinned instances by checking whether the ``NumaTopology`` ``pinned_cpus`` attribute is not empty. In that case,
vcpu_pin_set will be reported as PCPU, otherwise VCPU.

Regards,

-Bhagyashri Shewale-
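For clarity, the heuristic proposed in the paragraph above amounts to something like the sketch below. It is
illustrative only and is not what the spec proposes (see the reply quoted after it); pinned_cpus is assumed to be a
flat set of already-pinned host CPUs gathered from the host NUMA topology, which is a simplification.

    def report_vcpu_pin_set_as(pinned_cpus, vcpu_pin_set):
        # Proposed heuristic: if any CPUs are already pinned on this host,
        # treat vcpu_pin_set as dedicated (PCPU) inventory, else as shared (VCPU).
        if pinned_cpus:
            return {'PCPU': len(vcpu_pin_set)}
        return {'VCPU': len(vcpu_pin_set)}

    # e.g. report_vcpu_pin_set_as({0}, {0, 1, 2, 3}) -> {'PCPU': 4}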
________________________________
From: Sean Mooney
Sent: Thursday, June 13, 2019 8:32:02 PM
To: Shewale, Bhagyashri; openstack-discuss at lists.openstack.org; openstack at fried.cc; sfinucan at redhat.com;
jaypipes at gmail.com
Subject: Re: [nova] Spec: Standardize CPU resource tracking

On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote:
> Hi All,
>
> Currently I am working on the implementation of the cpu pinning upgrade part as mentioned in the spec [1].
>
> While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue:
>
> Proposed change in spec: In the scheduler pre-filter we are going to alias request_spec.flavor.extra_spec and
> request_spec.image.properties from ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances.
>
> So when a user creates a new instance or executes instance actions like shelve, unshelve, resize, evacuate and
> migration post upgrade, it will go through the scheduler pre-filter which will set the alias for `hw:cpu_policy` in
> the request_spec flavor ``extra specs`` and image metadata properties. In the below particular case, it won’t work:
>
> For example:
>
> I have two compute nodes, say A and B:
>
> On Stein:
>
> Compute node A configuration:
>
> vcpu_pin_set=0-3 (used as dedicated CPU, this host is added to an aggregate which has “pinned” metadata)

vcpu_pin_set does not mean that the host was used for pinned instances
https://that.guru/blog/cpu-resources/

> Compute node B configuration:
>
> vcpu_pin_set=0-3 (used as dedicated CPU, this host is added to an aggregate which has “pinned” metadata)
>
> On Train, two possible scenarios:
>
> Compute node A configuration: (consider the new cpu pinning implementation is merged into Train)
>
> vcpu_pin_set=0-3 (keep the same settings as in Stein)
>
> Compute node B configuration: (consider the new cpu pinning implementation is merged into Train)
>
> cpu_dedicated_set=0-3 (change to the new config option)
>
> 1. Consider that one instance, say `test`, is created using a flavor having the old extra specs (hw:cpu_policy=dedicated,
> "aggregate_instance_extra_specs:pinned": "true") in the Stein release, and Nova is now upgraded to Train with the above
> configuration.
> 2. Now when the user performs an instance action, say shelve/unshelve, the scheduler pre-filter will change the
> request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:$``

it won't remove hw:cpu_policy, it will just change resources=VCPU:$ -> resources=PCPU:$

> which ultimately will return only compute node B from the placement service.

that is incorrect, both A and B will be returned. the spec states that for host A we report an inventory of 4 VCPUs and
an inventory of 4 PCPUs and host B will have an inventory of 4 PCPUs so both hosts will be returned assuming $ <=4

> Here, we expect it should have returned both compute A and compute B.

it will

> 3. If the user creates a new instance using the old extra specs (hw:cpu_policy=dedicated,
> "aggregate_instance_extra_specs:pinned": "true") on the Train release with the above configuration then it will return
> only compute node B from the placement service, whereas it should have returned both compute node A and B.

that is what would have happened in the Stein version of the spec, and we changed the spec specifically to ensure that
won't happen. in the Train version of the spec you will get both hosts as candidates to prevent this upgrade impact.

> Problem: As compute node A is still configured to be used to boot instances with dedicated CPUs, the same behavior as
> Stein, it will not be returned by the placement service due to the changes in the scheduler pre-filter logic.
>
> Proposed changes:
>
> Earlier in the spec [2]: The online data migration was proposed to change the flavor extra specs and image metadata
> properties of the request_spec and instance object. Based on the instance host, we can get the NumaTopology of the host
> which will contain the new configuration options set on the compute host. Based on the NumaTopology of the host, we can
> change the instance and request_spec flavor extra specs:
>
> 1. Remove cpu_policy from extra specs
> 2. Add "resources:PCPU=" in extra specs
>
> We can also change the flavor extra specs and image metadata properties of the instance and request_spec object using the
> reshape functionality.
>
> Please give us your feedback on the proposed solution so that we can update the spec accordingly.

i am fairly strongly opposed to using an online data migration to modify the request spec to reflect the host they
landed on. this specific problem is why the spec was changed in the Train cycle to report dual inventories of VCPU and
PCPU if vcpu_pin_set is the only option set or if no options are set.

> [1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst at 451
>
> [2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst
>
> Thanks and Regards,
>
> -Bhagyashri Shewale-
>
> Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may
> contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise
> the sender by replying promptly to this email and then delete and destroy this email and any attachments without any
> further use, copying or forwarding.

Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.
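The pre-filter translation being debated in this thread can be summarised with a small, hedged sketch. It is
illustrative only, not the actual nova pre-filter; the function name and the return shape are invented for the
example. Per the reply above, hw:cpu_policy stays on the flavor and only the requested resource class changes.

    def requested_resources(flavor_vcpus, extra_specs):
        # Translate a pinned request to PCPU, leaving hw:cpu_policy in place.
        if extra_specs.get('hw:cpu_policy') == 'dedicated':
            return {'PCPU': flavor_vcpus}
        return {'VCPU': flavor_vcpus}

    # e.g. the `test` instance from the thread: a 4-vCPU flavor with
    # hw:cpu_policy=dedicated now asks placement for 4 PCPUs.
    print(requested_resources(4, {'hw:cpu_policy': 'dedicated',
                                  'aggregate_instance_extra_specs:pinned': 'true'}))
    # {'PCPU': 4}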
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From madhuri.kumari at intel.com Fri Jun 14 10:16:59 2019
From: madhuri.kumari at intel.com (Kumari, Madhuri)
Date: Fri, 14 Jun 2019 10:16:59 +0000
Subject: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning
In-Reply-To: <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc>
References: <0512CBBECA36994BAA14C7FEDE986CA60FC00ADA@BGSMSX101.gar.corp.intel.com>
 <0512CBBECA36994BAA14C7FEDE986CA60FC156C3@BGSMSX102.gar.corp.intel.com>
 <44cf555e-0f1d-233f-04d5-b2c131b136d7@fried.cc>
 <0512CBBECA36994BAA14C7FEDE986CA60FC2EE6D@BGSMSX102.gar.corp.intel.com>
 <8e240b9c-c5f0-3d1f-5b16-c543bbbc525b@fried.cc>
Message-ID: <0512CBBECA36994BAA14C7FEDE986CA60FC3299E@BGSMSX102.gar.corp.intel.com>

Hi Eric,

Thank you for following up and for the notes. The spec [4] is related, but it is also a complex one with all the
migration implementation. So I will try to put up a new spec with a limited implementation of resize.

Regards,
Madhuri

>>-----Original Message-----
>>From: Eric Fried [mailto:openstack at fried.cc]
>>Sent: Thursday, June 13, 2019 11:15 PM
>>To: openstack-discuss at lists.openstack.org
>>Subject: Re: [Nova][Ironic] Reset Configurations in Baremetals Post Provisioning
>>
>>We discussed this today in the nova meeting [1] with a little bit of followup
>>in the main channel after the meeting closed [2].
>>
>>There seems to be general support (or at least no objection) for
>>implementing "resize" for ironic, limited to:
>>
>>- same host [3]
>>- just this feature (i.e. "hyperthreading") or possibly "anything deploy
>>template"
>>
>>And the consensus was that it's time to put this into a spec.
>>
>>There was a rocky spec [4] that has some overlap and could be repurposed;
>>or a new one could be introduced.
>>
>>efried
>>
>>[1]
>>http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-06-13-
>>14.00.log.html#l-309
>>[2]
>>http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-
>>nova.2019-06-13.log.html#t2019-06-13T15:02:10
>>(interleaved)
>>[3] an acknowledged wrinkle here was that we need to be able to detect at
>>the API level that we're dealing with an Ironic instance, and ignore the
>>allow_resize_to_same_host option (because we're always forcing the same host)
>>[4] https://review.opendev.org/#/c/449155/

From mdulko at redhat.com Fri Jun 14 10:44:20 2019
From: mdulko at redhat.com (Michał Dulko)
Date: Fri, 14 Jun 2019 12:44:20 +0200
Subject: [requirements][kuryr][flame] openshift difficulties
In-Reply-To: <20190606141747.gxoyrcels266rcgv@mthode.org>
References: <20190529205352.f2dxzckgvfavbvtv@mthode.org>
 <20190530151739.nfzrqfstlb2sbrq5@mthode.org>
 <20190605165807.jmhogmfyrxltx5b3@mthode.org>
 <06337c09a594e16e40086b6c64495a59c3e6cd84.camel@redhat.com>
 <20190606141747.gxoyrcels266rcgv@mthode.org>
Message-ID:

On Thu, 2019-06-06 at 09:17 -0500, Matthew Thode wrote:
> On 19-06-06 09:13:46, Michał Dulko wrote:
> > On Wed, 2019-06-05 at 11:58 -0500, Matthew Thode wrote:
> > > On 19-05-30 10:17:39, Matthew Thode wrote:
> > > > On 19-05-30 17:07:54, Michał Dulko wrote:
> > > > > On Wed, 2019-05-29 at 15:53 -0500, Matthew Thode wrote:
> > > > > > Openshift upstream is giving us difficulty as they are capping the
> > > > > > version of urllib3 and kubernetes we are using.
> > > > > > -urllib3===1.25.3
> > > > > > +urllib3===1.24.3
> > > > > > -kubernetes===9.0.0
> > > > > > +kubernetes===8.0.1
> > > > > >
> > > > > > I've opened an issue with them but have not had much luck there (and their
> > > > > > preferred solution just pushes the can down the road).
> > > > > >
> > > > > > https://github.com/openshift/openshift-restclient-python/issues/289
> > > > > >
> > > > > > What I'd like us to do is move off of openshift as our usage doesn't seem to be
> > > > > > much.
> > > > > >
> > > > > > openstack/kuryr-tempest-plugin uses it for one import (and just one
> > > > > > function with that import). I'm not sure exactly what you are doing
> > > > > > with it but would it be too much to ask to move to something else?
> > > > >
> > > > > From Kuryr side it's not really much effort, we can switch to bare REST
> > > > > calls, but obviously we prefer the client. If there's much support for
> > > > > getting rid of it, we can do the switch.
> > > > >
> > > >
> > > > Right now Kuryr is only using it in that one place and it's blocking the
> > > > update of urllib3 and kubernetes for the rest of openstack. So if it's
> > > > not too much trouble it'd be nice to have that happen.
> > > >
> > >
> > > x/flame has it in its constraints but I don't see any actual usage, so
> > >